FFmpeg/libavcodec/riscv
Rémi Denis-Courmont 3606e592ea lavc/h264dsp: R-V V 8-bit h264_weight_pixels
There are two implementations here:
- a generic scalable one processing two columns at a time,
- a specialised one processing one (fixed-size) row at a time.

Unsurprisingly, the generic one works out better with smaller widths.
With larger widths, the gains from filling vectors are outweighed by
the extra cost of strided loads and stores. In other words, memory
accesses become the bottleneck.
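
For orientation, the operation being vectorised can be sketched in scalar C. This is a minimal sketch, assuming the usual H.264 explicit weighted-prediction formula for 8-bit samples; the names `clip_u8` and `weight_pixels_8` are hypothetical and are not taken from this commit. The inner loop walks one row, which is the unit the specialised variant handles; the generic variant instead works down columns, which is why it needs strided memory accesses.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical scalar reference (not the code from this commit):
 * weight a width x height block of 8-bit pixels in place.
 *
 *   pixel = clip((pixel * weight + rounded_offset) >> log2_denom)
 *
 * Assumes that >> on a negative int is an arithmetic shift, which
 * holds on common compilers but is implementation-defined in C.
 */
static void weight_pixels_8(uint8_t *block, ptrdiff_t stride,
                            int width, int height,
                            int log2_denom, int weight, int offset)
{
    /* Scale the offset to the weight's fixed-point precision,
     * then fold in the rounding term. */
    offset = (int)((unsigned)offset << log2_denom);
    if (log2_denom)
        offset += 1 << (log2_denom - 1);

    for (int y = 0; y < height; y++, block += stride)   /* one row */
        for (int x = 0; x < width; x++)                 /* one column */
            block[x] = clip_u8((block[x] * weight + offset) >> log2_denom);
}
```

The width is a run-time parameter here only to keep the sketch short; the benchmarked functions below are separate fixed-width entry points (2, 4, 8 and 16).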

T-Head C908:
h264_weight2_8_c:        54.5
h264_weight2_8_rvv_i32:  13.7
h264_weight4_8_c:       101.7
h264_weight4_8_rvv_i32:  27.5
h264_weight8_8_c:       197.0
h264_weight8_8_rvv_i32:  75.5
h264_weight16_8_c:      385.0
h264_weight16_8_rvv_i32: 74.2

SpacemiT X60:
h264_weight2_8_c:        48.5
h264_weight2_8_rvv_i32:   8.2
h264_weight4_8_c:        90.7
h264_weight4_8_rvv_i32:  16.5
h264_weight8_8_c:       175.0
h264_weight8_8_rvv_i32:  37.7
h264_weight16_8_c:      342.2
h264_weight16_8_rvv_i32: 66.0
2024-07-09 18:03:29 +03:00
aacencdsp_init.c lavc/aacencdsp: R-V V quant_bands 2024-06-03 22:43:37 +03:00
aacencdsp_rvv.S lavc/aacencdsp: fix rounding in R-V V quantize_bands 2024-06-08 18:30:43 +03:00
aacpsdsp_init.c
aacpsdsp_rvv.S
ac3dsp_init.c
ac3dsp_rvb.S
ac3dsp_rvv.S
ac3dsp_rvvb.S
alacdsp_init.c
alacdsp_rvv.S
audiodsp_init.c
audiodsp_rvf.S
audiodsp_rvv.S
blockdsp_init.c lavc/riscv: use ff_rv_vlen_least() 2024-05-13 18:36:07 +03:00
blockdsp_rvv.S
bswapdsp_init.c
bswapdsp_rvb.S
bswapdsp_rvv.S
cpu_common.c riscv: probe for Zbb extension at load time 2024-06-11 20:12:37 +03:00
exrdsp_init.c
exrdsp_rvv.S
flacdsp_init.c lavc/flacdsp: R-V Zvl256b lpc33 2024-05-27 22:07:29 +03:00
flacdsp_rvv.S lavc/flacdsp: fix sign extension in R-V V wasted33 2024-06-07 17:53:05 +03:00
fmtconvert_init.c
fmtconvert_rvv.S
g722dsp_init.c lavc/riscv: use ff_rv_vlen_least() 2024-05-13 18:36:07 +03:00
g722dsp_rvv.S
h263dsp_init.c lavc/h263dsp: R-V V {h,v}_loop_filter 2024-05-22 19:15:39 +03:00
h263dsp_rvv.S lavc/h263dsp: R-V V {h,v}_loop_filter 2024-05-22 19:15:39 +03:00
h264_chroma_init_riscv.c lavc/riscv: use ff_rv_vlen_least() 2024-05-13 18:36:07 +03:00
h264_mc_chroma.S
h264dsp_init.c lavc/h264dsp: R-V V 8-bit h264_weight_pixels 2024-07-09 18:03:29 +03:00
h264dsp_rvv.S lavc/h264dsp: R-V V 8-bit h264_weight_pixels 2024-07-09 18:03:29 +03:00
h264idct_rvv.S lavc/h264dsp: R-V V 8-bit h264_idct8_add 2024-07-07 09:34:32 +03:00
huffyuvdsp_init.c lavc/huffyuvdsp: optimise RVV vtype for add_hfyu_left_pred_bgr32 2024-05-19 18:37:33 +03:00
huffyuvdsp_rvv.S lavc/huffyuvdsp: optimise RVV vtype for add_hfyu_left_pred_bgr32 2024-05-19 18:37:33 +03:00
idctdsp_init.c lavc/riscv: use ff_rv_vlen_least() 2024-05-13 18:36:07 +03:00
idctdsp_rvv.S
jpeg2000dsp_init.c
jpeg2000dsp_rvv.S
llauddsp_init.c
llauddsp_rvv.S
llviddsp_init.c
llviddsp_rvv.S
llvidencdsp_init.c
llvidencdsp_rvv.S
lpc_init.c lavc/lpc: optimise RVV vector type for compute_autocorr 2024-05-29 16:57:02 +03:00
lpc_rvv.S riscv: allow passing addend to vtype_vli macro 2024-05-30 18:30:52 +03:00
Makefile lavc/h264dsp: R-V V 8-bit h264_idct_add16 2024-07-05 18:56:02 +03:00
me_cmp_init.c lavc/riscv: use ff_rv_vlen_least() 2024-05-13 18:36:07 +03:00
me_cmp_rvv.S
opusdsp_init.c
opusdsp_rvv.S
pixblockdsp_init.c lavc/pixblockdsp: add scalar get_pixels_unaligned 2024-05-24 17:53:43 +03:00
pixblockdsp_rvi.S
pixblockdsp_rvv.S
rv34dsp_init.c lavc/riscv: use ff_rv_vlen_least() 2024-05-13 18:36:07 +03:00
rv34dsp_rvv.S lavc/rv34dsp: remove stray load immediate 2024-05-26 19:20:45 +03:00
rv40dsp_init.c lavc/riscv: use ff_rv_vlen_least() 2024-05-13 18:36:07 +03:00
rv40dsp_rvv.S
sbrdsp_init.c lavc/sbrdsp: add support for 256-bit vectors 2024-05-31 22:22:43 +03:00
sbrdsp_rvv.S lavc/sbrdsp: fold immediate offset into relocation 2024-05-28 19:44:11 +03:00
startcode_rvb.S lavc/startcode: add R-V Zbb startcode_find_candidate 2024-05-19 10:03:49 +03:00
startcode_rvv.S lavc/startcode: fix RVV return value on no match 2024-05-28 19:43:40 +03:00
svqenc_init.c
svqenc_rvv.S
takdsp_init.c
takdsp_rvv.S
utvideodsp_init.c
utvideodsp_rvv.S
vc1dsp_init.c lavc/vc1dsp: R-V V vc1_inv_trans_4x4 2024-06-07 17:53:05 +03:00
vc1dsp_rvi.S lavc/vc1dsp: R-V V mspel_pixels 2024-05-16 17:08:18 +03:00
vc1dsp_rvv.S lavc/vc1dsp: fuse multiply-adds in R-V V inv_trans_8 2024-07-03 18:16:36 +03:00
vorbisdsp_init.c
vorbisdsp_rvv.S
vp7dsp_init.c lavc/vp7dsp: add R-V V vp7_idct_dc_add4uv 2024-06-04 17:42:07 +03:00
vp7dsp_rvv.S lavc/vp7dsp: add R-V V vp7_idct_dc_add4uv 2024-06-04 17:42:07 +03:00
vp8dsp.h
vp8dsp_init.c lavc/vp8dsp: R-V V vp8_idct_add 2024-06-08 18:30:43 +03:00
vp8dsp_rvi.S
vp8dsp_rvv.S lavc/vp8dsp: R-V V bilin_load to bilin_load_h 2024-06-12 18:38:41 +03:00
vp9_intra_rvi.S lavc/vp9dsp: R-V ipred vert 2024-05-15 19:52:25 +03:00
vp9_intra_rvv.S lavc/vp9_intra: fix another .irp use with LLVM as 2024-05-19 10:22:46 +03:00
vp9_mc_rvi.S lavc/vp9dsp: R-V mc copy 2024-05-15 19:52:28 +03:00
vp9_mc_rvv.S lavc/vp9dsp: R-V V rename ff_avg to ff_vp9_avg 2024-05-30 18:30:52 +03:00
vp9dsp.h lavc/vp9dsp: R-V V rename ff_avg to ff_vp9_avg 2024-05-30 18:30:52 +03:00
vp9dsp_init.c lavc/vp9dsp: R-V V rename ff_avg to ff_vp9_avg 2024-05-30 18:30:52 +03:00