FFmpeg

Krzysztof Pyrkosz 9fb97215df avcodec/aarch64/opusdsp_neon: Simplify opus_postfilter_neon		2025-02-10 14:55:16 +02:00

This change removes one extra floating point operation and simplifies
load operations at the beginning of the loop by using dedicated register
for each of the 5 pointers and interleaving it with calculations.

The first case seems to be a bit slower, but the performance increase is
substantial in the other two.

A78 before:
postfilter_15_neon:      1684.8 ( 4.23x)
postfilter_512_neon:     1395.5 ( 5.10x)
postfilter_1022_neon:    1357.0 ( 5.25x)
After:
postfilter_15_neon:      1742.2 ( 4.09x)
postfilter_512_neon:     1169.8 ( 6.09x)
postfilter_1022_neon:    1160.0 ( 6.12x)

A72 before:
postfilter_15_neon:      3144.8 ( 2.39x)
postfilter_512_neon:     3141.2 ( 2.39x)
postfilter_1022_neon:    3230.0 ( 2.33x)
After:
postfilter_15_neon:      2847.8 ( 2.64x)
postfilter_512_neon:     2877.8 ( 2.61x)
postfilter_1022_neon:    2837.2 ( 2.65x)

x13s before:
postfilter_15_neon:      1615.4 ( 2.61x)
postfilter_512_neon:      963.1 ( 4.39x)
postfilter_1022_neon:     963.6 ( 4.39x)
After:
postfilter_15_neon:      1749.6 ( 2.41x)
postfilter_512_neon:      707.1 ( 5.97x)
postfilter_1022_neon:     706.1 ( 5.99x)

Signed-off-by: Martin Storsjö <martin@martin.st>
h26x	aarch64: h26x: Fix the indentation of one function	2024-09-26 13:42:11 +03:00
vvc	aarch64/vvc: Add apply_bdof	2024-12-21 11:54:44 +08:00
aacencdsp_init.c	avcodec/aarch64/aacencdsp: NEON implementation	2025-01-28 10:44:40 +02:00
aacencdsp_neon.S	avcodec/aarch64/aacencdsp: NEON implementation	2025-01-28 10:44:40 +02:00
aacpsdsp_init_aarch64.c
aacpsdsp_neon.S	aarch64: Reindent all assembly to 8/24 column indentation	2023-10-21 23:25:54 +03:00
ac3dsp_init_aarch64.c	avcodec/ac3: Implement sum_square_butterfly_float for aarch64 NEON	2024-04-08 13:36:40 +03:00
ac3dsp_neon.S	aarch64/ac3dsp: simplify the end of ff_ac3_sum_square_butterfly_float_neon	2024-04-09 16:50:49 +02:00
cabac.h
fdct.h	lavc/aarch64/fdct: add neon-optimized fdct for aarch64	2024-05-13 14:54:10 +02:00
fdctdsp_init_aarch64.c	lavc/aarch64/fdct: add neon-optimized fdct for aarch64	2024-05-13 14:54:10 +02:00
fdctdsp_neon.S	lavc/aarch64/fdct: add neon-optimized fdct for aarch64	2024-05-13 14:54:10 +02:00
fmtconvert_init.c
fmtconvert_neon.S
h264chroma_init_aarch64.c
h264cmc_neon.S	aarch64: Lowercase UXTW/SXTW and similar flags	2023-10-21 23:25:23 +03:00
h264dsp_init_aarch64.c
h264dsp_neon.S	aarch64: Make the indentation more consistent	2023-10-21 23:25:29 +03:00
h264idct_neon.S	aarch64: Lowercase UXTW/SXTW and similar flags	2023-10-21 23:25:23 +03:00
h264pred_init.c
h264pred_neon.S	lavc/aarch64: Fix ff_pred16x16_plane_neon_10	2024-12-17 14:50:29 +02:00
h264qpel_init_aarch64.c	lavc/aarch64: h264qpel, add 10-bit lowpass_8_10 based functions	2023-12-07 23:20:14 +02:00
h264qpel_neon.S	lavc/aarch64: h264qpel, add 10-bit lowpass_8_10 based functions	2023-12-07 23:20:14 +02:00
hevcdsp_deblock_neon.S	avcodec/aarch64/hevc: add luma deblock NEON	2024-02-28 10:14:58 +01:00
hevcdsp_idct_neon.S	aarch64: Make the indentation more consistent	2023-10-21 23:25:29 +03:00
hevcdsp_init_aarch64.c	aarch64/hevc: Move epel/qpel to h26x directory	2024-09-14 16:36:34 +08:00
hpeldsp_init_aarch64.c
hpeldsp_neon.S	aarch64: Consistently use lowercase for vector element specifiers	2023-10-21 23:25:18 +03:00
idct.h
idctdsp_init_aarch64.c	lavc/aarch64: fix include for cpu.h	2024-05-13 14:50:38 +02:00
idctdsp_neon.S
Makefile	avcodec/aarch64/aacencdsp: NEON implementation	2025-01-28 10:44:40 +02:00
me_cmp_init_aarch64.c	avcodec/aarch64/me_cmp: add dotprod implementations of sse16 and vsse_intra16	2024-08-17 15:31:48 +02:00
me_cmp_neon.S	avcodec/aarch64/me_cmp: add dotprod implementations of sse16 and vsse_intra16	2024-08-17 15:31:48 +02:00
mpegaudiodsp_init.c
mpegaudiodsp_neon.S	lavc/hevcdsp_qpel_neon: using movi.16b instead of movi.2d	2023-11-28 15:54:49 +02:00
mpegvideoencdsp_init.c	avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t	2024-09-01 13:42:30 +02:00
mpegvideoencdsp_neon.S	avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t	2024-09-01 13:42:30 +02:00
neon.S	aarch64: Consistently use lowercase for vector element specifiers	2023-10-21 23:25:18 +03:00
neontest.c
opusdsp_init.c	lavc/opus*: move to opus/ subdir	2024-09-02 11:56:53 +02:00
opusdsp_neon.S	avcodec/aarch64/opusdsp_neon: Simplify opus_postfilter_neon	2025-02-10 14:55:16 +02:00
pixblockdsp_init_aarch64.c
pixblockdsp_neon.S
rv40dsp_init_aarch64.c
sbrdsp_init_aarch64.c
sbrdsp_neon.S	aarch64: Consistently use lowercase for vector element specifiers	2023-10-21 23:25:18 +03:00
simple_idct_neon.S	aarch64: Consistently use lowercase for vector element specifiers	2023-10-21 23:25:18 +03:00
synth_filter_init.c	avcodec: Remove DCT, FFT, MDCT and RDFT	2023-10-01 02:25:09 +02:00
synth_filter_neon.S	avcodec: Remove DCT, FFT, MDCT and RDFT	2023-10-01 02:25:09 +02:00
vc1dsp_init_aarch64.c
vc1dsp_neon.S
videodsp.S
videodsp_init.c
vorbisdsp_init.c
vorbisdsp_neon.S
vp8dsp.h
vp8dsp_init_aarch64.c
vp8dsp_neon.S	aarch64: Make the indentation more consistent	2023-10-21 23:25:29 +03:00
vp9dsp_init.h
vp9dsp_init_10bpp_aarch64.c
vp9dsp_init_12bpp_aarch64.c
vp9dsp_init_16bpp_aarch64_template.c
vp9dsp_init_aarch64.c
vp9itxfm_16bpp_neon.S
vp9itxfm_neon.S
vp9lpf_16bpp_neon.S
vp9lpf_neon.S
vp9mc_16bpp_neon.S
vp9mc_aarch64.S
vp9mc_neon.S	aarch64: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter	2025-01-03 17:53:46 -05:00