FFmpeg/libavcodec/x86
Christophe Gisquet 9630b3fc06 x86: lossless audio: SSE4 madd 32bits
The unique user so far is wmalossless 24bits. The few samples tested show an
order of 8, so more unrolling or an avx2 version do not make sense.

Timings: 68 -> 49 cycles

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-05-07 23:28:48 +02:00
aacpsdsp.asm x86inc: Drop SECTION_TEXT macro 2015-08-04 20:13:09 +02:00
aacpsdsp_init.c x86/aacpsdsp: add SSE and SSE3 optimized functions 2015-07-30 19:01:15 -03:00
ac3dsp.asm
ac3dsp_init.c Merge commit '4f22b13888' 2016-01-27 18:23:31 +00:00
alacdsp.asm x86/alacdsp: add simd optimized functions 2015-10-06 20:22:00 -03:00
alacdsp_init.c x86/alacdsp: add simd optimized functions 2015-10-06 20:22:00 -03:00
audiodsp.asm x86inc: Drop SECTION_TEXT macro 2015-08-11 11:12:01 +02:00
audiodsp_init.c
blockdsp.asm x86inc: Drop SECTION_TEXT macro 2015-08-04 20:13:09 +02:00
blockdsp_init.c blockdsp: reindent after parameter removal 2015-10-03 23:34:56 +02:00
bswapdsp.asm x86inc: Drop SECTION_TEXT macro 2015-08-11 11:12:01 +02:00
bswapdsp_init.c
cabac.h
cavsdsp.c avcodec/x86/cavsdsp: silence -Wunused-variable on --disable-mmx 2015-09-24 04:27:50 +02:00
constants.c avcodec/v210: add avx2 version of the 10-bit line encoder 2016-01-17 16:03:43 +01:00
constants.h avcodec/v210: add avx2 version of the 10-bit line encoder 2016-01-17 16:03:43 +01:00
dcadsp.asm x86/dcadec: add ff_lfe_fir1_float_{sse3,avx} 2016-02-22 21:21:34 -03:00
dcadsp_init.c x86/dcadec: add ff_lfe_fir1_float_{sse3,avx} 2016-02-22 21:21:34 -03:00
dct-test.c x86: dct-test: add more idcts 2015-10-13 16:03:04 +02:00
dct32.asm x86inc: Drop SECTION_TEXT macro 2015-08-11 11:12:01 +02:00
dct_init.c Merge commit 'ebaf571aca' 2015-08-02 12:31:39 +02:00
dirac_dwt.asm dirac_dwt: Make x86 files/functions names consistent 2016-02-05 19:30:23 -08:00
dirac_dwt_init.c dirac_dwt: Make x86 files/functions names consistent 2016-02-05 19:30:23 -08:00
diracdsp.asm diracdsp: Make x86 files/functions names consistent 2016-02-05 19:29:43 -08:00
diracdsp_init.c diracdsp: Make x86 files/functions names consistent 2016-02-05 19:29:43 -08:00
dnxhdenc.asm
dnxhdenc_init.c
fdct.c
fdct.h
fdctdsp_init.c
fft.asm avcodec: Extend fft to size 2^17 2016-03-04 13:51:42 +01:00
fft.h
fft_init.c Merge commit '73ff983e8d' 2016-04-12 15:42:21 +01:00
flac_dsp_gpl.asm x86inc: Drop SECTION_TEXT macro 2015-08-04 20:13:09 +02:00
flacdsp.asm x86: move XOP emulation code back to x86inc 2015-08-03 17:11:13 -03:00
flacdsp_init.c
fmtconvert.asm avcodec/x86/fmtconvert: Add emms to int32_to_float_fmul_array8_sse() 2016-01-15 17:08:37 +01:00
fmtconvert_init.c
fpel.asm
fpel.h x86: fpel: Remove erroneous ff_put_pixels8_mmxext prototype 2015-10-19 16:52:37 -07:00
g722dsp.asm x86inc: Drop SECTION_TEXT macro 2015-08-04 20:13:09 +02:00
g722dsp_init.c
h263_loopfilter.asm x86inc: Drop SECTION_TEXT macro 2015-08-11 11:12:01 +02:00
h263dsp_init.c
h264_chromamc.asm
h264_chromamc_10bit.asm
h264_deblock.asm avcodec/h264: Fix segfault in 4:2:2 chroma deblock with 32-bit msvc 2016-02-05 22:01:38 +01:00
h264_deblock_10bit.asm
h264_i386.h
h264_idct.asm
h264_idct_10bit.asm vp9: 16bpp tm/dc/h/v intra pred simd (mostly sse2) functions. 2015-10-03 14:42:39 -04:00
h264_intrapred.asm
h264_intrapred_10bit.asm vp9: 16bpp tm/dc/h/v intra pred simd (mostly sse2) functions. 2015-10-03 14:42:39 -04:00
h264_intrapred_init.c
h264_qpel.c x86: fpel: Move prototypes for 4-px block functions 2015-10-19 16:52:33 -07:00
h264_qpel_8bit.asm
h264_qpel_10bit.asm vp9: 10/12bpp SIMD (sse2/ssse3/avx) for directional intra prediction. 2015-10-03 14:42:39 -04:00
h264_weight.asm avcodec/x86: add missing colon to labels 2015-07-26 02:50:14 -03:00
h264_weight_10bit.asm
h264chroma_init.c
h264dsp_init.c avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter 2016-02-05 17:26:04 +01:00
hevc_deblock.asm
hevc_idct.asm x86inc: Drop SECTION_TEXT macro 2015-08-04 20:13:09 +02:00
hevc_mc.asm hevcdsp: use a macro for .rodata section 2015-12-11 16:19:30 +01:00
hevc_res_add.asm
hevc_sao.asm x86/hevc_sao: move 10/12bit functions into a separate file 2015-09-30 02:59:55 -03:00
hevc_sao_10bit.asm x86/hevc_sao: add ff_hevc_sao_edge_filter_{8,16}_{10,12} 2015-12-20 17:01:15 -03:00
hevcdsp.h
hevcdsp_init.c x86: hevc: Fix linking with both yasm and optimizations disabled 2016-02-23 11:47:54 +01:00
hpeldsp.asm x86inc: Drop SECTION_TEXT macro 2015-08-11 11:12:01 +02:00
hpeldsp.h
hpeldsp_init.c avcodec/x86/hpeldsp_init: silence -Wunused-function on --disable-mmx 2015-09-19 23:10:52 +02:00
hpeldsp_rnd_template.c avcodec/x86/hpeldsp_rnd_template: silence -Wunused-function on --disable-mmx 2015-10-03 14:24:41 +02:00
huffyuvdsp.asm x86inc: Drop SECTION_TEXT macro 2015-08-11 11:12:01 +02:00
huffyuvdsp_init.c
huffyuvencdsp.asm huffyuvencdsp: Undefine "i" macro after each use 2016-02-07 09:19:17 -08:00
huffyuvencdsp_mmx.c x86: use the new helper macros where useful 2016-02-14 20:00:21 -03:00
idctdsp.asm x86inc: Drop SECTION_TEXT macro 2015-08-04 20:13:09 +02:00
idctdsp.h
idctdsp_init.c x86: simple_idct: 12bits versions 2015-10-13 15:34:32 +02:00
imdct36.asm x86/imdct36: use extractps inside the STORE macro 2016-01-28 13:35:15 -03:00
inline_asm.h
jpeg2000dsp.asm avcodec/x86: add missing colon to labels 2015-07-26 02:50:14 -03:00
jpeg2000dsp_init.c x86: use the new helper macros where useful 2016-02-14 20:00:21 -03:00
lossless_audiodsp.asm x86: lossless audio: SSE4 madd 32bits 2016-05-07 23:28:48 +02:00
lossless_audiodsp_init.c x86: lossless audio: SSE4 madd 32bits 2016-05-07 23:28:48 +02:00
lossless_videodsp.asm x86inc: Drop SECTION_TEXT macro 2015-08-04 20:13:09 +02:00
lossless_videodsp_init.c Replace all remaining occurances of step/depth_minus1 and offset_plus1 2015-09-08 17:10:48 +02:00
lpc.c
Makefile x86/vc1dsp: Split the file into MC and loopfilter 2016-02-29 08:46:53 -08:00
mathops.h
me_cmp.asm avcodec/x86: add missing colon to labels 2015-07-26 02:50:14 -03:00
me_cmp_init.c Merge commit '7c6eb0a1b7' 2015-07-27 22:10:35 +02:00
mlpdsp.asm x86inc: Drop SECTION_TEXT macro 2015-08-04 20:13:09 +02:00
mlpdsp_init.c x86: use the new helper macros where useful 2016-02-14 20:00:21 -03:00
mpegaudiodsp.c avcodec/x86/mpegaudiodsp: silence -Wunused-variable on --disable-mmx 2015-09-22 23:45:03 +02:00
mpegvideo.c avcodec/mpeg12enc: Basic support for encoding non even QPs for -non_linear_quant 1 2015-09-18 02:52:57 +02:00
mpegvideodsp.c
mpegvideoenc.c avcodec/x86/mpegvideoenc: silence -Wunused-function on --disable-mmx 2015-09-19 23:26:57 +02:00
mpegvideoenc_qns_template.c
mpegvideoenc_template.c Merge commit '5d14cf1999' 2015-09-16 11:23:40 +02:00
mpegvideoencdsp.asm
mpegvideoencdsp_init.c Merge commit '7c6eb0a1b7' 2015-07-27 22:10:35 +02:00
pixblockdsp.asm pixblockdsp: x86: Condense diff_pixels_* to a shared macro 2015-11-07 14:31:34 -08:00
pixblockdsp_init.c
pngdsp.asm x86inc: Drop SECTION_TEXT macro 2015-08-11 11:12:01 +02:00
pngdsp_init.c
proresdsp.asm x86inc: Add debug symbols indicating sizes of compiled functions 2016-01-23 20:46:28 +01:00
proresdsp_init.c
qpel.asm
qpeldsp.asm x86inc: Drop SECTION_TEXT macro 2015-08-11 11:12:01 +02:00
qpeldsp_init.c
rnd_template.c avcodec/x86/rnd_template: silence -Wunused-function on --disable-mmx 2015-09-29 19:37:26 +02:00
rv34dsp.asm
rv34dsp_init.c
rv40dsp.asm
rv40dsp_init.c all: fix -Wextra-semi reported on clang 2015-10-24 17:58:17 -04:00
sbrdsp.asm avcodec/x86/sbrdsp: Fix using uninitialized upper 32bit of noise 2015-09-29 13:23:25 +02:00
sbrdsp_init.c
simple_idct.c
simple_idct.h x86: simple_idct: 12bits versions 2015-10-13 15:34:32 +02:00
simple_idct10.asm x86inc: Add debug symbols indicating sizes of compiled functions 2016-01-21 23:19:46 +01:00
simple_idct10_template.asm x86: simple_idct10_template: use const 2015-10-13 22:52:33 +02:00
snowdsp.c
svq1enc.asm x86inc: Drop SECTION_TEXT macro 2015-08-04 20:13:09 +02:00
svq1enc_init.c
synth_filter.asm avcodec/synth_filter: split off remaining code from dcadec files 2016-01-25 14:57:38 -03:00
synth_filter_init.c x86: use the new helper macros where useful 2016-02-14 20:00:21 -03:00
takdsp.asm x86/takdsp: use arithmetic shift instructions 2015-10-09 23:52:39 -03:00
takdsp_init.c avcodec/takdec: add x86 SIMD for rest of decorrelation modes 2015-10-09 21:38:15 +02:00
ttadsp.asm
ttadsp_init.c
v210-init.c avcodec/x86/v210-init: fix unused variable warning 2015-08-21 17:06:27 +02:00
v210.asm avcodec/x86: add missing colon to labels 2015-07-26 02:50:14 -03:00
v210enc.asm Merge commit 'eafb05fcf3' 2016-02-16 17:02:56 +00:00
v210enc_init.c Merge commit 'e280fe1329' 2016-02-16 17:23:32 +00:00
vc1dsp.h
vc1dsp_init.c x86: vc1dsp: Convert vc1_inv_trans_*_dc to NASM format 2016-02-01 17:01:11 -08:00
vc1dsp_loopfilter.asm x86/vc1dsp: Split the file into MC and loopfilter 2016-02-29 08:46:53 -08:00
vc1dsp_mc.asm x86/vc1dsp: Split the file into MC and loopfilter 2016-02-29 08:46:53 -08:00
vc1dsp_mmx.c x86/vc1dsp: Port vc1_*_hor_16b_shift2 to NASM format 2016-02-14 11:11:02 -08:00
videodsp.asm videodsp: fix 1-byte overread in top/bottom READ_NUM_BYTES iterations. 2016-01-18 11:12:47 -05:00
videodsp_init.c
vorbisdsp.asm
vorbisdsp_init.c
vp3dsp.asm avcodec/x86: add missing colon to labels 2015-07-26 02:50:14 -03:00
vp3dsp_init.c Merge commit '7c6eb0a1b7' 2015-07-27 22:10:35 +02:00
vp6dsp.asm
vp6dsp_init.c
vp8dsp.asm
vp8dsp_init.c
vp8dsp_loopfilter.asm
vp9dsp_init.c x86: use the new helper macros where useful 2016-02-14 20:00:21 -03:00
vp9dsp_init.h all: fix -Wextra-semi reported on clang 2015-10-24 17:58:17 -04:00
vp9dsp_init_10bpp.c vp9: add subpel MC SIMD for 10/12bpp. 2015-09-16 21:11:34 -04:00
vp9dsp_init_12bpp.c vp9: add subpel MC SIMD for 10/12bpp. 2015-09-16 21:11:34 -04:00
vp9dsp_init_16bpp.c x86: use the new helper macros where useful 2016-02-14 20:00:21 -03:00
vp9dsp_init_16bpp_template.c x86: use the new helper macros where useful 2016-02-14 20:00:21 -03:00
vp9intrapred.asm
vp9intrapred_16bpp.asm vp9: don't keep a stack pointer if we don't need it. 2015-10-07 08:55:19 -04:00
vp9itxfm.asm vp9: refactor itx coefficients and share between 8 and 10/12bpp. 2015-10-13 11:06:01 -04:00
vp9itxfm_16bpp.asm x86/vp9itxfm: fix register clobbering in ff_vp9_idct_idct_4x4_add_12_sse2 2015-10-13 20:21:33 -03:00
vp9itxfm_template.asm vp9: add x86 simd (sse2/ssse3) for iadst4 10bpp functions. 2015-10-13 11:05:58 -04:00
vp9lpf.asm
vp9lpf_16bpp.asm vp9: sse2/ssse3/avx 16bpp loopfilter x86 simd. 2015-10-03 14:42:39 -04:00
vp9mc.asm x86/vp9mc: fix string concatenation of fullpel function names 2015-09-20 12:32:27 -03:00
vp9mc_16bpp.asm vp9: sse2/ssse3/avx 16bpp loopfilter x86 simd. 2015-10-03 14:42:39 -04:00
vp56_arith.h
w64xmmtest.c avcodec/x86/w64xmmtest: Fix another build failure 2015-09-05 22:15:53 +02:00
xvididct.asm
xvididct.h
xvididct_init.c