James Almer
6171f178e7
x86/hevc_add_res: merge last remaining changes from 3d65359832
...
See https://lists.libav.org/pipermail/libav-devel/2016-October/079829.html
2017-03-31 20:49:45 -03:00
Clément Bœsch
1ea0df14c3
Merge commit '0361e4dcb4'
...
* commit '0361e4dcb4':
h264_qpel: x86: Move function with only one instance out of template macro
Note: warning is present with clang.
Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-03-31 09:44:04 +02:00
Ronald S. Bultje
f8c019944d
vp9: re-split the decoder/format/dsp interface header files.
...
The advantage here is that the internal software decoder interface is
not exposed to the DSP functions or the hardware accelerations.
2017-03-28 18:04:26 -04:00
Clément Bœsch
1c9f4b5078
lavc/vp9: split into vp9{block,data,mvs}
...
This follows the Libav layout to ease merges.
2017-03-27 21:38:21 +02:00
Michael Niedermayer
73fb40dc87
avcodec/x86/idctdsp: Remove duplicate include
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-03-26 19:17:30 +02:00
James Almer
ac42f08099
x86/hevc_add_res: merge missing changes from 3d65359832
...
Unrolling the loops triples the size of the assembled output
without any gain in performance.
2017-03-24 11:24:18 -03:00
Clément Bœsch
3d65359832
Merge commit '6d5636ad9a'
...
* commit '6d5636ad9a':
hevc: x86: Add add_residual() SIMD optimizations
See a6af4bf64d
This merge is cosmetic only (renames, space shuffling, etc.).
The functional changes in the ASM are *not* merged:
- unrolling with %rep is kept
- ADD_RES_MMX_4_8 is left untouched: this needs investigation
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-24 12:33:25 +01:00
Clément Bœsch
40ac226014
lavc/x86/hevc: rename hevc_res_add to hevc_add_res
...
This will simplify the incoming merge.
2017-03-24 11:45:23 +01:00
James Almer
bac44a5020
Merge commit 'b89804da9b'
...
* commit 'b89804da9b':
x86: videodsp: Add parentheses to expression to work around warning
Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 18:35:49 -03:00
James Almer
29db87af52
Merge commit '6be7944ee2'
...
* commit '6be7944ee2':
x86: Add missing colons after assembly labels
Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 18:05:27 -03:00
Clément Bœsch
947230837c
Merge commit '112cee0241'
...
* commit '112cee0241':
hevc: Add SSE2 and AVX IDCT
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-23 15:58:46 +01:00
Clément Bœsch
733b13ad66
Merge commit 'e4128c08d7'
...
* commit 'e4128c08d7':
Revert "hevc: x86: Refactor IDCT macro declarations"
So apparently this was technically correct but reverted due to
authorship. Reverted as well in FFmpeg for now...
See http://lists.libav.org/pipermail/libav-devel/2016-October/079560.html
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-23 12:03:25 +01:00
Clément Bœsch
4bb4fa28e3
Merge commit '5801f9ed24'
...
* commit '5801f9ed24':
h264_intrapred: x86: Update comments left behind in 95c89da36e
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-23 11:58:01 +01:00
Clément Bœsch
9954d5b44e
Merge commit 'd9dccc0389'
...
* commit 'd9dccc0389':
hevc: x86: Refactor IDCT macro declarations
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-23 11:54:53 +01:00
James Almer
30cadfe071
avcodec/lossless_videodsp: use ptrdiff_t for length parameters
...
Signed-off-by: James Almer <jamrial@gmail.com>
2017-03-22 18:38:35 -03:00
Clément Bœsch
af607b7e07
lavc/huffyuvdsp: only transmit the pix_fmt instead of the whole avctx
...
Only the pixel format is required in that init function. This will also
simplify the incoming merge.
2017-03-22 16:22:20 +01:00
Clément Bœsch
c66bd8f3ff
Merge commit 'b57e38f52c'
...
* commit 'b57e38f52c':
ac3dsp: x86: Replace inline asm for in-decoder downmixing with standalone asm
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-22 12:49:29 +01:00
Clément Bœsch
e39d4ff150
Merge commit '43717469f9'
...
* commit '43717469f9':
ac3dsp: Reverse matrix in/out order in downmix()
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-22 11:29:46 +01:00
James Almer
aee046a895
x86/audiodsp: remove an unnecessary movss
2017-03-22 00:14:56 -03:00
James Almer
9a0fbb9ca9
Merge commit '2caa93b813'
...
* commit '2caa93b813':
mpegaudiodsp: Change type of array stride parameters to ptrdiff_t
Merged-by: James Almer <jamrial@gmail.com>
2017-03-21 16:04:22 -03:00
James Almer
a8474df944
Merge commit 'e4a94d8b36'
...
* commit 'e4a94d8b36':
h264chroma: Change type of stride parameters to ptrdiff_t
Merged-by: James Almer <jamrial@gmail.com>
2017-03-21 15:20:45 -03:00
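The stride-to-ptrdiff_t changes in this series can be illustrated with a minimal sketch. It assumes a hypothetical helper, `sum_column`, that is not an FFmpeg function; it only shows why a signed, pointer-sized stride is convenient: the same code walks an image top-down or bottom-up, and the row offset is computed at pointer width rather than as a 32-bit int.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of FFmpeg): sums the first byte of
 * `height` rows that lie `stride` bytes apart. The stride may be
 * negative, in which case rows are visited bottom-up. */
static unsigned sum_column(const uint8_t *src, ptrdiff_t stride, int height)
{
    unsigned sum = 0;
    for (int y = 0; y < height; y++)
        sum += src[y * stride];   /* y is converted to ptrdiff_t here */
    return sum;
}
```

With a 4-byte-wide, 3-row buffer `img`, `sum_column(img, 4, 3)` and `sum_column(img + 8, -4, 3)` visit the same three bytes in opposite order.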
James Almer
5a49097b42
Merge commit '2ec9fa5ec6'
...
* commit '2ec9fa5ec6':
idct: Change type of array stride parameters to ptrdiff_t
Merged-by: James Almer <jamrial@gmail.com>
2017-03-21 14:29:52 -03:00
Clément Bœsch
f54da138e9
Merge commit '009adfd4fb'
...
* commit '009adfd4fb':
x86: fpel: Remove unnecessary sign extend
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-21 15:02:31 +01:00
Clément Bœsch
ad98af27f7
Merge commit 'de2ae3c1fa'
...
* commit 'de2ae3c1fa':
lavc: add clobber tests for the new encoding/decoding API
The merge only reorders what we already have.
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-21 14:43:53 +01:00
Clément Bœsch
83cd80d10a
Merge commit '12004a9a7f'
...
* commit '12004a9a7f':
audiodsp/x86: yasmify vector_clipf_sse
audiodsp: reorder arguments for vector_clipf
Merged the version from Libav after a discussion with James Almer on
IRC:
19:22 <ubitux> jamrial: opinion on 12004a9a7f ?
19:23 <ubitux> it was apparently yasmified differently
19:23 <ubitux> (it depends on the previous commit arg shuffle)
19:24 <ubitux> i don't see the magic movsxdifnidn in your port btw
19:24 <ubitux> it's a port from 1d36defe94
19:25 <jamrial> seems better thanks to said arg shuffle
19:25 <jamrial> the loop is the same, but init is simpler
19:25 <jamrial> probably worth merging
19:25 <ubitux> OK
19:25 <ubitux> thanks
19:26 <jamrial> curious they didn't make len ptrdiff_t after the previous bunch of commits, heh
19:26 <ubitux> yeah indeed
Both commits are merged at the same time to prevent a conflict with our
existing yasmified ff_vector_clipf_sse.
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 22:35:07 +01:00
Clément Bœsch
43a4c729d4
Merge commit '75d98e30af'
...
* commit '75d98e30af':
audiodsp/x86: clear the high bits of the order parameter on 64bit
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 18:44:00 +01:00
Clément Bœsch
072fad7cf5
Merge commit '1d6c76e11f'
...
* commit '1d6c76e11f':
audiodsp/x86: fix ff_vector_clip_int32_sse2
No functional changes, only cosmetics. This issue was fixed in
9a9e2f1c8a.
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 18:42:37 +01:00
Clément Bœsch
e07fa3008b
Merge commit 'de452e5037'
...
* commit 'de452e5037':
pixblockdsp: Change type of stride parameters to ptrdiff_t
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 15:58:32 +01:00
Ilia
2f3d10a01a
avcodec/vp9: avx2 implementation of ipred_dl_16x16_16
...
vp9_diag_downleft_16x16_10bpp_c: 263.0
vp9_diag_downleft_16x16_10bpp_sse2: 44.7
vp9_diag_downleft_16x16_10bpp_ssse3: 32.5
vp9_diag_downleft_16x16_10bpp_avx: 31.9
vp9_diag_downleft_16x16_10bpp_avx2: 25.7
vp9_diag_downleft_16x16_12bpp_c: 264.7
vp9_diag_downleft_16x16_12bpp_sse2: 44.4
vp9_diag_downleft_16x16_12bpp_ssse3: 32.0
vp9_diag_downleft_16x16_12bpp_avx: 32.4
vp9_diag_downleft_16x16_12bpp_avx2: 25.5
Benchmarked with 10000 runs
Signed-off-by: Ilia <zakne0ne@gmail.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2017-03-20 09:47:43 -04:00
Mirage Abeysekara
5eb4f95bef
h264pred: added AVX2 implementation for tm_vp8 16x16.
...
checkasm --bench results with 5000 runs
pred16x16_tm_vp8_c: 302.8
pred16x16_tm_vp8_mmx: 101.4
pred16x16_tm_vp8_mmxext: 95.5
pred16x16_tm_vp8_sse2: 95.1
pred16x16_tm_vp8_avx2: 38.2
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2017-03-20 09:45:42 -04:00
James Almer
6966a5e4d7
Merge commit '721d57e608'
...
* commit '721d57e608':
vp56: Separate VP5 and VP6 dsp initialization
Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 17:15:24 -03:00
James Almer
663640d745
Merge commit '3fd22538bc'
...
* commit '3fd22538bc':
prores: Change type of stride parameters to ptrdiff_t
Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 15:30:13 -03:00
James Almer
aec42ebc27
Merge commit 'f81be06cf6'
...
* commit 'f81be06cf6':
cavs: Change type of stride parameters to ptrdiff_t
Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 15:23:52 -03:00
James Almer
4e4dfcac58
Merge commit '802727b538'
...
* commit '802727b538':
vp8: Update some assembly comments left unchanged in bd66f073fe
Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 15:18:31 -03:00
James Almer
4004d33fcb
Merge commit 'd9d26a3674'
...
* commit 'd9d26a3674':
vp56: Change type of stride parameters to ptrdiff_t
Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 14:54:25 -03:00
Clément Bœsch
6a42a54b9d
Merge commit '6892df9294'
...
* commit '6892df9294':
vp3: Change type of stride parameters to ptrdiff_t
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-19 18:41:26 +01:00
Clément Bœsch
8695ce73ca
Merge commit 'e2b9993558'
...
* commit 'e2b9993558':
simple_idct: x86: Drop disabled IDCT implementation
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-19 16:11:11 +01:00
Clément Bœsch
8286c359ad
Merge commit 'e99ecda550'
...
* commit 'e99ecda550':
checkasm: add vp9 MC tests.
vp9mc/x86: sse2 MC assembly.
vp9mc/x86: add AVX and AVX2 MC
vp9mc/x86: rename ff_* to ff_vp9_*
vp9mc/x86: rename ff_avg[48]_sse to ff_avg[48]_mmxext
vp9mc/x86: simplify a few inits.
vp9mc/x86: add 16px functions (64bit only).
Noop (aside from a formatting comment in vp9mc.asm). We already have all
of this. We should consider making a final diff between the two projects
once the dust settles.
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-16 20:25:39 +01:00
Clément Bœsch
a4f5e79f7c
Merge commit '89466de4ae'
...
* commit '89466de4ae':
vp9/x86: rename vp9dsp to vp9mc
File was already renamed, only the top description is updated.
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-16 20:10:47 +01:00
James Almer
e632fe9bab
Merge commit '3c504bc359'
...
* commit '3c504bc359':
x86: deduplicate some constants
Merged-by: James Almer <jamrial@gmail.com>
2017-03-15 22:07:28 -03:00
Michael Niedermayer
835d9f299c
avcodec/x86/cavsdsp: Put MMX code under mmx check
...
Without this the FPU state becomes trashed and causes mysterious
fate failures with cpuflags=0
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-03-06 16:47:17 +01:00
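A rough sketch of the kind of guard the cavsdsp commit above describes, assuming hypothetical names throughout (`CPU_FLAG_MMX`, `ExampleDSPContext`, `example_dsp_init` and the two routines are illustrative and not taken from cavsdsp). The point is that anything MMX-specific is only selected inside the flag check, so a run with no CPU flags enabled never reaches MMX code, whose registers alias the x87 FPU registers.

```c
#include <stdint.h>

/* Hypothetical flag bit; real code would query the CPU at runtime. */
#define CPU_FLAG_MMX 0x0001

typedef void (*put_pixels_fn)(uint8_t *dst, const uint8_t *src, int n);

typedef struct ExampleDSPContext {
    put_pixels_fn put_pixels;
} ExampleDSPContext;

/* Portable C fallback. */
static void put_pixels_c(uint8_t *dst, const uint8_t *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Stand-in for an MMX routine; a real one would use MMX registers. */
static void put_pixels_mmx_standin(uint8_t *dst, const uint8_t *src, int n)
{
    put_pixels_c(dst, src, n);
}

static void example_dsp_init(ExampleDSPContext *c, int cpu_flags)
{
    c->put_pixels = put_pixels_c;          /* default for cpuflags=0 */
    if (cpu_flags & CPU_FLAG_MMX)          /* MMX setup stays inside this check */
        c->put_pixels = put_pixels_mmx_standin;
}
```

Calling `example_dsp_init(&c, 0)` leaves the C fallback in place, which mirrors a test run with all CPU flags disabled.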
James Darnley
33de0fee2c
avcodec/h264: enable sse2 chroma deblock/loop filter functions
...
Between 1.00 and 1.16 times faster on Intel Yorkfield Core 2 Quad.
Between 1.11 and 1.39 times faster on Intel Kaby Lake Pentium.
2017-02-27 13:22:06 +01:00
James Darnley
cd893b9307
avcodec/h264: add avx 8-bit 4:2:2 chroma h intra deblock/loop filter
...
~1.37x faster (147 vs. 108 cycles) compared to mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
0e16b3e2be
avcodec/h264: add avx 8-bit 4:2:0 chroma h intra deblock/loop filter
...
~1.10x faster (69 vs. 63 cycles) compared to mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
987ffe4b8d
avcodec/h264: add avx 8-bit chroma v intra deblock/loop filter
...
~1.14x faster (90 vs 78 cycles) compared with mmxext
2017-02-27 13:22:06 +01:00
James Darnley
88307b3eec
avcodec/h264: add avx 8-bit 4:2:2 chroma h deblock/loop filter
...
~1.21x faster (68 vs. 56 cycles) compared with mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
ac096fc82d
avcodec/h264: add avx 8-bit 4:2:0 chroma h deblock/loop filter
...
~1.14x faster (93 vs. 81 cycles) compared with mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
5c56758843
avcodec/h264: add avx 8-bit chroma v deblock/loop filter
...
~1.24x faster (101 vs. 81 cycles) compared with mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
5336887867
avcodec/h264: sse2, avx h luma mbaff deblock/loop filter
...
x86-64 only
Yorkfield:
- sse2: ~2.17x (434 vs. 200 cycles)
Nehalem:
- sse2: ~2.94x (409 vs. 139 cycles)
Skylake:
- sse2: ~3.10x (370 vs. 119 cycles)
- avx: ~3.29x (370 vs. 112 cycles)
2017-02-18 20:26:52 +01:00
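The speedup figures in these commit messages read as ratios of cycle counts (reference function over new function). A minimal sketch of that arithmetic, assuming a hypothetical helper named `speedup` that is not part of checkasm or FFmpeg:

```c
/* Hypothetical helper: how many times faster the new routine is,
 * given the cycle counts of the reference and the new routine. */
static double speedup(double ref_cycles, double new_cycles)
{
    return ref_cycles / new_cycles;
}
```

For example, `speedup(434, 200)` gives 2.17, matching the Yorkfield sse2 figure quoted above.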
James Darnley
e18bc2114f
avcodec/h264: add named parameters to x86 function
2017-02-18 20:26:50 +01:00