Commit graph

2313 commits

Author SHA1 Message Date
James Almer
6171f178e7 x86/hevc_add_res: merge last remaining changes from 3d65359832
See https://lists.libav.org/pipermail/libav-devel/2016-October/079829.html
2017-03-31 20:49:45 -03:00
Clément Bœsch
1ea0df14c3 Merge commit '0361e4dcb4'
* commit '0361e4dcb4':
  h264_qpel: x86: Move function with only one instance out of template macro

Note: warning is present with clang.

Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-03-31 09:44:04 +02:00
Ronald S. Bultje
f8c019944d vp9: re-split the decoder/format/dsp interface header files.
The advantage here is that the internal software decoder interface is
not exposed to the DSP functions or the hardware accelerations.
2017-03-28 18:04:26 -04:00
Clément Bœsch
1c9f4b5078 lavc/vp9: split into vp9{block,data,mvs}
This is following Libav layout to ease merges.
2017-03-27 21:38:21 +02:00
Michael Niedermayer
73fb40dc87 avcodec/x86/idctdsp: Remove duplicate include
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-03-26 19:17:30 +02:00
James Almer
ac42f08099 x86/hevc_add_res: merge missing changes from 3d65359832
Unrolling the loops triplicates the size of the assembled output
while not generating any gain in performance.
2017-03-24 11:24:18 -03:00
Clément Bœsch
3d65359832 Merge commit '6d5636ad9a'
* commit '6d5636ad9a':
  hevc: x86: Add add_residual() SIMD optimizations

See a6af4bf64d

This merge is only cosmetics (renames, space shuffling, etc).

The functionnal changes in the ASM are *not* merged:
- unrolling with %rep is kept
- ADD_RES_MMX_4_8 is left untouched: this needs investigation

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-24 12:33:25 +01:00
Clément Bœsch
40ac226014 lavc/x86/hevc: rename hevc_res_add to hevc_add_res
This will simplify incoming merge.
2017-03-24 11:45:23 +01:00
James Almer
bac44a5020 Merge commit 'b89804da9b'
* commit 'b89804da9b':
  x86: videodsp: Add parentheses to expression to work around warning

Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 18:35:49 -03:00
James Almer
29db87af52 Merge commit '6be7944ee2'
* commit '6be7944ee2':
  x86: Add missing colons after assembly labels

Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 18:05:27 -03:00
Clément Bœsch
947230837c Merge commit '112cee0241'
* commit '112cee0241':
  hevc: Add SSE2 and AVX IDCT

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-23 15:58:46 +01:00
Clément Bœsch
733b13ad66 Merge commit 'e4128c08d7'
* commit 'e4128c08d7':
  Revert "hevc: x86: Refactor IDCT macro declarations"

So apparently this was technically correct be reverted due to
authorship. Reverted as well in FFmpeg for now...

See http://lists.libav.org/pipermail/libav-devel/2016-October/079560.html

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-23 12:03:25 +01:00
Clément Bœsch
4bb4fa28e3 Merge commit '5801f9ed24'
* commit '5801f9ed24':
  h264_intrapred: x86: Update comments left behind in 95c89da36e

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-23 11:58:01 +01:00
Clément Bœsch
9954d5b44e Merge commit 'd9dccc0389'
* commit 'd9dccc0389':
  hevc: x86: Refactor IDCT macro declarations

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-23 11:54:53 +01:00
James Almer
30cadfe071 avcodec/lossless_videodsp: use ptrdiff_t for length parameters
Signed-off-by: James Almer <jamrial@gmail.com>
2017-03-22 18:38:35 -03:00
Clément Bœsch
af607b7e07 lavc/huffyuvdsp: only transmit the pix_fmt instead of the whole avctx
Only the pixel format is required in that init function. This will also
simplify the incoming merge.
2017-03-22 16:22:20 +01:00
Clément Bœsch
c66bd8f3ff Merge commit 'b57e38f52c'
* commit 'b57e38f52c':
  ac3dsp: x86: Replace inline asm for in-decoder downmixing with standalone asm

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-22 12:49:29 +01:00
Clément Bœsch
e39d4ff150 Merge commit '43717469f9'
* commit '43717469f9':
  ac3dsp: Reverse matrix in/out order in downmix()

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-22 11:29:46 +01:00
James Almer
aee046a895 x86/audiodsp: remove an unnecessary movss 2017-03-22 00:14:56 -03:00
James Almer
9a0fbb9ca9 Merge commit '2caa93b813'
* commit '2caa93b813':
  mpegaudiodsp: Change type of array stride parameters to ptrdiff_t

Merged-by: James Almer <jamrial@gmail.com>
2017-03-21 16:04:22 -03:00
James Almer
a8474df944 Merge commit 'e4a94d8b36'
* commit 'e4a94d8b36':
  h264chroma: Change type of stride parameters to ptrdiff_t

Merged-by: James Almer <jamrial@gmail.com>
2017-03-21 15:20:45 -03:00
James Almer
5a49097b42 Merge commit '2ec9fa5ec6'
* commit '2ec9fa5ec6':
  idct: Change type of array stride parameters to ptrdiff_t

Merged-by: James Almer <jamrial@gmail.com>
2017-03-21 14:29:52 -03:00
Clément Bœsch
f54da138e9 Merge commit '009adfd4fb'
* commit '009adfd4fb':
  x86: fpel: Remove unnecessary sign extend

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-21 15:02:31 +01:00
Clément Bœsch
ad98af27f7 Merge commit 'de2ae3c1fa'
* commit 'de2ae3c1fa':
  lavc: add clobber tests for the new encoding/decoding API

The merge only re-order what we already have.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-21 14:43:53 +01:00
Clément Bœsch
83cd80d10a Merge commit '12004a9a7f'
* commit '12004a9a7f':
  audiodsp/x86: yasmify vector_clipf_sse
  audiodsp: reorder arguments for vector_clipf

Merged the version from Libav after a discussion with James Almer on
IRC:

19:22 <ubitux> jamrial: opinion on 12004a9a7f?
19:23 <ubitux> it was apparently yasmified differently
19:23 <ubitux> (it depends on the previous commit arg shuffle)
19:24 <ubitux> i don't see the magic movsxdifnidn in your port btw
19:24 <ubitux> it's a port from 1d36defe94
19:25 <jamrial> seems better thanks to said arg shuffle
19:25 <jamrial> the loop is the same, but init is simpler
19:25 <jamrial> probably worth merging
19:25 <ubitux> OK
19:25 <ubitux> thanks
19:26 <jamrial> curious they didn't make len ptrdiff_t after the previous bunch of commits, heh
19:26 <ubitux> yeah indeed

Both commits are merged at the same time to prevent a conflict with our
existing yasmified ff_vector_clipf_sse.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 22:35:07 +01:00
Clément Bœsch
43a4c729d4 Merge commit '75d98e30af'
* commit '75d98e30af':
  audiodsp/x86: clear the high bits of the order parameter on 64bit

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 18:44:00 +01:00
Clément Bœsch
072fad7cf5 Merge commit '1d6c76e11f'
* commit '1d6c76e11f':
  audiodsp/x86: fix ff_vector_clip_int32_sse2

No functionnal changes, only cosmetics. This issue was fixed in
9a9e2f1c8a.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 18:42:37 +01:00
Clément Bœsch
e07fa3008b Merge commit 'de452e5037'
* commit 'de452e5037':
  pixblockdsp: Change type of stride parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 15:58:32 +01:00
Ilia
2f3d10a01a avcodec/vp9: avx2 implementation of ipred_dl_16x16_16
vp9_diag_downleft_16x16_10bpp_c: 263.0
vp9_diag_downleft_16x16_10bpp_sse2: 44.7
vp9_diag_downleft_16x16_10bpp_ssse3: 32.5
vp9_diag_downleft_16x16_10bpp_avx: 31.9
vp9_diag_downleft_16x16_10bpp_avx2: 25.7
vp9_diag_downleft_16x16_12bpp_c: 264.7
vp9_diag_downleft_16x16_12bpp_sse2: 44.4
vp9_diag_downleft_16x16_12bpp_ssse3: 32.0
vp9_diag_downleft_16x16_12bpp_avx: 32.4
vp9_diag_downleft_16x16_12bpp_avx2: 25.5

Benchmarked with 10000 runs

Signed-off-by: Ilia <zakne0ne@gmail.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2017-03-20 09:47:43 -04:00
Mirage Abeysekara
5eb4f95bef h264pred: added AVX2 implementation for tm_vp8 16x16.
checkasm --bench results with 5000 runs

pred16x16_tm_vp8_c: 302.8
pred16x16_tm_vp8_mmx: 101.4
pred16x16_tm_vp8_mmxext: 95.5
pred16x16_tm_vp8_sse2: 95.1
pred16x16_tm_vp8_avx2: 38.2

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2017-03-20 09:45:42 -04:00
James Almer
6966a5e4d7 Merge commit '721d57e608'
* commit '721d57e608':
  vp56: Separate VP5 and VP6 dsp initialization

Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 17:15:24 -03:00
James Almer
663640d745 Merge commit '3fd22538bc'
* commit '3fd22538bc':
  prores: Change type of stride parameters to ptrdiff_t

Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 15:30:13 -03:00
James Almer
aec42ebc27 Merge commit 'f81be06cf6'
* commit 'f81be06cf6':
  cavs: Change type of stride parameters to ptrdiff_t

Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 15:23:52 -03:00
James Almer
4e4dfcac58 Merge commit '802727b538'
* commit '802727b538':
  vp8: Update some assembly comments left unchanged in bd66f073fe

Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 15:18:31 -03:00
James Almer
4004d33fcb Merge commit 'd9d26a3674'
* commit 'd9d26a3674':
  vp56: Change type of stride parameters to ptrdiff_t

Merged-by: James Almer <jamrial@gmail.com>
2017-03-19 14:54:25 -03:00
Clément Bœsch
6a42a54b9d Merge commit '6892df9294'
* commit '6892df9294':
  vp3: Change type of stride parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-19 18:41:26 +01:00
Clément Bœsch
8695ce73ca Merge commit 'e2b9993558'
* commit 'e2b9993558':
  simple_idct: x86: Drop disabled IDCT implementation

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-19 16:11:11 +01:00
Clément Bœsch
8286c359ad Merge commit 'e99ecda550'
* commit 'e99ecda550':
  checkasm: add vp9 MC tests.
  vp9mc/x86: sse2 MC assembly.
  vp9mc/x86: add AVX and AVX2 MC
  vp9mc/x86: rename ff_* to ff_vp9_*
  vp9mc/x86: rename ff_avg[48]_sse to ff_avg[48]_mmxext
  vp9mc/x86: simplify a few inits.
  vp9mc/x86: add 16px functions (64bit only).

Noop (aside from a formatting comment in vp9mc.asm). We already have all
of this. We should consider making a final diff between the two projects
when the dust comes down.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-16 20:25:39 +01:00
Clément Bœsch
a4f5e79f7c Merge commit '89466de4ae'
* commit '89466de4ae':
  vp9/x86: rename vp9dsp to vp9mc

File was already renamed, only the top description is updated.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-16 20:10:47 +01:00
James Almer
e632fe9bab Merge commit '3c504bc359'
* commit '3c504bc359':
  x86: deduplicate some constants

Merged-by: James Almer <jamrial@gmail.com>
2017-03-15 22:07:28 -03:00
Michael Niedermayer
835d9f299c avcodec/x86/cavsdsp: Put MMX code under mmx check
Without this the FPU state becomes trashed and causes mysterious
fate failures with cpuflags=0

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-03-06 16:47:17 +01:00
James Darnley
33de0fee2c avcodec/h264: enable sse2 chroma deblock/loop filter functions
Between 1.00 and 1.16 times faster on Intel Yorkfield Core 2 Quad.
Between 1.11 and 1.39 times faster on Intel Kaby Lake Pentium.
2017-02-27 13:22:06 +01:00
James Darnley
cd893b9307 avcodec/h264: add avx 8-bit 4:2:2 chroma h intra deblock/loop filter
~1.37x faster (147 vs. 108 cycles) compared to mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
0e16b3e2be avcodec/h264: add avx 8-bit 4:2:0 chroma h intra deblock/loop filter
~1.10x faster (69 vs. 63 cycles) compared to mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
987ffe4b8d avcodec/h264: add avx 8-bit chroma v intra deblock/loop filter
~1.14x faster (90 vs 78 cycles) compared with mmxext
2017-02-27 13:22:06 +01:00
James Darnley
88307b3eec avcodec/h264: add avx 8-bit 4:2:2 chroma h deblock/loop filter
~1.21x faster (68 vs. 56 cycles) compared with mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
ac096fc82d avcodec/h264: add avx 8-bit 4:2:0 chroma h deblock/loop filter
~1.14x faster (93 vs. 81 cycles) compared with mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
5c56758843 avcodec/h264: add avx 8-bit chroma v deblock/loop filter
~1.24x faster (101 vs. 81 cycles) compared with mmxext function
2017-02-27 13:22:06 +01:00
James Darnley
5336887867 avcodec/h264: sse2, avx h luma mbaff deblock/loop filter
x86-64 only

Yorkfield:
- sse2: ~2.17x (434 vs. 200 cycles)

Nehalem:
- sse2: ~2.94x (409 vs. 139 cycles)

Skylake:
- sse2: ~3.10x (370 vs. 119 cycles)
- avx:  ~3.29x (370 vs. 112 cycles)
2017-02-18 20:26:52 +01:00
James Darnley
e18bc2114f avcodec/h264: add named parameters to x86 function 2017-02-18 20:26:50 +01:00