Michael Niedermayer
a665704402
Merge commit ' 4d6ee07255'
...
* commit '4d6ee07255 ':
libavutil: x86: Add AVX2 capable CPU detection.
Conflicts:
libavutil/cpu.c
libavutil/cpu.h
libavutil/x86/cpu.c
See: 865b70bc5d
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-26 02:36:36 +02:00
Kieran Kunhya
865b70bc5d
Add AVX2 capable CPU detection. Patch based on x264's AVX2 detection
...
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-26 02:34:22 +02:00
Kieran Kunhya
4d6ee07255
libavutil: x86: Add AVX2 capable CPU detection.
...
Patch based on x264's AVX2 detection
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-25 19:36:55 +01:00
Michael Niedermayer
f9bef2bec9
Merge remote-tracking branch 'qatar/master'
...
* qatar/master:
x86: more AVX2 framework
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 16:13:57 +02:00
Michael Niedermayer
e3e0e3d0c9
Merge commit ' c6908d6b4b'
...
* commit 'c6908d6b4b ':
x86inc: FMA3/4 Support
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 16:06:22 +02:00
Michael Niedermayer
9ac124c889
Merge commit ' 206895708e'
...
* commit '206895708e ':
x86inc: Remove our FMA4 support
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 15:54:23 +02:00
Michael Niedermayer
12e4493f9c
Merge commit ' c108ba0175'
...
* commit 'c108ba0175 ':
x86inc: Use VEX-encoded instructions in AVX functions
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 15:48:34 +02:00
Jason Garrett-Glaser
a3fabc6cb3
x86: more AVX2 framework
...
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:41:56 +01:00
Jason Garrett-Glaser
c6908d6b4b
x86inc: FMA3/4 Support
...
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:41:54 +01:00
Derek Buitenhuis
206895708e
x86inc: Remove our FMA4 support
...
This is so we can sync to x264's version of FMA4 support.
This partialy reverts commit 79687079a9 .
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:39:29 +01:00
Henrik Gramner
c108ba0175
x86inc: Use VEX-encoded instructions in AVX functions
...
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4
functions for all instructions that exists in a VEX-encoded
version.
This change makes it easier to extend existing code to use AVX2.
Also add support for AVX emulation of a few instructions that
were missing before.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:36:11 +01:00
Michael Niedermayer
31d0d35560
Merge remote-tracking branch 'qatar/master'
...
* qatar/master:
x86inc: Remove .rodata kludges
Conflicts:
libavutil/x86/x86inc.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-09 14:29:42 +02:00
Henrik Gramner
ad7d7d4f6a
x86inc: Remove .rodata kludges
...
The Mach-O bug was fixed in yasm 0.8.0 and we don't
support versions that old anymore.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-09 07:44:30 -04:00
Michael Niedermayer
19c3890819
Merge commit ' 3e2fa991db'
...
* commit '3e2fa991db ':
x86inc: remove misaligned cpu flag
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 12:02:21 +02:00
Michael Niedermayer
31d9aa6b2e
Merge commit ' 7115566541'
...
* commit '7115566541 ':
x86inc: various minor backports from x264
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:57:39 +02:00
Michael Niedermayer
3f965ab95d
Merge commit ' 47f9d7ce54'
...
* commit '47f9d7ce54 ':
x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:37:22 +02:00
Michael Niedermayer
1f17619fe4
Merge commit ' bbe4a6db44'
...
* commit 'bbe4a6db44 ':
x86inc: Utilize the shadow space on 64-bit Windows
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:23:00 +02:00
Michael Niedermayer
17d9c7c208
Merge commit ' 3fb78e99a0'
...
* commit '3fb78e99a0 ':
x86inc: create xm# and ym#, analagous to m#
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:15:17 +02:00
Michael Niedermayer
3352fdb292
Merge commit ' 49ebe3f9fe'
...
* commit '49ebe3f9fe ':
x86inc: fix some corner cases of SWAP
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:07:03 +02:00
Michael Niedermayer
006c0fcfea
Merge commit ' 63f0d62310'
...
* commit '63f0d62310 ':
x86inc: Use SSE instead of SSE2 for copying data
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:01:40 +02:00
Michael Niedermayer
faafffaf82
Merge commit ' ad76e6e7e1'
...
* commit 'ad76e6e7e1 ':
x86inc: Set ELF hidden visibility for global constants
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 10:52:51 +02:00
Michael Niedermayer
c1488fab3d
Merge commit ' 25cb0c1a1e'
...
* commit '25cb0c1a1e ':
x86inc: activate REP_RET automatically
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 10:27:30 +02:00
Henrik Gramner
3e2fa991db
x86inc: remove misaligned cpu flag
...
Prevents a crash if the misaligned exception mask bit is
cleared for some reason.
Misaligned SSE functions are only used on AMD Phenom CPUs
and the benefit is miniscule. They also require modifying
the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:38 -04:00
Jason Garrett-Glaser
7115566541
x86inc: various minor backports from x264
...
Small backports that sneaked into other asm commits in x264.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:22 -04:00
Derek Buitenhuis
47f9d7ce54
x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"
...
This is also a valid value for WIN64.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:08 -04:00
Henrik Gramner
bbe4a6db44
x86inc: Utilize the shadow space on 64-bit Windows
...
Store XMM6 and XMM7 in the shadow space in functions that
clobbers them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:35 -04:00
Loren Merritt
3fb78e99a0
x86inc: create xm# and ym#, analagous to m#
...
For when we want to mix simd sizes within one function.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:19 -04:00
Loren Merritt
49ebe3f9fe
x86inc: fix some corner cases of SWAP
...
SWAP with >=3 named (rather than numbered) args
PERMUTE followed by SWAP with 2 named args
used to produce the wrong permutation
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:06 -04:00
Henrik Gramner
63f0d62310
x86inc: Use SSE instead of SSE2 for copying data
...
Reduces code size because movaps/movups is one byte
shorter than movdqa/movdqu.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:24:33 -04:00
Henrik Gramner
ad76e6e7e1
x86inc: Set ELF hidden visibility for global constants
...
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:24:13 -04:00
Loren Merritt
25cb0c1a1e
x86inc: activate REP_RET automatically
...
Now RET checks whether it immediately follows a branch, so the
programmer dosen't have to keep track of that condition. REP_RET
is still needed manually when it's a branch target, but that's
much rarer.
The implementation involves lots of spurious labels, but that's OK
because we strip them.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:17:59 -04:00
Ronald S. Bultje
c07ac8d467
VP9 MC (ssse3) optimizations.
...
Decoding time of ped1080p.webm goes from 20.7sec to 11.3sec.
2013-10-02 21:03:15 -04:00
Michael Niedermayer
361bc70731
Merge remote-tracking branch 'qatar/master'
...
* qatar/master:
avutil: Fix compilation with inline asm disabled on mingw
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-09-22 11:51:38 +02:00
Alex Smith
08fa828b3f
avutil: Fix compilation with inline asm disabled on mingw
...
Because of -Werror=implicit-function-declaration the build will fail.
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-09-22 00:50:32 +03:00
Thilo Borgmann
d814a839ac
Reinstate proper FFmpeg license for all files.
2013-08-30 15:47:38 +00:00
Michael Niedermayer
f0a3562382
Merge commit ' 79aec43ce8'
...
* commit '79aec43ce8 ':
x86: Add and use more convenience macros to check CPU extension availability
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-08-30 11:57:35 +02:00
Michael Niedermayer
2a60666d1d
Merge commit ' 8410d6e93c'
...
* commit '8410d6e93c ':
avutil: Refactor CPU extension availability macros
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-08-29 14:15:10 +02:00
Michael Niedermayer
c83d794936
Merge commit ' b78b10c4b7'
...
* commit 'b78b10c4b7 ':
avutil: Move internal CPU detection function declarations to private header
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-08-29 14:05:15 +02:00
Diego Biurrun
79aec43ce8
x86: Add and use more convenience macros to check CPU extension availability
2013-08-29 13:07:37 +02:00
Diego Biurrun
8410d6e93c
avutil: Refactor CPU extension availability macros
2013-08-28 23:54:14 +02:00
Diego Biurrun
b78b10c4b7
avutil: Move internal CPU detection function declarations to private header
2013-08-28 23:54:14 +02:00
Michael Niedermayer
9d01bf7d66
Merge remote-tracking branch 'qatar/master'
...
* qatar/master:
Consistently use "cpu_flags" as variable/parameter name for CPU flags
Conflicts:
libavcodec/x86/dsputil_init.c
libavcodec/x86/h264dsp_init.c
libavcodec/x86/hpeldsp_init.c
libavcodec/x86/motion_est.c
libavcodec/x86/mpegvideo.c
libavcodec/x86/proresdsp_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-18 09:53:47 +02:00
Diego Biurrun
3ac7fa81b2
Consistently use "cpu_flags" as variable/parameter name for CPU flags
2013-07-18 00:31:35 +02:00
Michael Niedermayer
a478e99a60
avutil/x86: reenable ff_update_lls_avx()
...
The bug has been fixed in c8b920a9b7 by Loren Merritt
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-02 12:02:08 +02:00
Michael Niedermayer
d1fa671895
Merge commit ' c8b920a9b7'
...
* commit 'c8b920a9b7 ':
lls/x86: use 3-operator vaddpd in ADDPD_MEM
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-02 11:40:44 +02:00
Loren Merritt
c8b920a9b7
lls/x86: use 3-operator vaddpd in ADDPD_MEM
...
Fixes build with yasm-1.1
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2013-07-02 10:15:09 +02:00
Michael Niedermayer
a6e46ed51a
Revert "avutil/x86: disable ff_evaluate_lls_sse2() for 32bit"
...
This reverts commit 247425241c .
2013-07-01 02:27:47 +02:00
Michael Niedermayer
4e488ac5f5
Merge remote-tracking branch 'qatar/master'
...
* qatar/master:
x86: lpc: fix a segfault in av_evaluate_lls_sse2()
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-01 02:26:22 +02:00
Loren Merritt
1221bb6239
x86: lpc: fix a segfault in av_evaluate_lls_sse2()
2013-06-30 23:11:19 +00:00
Michael Niedermayer
247425241c
avutil/x86: disable ff_evaluate_lls_sse2() for 32bit
...
It just segfaults on 32bit, thus its disabled until someone fixes it.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-06-30 19:03:57 +02:00