Martin Storsjö f1212e472b aarch64: vp9: Implement NEON loop filters

This work is sponsored by, and copyright, Google.

These are ported from the ARM version; thanks to the larger amount of registers available, we can do the loop filters with 16 pixels at a time. The implementation is fully templated, with a single macro which can generate versions for both 8 and 16 pixels wide, for both 4, 8 and 16 pixels loop filters (and the 4/8 mixed versions as well).

For the 8 pixel wide versions, it is pretty close in speed (the v_4_8 and v_8_8 filters are the best examples of this; the h_4_8 and h_8_8 filters seem to get some gain in the load/transpose/store part). For the 16 pixels wide ones, we get a speedup of around 1.2-1.4x compared to the 32 bit version.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                       ARM  AArch64
vp9_loop_filter_h_4_8_neon:          144.0    127.2
vp9_loop_filter_h_8_8_neon:          207.0    182.5
vp9_loop_filter_h_16_8_neon:         415.0    328.7
vp9_loop_filter_h_16_16_neon:        672.0    558.6
vp9_loop_filter_mix2_h_44_16_neon:   302.0    203.5
vp9_loop_filter_mix2_h_48_16_neon:   365.0    305.2
vp9_loop_filter_mix2_h_84_16_neon:   365.0    305.2
vp9_loop_filter_mix2_h_88_16_neon:   376.0    305.2
vp9_loop_filter_mix2_v_44_16_neon:   193.2    128.2
vp9_loop_filter_mix2_v_48_16_neon:   246.7    218.4
vp9_loop_filter_mix2_v_84_16_neon:   248.0    218.5
vp9_loop_filter_mix2_v_88_16_neon:   302.0    218.2
vp9_loop_filter_v_4_8_neon:           89.0     88.7
vp9_loop_filter_v_8_8_neon:          141.0    137.7
vp9_loop_filter_v_16_8_neon:         295.0    272.7
vp9_loop_filter_v_16_16_neon:        546.0    453.7

The speedup vs C code in checkasm tests is around 2-7x, which is pretty much the same as for the 32 bit version. Even if these functions are faster than their 32 bit equivalent, the C version that we compare to also became around 1.3-1.7x faster than the C version in 32 bit.

Based on START_TIMER/STOP_TIMER wrapping around a few individual functions, the speedup vs C code is around 4-5x.

Examples of runtimes vs C on a Cortex A57 (for a slightly older version of the patch):
                                A57 gcc-5.3     neon
loop_filter_h_4_8_neon:               256.6     93.4
loop_filter_h_8_8_neon:               307.3    139.1
loop_filter_h_16_8_neon:              340.1    254.1
loop_filter_h_16_16_neon:             827.0    407.9
loop_filter_mix2_h_44_16_neon:        524.5    155.4
loop_filter_mix2_h_48_16_neon:        644.5    173.3
loop_filter_mix2_h_84_16_neon:        630.5    222.0
loop_filter_mix2_h_88_16_neon:        697.3    222.0
loop_filter_mix2_v_44_16_neon:        598.5    100.6
loop_filter_mix2_v_48_16_neon:        651.5    127.0
loop_filter_mix2_v_84_16_neon:        591.5    167.1
loop_filter_mix2_v_88_16_neon:        855.1    166.7
loop_filter_v_4_8_neon:               271.7     65.3
loop_filter_v_8_8_neon:               312.5    106.9
loop_filter_v_16_8_neon:              473.3    206.5
loop_filter_v_16_16_neon:             976.1    327.8

The speed-up compared to the C functions is 2.5 to 6 and the cortex-a57 is again 30-50% faster than the cortex-a53.

This is an adapted cherry-pick from libav commits `9d2afd1eb8` and `31756abe29`.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2016-11-15 15:10:03 -05:00

compat	compat/w32dlfcn.h: Add safe win32 dlopen/dlclose/dlsym functions.	2016-11-05 18:08:32 +11:00
doc	doc/filters: add metadata information for blackframe	2016-11-14 11:59:52 -09:00
libavcodec	aarch64: vp9: Implement NEON loop filters	2016-11-15 15:10:03 -05:00
libavdevice	lavd/xcbgrab: do not try to create refcounted packets.	2016-11-03 21:23:55 +01:00
libavfilter	lavfi/ebur128: use ff_ prefix	2016-11-13 19:11:07 -06:00
libavformat	lavf/Makefile: Fix rule for the data muxer.	2016-11-14 13:33:22 +01:00
libavresample	Bump minor versions after 3.2 branchpoint to seperate release	2016-10-26 20:52:42 +02:00
libavutil	aarch64: Add an offset parameter to the movrel macro	2016-11-15 15:10:03 -05:00
libpostproc	Bump minor versions after 3.2 branchpoint to seperate release	2016-10-26 20:52:42 +02:00
libswresample	Bump minor versions after 3.2 branchpoint to seperate release	2016-10-26 20:52:42 +02:00
libswscale	lsws: Add GRAY10 conversion.	2016-11-14 10:35:06 +01:00
presets
tests	Merge commit '`f8d17d5395`'	2016-11-14 15:29:08 +01:00
tools	tools: add loudnorm script example to use loudnorm	2016-11-11 19:22:52 +01:00
.gitattributes	Treat all '*.pnm' files as non-text file	2014-11-28 17:52:43 -05:00
.gitignore	Merge commit '`6641819fee`'	2016-06-26 15:43:05 +02:00
.travis.yml	Merge commit '`eda1832874`'	2015-11-22 17:12:24 +00:00
arch.mak	mips: rename mipsdspr1 to mipsdsp	2015-12-04 02:35:42 +01:00
Changelog	avformat: Add Pro-MPEG CoP #3-R2 FEC protocol	2016-11-13 11:38:15 +01:00
cmdutils.c	cmdutils: add show_demuxers and show_muxers	2016-11-08 01:56:31 +01:00
cmdutils.h	cmdutils: add show_demuxers and show_muxers	2016-11-08 01:56:31 +01:00
cmdutils_common_opts.h	cmdutils: add show_demuxers and show_muxers	2016-11-08 01:56:31 +01:00
cmdutils_opencl.c	all: use FFDIFFSIGN to resolve possible undefined behavior in comparators	2015-11-03 16:28:30 -05:00
common.mak	Merge commit '`c5fd4b5061`'	2016-06-27 19:39:46 +02:00
configure	Merge commit '`8c929037ec`'	2016-11-14 10:09:44 +01:00
CONTRIBUTING.md	Add CONTRIBUTING.md	2016-09-18 10:02:13 +01:00
COPYING.GPLv2
COPYING.GPLv3
COPYING.LGPLv2.1
COPYING.LGPLv3
CREDITS
ffmpeg.c	Merge commit '`b55566db4c`'	2016-11-14 14:56:52 +01:00
ffmpeg.h	Merge commit '`50722b4f0c`'	2016-11-13 15:33:39 +01:00
ffmpeg_cuvid.c	doc: fix spelling errors	2016-10-21 23:58:47 +02:00
ffmpeg_dxva2.c	Merge commit '`18c506e9e6`'	2016-06-26 15:34:01 +02:00
ffmpeg_filter.c	Merge commit '`50722b4f0c`'	2016-11-13 15:33:39 +01:00
ffmpeg_opt.c	Merge commit '`50722b4f0c`'	2016-11-13 15:33:39 +01:00
ffmpeg_qsv.c	ffmpeg_qsv: Fix hwaccel transcoding	2016-11-13 17:49:48 +00:00
ffmpeg_vaapi.c	ffmpeg_vaapi: fix choice of decoder_format	2016-09-29 01:23:52 +02:00
ffmpeg_vdpau.c	Merge commit '`f72db3f2f3`'	2016-06-26 15:29:39 +02:00
ffmpeg_videotoolbox.c	ffmpeg/videotoolbox: protect UTGetOSTypeFromString on both VDA and VT	2015-10-15 10:22:31 +02:00
ffplay.c	Merge commit '`beb62dac62`'	2016-10-07 13:16:36 +02:00
ffprobe.c	lavf: add AV_DISPOSITION_TIMED_THUMBNAILS	2016-10-24 05:47:05 -05:00
ffserver.c	ffserver: use AVStream.codecpar in open_input_stream()	2016-11-08 12:12:19 +01:00
ffserver_config.c	ffserver: Throw ffm.h out its not used except for a constant that is part of the format	2016-11-07 19:27:40 +01:00
ffserver_config.h	ffserver: Throw ffm.h out its not used except for a constant that is part of the format	2016-11-07 19:27:40 +01:00
INSTALL.md
library.mak	Merge commit '`c5fd4b5061`'	2016-06-27 19:39:46 +02:00
LICENSE.md	lavc: remove libfaac wrapper	2016-10-01 19:58:04 +01:00
MAINTAINERS	MAINTAINERS: Add myself to flvenc	2016-11-09 17:49:19 +01:00
Makefile	Merge commit '`6641819fee`'	2016-06-26 15:43:05 +02:00
README.md	Add CONTRIBUTING.md	2016-09-18 10:02:13 +01:00
RELEASE	RELEASE: Update for past 3.2 branch	2016-10-26 20:52:43 +02:00
version.sh	version.sh: Fix spurious rebuilds.	2016-03-10 09:53:10 +01:00

README.md

FFmpeg README

FFmpeg is a collection of libraries and tools to process multimedia content such as audio, video, subtitles and related metadata.

Libraries

libavcodec provides implementations of a wide range of codecs.
libavformat implements streaming protocols, container formats and basic I/O access.
libavutil includes hashers, decompressors and miscellaneous utility functions.
libavfilter provides a means to alter decoded audio and video through a chain of filters.
libavdevice provides an abstraction to access capture and playback devices.
libswresample implements audio mixing and resampling routines.
libswscale implements color conversion and scaling routines.

Tools

ffmpeg is a command-line toolbox to manipulate, convert and stream multimedia content.
ffplay is a minimalistic multimedia player.
ffprobe is a simple analysis tool to inspect multimedia content.
ffserver is a multimedia streaming server for live broadcasts.
Additional small tools such as aviocat, ismindex and qt-faststart.

Documentation

The offline documentation is available in the doc/ directory.

The online documentation is available on the main website and in the wiki.

Examples

Coding examples are available in the doc/examples directory.

License

The FFmpeg codebase is mainly LGPL-licensed, with optional components licensed under the GPL. Please refer to the LICENSE.md file for detailed information.

Contributing

Patches should be submitted to the ffmpeg-devel mailing list using git format-patch or git send-email. GitHub pull requests should be avoided because they are not part of our review process and will be ignored.
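
As a rough illustration of the git format-patch step, the sketch below builds a throwaway repository, makes two commits and writes the latest one out as a mail-formatted patch. The repository name, file name, commit messages and contributor identity are hypothetical placeholders. The list address in the commented-out git send-email line is an assumption and should be checked against the project website before use.

```shell
#!/bin/sh
# Minimal sketch: turn the most recent commit into a patch file.
# Everything happens in a temporary directory, so no real checkout is touched.
set -e

workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo

# Placeholder identity, only for this throwaway repository.
git config user.name "Example Contributor"
git config user.email "contributor@example.com"

echo "first line" > notes.txt
git add notes.txt
git commit -q -m "doc: add notes file"

echo "second line" >> notes.txt
git commit -q -a -m "doc: extend notes file"

# Write the last commit (-1) as a patch into the outgoing/ directory.
git format-patch -1 -o outgoing > /dev/null
ls outgoing

# Mailing the patch needs a working SMTP configuration, so it is left commented out.
# The address below is an assumption; confirm it on the project website.
# git send-email --to=ffmpeg-devel@ffmpeg.org outgoing/*.patch
```

In a real checkout only the git format-patch and git send-email lines are needed. The patch file name is derived from the commit subject, so a clear one-line summary makes the patch easier to find on the list.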