No description
This work is sponsored by, and copyright, Google.
The plain pixel put/copy functions are used from the 8 bit version,
for the double size (e.g. put16 uses ff_vp9_copy32_neon), and a new
copy128 is added.
Compared with the 8 bit version, the filters can no longer use the
trick to accumulate in 16 bit with only saturation at the end, but now
the accumulators need to be 32 bit. This avoids the need to keep track
of which filter index is the largest though, reducing the size of the
executable code for these filters.
For the horizontal filters, we only do 4 or 8 pixels wide in parallel
(while doing two rows at a time), since we don't have enough register
space to filter 16 pixels wide.
For the vertical filters, we still do 4 and 8 pixels in parallel just
as in the 8 bit case, but we need to store the output after every 2
rows instead of after every 4 rows.
Examples of relative speedup compared to the C version, from checkasm:
Cortex A7 A8 A9 A53
vp9_avg4_10bpp_neon: 2.25 2.44 3.05 2.16
vp9_avg8_10bpp_neon: 3.66 8.48 3.86 3.50
vp9_avg16_10bpp_neon: 3.39 8.26 3.37 2.72
vp9_avg32_10bpp_neon: 4.03 10.20 4.07 3.42
vp9_avg64_10bpp_neon: 4.15 10.01 4.13 3.70
vp9_avg_8tap_smooth_4h_10bpp_neon: 3.38 6.22 3.41 4.75
vp9_avg_8tap_smooth_4hv_10bpp_neon: 3.89 6.39 4.30 5.32
vp9_avg_8tap_smooth_4v_10bpp_neon: 5.32 9.73 6.34 7.31
vp9_avg_8tap_smooth_8h_10bpp_neon: 4.45 9.40 4.68 6.87
vp9_avg_8tap_smooth_8hv_10bpp_neon: 4.64 8.91 5.44 6.47
vp9_avg_8tap_smooth_8v_10bpp_neon: 6.44 13.42 8.68 8.79
vp9_avg_8tap_smooth_64h_10bpp_neon: 4.66 9.02 4.84 7.71
vp9_avg_8tap_smooth_64hv_10bpp_neon: 4.61 9.14 4.92 7.10
vp9_avg_8tap_smooth_64v_10bpp_neon: 6.90 14.13 9.57 10.41
vp9_put4_10bpp_neon: 1.33 1.46 2.09 1.33
vp9_put8_10bpp_neon: 1.57 3.42 1.83 1.84
vp9_put16_10bpp_neon: 1.55 4.78 2.17 1.89
vp9_put32_10bpp_neon: 2.06 5.35 2.14 2.30
vp9_put64_10bpp_neon: 3.00 2.41 1.95 1.66
vp9_put_8tap_smooth_4h_10bpp_neon: 3.19 5.81 3.31 4.63
vp9_put_8tap_smooth_4hv_10bpp_neon: 3.86 6.22 4.32 5.21
vp9_put_8tap_smooth_4v_10bpp_neon: 5.40 9.77 6.08 7.21
vp9_put_8tap_smooth_8h_10bpp_neon: 4.22 8.41 4.46 6.63
vp9_put_8tap_smooth_8hv_10bpp_neon: 4.56 8.51 5.39 6.25
vp9_put_8tap_smooth_8v_10bpp_neon: 6.60 12.43 8.17 8.89
vp9_put_8tap_smooth_64h_10bpp_neon: 4.41 8.59 4.54 7.49
vp9_put_8tap_smooth_64hv_10bpp_neon: 4.43 8.58 5.34 6.63
vp9_put_8tap_smooth_64v_10bpp_neon: 7.26 13.92 9.27 10.92
For the larger 8tap filters, the speedup vs C code is around 4-14x.
Signed-off-by: Martin Storsjö <martin@martin.st>
|
||
|---|---|---|
| compat | ||
| doc | ||
| libavcodec | ||
| libavdevice | ||
| libavfilter | ||
| libavformat | ||
| libavresample | ||
| libavutil | ||
| libpostproc | ||
| libswresample | ||
| libswscale | ||
| presets | ||
| tests | ||
| tools | ||
| .gitattributes | ||
| .gitignore | ||
| .travis.yml | ||
| arch.mak | ||
| Changelog | ||
| cmdutils.c | ||
| cmdutils.h | ||
| cmdutils_common_opts.h | ||
| cmdutils_opencl.c | ||
| common.mak | ||
| configure | ||
| CONTRIBUTING.md | ||
| COPYING.GPLv2 | ||
| COPYING.GPLv3 | ||
| COPYING.LGPLv2.1 | ||
| COPYING.LGPLv3 | ||
| CREDITS | ||
| ffmpeg.c | ||
| ffmpeg.h | ||
| ffmpeg_cuvid.c | ||
| ffmpeg_dxva2.c | ||
| ffmpeg_filter.c | ||
| ffmpeg_opt.c | ||
| ffmpeg_qsv.c | ||
| ffmpeg_vaapi.c | ||
| ffmpeg_vdpau.c | ||
| ffmpeg_videotoolbox.c | ||
| ffplay.c | ||
| ffprobe.c | ||
| ffserver.c | ||
| ffserver_config.c | ||
| ffserver_config.h | ||
| INSTALL.md | ||
| library.mak | ||
| LICENSE.md | ||
| MAINTAINERS | ||
| Makefile | ||
| README.md | ||
| RELEASE | ||
| version.sh | ||
FFmpeg README
FFmpeg is a collection of libraries and tools to process multimedia content such as audio, video, subtitles and related metadata.
Libraries
libavcodecprovides implementation of a wider range of codecs.libavformatimplements streaming protocols, container formats and basic I/O access.libavutilincludes hashers, decompressors and miscellaneous utility functions.libavfilterprovides a mean to alter decoded Audio and Video through chain of filters.libavdeviceprovides an abstraction to access capture and playback devices.libswresampleimplements audio mixing and resampling routines.libswscaleimplements color conversion and scaling routines.
Tools
- ffmpeg is a command line toolbox to manipulate, convert and stream multimedia content.
- ffplay is a minimalistic multimedia player.
- ffprobe is a simple analysis tool to inspect multimedia content.
- ffserver is a multimedia streaming server for live broadcasts.
- Additional small tools such as
aviocat,ismindexandqt-faststart.
Documentation
The offline documentation is available in the doc/ directory.
The online documentation is available in the main website and in the wiki.
Examples
Coding examples are available in the doc/examples directory.
License
FFmpeg codebase is mainly LGPL-licensed with optional components licensed under GPL. Please refer to the LICENSE file for detailed information.
Contributing
Patches should be submitted to the ffmpeg-devel mailing list using
git format-patch or git send-email. Github pull requests should be
avoided because they are not part of our review process and will be ignored.