Add an optional filter_line3 to the available optimisations.

filter_line3 is equivalent to filter_line, memcpy, filter_line.

filter_line shares quite a number of loads and some calculations with its next iteration, and testing shows that with aarch64 NEON filter_line3's performance is 30% better than two filter_lines and a memcpy.

Adds a test for vf_bwdif filter_line3 to checkasm.

Rounds job start lines down to a multiple of 4. This means that if filter_line3 exists then filter_line will not occasionally be called once at the end of a slice, depending on thread count. The final slice may do up to 3 extra lines, but filter_edge is faster than filter_line so it is unlikely to create any noticeable thread load variation.

Signed-off-by: John Cox <jc@kynesim.co.uk>
Signed-off-by: Martin Storsjö <martin@martin.st>

| | | |
|---|---|---|
| api | ||
| checkasm | ||
| fate | ||
| filtergraphs | ||
| ref | ||
| .gitignore | ||
| audiogen.c | ||
| audiomatch.c | ||
| base64.c | ||
| copycooker.sh | ||
| extended.ffconcat | ||
| fate-run.sh | ||
| fate-valgrind.supp | ||
| fate.sh | ||
| Makefile | ||
| md5.sh | ||
| refcmp-metadata.awk | ||
| reference.pnm | ||
| rotozoom.c | ||
| simple1.ffconcat | ||
| simple2.ffconcat | ||
| test.ffmeta | ||
| tiny_psnr.c | ||
| tiny_ssim.c | ||
| utils.c | ||
| videogen.c | ||
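
The commit message above describes two ideas: filter_line3 as the fused form of "filter_line, memcpy, filter_line", and job start lines rounded down to a multiple of 4. The following is a minimal sketch of both, not the code from the commit. Every name ending in `_sketch` is hypothetical, and the per-line kernel is a simple two-row average standing in for the real bwdif kernel, which uses more taps and neighbouring frames. The slice arithmetic is likewise an assumption about how the rounding could be written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for filter_line: interpolate one missing row as the
 * rounded average of the rows directly above and below it. */
static void filter_line_sketch(uint8_t *dst, const uint8_t *cur,
                               ptrdiff_t stride, size_t w)
{
    const uint8_t *above = cur - stride;
    const uint8_t *below = cur + stride;

    for (size_t x = 0; x < w; x++)
        dst[x] = (uint8_t)((above[x] + below[x] + 1) >> 1);
}

/* Reference form of a three-row step: interpolate row 0, copy row 1 from the
 * source, interpolate row 2. An optimised version could share the loads of
 * row 1 between the two interpolations. */
static void filter_line3_sketch(uint8_t *dst, ptrdiff_t dst_stride,
                                const uint8_t *cur, ptrdiff_t src_stride,
                                size_t w)
{
    filter_line_sketch(dst, cur, src_stride, w);
    memcpy(dst + dst_stride, cur + src_stride, w);
    filter_line_sketch(dst + 2 * dst_stride, cur + 2 * src_stride,
                       src_stride, w);
}

/* First row of a job, rounded down to a multiple of 4. */
static int slice_start_sketch(int height, int jobnr, int nb_jobs)
{
    assert(nb_jobs > 0 && jobnr >= 0 && jobnr < nb_jobs);
    return ((height * jobnr) / nb_jobs) & ~3;
}

/* One past the last row of a job. The final job runs to the end of the
 * frame, so it may pick up to 3 extra rows. */
static int slice_end_sketch(int height, int jobnr, int nb_jobs)
{
    assert(nb_jobs > 0 && jobnr >= 0 && jobnr < nb_jobs);
    if (jobnr == nb_jobs - 1)
        return height;
    return ((height * (jobnr + 1)) / nb_jobs) & ~3;
}
```

With slices aligned this way, every slice except possibly the last covers a multiple of 4 rows, so a loop that consumes rows in groups of 4 (three from the fused call plus one copied row) does not leave a single interpolated row over at the slice boundary.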