Commit graph

50131 commits

Author SHA1 Message Date
Rémi Denis-Courmont
f883746587 lavc/flacdsp: do not assume maximum R-V VL
This loop correctly assumes that VLMAX=16 (4x128-bit vectors
with 32-bit elements) and 32 >= pred_order > 16. We need to alternate
between VL=16 and VL=t2=pred_order-16 elements to add up to pred_order.

The current code requests AVL=a2=pred_order elements. In QEMU and on
thte K230 hardware, this sets VL=16 as we need. But the specification
merely guarantees that we get: ceil(AVL / 2) <= VL <= VLMAX. For
instance, if pred_order equals 27, we could end up with VL=14 or VL=15
instead of VL=16. So instead, request literally VLMAX=16.
2024-05-25 10:31:50 +03:00
Andreas Rheinhardt
aff24c1658 avcodec/flacdec: Remove unused variable
Forgotten in 0380a03f1f.

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-24 19:05:57 +02:00
Rémi Denis-Courmont
ba38d0e328 lavc/pixblockdsp: add scalar get_pixels_unaligned
The code is already there, we just need to use it.

get_pixels_unaligned_c: 2.2
get_pixels_unaligned_misaligned: 1.7
2024-05-24 17:53:43 +03:00
James Almer
0380a03f1f avcodec/flacdsp: split off lpc33 into a dsp function
Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-24 09:23:00 -03:00
Haihao Xiang
8155808ce6 libavcodec/x86/vvc/vvc_sad: fix assembler error
X86ASM    libavcodec/x86/vvc/vvc_sad.o
libavcodec/x86/vvc/vvc_sad.asm:85: error: invalid number of operands
libavcodec/x86/vvc/vvc_sad.asm:87: error: invalid number of operands

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-23 09:12:50 -03:00
Stone Chen
0e52a4e434 libavcodec/x86/vvc: Add AVX2 DMVR SAD functions for VVC
Implements AVX2 DMVR (decoder-side motion vector refinement) SAD functions. DMVR SAD is only calculated if w >= 8, h >= 8, and w * h > 128. To reduce complexity, SAD is only calculated on even rows. This is calculated for all video bitdepths, but the values passed to the function are always 16bit (even if the original video bitdepth is 8). The AVX2 implementation uses min/max/sub.

Additionally this changes parameters dx and dy from int to intptr_t. This allows dx & dy to be used as pointer offsets without needing to use movsxd.

Benchmarks ( AMD 7940HS )
Before:
BQTerrace_1920x1080_60_10_420_22_RA.vvc | 106.0 |
Chimera_8bit_1080P_1000_frames.vvc | 204.3 |
NovosobornayaSquare_1920x1080.bin | 197.3 |
RitualDance_1920x1080_60_10_420_37_RA.266 | 174.0 |

After:
BQTerrace_1920x1080_60_10_420_22_RA.vvc | 109.3 |
Chimera_8bit_1080P_1000_frames.vvc | 216.0 |
NovosobornayaSquare_1920x1080.bin | 204.0|
RitualDance_1920x1080_60_10_420_37_RA.266 | 181.7 |

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-22 20:36:21 -03:00
Rémi Denis-Courmont
910d281b21 lavc/h263dsp: R-V V {h,v}_loop_filter
Since the horizontal and vertical filters are identical except for a
transposition, this uses a common subprocedure with an ad-hoc ABI.
To preserve return-address stack prediction, a link register has to be
used (c.f. the "Control Transfer Instructions" from the
RISC-V ISA Manual). The alternate/temporary link register T0 is used
here, so that the normal RA is preserved (something Arm cannot do!).

To load the strength value based on `qscale`, the shortest possible
and PIC-compatible sequence is used: AUIPC; ADD; LBU. The classic
LLA; ADD; LBU sequence would add one more instruction since LLA is a
convenience alias for AUIPC; ADDI. To ensure that this trick works,
relocation relaxation is disabled.

To implement the two signed divisions by a power of two toward zero:
 (x / (1 << SHIFT))
the code relies on the small range of integers involved, computing:
 (x + (x >> (16 - SHIFT))) >> SHIFT
rather than the more general:
 (x + ((x >> (16 - 1)) & ((1 << SHIFT) - 1))) >> SHIFT
Thus one ANDI instruction is avoided.

T-Head C908:
h263dsp.h_loop_filter_c:       228.2
h263dsp.h_loop_filter_rvv_i32: 144.0
h263dsp.v_loop_filter_c:       242.7
h263dsp.v_loop_filter_rvv_i32: 114.0
(C is probably worse in real use due to less predictible branches.)
2024-05-22 19:15:39 +03:00
James Almer
3d1597d3e2 x86/vvc_alf: use the x86inc instruction macros
Let its magic figure out the correct mnemonic based on target instruction set.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-22 20:51:30 +08:00
sunyuechi
0c1304ae11 lavc/vp9dsp: R-V V mc avg
C908:
vp9_avg4_8bpp_c: 1.2
vp9_avg4_8bpp_rvv_i64: 1.0
vp9_avg8_8bpp_c: 3.7
vp9_avg8_8bpp_rvv_i64: 1.5
vp9_avg16_8bpp_c: 14.7
vp9_avg16_8bpp_rvv_i64: 3.5
vp9_avg32_8bpp_c: 57.7
vp9_avg32_8bpp_rvv_i64: 10.0
vp9_avg64_8bpp_c: 229.0
vp9_avg64_8bpp_rvv_i64: 31.7

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-05-21 21:28:14 +03:00
Rémi Denis-Courmont
7591eb4055 Revert "lavc/sbrdsp: R-V V neg_odd_64"
While this function can easily be written with vectors, it just fails to
get any performance improvement.

For reference, this is a simpler loop-free implementation that does get
better performance than the current one depending on hardware, but still
more or less the same metrics as the C code:

 func ff_sbr_neg_odd_64_rvv, zve64x
         li      a1, 32
         addi    a0, a0, 7
         li      t0, 8
         vsetvli zero, a1, e8, m2, ta, ma
         li      t1, 0x80
         vlse8.v v8, (a0), t0
         vxor.vx v8, v8, t1
         vsse8.v v8, (a0), t0
         ret
 endfunc

This reverts commit d06fd18f8f.
2024-05-21 21:26:39 +03:00
Rémi Denis-Courmont
d452db8410 lavc/vc1dsp: R-V V vc1_unescape_buffer
Notes:
- The loop is biased toward no unescaped bytes as that should be most common.
- The input byte array is slid rather than the (8 times smaller) bit-mask,
  as RISC-V V does not provide a bit-mask (or bit-wise) slide instruction.
- There are two comparisons with 0 per iteration, for the same reason.
- In case of match, bytes are copied until the first match, and the loop is
  restarted after the escape byte. Vector compression (vcompress.vm) could
  discard all escape bytes but that is slower if escape bytes are rare.

Further optimisations should be possible, e.g.:
- processing 2 bytes fewer per iteration to get rid of a 2 slides,
- taking a short cut if the input vector contains less than 2 zeroes.
But this is a good starting point:

T-Head C908:
vc1dsp.vc1_unescape_buffer_c:      12749.5
vc1dsp.vc1_unescape_buffer_rvv_i32: 6009.0

SpacemiT X60:
vc1dsp.vc1_unescape_buffer_c:      11038.0
vc1dsp.vc1_unescape_buffer_rvv_i32: 2061.0
2024-05-21 21:16:30 +03:00
Nuo Mi
1b33c9a50a avcodec/vvcdec: support Reference Picture Resampling
passed clips:
    RPR_A_Alibaba_4.bit
    RPR_B_Alibaba_3.bit
    RPR_C_Alibaba_3.bit
    RPR_D_Qualcomm_1.bit
    VVC_HDR_UHDTV1_OpenGOP_Max3840x2160_50fps_HLG10_res_change_with_RPR.ts
2024-05-21 20:20:25 +08:00
Nuo Mi
cae0b01282 avcodec/vvcdec: increase edge_emu_buffer for RPR 2024-05-21 20:20:25 +08:00
Nuo Mi
7904ec2d34 avcodec/vvcdec: refact, remove hf_idx and vf_idx from mc_xxx's param list 2024-05-21 20:20:25 +08:00
Nuo Mi
77d971c348 avcodec/vvcdec: refact out luma_prof from luma_prof_bi 2024-05-21 20:20:25 +08:00
Nuo Mi
ac4575594f avcodec/vvcdec: fix dmvr, bdof, cb_prof for RPR 2024-05-21 20:20:25 +08:00
Nuo Mi
77acd0a0dd avcodec/vvcdec: inter, wait reference with a different resolution
For RPR, the current frame may reference a frame with a different resolution.
Therefore, we need to consider frame scaling when we wait for reference pixels.
2024-05-21 20:20:25 +08:00
Nuo Mi
deda59a996 avcodec/vvcdec: add RPR dsp 2024-05-21 20:20:25 +08:00
Nuo Mi
e70225e0a8 avcodec/vvcdec: emulated_edge, use reference frame's sps and pps
a preparation for Reference Picture Resampling
2024-05-21 20:20:25 +08:00
Nuo Mi
aa8d5c6e7e avcodec/vvcdec: add vvc inter filters for RPR 2024-05-21 20:20:25 +08:00
Nuo Mi
08ad51ece6 avcodec/vvcdec: refact, pred_get_refs return VVCRefPic instead of VVCFrame 2024-05-21 20:20:25 +08:00
Nuo Mi
66c6bee061 avcodec/vvcdec: refact out VVCRefPic from RefPicList 2024-05-21 20:20:25 +08:00
Nuo Mi
44bbafb69f avcodec/vvcdec: refact, unify pred_regular_{luma, chroma} to pred_regular 2024-05-21 20:20:25 +08:00
Nuo Mi
875fa9692c avcodec/vvcdec: misc, remove unused EMULATED_EDGE_{LUMA, CHROMA}, EMULATED_EDGE_DMVR_{LUAM, CHROMA} 2024-05-21 20:20:25 +08:00
Nuo Mi
84a93d91d1 avcodec/vvcdec: refact, unify {luma, chroma}_mc_bi to mc_bi 2024-05-21 20:20:25 +08:00
Nuo Mi
6769fe1614 avcodec/vvcdec: refact, unify {luma, chroma}_mc_uni to mc_uni 2024-05-21 20:20:25 +08:00
Nuo Mi
bc099afc8d avcodec/vvcdec: refact, unify {luma, chroma}_mc to mc 2024-05-21 20:20:25 +08:00
Nuo Mi
1289da9244 avcodec/vvcdec: misc, inter, use is_chroma instead of is_luma 2024-05-21 20:20:25 +08:00
David Rosca
f7a1453f27 lavc/vaapi_decode: Reject decoding of frames with no slices
Matches other hwaccels.
2024-05-21 16:57:46 +08:00
James Almer
b113050d96 avcodec/cbs_h266: read vps_ptl_max_tid before using it
Reviewed-by: Nuo Mi <nuomi2021@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-20 10:29:30 -03:00
Andreas Rheinhardt
2c94b1bbf1 avcodec/tiff: Fix leak on error
Fixes Coverity issue #1516957.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-20 14:15:48 +02:00
Andreas Rheinhardt
59b1838e09 avcodec/ac3enc: Move transient PutBitContext to stack
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-20 14:11:25 +02:00
Andreas Rheinhardt
e863cbceae avcodec/ac3enc_template: Avoid always-true check
This might also help Coverity with issue #1596532.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-20 14:11:03 +02:00
Andreas Rheinhardt
482afe8f3f avcodec/lib*, avformat/tee: Simplify iterating over AVDictionary
Reviewed-by: epirat07@gmail.com
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-20 13:51:55 +02:00
Andreas Rheinhardt
a2874c5721 avcodec/aac_ac3_parser: Use ff_adts_header_parse_buf()
instead of avpriv_adts_header_parse(). Using the former avoids
an indirection.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-20 12:06:50 +02:00
Andreas Rheinhardt
12ded9cd85 avcodec/adts_header: Add ff_adts_header_parse_buf()
Most users of ff_adts_header_parse() don't already have
an opened GetBitContext for the header, so add a convenience
function for them.
Also use a forward declaration of GetBitContext in adts_header.h
as this avoids (implicit) inclusion of get_bits.h in some of
the users that now no longer use a GetBitContext of their own.

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-20 12:06:31 +02:00
Andreas Rheinhardt
ae937c4902 avcodec/aac_ac3_parser: Untangle AAC and AC3 parsing error codes
Also remove the (unused) AAC_AC3_PARSE_ERROR_CHANNEL_CFG while at it;
furthermore, fix the documentation of ff_ac3_parse_header()
and (ff|avpriv)_adts_header_parse().

Reviewed-by: Andrew Sayers <ffmpeg-devel@pileofstuff.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-20 11:58:07 +02:00
Andreas Rheinhardt
6c812a80dd avcodec/adts_parser: Don't presume buffer to be padded
The documentation of av_adts_header_parse() does not require
the buffer to be padded at all.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-05-20 11:48:03 +02:00
Haihao Xiang
a00cfc6c24 lavc/qsvdec: require a dynamic frame pool if possible
This allows a downstream element stores more frames from qsv decoders
and fixes error in get_buffer().

$ ffmpeg -hwaccel qsv -hwaccel_output_format qsv -i input.mp4 -vf
reverse -f null -

[vist#0:0/h264 @ 0x562248f12c50] Decoding error: Cannot allocate memory
[h264_qsv @ 0x562248f66b10] get_buffer() failed

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2024-05-20 09:30:49 +08:00
Haihao Xiang
75015f9b0e lavc/qsvenc: use the right info for encoding
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2024-05-20 09:30:49 +08:00
Haihao Xiang
cda721e01d lavc/qsv: fix the mfx allocator to support dynamic frame pool
When the external allocator is used for dynamic frame allocation, only
video memory is supported, the SDK doesn't lock/unlock the memory block
via Lock()/Unlock() calls.

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2024-05-20 09:30:49 +08:00
Michael Niedermayer
e35fe3d8b9
avcodec/mscc & mwsc: Check loop counts before use
This could cause timeouts

Fixes: CID1439568 Untrusted loop bound

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-19 22:14:39 +02:00
Michael Niedermayer
b6b2b01025
avcodec/mpegvideo_enc: Fix potential overflow in RD
Fixes: CID1500285 Unintentional integer overflow

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-19 22:14:38 +02:00
Michael Niedermayer
8fc649b931
avcodec/mpeg4videodec: assert impossible wrap points
Helps: CID1473517 Uninitialized scalar variable
Helps: CID1473497 Uninitialized scalar variable

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-19 22:14:37 +02:00
Michael Niedermayer
4c725df059
avcodec/mpeg12dec: Use 64bit in bit computation
I dont think this can actually overflow but 64bit seems reasonable to use

Fixes: CID1521983 Unintentional integer overflow

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-19 22:14:37 +02:00
Michael Niedermayer
6a9302739f
avcodec/vqcdec: Check init_get_bits8() for failure
Fixes: CID1516090 Unchecked return value

Sponsored-by: Sovereign Tech Fund
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-19 22:12:55 +02:00
Michael Niedermayer
4a8506c794
avcodec/vvc/dec: Check init_get_bits8() for failure
Fixes: CID1560042 Unchecked return value

Sponsored-by: Sovereign Tech Fund
Reviewed-by: Nuo Mi <nuomi2021@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-19 22:12:55 +02:00
Michael Niedermayer
dd5379db5d
avcodec/vble: Check av_image_get_buffer_size() for failure
Fixes: CID1461482 Improper use of negative value

Sponsored-by: Sovereign Tech Fund
Reviewed-.by: "Xiang, Haihao" <haihao.xiang@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-19 22:12:54 +02:00
Michael Niedermayer
1b991e77b9
avcodec/vp3: Replace check by assert
Fixes: CID1452425 Logically dead code

Sponsored-by: Sovereign Tech Fund
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-19 22:12:54 +02:00
Michael Niedermayer
63feed1519
avcodec/vp8: Forward return of ff_vpx_init_range_decoder()
Fixes: CID1507483 Unchecked return value

Sponsored-by: Sovereign Tech Fund
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-19 22:12:54 +02:00