This will not overflow for normal values
Fixes: CID1500280 Unintentional integer overflow
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit bfc22f364d31d8f2dc2acae1bd03d5894a00b8c5)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: -1082982400 + -1079364728 cannot be represented in type 'int'
Fixes: 67910/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5329011971522560
The input is 9bit in 16bit, the fuzzer fills all 16bit thus generating "invalid" input
No overflow should happen with valid input.
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 1330a73ccadd855542ac4386f75fd72ff0ab5ea1)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: -831176 * 9539 cannot be represented in type 'int'
Fixes: 67869/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5117342091640832
The input is 9bit in 16bit, the fuzzer fills all 16bit thus generating "invalid" input
No overflow should happen with valid input.
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit a56559e688ffde40fcda5588123ffcb978da86d7)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 2 * 1073741824 cannot be represented in type 'int'
Fixes: 67802/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6249515855183872
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 1a9eda65d027e0167f7363e0514f71311ac5d8d1)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array access.
Earlier code assumes that a unscaled bayer to yuvj420 converter exists
but the later code then skips yuvj420
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit e9cc9e492f987ce23ce8c514258a17952dd20401)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array read
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 18f26f8a2f8dc3b9ec3ac3ab8e03fce15cc8c88d)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Same principle as previous commit, with sufficiently huge rgb2yuv table
values this produces wrong results and undefined behavior.
The unsigned produces the same incorrect results. That is probably
ok as these cases with huge values seem not to occur in any real
use case.
Fixes: signed integer overflow
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit ba209e3d5142fd31bb6c3e05c5b183118a278afc)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Treat the 32 bit stride registers as signed.
Alternatively, we could make the stride arguments ptrdiff_t instead
of int, and changing all of the assembly to operate on these
registers with their full 64 bit width, but that's a much larger
and more intrusive change (and risks missing some operation, which
would clamp the intermediates to 32 bit still).
Fixes: https://trac.ffmpeg.org/ticket/9985
Signed-off-by: Martin Storsjö <martin@martin.st>
(cherry picked from commit cb803a0072cb98945dcd3f1660bd2a975650ce42)
Signed-off-by: Martin Storsjö <martin@martin.st>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 06d67265881249566f385309e2fb5a9449720b6e)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 7874d40f10cca922797a8da14189a53ee52f0156)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Regression since fc6a5883d6af8cae0e96af84dda0ad74b360a084 on SSSE3 enabled
CPUs.
Fixes ticket #8747
Signed-off-by: James Almer <jamrial@gmail.com>
(cherry picked from commit ba3e771a42c29ee02c34e7769cfc1b2dbc5c760a)
The NEON hscale function only supports X8 filter sizes and should only
be selected when these are being used. At the moment filterAlign is
set to 8 but in the future when extra NEON assembly for specific sizes is
added they will need to have checks here too.
The immediate usecase for this change is making the hscale checkasm
test easier and without NEON specific edge-cases (x86 already has these
guards).
This applies the same fix from 718c8f9aa59751bb490e2688acf2b5cb68fd5ad1
on the 32 bit arm version of the function, fixing fate-checkasm-sw_scale
there.
Signed-off-by: Martin Storsjö <martin@martin.st>
The NEON hscale function only supports X8 filter sizes and should only
be selected when these are being used. At the moment filterAlign is
set to 8 but in the future when extra NEON assembly for specific sizes is
added they will need to have checks here too.
The immediate usecase for this change is making the hscale checkasm
test easier and without NEON specific edge-cases (x86 already has these
guards).
Signed-off-by: Josh de Kock <josh@itanimul.li>
libswscale/vscale.c makes extensive use of function pointers and in
doing so it converts these function pointers to and from a pointer to
void. Yet this is actually against the C standard:
C90 only guarantees that one can convert a pointer to any incomplete
type or object type to void* and back with the result comparing equal
to the original which makes pointers to void generic pointers to
incomplete or object type. Yet C90 lacks a generic function pointer
type.
C99 additionally guarantees that a pointer to a function of one type may
be converted to a pointer to a function of another type with the result
and the original comparing equal when converting back.
This makes any function pointer type a generic function pointer type.
Yet even this does not make pointers to void generic function pointers.
Both GCC and Clang emit warnings for this when in pedantic mode.
This commit fixes this by using a union that can hold one member of any
of the required function pointer types to store the function pointer.
This works even for C90.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The x18 is a reserved platform register on Darwin and Windows.
x8/w8 seems to be unused in this function though (and same about
x10 and x14), so there's really no reason to use x18 here - just change
the uses of x18/w18 into x8/w8 instead without any further rewrites.
Signed-off-by: Martin Storsjö <martin@martin.st>
Fixes: signed integer overflow: 1169365504 + 981452800 cannot be represented in type 'int'
Fixes: ticket8293
Found-by: Suhwan
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 524280 * 4432 cannot be represented in type 'int'
Fixes: ticket8322
Found-by: Suhwan
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Add swscale input support for Y210LE, output support and fate
test could be added later if there is requirement for software
CSC to this packed format.
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Tested using this command:
/ffmpeg -pix_fmt yuv420p -s 1920*1080 -i ArashRawYuv420.yuv \
-vcodec rawvideo -s 1920*1080 -pix_fmt rgb24 -f null /dev/null
The fps increase from 389 to 640 on Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
Signed-off-by: Ting Fu <ting.fu@intel.com>
Bug #8255 points out a double free error in libwscale/utils.c file.
The double free is because the pointer to cascaded_context of an
sw_context is not set to NULL after freeing it. When the sw_context
is later freed, sws_freeContext is called on the cascaded_context,
causing a double free.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The original inline assembly and nasm code have the same fps when called by command.
NASM code almost has no impact on the perfromance.
Signed-off-by: Ting Fu <ting.fu@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This patch rewrites the innermost loop of ff_yuv2planeX_8_neon to avoid zips and
horizontal adds by using fused multiply adds. The patch also uses ld1r to load
one element and replicate it across all lanes of the vector. The patch also
improves the clipping code by removing the shift right instructions and
performing the shift with the shift-right narrow instructions.
I see 8% difference on an m6g instance with neoverse-n1 CPUs:
$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.014015 avg:0.014096 max:0.015018 min:0.013971
after: t:0.012985 avg:0.013013 max:0.013996 min:0.012818
Tested with `make check` on aarch64-linux.
Signed-off-by: Sebastian Pop <spop@amazon.com>
Reviewed-by: Clément Bœsch <u@pkh.me>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This patch implements ff_hscale_8_to_15_neon with NEON fused multiply accumulate
and bumps the vectorization factor from 2 to 4.
The speedup is of 25% on Graviton1 A1 instances based on A-72 cpus:
$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.040303 avg:0.040287 max:0.040371 min:0.039214
after: t:0.032168 avg:0.032215 max:0.033081 min:0.032146
The speedup is of 39% on Graviton2 m6g instances based on Neoverse-N1 cpus:
$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.019446 avg:0.019423 max:0.019493 min:0.019181
after: t:0.014015 avg:0.014096 max:0.015018 min:0.013971
Tested with `make check` on aarch64-linux.
Signed-off-by: Sebastian Pop <spop@amazon.com>
Reviewed-by: Jean-Baptiste Kempf <jb@videolan.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This also reverts 21838cad2fc44023ad85e35d5c677e2f8d29a0ef
The revert is in this commit to avoid 2 fate updates
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The argument to vec_splat_u16 must be a literal. By making the
function always inline and marking the arguments const, gcc can
turn those into literals, and avoid build errors like:
swscale_vsx.c:165:53: error: argument 1 must be a 5-bit signed literal
Fixes#7861.
Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
Signed-off-by: Lauri Kasanen <cand@gmx.com>
While this technically compiles in current ffmpeg, this is only
because ffmpeg is compiled in strict ISO C mode, which disables
the builtin 'vector' keyword for AltiVec/VSX. Instead this gets
replaced with a macro inside altivec.h, which defines vector to
be actually __vector, which accepts random types.
Normally, the vector keyword should be used only with plain
scalar non-typedef types, such as unsigned int. But we have the
vec_(s|u)(8|16|32) macros, which can be used in a portable manner,
in util_altivec.h in libavutil.
This is also consistent with other AltiVec/VSX code elsewhere in
the tree.
Fixes#7861.
Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
Signed-off-by: Lauri Kasanen <cand@gmx.com>