2250 Commits

Author SHA1 Message Date
Michael Niedermayer
c0bb8e0e62
swscale/utils: Fix xInc overflow
Fixes: signed integer overflow: 2 * 1073741824 cannot be represented in type 'int'
Fixes: 67802/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6249515855183872

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 1a9eda65d027e0167f7363e0514f71311ac5d8d1)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-14 22:12:50 +02:00
Michael Niedermayer
2c2dbde45e
libswscale/utils: Fix bayer to yuvj
Fixes: out of array access.

Earlier code assumes that a unscaled bayer to yuvj420 converter exists
but the later code then skips yuvj420

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit e9cc9e492f987ce23ce8c514258a17952dd20401)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-14 22:12:43 +02:00
Michael Niedermayer
22eee37d23
swscale/swscale: Check srcSliceH for bayer
Fixes: Assertion srcSliceH > 1 failed at libswscale/swscale_unscaled.c:1359
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 64098d0cd8ab1d27f78a335ca684f00a419b2160)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-14 22:12:43 +02:00
Michael Niedermayer
84ed2a2b5a
swscale/utils: Allocate more dithererror
Fixes: out of array read
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 18f26f8a2f8dc3b9ec3ac3ab8e03fce15cc8c88d)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-14 22:12:43 +02:00
Michael Niedermayer
a604063ede
swscale/input: Use more unsigned intermediates
Same principle as previous commit, with sufficiently huge rgb2yuv table
values this produces wrong results and undefined behavior.
The unsigned produces the same incorrect results. That is probably
ok as these cases with huge values seem not to occur in any real
use case.

Fixes: signed integer overflow
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit ba209e3d5142fd31bb6c3e05c5b183118a278afc)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-04-21 01:55:09 +02:00
Michael Niedermayer
87df8385b8
swscale/output: Bias 16bps output calculations to improve non overflowing range
Fixes: integer overflow
Fixes: ./ffmpeg   -f rawvideo -video_size 66x64 -pixel_format yuva420p10le   -i ~/videos/overflow_input_w66h64.yuva420p10le   -filter_complex "scale=flags=bicubic+full_chroma_int+full_chroma_inp+bitexact+accurate_rnd:in_color_matrix=bt2020:out_color_matrix=bt2020:in_range=full:out_range=full,format=rgba64[out]"   -pixel_format rgba64 -map '[out]'   -y overflow_w66h64.png

Found-by: Drew Dunne <asdunne@google.com>
Tested-by: Drew Dunne <asdunne@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 0f0afc7fb5d30c40108d81b320823d8f5c9fbedc)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-04-21 01:55:08 +02:00
Martin Storsjö
9d5450b514 swscale: aarch64: Fix yuv2rgb with negative strides
Treat the 32 bit stride registers as signed.

Alternatively, we could make the stride arguments ptrdiff_t instead
of int, and changing all of the assembly to operate on these
registers with their full 64 bit width, but that's a much larger
and more intrusive change (and risks missing some operation, which
would clamp the intermediates to 32 bit still).

Fixes: https://trac.ffmpeg.org/ticket/9985

Signed-off-by: Martin Storsjö <martin@martin.st>
(cherry picked from commit cb803a0072cb98945dcd3f1660bd2a975650ce42)
Signed-off-by: Martin Storsjö <martin@martin.st>
2022-11-04 14:32:34 +02:00
Michael Niedermayer
ff87b7bd2f swscale/alphablend: Fix slice handling
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 06d67265881249566f385309e2fb5a9449720b6e)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-06 13:59:34 +02:00
Michael Niedermayer
d3f9206997 swscale/slice: Fix wrong return on error
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 7874d40f10cca922797a8da14189a53ee52f0156)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-06 13:54:16 +02:00
Michael Niedermayer
b72df5e492 swscale/slice: Check slice for allocation failure
Fixes: null pointer dereference
Fixes: alloc_slice.mp4

Found-by: Rafael Dutra <rafael.dutra@cispa.de>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 997f9cfc1295769be8d3180860ceebbc16f59069)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2021-10-06 13:54:16 +02:00
Andreas Rheinhardt
4b75d960b6 swscale/utils: Fix invalid left shifts of negative numbers
Affected the FATE-tests vsynth_lena-dv-411, vsynth1-dv-411,
vsynth2-dv-411 and hevc-paramchange-yuv420p.yuv420p10.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit e2646e23be69bdef1e41d4decee1a4298701b8d1)
2020-05-20 03:44:07 +02:00
Andreas Rheinhardt
b694403ef9 swscale/x86/swscale: Fix undefined left shifts of negative numbers
This affected many FATE-tests: The number of failing tests went down
from 663 to 344. (Both numbers exclude tests that failed because of
unaligned accesses in code that is inside #if HAVE_FAST_UNALIGNED.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 736c7c20e7819811dc59f43490563789b192eb6e)
2020-05-20 03:42:42 +02:00
Michael Niedermayer
01628af26d swscale/yuv2rgb: Fix vertical dither offset with slices
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit be3c29e3795cb2499e3b96335286d6a8423c0bcf)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-05-19 17:17:36 +02:00
Michael Niedermayer
c3b5c1423e swscale/output: Fix integer overflow in yuv2rgb_write_full() with out of range input
Fixes: signed integer overflow: 1169365504 + 981452800 cannot be represented in type 'int'
Fixes: ticket8293

Found-by: Suhwan
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit e057e83a4ff4c0eeeb78dffe58e21af951c056b6)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-05-19 17:17:36 +02:00
Michael Niedermayer
824c773263 swscale/output: Fix integer overflow in alpha computation in yuv2gbrp16_full_X_c()
Fixes: signed integer overflow: 524280 * 4432 cannot be represented in type 'int'
Fixes: ticket8322

Found-by: Suhwan
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 49ba1879add99d3f64d70d34fb0255c8a49d4b28)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-05-19 17:17:36 +02:00
Michael Niedermayer
9430ad3e21 swscale/input: Fix several invalid shifts related to rgb2yuv constants
Fixes: Invalid shifts
Fixes: #8140
Fixes: #8146

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit d48e510124d0fea24e2ec27271687c92e4428a18)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-05-19 17:17:35 +02:00
Michael Niedermayer
ea7a818c95 swscale/output: Fix several invalid shifts in yuv2rgb_full_1_c_template()
Fixes: Invalid shifts
Fixes: #8320

Reviewed-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit 7b7f97532b2ac8836d8d8e3c71dd026e35ae1ca7)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-05-19 17:17:35 +02:00
Michael Niedermayer
8a9c9711cf swscale/swscale: Fix several invalid shifts related to vChrDrop
Fixes: Invalid shifts
Fixes: #8166
Fixes: filter-crop_scale_vflip FATE-test

Reviewed-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit a6ca22c11834c0ff075592e3f051d41068c407db)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2020-05-19 17:17:35 +02:00
Michael Niedermayer
22db337a40 Bump minor versions to separate 4.2 from master
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-07-21 18:36:18 +02:00
Michael Niedermayer
9d269301f0 swscale/tests/swscale: Lengthen pixfmt name buffer to 21 bytes
Some formats use longer names than 12.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-05-13 13:39:49 +02:00
Adam Richter
b8ed493061 libswcale: Fix possible string overflow in test.
In libswcale/tests/swcale.c, the function fileTest() calls sscanf in
an argument of "%12s" on character srcStr[] and dstStr[], which are
only 12 bytes.  So, if the input string is 12 characters, a
terminating null byte can be written past the end of these arrays.

This bug was found by cppcheck.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-05-13 13:39:40 +02:00
Philip Langdale
4fa4f1d7a9 swscale: Add test for isSemiPlanarYUV to pixdesc_query
Lauri had asked me what the semi planar formats were and that reminded
me that we could add it to pixdesc_query so we know exactly what the
list is.
2019-05-12 07:51:02 -07:00
Philip Langdale
cd48318035 swscale: Add support for NV24 and NV42
The implementation is pretty straight-forward. Most of the existing
NV12 codepaths work regardless of subsampling and are re-used as is.
Where necessary I wrote the slightly different NV24 versions.

Finally, the one thing that confused me for a long time was the
asm specific x86 path that did an explicit exclusion check for NV12.
I replaced that with a semi-planar check and also updated the
equivalent PPC code, which Lauri kindly checked.
2019-05-12 07:51:02 -07:00
Lauri Kasanen
e25bddf5fc swscale/ppc: Shorten power8 tests via a var 2019-05-07 10:08:16 +03:00
Lauri Kasanen
a2a16206aa swscale/ppc: VSX-optimize hScale16To*
./ffmpeg -loop 1 -s 1200x1440 -i tux16.png \
    -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p16le -nostats test.raw

./ffmpeg -loop 1 -s 1200x1440 -i tux16.png \
    -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p -nostats test.raw

32-bit mul, power8 only

2x speedup for hScale8To19_vsx (x86 SSE2 is 2.37):
  30896 UNITS in hscale,    8192 runs,      0 skips
  63956 UNITS in hscale,    8192 runs,      0 skips

2.06 for hScale16To15_vsx:
  30531 UNITS in hscale,    8192 runs,      0 skips
  63161 UNITS in hscale,    8192 runs,      0 skips
2019-05-07 10:08:16 +03:00
Lauri Kasanen
3437111f17 swscale/ppc: Indent 2019-05-07 10:08:16 +03:00
Lauri Kasanen
9456adc223 swscale/ppc: VSX-optimize hScale8To19
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
    -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p16le -nostats test.raw

2.26 speedup (x86 SSE2 is 2.32):
  23772 UNITS in hscale,    4096 runs,      0 skips
  53862 UNITS in hscale,    4096 runs,      0 skips
2019-05-07 10:08:16 +03:00
Lauri Kasanen
d0e4d0429e swscale/ppc: VSX-optimize hscale_fast
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
        -s 2400x720 -f rawvideo -vframes 5 -pix_fmt abgr -nostats test.raw

4.27 speedup for hyscale_fast:
  24796 UNITS in hyscale_fast,    4096 runs,      0 skips
   5797 UNITS in hyscale_fast,    4096 runs,      0 skips

4.48 speedup for hcscale_fast:
  19911 UNITS in hcscale_fast,    4095 runs,      1 skips
   4437 UNITS in hcscale_fast,    4096 runs,      0 skips
2019-04-30 14:41:28 +03:00
Lauri Kasanen
ce92ee4b4f swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_2
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
        -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
        -cpuflags 0 -v error -

32-bit mul, power8 only.

~2x speedup:

rgb24
  24431 UNITS in yuv2packed2,   16384 runs,      0 skips
  13783 UNITS in yuv2packed2,   16383 runs,      1 skips
bgr24
  24396 UNITS in yuv2packed2,   16384 runs,      0 skips
  14059 UNITS in yuv2packed2,   16384 runs,      0 skips
rgba
  26815 UNITS in yuv2packed2,   16383 runs,      1 skips
  12797 UNITS in yuv2packed2,   16383 runs,      1 skips
bgra
  27060 UNITS in yuv2packed2,   16384 runs,      0 skips
  13138 UNITS in yuv2packed2,   16384 runs,      0 skips
argb
  26998 UNITS in yuv2packed2,   16384 runs,      0 skips
  12728 UNITS in yuv2packed2,   16381 runs,      3 skips
bgra
  26651 UNITS in yuv2packed2,   16384 runs,      0 skips
  13124 UNITS in yuv2packed2,   16384 runs,      0 skips

This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version
is also heavily inaccurate, while the vsx version has high accuracy.
2019-04-11 09:08:51 +03:00
Lauri Kasanen
8607e29fa3 swscale/ppc: VSX-optimize yuv2rgb_full_X
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
                -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
                -cpuflags 0 -v error -

32-bit mul, power8 only.

~6.4x speedup:

rgb24
 214278 UNITS in yuv2packedX,   16384 runs,      0 skips
  33249 UNITS in yuv2packedX,   16384 runs,      0 skips
bgr24
 214616 UNITS in yuv2packedX,   16384 runs,      0 skips
  33233 UNITS in yuv2packedX,   16384 runs,      0 skips
rgba
 214517 UNITS in yuv2packedX,   16384 runs,      0 skips
  33271 UNITS in yuv2packedX,   16384 runs,      0 skips
bgra
 214973 UNITS in yuv2packedX,   16384 runs,      0 skips
  33397 UNITS in yuv2packedX,   16384 runs,      0 skips
argb
 214613 UNITS in yuv2packedX,   16384 runs,      0 skips
  33310 UNITS in yuv2packedX,   16384 runs,      0 skips
bgra
 214637 UNITS in yuv2packedX,   16384 runs,      0 skips
  33330 UNITS in yuv2packedX,   16384 runs,      0 skips
2019-04-07 09:20:34 +03:00
Lauri Kasanen
3256e949be swscale/ppc: VSX-optimize yuv2rgb_full_2
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \
            -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
            -cpuflags 0 -v error -

32-bit mul, power8 only.

~4x speedup:

rgb24
  52763 UNITS in yuv2packed2,   16384 runs,      0 skips
  13453 UNITS in yuv2packed2,   16384 runs,      0 skips
bgr24
  53144 UNITS in yuv2packed2,   16384 runs,      0 skips
  13616 UNITS in yuv2packed2,   16384 runs,      0 skips
rgba
  52796 UNITS in yuv2packed2,   16384 runs,      0 skips
  12904 UNITS in yuv2packed2,   16384 runs,      0 skips
bgra
  52732 UNITS in yuv2packed2,   16384 runs,      0 skips
  13262 UNITS in yuv2packed2,   16384 runs,      0 skips
argb
  52661 UNITS in yuv2packed2,   16384 runs,      0 skips
  12879 UNITS in yuv2packed2,   16384 runs,      0 skips
bgra
  52662 UNITS in yuv2packed2,   16384 runs,      0 skips
  12932 UNITS in yuv2packed2,   16384 runs,      0 skips
2019-04-07 09:20:33 +03:00
Lauri Kasanen
50e672bc54 swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_1
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
        -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
        -cpuflags 0 -v error -

32-bit mul, power8 only.

1.8-2.3x speedup:

rgb24
  18192 UNITS in yuv2packed1,   32767 runs,      1 skips
   9983 UNITS in yuv2packed1,   32760 runs,      8 skips
bgr24
  18665 UNITS in yuv2packed1,   32766 runs,      2 skips
   9925 UNITS in yuv2packed1,   32763 runs,      5 skips
rgba
  20239 UNITS in yuv2packed1,   32767 runs,      1 skips
   8794 UNITS in yuv2packed1,   32759 runs,      9 skips
bgra
  20354 UNITS in yuv2packed1,   32768 runs,      0 skips
   8770 UNITS in yuv2packed1,   32761 runs,      7 skips
argb
  20185 UNITS in yuv2packed1,   32768 runs,      0 skips
   8761 UNITS in yuv2packed1,   32761 runs,      7 skips
bgra
  20360 UNITS in yuv2packed1,   32766 runs,      2 skips
   8759 UNITS in yuv2packed1,   32764 runs,      4 skips

This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version
is also heavily inaccurate, while the vsx version has high accuracy.
2019-04-07 09:20:31 +03:00
Lauri Kasanen
7adce3e64c swscale/ppc: VSX-optimize yuv2422_X
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
          -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
          -cpuflags 0 -v error -

7.2x speedup:

yuyv422
 126354 UNITS in yuv2packedX,   16384 runs,      0 skips
  16383 UNITS in yuv2packedX,   16382 runs,      2 skips
yvyu422
 117669 UNITS in yuv2packedX,   16384 runs,      0 skips
  16271 UNITS in yuv2packedX,   16379 runs,      5 skips
uyvy422
 117310 UNITS in yuv2packedX,   16384 runs,      0 skips
  16226 UNITS in yuv2packedX,   16382 runs,      2 skips
2019-03-31 12:41:34 +03:00
Lauri Kasanen
9a2db4dc61 swscale/ppc: VSX-optimize yuv2422_2
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \
                -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
                -cpuflags 0 -v error -

5.1x speedup:

yuyv422
  19339 UNITS in yuv2packed2,   16384 runs,      0 skips
   3718 UNITS in yuv2packed2,   16383 runs,      1 skips
yvyu422
  19438 UNITS in yuv2packed2,   16384 runs,      0 skips
   3800 UNITS in yuv2packed2,   16380 runs,      4 skips
uyvy422
  19128 UNITS in yuv2packed2,   16384 runs,      0 skips
   3721 UNITS in yuv2packed2,   16380 runs,      4 skips
2019-03-31 12:41:33 +03:00
Lauri Kasanen
a6a31ca3d9 swscale/ppc: VSX-optimize yuv2422_1
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
            -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
            -cpuflags 0 -v error -

15.3x speedup:

yuyv422
  14513 UNITS in yuv2packed1,   32768 runs,      0 skips
    949 UNITS in yuv2packed1,   32767 runs,      1 skips
yvyu422
  14516 UNITS in yuv2packed1,   32767 runs,      1 skips
    943 UNITS in yuv2packed1,   32767 runs,      1 skips
uyvy422
  14530 UNITS in yuv2packed1,   32767 runs,      1 skips
    941 UNITS in yuv2packed1,   32766 runs,      2 skips
2019-03-31 12:41:32 +03:00
Michael Niedermayer
8865ae959b swscale/swscale_unscaled: Fix chroma slice height
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-03-28 22:47:32 +01:00
Dong, Jerry
c47fada298 swscale/swscale_unscaled: fixed the issue that when width/height is not 2-multiple, transition of nv12 to u/v planes is not completed.
Signed-off-by: Dong, Jerry <jerry.dong@intel.com>
Signed-off-by: Decai Lin <decai.lin@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-03-28 20:28:43 +01:00
Lauri Kasanen
681957b88d swscale/ppc: VSX-optimize yuv2rgb_full
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
        -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
        -cpuflags 0 -v error -

This uses 32-bit mul, so POWER8 only.

The following output formats get about 4.5x speedup:

rgb24
  39980 UNITS in yuv2packed1,   32768 runs,      0 skips
   8774 UNITS in yuv2packed1,   32768 runs,      0 skips
bgr24
  40069 UNITS in yuv2packed1,   32768 runs,      0 skips
   8772 UNITS in yuv2packed1,   32766 runs,      2 skips
rgba
  39759 UNITS in yuv2packed1,   32768 runs,      0 skips
   8681 UNITS in yuv2packed1,   32767 runs,      1 skips
bgra
  39729 UNITS in yuv2packed1,   32768 runs,      0 skips
   8696 UNITS in yuv2packed1,   32766 runs,      2 skips
argb
  39766 UNITS in yuv2packed1,   32768 runs,      0 skips
   8672 UNITS in yuv2packed1,   32766 runs,      2 skips
bgra
  39784 UNITS in yuv2packed1,   32768 runs,      0 skips
   8659 UNITS in yuv2packed1,   32767 runs,      1 skips
2019-03-27 09:05:08 +02:00
Lauri Kasanen
81a4719d8e swscale: Remove duplicated code
In this function, the exact same clamping happens both in the if and unconditionally.
2019-03-27 09:00:06 +02:00
Lauri Kasanen
6b5ea90eac swscale/ppc: Add av_unused to template vars only used in one includer 2019-03-20 10:21:55 +02:00
Lauri Kasanen
ac3062f1a4 swscale/ppc: Clean up some mixed decl warnings 2019-03-20 10:21:53 +02:00
Lauri Kasanen
8522d219ce libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \
-s 1920x1728 -f null -vframes 100 -v error -nostats -

9-14 bit funcs get about 6x speedup, 16-bit gets about 15x.
Fate passes, each format tested with an image to video conversion.

Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out
of the 16-bit function. This includes the vec_mulo/mule functions too,
not just vmuluwm.

With TIMER_REPORT skips disabled:
yuv420p9le
  12412 UNITS in planarX,  131072 runs,      0 skips
  73136 UNITS in planarX,  131072 runs,      0 skips
yuv420p9be
  12481 UNITS in planarX,  131072 runs,      0 skips
  73410 UNITS in planarX,  131072 runs,      0 skips
yuv420p10le
  12322 UNITS in planarX,  131072 runs,      0 skips
  72546 UNITS in planarX,  131072 runs,      0 skips
yuv420p10be
  12291 UNITS in planarX,  131072 runs,      0 skips
  72935 UNITS in planarX,  131072 runs,      0 skips
yuv420p12le
  12316 UNITS in planarX,  131072 runs,      0 skips
  72708 UNITS in planarX,  131072 runs,      0 skips
yuv420p12be
  12319 UNITS in planarX,  131072 runs,      0 skips
  72577 UNITS in planarX,  131072 runs,      0 skips
yuv420p14le
  12259 UNITS in planarX,  131072 runs,      0 skips
  72516 UNITS in planarX,  131072 runs,      0 skips
yuv420p14be
  12440 UNITS in planarX,  131072 runs,      0 skips
  72962 UNITS in planarX,  131072 runs,      0 skips
yuv420p16le
  10548 UNITS in planarX,  131072 runs,      0 skips
  73429 UNITS in planarX,  131072 runs,      0 skips
yuv420p16be
  10634 UNITS in planarX,  131072 runs,      0 skips
 150959 UNITS in planarX,  131072 runs,      0 skips

Signed-off-by: Lauri Kasanen <cand@gmx.com>
2019-02-05 09:34:53 +02:00
Michael Niedermayer
fe17f9b956 swscale/yuv2rgb: Return a more specific error code from ff_yuv2rgb_c_init_tables()
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-01-01 21:11:47 +01:00
Lauri Kasanen
8dd9df9ecd swscale/output: Altivec-optimize float yuv2plane1
This function wouldn't benefit from VSX instructions, so I put it
under altivec.

./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt grayf32le \
-f null -vframes 100 -v error -nostats -

3743 UNITS in planar1,   65495 runs,     41 skips

-cpuflags 0

23511 UNITS in planar1,   65530 runs,      6 skips

grayf32be

4647 UNITS in planar1,   65449 runs,     87 skips

-cpuflags 0

28608 UNITS in planar1,   65530 runs,      6 skips

The native speedup is 6.28133, and the bswapping one 6.15623.
Fate passes, each format tested with an image to video conversion.

Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-12-26 20:28:58 +01:00
Lauri Kasanen
b4c8c03b00 swscale/output: VSX-optimize 16-bit yuv2plane1
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16le \
-f null -vframes 100 -v error -nostats -

2120 UNITS in planar1,   65393 runs,    143 skips

-cpuflags 0

19157 UNITS in planar1,   65512 runs,     24 skips

9.03632 speedup, 16be similarly.

Fate passes, each format tested with an image to video conversion.

Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-12-14 19:09:11 +01:00
Lauri Kasanen
1046cba24b swscale/output: VSX-optimize nbps yuv2plane1
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p9le \
-f null -vframes 100 -v error -nostats -

Speedups:
yuv2plane1_9BE_vsx	11.2042
yuv2plane1_9LE_vsx	11.156
yuv2plane1_10BE_vsx	9.89428
yuv2plane1_10LE_vsx	10.3637
yuv2plane1_12BE_vsx	9.71923
yuv2plane1_12LE_vsx	11.0404
yuv2plane1_14BE_vsx	10.1763
yuv2plane1_14LE_vsx	11.2728

Fate passes, each format tested with an image to video conversion.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-12-12 01:56:57 +01:00
Lauri Kasanen
78c7ff7d25 swscale/ppc: Move VSX-using code to its own file
Passes fate on LE (with "lavc/jrevdct: Avoid an aliasing violation" applied).

Signed-off-by: Lauri Kasanen <cand@gmx.com>
Tested-by: Michael Kostylev on BE
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-12-04 02:59:07 +01:00
Lauri Kasanen
46c5693ea3 swscale/output: Altivec-optimize yuv2plane1_8
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p \
-f null -vframes 100 -v error -nostats -

1158 UNITS in planar1,   65528 runs,      8 skips

-cpuflags 0

19082 UNITS in planar1,   65533 runs,      3 skips

16.48 speedup ratio. On x86, SSE2 is ~7. Curiously, the Power C version
takes as many cycles as the x86 SSE2 version, yikes it's fast.

Note that this function uses VSX instructions, but is not marked so.
This is because several existing functions also make that mistake.
I'll submit a patch moving them once this is reviewed.

Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-11-26 02:56:25 +01:00
Martin Vignali
86e6f0dbc7 swscale : add support for YUVA444P12 and YUVA422P12 2018-11-24 16:24:47 +01:00
Michael Niedermayer
517573a670 Bump minor version for master after 4.1 branchpoint
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-11-02 00:53:07 +01:00