Adds a diff_pixels_unaligned()
Fixes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=872503
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit bc488ec28aec4bc91ba47283c49c9f7f25696eaa)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This can overread (either before start or beyond end) of the buffer in
Nx1 (i.e. height=1) images.
Fixes mozilla bug 1240080.
(cherry picked from commit 0f88b3f82fafd536979993aeaafcb11a22266dbd)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes trac ticket 3226. Also see Andreas' analysis in
https://bugs.debian.org/801745, which was very helpful.
(cherry picked from commit 52f84d82bdf1851ecfcc412c1719e5f6f3396209)
This is faster and simpler as well
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
(cherry picked from commit d79f7bf0d63a81ee66026ee92a6946a7303d04bd)
Conflicts:
libavcodec/x86/cavsdsp.c
Simplifies the code and makes it build on certain compilers
running out of registers on x86.
CC: libav-stable@libav.org
Reported-By: mudler
(cherry picked from commit e4610300de6869bd6b3b00e76cfeabb6d7653dcd)
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Based on patch by Francisco Blas Izquierdo Riera
Commit message partly taken from carl
fixes a compilation
error in mlpdsp_init.c with -fstack-check and some gcc compilers (I
reproduced the issue with gcc 4.7.3) by simplifying the code.
See also https://bugs.gentoo.org/show_bug.cgi?id=471756
$ make libavcodec/x86/mlpdsp_init.o
libavcodec/x86/mlpdsp_init.c: In function ‘mlp_filter_channel_x86’:
libavcodec/x86/mlpdsp_init.c:142:5: error: can’t find a register in
class ‘GENERAL_REGS’ while reloading ‘asm’
libavcodec/x86/mlpdsp_init.c:142:5: error: ‘asm’ operand has impossible
constraints
4551 -> 4509 dezicycles
Reviewed-by: Ramiro Polla <ramiro.polla@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
(cherry picked from commit 03f39fbb2a558153a3c464edec1378d637a755fe)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This lets the cglobal macro automatically append a suffix to the function name.
This means that INIT_XMM avx must be used rather than INIT_AVX.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '95c0cec03acec0a80cc1c7db48f3b2355d9e767b':
idctdsp: Add global function pointers for {add|put}_pixels_clamped functions
Conflicts:
libavcodec/arm/idctdsp_init_arm.c
libavcodec/dct.h
libavcodec/idctdsp.c
libavcodec/jrevdct.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
These function pointers already existed in the ARM code. Adding them globally
allows calls to the function pointers to access arch-optimized versions of the
functions transparently.
* commit 'dcb7c868ec7af7d3a138b3254ef2e08f074d8ec5':
cosmetics: Make naming scheme of Xvid IDCT consistent with other IDCTs
Conflicts:
libavcodec/mpeg4videodec.c
libavcodec/x86/Makefile
libavcodec/x86/dct-test.c
libavcodec/x86/xvididct_sse2.c
libavcodec/xvididct.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
In some cases, 2 or 3 calls are performed to functions for unusual
widths. Instead, perform 2 calls for different widths to split the
workload.
The 8+16 and 4+8 widths for respectively 8 and more than 8 bits can't
be processed that way without modifications: some calls use unaligned
buffers, and having branches to handle this was resulting in no
micro-benchmark benefit.
For block_w == 12 (around 1% of the pixels of the sequence):
Before:
12758 decicycles in epel_uni, 4093 runs, 3 skips
19389 decicycles in qpel_uni, 8187 runs, 5 skips
22699 decicycles in epel_bi, 32743 runs, 25 skips
34736 decicycles in qpel_bi, 32733 runs, 35 skips
After:
11929 decicycles in epel_uni, 4096 runs, 0 skips
18131 decicycles in qpel_uni, 8184 runs, 8 skips
20065 decicycles in epel_bi, 32750 runs, 18 skips
31458 decicycles in qpel_bi, 32753 runs, 15 skips
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* Reduced xmm register count to 7 (As such they are now enabled for x86_32).
* Removed four movdqa (affects the sse2 version only).
* pxor is now used to clear m0 only once.
~5% faster.
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
* commit 'efd26bedec9a345a5960dbfcbaec888418f2d4e6':
build: Add explanatory comments to (optimization) blocks in the Makefiles
Conflicts:
libavcodec/ppc/Makefile
libavcodec/x86/Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
It now does 12 samples per iteration, up from 4.
From 1.8 to 3.2 times faster again. 3.6 to 5.7 times faster overall.
Runtime is reduced by a further 2 to 18%. Overall runtime reduced by
4 to 50%.
Same conditions as before apply.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
From 1.8 to 2.4 times faster. Runtime is reduced by 2 to 39%. The
speed-up generally increases with compression_level.
This lpc encoder is not used with levels < 3 so it provides no speed-up
in these cases.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes mismatch in first keyframe in sample
ffvp9_fails_where_libvpx.succeeds.webm from ticket 3849. There's still
a second mismatch a few frames into the sample.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '84d173d3de97c753234ab0c0b50551d51413d663':
xvididct: Ensure that the scantable permutation is always set correctly
Merged-by: Michael Niedermayer <michaelni@gmx.at>