Christophe Gisquet 110d0cdc9d rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC
Code mostly inspired by vp8's MC, however:
- its MMX2 horizontal filter is worse because it can't take advantage of
  the coefficient redundancy
- that same coefficient redundancy allows better code for non-SSSE3 versions
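
As an illustration of what "coefficient redundancy" can buy, here is a minimal scalar sketch in C. It assumes a six-tap filter whose outer taps are pairwise equal, (1, -5, c1, c2, -5, 1), so that two source pixels can be summed before a single multiply. The function names, the tap layout and the parameter values used below are assumptions made for this example; they are not taken from this commit, and the actual x86 code is SIMD assembly.

```c
#include <stdint.h>

/* Clamp an intermediate value to the 0..255 pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical scalar six-tap horizontal filter with taps
 * (1, -5, c1, c2, -5, 1). Because the outer taps repeat, each pair of
 * source pixels is added first and multiplied only once.
 * src must have 2 readable pixels to the left of src[0] and
 * 3 readable pixels to the right of src[width - 1].
 */
static void sixtap_h_row(uint8_t *dst, const uint8_t *src, int width,
                         int c1, int c2, int shift)
{
    const int round = 1 << (shift - 1);

    for (int x = 0; x < width; x++) {
        int outer = src[x - 2] + src[x + 3];   /* both weighted by  1 */
        int inner = src[x - 1] + src[x + 2];   /* both weighted by -5 */
        int sum   = outer - 5 * inner
                  + c1 * src[x] + c2 * src[x + 1] + round;

        /* Negative sums clamp to 0 without shifting a negative value. */
        dst[x] = sum < 0 ? 0 : clip_u8(sum >> shift);
    }
}
```

With assumed values c1 = 52, c2 = 20 and shift = 6 the taps sum to 64, so a flat input row is reproduced unchanged. In a SIMD version the same pairing means fewer multiplies per output pixel, which is one plausible reading of the remark about non-SSSE3 versions.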

Benchmark (rounded to tens of units):
        V8x8  H8x8  2D8x8  V16x16  H16x16  2D16x16
C        445   358    985    1785    1559     3280
MMX*     219   271    478     714     929     1443
SSE2     131   158    294     425     515      892
SSSE3    120   122    248     387     390      763

End result is overall around a 15% speedup for SSSE3 version (on 6 sequences);
all loop filter functions now take around 55% of decoding time, while luma MC
dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-05-10 18:42:43 +02:00

Libav README
------------

1) Documentation
----------------

* Read the documentation in the doc/ directory.

2) Licensing
------------

* See the LICENSE file.