avcodec/jpeg2000: replace naive pow call with smarter exp2fi
pow is a very wasteful function for this purpose. A low hanging fruit would be simply to replace with exp2f, and that does yield some speedup. However, there are 2 drawbacks of this: 1. It does not exploit the integer nature of the argument. 2. (minor) Some platforms lack a proper exp2f routine, making benefits available only to non broken libm. 3. exp2f does not solve the same issue that plagues pow, namely terrible worst case performance. This is a fundamental issue known as the "table-maker's dilemma" recognized by Prof. Kahan himself and subsequently elaborated and researched by many others. All this is clear from benchmarks below. This exploits the IEEE-754 format to get very good performance even in the worst case for integer powers of 2. This solves all the issues noted above. Function tested with clang usan over [-1000, 1000] (beyond range of relevance for this, which is [-255, 255]), patch itself with FATE. Benchmarks obtained on x86-64, Haswell, GNU-Linux via 10^5 iterations of the pow call, START/STOP, and command ffplay ~/samples/jpeg2000/chiens_dcinema2K.mxf. Low number of runs also given to prove the point about worst case: pow: 216270 decicycles in pow, 1 runs, 0 skips 110175 decicycles in pow, 2 runs, 0 skips 56085 decicycles in pow, 4 runs, 0 skips 29013 decicycles in pow, 8 runs, 0 skips 15472 decicycles in pow, 16 runs, 0 skips 8689 decicycles in pow, 32 runs, 0 skips 5295 decicycles in pow, 64 runs, 0 skips 3599 decicycles in pow, 128 runs, 0 skips 2748 decicycles in pow, 256 runs, 0 skips 2304 decicycles in pow, 511 runs, 1 skips 2072 decicycles in pow, 1022 runs, 2 skips 1963 decicycles in pow, 2044 runs, 4 skips 1894 decicycles in pow, 4091 runs, 5 skips 1860 decicycles in pow, 8184 runs, 8 skips exp2f: 134140 decicycles in pow, 1 runs, 0 skips 68110 decicycles in pow, 2 runs, 0 skips 34530 decicycles in pow, 4 runs, 0 skips 17677 decicycles in pow, 8 runs, 0 skips 9175 decicycles in pow, 16 runs, 0 skips 4931 decicycles in pow, 32 runs, 0 skips 2808 decicycles in pow, 64 runs, 0 skips 1747 decicycles in pow, 128 runs, 0 skips 1208 decicycles in pow, 256 runs, 0 skips 952 decicycles in pow, 512 runs, 0 skips 822 decicycles in pow, 1024 runs, 0 skips 765 decicycles in pow, 2047 runs, 1 skips 722 decicycles in pow, 4094 runs, 2 skips 693 decicycles in pow, 8190 runs, 2 skips exp2fi: 2740 decicycles in pow, 1 runs, 0 skips 1530 decicycles in pow, 2 runs, 0 skips 955 decicycles in pow, 4 runs, 0 skips 622 decicycles in pow, 8 runs, 0 skips 477 decicycles in pow, 16 runs, 0 skips 368 decicycles in pow, 32 runs, 0 skips 317 decicycles in pow, 64 runs, 0 skips 291 decicycles in pow, 128 runs, 0 skips 277 decicycles in pow, 256 runs, 0 skips 268 decicycles in pow, 512 runs, 0 skips 265 decicycles in pow, 1024 runs, 0 skips 263 decicycles in pow, 2048 runs, 0 skips 263 decicycles in pow, 4095 runs, 1 skips 260 decicycles in pow, 8191 runs, 1 skips Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
This commit is contained in:
parent
5b0da6999f
commit
42868ca569
@ -192,6 +192,21 @@ void ff_jpeg2000_set_significance(Jpeg2000T1Context *t1, int x, int y,
|
||||
|
||||
// static const uint8_t lut_gain[2][4] = { { 0, 0, 0, 0 }, { 0, 1, 1, 2 } }; (unused)
|
||||
|
||||
static inline float exp2fi(int x) {
|
||||
/* Normal range */
|
||||
if (-126 <= x && x <= 128)
|
||||
return av_int2float(x+127 << 23);
|
||||
/* Too large */
|
||||
else if (x > 128)
|
||||
return INFINITY;
|
||||
/* Subnormal numbers */
|
||||
else if (x > -150)
|
||||
return av_int2float(1 << (x+149));
|
||||
/* Negligibly small */
|
||||
else
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void init_band_stepsize(AVCodecContext *avctx,
|
||||
Jpeg2000Band *band,
|
||||
Jpeg2000CodingStyle *codsty,
|
||||
@ -221,7 +236,7 @@ static void init_band_stepsize(AVCodecContext *avctx,
|
||||
* R_b = R_I + log2 (gain_b )
|
||||
* see ISO/IEC 15444-1:2002 E.1.1 eqn. E-3 and E-4 */
|
||||
gain = cbps;
|
||||
band->f_stepsize = pow(2.0, gain - qntsty->expn[gbandno]);
|
||||
band->f_stepsize = exp2fi(gain - qntsty->expn[gbandno]);
|
||||
band->f_stepsize *= qntsty->mant[gbandno] / 2048.0 + 1.0;
|
||||
break;
|
||||
default:
|
||||
|
Loading…
x
Reference in New Issue
Block a user