aacenc: add SIMD optimizations for abs_pow34 and quantization

Performance improvements:

quant_bands:
with:     681 decicycles in quant_bands, 8388453 runs,    155 skips
without: 1190 decicycles in quant_bands, 8388386 runs,    222 skips
Around 42% faster for the function

Twoloop coder:

abs_pow34:
with/without: 7.82s/8.17s
Around 4% faster for the entire encoder

Both:
with/without: 7.15s/8.17s
Around 12% faster for the entire encoder

Fast coder:

abs_pow34:
with/without: 3.40s/3.77s
Around 10% faster for the entire encoder

Both:
with/without: 3.02s/3.77s
Around 20% faster for the entire encoder

Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: James Almer <jamrial@gmail.com>
Rostislav Pehlivanov
2016-10-08 15:59:14 +01:00
parent 3b02f6dd7b
commit d2ae5f77c6
13 changed files with 170 additions and 26 deletions


@@ -70,7 +70,7 @@ static void codebook_trellis_rate(AACEncContext *s, SingleChannelElement *sce,
     float next_minbits = INFINITY;
     int next_mincb = 0;
-    abs_pow34_v(s->scoefs, sce->coeffs, 1024);
+    s->abs_pow34(s->scoefs, sce->coeffs, 1024);
     start = win*128;
     for (cb = 0; cb < CB_TOT_ALL; cb++) {
         path[0][cb].cost = run_bits+4;