aacenc: add SIMD optimizations for abs_pow34 and quantization

Performance improvements:

quant_bands:
with:     681 decicycles in quant_bands, 8388453 runs, 155 skips
without: 1190 decicycles in quant_bands, 8388386 runs, 222 skips
Around 42% for the function

Twoloop coder:

abs_pow34:
with/without: 7.82s/8.17s
Around 4% for the entire encoder

Both:
with/without: 7.15s/8.17s
Around 12% for the entire encoder

Fast coder:

abs_pow34:
with/without: 3.40s/3.77s
Around 10% for the entire encoder

Both:
with/without: 3.02s/3.77s
Around 20% faster for the entire encoder

Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: James Almer <jamrial@gmail.com>
2016-10-08 15:59:14 +01:00
parent 3b02f6dd7b
commit d2ae5f77c6
13 changed files with 170 additions and 26 deletions
--- a/libavcodec/aaccoder_trellis.h
+++ b/libavcodec/aaccoder_trellis.h
@@ -70,7 +70,7 @@ static void codebook_trellis_rate(AACEncContext *s, SingleChannelElement *sce,
    float next_minbits = INFINITY;
    int next_mincb = 0;

-    abs_pow34_v(s->scoefs, sce->coeffs, 1024);
+    s->abs_pow34(s->scoefs, sce->coeffs, 1024);
    start = win*128;
    for (cb = 0; cb < CB_TOT_ALL; cb++) {
        path[0][cb].cost     = run_bits+4;