224 Commits

Author SHA1 Message Date
James Almer
2ac399d7fa Merge commit '58d154922707bfeb873cb3a7476e0f94b17463dd'
* commit '58d154922707bfeb873cb3a7476e0f94b17463dd':
  aarch64: vp8: Port epel4 functions from arm version

Merged-by: James Almer <jamrial@gmail.com>
2019-03-14 16:17:33 -03:00
James Almer
c6892f59eb Merge commit 'cc7ba00c35faf0478f1f56215e926f70ccb31282'
* commit 'cc7ba00c35faf0478f1f56215e926f70ccb31282':
  aarch64: vp8: Port missing epel8 functions from arm version

Merged-by: James Almer <jamrial@gmail.com>
2019-03-14 16:16:43 -03:00
James Almer
79025da3f2 Merge commit '52c9b0a6c0d02cff6caebcf6989e565e05b55200'
* commit '52c9b0a6c0d02cff6caebcf6989e565e05b55200':
  aarch64: vp8: Port vp8_luma_dc_wht and vp8_idct_dc_add4uv from arm version

Merged-by: James Almer <jamrial@gmail.com>
2019-03-14 16:14:40 -03:00
James Almer
39278ff0de Merge commit 'c513fcd7d235aa4cef45a6c3125bd4dcc03bf276'
* commit 'c513fcd7d235aa4cef45a6c3125bd4dcc03bf276':
  aarch64: vp8: Fix a typo in a comment

Merged-by: James Almer <jamrial@gmail.com>
2019-03-14 16:13:32 -03:00
James Almer
4f9a8d3fe2 Merge commit 'f1011ea28a4048ddec97794ca3e9901474fe055f'
* commit 'f1011ea28a4048ddec97794ca3e9901474fe055f':
  aarch64: vp8: Reorder the function pointer inits to match the arm original

Merged-by: James Almer <jamrial@gmail.com>
2019-03-14 16:09:11 -03:00
James Almer
398000abcf Merge commit '85bfaa4949f4afcde19061def3e8a18988964858'
* commit '85bfaa4949f4afcde19061def3e8a18988964858':
  aarch64: vp8: Use the proper aarch64 form for conditional branches

Merged-by: James Almer <jamrial@gmail.com>
2019-03-14 16:06:43 -03:00
James Almer
a2ae381b5a Merge commit '0801853e640624537db386727b36fa97aa6258e7'
* commit '0801853e640624537db386727b36fa97aa6258e7':
  libavcodec: vp8 neon optimizations for aarch64

See 833fed5253617924c41132e0ab261c1d8c076360

Merged-by: James Almer <jamrial@gmail.com>
2019-03-14 16:05:52 -03:00
Carl Eugen Hoyos
7e4d3dbe18 lavc/aarch64/h264dsp_init: Only use neon horizontal intra loopfilter for 4:2:0. 2019-02-20 23:56:21 +01:00
James Almer
aa844dc46f aarch64/h264dsp: change loop filter stride argument to ptrdiff_t
This was missed in d5d699ab6e6f8a8290748d107416fd5c19757a1b

Signed-off-by: James Almer <jamrial@gmail.com>
2019-02-20 19:38:46 -03:00
James Almer
e4e04dce1f Merge commit '28a8b5413b64b831dfb8650208bccd8b78360484'
* commit '28a8b5413b64b831dfb8650208bccd8b78360484':
  h264/aarch64: add intra loop filter neon asm

Merged-by: James Almer <jamrial@gmail.com>
2019-02-20 15:42:01 -03:00
James Almer
4dc1f06f0c Merge commit '846c3d6aca5484904e60946c4fe8b8833bc07f92'
* commit '846c3d6aca5484904e60946c4fe8b8833bc07f92':
  h264/aarch64: optimize neon loop filter

Merged-by: James Almer <jamrial@gmail.com>
2019-02-20 15:41:03 -03:00
James Almer
5ca7eb36b7 Merge commit 'bb515e3a735f526ccb1068031e289eb5aeb69e22'
* commit 'bb515e3a735f526ccb1068031e289eb5aeb69e22':
  h264/aarch64: sign extend int stride in loop filter asm

Merged-by: James Almer <jamrial@gmail.com>
2019-02-20 14:50:37 -03:00
Martin Storsjö
c8bc9d1380 aarch64: vp8: Move the vp8dsp makefile entries to the right places
Even if NEON would be disabled, the init functions should be built
as they are called as long as ARCH_AARCH64 is set.

These functions are part of a generic DSP subsytem, not tied directly
to one decoder. (They should be built if the vp7 decoder is enabled,
even if the vp8 decoder is disabled.)

Signed-off-by: Martin Storsjö <martin@martin.st>
(cherry picked from commit b4b27dce95a6d40bfcd78043d3abec7d80dae143)
2019-02-19 23:43:17 +02:00
Martin Storsjö
fecf75a5c4 aarch64: vp8: Remove superfluous includes
This fixes building with MSVC, which lacks unistd.h.

Signed-off-by: Martin Storsjö <martin@martin.st>
(cherry picked from commit ad32f7b1264dbc614f0db1c443d5361420e9e07e)
2019-02-19 23:42:16 +02:00
Martin Storsjö
7ddfa5e908 aarch64: vp8: Fix assembling with armasm64
Signed-off-by: Martin Storsjö <martin@martin.st>
(cherry picked from commit 2eeac79936e83c4495cbe5905064ab797e9b45ff)
2019-02-19 23:42:03 +02:00
Martin Storsjö
c950beb68d aarch64: vp8: Fix assembling with clang
This also partially fixes assembling with MS armasm64 (via
gas-preprocessor).

The movrel macro invocations need to pass the offset via a separate
parameter. Mach-o and COFF relocations don't allow a negative
offset to a symbol, which is handled properly if the offset is passed
via the parameter. If no offset parameter is given, the macro
evaluates to something like "adrp x17, subpel_filters-16+(0)", which
older clang versions also fail to parse (the older clang versions
only support one single offset term, although it can be a parenthesis.

Signed-off-by: Martin Storsjö <martin@martin.st>
(cherry picked from commit 26d7af4c381ee3c7b13b032b3817168b84b98ca6)
2019-02-19 23:41:47 +02:00
Martin Storsjö
58d1549227 aarch64: vp8: Port epel4 functions from arm version
Cortex A53    A72    A73
vp8_put_epel4_h4_c:        631.4  291.7  367.8
vp8_put_epel4_h4_neon:     241.0  131.0  155.7
vp8_put_epel4_h4v4_c:      967.5  529.3  667.7
vp8_put_epel4_h4v4_neon:   429.3  241.8  279.7
vp8_put_epel4_h4v6_c:     1374.7  657.5  864.5
vp8_put_epel4_h4v6_neon:   515.5  295.5  334.7
vp8_put_epel4_h6_c:        851.0  421.0  486.0
vp8_put_epel4_h6_neon:     321.5  195.0  217.7
vp8_put_epel4_h6v4_c:     1111.3  621.1  781.2
vp8_put_epel4_h6v4_neon:   539.2  328.0  365.3
vp8_put_epel4_h6v6_c:     1561.3  763.3  999.7
vp8_put_epel4_h6v6_neon:   645.5  401.0  434.7
vp8_put_epel4_v4_c:        663.8  298.3  357.0
vp8_put_epel4_v4_neon:     116.0   81.5   72.5
vp8_put_epel4_v6_c:        870.5  437.0  507.4
vp8_put_epel4_v6_neon:     147.7  108.8   92.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:46:11 +02:00
Martin Storsjö
cc7ba00c35 aarch64: vp8: Port missing epel8 functions from arm version
Cortex A53     A72     A73
vp8_put_epel8_h4_c:       2594.8  1159.6  1374.8
vp8_put_epel8_h4_neon:     506.4   244.2   314.0
vp8_put_epel8_h6_c:       3445.8  1677.1  1811.3
vp8_put_epel8_h6_neon:     634.4   371.7   433.0
vp8_put_epel8_v4_c:       2614.0  1174.8  1378.0
vp8_put_epel8_v4_neon:     321.0   221.7   235.8
vp8_put_epel8_v6_c:       3635.5  1703.0  2079.2
vp8_put_epel8_v6_neon:     416.9   317.0   295.5

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:46:08 +02:00
Martin Storsjö
52c9b0a6c0 aarch64: vp8: Port vp8_luma_dc_wht and vp8_idct_dc_add4uv from arm version
Cortex A53    A72    A73
vp8_luma_dc_wht_c:        115.7   75.7   90.7
vp8_luma_dc_wht_neon:      60.7   41.2   45.7
vp8_idct_dc_add4uv_c:     376.1  262.9  282.5
vp8_idct_dc_add4uv_neon:   52.0   29.0   37.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:46:04 +02:00
Martin Storsjö
c513fcd7d2 aarch64: vp8: Fix a typo in a comment
Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:46:00 +02:00
Martin Storsjö
f1011ea28a aarch64: vp8: Reorder the function pointer inits to match the arm original
Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:45:56 +02:00
Martin Storsjö
b4b27dce95 aarch64: vp8: Move the vp8dsp makefile entries to the right places
Even if NEON would be disabled, the init functions should be built
as they are called as long as ARCH_AARCH64 is set.

These functions are part of a generic DSP subsytem, not tied directly
to one decoder. (They should be built if the vp7 decoder is enabled,
even if the vp8 decoder is disabled.)

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:45:53 +02:00
Martin Storsjö
ad32f7b126 aarch64: vp8: Remove superfluous includes
This fixes building with MSVC, which lacks unistd.h.

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:45:50 +02:00
Martin Storsjö
85bfaa4949 aarch64: vp8: Use the proper aarch64 form for conditional branches
The previous form also does seem to assemble on current tools,
but I think it might fail on some older aarch64 tools.

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:45:47 +02:00
Martin Storsjö
2eeac79936 aarch64: vp8: Fix assembling with armasm64
Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:45:44 +02:00
Martin Storsjö
26d7af4c38 aarch64: vp8: Fix assembling with clang
This also partially fixes assembling with MS armasm64 (via
gas-preprocessor).

The movrel macro invocations need to pass the offset via a separate
parameter. Mach-o and COFF relocations don't allow a negative
offset to a symbol, which is handled properly if the offset is passed
via the parameter. If no offset parameter is given, the macro
evaluates to something like "adrp x17, subpel_filters-16+(0)", which
older clang versions also fail to parse (the older clang versions
only support one single offset term, although it can be a parenthesis.

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:45:41 +02:00
Magnus Röös
0801853e64 libavcodec: vp8 neon optimizations for aarch64
Partial port of the ARM Neon for aarch64.

Benchmarks from fate:

benchmarking with Linux Perf Monitoring API
nop: 58.6
checkasm: using random seed 1760970128
NEON:
 - vp8dsp.idct       [OK]
 - vp8dsp.mc         [OK]
 - vp8dsp.loopfilter [OK]
checkasm: all 21 tests passed
vp8_idct_add_c: 201.6
vp8_idct_add_neon: 83.1
vp8_idct_dc_add_c: 107.6
vp8_idct_dc_add_neon: 33.8
vp8_idct_dc_add4y_c: 426.4
vp8_idct_dc_add4y_neon: 59.4
vp8_loop_filter8uv_h_c: 688.1
vp8_loop_filter8uv_h_neon: 216.3
vp8_loop_filter8uv_inner_h_c: 649.3
vp8_loop_filter8uv_inner_h_neon: 195.3
vp8_loop_filter8uv_inner_v_c: 544.8
vp8_loop_filter8uv_inner_v_neon: 131.3
vp8_loop_filter8uv_v_c: 706.1
vp8_loop_filter8uv_v_neon: 141.1
vp8_loop_filter16y_h_c: 668.8
vp8_loop_filter16y_h_neon: 242.8
vp8_loop_filter16y_inner_h_c: 647.3
vp8_loop_filter16y_inner_h_neon: 224.6
vp8_loop_filter16y_inner_v_c: 647.8
vp8_loop_filter16y_inner_v_neon: 128.8
vp8_loop_filter16y_v_c: 721.8
vp8_loop_filter16y_v_neon: 154.3
vp8_loop_filter_simple_h_c: 387.8
vp8_loop_filter_simple_h_neon: 187.6
vp8_loop_filter_simple_v_c: 384.1
vp8_loop_filter_simple_v_neon: 78.6
vp8_put_epel8_h4v4_c: 3971.1
vp8_put_epel8_h4v4_neon: 855.1
vp8_put_epel8_h4v6_c: 5060.1
vp8_put_epel8_h4v6_neon: 989.6
vp8_put_epel8_h6v4_c: 4320.8
vp8_put_epel8_h6v4_neon: 1007.3
vp8_put_epel8_h6v6_c: 5449.3
vp8_put_epel8_h6v6_neon: 1158.1
vp8_put_epel16_h6_c: 6683.8
vp8_put_epel16_h6_neon: 831.8
vp8_put_epel16_h6v6_c: 11110.8
vp8_put_epel16_h6v6_neon: 2214.8
vp8_put_epel16_v6_c: 7024.8
vp8_put_epel16_v6_neon: 799.6
vp8_put_pixels8_c: 112.8
vp8_put_pixels8_neon: 78.1
vp8_put_pixels16_c: 131.3
vp8_put_pixels16_neon: 129.8

This contains a fix to include guards by Carl Eugen Hoyos.

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:45:33 +02:00
Carl Eugen Hoyos
ed20fbcd48 lavc/aarch64/vp8dsp: Fix the include guard.
Fixes fate-source.
2019-01-31 22:35:44 +01:00
Magnus Röös
833fed5253 libavcodec: vp8 neon optimizations for aarch64
Partial port of the ARM Neon for aarch64.

Benchmarks from fate:

benchmarking with Linux Perf Monitoring API
nop: 58.6
checkasm: using random seed 1760970128
NEON:
 - vp8dsp.idct       [OK]
 - vp8dsp.mc         [OK]
 - vp8dsp.loopfilter [OK]
checkasm: all 21 tests passed
vp8_idct_add_c: 201.6
vp8_idct_add_neon: 83.1
vp8_idct_dc_add_c: 107.6
vp8_idct_dc_add_neon: 33.8
vp8_idct_dc_add4y_c: 426.4
vp8_idct_dc_add4y_neon: 59.4
vp8_loop_filter8uv_h_c: 688.1
vp8_loop_filter8uv_h_neon: 216.3
vp8_loop_filter8uv_inner_h_c: 649.3
vp8_loop_filter8uv_inner_h_neon: 195.3
vp8_loop_filter8uv_inner_v_c: 544.8
vp8_loop_filter8uv_inner_v_neon: 131.3
vp8_loop_filter8uv_v_c: 706.1
vp8_loop_filter8uv_v_neon: 141.1
vp8_loop_filter16y_h_c: 668.8
vp8_loop_filter16y_h_neon: 242.8
vp8_loop_filter16y_inner_h_c: 647.3
vp8_loop_filter16y_inner_h_neon: 224.6
vp8_loop_filter16y_inner_v_c: 647.8
vp8_loop_filter16y_inner_v_neon: 128.8
vp8_loop_filter16y_v_c: 721.8
vp8_loop_filter16y_v_neon: 154.3
vp8_loop_filter_simple_h_c: 387.8
vp8_loop_filter_simple_h_neon: 187.6
vp8_loop_filter_simple_v_c: 384.1
vp8_loop_filter_simple_v_neon: 78.6
vp8_put_epel8_h4v4_c: 3971.1
vp8_put_epel8_h4v4_neon: 855.1
vp8_put_epel8_h4v6_c: 5060.1
vp8_put_epel8_h4v6_neon: 989.6
vp8_put_epel8_h6v4_c: 4320.8
vp8_put_epel8_h6v4_neon: 1007.3
vp8_put_epel8_h6v6_c: 5449.3
vp8_put_epel8_h6v6_neon: 1158.1
vp8_put_epel16_h6_c: 6683.8
vp8_put_epel16_h6_neon: 831.8
vp8_put_epel16_h6v6_c: 11110.8
vp8_put_epel16_h6v6_neon: 2214.8
vp8_put_epel16_v6_c: 7024.8
vp8_put_epel16_v6_neon: 799.6
vp8_put_pixels8_c: 112.8
vp8_put_pixels8_neon: 78.1
vp8_put_pixels16_c: 131.3
vp8_put_pixels16_neon: 129.8

Signed-off-by: Magnus Röös <mla2.roos@gmail.com>
2019-01-31 20:17:51 +01:00
Janne Grunau
28a8b5413b h264/aarch64: add intra loop filter neon asm
Add my neon asm from x264 relicensed under the LGPL 2.1 or later. Ported
(x264 uses nv12 chroma) and optimized.

Cycle count for checkasm --bench on a Snapdragon 820e:
h264_h_loop_filter_luma_intra_8bpp_c: 60.0
h264_h_loop_filter_luma_intra_8bpp_neon: 54.2
h264_v_loop_filter_luma_intra_8bpp_c: 148.3
h264_v_loop_filter_luma_intra_8bpp_neon: 73.8
h264_h_loop_filter_chroma_intra_8bpp_c: 27.8
h264_h_loop_filter_chroma_intra_8bpp_neon: 21.4
h264_h_loop_filter_chroma_mbaff_intra_8bpp_c: 15.8
h264_h_loop_filter_chroma_mbaff_intra_8bpp_neon: 15.7
h264_v_loop_filter_chroma_intra_8bpp_c: 45.8
h264_v_loop_filter_chroma_intra_8bpp_neon: 17.3
2019-01-26 12:05:10 +01:00
Janne Grunau
846c3d6aca h264/aarch64: optimize neon loop filter
Exit as soon as possible if no filtering will be done.

Improves the checkasm --bench cycle count on a Snapdragon 820e:
h264_h_loop_filter_luma_8bpp_c:      72.4 ->  72.5
h264_h_loop_filter_luma_8bpp_neon:   97.1 ->  56.3
h264_v_loop_filter_luma_8bpp_c:     174.0 -> 173.5
h264_v_loop_filter_luma_8bpp_neon:   62.9 ->  60.9
h264_h_loop_filter_chroma_8bpp_c:    30.2 ->  30.3
h264_h_loop_filter_chroma_8bpp_neon: 51.6 ->  25.7
h264_v_loop_filter_chroma_8bpp_c:    57.3 ->  57.3
h264_v_loop_filter_chroma_8bpp_neon: 28.0 ->  24.0
2019-01-26 12:05:10 +01:00
Janne Grunau
bb515e3a73 h264/aarch64: sign extend int stride in loop filter asm 2019-01-26 12:05:10 +01:00
Manoj Gupta
6fcf813110 libavcodec: Remove dynamic relocs from aarch64/h264idct_neon.S
Some of the assembly functions e.g. ff_h264_idct_dc_add_neon
has code like:
        movrel          x14, X(ff_h264_idct_add_neon)

Linker cannot resolve them fully at link time and emits dynamic
relocations.
Use explicit labels instead so that no dynamic relocations are
needed at all.

This avoids lld complains about text relocations.

For background, see https://crbug.com/917919

Signed-off-by: Manoj Gupta <manojgupta@chromium.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-01-03 20:12:07 +01:00
Carl Eugen Hoyos
0576ef466d lavc/aarch64/h264dsp_init_aarch64: Fix weight function prototypes.
Fixes the following warnings:
libavcodec/aarch64/h264dsp_init_aarch64.c: In function ‘ff_h264dsp_init_aarch64’:
libavcodec/aarch64/h264dsp_init_aarch64.c:84:38: warning: assignment from incompatible pointer type [enabled by default]
         c->weight_h264_pixels_tab[0] = ff_weight_h264_pixels_16_neon;
                                      ^
libavcodec/aarch64/h264dsp_init_aarch64.c:85:38: warning: assignment from incompatible pointer type [enabled by default]
         c->weight_h264_pixels_tab[1] = ff_weight_h264_pixels_8_neon;
                                      ^
libavcodec/aarch64/h264dsp_init_aarch64.c:86:38: warning: assignment from incompatible pointer type [enabled by default]
         c->weight_h264_pixels_tab[2] = ff_weight_h264_pixels_4_neon;
                                      ^
libavcodec/aarch64/h264dsp_init_aarch64.c:88:40: warning: assignment from incompatible pointer type [enabled by default]
         c->biweight_h264_pixels_tab[0] = ff_biweight_h264_pixels_16_neon;
                                        ^
libavcodec/aarch64/h264dsp_init_aarch64.c:89:40: warning: assignment from incompatible pointer type [enabled by default]
         c->biweight_h264_pixels_tab[1] = ff_biweight_h264_pixels_8_neon;
                                        ^
libavcodec/aarch64/h264dsp_init_aarch64.c:90:40: warning: assignment from incompatible pointer type [enabled by default]
         c->biweight_h264_pixels_tab[2] = ff_biweight_h264_pixels_4_neon;
                                        ^
2018-07-13 21:28:04 +02:00
Rodger Combs
7723750475 lavc/aarch64/sbrdsp_neon: fix build on old binutils 2018-01-26 02:42:01 -06:00
James Almer
0525722ca0 Merge commit '732510636e597585a79be7d111c88b3f7e174fe7'
* commit '732510636e597585a79be7d111c88b3f7e174fe7':
  aarch64: Remove a dot from a label

Merged-by: James Almer <jamrial@gmail.com>
2017-11-11 17:47:10 -03:00
Martin Storsjö
732510636e aarch64: Remove a dot from a label
This fixes building with armasm64 (when run through gas-preprocessor).

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-10-18 10:49:33 +03:00
Matthieu Bouron
0a24d7ca83 lavc/aarch64: add sbrdsp neon implementation
autocorrelate_c: 644.0
autocorrelate_neon: 420.0
hf_apply_noise_0_c: 1688.5
hf_apply_noise_0_neon: 1498.6
hf_apply_noise_1_c: 1691.2
hf_apply_noise_1_neon: 1500.6
hf_apply_noise_2_c: 1688.1
hf_apply_noise_2_neon: 1500.3
hf_apply_noise_3_c: 1696.6
hf_apply_noise_3_neon: 1502.2
hf_g_filt_c: 2117.8
hf_g_filt_neon: 1218.7
hf_gen_c: 4573.4
hf_gen_neon: 2461.0
neg_odd_64_c: 72.0
neg_odd_64_neon: 64.7
qmf_deint_bfly_c: 1107.6
qmf_deint_bfly_neon: 291.6
qmf_deint_neg_c: 210.4
qmf_deint_neg_neon: 107.4
qmf_post_shuffle_c: 163.0
qmf_post_shuffle_neon: 107.7
qmf_pre_shuffle_c: 120.5
qmf_pre_shuffle_neon: 110.7
sum64x5_c: 1361.6
sum64x5_neon: 435.4
sum_square_c: 1686.4
sum_square_neon: 787.2
2017-07-03 14:29:22 +02:00
Clément Bœsch
b12a36170b lavc/aacpsdsp: use ptrdiff_t for stride in hybrid_analysis 2017-06-28 12:22:39 +02:00
Clément Bœsch
ff0ecef624 lavc/aarch64: add a few SIMD functions for AAC PS
☭ tests/checkasm/checkasm --bench --test=aacpsdsp
checkasm: using random seed 3318985180
MMX implied by specified flags
MMX implied by specified flags
NEON:
 - aacpsdsp.add_squares        [OK]
 - aacpsdsp.mul_pair_single    [OK]
 - aacpsdsp.hybrid_analysis    [OK]
 - aacpsdsp.stereo_interpolate [OK]
checkasm: all 5 tests passed
nop: 10.0
ps_add_squares_c: 63221.2
ps_add_squares_neon: 22311.7
ps_hybrid_analysis_c: 2466.6
ps_hybrid_analysis_neon: 1521.9
ps_mul_pair_single_c: 68592.0
ps_mul_pair_single_neon: 17426.6
ps_stereo_interpolate_c: 72344.3
ps_stereo_interpolate_neon: 72308.8
ps_stereo_interpolate_ipdopd_c: 117415.2
ps_stereo_interpolate_ipdopd_neon: 113386.3
2017-06-28 12:22:39 +02:00
Memphiz
9e85c5d6a7 aarch64: vp9 16bpp: Fix assembling with Xcode 6.2 and older
Properly use the b.eq form instead of the nonstandard form (which
both gas and newer clang accept though), and expand the register
lists that used a range (which the Xcode 6.2 clang, based on clang
3.5 svn, didn't support).

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-06-21 09:08:14 +03:00
Memphiz
998609ddb8 aarch64: vp9: Fix assembling with Xcode 6.2 and older
Properly use the b.eq/b.ge forms instead of the nonstandard forms
(which both gas and newer clang accept though), and expand the
register list that used a range (which the Xcode 6.2 clang, based
on clang 3.5 svn, didn't support).

This is cherrypicked from libav commit
a970f9de865c84ed5360dd0398baee7d48d04620.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-06-21 09:08:13 +03:00
Memphiz
a970f9de86 aarch64: vp9: Fix assembling with Xcode 6.2 and older
Properly use the b.eq/b.ge forms instead of the nonstandard forms
(which both gas and newer clang accept though), and expand the
register list that used a range (which the Xcode 6.2 clang, based
on clang 3.5 svn, didn't support).

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-06-20 16:14:03 +03:00
Matthieu Bouron
204008354f lavc/aarch64/simple_idct: fix build with Xcode 7.2 2017-06-14 23:20:58 +02:00
Matthieu Bouron
8aa60606fb lavc/aarch64/simple_idct: fix idct_col4_top coefficient
Fixes regression introduced by 5d0b8b1ae307951310c7d9a8fa282fbca9b997cd.
2017-06-13 17:46:55 +02:00
Matthieu Bouron
5d0b8b1ae3 lavc/aarch64/simple_idct: fix iOS build without gas-preprocessor
Separates macro arguments with commas and passes .4H/.8H as macro
arguments instead of 4H/8H (the later form being interpreted as an
hexadecimal value).

Fixes ticket #6324.

Suggested-by: Martin Storsjö <martin@martin.st>
2017-05-11 16:28:54 +02:00
Clément Bœsch
0f00eb0e4e Merge commit '2425d7329fdccfa9954faba748f3865151354f0c'
* commit '2425d7329fdccfa9954faba748f3865151354f0c':
  arm64: replace 'bic' with immediate with 'and' with inverted immediate

Merged-by: Clément Bœsch <u@pkh.me>
2017-04-26 16:28:57 +02:00
James Almer
5694427dc3 Merge commit '72a19f4013ec2c7f8581416f8ad4bf81df163fb6'
* commit '72a19f4013ec2c7f8581416f8ad4bf81df163fb6':
  mpegaudiodsp: aarch64: Adjust function prototype after 2caa93b813adc5dbb7771dfe615da826a2947d18

Merged-by: James Almer <jamrial@gmail.com>
2017-03-31 14:43:37 -03:00
James Almer
c31cbeef58 aarch64/vp9dsp: add missing header includes 2017-03-28 23:02:09 -03:00
Ronald S. Bultje
f8c019944d vp9: re-split the decoder/format/dsp interface header files.
The advantage here is that the internal software decoder interface is
not exposed to the DSP functions or the hardware accelerations.
2017-03-28 18:04:26 -04:00