ffmpeg/libavutil
Lynne 82a68a8771
x86/tx_float: remove vgatherdpd usage
Its performance ranges from just as fast as individual loads (Skylake), to a
few percent slower (Alderlake), to 8% slower (Zen 3), to completely disastrous
(older/other CPUs).

Sadly, gathers never became fast on x86, even with the benefit of time and
implementation experience.

This also frees up a register, since there is no longer any need to fill an
additional mask register.

Zen 3 (16384-point transform):
Before: 1561050 decicycles in           av_tx (fft),  131072 runs,      0 skips
After:  1449621 decicycles in           av_tx (fft),  131072 runs,      0 skips
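As a quick check on the Zen 3 numbers (decicycles are tenths of a cycle, so the unit cancels in a ratio), the gather version needed this much more time than the individual-load version:

$$\frac{1561050 - 1449621}{1449621} = \frac{111429}{1449621} \approx 0.077$$

That is roughly 7.7%, consistent with the 8% figure quoted for Zen 3. Measured the other way, removing the gathers cut the time by $111429 / 1561050 \approx 7.1\%$.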

Alderlake:
2% slower on big transforms (65536), 1% slower at 131072, and a few percent
slower for smaller sizes.
2022-05-20 10:12:34 +02:00