- 6eb5ba5691 refactor: extract shared `SipHash` state into `SipHashState` (Lőrinc)
- 118d22ddb4 optimization: cache `PresaltedSipHasher` in `CBlockHeaderAndShortTxIDs` (Lőrinc)
- 9ca52a4cbe optimization: migrate `SipHashUint256` to `PresaltedSipHasher` (Lőrinc)
- ec11b9fede optimization: introduce `PresaltedSipHasher` for repeated hashing (Lőrinc)
- 20330548cf refactor: extract `SipHash` C0-C3 constants to class scope (Lőrinc)
- 9f9eb7fbc0 test: rename k1/k2 to k0/k1 in `SipHash` consistency tests (Lőrinc)

Pull request description:

This change is part of [[IBD] - Tracking PR for speeding up Initial Block Download](https://github.com/bitcoin/bitcoin/pull/32043).

### Summary

The in-memory representation of the UTXO set uses (salted) [SipHash](https://github.com/bitcoin/bitcoin/blob/master/src/coins.h#L226) to avoid key collision attacks.

Hashing `uint256` keys is performed frequently throughout the codebase. Previously, specialized optimizations existed as standalone functions (`SipHashUint256` and `SipHashUint256Extra`), but the constant salting operations (XORing the C0-C3 constants with the keys) were recomputed on every call.

This PR introduces `PresaltedSipHasher`, a class that caches the initial SipHash state (v0-v3 after XORing the constants with the keys), eliminating redundant constant computations when hashing multiple values with the same keys. The optimization is applied uniformly across:
- All `Salted*Hasher` classes (`SaltedUint256Hasher`, `SaltedTxidHasher`, `SaltedWtxidHasher`, `SaltedOutpointHasher`)
- `CBlockHeaderAndShortTxIDs` for compact block short ID computation

### Details

The change replaces the standalone `SipHashUint256` and `SipHashUint256Extra` functions with `PresaltedSipHasher` class methods that cache the constant-salted state. This is particularly beneficial for hash map operations where the same salt is used repeatedly (as suggested by Sipa in https://github.com/bitcoin/bitcoin/pull/30442#issuecomment-2628994530).
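The caching idea can be illustrated with a small standalone sketch. This is not Bitcoin Core's `PresaltedSipHasher`: the class name `PresaltedSipHasherSketch` is hypothetical, and a `std::array<uint64_t, 4>` of little-endian words stands in for `uint256`. The constructor XORs the SipHash constants C0-C3 with the keys once; each call copies that cached state and runs SipHash-2-4 (two compression rounds per word, four finalization rounds) over the four words, followed by the final block that encodes the 32-byte length.

```cpp
#include <array>
#include <cstdint>

// Hypothetical sketch, not Bitcoin Core's PresaltedSipHasher: SipHash-2-4 over a
// 256-bit value supplied as four little-endian 64-bit words (standing in for uint256).
class PresaltedSipHasherSketch
{
public:
    // SipHash initialization constants ("somepseudorandomlygeneratedbytes").
    static constexpr uint64_t C0{0x736f6d6570736575ULL};
    static constexpr uint64_t C1{0x646f72616e646f6dULL};
    static constexpr uint64_t C2{0x6c7967656e657261ULL};
    static constexpr uint64_t C3{0x7465646279746573ULL};

    // Salt once: XOR the constants with the keys and cache the resulting v0-v3.
    PresaltedSipHasherSketch(uint64_t k0, uint64_t k1)
        : m_v0{C0 ^ k0}, m_v1{C1 ^ k1}, m_v2{C2 ^ k0}, m_v3{C3 ^ k1} {}

    // Hash one 32-byte value, starting from a copy of the cached state.
    uint64_t operator()(const std::array<uint64_t, 4>& words) const
    {
        uint64_t v0{m_v0}, v1{m_v1}, v2{m_v2}, v3{m_v3};
        // Compression: two rounds per 64-bit message word.
        for (const uint64_t m : words) {
            v3 ^= m;
            SipRound(v0, v1, v2, v3);
            SipRound(v0, v1, v2, v3);
            v0 ^= m;
        }
        // Final block: total length (32 bytes) in the top byte, no leftover bytes.
        const uint64_t b{uint64_t{32} << 56};
        v3 ^= b;
        SipRound(v0, v1, v2, v3);
        SipRound(v0, v1, v2, v3);
        v0 ^= b;
        // Finalization: four rounds.
        v2 ^= 0xFF;
        for (int i = 0; i < 4; ++i) SipRound(v0, v1, v2, v3);
        return v0 ^ v1 ^ v2 ^ v3;
    }

private:
    uint64_t m_v0, m_v1, m_v2, m_v3; // cached presalted state

    static constexpr uint64_t Rotl(uint64_t x, int b) { return (x << b) | (x >> (64 - b)); }

    static void SipRound(uint64_t& v0, uint64_t& v1, uint64_t& v2, uint64_t& v3)
    {
        v0 += v1; v1 = Rotl(v1, 13); v1 ^= v0; v0 = Rotl(v0, 32);
        v2 += v3; v3 = Rotl(v3, 16); v3 ^= v2;
        v0 += v3; v3 = Rotl(v3, 21); v3 ^= v0;
        v2 += v1; v1 = Rotl(v1, 17); v1 ^= v2; v2 = Rotl(v2, 32);
    }
};
```

A caller would construct the hasher once per salt (for example, as a member of a hash functor) and then call it for every key, e.g. `PresaltedSipHasherSketch hasher{k0, k1}; uint64_t h{hasher(words)};`. The work saved per call is small (the four constant/key XORs), so only a modest per-hash gain should be expected.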
`CSipHasher` behavior remains unchanged; only the specialized `uint256` paths and callers now reuse the cached state instead of recomputing it.

### Measurements

Benchmarks were run using local `SaltedOutpointHasherBench_*` microbenchmarks (not included in this PR) that exercise `SaltedOutpointHasher` in realistic `std::unordered_set` scenarios.

<details>
<summary>Benchmarks</summary>

```C++
diff --git a/src/bench/crypto_hash.cpp b/src/bench/crypto_hash.cpp
--- a/src/bench/crypto_hash.cpp (revision 9b1a7c3e8d)
+++ b/src/bench/crypto_hash.cpp (revision e1b4f056b3097e7e34b0eda31f57826d81c9d810)
@@ -2,7 +2,6 @@
 // Distributed under the MIT software license, see the accompanying
 // file COPYING or http://www.opensource.org/licenses/mit-license.php.
 
-
 #include <bench/bench.h>
 #include <crypto/muhash.h>
 #include <crypto/ripemd160.h>
@@ -12,9 +11,11 @@
 #include <crypto/sha512.h>
 #include <crypto/siphash.h>
 #include <random.h>
-#include <span.h>
 #include <tinyformat.h>
 #include <uint256.h>
+#include <primitives/transaction.h>
+#include <util/hasher.h>
+#include <unordered_set>
 
 #include <cstdint>
 #include <vector>
@@ -205,6 +206,98 @@
     });
 }
 
+static void SaltedOutpointHasherBench_hash(benchmark::Bench& bench)
+{
+    FastRandomContext rng{/*fDeterministic=*/true};
+    constexpr size_t size{1000};
+
+    std::vector<COutPoint> outpoints(size);
+    for (auto& outpoint : outpoints) {
+        outpoint = {Txid::FromUint256(rng.rand256()), rng.rand32()};
+    }
+
+    const SaltedOutpointHasher hasher;
+    bench.batch(size).run([&] {
+        size_t result{0};
+        for (const auto& outpoint : outpoints) {
+            result ^= hasher(outpoint);
+        }
+        ankerl::nanobench::doNotOptimizeAway(result);
+    });
+}
+
+static void SaltedOutpointHasherBench_match(benchmark::Bench& bench)
+{
+    FastRandomContext rng{/*fDeterministic=*/true};
+    constexpr size_t size{1000};
+
+    std::unordered_set<COutPoint, SaltedOutpointHasher> values;
+    std::vector<COutPoint> value_vector;
+    values.reserve(size);
+    value_vector.reserve(size);
+
+    for (size_t i{0}; i < size; ++i) {
+        COutPoint outpoint{Txid::FromUint256(rng.rand256()), rng.rand32()};
+        values.emplace(outpoint);
+        value_vector.push_back(outpoint);
+        assert(values.contains(outpoint));
+    }
+
+    bench.batch(size).run([&] {
+        bool result{true};
+        for (const auto& outpoint : value_vector) {
+            result ^= values.contains(outpoint);
+        }
+        ankerl::nanobench::doNotOptimizeAway(result);
+    });
+}
+
+static void SaltedOutpointHasherBench_mismatch(benchmark::Bench& bench)
+{
+    FastRandomContext rng{/*fDeterministic=*/true};
+    constexpr size_t size{1000};
+
+    std::unordered_set<COutPoint, SaltedOutpointHasher> values;
+    std::vector<COutPoint> missing_value_vector;
+    values.reserve(size);
+    missing_value_vector.reserve(size);
+
+    for (size_t i{0}; i < size; ++i) {
+        values.emplace(Txid::FromUint256(rng.rand256()), rng.rand32());
+        COutPoint missing_outpoint{Txid::FromUint256(rng.rand256()), rng.rand32()};
+        missing_value_vector.push_back(missing_outpoint);
+        assert(!values.contains(missing_outpoint));
+    }
+
+    bench.batch(size).run([&] {
+        bool result{false};
+        for (const auto& outpoint : missing_value_vector) {
+            result ^= values.contains(outpoint);
+        }
+        ankerl::nanobench::doNotOptimizeAway(result);
+    });
+}
+
+static void SaltedOutpointHasherBench_create_set(benchmark::Bench& bench)
+{
+    FastRandomContext rng{/*fDeterministic=*/true};
+    constexpr size_t size{1000};
+
+    std::vector<COutPoint> outpoints(size);
+    for (auto& outpoint : outpoints) {
+        outpoint = {Txid::FromUint256(rng.rand256()), rng.rand32()};
+    }
+
+    bench.batch(size).run([&] {
+        std::unordered_set<COutPoint, SaltedOutpointHasher> set;
+        set.reserve(size);
+        for (const auto& outpoint : outpoints) {
+            set.emplace(outpoint);
+        }
+        ankerl::nanobench::doNotOptimizeAway(set.size());
+    });
+}
+
 static void MuHash(benchmark::Bench& bench)
 {
     MuHash3072 acc;
@@ -276,6 +369,10 @@
 BENCHMARK(SHA256_32b_AVX2, benchmark::PriorityLevel::HIGH);
 BENCHMARK(SHA256_32b_SHANI, benchmark::PriorityLevel::HIGH);
 BENCHMARK(SipHash_32b, benchmark::PriorityLevel::HIGH);
+BENCHMARK(SaltedOutpointHasherBench_hash, benchmark::PriorityLevel::HIGH);
+BENCHMARK(SaltedOutpointHasherBench_match, benchmark::PriorityLevel::HIGH);
+BENCHMARK(SaltedOutpointHasherBench_mismatch, benchmark::PriorityLevel::HIGH);
+BENCHMARK(SaltedOutpointHasherBench_create_set, benchmark::PriorityLevel::HIGH);
 BENCHMARK(SHA256D64_1024_STANDARD, benchmark::PriorityLevel::HIGH);
 BENCHMARK(SHA256D64_1024_SSE4, benchmark::PriorityLevel::HIGH);
 BENCHMARK(SHA256D64_1024_AVX2, benchmark::PriorityLevel::HIGH);
```

</details>

> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='SaltedOutpointHasherBench' -min-time=10000

> Before:

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               58.60 |       17,065,922.04 |    0.3% |     11.02 | `SaltedOutpointHasherBench_create_set`
|               11.97 |       83,576,684.83 |    0.1% |     11.01 | `SaltedOutpointHasherBench_hash`
|               14.50 |       68,985,850.12 |    0.3% |     10.96 | `SaltedOutpointHasherBench_match`
|               13.90 |       71,942,033.47 |    0.4% |     11.03 | `SaltedOutpointHasherBench_mismatch`

> After:

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               57.27 |       17,462,299.19 |    0.1% |     11.02 | `SaltedOutpointHasherBench_create_set`
|               11.24 |       88,997,888.48 |    0.3% |     11.04 | `SaltedOutpointHasherBench_hash`
|               13.91 |       71,902,014.20 |    0.2% |     11.01 | `SaltedOutpointHasherBench_match`
|               13.29 |       75,230,390.31 |    0.1% |     11.00 | `SaltedOutpointHasherBench_mismatch`

compared to master:
```python
create_set - 17,462,299.19 / 17,065,922.04 - 2.3% faster
hash       - 88,997,888.48 / 83,576,684.83 - 6.4% faster
match      - 71,902,014.20 / 68,985,850.12 - 4.2% faster
mismatch   - 75,230,390.31 / 71,942,033.47 - 4.5% faster
```

> C++ compiler .......................... GNU 13.3.0

> Before:

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|              136.76 |        7,312,133.16 |    0.0% |        1,086.67 |          491.12 |  2.213 |         119.54 |    1.1% |     11.01 | `SaltedOutpointHasherBench_create_set`
|               23.82 |       41,978,882.62 |    0.0% |          252.01 |           85.57 |  2.945 |           4.00 |    0.0% |     11.00 | `SaltedOutpointHasherBench_hash`
|               60.42 |       16,549,695.42 |    0.1% |          460.51 |          217.04 |  2.122 |          21.00 |    1.4% |     10.99 | `SaltedOutpointHasherBench_match`
|               78.66 |       12,713,595.35 |    0.1% |          555.59 |          282.52 |  1.967 |          20.19 |    2.2% |     10.74 | `SaltedOutpointHasherBench_mismatch`

> After:

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|              135.38 |        7,386,349.49 |    0.0% |        1,078.19 |          486.16 |  2.218 |         119.56 |    1.1% |     11.00 | `SaltedOutpointHasherBench_create_set`
|               23.67 |       42,254,558.08 |    0.0% |          247.01 |           85.01 |  2.906 |           4.00 |    0.0% |     11.00 | `SaltedOutpointHasherBench_hash`
|               58.95 |       16,962,220.14 |    0.1% |          446.55 |          211.74 |  2.109 |          20.86 |    1.4% |     11.01 | `SaltedOutpointHasherBench_match`
|               76.98 |       12,991,047.69 |    0.1% |          548.93 |          276.50 |  1.985 |          20.25 |    2.3% |     10.72 | `SaltedOutpointHasherBench_mismatch`

compared to master:
```python
create_set - 7,386,349.49 / 7,312,133.16 - 1.0% faster
hash       - 42,254,558.08 / 41,978,882.62 - 0.6% faster
match      - 16,962,220.14 / 16,549,695.42 - 2.4% faster
mismatch   - 12,991,047.69 / 12,713,595.35 - 2.1% faster
```

ACKs for top commit:
- achow101: ACK 6eb5ba5691
- vasild: ACK 6eb5ba5691
- sipa: ACK 6eb5ba5691

Tree-SHA512: 9688b87e1d79f8af9efc18a8487922c5f1735487a9c5b78029dd46abc1d94f05d499cd1036bd615849aa7d6b17d11653c968086050dd7d04300403ebd0e81210