248b6a27c3
optimization: peel align-head and unroll body to 64 bytes (Lőrinc)e7114fc6dc
optimization: migrate fixed-size obfuscation from `std::vector<std::byte>` to `uint64_t` (Lőrinc)478d40afc6
refactor: encapsulate `vector`/`array` keys into `Obfuscation` (Lőrinc)377aab8e5a
refactor: move `util::Xor` to `Obfuscation().Xor` (Lőrinc)fa5d296e3b
refactor: prepare mempool_persist for obfuscation key change (Lőrinc)6bbf2d9311
refactor: prepare `DBWrapper` for obfuscation key change (Lőrinc)0b8bec8aa6
scripted-diff: unify xor-vs-obfuscation nomenclature (Lőrinc)972697976c
bench: make ObfuscationBench more representative (Lőrinc)618a30e326
test: compare util::Xor with randomized inputs against simple impl (Lőrinc)a5141cd39e
test: make sure dbwrapper obfuscation key is never obfuscated (Lőrinc)54ab0bd64c
refactor: commit to 8 byte obfuscation keys (Lőrinc)7aa557a37b
random: add fixed-size `std::array` generation (Lőrinc) Pull request description: This change is part of [[IBD] - Tracking PR for speeding up Initial Block Download](https://github.com/bitcoin/bitcoin/pull/32043) ### Summary Current block obfuscations are done byte-by-byte, this PR batches them to 64 bit primitives to speed up obfuscating bigger memory batches. This is especially relevant now that https://github.com/bitcoin/bitcoin/pull/31551 was merged, having bigger obfuscatable chunks. Since this obfuscation is optional, the speedup measured here depends on whether it's a [random value](https://github.com/bitcoin/bitcoin/pull/31144#issuecomment-2523295114) or [completely turned off](https://github.com/bitcoin/bitcoin/pull/31144#issuecomment-2519764142) (i.e. XOR-ing with 0). ### Changes in testing, benchmarking and implementation * Added new tests comparing randomized inputs against a trivial implementation and performing roundtrip checks with random chunks. * Migrated `std::vector<std::byte>(8)` keys to plain `uint64_t`; * Process unaligned bytes separately and unroll body to 64 bytes. ### Assembly Memory alignment is enforced by a small peel-loop (`std::memcpy` is optimized out on tested platform), with an `std::assume_aligned<8>` check, see the Godbolt listing at https://godbolt.org/z/59EMv7h6Y for details <details> <summary>Details</summary> Target & Compiler | Stride (per hot-loop iter) | Main operation(s) in loop | Effective XORs / iter -- | -- | -- | -- Clang x86-64 (trunk) | 64 bytes | 4 × movdqu → pxor → store | 8 × 64-bit GCC x86-64 (trunk) | 64 bytes | 4 × movdqu/pxor sequence, enabled by 8-way unroll | 8 × 64-bit GCC RV32 (trunk) | 8 bytes | copy 8 B to temp → 2 × 32-bit XOR → copy back | 1 × 64-bit (as 2 × 32-bit) GCC s390x (big-endian 14.2) | 64 bytes | 8 × XC (mem-mem 8-B XOR) with key cached on stack | 8 × 64-bit </details> ### Endianness The only endianness issue was with bit rotation, intended to realign the key if obfuscation halted before full key consumption. Elsewhere, memory is read, processed, and written back in the same endianness, preserving byte order. Since CI lacks a big-endian machine, testing was done locally via Docker. <details> <summary>Details</summary> ```bash brew install podman pigz softwareupdate --install-rosetta podman machine init podman machine start docker run --platform linux/s390x -it ubuntu:latest /bin/bash apt update && apt install -y git build-essential cmake ccache pkg-config libevent-dev libboost-dev libssl-dev libsqlite3-dev python3 && \ cd /mnt && git clone --depth=1 https://github.com/bitcoin/bitcoin.git && cd bitcoin && git remote add l0rinc https://github.com/l0rinc/bitcoin.git && git fetch --all && git checkout l0rinc/optimize-xor && \ cmake -B build && cmake --build build --target test_bitcoin -j$(nproc) && \ ./build/bin/test_bitcoin --run_test=streams_tests ``` </details> ### Measurements (micro benchmarks and full IBDs) > cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc/clang -DCMAKE_CXX_COMPILER=g++/clang++ && \ cmake --build build -j$(nproc) && \ build/bin/bench_bitcoin -filter='ObfuscationBench' -min-time=5000 <details> <summary>GNU 14.2.0</summary> > Before: | ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 0.84 | 1,184,138,235.64 | 0.0% | 9.01 | 3.03 | 2.971 | 1.00 | 0.1% | 5.50 | `ObfuscationBench` > After (first optimizing commit): | ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 0.04 | 28,365,698,819.44 | 0.0% | 0.34 | 0.13 | 2.714 | 0.07 | 0.0% | 5.33 | `ObfuscationBench` > and (second optimizing commit): | ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 0.03 | 32,464,658,919.11 | 0.0% | 0.50 | 0.11 | 4.474 | 0.08 | 0.0% | 5.29 | `ObfuscationBench` </details> <details> <summary>Clang 20.1.7</summary> > Before: | ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 0.89 | 1,124,087,330.23 | 0.1% | 6.52 | 3.20 | 2.041 | 0.50 | 0.2% | 5.50 | `ObfuscationBench` > After (first optimizing commit): | ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 0.08 | 13,012,464,203.00 | 0.0% | 0.65 | 0.28 | 2.338 | 0.13 | 0.8% | 5.50 | `ObfuscationBench` > and (second optimizing commit): | ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 0.02 | 41,231,547,045.17 | 0.0% | 0.30 | 0.09 | 3.463 | 0.02 | 0.0% | 5.47 | `ObfuscationBench` </details> i.e. 27.4x faster obfuscation with GCC, 36.7x faster with Clang For other benchmark speedups see https://corecheck.dev/bitcoin/bitcoin/pulls/31144 ------ Running an IBD until 888888 blocks reveals a 4% speedup. <details> <summary>Details</summary> SSD: ```bash COMMITS="8324a00bd4a6a5291c841f2d01162d8a014ddb02 5ddfd31b4158a89b0007cfb2be970c03d9278525"; \ STOP_HEIGHT=888888; DBCACHE=1000; \ CC=gcc; CXX=g++; \ BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \ (for c in $COMMITS; do git fetch origin $c -q && git log -1 --pretty=format:'%h %s' $c || exit 1; done) && \ hyperfine \ --sort 'command' \ --runs 1 \ --export-json "$BASE_DIR/ibd-${COMMITS// /-}-$STOP_HEIGHT-$DBCACHE-$CC.json" \ --parameter-list COMMIT ${COMMITS// /,} \ --prepare "killall bitcoind; rm -rf $DATA_DIR/*; git checkout {COMMIT}; git clean -fxd; git reset --hard; \ cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF && \ cmake --build build -j$(nproc) --target bitcoind && \ ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0; sleep 100" \ --cleanup "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \ "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE -blocksonly -printtoconsole=0" ``` > 8324a00bd4 test: Compare util::Xor with randomized inputs against simple impl > 5ddfd31b41 optimization: Xor 64 bits together instead of byte-by-byte ```python Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 8324a00bd4a6a5291c841f2d01162d8a014ddb02) Time (abs ≡): 25033.413 s [User: 33953.984 s, System: 2613.604 s] Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 5ddfd31b4158a89b0007cfb2be970c03d9278525) Time (abs ≡): 24110.710 s [User: 33389.536 s, System: 2660.292 s] Relative speed comparison 1.04 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 8324a00bd4a6a5291c841f2d01162d8a014ddb02) 1.00 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 5ddfd31b4158a89b0007cfb2be970c03d9278525) ``` > HDD: ```bash COMMITS="71eb6eaa740ad0b28737e90e59b89a8e951d90d9 46854038e7984b599d25640de26d4680e62caba7"; \ STOP_HEIGHT=888888; DBCACHE=4500; \ CC=gcc; CXX=g++; \ BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \ (for c in $COMMITS; do git fetch origin $c -q && git log -1 --pretty=format:'%h %s' $c || exit 1; done) && \ hyperfine \ --sort 'command' \ --runs 2 \ --export-json "$BASE_DIR/ibd-${COMMITS// /-}-$STOP_HEIGHT-$DBCACHE-$CC.json" \ --parameter-list COMMIT ${COMMITS// /,} \ --prepare "killall bitcoind; rm -rf $DATA_DIR/*; git checkout {COMMIT}; git clean -fxd; git reset --hard; \ cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF && cmake --build build -j$(nproc) --target bitcoind && \ ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0; sleep 100" \ --cleanup "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \ "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE -blocksonly -printtoconsole=0" ``` > 71eb6eaa74 test: compare util::Xor with randomized inputs against simple impl > 46854038e7 optimization: migrate fixed-size obfuscation from `std::vector<std::byte>` to `uint64_t` ```python Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT = 71eb6eaa740ad0b28737e90e59b89a8e951d90d9) Time (mean ± σ): 37676.293 s ± 83.100 s [User: 36900.535 s, System: 2220.382 s] Range (min … max): 37617.533 s … 37735.053 s 2 runs Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT = 46854038e7984b599d25640de26d4680e62caba7) Time (mean ± σ): 36181.287 s ± 195.248 s [User: 34962.822 s, System: 1988.614 s] Range (min … max): 36043.226 s … 36319.349 s 2 runs Relative speed comparison 1.04 ± 0.01 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT = 71eb6eaa740ad0b28737e90e59b89a8e951d90d9) 1.00 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT = 46854038e7984b599d25640de26d4680e62caba7) ``` </details> ACKs for top commit: achow101: ACK248b6a27c3
maflcko: review ACK248b6a27c3
🎻 ryanofsky: Code review ACK248b6a27c3
. Looks good! Thanks for adapting this and considering all the suggestions. I did leave more comments below but non are important and this looks good as-is Tree-SHA512: ef541cd8a1f1dc504613c4eaa708202e32ae5ac86f9c875e03bcdd6357121f6af0860ef83d513c473efa5445b701e59439d416effae1085a559716b0fd45ecd6
Bitcoin Core integration/staging tree
For an immediately usable, binary version of the Bitcoin Core software, see https://bitcoincore.org/en/download/.
What is Bitcoin Core?
Bitcoin Core connects to the Bitcoin peer-to-peer network to download and fully validate blocks and transactions. It also includes a wallet and graphical user interface, which can be optionally built.
Further information about Bitcoin Core is available in the doc folder.
License
Bitcoin Core is released under the terms of the MIT license. See COPYING for more information or see https://opensource.org/license/MIT.
Development Process
The master
branch is regularly built (see doc/build-*.md
for instructions) and tested, but it is not guaranteed to be
completely stable. Tags are created
regularly from release branches to indicate new official, stable release versions of Bitcoin Core.
The https://github.com/bitcoin-core/gui repository is used exclusively for the development of the GUI. Its master branch is identical in all monotree repositories. Release branches and tags do not exist, so please do not fork that repository unless it is for development reasons.
The contribution workflow is described in CONTRIBUTING.md and useful hints for developers can be found in doc/developer-notes.md.
Testing
Testing and code review is the bottleneck for development; we get more pull requests than we can review and test on short notice. Please be patient and help out by testing other people's pull requests, and remember this is a security-critical project where any mistake might cost people lots of money.
Automated Testing
Developers are strongly encouraged to write unit tests for new code, and to
submit new unit tests for old code. Unit tests can be compiled and run
(assuming they weren't disabled during the generation of the build system) with: ctest
. Further details on running
and extending unit tests can be found in /src/test/README.md.
There are also regression and integration tests, written
in Python.
These tests can be run (if the test dependencies are installed) with: build/test/functional/test_runner.py
(assuming build
is your build directory).
The CI (Continuous Integration) systems make sure that every pull request is tested on Windows, Linux, and macOS. The CI must pass on all commits before merge to avoid unrelated CI failures on new pull requests.
Manual Quality Assurance (QA) Testing
Changes should be tested by somebody other than the developer who wrote the code. This is especially important for large or high-risk changes. It is useful to add a test plan to the pull request description if testing the changes is not straightforward.
Translations
Changes to translations as well as new translations can be submitted to Bitcoin Core's Transifex page.
Translations are periodically pulled from Transifex and merged into the git repository. See the translation process for details on how this works.
Important: We do not accept translation changes as GitHub pull requests because the next pull from Transifex would automatically overwrite them again.