Benchmarks indicated that obfuscating multiple bytes already gives an order of magnitude speed-up, but:
* GCC still emitted scalar code;
* Clang’s auto-vectorized loop ran on the slow unaligned-load path.
Fix contains:
* peeling the misaligned head enabled the hot loop starting at an 8-byte address;
* `std::assume_aligned<8>` tells the optimizer the promise holds - required to keep Apple Clang happy;
* manually unrolling the body to 64 bytes enabled GCC to auto-vectorize.
Note that `target.size() > KEY_SIZE` condition is just an optimization, the aligned and unaligned loops work without it as well - it's why the alignment calculation still contains `std::min`.
> C++ compiler .......................... GNU 14.2.0
| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 0.03 | 32,464,658,919.11 | 0.0% | 0.50 | 0.11 | 4.474 | 0.08 | 0.0% | 5.29 | `ObfuscationBench`
> C++ compiler .......................... Clang 20.1.7
| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 0.02 | 41,231,547,045.17 | 0.0% | 0.30 | 0.09 | 3.463 | 0.02 | 0.0% | 5.47 | `ObfuscationBench`
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
All former `std::vector<std::byte>` keys were replaced with `uint64_t` (we still serialize them as vectors but convert immediately to `uint64_t` on load).
This is why some tests still generate vector keys and convert them to `uint64_t` later instead of generating them directly.
In `Obfuscation::Unserialize` we can safely throw an `std::ios_base::failure` since during mempool fuzzing `mempool_persist.cpp#L141` catches and ignored these errors.
> C++ compiler .......................... GNU 14.2.0
| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 0.04 | 28,365,698,819.44 | 0.0% | 0.34 | 0.13 | 2.714 | 0.07 | 0.0% | 5.33 | `ObfuscationBench`
> C++ compiler .......................... Clang 20.1.7
| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 0.08 | 13,012,464,203.00 | 0.0% | 0.65 | 0.28 | 2.338 | 0.13 | 0.8% | 5.50 | `ObfuscationBench`
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
This is meant to focus the usages to narrow the scope of the obfuscation optimization.
`Obfuscation::Xor` is mostly a move.
Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
Since `FastRandomContext` delegates to `GetRandBytes` anyway, we can simplify new key generation to a Write/Read combo, unifying the flow of enabling obfuscation via `Read`.
The comments were also adjusted to clarify that the `m_obfuscation` field affects the behavior of `Read` and `Write` methods.
These changes are meant to simplify the diffs for the riskier optimization commits later.
Mechanical refactor of the low-level "xor" wording to signal the intent instead of the implementation used.
The renames are ordered by heaviest-hitting substitutions first, and were constructed such that after each replacement the code is still compilable.
-BEGIN VERIFY SCRIPT-
sed -i \
-e 's/\bGetObfuscateKey\b/GetObfuscation/g' \
-e 's/\bxor_key\b/obfuscation/g' \
-e 's/\bxor_pat\b/obfuscation/g' \
-e 's/\bm_xor_key\b/m_obfuscation/g' \
-e 's/\bm_xor\b/m_obfuscation/g' \
-e 's/\bobfuscate_key\b/m_obfuscation/g' \
-e 's/\bOBFUSCATE_KEY_KEY\b/OBFUSCATION_KEY_KEY/g' \
-e 's/\bSetXor(/SetObfuscation(/g' \
-e 's/\bdata_xor\b/obfuscation/g' \
-e 's/\bCreateObfuscateKey\b/CreateObfuscation/g' \
-e 's/\bobfuscate key\b/obfuscation key/g' \
$(git ls-files '*.cpp' '*.h')
-END VERIFY SCRIPT-
The two tests are doing different things - `xor_roundtrip_random_chunks` does black-box style property-based testing to validate that certain invariants hold - that deobfuscating an obfuscation results in the original message (higher level, it doesn't have to know about the implementation details).
The `xor_bytes_reference` test makes sure the optimized xor implementation behaves in every imaginable scenario exactly as the simplest possible obfuscation - with random chunks, random alignment, random data, random key.
Since we're touching the file, other related small refactors were also applied:
* `nullpt` typo fixed;
* manual byte-by-byte xor key creations were replaced with `_hex` factories;
* since we're only using 64 bit keys in production, smaller keys were changed to reflect real-world usage;
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Since 31 byte xor-keys are not used in the codebase, using the common size (8 bytes) makes the benchmarks more realistic.
Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
This test was introduced in #28251 to ensure that the mempool is not
trimmed in the middle of a package evaluation and the m_view cache
is updated when evictions and replacements happen so coins are no longer
visible in subsequent package transactions. These two things have
coverage in other tests as well, and are pretty unlikely to happen.
This test is also brittle: it requires evaluation of the parents in a
particular order, and creates a transaction that itself is not
enough to trigger eviction but will be pushed out immediately by the
package spending from it. While the current magic number 2000 works, we
do not have a way to query remaining space in the mempool if mempool
data structures change, and it can differ across platforms.
f5647c6c5a depends: fix libevent _WIN32_WINNT usage (fanquake)
Pull request description:
Starting with version 13.x, the mingw headers will define the value of
`NTDDI_VERSION`, based on the value of `_WIN32_WINNT`, if that version is <
Windows 10. Given that libevent was undefining our `_WIN32_WINNT`, and
redefining it to a value < Windows 10 (`0x0501`), `NTDDI_VERSION` was also
being defined to that value, leading to functions not being exposed in
the mingw-w64 headers; see here: 9c2668ef77/mingw-w64-headers/include/iphlpapi.h (L36-L41).
Imports a commit from usptream ([a14ff91254f40cf36e0fee199e26fb11260fab49](a14ff91254)).
Fixes#32707.
ACKs for top commit:
willcl-ark:
crACK f5647c6c5a
Tree-SHA512: eb429457a4af6191dd27ef3d1087667c5304ff0f49d4c6824883651e3c2dbab5d9784fa1f170402f23cd9238005c5214e0a71a4160562a59dfa35618dc702132
This argument might have been used in the legacy wallets, but I don't
see any implementation using this argument in the SQLite wallets.
Removing it cleans up the code a bit.
4bb4c86599 test: document HOST for get_previous_releases.py (Sjors Provoost)
609203d507 test: stop signing previous releases >= v28.2 (Sjors Provoost)
c6dc2c29f8 test: replace v28.0 with notarized v28.2 (Sjors Provoost)
5bd73d96a3 test: fix macOS detection (Sjors Provoost)
Pull request description:
Since https://github.com/bitcoin/bitcoin/pull/31407 macOS guix builds are signed and notarized. This was included in v29 and backported to 28.x.
This PR bumps the v28.0 previous release binary to v28.2 and adjusts the test that uses it. Additionally it no longer manually code signs binaries >= v28.2.
While testing on an M4 mac and redownloading all the binaries, I noticed that `platform == "arm64-apple-darwin"` doesn't actually work. This initially used `args.platform` in #26694, but that was changed to just `platform` in #32219.
So the first commit switches this to use `args.host`. I manually tested on Intel macOS 13.7.6 that code-signing still isn't needed there (when downloading using a script).
Also documented that you can set `HOST`.
ACKs for top commit:
m3dwards:
ACK 4bb4c86599
maflcko:
review ACK 4bb4c86599🚏
Tree-SHA512: b4803d39a21cb622fd2388a0528b76d2b502956e2505385d3da201143b0afcf6f9d71c8c28937f27b70d2588fb6da677da058bdcd67b90fb53617acc3a727818
61e800e75c test: headers sync timeout (stringintech)
Pull request description:
When reviewing PR #32051 and considering which functional tests might need to be adapted/extended accordingly, I noticed there appears to be limited functional test coverage for header sync timeouts and thought it might be helpful to add one.
This test attempts to cover two scenarios:
1. **Normal peer timeout behavior:** When a peer fails to respond to initial getheaders requests within the timeout period, it should be disconnected and the node should attempt to sync headers from the next available peer.
2. **Noban peer behavior:** When a peer with noban privileges times out, it should remain connected while the node still attempts to sync headers from other peers.
ACKs for top commit:
maflcko:
re-ACK 61e800e75c 🗝
stratospher:
reACK 61e800e7.
Tree-SHA512: b8a867e7986b6f0aa00d81a84b205f2bf8fb2e6047a2e37272e0244229d1f43020e9031467827dabbfe7849a91429f2685e00a25356e2ed477fa1a035fa0b1fd
28416f367a test: fix intermittent failure in rpc_invalidateblock.py (stratospher)
Pull request description:
resolves#32965.
node1 (with 24 blocks) causes node0 (with 6 blocks + 1 extra header) to silently reorg. so move the subtest to a point before the 20 blocks are generated so that node1's state doesn't cause node0 to silently reorg.
ACKs for top commit:
maflcko:
lgtm ACK 28416f367a
mzumsande:
Code Review ACK 28416f367a
Tree-SHA512: f6cc682b8e5416125f887c094d5e291dd37f0bfc41d7c0de218f3e24fa1ea0cd642f7a1e362f3127f68cde725a67f3054501326b9bd25f0caa9a05de7d0052b0
This adds a large simulation fuzz test for all TxOrphanage public interface
functions, using a mix of comparison with expected behavior (in case it is
fully specified), and testing of properties exhibited otherwise.
This is preparation for the simulation fuzz test added in a later commit. Since
AddChildrenToWorkSet consumes randomness, there is no way for the simulator to
exactly predict its behavior. By returning the set of made-reconsiderable announcements
instead, the simulator can instead test that it is *a* valid choice, and then
apply it to its own data structures.
For the default number of peers (125), allows each to relay a default
descendant package (up to 25-1=24 can be missing inputs) of small (9
inputs or fewer) transactions out of order.
This limit also gives acceptable bounds for worst case LimitOrphans iterations.
Functional tests aren't changed to check for larger cap because it would
make the runtime too long.
Also deletes the now-unused DEFAULT_MAX_ORPHAN_TRANSACTIONS.
This is largely a reimplementation using boost::multi_index_container.
All the same public methods are available. It has an index by outpoint,
per-peer tracking, peer worksets, etc.
A few differences:
- Limits have changed: instead of a global limit of 100 unique orphans,
we have a maximum number of announcements (which can include duplicate
orphans) and a global memory limit which scales with the number of
peers.
- The maximum announcements limit is 100 to match the original limit,
but this is actually a stricter limit because the announcement count
is not de-duplicated.
- Eviction strategy: when global limits are reached, a per-peer limit
comes into play. While limits are exceeded, we choose the peer whose
“DoS score” (max usage / limit ratio for announcements and memory
limits) is highest and evict announcements by entry time, sorting
non-reconsiderable ones before reconsiderable ones. Since announcements
are unique by (wtxid, peer), as long as 1 announcement remains for a
transaction, it remains in the orphanage.
- This eviction strategy means no peer can influence the eviction of
another peer’s orphans.
- Also, since global limits are a multiple of per-peer limits, as long
as a peer does not exceed its limits, its orphans are protected from
eviction.
- Orphans no longer expire, since older announcements are generally
removed before newer ones.
- GetChildrenFromSamePeer returns the transactions from newest to
oldest.
Co-authored-by: Pieter Wuille <pieter@wuille.net>
Move towards a model where TxOrphanage is initialized with limits that
it remembers throughout its lifetime.
Remove the param. Limiting by number of unique orphans will be removed
in a later commit.
Now that -maxorphantx is gone, this does not change the node behavior.
The parameter is only used in tests.
c18bf0bd9b refactor: cleanup index logging (Sjors Provoost)
Pull request description:
This PR removes the use of `__func__` from index logging, since we have `-logsourcelocations`.
It also improves readability by putting `GetName()` in a more logical place.
Before
> coinstatsindex: best block of the index not found. Please rebuild the index.
After:
> best block of coinstatsindex not found. Please rebuild the index.
I found myself maintaining this commit as part of https://github.com/Sjors/bitcoin/pull/86, but since that might never land here, it seemed better to split it into its own PR (or get rid of it).
ACKs for top commit:
l0rinc:
Lightweight code review ACK c18bf0bd9b
maflcko:
review ACK c18bf0bd9b🚣
Tree-SHA512: 755948371e3ff7a5515b63ce48075631ec7868d69c3c1469176d5be0e8b28e1c071e206ae3f7320f87d8c441f815894acfef61621f05795b5ff6b8a5a3031e3b
This adds an `iters` parameter to DoWork(), which controls how much work it is
allowed to do right now.
Additionally, DoWork() won't stop at just getting everything ACCEPTABLE, but if
there is work budget left, will also attempt to get every cluster linearized
optimally.