Commit Graph

5923 Commits

Author SHA1 Message Date
Ava Chow
09f004bd9f Merge bitcoin/bitcoin#32945: tests: speed up coins_tests by parallelizing
06ab3a394a tests: speed up coins_tests by parallelizing (Anthony Towns)

Pull request description:

  Updates the cmake logic to generate a separate test for each BOOST_FIXTURE_TEST_SUITE declaration in a file, and splits coins_tests.cpp into three separate suites so that they can be run in parallel. Also updates the convention enforced by test/lint/lint-tests.py.

ACKs for top commit:
  l0rinc:
    reACK 06ab3a394a
  maflcko:
    lgtm ACK 06ab3a394a
  achow101:
    ACK 06ab3a394a

Tree-SHA512: 940d9aa31dab60d1000b5f57d8dc4b2c5b4045c7e5c979ac407aba39f2285d53bc00c5e4d7bf2247551fd7e1c8681144e11fc8c005a874282c4c59bd362fb467
2025-07-22 11:19:18 -07:00
brunoerg
065e42976a test: IsFinalTx returns true when there is no locktime 2025-07-22 10:03:07 -03:00
Anthony Towns
06ab3a394a tests: speed up coins_tests by parallelizing
Updates the cmake logic to generate a separate test for each
BOOST_FIXTURE_TEST_SUITE declaration in a file, and splits coins_tests.cpp
into three separate suites so that they can be run in parallel. Also
updates the convention enforced by test/lint/lint-tests.py.
2025-07-22 12:56:26 +10:00
merge-script
11c6a864c9 Merge bitcoin/bitcoin#33007: test: fix ReadTopologicalSet unsigned integer overflow
31c4e77a25 test: fix ReadTopologicalSet unsigned integer overflow (ismaelsadeeq)

Pull request description:

  This PR is a simple fix for a potential unsigned integer overflow in ReadTopologicalSet.
  We obtain the value of `mask` from fuzz input, which can be the maximum representable value.
  Adding 1 to it would then cause an overflow.

  The fix skips the addition when the read value is already the maximum.

  See https://github.com/bitcoin/bitcoin/pull/30605#discussion_r2215338569 for more context

ACKs for top commit:
  maflcko:
    lgtm ACK 31c4e77a25

Tree-SHA512: f58d7907f66a0de0ed8d4b1cad6a4971f65925a99f3c030537c21c4d84126b643257c65865242caf7d445b9cbb7a71a1816a9f870ab7520625c4c16cd41979cb
2025-07-21 12:08:05 +01:00
Ava Chow
5878f35446 Merge bitcoin/bitcoin#31144: [IBD] multi-byte block obfuscation
248b6a27c3 optimization: peel align-head and unroll body to 64 bytes (Lőrinc)
e7114fc6dc optimization: migrate fixed-size obfuscation from `std::vector<std::byte>` to `uint64_t` (Lőrinc)
478d40afc6 refactor: encapsulate `vector`/`array` keys into `Obfuscation` (Lőrinc)
377aab8e5a refactor: move `util::Xor` to `Obfuscation().Xor` (Lőrinc)
fa5d296e3b refactor: prepare mempool_persist for obfuscation key change (Lőrinc)
6bbf2d9311 refactor: prepare `DBWrapper` for obfuscation key change (Lőrinc)
0b8bec8aa6 scripted-diff: unify xor-vs-obfuscation nomenclature (Lőrinc)
972697976c bench: make ObfuscationBench more representative (Lőrinc)
618a30e326 test: compare util::Xor with randomized inputs against simple impl (Lőrinc)
a5141cd39e test: make sure dbwrapper obfuscation key is never obfuscated (Lőrinc)
54ab0bd64c refactor: commit to 8 byte obfuscation keys (Lőrinc)
7aa557a37b random: add fixed-size `std::array` generation (Lőrinc)

Pull request description:

  This change is part of [[IBD] - Tracking PR for speeding up Initial Block Download](https://github.com/bitcoin/bitcoin/pull/32043)

  ### Summary

  Current block obfuscations are done byte-by-byte, this PR batches them to 64 bit primitives to speed up obfuscating bigger memory batches.
  This is especially relevant now that https://github.com/bitcoin/bitcoin/pull/31551 was merged, having bigger obfuscatable chunks.

  Since this obfuscation is optional, the speedup measured here depends on whether it's a [random value](https://github.com/bitcoin/bitcoin/pull/31144#issuecomment-2523295114) or [completely turned off](https://github.com/bitcoin/bitcoin/pull/31144#issuecomment-2519764142) (i.e. XOR-ing with 0).

  ### Changes in testing, benchmarking and implementation

  * Added new tests comparing randomized inputs against a trivial implementation and performing roundtrip checks with random chunks.
  * Migrated `std::vector<std::byte>(8)` keys to plain `uint64_t`;
  * Process unaligned bytes separately and unroll body to 64 bytes.

  ### Assembly

  Memory alignment is enforced by a small peel-loop (`std::memcpy` is optimized out on tested platform), with an `std::assume_aligned<8>` check, see the Godbolt listing at https://godbolt.org/z/59EMv7h6Y for details

  <details>
  <summary>Details</summary>

  Target & Compiler | Stride (per hot-loop iter) | Main operation(s) in loop | Effective XORs / iter
  -- | -- | -- | --
  Clang x86-64 (trunk) | 64 bytes | 4 × movdqu → pxor → store | 8 × 64-bit
  GCC x86-64 (trunk) | 64 bytes | 4 × movdqu/pxor sequence, enabled by 8-way unroll | 8 × 64-bit
  GCC RV32 (trunk) | 8 bytes | copy 8 B to temp → 2 × 32-bit XOR → copy back | 1 × 64-bit (as 2 × 32-bit)
  GCC s390x (big-endian 14.2) | 64 bytes | 8 × XC (mem-mem 8-B XOR) with key cached on stack | 8 × 64-bit

  </details>

  ### Endianness

  The only endianness issue was with bit rotation, intended to realign the key if obfuscation halted before full key consumption.
  Elsewhere, memory is read, processed, and written back in the same endianness, preserving byte order.
  Since CI lacks a big-endian machine, testing was done locally via Docker.
  <details>
  <summary>Details</summary>

  ```bash
  brew install podman pigz
  softwareupdate --install-rosetta
  podman machine init
  podman machine start
  docker run --platform linux/s390x -it ubuntu:latest /bin/bash
    apt update && apt install -y git build-essential cmake ccache pkg-config libevent-dev libboost-dev libssl-dev libsqlite3-dev python3 && \
    cd /mnt && git clone --depth=1 https://github.com/bitcoin/bitcoin.git && cd bitcoin && git remote add l0rinc https://github.com/l0rinc/bitcoin.git && git fetch --all && git checkout l0rinc/optimize-xor && \
    cmake -B build && cmake --build build --target test_bitcoin -j$(nproc) && \
    ./build/bin/test_bitcoin --run_test=streams_tests
  ```

  </details>

  ### Measurements (micro benchmarks and full IBDs)

  > cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc/clang -DCMAKE_CXX_COMPILER=g++/clang++ && \
    cmake --build build -j$(nproc) && \
    build/bin/bench_bitcoin -filter='ObfuscationBench' -min-time=5000

  <details>
  <summary>GNU 14.2.0</summary>

  > Before:

  |             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
  |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
  |                0.84 |    1,184,138,235.64 |    0.0% |            9.01 |            3.03 |  2.971 |           1.00 |    0.1% |      5.50 | `ObfuscationBench`

  > After (first optimizing commit):

  |             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
  |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
  |                0.04 |   28,365,698,819.44 |    0.0% |            0.34 |            0.13 |  2.714 |           0.07 |    0.0% |      5.33 | `ObfuscationBench`

  > and (second optimizing commit):

  |             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
  |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
  |                0.03 |   32,464,658,919.11 |    0.0% |            0.50 |            0.11 |  4.474 |           0.08 |    0.0% |      5.29 | `ObfuscationBench`

  </details>

  <details>
  <summary>Clang 20.1.7</summary>

  > Before:

  |             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
  |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
  |                0.89 |    1,124,087,330.23 |    0.1% |            6.52 |            3.20 |  2.041 |           0.50 |    0.2% |      5.50 | `ObfuscationBench`

  > After (first optimizing commit):

  |             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
  |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
  |                0.08 |   13,012,464,203.00 |    0.0% |            0.65 |            0.28 |  2.338 |           0.13 |    0.8% |      5.50 | `ObfuscationBench`

  > and (second optimizing commit):

  |             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
  |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
  |                0.02 |   41,231,547,045.17 |    0.0% |            0.30 |            0.09 |  3.463 |           0.02 |    0.0% |      5.47 | `ObfuscationBench`

  </details>

  i.e. 27.4x faster obfuscation with GCC, 36.7x faster with Clang

  For other benchmark speedups see  https://corecheck.dev/bitcoin/bitcoin/pulls/31144

  ------

  Running an IBD until 888888 blocks reveals a 4% speedup.

  <details>
  <summary>Details</summary>

  SSD:

  ```bash
  COMMITS="8324a00bd4a6a5291c841f2d01162d8a014ddb02 5ddfd31b4158a89b0007cfb2be970c03d9278525"; \
  STOP_HEIGHT=888888; DBCACHE=1000; \
  CC=gcc; CXX=g++; \
  BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
  (for c in $COMMITS; do git fetch origin $c -q && git log -1 --pretty=format:'%h %s' $c || exit 1; done) && \
  hyperfine \
    --sort 'command' \
    --runs 1 \
    --export-json "$BASE_DIR/ibd-${COMMITS// /-}-$STOP_HEIGHT-$DBCACHE-$CC.json" \
    --parameter-list COMMIT ${COMMITS// /,} \
    --prepare "killall bitcoind; rm -rf $DATA_DIR/*; git checkout {COMMIT}; git clean -fxd; git reset --hard; \
      cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF && \
      cmake --build build -j$(nproc) --target bitcoind && \
      ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0; sleep 100" \
    --cleanup "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
    "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE -blocksonly -printtoconsole=0"
  ```

  > 8324a00bd4 test: Compare util::Xor with randomized inputs against simple impl
  > 5ddfd31b41 optimization: Xor 64 bits together instead of byte-by-byte

  ```python
  Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 8324a00bd4a6a5291c841f2d01162d8a014ddb02)
    Time (abs ≡):        25033.413 s               [User: 33953.984 s, System: 2613.604 s]

  Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 5ddfd31b4158a89b0007cfb2be970c03d9278525)
    Time (abs ≡):        24110.710 s               [User: 33389.536 s, System: 2660.292 s]

  Relative speed comparison
          1.04          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 8324a00bd4a6a5291c841f2d01162d8a014ddb02)
          1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 5ddfd31b4158a89b0007cfb2be970c03d9278525)
  ```

  > HDD:

  ```bash
  COMMITS="71eb6eaa740ad0b28737e90e59b89a8e951d90d9 46854038e7984b599d25640de26d4680e62caba7"; \
  STOP_HEIGHT=888888; DBCACHE=4500; \
  CC=gcc; CXX=g++; \
  BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
  (for c in $COMMITS; do git fetch origin $c -q && git log -1 --pretty=format:'%h %s' $c || exit 1; done) && \
  hyperfine \
    --sort 'command' \
    --runs 2 \
    --export-json "$BASE_DIR/ibd-${COMMITS// /-}-$STOP_HEIGHT-$DBCACHE-$CC.json" \
    --parameter-list COMMIT ${COMMITS// /,} \
    --prepare "killall bitcoind; rm -rf $DATA_DIR/*; git checkout {COMMIT}; git clean -fxd; git reset --hard; \
      cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF && cmake --build build -j$(nproc) --target bitcoind && \
      ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0; sleep 100" \
    --cleanup "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
    "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE -blocksonly -printtoconsole=0"
  ```

  > 71eb6eaa74 test: compare util::Xor with randomized inputs against simple impl
  > 46854038e7 optimization: migrate fixed-size obfuscation from `std::vector<std::byte>` to `uint64_t`

  ```python
  Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT = 71eb6eaa740ad0b28737e90e59b89a8e951d90d9)
    Time (mean ± σ):     37676.293 s ± 83.100 s    [User: 36900.535 s, System: 2220.382 s]
    Range (min … max):   37617.533 s … 37735.053 s    2 runs

  Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT = 46854038e7984b599d25640de26d4680e62caba7)
    Time (mean ± σ):     36181.287 s ± 195.248 s    [User: 34962.822 s, System: 1988.614 s]
    Range (min … max):   36043.226 s … 36319.349 s    2 runs

  Relative speed comparison
          1.04 ±  0.01  COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT = 71eb6eaa740ad0b28737e90e59b89a8e951d90d9)
          1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT = 46854038e7984b599d25640de26d4680e62caba7)
  ```

  </details>

ACKs for top commit:
  achow101:
    ACK 248b6a27c3
  maflcko:
    review ACK 248b6a27c3 🎻
  ryanofsky:
    Code review ACK 248b6a27c3. Looks good! Thanks for adapting this and considering all the suggestions. I did leave more comments below but non are important and this looks good as-is

Tree-SHA512: ef541cd8a1f1dc504613c4eaa708202e32ae5ac86f9c875e03bcdd6357121f6af0860ef83d513c473efa5445b701e59439d416effae1085a559716b0fd45ecd6
2025-07-18 22:17:11 -07:00
Ava Chow
e9edd43a95 Merge bitcoin/bitcoin#32521: policy: make pathological transactions packed with legacy sigops non-standard
96da68a38f qa: functional test a transaction running into the legacy sigop limit (Antoine Poinsot)
367147954d qa: unit test standardness of inputs packed with legacy sigops (Antoine Poinsot)
5863315e33 policy: make pathological transactions packed with legacy sigops non-standard. (Antoine Poinsot)

Pull request description:

  The Consensus Cleanup soft fork proposal includes a limit on the number of legacy signature
  operations potentially executed when validating a transaction. If this change is to be implemented
  here and activated by Bitcoin users in the future, we should make transactions that are not valid
  according to the new rules non-standard first because it would otherwise be a trivial DoS to
  potentially unupgraded miners after the soft fork activates.

  ML post: https://gnusha.org/pi/bitcoindev/49dyqqkf5NqGlGdinp6SELIoxzE_ONh3UIj6-EB8S804Id5yROq-b1uGK8DUru66eIlWuhb5R3nhRRutwuYjemiuOOBS2FQ4KWDnEh0wLuA=@protonmail.com/T/#u

ACKs for top commit:
  instagibbs:
    reACK 96da68a38f
  maflcko:
    review ACK 96da68a38f 🚋
  achow101:
    ACK 96da68a38f
  glozow:
    light code review ACK 96da68a38f, looks correct to me

Tree-SHA512: 106ffe62e48952affa31c5894a404a17a3b4ea8971815828166fba89069f757366129f7807205e8c6558beb75c6f67d8f9a41000be2f8cf95be3b1a02d87bfe9
2025-07-18 13:24:54 -07:00
ismaelsadeeq
31c4e77a25 test: fix ReadTopologicalSet unsigned integer overflow 2025-07-18 14:57:39 +01:00
Antoine Poinsot
367147954d qa: unit test standardness of inputs packed with legacy sigops
Check bounds and different output types.
2025-07-17 09:18:30 -04:00
Lőrinc
478d40afc6 refactor: encapsulate vector/array keys into Obfuscation 2025-07-16 14:33:07 -07:00
Lőrinc
377aab8e5a refactor: move util::Xor to Obfuscation().Xor
This is meant to focus the usages to narrow the scope of the obfuscation optimization.

`Obfuscation::Xor` is mostly a move.

Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
2025-07-16 14:33:07 -07:00
Lőrinc
0b8bec8aa6 scripted-diff: unify xor-vs-obfuscation nomenclature
Mechanical refactor of the low-level "xor" wording to signal the intent instead of the implementation used.
The renames are ordered by heaviest-hitting substitutions first, and were constructed such that after each replacement the code is still compilable.

-BEGIN VERIFY SCRIPT-
sed -i \
  -e 's/\bGetObfuscateKey\b/GetObfuscation/g' \
  -e 's/\bxor_key\b/obfuscation/g' \
  -e 's/\bxor_pat\b/obfuscation/g' \
  -e 's/\bm_xor_key\b/m_obfuscation/g' \
  -e 's/\bm_xor\b/m_obfuscation/g' \
  -e 's/\bobfuscate_key\b/m_obfuscation/g' \
  -e 's/\bOBFUSCATE_KEY_KEY\b/OBFUSCATION_KEY_KEY/g' \
  -e 's/\bSetXor(/SetObfuscation(/g' \
  -e 's/\bdata_xor\b/obfuscation/g' \
  -e 's/\bCreateObfuscateKey\b/CreateObfuscation/g' \
  -e 's/\bobfuscate key\b/obfuscation key/g' \
  $(git ls-files '*.cpp' '*.h')
-END VERIFY SCRIPT-
2025-07-16 14:32:01 -07:00
Lőrinc
618a30e326 test: compare util::Xor with randomized inputs against simple impl
The two tests are doing different things - `xor_roundtrip_random_chunks` does black-box style property-based testing to validate that certain invariants hold - that deobfuscating an obfuscation results in the original message (higher level, it doesn't have to know about the implementation details).

The `xor_bytes_reference` test makes sure the optimized xor implementation behaves in every imaginable scenario exactly as the simplest possible obfuscation - with random chunks, random alignment, random data, random key.

Since we're touching the file, other related small refactors were also applied:
* `nullpt` typo fixed;
* manual byte-by-byte xor key creations were replaced with `_hex` factories;
* since we're only using 64 bit keys in production, smaller keys were changed to reflect real-world usage;

Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
2025-07-16 14:28:05 -07:00
Lőrinc
a5141cd39e test: make sure dbwrapper obfuscation key is never obfuscated 2025-07-16 14:18:23 -07:00
Lőrinc
54ab0bd64c refactor: commit to 8 byte obfuscation keys
Since 31 byte xor-keys are not used in the codebase, using the common size (8 bytes) makes the benchmarks more realistic.

Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
2025-07-16 13:19:18 -07:00
Lőrinc
7aa557a37b random: add fixed-size std::array generation
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
2025-07-16 13:19:18 -07:00
Pieter Wuille
b113877545 [fuzz] Add simulation fuzz test for TxOrphanage
This adds a large simulation fuzz test for all TxOrphanage public interface
functions, using a mix of comparison with expected behavior (in case it is
fully specified), and testing of properties exhibited otherwise.
2025-07-14 16:13:47 -04:00
Pieter Wuille
03aaaedc6d [prep] Return the made-reconsiderable announcements in AddChildrenToWorkSet
This is preparation for the simulation fuzz test added in a later commit. Since
AddChildrenToWorkSet consumes randomness, there is no way for the simulator to
exactly predict its behavior. By returning the set of made-reconsiderable announcements
instead, the simulator can instead test that it is *a* valid choice, and then
apply it to its own data structures.
2025-07-14 16:13:47 -04:00
glozow
ea29c4371e [p2p] bump DEFAULT_MAX_ORPHANAGE_LATENCY_SCORE to 3,000
For the default number of peers (125), allows each to relay a default
descendant package (up to 25-1=24 can be missing inputs) of small (9
inputs or fewer) transactions out of order.

This limit also gives acceptable bounds for worst case LimitOrphans iterations.

Functional tests aren't changed to check for larger cap because it would
make the runtime too long.

Also deletes the now-unused DEFAULT_MAX_ORPHAN_TRANSACTIONS.
2025-07-14 16:13:47 -04:00
glozow
24afee8d8f [fuzz] TxOrphanage protects peers that don't go over limit
Co-authored-by: Greg Sanders <gsanders87@gmail.com>
2025-07-14 16:13:47 -04:00
glozow
a2878cfb4a [unit test] strengthen GetChildrenFromSamePeer tests: results are in recency order 2025-07-14 16:13:47 -04:00
glozow
7ce3b7ee57 [unit test] basic TxOrphanage eviction and protection 2025-07-14 16:13:47 -04:00
glozow
4d23d1d7e7 [cleanup] remove unused rng param from LimitOrphans 2025-07-14 16:13:47 -04:00
glozow
067365d2a8 [p2p] overhaul TxOrphanage with smarter limits
This is largely a reimplementation using boost::multi_index_container.
All the same public methods are available. It has an index by outpoint,
per-peer tracking, peer worksets, etc.

A few differences:
- Limits have changed: instead of a global limit of 100 unique orphans,
  we have a maximum number of announcements (which can include duplicate
orphans) and a global memory limit which scales with the number of
peers.
- The maximum announcements limit is 100 to match the original limit,
  but this is actually a stricter limit because the announcement count
is not de-duplicated.
- Eviction strategy: when global limits are reached, a per-peer limit
  comes into play. While limits are exceeded, we choose the peer whose
“DoS score” (max usage / limit ratio for announcements and memory
limits) is highest and evict announcements by entry time, sorting
non-reconsiderable ones before reconsiderable ones. Since announcements
are unique by (wtxid, peer), as long as 1 announcement remains for a
transaction, it remains in the orphanage.
- This eviction strategy means no peer can influence the eviction of
  another peer’s orphans.
- Also, since global limits are a multiple of per-peer limits, as long
  as a peer does not exceed its limits, its orphans are protected from
eviction.
- Orphans no longer expire, since older announcements are generally
  removed before newer ones.
- GetChildrenFromSamePeer returns the transactions from newest to
  oldest.

Co-authored-by: Pieter Wuille <pieter@wuille.net>
2025-07-14 16:13:47 -04:00
glozow
3da6d7f8f6 [prep/refactor] make TxOrphanage a virtual class implemented by TxOrphanageImpl 2025-07-14 16:13:46 -04:00
glozow
77ebe8f280 [prep/test] have TxOrphanage remember its own limits in LimitOrphans
Move towards a model where TxOrphanage is initialized with limits that
it remembers throughout its lifetime.
Remove the param. Limiting by number of unique orphans will be removed
in a later commit.
Now that -maxorphantx is gone, this does not change the node behavior.
The parameter is only used in tests.
2025-07-14 16:13:10 -04:00
glozow
d0af4239b7 [prep/refactor] move DEFAULT_MAX_ORPHAN_TRANSACTIONS to txorphanage.h
This is move only.
2025-07-14 16:13:10 -04:00
glozow
8dd24c29ae [prep/test] modify test to not access TxOrphanage internals
These internals should and will be private.
2025-07-14 16:12:47 -04:00
glozow
44f5327824 [fuzz] add SeedRandomStateForTest(SeedRand::ZEROS) to txorphan 2025-07-11 13:52:50 -04:00
glozow
08e58fa911 [prep/refactor] move txorphanage to node namespace and directory
This is move-only.
2025-07-11 13:52:50 -04:00
merge-script
23e15d40b9 Merge bitcoin/bitcoin#32631: refactor: Convert GenTxid to std::variant
a60f863d3e scripted-diff: Replace GenTxidVariant with GenTxid (marcofleon)
c8ba199598 Remove old GenTxid class (marcofleon)
072a198ea4 Convert remaining instances of GenTxid to GenTxidVariant (marcofleon)
1b528391c7 Convert `txrequest` to GenTxidVariant (marcofleon)
bde4579b07 Convert `txdownloadman_impl` to GenTxidVariant (marcofleon)
c876a892ec Replace GenTxid with Txid/Wtxid overloads in `txmempool` (marcofleon)
de858ce2be move-only: make GetInfo a private CTxMemPool member (stickies-v)
eee473d9f3 Convert `CompareInvMempoolOrder` to GenTxidVariant (marcofleon)
243553d590 refactor: replace get_iter_from_wtxid with GetIter(const Wtxid&) (stickies-v)
fcf92fd640 refactor: make CTxMemPool::GetIter strongly typed (marcofleon)
11d28f21bb Implement GenTxid as a variant (marcofleon)

Pull request description:

  Part of the [type safety refactor](https://github.com/bitcoin/bitcoin/pull/32189).

  This PR changes the GenTxid class to a variant, which holds both Txids and Wtxids. This provides compile-time type safety and eliminates the manual type check (bool m_is_wtxid). Variables that can be either a Txid or a Wtxid are now using the new GenTxid variant, instead of uint256.

ACKs for top commit:
  w0xlt:
    ACK a60f863d3e
  dergoegge:
    Code review ACK a60f863d3e
  maflcko:
    review ACK a60f863d3e 🎽
  theStack:
    Code-review ACK a60f863d3e

Tree-SHA512: da9b73b7bdffee2eb9281a409205519ac330d3336094d17681896703fbca8099608782c9c85801e388e4d90af5af8abf1f34931f57bbbe6e9674d802d6066047
2025-07-11 13:47:19 -04:00
merge-script
12fb00fd42 Merge bitcoin/bitcoin#32927: fuzz: Add missing calls to SetMockTime for determinism
fa8862723c fuzz: CheckGlobals in init (MarcoFalke)
fa26bfde98 test: Avoid resetting mocktime in testing setup (MarcoFalke)
fa6b45fa8e Add SetMockTime for time_point types (MarcoFalke)

Pull request description:

  (Tracking issue https://github.com/bitcoin/bitcoin/issues/29018)

  During fuzzing, `AppInitParameterInteraction` may actually disable a previously set mocktime. This is confusing and can also cause non-determinism.

  Fix this issue, by

  * fixing the erroneous `-mocktime` parsing in `AppInitParameterInteraction`.
  * adding the missing `SetMockTime` calls to the affected fuzz init functions.
  * adding a `CheckGlobals` to the fuzz init, to prevent this issue in the future.

  This can be tested by

  * Cherry-picking the `CheckGlobals`-commit onto current master and observing a fuzz failure in the touched fuzz targets.
  * Reverting the touched fuzz fixups and observing a fuzz failure for each target.

ACKs for top commit:
  w0xlt:
    ACK fa8862723c
  dergoegge:
    utACK fa8862723c

Tree-SHA512: 5a9400f0467c82fa224713af4cc2b525afbefefc7c3f419077110925ad7af6c7fda3dcd2b50f7facf0ee7df2547c6ac20336906d707adcdfd1d652a9d9a735fe
2025-07-11 11:18:03 +01:00
merge-script
5ef0d4897b Merge bitcoin/bitcoin#30605: Cluster linearization: separate tests from tests-of-tests
d7fca5c171 clusterlin: add big comment explaning the relation between tests (Pieter Wuille)
b64e61d2de clusterlin: abstract try-permutations into ExhaustiveLinearize function (Pieter Wuille)
1fa55a64ed clusterlin tests: verify that chunks are minimal (Pieter Wuille)
da23ecef29 clusterlin tests: support non-empty ReadTopologicalSubset() (Pieter Wuille)
94f3e17c33 clusterlin tests: compare with fuzz-provided linearizations (Pieter Wuille)
5f92ebee0d clusterlin tests: compare with fuzz-provided topological sets (Pieter Wuille)
6e37824ac3 clusterlin tests: optimize clusterlin_simple_linearize (Pieter Wuille)
98c1c88b6f clusterlin tests: separate testing of SimpleLinearize and Linearize (Pieter Wuille)
10e90f7aef clusterlin tests: make SimpleCandidateFinder always find connected (Pieter Wuille)
a38c38951e clusterlin tests: separate testing of Search- and SimpleCandidateFinder (Pieter Wuille)
77a432ee70 clusterlin tests: count SimpleCandidateFinder iterations better (Pieter Wuille)

Pull request description:

  Part of the cluster mempool project: #30289

  The current cluster linearization fuzz tests contain two tests which combine testing of production code with testing of the test code itself:
  * `clusterlin_search_finder`: establishes the correctness of `SearchCandidateFinder` by comparing against both `SimpleCandidateFinder` and `ExhaustiveCandidateFinder` (which is even more simple than `SimpleCandidateFinder`). If `SimpleCandidateFinder` works correctly, then this comparison with `ExhaustiveCandidateFinder` is redundant. If it isn't, we ought to find that in a test specific to `SimpleCandidateFinder` rather than as a side-effect of testing `SearchCandidateFinder`. Split this functionality out into a new `clusterlin_simple_finder`.
  * `clusterlin_linearize`: establishes the correctness of `Linearize` by comparing against both `SimpleLinearize` and literally every valid linearization for the cluster. Again, if `SimpleLinearize` works correctly, then this comparison with all valid linearizations is redundant, and if it isn't we should find it in a test for `SimpleLinearize`. Do so by splitting off that functionality into `clusterlin_simple_linearize`.

  After that, a few general improvements to the affected tests are made (comparing with linearizations and subsets read from the fuzz input, plus a performance improvement).

ACKs for top commit:
  marcofleon:
    Re ACK d7fca5c171
  ismaelsadeeq:
    re-ACK d7fca5c171
  monlovesmango:
    ACK d7fca5c171

Tree-SHA512: 33cb76bd9b9547a5f3ee231fa452e928f064ad03af98e3d9e64246eb972f2b026c13e7367257ccdac1ae57982ee8ef98c907684588ecbb4bc4c82cbec160b3e8
2025-07-10 13:52:31 -04:00
Ava Chow
a40e953658 Merge bitcoin/bitcoin#30479: validation: Add eligible ancestors of reconsidered block to setBlockIndexCandidates
8cc3ac6c23 validation: Don't use IsValid() to filter for invalid blocks (Martin Zumsande)
86d98b94e5 test: verify that ancestors of a reconsidered block can become the chain tip (stratospher)
3c39a55e64 validation: Add ancestors of reconsiderblock to setBlockIndexCandidates (Martin Zumsande)

Pull request description:

  When we call `reconsiderblock` for some block,  `Chainstate::ResetBlockFailureFlags` puts the descendants of that block into `setBlockIndexCandidates` (if they meet the criteria, i.e. have more work than the tip etc.), but never put any ancestors into the set even though we do clear their failure flags.

  I think that this is wrong, because `setBlockIndexCandidates` should always contain all eligible indexes that have at least as much work as the current tip, which can include ancestors of the reconsidered block. This is being checked by `CheckBlockIndex()`, which could fail if it was invoked after `ActivateBestChain` connects a block and releases `cs_main`:
  ``` diff
  diff --git a/src/validation.cpp b/src/validation.cpp
  index 7b04bd9a5b..ff0c3c9f58 100644
  --- a/src/validation.cpp
  +++ b/src/validation.cpp
  @@ -3551,6 +3551,7 @@ bool Chainstate::ActivateBestChain(BlockValidationState& state, std::shared_ptr<
               }
           }
           // When we reach this point, we switched to a new tip (stored in pindexNewTip).
  +        m_chainman.CheckBlockIndex();
   
           if (exited_ibd) {
               // If a background chainstate is in use, we may need to rebalance our
  ```
  makes `rpc_invalidateblock.py` fail on master.

  Even though we don't currently have a `CheckBlockIndex()` in that place, after `cs_main` is released other threads could invoke it, which is happening in the rare failures of #16444 where an invalid header received from another peer could trigger a `CheckBlockIndex()` call that would fail.

  Fix this by adding eligible ancestors to `setBlockIndexCandidates` in `Chainstate::ResetBlockFailureFlags` (also simplifying that function a bit).

  Fixes #16444

ACKs for top commit:
  achow101:
    ACK 8cc3ac6c23
  TheCharlatan:
    Re-ACK 8cc3ac6c23
  stratospher:
    reACK 8cc3ac6.

Tree-SHA512: 53f27591916246be4093d64b86a0494e55094abd8c586026b1247e4a36747bc3d6dbe46dc26ee4a22f47b8eb0d9699d13e577dee0e7198145f3c9b11ab2a30b7
2025-07-09 16:55:43 -07:00
Eugene Siegel
d541409a64 log: Add rate limiting to LogPrintf, LogInfo, LogWarning, LogError, LogPrintLevel
To mitigate disk-filling attacks caused by unsafe usages of LogPrintf and
friends, we rate-limit them by passing a should_ratelimit bool that
eventually makes its way to LogPrintStr which may call
LogRateLimiter::Consume. The rate limiting is accomplished by
adding a LogRateLimiter member to BCLog::Logger which tracks source
code locations for the given logging window.

Every hour, a source location can log up to 1MiB of data. Source
locations that exceed the limit will have their logs suppressed for the
rest of the window determined by m_limiter.

This change affects the public LogPrintLevel function if called with
a level >= BCLog::Level::Info.

The UpdateTipLog function has been changed to use the private LogPrintLevel_
macro with should_ratelimit set to false. This allows UpdateTipLog to log
during IBD without hitting the rate limit.

Note that on restart, a source location that was rate limited before the
restart will be able to log until it hits the rate limit again.

Co-Authored-By: Niklas Gogge <n.goeggi@gmail.com>
Co-Authored-By: stickies-v <stickies-v@protonmail.com>
2025-07-09 09:13:00 -04:00
Eugene Siegel
a6a35cc0c2 log: use std::source_location in place of __func__, __FILE__, __LINE__
The std::source_location conveniently stores the file name, line number,
and function name of a source code location. We switch to using it instead
of the __func__ identifier and the __FILE__ and __LINE__ macros.

BufferedLog is changed to have a std::source_location member, replacing the
source_file, source_line, and logging_function members. As a result,
MemUsage no longer explicitly counts source_file or logging_function as the
std::source_location memory usage is included in the MallocUsage call.

This also changes the behavior of -logsourcelocations as std::source_location
includes the entire function signature. Because of this, the functional test
feature_config_args.py must be changed to no longer include the function
signature as the function signature can differ across platforms.

Co-Authored-By: Niklas Gogge <n.goeggi@gmail.com>
Co-Authored-By: stickies-v <stickies-v@protonmail.com>
2025-07-09 09:12:59 -04:00
Eugene Siegel
afb9e39ec5 log: introduce LogRateLimiter, LogLimitStats, Status
LogRateLimiter will be used to keep track of source locations and our
current time-based logging window. It contains an unordered_map and a
m_suppressions_active bool to track source locations. The map is keyed
by std::source_location, so a custom Hash function (SourceLocationHasher)
and custom KeyEqual function (SourceLocationEqual) is provided.
SourceLocationHasher uses CSipHasher(0,0) under the hood to get a
uniform distribution.

A public Reset method is provided so that a scheduler (e.g. the
"b-scheduler" thread) can periodically reset LogRateLimiter's state when
the time window has elapsed.

The LogRateLimiter::Consume method checks if we have enough available
bytes in our rate limiting budget to log an additional string. It
returns a Status enum that denotes the rate limiting status and can
be used by the caller to emit a warning, skip logging, etc.

The Status enum has three states:
- UNSUPPRESSED     (logging was successful)
- NEWLY_SUPPRESSED (logging was succcesful, next log will be suppressed)
- STILL_SUPPRESSED (logging was unsuccessful)

LogLimitStats counts the available bytes left for logging per source
location for the current logging window. It does not track actual source
locations; it is used as a value in m_source_locations.

Also exposes a SuppressionsActive() method so the logger can use
that in a later commit to prefix [*] to logs whenenever suppressions
are active.

Co-Authored-By: Niklas Gogge <n.goeggi@gmail.com>
Co-Authored-By: stickies-v <stickies-v@protonmail.com>
2025-07-09 09:12:59 -04:00
Eugene Siegel
df7972a6cf test: Mark ~DebugLogHelper as noexcept(false)
We mark ~DebugLogHelper as noexcept(false) to be able to catch the
exception it throws. This lets us use it in test in combination with
BOOST_CHECK_THROW and BOOST_CHECK_NO_THROW to check that certain log
messages are (not) logged.

Co-Authored-By: Niklas Gogge <n.goeggi@gmail.com>
2025-07-09 09:12:59 -04:00
MarcoFalke
fa8862723c fuzz: CheckGlobals in init 2025-07-09 14:28:23 +02:00
MarcoFalke
fa26bfde98 test: Avoid resetting mocktime in testing setup
This allows to set the mocktime before the testing setup.

Also, in some fuzz tests the mocktime was reset to 0 before this change,
so set it.
2025-07-09 14:28:14 +02:00
marcofleon
a60f863d3e scripted-diff: Replace GenTxidVariant with GenTxid
-BEGIN VERIFY SCRIPT-
sed -i 's/GenTxidVariant/GenTxid/g' $(git grep -l 'GenTxidVariant')
-END VERIFY SCRIPT-
2025-07-08 20:00:51 +01:00
marcofleon
1b528391c7 Convert txrequest to GenTxidVariant
Switch all instances of GenTxid to the new variant
in `txrequest` and complete `txdownloadman_impl` by
converting `GetRequestsToSend`.
2025-07-08 20:00:51 +01:00
marcofleon
bde4579b07 Convert txdownloadman_impl to GenTxidVariant
Convert all of `txdownloadman_impl` to the new variant except for
`GetRequestsToSend`, which will be easier to switch at the same
time as `txrequest`.
2025-07-08 20:00:43 +01:00
marcofleon
c876a892ec Replace GenTxid with Txid/Wtxid overloads in txmempool
Co-authored-by: stickies-v <stickies-v@protonmail.com>
2025-07-08 19:31:02 +01:00
merge-script
87ab69155d Merge bitcoin/bitcoin#31553: cluster mempool: add TxGraph reorg functionality
1632fc104b txgraph: Track multiple potential would-be clusters in Trim (improvement) (Pieter Wuille)
4608df37e0 txgraph: add Trim benchmark (benchmark) (Pieter Wuille)
9c436ff01c txgraph: add fuzz test scenario that avoids cycles inside Trim() (tests) (Pieter Wuille)
938e86f8fe txgraph: add unit test for TxGraph::Trim (tests) (glozow)
a04e205ab0 txgraph: Add ability to trim oversized clusters (feature) (Pieter Wuille)
eabcd0eb6f txgraph: remove unnecessary m_group_oversized (simplification) (Greg Sanders)
19b14e61ea txgraph: Permit transactions that exceed cluster size limit (feature) (Pieter Wuille)
c4287b9b71 txgraph: Add ability to configure maximum cluster size/weight (feature) (Pieter Wuille)

Pull request description:

  Part of cluster mempool (#30289).

  During reorganisations, it is possible that dependencies get added which would result in clusters that violate policy limits (cluster count, cluster weight), when linking the new from-block transactions to the old from-mempool transactions. Unlike RBF scenarios, we cannot simply reject the changes when they are due to received blocks. To accommodate this, add a `TxGraph::Trim()`, which removes some subset of transactions (including descendants) in order to make all resulting clusters satisfy the limits.

  Conceptually, the way this is done is by defining a rudimentary linearization for the entire would-be too-large cluster, iterating it from beginning to end, and reasoning about the counts and weights of the clusters that would be reached using transactions up to that point. If a transaction is encountered whose addition would violate the limit, it is removed, together with all its descendants.

  This rudimentary linearization is like a merge sort of the chunks of the clusters being combined, but respecting topology. More specifically, it is continuously picking the highest-chunk-feerate remaining transaction among those which have no unmet dependencies left. For efficiency, this rudimentary linearization is computed lazily, by putting all viable transactions in a heap, sorted by chunk feerate, and adding new transactions to it as they become viable.

  The `Trim()` function is rather unusual compared to the `TxGraph` functionality added in previous PRs, in that `Trim()` makes it own decisions about what the resulting graph contents will be, without good specification of how it makes that decision - it is just a best-effort attempt (which is improved in the last commit). All other `TxGraph` mutators are simply to inform the graph about changes the calling mempool code decided on; this one lets the decision be made by txgraph.

  As part of this, the "oversized" property is expanded to also encompass a configurable cluster weight limit (in addition to cluster count limit).

ACKs for top commit:
  instagibbs:
    reACK 1632fc104b
  glozow:
    reACK 1632fc104b via range-diff
  ismaelsadeeq:
    reACK 1632fc104b 🛰️

Tree-SHA512: ccacb54be8ad622bd2717905fc9b7e42aea4b07f824de1924da9237027a97a9a2f1b862bc6a791cbd2e1a01897ad2c7c73c398a2d5ccbce90bfbeac0bcebc9ce
2025-07-07 16:11:51 -04:00
merge-script
d33c111448 Merge bitcoin/bitcoin#32829: threading: use correct mutex name in reverse_lock fatal error messages
de4eef52d1 threading: use correct mutex name in reverse_lock fatal error messages (Cory Fields)

Pull request description:

  "Now that REVERSE_LOCK requires the name of the actual mutex, it can be used for better error messages." - theuni

  This is a follow-up to this comment https://github.com/bitcoin/bitcoin/pull/32465#issuecomment-2981287545

  I just cherry-picked the commit 85c2848eb575f4abaa81fdd4e8f3b2048693dd98

ACKs for top commit:
  theuni:
    Re-ACK de4eef52d1
  TheCharlatan:
    ACK de4eef52d1

Tree-SHA512: 1109381e1f0589093f7c737cb1ebd1c43324a9e1ea34b5f05a9171d06ab44cca0c5ead43c581f6e37ded1f0463ab8a280f3319c288d39a4625109b5c08a7cb68
2025-07-07 15:51:37 +01:00
Cory Fields
de4eef52d1 threading: use correct mutex name in reverse_lock fatal error messages
Now that REVERSE_LOCK requires the name of the actual mutex, it can be used for
better error messages.
2025-07-07 10:34:05 -04:00
Ava Chow
ea4285775e Merge bitcoin/bitcoin#29307: util: explicitly close all AutoFiles that have been written
c10e382d2a flatfile: check whether the file has been closed successfully (Vasil Dimov)
4bb5dd78ea util: check that a file has been closed before ~AutoFile() is called (Vasil Dimov)
8bb34f07df Explicitly close all AutoFiles that have been written (Vasil Dimov)
a69c4098b2 rpc: take ownership of the file by WriteUTXOSnapshot() (Hodlinator)

Pull request description:

  `fclose(3)` may fail to flush the previously written data to disk, thus a failing `fclose(3)` is as serious as a failing `fwrite(3)`.

  Previously the code ignored `fclose(3)` failures. This PR improves that by changing all users of `AutoFile` that use it to write data to explicitly close the file and handle a possible error.

  ---

  Other alternatives are:

  1. `fflush(3)` after each write to the file (and throw if it fails from the `AutoFile::write()` method) and hope that `fclose(3)` will then always succeed. Assert that it succeeds from the destructor 🙄. Will hurt performance.
  2. Throw nevertheless from the destructor. Exception within the exception in C++ I think results in terminating the program without a useful message.
  3. (this is implemented in the latest incarnation of this PR) Redesign `AutoFile` so that its destructor cannot fail. Adjust _all_ its users 😭. For example, if the file has been written to, then require the callers to explicitly call the `AutoFile::fclose()` method before the object goes out of scope. In the destructor, as a sanity check, assume/assert that this is indeed the case. Defeats the purpose of a RAII wrapper for `FILE*` which automatically closes the file when it goes out of scope and there are a lot of users of `AutoFile`.
  4. Pass a new callback function to the `AutoFile` constructor which will be called from the destructor to handle `fclose()` errors, as described in https://github.com/bitcoin/bitcoin/pull/29307#issuecomment-2243842400. My thinking is that if that callback is going to only log a message, then we can log the message directly from the destructor without needing a callback. If the callback is going to do more complicated error handling then it is easier to do that at the call site by directly calling `AutoFile::fclose()` instead of getting the `AutoFile` object out of scope (so that its destructor is called) and inspecting for side effects done by the callback (e.g. set a variable to indicate a failed `fclose()`).

ACKs for top commit:
  l0rinc:
    ACK c10e382d2a
  achow101:
    ACK c10e382d2a
  hodlinator:
    re-ACK c10e382d2a

Tree-SHA512: 3994ca57e5b2b649fc84f24dad144173b7500fc0e914e06291d5c32fbbf8d2b1f8eae0040abd7a5f16095ddf4e11fe1636c6092f49058cda34f3eb2ee536d7ba
2025-07-03 15:37:44 -07:00
merge-script
49d5f1f2c6 Merge bitcoin/bitcoin#32850: test: check P2SH sigop count for coinbase tx
d6aaffcb11 test: check P2SH sigop count for coinbase tx (brunoerg)

Pull request description:

  We currently do not test that `GetP2SHSigOpCount` returns 0 for coinbase transactions (see line L129 at https://corecheck.dev/mutation/src/consensus/tx_verify.cpp). This PR addresses it.

ACKs for top commit:
  darosior:
    That said, i guess unit-tested dead consensus code is better than not-unit-tested dead consensus code. utACK d6aaffcb11
  theStack:
    ACK d6aaffcb11
  w0xlt:
    ACK d6aaffcb11
  ishaanam:
    ACK d6aaffcb11
  pablomartin4btc:
    ACK d6aaffcb11

Tree-SHA512: a7d7306f064bb2ec7e93e92625848ae38e150ebb67bde37cd15be1038816b154e867ad21ecd2685d8de5341b67e3b768d30b7654e27b541f33e8f9d63e52261d
2025-07-03 09:46:53 +01:00
Pieter Wuille
1632fc104b txgraph: Track multiple potential would-be clusters in Trim (improvement)
In the existing Trim function, as soon as the set of accepted transactions
would exceed the max cluster size or count limit, the acceptance loop is
stopped, removing all later transactions. However, it is possible that by
excluding some of those transactions the would-be cluster splits apart into
multiple would-clusters. And those clusters may well permit far more
transactions before their limits are reached.

Take this into account by using a union-find structure inside TrimTxData to
keep track of the count/size of all would-be clusters that would be formed
at any point, and only reject transactions which would cause these resulting
partitions to exceed their limits.

This is not an optimization in terms of CPU usage or memory; it just
improves the quality of the transactions removed by Trim().
2025-07-02 16:01:57 -04:00
Pieter Wuille
9c436ff01c txgraph: add fuzz test scenario that avoids cycles inside Trim() (tests)
Trim internally builds an approximate dependency graph of the merged cluster,
replacing all existing dependencies within existing clusters with a simple
linear chain of dependencies. This helps keep the complexity of the merging
operation down, but may result in cycles to appear in the general case, even
though in real scenarios (where Trim is called for stitching re-added mempool
transactions after a reorg back to the existing mempool transactions) such
cycles are not possible.

Add a test that specifically targets Trim() but in scenarios where it is
guaranteed not to have any cycles. It is a special case, is much more a
whitebox test than a blackbox test, and relies on randomness rather than
fuzz input. The upside is that somewhat stronger properties can be tested.

Co-authored-by: Greg Sanders <gsanders87@gmail.com>
2025-07-02 15:06:53 -04:00