Merge bitcoin/bitcoin#32279: [IBD] prevector: store P2WSH/P2TR/P2PK scripts inline

d5104cfbae prevector: store `P2WSH`/`P2TR`/`P2PK` scripts inline (Lőrinc)
52121506b2 test: assert `CScript` allocation characteristics (Lőrinc)
65ac7f6d4d refactor: modernize `CScriptBase` definition (Lőrinc)
756da2a994 refactor: extract `STATIC_SIZE` constant to prevector (Lőrinc)

Pull request description:

This change is part of [[IBD] - Tracking PR for speeding up Initial Block Download](https://github.com/bitcoin/bitcoin/pull/32043)

### Summary

The current `prevector` size of 28 bytes (chosen to fill the `sizeof(CScript)` aligned size) was introduced in 2015 (https://github.com/bitcoin/bitcoin/pull/6914) before `SegWit` and `TapRoot`. However, the increasingly common `P2WSH` and `P2TR` scripts are both 34 bytes, and are forced to use heap (re)allocation rather than efficient inline storage.

The core trade-off of this change is to eliminate heap allocations for common 34-36 byte scripts at the cost of increasing the base memory footprint of all `CScript` objects by 8 bytes (while still respecting peak memory usage defined by `-dbcache`).

### Context

Increasing the `prevector` size allows these scripts to be stored inline, avoiding heap allocations, reducing potential memory fragmentation, and improving performance during cache flushes. Massif analysis confirms a lower stable memory usage after flushing, suggesting the elimination of heap allocations outweighs the larger base size for common workloads.

Due to memory alignment, increasing the prevector size to 36 bytes doesn't change the overall `sizeof(CScript)` compared to an increase to 34 bytes, allowing us to include `P2PK` scripts as well at no additional memory cost.
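
The alignment argument can be illustrated with a small stand-alone sketch. `InlineFootprint` is a hypothetical, simplified stand-in for the `prevector` layout (inline bytes followed by a 32-bit size field); it is not the real class, and it assumes 8-byte pointer alignment as on a typical 64-bit platform.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model of the footprint only: N inline bytes aligned like a
// 64-bit pointer, followed by a 32-bit size field. The real prevector
// overlays the inline bytes with a heap pointer and capacity in a union.
template <std::size_t N>
struct InlineFootprint {
    alignas(8) unsigned char direct[N];
    std::uint32_t size;
};

// 28 inline bytes + 4-byte size field fill exactly 32 bytes.
static_assert(sizeof(InlineFootprint<28>) == 32);
// 34 bytes push the size field to offset 36, so the object rounds up to 40.
static_assert(sizeof(InlineFootprint<34>) == 40);
// 36 bytes use the two padding bytes that 34 would waste: still 40.
static_assert(sizeof(InlineFootprint<36>) == 40);
// One more byte moves the size field to offset 40 and the object grows to 48.
static_assert(sizeof(InlineFootprint<37>) == 48);
```

Under these assumptions, 36 is the largest inline capacity that still fits in 40 bytes, which matches the `sizeof(CScriptBase)` checks added in `script_tests.cpp`.
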
<details>
<summary>Massif measurements</summary>

> dbcache=440

Massif before, with a heap threshold of `28`: [Massif ASCII heap graph omitted; y-axis maximum 744.1 MB, x-axis 0 to 1.805 h]

and after, with a heap threshold of `36`: [Massif ASCII heap graph omitted; y-axis maximum 744.2 MB, x-axis 0 to 1.618 h]

---

> for `dbcache=4500`:

Massif before, with a heap threshold of `28`: [Massif ASCII heap graph omitted; y-axis maximum 4.565 GB, x-axis 0 to 1.500 h]

and after, with a heap threshold of `36`: [Massif ASCII heap graph omitted; y-axis maximum 4.640 GB, x-axis 0 to 1.360 h]

</details>

### Benchmarks and Memory

Performance benchmarks for `AssumeUTXO` load and flush show:

- Small dbcache (450MB): ~1-3% performance improvement (despite more frequent flushes)
- Large dbcache (4500MB): ~6-8% performance improvement due to fewer heap allocations (and basically the number of flushes)
- Very large dbcache (4500MB): ~5-6% performance improvement due to fewer heap allocations (and memory limit not being reached, so there's no memory penalty)

Full IBD and `-reindex-chainstate` also show an overall ~3-4% speedup (both for smaller and larger dbcache values).

We haven't investigated using different `prevector` sizes based on script type, though this could be explored in the future if needed.

### Historical explanation for the speedup (by [Anthony Towns](https://github.com/bitcoin/bitcoin/pull/32279#issuecomment-3111757079))

> I think the tradeoff is something like:
>
> * spends of p2pk, p2sh, p2pkh coins -- these cost 8 more bytes
> * spends of p2wpkh -- these cost 16 more bytes (sPK and scriptSig didn't need an allocation)
> * spends of p2wsh and p2tr -- these cost ~48 fewer bytes (save 64 byte allocation on 64bit system, lose 8 bytes for both scriptSig and sPK)
> * spends of nested p2wsh -- presumably save ~96 bytes, since the scriptSig would save an allocation, but I'm bundling it in the previous section
>
> Based on mainnet.observer stats for 2025-05-08, p2wpkh is about 55% of txs, p2tr is about 28%, p2pkh about 13%, p2wsh about 4% and the rest is noise, maybe? Those numbers net out to a saving of ~5.5 bytes per input. If p2wpkh rose from 55% to 80% and p2tr dropped to 20%, that would net to wasting ~3.2 bytes per input.
ACKs for top commit:
  maflcko:
    review ACK d5104cfbae 🐺
  achow101:
    reACK d5104cfbae
  jonatack:
    Review ACK d5104cfbae
  andrewtoth:
    ACK d5104cfbae

Tree-SHA512: 7c5271ebaf4f6d91dc4b679ecbde4b7d0467579f072289f30da988a17c38a552d0b8cdf0e9c001739975dd019894c35e541908571527916cec56e04a8e214ae2
2025-11-15 16:38:23 +01:00 · 2025-07-28 11:55:48 -07:00
parent 2a97ff466d d5104cfbae
commit 321984705d
6 changed files with 131 additions and 28 deletions
--- a/src/bench/checkqueue.cpp
+++ b/src/bench/checkqueue.cpp
@@ -8,6 +8,7 @@
 #include <key.h>
 #include <prevector.h>
 #include <random.h>
+#include <script/script.h>

 #include <cstddef>
 #include <cstdint>
@@ -16,7 +17,6 @@

 static const size_t BATCHES = 101;
 static const size_t BATCH_SIZE = 30;
-static const int PREVECTOR_SIZE = 28;
 static const unsigned int QUEUE_BATCH_SIZE = 128;

 // This Benchmark tests the CheckQueue with a slightly realistic workload,
@@ -30,9 +30,9 @@ static void CCheckQueueSpeedPrevectorJob(benchmark::Bench& bench)
    ECC_Context ecc_context{};

    struct PrevectorJob {
-        prevector<PREVECTOR_SIZE, uint8_t> p;
+        prevector<CScriptBase::STATIC_SIZE, uint8_t> p;
        explicit PrevectorJob(FastRandomContext& insecure_rand){
-            p.resize(insecure_rand.randrange(PREVECTOR_SIZE*2));
+            p.resize(insecure_rand.randrange(CScriptBase::STATIC_SIZE * 2));
        }
        std::optional<int> operator()()
        {
--- a/src/bench/prevector.cpp
+++ b/src/bench/prevector.cpp
@@ -5,17 +5,20 @@
 #include <prevector.h>

 #include <bench/bench.h>
+#include <script/script.h>
 #include <serialize.h>
 #include <streams.h>

 #include <type_traits>
 #include <vector>

-struct nontrivial_t {
+struct nontrivial_t
+{
    int x{-1};
    nontrivial_t() = default;
    SERIALIZE_METHODS(nontrivial_t, obj) { READWRITE(obj.x); }
 };
+
 static_assert(!std::is_trivially_default_constructible_v<nontrivial_t>,
              "expected nontrivial_t to not be trivially constructible");

@@ -27,22 +30,22 @@ template <typename T>
 static void PrevectorDestructor(benchmark::Bench& bench)
 {
    bench.batch(2).run([&] {
-        prevector<28, T> t0;
-        prevector<28, T> t1;
-        t0.resize(28);
-        t1.resize(29);
+        prevector<CScriptBase::STATIC_SIZE, T> t0;
+        prevector<CScriptBase::STATIC_SIZE, T> t1;
+        t0.resize(CScriptBase::STATIC_SIZE);
+        t1.resize(CScriptBase::STATIC_SIZE + 1);
    });
 }

 template <typename T>
 static void PrevectorClear(benchmark::Bench& bench)
 {
-    prevector<28, T> t0;
-    prevector<28, T> t1;
+    prevector<CScriptBase::STATIC_SIZE, T> t0;
+    prevector<CScriptBase::STATIC_SIZE, T> t1;
    bench.batch(2).run([&] {
-        t0.resize(28);
+        t0.resize(CScriptBase::STATIC_SIZE);
        t0.clear();
-        t1.resize(29);
+        t1.resize(CScriptBase::STATIC_SIZE + 1);
        t1.clear();
    });
 }
@@ -50,12 +53,12 @@ static void PrevectorClear(benchmark::Bench& bench)
 template <typename T>
 static void PrevectorResize(benchmark::Bench& bench)
 {
-    prevector<28, T> t0;
-    prevector<28, T> t1;
+    prevector<CScriptBase::STATIC_SIZE, T> t0;
+    prevector<CScriptBase::STATIC_SIZE, T> t1;
    bench.batch(4).run([&] {
-        t0.resize(28);
+        t0.resize(CScriptBase::STATIC_SIZE);
        t0.resize(0);
-        t1.resize(29);
+        t1.resize(CScriptBase::STATIC_SIZE + 1);
        t1.resize(0);
    });
 }
@@ -64,8 +67,8 @@ template <typename T>
 static void PrevectorDeserialize(benchmark::Bench& bench)
 {
    DataStream s0{};
-    prevector<28, T> t0;
-    t0.resize(28);
+    prevector<CScriptBase::STATIC_SIZE, T> t0;
+    t0.resize(CScriptBase::STATIC_SIZE);
    for (auto x = 0; x < 900; ++x) {
        s0 << t0;
    }
@@ -74,7 +77,7 @@ static void PrevectorDeserialize(benchmark::Bench& bench)
        s0 << t0;
    }
    bench.batch(1000).run([&] {
-        prevector<28, T> t1;
+        prevector<CScriptBase::STATIC_SIZE, T> t1;
        for (auto x = 0; x < 1000; ++x) {
            s0 >> t1;
        }
@@ -86,7 +89,7 @@ template <typename T>
 static void PrevectorFillVectorDirect(benchmark::Bench& bench)
 {
    bench.run([&] {
-        std::vector<prevector<28, T>> vec;
+        std::vector<prevector<CScriptBase::STATIC_SIZE, T>> vec;
        vec.reserve(260);
        for (size_t i = 0; i < 260; ++i) {
            vec.emplace_back();
@@ -99,11 +102,11 @@ template <typename T>
 static void PrevectorFillVectorIndirect(benchmark::Bench& bench)
 {
    bench.run([&] {
-        std::vector<prevector<28, T>> vec;
+        std::vector<prevector<CScriptBase::STATIC_SIZE, T>> vec;
        vec.reserve(260);
        for (size_t i = 0; i < 260; ++i) {
            // force allocation
-            vec.emplace_back(29, T{});
+            vec.emplace_back(CScriptBase::STATIC_SIZE + 1, T{});
        }
    });
 }
--- a/src/prevector.h
+++ b/src/prevector.h
@@ -38,6 +38,8 @@ class prevector {
    static_assert(std::is_trivially_copyable_v<T>);

 public:
+    static constexpr unsigned int STATIC_SIZE{N};
+
    typedef Size size_type;
    typedef Diff difference_type;
    typedef T value_type;
--- a/src/script/script.h
+++ b/src/script/script.h
@@ -403,10 +403,8 @@ private:
 /**
 * We use a prevector for the script to reduce the considerable memory overhead
 *  of vectors in cases where they normally contain a small number of small elements.
- * Tests in October 2015 showed use of this reduced dbcache memory usage by 23%
- *  and made an initial sync 13% faster.
 */
-typedef prevector<28, unsigned char> CScriptBase;
+using CScriptBase = prevector<36, uint8_t>;

 bool GetScriptOp(CScriptBase::const_iterator& pc, CScriptBase::const_iterator end, opcodetype& opcodeRet, std::vector<unsigned char>* pvchRet);

--- a/src/test/script_tests.cpp
+++ b/src/test/script_tests.cpp
@@ -1151,6 +1151,107 @@ BOOST_AUTO_TEST_CASE(script_CHECKMULTISIG23)
    BOOST_CHECK_MESSAGE(err == SCRIPT_ERR_INVALID_STACK_OPERATION, ScriptErrorString(err));
 }

+/** Return the TxoutType of a script without exposing Solver details. */
+static TxoutType GetTxoutType(const CScript& output_script)
+{
+    std::vector<std::vector<uint8_t>> unused;
+    return Solver(output_script, unused);
+}
+
+#define CHECK_SCRIPT_STATIC_SIZE(script, expected_size)                   \
+    do {                                                                  \
+        BOOST_CHECK_EQUAL((script).size(), (expected_size));              \
+        BOOST_CHECK_EQUAL((script).capacity(), CScriptBase::STATIC_SIZE); \
+        BOOST_CHECK_EQUAL((script).allocated_memory(), 0);                \
+    } while (0)
+
+#define CHECK_SCRIPT_DYNAMIC_SIZE(script, expected_size, expected_extra)                 \
+    do {                                                                 \
+        BOOST_CHECK_EQUAL((script).size(), (expected_size));             \
+        BOOST_CHECK_EQUAL((script).capacity(), (expected_extra));         \
+        BOOST_CHECK_EQUAL((script).allocated_memory(), (expected_extra)); \
+    } while (0)
+
+BOOST_AUTO_TEST_CASE(script_size_and_capacity_test)
+{
+    BOOST_CHECK_EQUAL(sizeof(CompressedScript), 40);
+    BOOST_CHECK_EQUAL(sizeof(CScriptBase), 40);
+    BOOST_CHECK_NE(sizeof(CScriptBase), sizeof(prevector<CScriptBase::STATIC_SIZE + 1, uint8_t>)); // CScriptBase size should be set to avoid wasting space in padding
+    BOOST_CHECK_EQUAL(sizeof(CScript), 40);
+    BOOST_CHECK_EQUAL(sizeof(CTxOut), 48);
+
+    CKey dummy_key;
+    dummy_key.MakeNewKey(/*fCompressed=*/true);
+    const CPubKey dummy_pubkey{dummy_key.GetPubKey()};
+
+    // Small OP_RETURN has direct allocation
+    {
+        const auto script{CScript() << OP_RETURN << std::vector<uint8_t>(10, 0xaa)};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::NULL_DATA);
+        CHECK_SCRIPT_STATIC_SIZE(script, 12);
+    }
+
+    // P2WPKH has direct allocation
+    {
+        const auto script{GetScriptForDestination(WitnessV0KeyHash{PKHash{dummy_pubkey}})};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::WITNESS_V0_KEYHASH);
+        CHECK_SCRIPT_STATIC_SIZE(script, 22);
+    }
+
+    // P2SH has direct allocation
+    {
+        const auto script{GetScriptForDestination(ScriptHash{CScript{} << OP_TRUE})};
+        BOOST_CHECK(script.IsPayToScriptHash());
+        CHECK_SCRIPT_STATIC_SIZE(script, 23);
+    }
+
+    // P2PKH has direct allocation
+    {
+        const auto script{GetScriptForDestination(PKHash{dummy_pubkey})};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::PUBKEYHASH);
+        CHECK_SCRIPT_STATIC_SIZE(script, 25);
+    }
+
+    // P2WSH has direct allocation
+    {
+        const auto script{GetScriptForDestination(WitnessV0ScriptHash{CScript{} << OP_TRUE})};
+        BOOST_CHECK(script.IsPayToWitnessScriptHash());
+        CHECK_SCRIPT_STATIC_SIZE(script, 34);
+    }
+
+    // P2TR has direct allocation
+    {
+        const auto script{GetScriptForDestination(WitnessV1Taproot{XOnlyPubKey{dummy_pubkey}})};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::WITNESS_V1_TAPROOT);
+        CHECK_SCRIPT_STATIC_SIZE(script, 34);
+    }
+
+    // Compressed P2PK has direct allocation
+    {
+        const auto script{GetScriptForRawPubKey(dummy_pubkey)};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::PUBKEY);
+        CHECK_SCRIPT_STATIC_SIZE(script, 35);
+    }
+
+    // Uncompressed P2PK needs extra allocation
+    {
+        CKey uncompressed_key;
+        uncompressed_key.MakeNewKey(/*fCompressed=*/false);
+        const CPubKey uncompressed_pubkey{uncompressed_key.GetPubKey()};
+
+        const auto script{GetScriptForRawPubKey(uncompressed_pubkey)};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::PUBKEY);
+        CHECK_SCRIPT_DYNAMIC_SIZE(script, 67, 67);
+    }
+
+    // Bare multisig needs extra allocation
+    {
+        const auto script{GetScriptForMultisig(1, std::vector{2, dummy_pubkey})};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::MULTISIG);
+        CHECK_SCRIPT_DYNAMIC_SIZE(script, 71, 103);
+    }
+}
+
 /* Wrapper around ProduceSignature to combine two scriptsigs */
 SignatureData CombineSignatures(const CTxOut& txout, const CMutableTransaction& tx, const SignatureData& scriptSig1, const SignatureData& scriptSig2)
 {
--- a/src/test/validation_flush_tests.cpp
+++ b/src/test/validation_flush_tests.cpp
@@ -26,9 +26,8 @@ BOOST_AUTO_TEST_CASE(getcoinscachesizestate)
    LOCK(::cs_main);
    auto& view = chainstate.CoinsTip();

-    // The number of bytes consumed by coin's heap data, i.e. CScript
-    // (prevector<28, unsigned char>) when assigned 56 bytes of data per above.
-    //
+    // The number of bytes consumed by coin's heap data, i.e.
+    // CScript (prevector<36, unsigned char>) when assigned 56 bytes of data per above.
    // See also: Coin::DynamicMemoryUsage().
    constexpr unsigned int COIN_SIZE = is_64_bit ? 80 : 64;