Merge bitcoin/bitcoin#32279: [IBD] prevector: store P2WSH/P2TR/P2PK scripts inline

d5104cfbae prevector: store `P2WSH`/`P2TR`/`P2PK` scripts inline (Lőrinc)
52121506b2 test: assert `CScript` allocation characteristics (Lőrinc)
65ac7f6d4d refactor: modernize `CScriptBase` definition (Lőrinc)
756da2a994 refactor: extract `STATIC_SIZE` constant to prevector (Lőrinc)

Pull request description:

  This change is part of [[IBD] - Tracking PR for speeding up Initial Block Download](https://github.com/bitcoin/bitcoin/pull/32043)

  ### Summary

  The current `prevector` size of 28 bytes (chosen to fill the `sizeof(CScript)` aligned size) was introduced in 2015 (https://github.com/bitcoin/bitcoin/pull/6914) before `SegWit` and `TapRoot`.
  However, the increasingly common `P2WSH` and `P2TR` scripts are both 34 bytes, and are forced to use heap (re)allocation rather than efficient inline storage.

  The core trade-off of this change is to eliminate heap allocations for common 34-36 byte scripts at the cost of increasing the base memory footprint of all `CScript` objects by 8 bytes (while still respecting peak memory usage defined by `-dbcache`).
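
  A rough illustration of which common script types fit inline at each capacity, using the script sizes asserted in this PR's `script_size_and_capacity_test`. `ScriptKind`, `FitsInline` and `CountInline` are hypothetical names for this sketch and are not part of `prevector`:

  ```cpp
  #include <cstddef>

  // Script lengths in bytes, taken from the assertions in script_size_and_capacity_test.
  struct ScriptKind {
      const char* name;
      std::size_t size;
  };

  constexpr ScriptKind KINDS[] = {
      {"P2WPKH", 22},
      {"P2SH", 23},
      {"P2PKH", 25},
      {"P2WSH", 34},
      {"P2TR", 34},
      {"P2PK (compressed)", 35},
      {"P2PK (uncompressed)", 67},
  };

  // Hypothetical helper: data stays inline while it is no longer than the inline capacity.
  constexpr bool FitsInline(std::size_t script_size, std::size_t inline_capacity)
  {
      return script_size <= inline_capacity;
  }

  // Count how many of the listed script kinds avoid a heap allocation.
  constexpr std::size_t CountInline(std::size_t inline_capacity)
  {
      std::size_t count = 0;
      for (const auto& kind : KINDS) {
          if (FitsInline(kind.size, inline_capacity)) ++count;
      }
      return count;
  }
  ```

  With a capacity of 28 only the first three kinds stay inline; with 36 everything except uncompressed `P2PK` does.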

  ### Context
  Increasing the `prevector` size allows these scripts to be stored inline, avoiding heap allocations, reducing potential memory fragmentation, and improving performance during cache flushes. Massif analysis confirms a lower stable memory usage after flushing, suggesting the elimination of heap allocations outweighs the larger base size for common workloads.

  Due to memory alignment, increasing the prevector size to 36 bytes doesn't change the overall `sizeof(CScript)` compared to an increase to 34 bytes, allowing us to include `P2PK` scripts as well at no additional memory cost.
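
  A simplified sketch of why capacities of 34 and 36 produce the same object size. `SketchStorage` and `SketchPrevector` are stand-ins invented for illustration, assuming a byte-packed union of inline bytes and a pointer/capacity pair followed by a 4-byte size field. The real `prevector` has more to it, and exact sizes depend on the platform:

  ```cpp
  #include <cstddef>
  #include <cstdint>

  // Stand-in for the inline-or-heap storage; packed so its size is not rounded up on its own.
  #pragma pack(push, 1)
  template <unsigned int N>
  union SketchStorage {
      uint8_t direct[N]; // inline bytes
      struct {
          uint8_t* ptr;      // heap pointer, used once the data outgrows N
          uint32_t capacity; // heap capacity
      } indirect;
  };
  #pragma pack(pop)

  // Stand-in for the container: pointer-aligned storage plus a 4-byte size field.
  template <unsigned int N>
  struct SketchPrevector {
      alignas(uint8_t*) SketchStorage<N> storage;
      uint32_t size;
  };

  // 34 + 4 and 36 + 4 both round up to the same aligned size, so the two extra bytes are free.
  static_assert(sizeof(SketchPrevector<34>) == sizeof(SketchPrevector<36>),
                "34 and 36 should share one aligned size");
  // One more byte spills into the next alignment step.
  static_assert(sizeof(SketchPrevector<37>) > sizeof(SketchPrevector<36>),
                "37 should need a larger object");
  ```

  On a typical 64-bit build this sketch gives 32 bytes for a capacity of 28 and 40 bytes for 34 or 36, which is the 8-byte increase described in the summary.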

  <details>
  <summary>Massif measurements</summary>

  > dbcache=440

  Massif before, with a heap threshold of `28`:
  ```bash
      MB
  744.1^#
       |#: ::::::@: :::::::   :@:: @::::::::::::::@@
       |#: ::::::@::::: :::   :@:::@:::::: :: ::::@
       |#: ::::::@::::: :::   :@:::@:::::: :: ::::@
       |#: ::::::@::::: ::: : :@:::@:::::: :: ::::@
       |#: ::::::@::::: ::: : :@:::@:::::: :: ::::@
       |#: ::::::@::::: ::: : :@:::@:::::: :: ::::@
       |#::::::::@::::: ::: : :@:::@:::::: :: ::::@
       |#::::::::@::::: ::: :::@:::@:::::: :: ::::@
       |#::::::::@::::: ::: :::@:::@:::::: :: ::::@
       |#::::::::@::::: ::: :::@:::@:::::: :: ::::@
       |#::::::::@::::: ::: :::@:::@:::::: :: ::::@
       |#::::::::@::::: :::::::@:::@:::::: :: ::::@
       |#::::::::@::::: :::::::@:::@:::::: :: ::::@ :::::@:::::@:::::@:::::@::::
       |#::::::::@::::: :::::::@:::@:::::: :: ::::@ :::::@:::::@:::::@:::::@::::
       |#::::::::@::::: :::::::@:::@:::::: :: ::::@ :::::@:::::@:::::@:::::@::::
       |#::::::::@::::: :::::::@:::@:::::: :: ::::@ :::::@:::::@:::::@:::::@::::
       |#::::::::@::::: :::::::@:::@:::::: :: ::::@ :::::@:::::@:::::@:::::@::::
       |#::::::::@::::: :::::::@:::@:::::: :: ::::@ :::::@:::::@:::::@:::::@::::
       |#::::::::@::::: :::::::@:::@:::::: :: ::::@ :::::@:::::@:::::@:::::@::::
     0 +----------------------------------------------------------------------->h
       0                                                                   1.805
  ```

  and after, with a heap threshold of `36`:
  ```bash
      MB
  744.2^       :
       |#  :  :::::::::::   : : :: ::: @@:::::: ::  :
       |#  :  :::: ::::::   : : :: ::: @ :: ::  :   :
       |#  :  :::: :::::::  : :@:: ::: @ :: ::  : :::
       |#  :  :::: :::::::  : :@:: ::: @ :: ::  : : :
       |#  :  :::: :::::::  : :@:: ::: @ :: ::  : : :
       |#  :  :::: :::::::  : :@:: ::: @ :: ::  : : :
       |#  :: :::: :::::::  : :@:: ::: @ :: ::  : : :
       |#  :: :::: :::::::  : :@:: ::::@ :: ::  : : :
       |#:::: :::: :::::::  :::@:: ::::@ :: ::  : : :
       |#: ::::::: :::::::  :::@:: ::::@ :: :: @: : :
       |#: ::::::: :::::::  :::@:::::::@ :: :: @: : :
       |#: ::::::: ::::::::::::@:::::::@ :: :: @: : :
       |#: ::::::: :::::::: :::@:::::::@ :: :: @: : :
       |#: ::::::: :::::::: :::@:::::::@ :: :: @: : :::@:::@::::@::::::@:::::@::
       |#: ::::::: :::::::: :::@:::::::@ :: :: @: : :::@:::@::::@::::::@:::::@::
       |#: ::::::: :::::::: :::@:::::::@ :: :: @: : :::@:::@::::@::::::@:::::@::
       |#: ::::::: :::::::: :::@:::::::@ :: :: @: : :::@:::@::::@::::::@:::::@::
       |#: ::::::: :::::::: :::@:::::::@ :: :: @: : :::@:::@::::@::::::@:::::@::
       |#: ::::::: :::::::: :::@:::::::@ :: :: @: : :::@:::@::::@::::::@:::::@::
     0 +----------------------------------------------------------------------->h
       0                                                                   1.618
  ```

  ---

  > for `dbcache=4500`:

  Massif before, with a heap threshold of `28`:
  ```bash
      GB
  4.565^   ::
       | ##:   @@:::  :::: :@::::  :::: ::::
       | # :   @ ::   :::  :@: ::  : :: :::
       | # :   @ :: :::::  :@: ::  : :: :::
       | # :   @ :: : :::  :@: :: @: :: :::
       | # :   @ :: : :::  :@: :: @: :: :::
       | # :   @ :: : :::  :@: :: @: :: :::
       | # :   @ :: : :::  :@: :: @: :: :::
       | # : ::@ :: : :::  :@: :: @: :: :::
       | # : : @ :: : :::  :@: :: @: :: :::
       | # : : @ :: : :::  :@: :: @: ::::::
       | # : : @ :: : :::  :@: :: @: ::::::
       | # : : @ :: : :::  :@: :: @: ::::::
       | # : : @ :: : ::: ::@: :: @: ::::::
       | # : : @ :: : ::: ::@: :: @: ::::::
       | # : : @ :: : ::: ::@: :: @: ::::::
       | # : : @ :: : ::: ::@: :: @: :::::: @::
       | # : : @ :: : ::: ::@: :: @: :::::: @:
       | # : : @ :: : ::: ::@: :::@: :::::: @:
       | # : : @ :: : ::: ::@: :::@: :::::: @: :::::::::::::::::::::::::::::@:::
     0 +----------------------------------------------------------------------->h
       0                                                                   1.500
  ```

  and after, with a heap threshold of `36`:
  ```bash
      GB
  4.640^    :
       | ##::  :::::   ::::  ::::::@  ::::
       | # ::  : :::   ::::  :: :::@  ::::
       | # :: :: :::   ::::  :: :::@  ::::
       | # :: :: :::  :::::  :: :::@  ::::
       | # :: :: :::  :::::  :: :::@  ::::
       | # :: :: :::  :::::  :: :::@  ::::
       | # :: :: :::  :::::  :: :::@  ::::
       | # :: :: :::  :::::  :: :::@  ::::  :@@
       | # :: :: :::  ::::: ::: :::@  :::::::@
       | # :: :: :::  ::::: ::: :::@  ::::: :@
       | # :: :: :::  ::::: ::: :::@::::::: :@
       | # ::::: :::  ::::: ::: :::@: ::::: :@
       | # ::::: :::  ::::: ::: :::@: ::::: :@
       | # ::::: :::::::::: ::: :::@: ::::: :@
       | # ::::: :::: ::::: ::: :::@: ::::: :@
       | # ::::: :::: ::::: ::: :::@: ::::: :@
       | # ::::: :::: ::::::::: :::@: ::::: :@
       | # ::::: :::: ::::::::: :::@: ::::: :@
       | # ::::: :::: ::::::::: :::@: ::::: :@ ::::::@:::@:::@::::@:::::@::::@::
     0 +----------------------------------------------------------------------->h
       0                                                                   1.360
  ```

  </details>

  ### Benchmarks and Memory

  Performance benchmarks for `AssumeUTXO` load and flush show:
  - Small dbcache (450MB): ~1-3% performance improvement (despite more frequent flushes)
  - Large dbcache (4500MB): ~6-8% performance improvement due to fewer heap allocations (and basically the same number of flushes)
  - Very large dbcache (4500MB): ~5-6% performance improvement due to fewer heap allocations (and memory limit not being reached, so there's no memory penalty)

  Full IBD and `-reindex-chainstate` also show an overall ~3-4% speedup (for both smaller and larger dbcache values).

  We haven't investigated using different `prevector` sizes based on script type, though this could be explored in the future if needed.

  ### Historical explanation for the speedup (by [Anthony Towns](https://github.com/bitcoin/bitcoin/pull/32279#issuecomment-3111757079))

  > I think the tradeoff is something like:
  >
  > * spends of p2pk, p2sh, p2pkh coins -- these cost 8 more bytes
  > * spends of p2wpkh -- these cost 16 more bytes (sPK and scriptSig didn't need an allocation)
  > * spends of p2wsh and p2tr -- these cost ~48 fewer bytes (save 64 byte allocation on 64bit system, lose 8 bytes for both scriptSig and sPK)
  > * spends of nested p2wsh -- presumably save ~96 bytes, since the scriptSig would save an allocation, but I'm bundling it in the previous section
  >
  > Based on mainnet.observer stats for 2025-05-08, p2wpkh is about 55% of txs, p2tr is about 28%, p2pkh about 13%, p2wsh about 4% and the rest is noise, maybe? Those numbers net out to a saving of ~5.5 bytes per input. If p2wpkh rose from 55% to 80% and p2tr dropped to 20%, that would net to wasting ~3.2 bytes per input.
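
  The arithmetic in the quote can be reproduced as a share-weighted sum of the per-spend byte deltas. This is only a sketch: `SpendClass`, `NetBytesPerInput`, `QuotedMix` and `HypotheticalMix` are hypothetical names, the deltas are the approximate figures from the quote (positive means more memory, negative means less), and the "noise" categories are ignored:

  ```cpp
  #include <vector>

  struct SpendClass {
      const char* name;
      double share;       // fraction of inputs of this type
      double delta_bytes; // approximate change in bytes per input
  };

  // Share-weighted sum of the per-type deltas.
  inline double NetBytesPerInput(const std::vector<SpendClass>& mix)
  {
      double net = 0.0;
      for (const auto& spend : mix) net += spend.share * spend.delta_bytes;
      return net;
  }

  // Mix quoted for 2025-05-08: comes out at about -5.5, i.e. a saving.
  inline double QuotedMix()
  {
      const std::vector<SpendClass> mix{
          {"p2wpkh", 0.55, 16.0},
          {"p2tr", 0.28, -48.0},
          {"p2pkh", 0.13, 8.0},
          {"p2wsh", 0.04, -48.0},
      };
      return NetBytesPerInput(mix);
  }

  // Hypothetical future mix from the quote: comes out at about +3.2, i.e. a cost.
  inline double HypotheticalMix()
  {
      const std::vector<SpendClass> mix{
          {"p2wpkh", 0.80, 16.0},
          {"p2tr", 0.20, -48.0},
      };
      return NetBytesPerInput(mix);
  }
  ```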

ACKs for top commit:
  maflcko:
    review ACK d5104cfbae 🐺
  achow101:
    reACK d5104cfbae
  jonatack:
    Review ACK d5104cfbae
  andrewtoth:
    ACK d5104cfbae

Tree-SHA512: 7c5271ebaf4f6d91dc4b679ecbde4b7d0467579f072289f30da988a17c38a552d0b8cdf0e9c001739975dd019894c35e541908571527916cec56e04a8e214ae2
Ava Chow
2025-07-28 11:55:48 -07:00
6 changed files with 131 additions and 28 deletions


@@ -8,6 +8,7 @@
 #include <key.h>
 #include <prevector.h>
 #include <random.h>
+#include <script/script.h>
 #include <cstddef>
 #include <cstdint>
@@ -16,7 +17,6 @@
 static const size_t BATCHES = 101;
 static const size_t BATCH_SIZE = 30;
-static const int PREVECTOR_SIZE = 28;
 static const unsigned int QUEUE_BATCH_SIZE = 128;
 // This Benchmark tests the CheckQueue with a slightly realistic workload,
@@ -30,9 +30,9 @@ static void CCheckQueueSpeedPrevectorJob(benchmark::Bench& bench)
     ECC_Context ecc_context{};
     struct PrevectorJob {
-        prevector<PREVECTOR_SIZE, uint8_t> p;
+        prevector<CScriptBase::STATIC_SIZE, uint8_t> p;
         explicit PrevectorJob(FastRandomContext& insecure_rand){
-            p.resize(insecure_rand.randrange(PREVECTOR_SIZE*2));
+            p.resize(insecure_rand.randrange(CScriptBase::STATIC_SIZE * 2));
         }
         std::optional<int> operator()()
         {


@@ -5,17 +5,20 @@
 #include <prevector.h>
 #include <bench/bench.h>
+#include <script/script.h>
 #include <serialize.h>
 #include <streams.h>
 #include <type_traits>
 #include <vector>
-struct nontrivial_t {
+struct nontrivial_t
+{
     int x{-1};
     nontrivial_t() = default;
     SERIALIZE_METHODS(nontrivial_t, obj) { READWRITE(obj.x); }
 };
 static_assert(!std::is_trivially_default_constructible_v<nontrivial_t>,
               "expected nontrivial_t to not be trivially constructible");
@@ -27,22 +30,22 @@ template <typename T>
 static void PrevectorDestructor(benchmark::Bench& bench)
 {
     bench.batch(2).run([&] {
-        prevector<28, T> t0;
-        prevector<28, T> t1;
-        t0.resize(28);
-        t1.resize(29);
+        prevector<CScriptBase::STATIC_SIZE, T> t0;
+        prevector<CScriptBase::STATIC_SIZE, T> t1;
+        t0.resize(CScriptBase::STATIC_SIZE);
+        t1.resize(CScriptBase::STATIC_SIZE + 1);
     });
 }
 template <typename T>
 static void PrevectorClear(benchmark::Bench& bench)
 {
-    prevector<28, T> t0;
-    prevector<28, T> t1;
+    prevector<CScriptBase::STATIC_SIZE, T> t0;
+    prevector<CScriptBase::STATIC_SIZE, T> t1;
     bench.batch(2).run([&] {
-        t0.resize(28);
+        t0.resize(CScriptBase::STATIC_SIZE);
         t0.clear();
-        t1.resize(29);
+        t1.resize(CScriptBase::STATIC_SIZE + 1);
         t1.clear();
     });
 }
@@ -50,12 +53,12 @@ static void PrevectorClear(benchmark::Bench& bench)
 template <typename T>
 static void PrevectorResize(benchmark::Bench& bench)
 {
-    prevector<28, T> t0;
-    prevector<28, T> t1;
+    prevector<CScriptBase::STATIC_SIZE, T> t0;
+    prevector<CScriptBase::STATIC_SIZE, T> t1;
     bench.batch(4).run([&] {
-        t0.resize(28);
+        t0.resize(CScriptBase::STATIC_SIZE);
         t0.resize(0);
-        t1.resize(29);
+        t1.resize(CScriptBase::STATIC_SIZE + 1);
         t1.resize(0);
     });
 }
@@ -64,8 +67,8 @@ template <typename T>
 static void PrevectorDeserialize(benchmark::Bench& bench)
 {
     DataStream s0{};
-    prevector<28, T> t0;
-    t0.resize(28);
+    prevector<CScriptBase::STATIC_SIZE, T> t0;
+    t0.resize(CScriptBase::STATIC_SIZE);
     for (auto x = 0; x < 900; ++x) {
         s0 << t0;
     }
@@ -74,7 +77,7 @@ static void PrevectorDeserialize(benchmark::Bench& bench)
         s0 << t0;
     }
     bench.batch(1000).run([&] {
-        prevector<28, T> t1;
+        prevector<CScriptBase::STATIC_SIZE, T> t1;
         for (auto x = 0; x < 1000; ++x) {
             s0 >> t1;
         }
@@ -86,7 +89,7 @@ template <typename T>
 static void PrevectorFillVectorDirect(benchmark::Bench& bench)
 {
     bench.run([&] {
-        std::vector<prevector<28, T>> vec;
+        std::vector<prevector<CScriptBase::STATIC_SIZE, T>> vec;
         vec.reserve(260);
         for (size_t i = 0; i < 260; ++i) {
             vec.emplace_back();
@@ -99,11 +102,11 @@ template <typename T>
 static void PrevectorFillVectorIndirect(benchmark::Bench& bench)
 {
     bench.run([&] {
-        std::vector<prevector<28, T>> vec;
+        std::vector<prevector<CScriptBase::STATIC_SIZE, T>> vec;
         vec.reserve(260);
         for (size_t i = 0; i < 260; ++i) {
             // force allocation
-            vec.emplace_back(29, T{});
+            vec.emplace_back(CScriptBase::STATIC_SIZE + 1, T{});
         }
     });
 }


@@ -38,6 +38,8 @@ class prevector {
     static_assert(std::is_trivially_copyable_v<T>);
 public:
+    static constexpr unsigned int STATIC_SIZE{N};
     typedef Size size_type;
     typedef Diff difference_type;
     typedef T value_type;


@@ -403,10 +403,8 @@ private:
 /**
  * We use a prevector for the script to reduce the considerable memory overhead
  * of vectors in cases where they normally contain a small number of small elements.
- * Tests in October 2015 showed use of this reduced dbcache memory usage by 23%
- * and made an initial sync 13% faster.
  */
-typedef prevector<28, unsigned char> CScriptBase;
+using CScriptBase = prevector<36, uint8_t>;
 bool GetScriptOp(CScriptBase::const_iterator& pc, CScriptBase::const_iterator end, opcodetype& opcodeRet, std::vector<unsigned char>* pvchRet);


@@ -1151,6 +1151,107 @@ BOOST_AUTO_TEST_CASE(script_CHECKMULTISIG23)
     BOOST_CHECK_MESSAGE(err == SCRIPT_ERR_INVALID_STACK_OPERATION, ScriptErrorString(err));
 }
+/** Return the TxoutType of a script without exposing Solver details. */
+static TxoutType GetTxoutType(const CScript& output_script)
+{
+    std::vector<std::vector<uint8_t>> unused;
+    return Solver(output_script, unused);
+}
+#define CHECK_SCRIPT_STATIC_SIZE(script, expected_size) \
+    do { \
+        BOOST_CHECK_EQUAL((script).size(), (expected_size)); \
+        BOOST_CHECK_EQUAL((script).capacity(), CScriptBase::STATIC_SIZE); \
+        BOOST_CHECK_EQUAL((script).allocated_memory(), 0); \
+    } while (0)
+#define CHECK_SCRIPT_DYNAMIC_SIZE(script, expected_size, expected_extra) \
+    do { \
+        BOOST_CHECK_EQUAL((script).size(), (expected_size)); \
+        BOOST_CHECK_EQUAL((script).capacity(), (expected_extra)); \
+        BOOST_CHECK_EQUAL((script).allocated_memory(), (expected_extra)); \
+    } while (0)
+BOOST_AUTO_TEST_CASE(script_size_and_capacity_test)
+{
+    BOOST_CHECK_EQUAL(sizeof(CompressedScript), 40);
+    BOOST_CHECK_EQUAL(sizeof(CScriptBase), 40);
+    BOOST_CHECK_NE(sizeof(CScriptBase), sizeof(prevector<CScriptBase::STATIC_SIZE + 1, uint8_t>)); // CScriptBase size should be set to avoid wasting space in padding
+    BOOST_CHECK_EQUAL(sizeof(CScript), 40);
+    BOOST_CHECK_EQUAL(sizeof(CTxOut), 48);
+    CKey dummy_key;
+    dummy_key.MakeNewKey(/*fCompressed=*/true);
+    const CPubKey dummy_pubkey{dummy_key.GetPubKey()};
+    // Small OP_RETURN has direct allocation
+    {
+        const auto script{CScript() << OP_RETURN << std::vector<uint8_t>(10, 0xaa)};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::NULL_DATA);
+        CHECK_SCRIPT_STATIC_SIZE(script, 12);
+    }
+    // P2WPKH has direct allocation
+    {
+        const auto script{GetScriptForDestination(WitnessV0KeyHash{PKHash{dummy_pubkey}})};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::WITNESS_V0_KEYHASH);
+        CHECK_SCRIPT_STATIC_SIZE(script, 22);
+    }
+    // P2SH has direct allocation
+    {
+        const auto script{GetScriptForDestination(ScriptHash{CScript{} << OP_TRUE})};
+        BOOST_CHECK(script.IsPayToScriptHash());
+        CHECK_SCRIPT_STATIC_SIZE(script, 23);
+    }
+    // P2PKH has direct allocation
+    {
+        const auto script{GetScriptForDestination(PKHash{dummy_pubkey})};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::PUBKEYHASH);
+        CHECK_SCRIPT_STATIC_SIZE(script, 25);
+    }
+    // P2WSH has direct allocation
+    {
+        const auto script{GetScriptForDestination(WitnessV0ScriptHash{CScript{} << OP_TRUE})};
+        BOOST_CHECK(script.IsPayToWitnessScriptHash());
+        CHECK_SCRIPT_STATIC_SIZE(script, 34);
+    }
+    // P2TR has direct allocation
+    {
+        const auto script{GetScriptForDestination(WitnessV1Taproot{XOnlyPubKey{dummy_pubkey}})};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::WITNESS_V1_TAPROOT);
+        CHECK_SCRIPT_STATIC_SIZE(script, 34);
+    }
+    // Compressed P2PK has direct allocation
+    {
+        const auto script{GetScriptForRawPubKey(dummy_pubkey)};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::PUBKEY);
+        CHECK_SCRIPT_STATIC_SIZE(script, 35);
+    }
+    // Uncompressed P2PK needs extra allocation
+    {
+        CKey uncompressed_key;
+        uncompressed_key.MakeNewKey(/*fCompressed=*/false);
+        const CPubKey uncompressed_pubkey{uncompressed_key.GetPubKey()};
+        const auto script{GetScriptForRawPubKey(uncompressed_pubkey)};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::PUBKEY);
+        CHECK_SCRIPT_DYNAMIC_SIZE(script, 67, 67);
+    }
+    // Bare multisig needs extra allocation
+    {
+        const auto script{GetScriptForMultisig(1, std::vector{2, dummy_pubkey})};
+        BOOST_CHECK_EQUAL(GetTxoutType(script), TxoutType::MULTISIG);
+        CHECK_SCRIPT_DYNAMIC_SIZE(script, 71, 103);
+    }
+}
 /* Wrapper around ProduceSignature to combine two scriptsigs */
 SignatureData CombineSignatures(const CTxOut& txout, const CMutableTransaction& tx, const SignatureData& scriptSig1, const SignatureData& scriptSig2)
 {


@@ -26,9 +26,8 @@ BOOST_AUTO_TEST_CASE(getcoinscachesizestate)
     LOCK(::cs_main);
     auto& view = chainstate.CoinsTip();
-    // The number of bytes consumed by coin's heap data, i.e. CScript
-    // (prevector<28, unsigned char>) when assigned 56 bytes of data per above.
-    //
+    // The number of bytes consumed by coin's heap data, i.e.
+    // CScript (prevector<36, unsigned char>) when assigned 56 bytes of data per above.
     // See also: Coin::DynamicMemoryUsage().
     constexpr unsigned int COIN_SIZE = is_64_bit ? 80 : 64;