Commit Graph

230 Commits

Author SHA1 Message Date
merge-script
a005fdff6c Merge bitcoin/bitcoin#34074: A few followups after introducing /rest/blockpart/ endpoint
59b93f11e8 rest: print also HTTP response reason in case of an error (Roman Zeyde)
7fe94a0493 rest: add a test for unsuported `/blockpart/` request type (Roman Zeyde)
55d0d19b5c rest: deduplicate `interface_rest.py` negative tests (Roman Zeyde)
89eb531024 rest: update release notes for `/blockpart/` endpoint (Roman Zeyde)
41118e17f8 blockstorage: simplify partial block read validation (Roman Zeyde)
599effdeab rest: reformat `uri_prefixes` initializer list (Roman Zeyde)

Pull request description:

  The commits below should resolve a few leftovers from #33657.

ACKs for top commit:
  l0rinc:
    ACK 59b93f11e8
  hodlinator:
    re-ACK 59b93f11e8

Tree-SHA512: ae45e08edd315018e11283b354fb32f9658f5829c956554dc662a81c2e16397def7c3700e6354e0a91ff03c850def35638a69ec2668b7c015d25d6fed42b92bb
2025-12-17 15:09:15 +00:00
merge-script
4f11ef058b Merge bitcoin/bitcoin#30214: refactor: Improve assumeutxo state representation
82be652e40 doc: Improve ChainstateManager documentation, use consistent terms (Ryan Ofsky)
af455dcb39 refactor: Simplify pruning functions (TheCharlatan)
ae85c495f1 refactor: Delete ChainstateManager::GetAll() method (Ryan Ofsky)
6a572dbda9 refactor: Add ChainstateManager::ActivateBestChains() method (Ryan Ofsky)
491d827d52 refactor: Add ChainstateManager::m_chainstates member (Ryan Ofsky)
e514fe6116 refactor: Delete ChainstateManager::SnapshotBlockhash() method (Ryan Ofsky)
ee35250683 refactor: Delete ChainstateManager::IsSnapshotValidated() method (Ryan Ofsky)
d9e82299fc refactor: Delete ChainstateManager::IsSnapshotActive() method (Ryan Ofsky)
4dfe383912 refactor: Convert ChainstateRole enum to struct (Ryan Ofsky)
352ad27fc1 refactor: Add ChainstateManager::ValidatedChainstate() method (Ryan Ofsky)
a229cb9477 refactor: Add ChainstateManager::CurrentChainstate() method (Ryan Ofsky)
a9b7f5614c refactor: Add Chainstate::StoragePath() method (Ryan Ofsky)
840bd2ef23 refactor: Pass chainstate parameters to MaybeCompleteSnapshotValidation (Ryan Ofsky)
1598a15aed refactor: Deduplicate Chainstate activation code (Ryan Ofsky)
9fe927b6d6 refactor: Add Chainstate m_assumeutxo and m_target_utxohash members (Ryan Ofsky)
6082c84713 refactor: Add Chainstate::m_target_blockhash member (Ryan Ofsky)
de00e87548 test: Fix broken chainstatemanager_snapshot_init check (Ryan Ofsky)

Pull request description:

  This PR contains the first part of #28608, which tries to make assumeutxo code more maintainable, and improve it by not locking `cs_main` for a long time when the snapshot block is connected, and by deleting the snapshot validation chainstate when it is no longer used, instead of waiting until the next restart.

  The changes in this PR are just refactoring. They make `Chainstate` objects self-contained, so for example, it is possible to determine what blocks to connect to a chainstate without querying `ChainstateManager`, and to determine whether a Chainstate is validated without basing it on inferences like `&cs != &ActiveChainstate()` or `GetAll().size() == 1`.

  The PR also tries to make assumeutxo terminology less confusing, using "current chainstate" to refer to the chainstate targeting the current network tip, and "historical chainstate" to refer to the chainstate downloading old blocks and validating the assumeutxo snapshot. It removes uses of the terms "active chainstate," "usable chainstate," "disabled chainstate," "ibd chainstate," and "snapshot chainstate" which are confusing for various reasons.

ACKs for top commit:
  maflcko:
    re-review ACK 82be652e40 🕍
  fjahr:
    re-ACK 82be652e40
  sedited:
    Re-ACK 82be652e40

Tree-SHA512: 81c67abba9fc5bb170e32b7bf8a1e4f7b5592315b4ef720be916d5f1f5a7088c0c59cfb697744dd385552f58aa31ee36176bae6a6e465723e65861089a1252e5
2025-12-16 14:03:34 +00:00
Roman Zeyde
41118e17f8 blockstorage: simplify partial block read validation
Use `SaturatingAdd` following https://github.com/bitcoin/bitcoin/pull/33657#discussion_r2610832092.
2025-12-14 10:44:12 +01:00
merge-script
938d7aacab Merge bitcoin/bitcoin#33657: rest: allow reading partial block data from storage
07135290c1 rest: allow reading partial block data from storage (Roman Zeyde)
4e2af1c065 blockstorage: allow reading partial block data from storage (Roman Zeyde)
f2fd1aa21c blockstorage: return an error code from `ReadRawBlock()` (Roman Zeyde)

Pull request description:

  It allows fetching specific transactions using an external index, following https://github.com/bitcoin/bitcoin/pull/32541#issuecomment-3267485313.

  Currently, electrs and other indexers map between an address/scripthash to the list of the relevant transactions.

  However, in order to fetch those transactions from bitcoind, electrs relies on reading the whole block and post-filtering for a specific transaction[^1]. Other indexers use a `txindex` to fetch a transaction using its txid [^2][^3][^4].

  The above approach has significant storage and CPU overhead, since the `txid` is a pseudo-random 32-byte value. Also, mainnet `txindex` takes ~60GB today.

  This PR is adding support for using the transaction's position within its block to be able to fetch it directly using [REST API](https://github.com/bitcoin/bitcoin/blob/master/doc/REST-interface.md), using the following HTTP request:

  ```
  GET /rest/blockpart/BLOCKHASH.bin?offset=OFFSET&size=SIZE
  ```

  - The offsets' index can be encoded much more efficiently ([~1.3GB today](https://github.com/romanz/bindex-rs/pull/66#issuecomment-3508476436)).

  - Address history query performance can be tested on mainnet using [1BitcoinEaterAddressDontSendf59kuE](https://mempool.space/address/1BitcoinEaterAddressDontSendf59kuE) - assuming warm OS block cache, [it takes <1s to fetch 5200 txs, i.e. <0.2ms per tx](https://github.com/romanz/bindex-rs/pull/66#issuecomment-3508476436) with [bindex](https://github.com/romanz/bindex-rs).

  - Only binary and hex response formats are supported.

  [^1]: https://github.com/romanz/electrs/blob/master/doc/schema.md
  [^2]: https://github.com/Blockstream/electrs/blob/new-index/doc/schema.md#txstore
  [^3]: https://github.com/spesmilo/electrumx/blob/master/docs/HOWTO.rst#prerequisites
  [^4]: https://github.com/cculianu/Fulcrum/blob/master/README.md#requirements

ACKs for top commit:
  maflcko:
    review ACK 07135290c1 🏪
  l0rinc:
    ACK 07135290c1
  hodlinator:
    re-ACK 07135290c1

Tree-SHA512: bcce7bf4b9a3e5e920ab5a83e656f50d5d7840cdde6b7147d329cf578f8a2db555fc1aa5334e8ee64d5630d25839ece77a2cf421c6c3ac1fa379bb453163bd4f
2025-12-12 13:22:00 +00:00
TheCharlatan
af455dcb39 refactor: Simplify pruning functions
Move GetPruneRange from ChainstateManager to Chainstate.
2025-12-12 11:49:59 +01:00
Ryan Ofsky
ae85c495f1 refactor: Delete ChainstateManager::GetAll() method
Just use m_chainstates array instead.
2025-12-12 06:49:59 -04:00
Ryan Ofsky
6a572dbda9 refactor: Add ChainstateManager::ActivateBestChains() method
Deduplicate code looping over chainstate objects and calling
ActivateBestChain() and avoid need for code outside ChainstateManager to use
the GetAll() method.
2025-12-12 06:49:59 -04:00
Ryan Ofsky
4dfe383912 refactor: Convert ChainstateRole enum to struct
Change ChainstateRole parameter passed to wallets and indexes. Wallets and
indexes need to know whether chainstate is historical and whether it is fully
validated. They should not be aware of the assumeutxo snapshot validation
process.
2025-12-12 06:49:59 -04:00
Roman Zeyde
4e2af1c065 blockstorage: allow reading partial block data from storage
It will allow fetching specific transactions using an external index,
following https://github.com/bitcoin/bitcoin/pull/32541#issuecomment-3267485313.

No logging takes place in case of an invalid offset/size (to avoid spamming the log),
by using a new `ReadRawError::BadPartRange` error variant.

Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
2025-12-11 18:54:55 +01:00
Roman Zeyde
f2fd1aa21c blockstorage: return an error code from ReadRawBlock()
It will enable different error handling flows for different error types.

Also, `ReadRawBlockBench` performance has decreased due to no longer reusing a vector
with an unchanging capacity - mirroring our production code behavior.

Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
2025-12-11 18:54:55 +01:00
MarcoFalke
fa89f60e31 scripted-diff: LogPrintLevel(*,BCLog::Level::*,*) -> LogError()/LogWarning()
This is a minimal behavior change and changes log output from:

  [net:error] Something bad happened
  [net:warning] Something problematic happened

to either

  [error] Something bad happened
  [warning] Something problematic happened

or, when -loglevelalways=1 is enabled:

  [all:error] Something bad happened
  [all:warning] Something problematic happened

Such a behavior change is desired, because all warning and error logs
are written in the same style in the source code and they are logged in
the same format for log consumers.

-BEGIN VERIFY SCRIPT-

 sed --regexp-extended --in-place \
   's/LogPrintLevel\((BCLog::[^,]*), BCLog::Level::(Error|Warning), */Log\2(/g' \
   $( git grep -l LogPrintLevel ':(exclude)src/test/logging_tests.cpp' )

-END VERIFY SCRIPT-
2025-12-09 10:44:33 +01:00
MarcoFalke
fa45a1503e log: Use LogWarning for non-critical logs
As per doc/developer-notes#logging, LogWarning should be used for severe
problems that do not warrant shutting down the node
2025-11-27 14:33:59 +01:00
Andrew Toth
99d012ec80 refactor: return reference instead of pointer
The return value of BlockManager::GetFirstBlock must always be non-null. This
can be inferred by the implementation, which has an assertion that the return
value is not null. A raw pointer should only be returned if the result may be
null. In this case a reference is more appropriate.
2025-11-13 09:57:42 -05:00
merge-script
3789215f73 Merge bitcoin/bitcoin#33724: refactor: Return uint64_t from GetSerializeSize
fa6c0bedd3 refactor: Return uint64_t from GetSerializeSize (MarcoFalke)
fad0c8680e refactor: Use uint64_t over size_t for serialized-size values (MarcoFalke)
fa4f388fc9 refactor: Use fixed size ints over (un)signed ints for serialized values (MarcoFalke)
fa01f38e53 move-only: Move CBlockFileInfo to kernel namespace (MarcoFalke)
fa2bbc9e4c refactor: [rpc] Remove cast when reporting serialized size (MarcoFalke)
fa364af89b test: Remove outdated comment (MarcoFalke)

Pull request description:

  Consensus code should arrive at the same conclusion, regardless of the architecture it runs on. Using architecture-specific types such as `size_t` can lead to issues, such as the low-severity [CVE-2025-46597](https://bitcoincore.org/en/2025/10/24/disclose-cve-2025-46597/).

  The CVE was already worked around, but it may be good to still fix the underlying issue.

  Fixes https://github.com/bitcoin/bitcoin/issues/33709 with a few refactors to use explicit fixed-sized integer types in serialization-size related code and concluding with a refactor to return `uint64_t` from `GetSerializeSize`. The refactors should not change any behavior, because the CVE was already worked around.

ACKs for top commit:
  Crypt-iQ:
    crACK fa6c0bedd3
  l0rinc:
    ACK fa6c0bedd3
  laanwj:
    Code review ACK fa6c0bedd3

Tree-SHA512: f45057bd86fb46011e4cb3edf0dc607057d72ed869fd6ad636562111ae80fea233b2fc45c34b02256331028359a9c3f4fa73e9b882b225bdc089d00becd0195e
2025-11-12 09:48:10 -05:00
Ava Chow
a4e96cae7d Merge bitcoin/bitcoin#33042: refactor: inline constant return values from dbwrapper write methods
743abbcbde refactor: inline constant return value of `BlockTreeDB::WriteBatchSync` and `BlockManager::WriteBlockIndexDB` and `BlockTreeDB::WriteFlag` (Lőrinc)
e030240e90 refactor: inline constant return value of `CDBWrapper::Erase` and `BlockTreeDB::WriteReindexing` (Lőrinc)
cdab9480e9 refactor: inline constant return value of `CDBWrapper::Write` (Lőrinc)
d1847cf5b5 refactor: inline constant return value of `TxIndex::DB::WriteTxs` (Lőrinc)
50b63a5698 refactor: inline constant return value of `CDBWrapper::WriteBatch` (Lőrinc)

Pull request description:

  Related to https://github.com/bitcoin/bitcoin/pull/31144#discussion_r2223587480

  ### Summary
  `WriteBatch` always returns `true` - the errors are handled by throwing `dbwrapper_error` instead.

  ### Context
  This boolean return value of the `Write` methods is confusing because it's inconsistent with `CDBWrapper::Read`, which catches exceptions and returns a boolean to indicate success/failure. It's bad that `Read` returns and `Write` throws - but it's a lot worse that `Write` advertises a return value when it actually communicates errors through exceptions.

  ### Solution
  This PR removes the constant return values from write methods and inlines `true` at their call sites. Many upstream methods had boolean return values only because they were propagating these constants - those have been cleaned up as well.

  Methods that returned a constant `true` value that now return `void`:
  - `CDBWrapper::WriteBatch`, `CDBWrapper::Write`, `CDBWrapper::Erase`
  - `TxIndex::DB::WriteTxs`
  - `BlockTreeDB::WriteReindexing`, `BlockTreeDB::WriteBatchSync`, `BlockTreeDB::WriteFlag`
  - `BlockManager::WriteBlockIndexDB`

  ### Note
  `CCoinsView::BatchWrite` (and transitively `CCoinsViewCache::Flush` & `CCoinsViewCache::Sync`) were intentionally not changed here. While all implementations return `true`, the base `CCoinsView::BatchWrite` returns `false`. Changing this would cause `coins_view` tests to fail with:
  > terminating due to uncaught exception of type std::logic_error: Not all unspent flagged entries were cleared

  We can fix that in a follow-up PR.

ACKs for top commit:
  achow101:
    ACK 743abbcbde
  janb84:
    ACK 743abbcbde
  TheCharlatan:
    ACK 743abbcbde
  sipa:
    ACK 743abbcbde

Tree-SHA512: b2a550bff066216f1958d2dd9a7ef6a9949de518cc636f8ab9c670e0b7a330c1eb8c838e458a8629acb8ac980cea6616955cd84436a7b8ab9096f6d648073b1e
2025-11-10 09:15:24 -08:00
MarcoFalke
fa01f38e53 move-only: Move CBlockFileInfo to kernel namespace
Also, move it to the blockstorage module, because it is only used inside
that module.

Can be reviewed with the git option --color-moved=dimmed-zebra
2025-10-28 16:08:44 +01:00
Lőrinc
743abbcbde refactor: inline constant return value of BlockTreeDB::WriteBatchSync and BlockManager::WriteBlockIndexDB and BlockTreeDB::WriteFlag 2025-08-13 15:47:48 -07:00
Lőrinc
e030240e90 refactor: inline constant return value of CDBWrapper::Erase and BlockTreeDB::WriteReindexing
Did both in this commit, since the return value of `WriteReindexing` was ignored anyway - which existed only because of the constant `Erase` being called
2025-08-13 15:47:48 -07:00
Lőrinc
cdab9480e9 refactor: inline constant return value of CDBWrapper::Write 2025-08-13 15:47:48 -07:00
Lőrinc
50b63a5698 refactor: inline constant return value of CDBWrapper::WriteBatch
`WriteBatch` can only ever return `true` - its errors are handled by throwing a `throw dbwrapper_error` instead.
The boolean return value is quite confusing, especially since it's symmetric with `CDBWrapper::Read`, which catches the exceptions and returns a boolean instead.
We're removing the constant return value and inlining `true` for its usages.
2025-08-13 15:47:39 -07:00
Sergi Delgado Segura
18524b072e Make nSequenceId init value constants
Make it easier to follow what the values come without having to go
over the comments, plus easier to maintain
2025-07-28 10:15:17 -04:00
Sergi Delgado Segura
8b91883a23 Set the same best tip on restart if two candidates have the same work
Before this, if we had two (or more) same work tip candidates and restarted our node,
it could be the case that the block set as tip after bootstrap didn't match the one
before stopping. That's because the work and `nSequenceId` of both block will be the same
(the latter is only kept in memory), so the active chain after restart would have depended
on what tip candidate was loaded first.

This makes sure that we are consistent over reboots.
2025-07-28 10:15:14 -04:00
Sergi Delgado Segura
ab145cb3b4 Updates CBlockIndexWorkComparator outdated comment 2025-07-28 10:11:34 -04:00
MarcoFalke
face8123fd log: [refactor] Use info level for init logs
This refactor does not change behavior.
2025-07-25 09:50:50 +02:00
MarcoFalke
fa183761cb log: Remove function name from init logs
It is redundant with -logsourcelocations and the log messages are
clearer without it.

Also, remove a double-space.

Also, add braces around `if` touched in the next commit.

This tiny behavior change requires a test fixup.
2025-07-25 09:50:24 +02:00
Lőrinc
478d40afc6 refactor: encapsulate vector/array keys into Obfuscation 2025-07-16 14:33:07 -07:00
Lőrinc
0b8bec8aa6 scripted-diff: unify xor-vs-obfuscation nomenclature
Mechanical refactor of the low-level "xor" wording to signal the intent instead of the implementation used.
The renames are ordered by heaviest-hitting substitutions first, and were constructed such that after each replacement the code is still compilable.

-BEGIN VERIFY SCRIPT-
sed -i \
  -e 's/\bGetObfuscateKey\b/GetObfuscation/g' \
  -e 's/\bxor_key\b/obfuscation/g' \
  -e 's/\bxor_pat\b/obfuscation/g' \
  -e 's/\bm_xor_key\b/m_obfuscation/g' \
  -e 's/\bm_xor\b/m_obfuscation/g' \
  -e 's/\bobfuscate_key\b/m_obfuscation/g' \
  -e 's/\bOBFUSCATE_KEY_KEY\b/OBFUSCATION_KEY_KEY/g' \
  -e 's/\bSetXor(/SetObfuscation(/g' \
  -e 's/\bdata_xor\b/obfuscation/g' \
  -e 's/\bCreateObfuscateKey\b/CreateObfuscation/g' \
  -e 's/\bobfuscate key\b/obfuscation key/g' \
  $(git ls-files '*.cpp' '*.h')
-END VERIFY SCRIPT-
2025-07-16 14:32:01 -07:00
Lőrinc
54ab0bd64c refactor: commit to 8 byte obfuscation keys
Since 31 byte xor-keys are not used in the codebase, using the common size (8 bytes) makes the benchmarks more realistic.

Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
2025-07-16 13:19:18 -07:00
Ava Chow
ea4285775e Merge bitcoin/bitcoin#29307: util: explicitly close all AutoFiles that have been written
c10e382d2a flatfile: check whether the file has been closed successfully (Vasil Dimov)
4bb5dd78ea util: check that a file has been closed before ~AutoFile() is called (Vasil Dimov)
8bb34f07df Explicitly close all AutoFiles that have been written (Vasil Dimov)
a69c4098b2 rpc: take ownership of the file by WriteUTXOSnapshot() (Hodlinator)

Pull request description:

  `fclose(3)` may fail to flush the previously written data to disk, thus a failing `fclose(3)` is as serious as a failing `fwrite(3)`.

  Previously the code ignored `fclose(3)` failures. This PR improves that by changing all users of `AutoFile` that use it to write data to explicitly close the file and handle a possible error.

  ---

  Other alternatives are:

  1. `fflush(3)` after each write to the file (and throw if it fails from the `AutoFile::write()` method) and hope that `fclose(3)` will then always succeed. Assert that it succeeds from the destructor 🙄. Will hurt performance.
  2. Throw nevertheless from the destructor. Exception within the exception in C++ I think results in terminating the program without a useful message.
  3. (this is implemented in the latest incarnation of this PR) Redesign `AutoFile` so that its destructor cannot fail. Adjust _all_ its users 😭. For example, if the file has been written to, then require the callers to explicitly call the `AutoFile::fclose()` method before the object goes out of scope. In the destructor, as a sanity check, assume/assert that this is indeed the case. Defeats the purpose of a RAII wrapper for `FILE*` which automatically closes the file when it goes out of scope and there are a lot of users of `AutoFile`.
  4. Pass a new callback function to the `AutoFile` constructor which will be called from the destructor to handle `fclose()` errors, as described in https://github.com/bitcoin/bitcoin/pull/29307#issuecomment-2243842400. My thinking is that if that callback is going to only log a message, then we can log the message directly from the destructor without needing a callback. If the callback is going to do more complicated error handling then it is easier to do that at the call site by directly calling `AutoFile::fclose()` instead of getting the `AutoFile` object out of scope (so that its destructor is called) and inspecting for side effects done by the callback (e.g. set a variable to indicate a failed `fclose()`).

ACKs for top commit:
  l0rinc:
    ACK c10e382d2a
  achow101:
    ACK c10e382d2a
  hodlinator:
    re-ACK c10e382d2a

Tree-SHA512: 3994ca57e5b2b649fc84f24dad144173b7500fc0e914e06291d5c32fbbf8d2b1f8eae0040abd7a5f16095ddf4e11fe1636c6092f49058cda34f3eb2ee536d7ba
2025-07-03 15:37:44 -07:00
Vasil Dimov
8bb34f07df Explicitly close all AutoFiles that have been written
There is no way to report a close error from `AutoFile` destructor.
Such an error could be serious if the file has been written to because
it may mean the file is now corrupted (same as if write fails).

So, change all users of `AutoFile` that use it to write data to
explicitly close the file and handle a possible error.
2025-06-16 15:33:15 +02:00
Roman Zeyde
6ecb9fc65f chore: use std::vector<std::byte> for BlockManager::ReadRawBlock() 2025-06-13 19:19:44 +03:00
Lőrinc
09ee8b7f27 node: avoid recomputing block hash in ReadBlock
Eliminate one SHA‑256 double‑hash computation of the header per block read by reusing the hash for:
* proof‑of‑work verification;
* (optional) integrity check against the supplied hash.
2025-05-26 23:23:44 +02:00
fanquake
2b85d31bcc refactor: starts/ends_with changes for clang-tidy 20 2025-04-22 13:16:54 +01:00
Lőrinc
8d801e3efb optimization: bulk serialization writes in WriteBlockUndo and WriteBlock
Similarly to the serialization reads optimization, buffered writes will enable batched XOR calculations.
This is especially beneficial since the current implementation requires copying the write input's `std::span` to perform obfuscation.
Batching allows us to apply XOR operations on the internal buffer instead, reducing unnecessary data copying and improving performance.

------

> macOS Sequoia 15.3.1
> C++ compiler .......................... Clang 19.1.7
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='WriteBlockBench' -min-time=10000

Before:

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|        5,149,564.31 |              194.19 |    0.8% |     10.95 | `WriteBlockBench`

After:

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|        2,990,564.63 |              334.39 |    1.5% |     11.27 | `WriteBlockBench`

------

> Ubuntu 24.04.2 LTS
> C++ compiler .......................... GNU 13.3.0
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='WriteBlockBench' -min-time=20000

Before:

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        5,152,973.58 |              194.06 |    2.2% |   19,350,886.41 |    8,784,539.75 |  2.203 |   3,079,335.21 |    0.4% |     23.18 | `WriteBlockBench`

After:

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        4,145,681.13 |              241.21 |    4.0% |   15,337,596.85 |    5,732,186.47 |  2.676 |   2,239,662.64 |    0.1% |     23.94 | `WriteBlockBench`

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
Co-authored-by: Cory Fields <cory-nospam-@coryfields.com>
2025-04-14 12:04:06 +02:00
Lőrinc
520965e293 optimization: bulk serialization reads in UndoRead, ReadBlock
The obfuscation (XOR) operations are currently done byte-by-byte during serialization. Buffering the reads will enable batching the obfuscation operations later.

Different operating systems handle file caching differently, so reading larger batches (and processing them from memory) is measurably faster, likely because of fewer native fread calls and reduced lock contention.

Note that `ReadRawBlock` doesn't need buffering since it already reads the whole block directly.
Unlike `ReadBlockUndo`, the new `ReadBlock` implementation delegates to `ReadRawBlock`, which uses more memory than a buffered alternative but results in slightly simpler code and a small performance increase (~0.4%). This approach also clearly documents that `ReadRawBlock` is a logical subset of `ReadBlock` functionality.

The current implementation, which iterates over a fixed-size buffer, provides a more general alternative to Cory Fields' solution of reading the entire block size in advance.

Buffer sizes were selected based on benchmarking to ensure the buffered reader produces performance similar to reading the whole block into memory. Smaller buffers were slower, while larger ones showed diminishing returns.

------

> macOS Sequoia 15.3.1
> C++ compiler .......................... Clang 19.1.7
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='ReadBlockBench' -min-time=10000

Before:

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|        2,271,441.67 |              440.25 |    0.1% |     11.00 | `ReadBlockBench`

After:

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|        1,738,971.29 |              575.05 |    0.2% |     10.97 | `ReadBlockBench`

------

> Ubuntu 24.04.2 LTS
> C++ compiler .......................... GNU 13.3.0
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='ReadBlockBench' -min-time=20000

Before:

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        6,895,987.11 |              145.01 |    0.0% |   71,055,269.86 |   23,977,374.37 |  2.963 |   5,074,828.78 |    0.4% |     22.00 | `ReadBlockBench`

After:

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        5,771,882.71 |              173.25 |    0.0% |   65,741,889.82 |   20,453,232.33 |  3.214 |   3,971,321.75 |    0.3% |     22.01 | `ReadBlockBench`

Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
Co-authored-by: Martin Leitner-Ankerl <martin.ankerl@gmail.com>
Co-authored-by: Cory Fields <cory-nospam-@coryfields.com>
2025-04-14 12:04:06 +02:00
Lőrinc
056cb3c0d2 refactor: clear up blockstorage/streams in preparation for optimization
Made every OpenBlockFile#fReadOnly value explicit.

Replaced hard-coded values in ReadRawBlock with STORAGE_HEADER_BYTES.
Changed `STORAGE_HEADER_BYTES` and `UNDO_DATA_DISK_OVERHEAD` to `uint32_t` to avoid casts.

Also added `LIFETIMEBOUND` to the `AutoFile` parameter of `BufferedFile`, which stores a reference to the underlying `AutoFile`, allowing Clang to emit warnings if the referenced `AutoFile` might be destroyed while `BufferedFile` still exists.
Without this attribute, code with lifetime violations wouldn't trigger compiler warnings.

Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
2025-04-14 11:57:14 +02:00
Lőrinc
67fcc64802 log: unify error messages for (read/write)[undo]block
Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
2025-04-13 23:44:46 +02:00
Lőrinc
a4de160492 scripted-diff: shorten BLOCK_SERIALIZATION_HEADER_SIZE constant
Renames the constant to be less verbose and better reflect its purpose:
it represents the size of the storage header that precedes serialized block data on disk,
not to be confused with a block's own header.

-BEGIN VERIFY SCRIPT-
git grep -q "STORAGE_HEADER_BYTES" $(git ls-files) && echo "Error: Target name STORAGE_HEADER_BYTES already exists in the codebase" && exit 1
sed -i 's/BLOCK_SERIALIZATION_HEADER_SIZE/STORAGE_HEADER_BYTES/g' $(git grep -l 'BLOCK_SERIALIZATION_HEADER_SIZE')
-END VERIFY SCRIPT-
2025-04-13 23:44:46 +02:00
Lőrinc
6640dd52c9 Narrow scope of undofile write to avoid possible resource management issue
`AutoFile{OpenUndoFile(pos)}` was still in scope when `FlushUndoFile(pos.nFile)` was called, which could lead to file handle conflicts or other unexpected behavior.

Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
2025-04-13 23:44:46 +02:00
Lőrinc
3197155f91 refactor: collect block read operations into try block
Reorganized error handling in block-related operations by grouping related operations together within the same scope.

In `ReadBlockUndo()` and `ReadBlock()`, moved all deserialization operations, comments and checksum verification inside a single try/catch block for cleaner error handling.
In `WriteBlockUndo()`, consolidated hash calculation and data writing operations within a common block to better express their logical relationship.
2025-04-13 23:44:44 +02:00
marcofleon
3c5d1a4681 Remove checkpoints
The headers presync logic should be enough to prevent memory DoS using
low-work headers. Therefore, we no longer have any use for checkpoints.
2025-03-13 11:13:13 +00:00
Ava Chow
601a6a6917 Merge bitcoin/bitcoin#30965: kernel: Move block tree db open to block manager
0cdddeb224 kernel: Move block tree db open to BlockManager constructor (TheCharlatan)
7fbb1bc44b kernel: Move block tree db open to block manager (TheCharlatan)
57ba59c0cd refactor: Remove redundant reindex check (TheCharlatan)

Pull request description:

  Before this change the block tree db was needlessly re-opened during startup when loading a completed snapshot. Improve this by letting the block manager open it on construction. This also simplifies the test code a bit.

  The change was initially motivated to make it easier for users of the kernel library to instantiate a BlockManager that may be used to read data from disk without loading the block index into a cache.

ACKs for top commit:
  maflcko:
    re-ACK 0cdddeb224 🏪
  achow101:
    ACK 0cdddeb224
  mzumsande:
    re-ACK 0cdddeb224

Tree-SHA512: fe3d557a725367e549e6a0659f64259cfef6aaa565ec867d9a177be0143ff18a2c4a20dd57e35e15f97cf870df476d88c05b03b6a7d9e8d51c568d9eda8947ef
2025-01-31 15:28:06 -05:00
Ava Chow
9ecc7af41f Merge bitcoin/bitcoin#31674: init: Lock blocksdir in addition to datadir
2656a5658c tests: add a test for the new blocksdir lock (Cory Fields)
bdc0a68e67 init: lock blocksdir in addition to datadir (Cory Fields)
cabb2e5c24 refactor: introduce a more general LockDirectories for init (Cory Fields)
1db331ba76 init: allow a new xor key to be written if the blocksdir is newly created (Cory Fields)

Pull request description:

  This probably should've been included in #12653 when `-blocksdir` was introduced. Credit TheCharlatan for noticing that it's missing.

  This guards against 2 processes running with separate datadirs but the same blocksdir. I didn't add `walletdir` as I assume sqlite has us covered there.

  It's not likely to happen currently, but may be more relevant in the future with applications using the kernel. Note that the kernel does not currently do any dir locking, but it should.

ACKs for top commit:
  maflcko:
    review ACK 2656a5658c 🏼
  kevkevinpal:
    ACK [2656a56](2656a5658c)
  achow101:
    ACK 2656a5658c
  tdb3:
    Code review and light test ACK 2656a5658c

Tree-SHA512: 3ba17dc670126adda104148e14d1322ea4f67d671c84aaa9c08c760ef778ca1936832c0dc843cd6367e09939f64c6f0a682b0fa23a5967e821b899dff1fff961
2025-01-24 18:15:00 -05:00
TheCharlatan
0cdddeb224 kernel: Move block tree db open to BlockManager constructor
Make the block db open RAII style by calling it in the BlockManager
constructor.

Before this change the block tree db was needlessly re-opened during
startup when loading a completed snapshot. Improve this by letting the
block manager open it on construction. This also simplifies the test
code a bit.

The change was initially motivated to make it easier for users of the
kernel library to instantiate a BlockManager that may be used to read
data from disk without loading the block index into a cache.
2025-01-20 21:27:50 +01:00
Cory Fields
1db331ba76 init: allow a new xor key to be written if the blocksdir is newly created
A subsequent commit will add a .lock file to this dir at startup, meaning that
the blocksdir is never empty by the time the xor key is being read/written.

Ignore all hidden files when determining if this is the first run.
2025-01-16 21:06:21 +00:00
Lőrinc
223081ece6 scripted-diff: rename block and undo functions for consistency
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>

-BEGIN VERIFY SCRIPT-
grep -r -wE 'WriteBlock|ReadRawBlock|ReadBlock|WriteBlockUndo|ReadBlockUndo' $(git ls-files src/ ':!src/leveldb') && \
    echo "Error: One or more target names already exist!" && exit 1
sed -i \
    -e 's/\bSaveBlockToDisk/WriteBlock/g' \
    -e 's/\bReadRawBlockFromDisk/ReadRawBlock/g' \
    -e 's/\bReadBlockFromDisk/ReadBlock/g' \
    -e 's/\bWriteUndoDataForBlock/WriteBlockUndo/g' \
    -e 's/\bUndoReadFromDisk/ReadBlockUndo/g' \
    $(git ls-files src/ ':!src/leveldb')
-END VERIFY SCRIPT-
2025-01-09 15:17:02 +01:00
Lőrinc
baaa3b2846 refactor,blocks: remove costly asserts and modernize affected logs
When the behavior was changes in a previous commit (caching `GetSerializeSize` and avoiding `AutoFile.tell`), (static)asserts were added to make sure the behavior was kept - to make sure reviewers and CI validates it.
We can safely remove them now.

Logs were also slightly modernized since they were trivial to do.

Co-authored-by: Anthony Towns <aj@erisian.com.au>
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
2025-01-09 15:16:49 +01:00
Lőrinc
fa39f27a0f refactor,blocks: deduplicate block's serialized size calculations
For consistency `UNDO_DATA_DISK_OVERHEAD` was also extracted to avoid the constant's ambiguity.
Asserts were added to help with the review - they are removed in the next commit.

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
2025-01-09 15:16:28 +01:00
Lőrinc
dfb2f9d004 refactor,blocks: inline WriteBlockToDisk
Similarly, `WriteBlockToDisk` wasn't really extracting a meaningful subset of the `SaveBlockToDisk` functionality, it's tied closely to the only caller (needs the header size twice, recalculated block serializes size, returns multiple branches, mutates parameter).

The inlined code should only differ in these parts (modernization will be done in other commits):
* renamed `blockPos` to `pos` in `SaveBlockToDisk` to match the parameter name;
* changed `return false` to `return FlatFilePos()`.

Also removed remaining references to `SaveBlockToDisk`.

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
2025-01-09 13:24:53 +01:00
Lőrinc
42bc491465 refactor,blocks: inline UndoWriteToDisk
`UndoWriteToDisk` wasn't really extracting a meaningful subset of the `WriteUndoDataForBlock` functionality, it's tied closely to the only caller (needs the header size twice, recalculated undo serializes size, returns multiple branches, modifies parameter, needs documentation).

The inlined code should only differ in these parts (modernization will be done in other commits):
* renamed `_pos` to `pos` in `WriteUndoDataForBlock` to match the parameter name;
* inlined `hashBlock` parameter usage into `hasher << block.pprev->GetBlockHash()`;
* changed `return false` to `return FatalError`;
* capitalize comment.

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
2025-01-09 13:18:22 +01:00