Commit Graph

207 Commits

Author SHA1 Message Date
MarcoFalke
face8123fd log: [refactor] Use info level for init logs
This refactor does not change behavior.
2025-07-25 09:50:50 +02:00
MarcoFalke
fa183761cb log: Remove function name from init logs
It is redundant with -logsourcelocations and the log messages are
clearer without it.

Also, remove a double-space.

Also, add braces around `if` touched in the next commit.

This tiny behavior change requires a test fixup.
2025-07-25 09:50:24 +02:00
Lőrinc
478d40afc6 refactor: encapsulate vector/array keys into Obfuscation 2025-07-16 14:33:07 -07:00
Lőrinc
0b8bec8aa6 scripted-diff: unify xor-vs-obfuscation nomenclature
Mechanical refactor of the low-level "xor" wording to signal the intent instead of the implementation used.
The renames are ordered by heaviest-hitting substitutions first, and were constructed such that after each replacement the code is still compilable.

-BEGIN VERIFY SCRIPT-
sed -i \
  -e 's/\bGetObfuscateKey\b/GetObfuscation/g' \
  -e 's/\bxor_key\b/obfuscation/g' \
  -e 's/\bxor_pat\b/obfuscation/g' \
  -e 's/\bm_xor_key\b/m_obfuscation/g' \
  -e 's/\bm_xor\b/m_obfuscation/g' \
  -e 's/\bobfuscate_key\b/m_obfuscation/g' \
  -e 's/\bOBFUSCATE_KEY_KEY\b/OBFUSCATION_KEY_KEY/g' \
  -e 's/\bSetXor(/SetObfuscation(/g' \
  -e 's/\bdata_xor\b/obfuscation/g' \
  -e 's/\bCreateObfuscateKey\b/CreateObfuscation/g' \
  -e 's/\bobfuscate key\b/obfuscation key/g' \
  $(git ls-files '*.cpp' '*.h')
-END VERIFY SCRIPT-
2025-07-16 14:32:01 -07:00
Lőrinc
54ab0bd64c refactor: commit to 8 byte obfuscation keys
Since 31 byte xor-keys are not used in the codebase, using the common size (8 bytes) makes the benchmarks more realistic.

Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
2025-07-16 13:19:18 -07:00
Ava Chow
ea4285775e Merge bitcoin/bitcoin#29307: util: explicitly close all AutoFiles that have been written
c10e382d2a flatfile: check whether the file has been closed successfully (Vasil Dimov)
4bb5dd78ea util: check that a file has been closed before ~AutoFile() is called (Vasil Dimov)
8bb34f07df Explicitly close all AutoFiles that have been written (Vasil Dimov)
a69c4098b2 rpc: take ownership of the file by WriteUTXOSnapshot() (Hodlinator)

Pull request description:

  `fclose(3)` may fail to flush the previously written data to disk, thus a failing `fclose(3)` is as serious as a failing `fwrite(3)`.

  Previously the code ignored `fclose(3)` failures. This PR improves that by changing all users of `AutoFile` that use it to write data to explicitly close the file and handle a possible error.

  ---

  Other alternatives are:

  1. `fflush(3)` after each write to the file (and throw if it fails from the `AutoFile::write()` method) and hope that `fclose(3)` will then always succeed. Assert that it succeeds from the destructor 🙄. Will hurt performance.
  2. Throw nevertheless from the destructor. Exception within the exception in C++ I think results in terminating the program without a useful message.
  3. (this is implemented in the latest incarnation of this PR) Redesign `AutoFile` so that its destructor cannot fail. Adjust _all_ its users 😭. For example, if the file has been written to, then require the callers to explicitly call the `AutoFile::fclose()` method before the object goes out of scope. In the destructor, as a sanity check, assume/assert that this is indeed the case. Defeats the purpose of a RAII wrapper for `FILE*` which automatically closes the file when it goes out of scope and there are a lot of users of `AutoFile`.
  4. Pass a new callback function to the `AutoFile` constructor which will be called from the destructor to handle `fclose()` errors, as described in https://github.com/bitcoin/bitcoin/pull/29307#issuecomment-2243842400. My thinking is that if that callback is going to only log a message, then we can log the message directly from the destructor without needing a callback. If the callback is going to do more complicated error handling then it is easier to do that at the call site by directly calling `AutoFile::fclose()` instead of getting the `AutoFile` object out of scope (so that its destructor is called) and inspecting for side effects done by the callback (e.g. set a variable to indicate a failed `fclose()`).

ACKs for top commit:
  l0rinc:
    ACK c10e382d2a
  achow101:
    ACK c10e382d2a
  hodlinator:
    re-ACK c10e382d2a

Tree-SHA512: 3994ca57e5b2b649fc84f24dad144173b7500fc0e914e06291d5c32fbbf8d2b1f8eae0040abd7a5f16095ddf4e11fe1636c6092f49058cda34f3eb2ee536d7ba
2025-07-03 15:37:44 -07:00
Vasil Dimov
8bb34f07df Explicitly close all AutoFiles that have been written
There is no way to report a close error from `AutoFile` destructor.
Such an error could be serious if the file has been written to because
it may mean the file is now corrupted (same as if write fails).

So, change all users of `AutoFile` that use it to write data to
explicitly close the file and handle a possible error.
2025-06-16 15:33:15 +02:00
Roman Zeyde
6ecb9fc65f chore: use std::vector<std::byte> for BlockManager::ReadRawBlock() 2025-06-13 19:19:44 +03:00
Lőrinc
09ee8b7f27 node: avoid recomputing block hash in ReadBlock
Eliminate one SHA‑256 double‑hash computation of the header per block read by reusing the hash for:
* proof‑of‑work verification;
* (optional) integrity check against the supplied hash.
2025-05-26 23:23:44 +02:00
fanquake
2b85d31bcc refactor: starts/ends_with changes for clang-tidy 20 2025-04-22 13:16:54 +01:00
Lőrinc
8d801e3efb optimization: bulk serialization writes in WriteBlockUndo and WriteBlock
Similarly to the serialization reads optimization, buffered writes will enable batched XOR calculations.
This is especially beneficial since the current implementation requires copying the write input's `std::span` to perform obfuscation.
Batching allows us to apply XOR operations on the internal buffer instead, reducing unnecessary data copying and improving performance.

------

> macOS Sequoia 15.3.1
> C++ compiler .......................... Clang 19.1.7
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='WriteBlockBench' -min-time=10000

Before:

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|        5,149,564.31 |              194.19 |    0.8% |     10.95 | `WriteBlockBench`

After:

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|        2,990,564.63 |              334.39 |    1.5% |     11.27 | `WriteBlockBench`

------

> Ubuntu 24.04.2 LTS
> C++ compiler .......................... GNU 13.3.0
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='WriteBlockBench' -min-time=20000

Before:

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        5,152,973.58 |              194.06 |    2.2% |   19,350,886.41 |    8,784,539.75 |  2.203 |   3,079,335.21 |    0.4% |     23.18 | `WriteBlockBench`

After:

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        4,145,681.13 |              241.21 |    4.0% |   15,337,596.85 |    5,732,186.47 |  2.676 |   2,239,662.64 |    0.1% |     23.94 | `WriteBlockBench`

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
Co-authored-by: Cory Fields <cory-nospam-@coryfields.com>
2025-04-14 12:04:06 +02:00
Lőrinc
520965e293 optimization: bulk serialization reads in UndoRead, ReadBlock
The obfuscation (XOR) operations are currently done byte-by-byte during serialization. Buffering the reads will enable batching the obfuscation operations later.

Different operating systems handle file caching differently, so reading larger batches (and processing them from memory) is measurably faster, likely because of fewer native fread calls and reduced lock contention.

Note that `ReadRawBlock` doesn't need buffering since it already reads the whole block directly.
Unlike `ReadBlockUndo`, the new `ReadBlock` implementation delegates to `ReadRawBlock`, which uses more memory than a buffered alternative but results in slightly simpler code and a small performance increase (~0.4%). This approach also clearly documents that `ReadRawBlock` is a logical subset of `ReadBlock` functionality.

The current implementation, which iterates over a fixed-size buffer, provides a more general alternative to Cory Fields' solution of reading the entire block size in advance.

Buffer sizes were selected based on benchmarking to ensure the buffered reader produces performance similar to reading the whole block into memory. Smaller buffers were slower, while larger ones showed diminishing returns.

------

> macOS Sequoia 15.3.1
> C++ compiler .......................... Clang 19.1.7
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='ReadBlockBench' -min-time=10000

Before:

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|        2,271,441.67 |              440.25 |    0.1% |     11.00 | `ReadBlockBench`

After:

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|        1,738,971.29 |              575.05 |    0.2% |     10.97 | `ReadBlockBench`

------

> Ubuntu 24.04.2 LTS
> C++ compiler .......................... GNU 13.3.0
> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='ReadBlockBench' -min-time=20000

Before:

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        6,895,987.11 |              145.01 |    0.0% |   71,055,269.86 |   23,977,374.37 |  2.963 |   5,074,828.78 |    0.4% |     22.00 | `ReadBlockBench`

After:

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        5,771,882.71 |              173.25 |    0.0% |   65,741,889.82 |   20,453,232.33 |  3.214 |   3,971,321.75 |    0.3% |     22.01 | `ReadBlockBench`

Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
Co-authored-by: Martin Leitner-Ankerl <martin.ankerl@gmail.com>
Co-authored-by: Cory Fields <cory-nospam-@coryfields.com>
2025-04-14 12:04:06 +02:00
Lőrinc
056cb3c0d2 refactor: clear up blockstorage/streams in preparation for optimization
Made every OpenBlockFile#fReadOnly value explicit.

Replaced hard-coded values in ReadRawBlock with STORAGE_HEADER_BYTES.
Changed `STORAGE_HEADER_BYTES` and `UNDO_DATA_DISK_OVERHEAD` to `uint32_t` to avoid casts.

Also added `LIFETIMEBOUND` to the `AutoFile` parameter of `BufferedFile`, which stores a reference to the underlying `AutoFile`, allowing Clang to emit warnings if the referenced `AutoFile` might be destroyed while `BufferedFile` still exists.
Without this attribute, code with lifetime violations wouldn't trigger compiler warnings.

Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
2025-04-14 11:57:14 +02:00
Lőrinc
67fcc64802 log: unify error messages for (read/write)[undo]block
Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
2025-04-13 23:44:46 +02:00
Lőrinc
a4de160492 scripted-diff: shorten BLOCK_SERIALIZATION_HEADER_SIZE constant
Renames the constant to be less verbose and better reflect its purpose:
it represents the size of the storage header that precedes serialized block data on disk,
not to be confused with a block's own header.

-BEGIN VERIFY SCRIPT-
git grep -q "STORAGE_HEADER_BYTES" $(git ls-files) && echo "Error: Target name STORAGE_HEADER_BYTES already exists in the codebase" && exit 1
sed -i 's/BLOCK_SERIALIZATION_HEADER_SIZE/STORAGE_HEADER_BYTES/g' $(git grep -l 'BLOCK_SERIALIZATION_HEADER_SIZE')
-END VERIFY SCRIPT-
2025-04-13 23:44:46 +02:00
Lőrinc
6640dd52c9 Narrow scope of undofile write to avoid possible resource management issue
`AutoFile{OpenUndoFile(pos)}` was still in scope when `FlushUndoFile(pos.nFile)` was called, which could lead to file handle conflicts or other unexpected behavior.

Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
2025-04-13 23:44:46 +02:00
Lőrinc
3197155f91 refactor: collect block read operations into try block
Reorganized error handling in block-related operations by grouping related operations together within the same scope.

In `ReadBlockUndo()` and `ReadBlock()`, moved all deserialization operations, comments and checksum verification inside a single try/catch block for cleaner error handling.
In `WriteBlockUndo()`, consolidated hash calculation and data writing operations within a common block to better express their logical relationship.
2025-04-13 23:44:44 +02:00
marcofleon
3c5d1a4681 Remove checkpoints
The headers presync logic should be enough to prevent memory DoS using
low-work headers. Therefore, we no longer have any use for checkpoints.
2025-03-13 11:13:13 +00:00
Ava Chow
601a6a6917 Merge bitcoin/bitcoin#30965: kernel: Move block tree db open to block manager
0cdddeb224 kernel: Move block tree db open to BlockManager constructor (TheCharlatan)
7fbb1bc44b kernel: Move block tree db open to block manager (TheCharlatan)
57ba59c0cd refactor: Remove redundant reindex check (TheCharlatan)

Pull request description:

  Before this change the block tree db was needlessly re-opened during startup when loading a completed snapshot. Improve this by letting the block manager open it on construction. This also simplifies the test code a bit.

  The change was initially motivated to make it easier for users of the kernel library to instantiate a BlockManager that may be used to read data from disk without loading the block index into a cache.

ACKs for top commit:
  maflcko:
    re-ACK 0cdddeb224 🏪
  achow101:
    ACK 0cdddeb224
  mzumsande:
    re-ACK 0cdddeb224

Tree-SHA512: fe3d557a725367e549e6a0659f64259cfef6aaa565ec867d9a177be0143ff18a2c4a20dd57e35e15f97cf870df476d88c05b03b6a7d9e8d51c568d9eda8947ef
2025-01-31 15:28:06 -05:00
Ava Chow
9ecc7af41f Merge bitcoin/bitcoin#31674: init: Lock blocksdir in addition to datadir
2656a5658c tests: add a test for the new blocksdir lock (Cory Fields)
bdc0a68e67 init: lock blocksdir in addition to datadir (Cory Fields)
cabb2e5c24 refactor: introduce a more general LockDirectories for init (Cory Fields)
1db331ba76 init: allow a new xor key to be written if the blocksdir is newly created (Cory Fields)

Pull request description:

  This probably should've been included in #12653 when `-blocksdir` was introduced. Credit TheCharlatan for noticing that it's missing.

  This guards against 2 processes running with separate datadirs but the same blocksdir. I didn't add `walletdir` as I assume sqlite has us covered there.

  It's not likely to happen currently, but may be more relevant in the future with applications using the kernel. Note that the kernel does not currently do any dir locking, but it should.

ACKs for top commit:
  maflcko:
    review ACK 2656a5658c 🏼
  kevkevinpal:
    ACK [2656a56](2656a5658c)
  achow101:
    ACK 2656a5658c
  tdb3:
    Code review and light test ACK 2656a5658c

Tree-SHA512: 3ba17dc670126adda104148e14d1322ea4f67d671c84aaa9c08c760ef778ca1936832c0dc843cd6367e09939f64c6f0a682b0fa23a5967e821b899dff1fff961
2025-01-24 18:15:00 -05:00
TheCharlatan
0cdddeb224 kernel: Move block tree db open to BlockManager constructor
Make the block db open RAII style by calling it in the BlockManager
constructor.

Before this change the block tree db was needlessly re-opened during
startup when loading a completed snapshot. Improve this by letting the
block manager open it on construction. This also simplifies the test
code a bit.

The change was initially motivated to make it easier for users of the
kernel library to instantiate a BlockManager that may be used to read
data from disk without loading the block index into a cache.
2025-01-20 21:27:50 +01:00
Cory Fields
1db331ba76 init: allow a new xor key to be written if the blocksdir is newly created
A subsequent commit will add a .lock file to this dir at startup, meaning that
the blocksdir is never empty by the time the xor key is being read/written.

Ignore all hidden files when determining if this is the first run.
2025-01-16 21:06:21 +00:00
Lőrinc
223081ece6 scripted-diff: rename block and undo functions for consistency
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>

-BEGIN VERIFY SCRIPT-
grep -r -wE 'WriteBlock|ReadRawBlock|ReadBlock|WriteBlockUndo|ReadBlockUndo' $(git ls-files src/ ':!src/leveldb') && \
    echo "Error: One or more target names already exist!" && exit 1
sed -i \
    -e 's/\bSaveBlockToDisk/WriteBlock/g' \
    -e 's/\bReadRawBlockFromDisk/ReadRawBlock/g' \
    -e 's/\bReadBlockFromDisk/ReadBlock/g' \
    -e 's/\bWriteUndoDataForBlock/WriteBlockUndo/g' \
    -e 's/\bUndoReadFromDisk/ReadBlockUndo/g' \
    $(git ls-files src/ ':!src/leveldb')
-END VERIFY SCRIPT-
2025-01-09 15:17:02 +01:00
Lőrinc
baaa3b2846 refactor,blocks: remove costly asserts and modernize affected logs
When the behavior was changes in a previous commit (caching `GetSerializeSize` and avoiding `AutoFile.tell`), (static)asserts were added to make sure the behavior was kept - to make sure reviewers and CI validates it.
We can safely remove them now.

Logs were also slightly modernized since they were trivial to do.

Co-authored-by: Anthony Towns <aj@erisian.com.au>
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
2025-01-09 15:16:49 +01:00
Lőrinc
fa39f27a0f refactor,blocks: deduplicate block's serialized size calculations
For consistency `UNDO_DATA_DISK_OVERHEAD` was also extracted to avoid the constant's ambiguity.
Asserts were added to help with the review - they are removed in the next commit.

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
2025-01-09 15:16:28 +01:00
Lőrinc
dfb2f9d004 refactor,blocks: inline WriteBlockToDisk
Similarly, `WriteBlockToDisk` wasn't really extracting a meaningful subset of the `SaveBlockToDisk` functionality, it's tied closely to the only caller (needs the header size twice, recalculated block serializes size, returns multiple branches, mutates parameter).

The inlined code should only differ in these parts (modernization will be done in other commits):
* renamed `blockPos` to `pos` in `SaveBlockToDisk` to match the parameter name;
* changed `return false` to `return FlatFilePos()`.

Also removed remaining references to `SaveBlockToDisk`.

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
2025-01-09 13:24:53 +01:00
Lőrinc
42bc491465 refactor,blocks: inline UndoWriteToDisk
`UndoWriteToDisk` wasn't really extracting a meaningful subset of the `WriteUndoDataForBlock` functionality, it's tied closely to the only caller (needs the header size twice, recalculated undo serializes size, returns multiple branches, modifies parameter, needs documentation).

The inlined code should only differ in these parts (modernization will be done in other commits):
* renamed `_pos` to `pos` in `WriteUndoDataForBlock` to match the parameter name;
* inlined `hashBlock` parameter usage into `hasher << block.pprev->GetBlockHash()`;
* changed `return false` to `return FatalError`;
* capitalize comment.

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
2025-01-09 13:18:22 +01:00
Pieter Wuille
67a3d59076 streams: remove unused code 2024-09-19 07:33:02 -04:00
Pieter Wuille
e624a9bef1 streams: cache file position within AutoFile 2024-09-13 07:35:41 -04:00
Ava Chow
d4b5553849 Merge bitcoin/bitcoin#30742: kernel: Use spans instead of vectors for passing block headers to validation functions
a2955f0979 validation: Use span for ImportBlocks paths (TheCharlatan)
20515ea3f5 validation: Use span for CalculateClaimedHeadersWork (TheCharlatan)
52575e96e7 validation: Use span for ProcessNewBlockHeaders (TheCharlatan)

Pull request description:

  Makes it friendlier for potential future users of the kernel library if they do not store the headers in a std::vector, but can guarantee contiguous memory.

  Take this opportunity to also change the argument of ImportBlocks previously taking a `std::vector` to a `std::span`.

ACKs for top commit:
  stickies-v:
    re-ACK a2955f0979 - no changes except further walking the ~file~ path of modernizing variable names.
  maflcko:
    ACK a2955f0979 🕑
  achow101:
    ACK a2955f0979
  danielabrozzoni:
    ACK a2955f0979

Tree-SHA512: 8b07f4ad26e270b65600d1968cd78847b85caca5bfbb83fd9860389f26656b1d9a40b85e0990339f50403d18cedcd2456990054f3b8b0bedce943e50222d2709
2024-09-03 15:40:40 -04:00
TheCharlatan
a2955f0979 validation: Use span for ImportBlocks paths
Makes it friendlier for potential future users of the kernel library if
they do not store the headers in a std::vector, but can guarantee
contiguous memory.
2024-08-30 12:39:46 +02:00
MarcoFalke
3333415890 scripted-diff: LogPrint -> LogDebug
-BEGIN VERIFY SCRIPT-
 sed -i 's/\<LogPrint\>/LogDebug/g' $( git grep -l '\<LogPrint\>'  -- ./contrib/ ./src/ ./test/ ':(exclude)src/logging.h' )
-END VERIFY SCRIPT-
2024-08-29 13:49:57 +02:00
stickies-v
2925bd537c refactor: use c++20 std::views::reverse instead of reverse_iterator.h
Use std::ranges::views::reverse instead of the implementation in
reverse_iterator.h, and remove it as it is no longer used.
2024-08-06 00:23:38 +01:00
Ava Chow
949b673472 Merge bitcoin/bitcoin#28052: blockstorage: XOR blocksdir *.dat files
fa895c7283 mingw: Document mode wbx workaround (MarcoFalke)
fa359255fe Add -blocksxor boolean option (MarcoFalke)
fa7f7ac040 Return XOR AutoFile from BlockManager::Open*File() (MarcoFalke)

Pull request description:

  Currently the *.dat files in the blocksdir store the data received from remote peers as-is. This may be problematic when a program other than Bitcoin Core tries to interpret them by accident. For example, an anti-virus program or other program may scan them and move them into quarantine, or delete them, or corrupt them. This may cause Bitcoin Core to fail a reorg, or fail to reply to block requests (via P2P, RPC, REST, ...).

  Fix this, similar to https://github.com/bitcoin/bitcoin/pull/6650, by rolling a random XOR pattern over the dat files when writing or reading them.

  Obviously this can only protect against programs that accidentally and unintentionally are trying to mess with the dat files. Any program that intentionally wants to mess with the dat files can still trivially do so.

  The XOR pattern is only applied when the blocksdir is freshly created, and there is an option to disable it (on creation), so that people can disable it, if needed.

ACKs for top commit:
  achow101:
    ACK fa895c7283
  TheCharlatan:
    Re-ACK fa895c7283
  hodlinator:
    ACK fa895c7283

Tree-SHA512: c92a6a717da83bc33a9b8671a779eeefde2c63b192362ba1d71e6535ee31d08e2802b74acc908345197de9daac6930e4771595ee25b09acd5a67f7ea34854720
2024-08-05 17:52:42 -04:00
Fabian Jahr
bf0efb4fc7 scripted-diff: Modernize naming of nChainTx and nTxCount
-BEGIN VERIFY SCRIPT-
sed -i 's/nChainTx/m_chain_tx_count/g' $(git grep -l 'nChainTx' ./src)
sed -i 's/nTxCount/tx_count/g' $(git grep -l 'nTxCount' ./src)
-END VERIFY SCRIPT-
2024-08-04 14:24:43 +02:00
MarcoFalke
fa895c7283 mingw: Document mode wbx workaround 2024-07-26 17:31:15 +02:00
MarcoFalke
fa359255fe Add -blocksxor boolean option 2024-07-26 17:30:53 +02:00
MarcoFalke
fa7f7ac040 Return XOR AutoFile from BlockManager::Open*File()
This is a refactor, because the XOR key is empty.
2024-07-26 12:28:59 +02:00
TheCharlatan
7aa8994c6f refactor: Add FlatFileSeq member variables in BlockManager
Instead of constructing a new class every time a file operation is done,
construct them once for each of the undo and block file when a new
BlockManager is created.

In future, this might make it easier to introduce an abstract block
store.
2024-07-24 09:39:35 +02:00
Ryan Ofsky
8426e018bf Merge bitcoin/bitcoin#30428: log: LogError with FlatFilePos in UndoReadFromDisk
fa14e1d9d5 log: Fix __func__ in LogError in blockstorage module (MarcoFalke)
fad59a2f0f log: LogError with FlatFilePos in UndoReadFromDisk (MarcoFalke)
aaaa3323f3 refactor: Mark IsBlockPruned const (MarcoFalke)

Pull request description:

  These errors should never happen in normal operation. If they do,
  knowing the `FlatFilePos` may be useful to determine if data corruption
  happened. Also, handle the error `pos.IsNull()` as part of `OpenUndoFile`,
  because it may as well have happened due to data corruption.

  This mirrors the `LogError` behavior from `ReadBlockFromDisk`.

  Also, two other fixup commits in this module.

ACKs for top commit:
  kevkevinpal:
    ACK [fa14e1d](fa14e1d9d5)
  tdb3:
    cr and light test ACK fa14e1d9d5
  ryanofsky:
    Code review ACK fa14e1d9d5. This should make logging clearer and more consistent

Tree-SHA512: abb492a919b4796698d1de0a7874c8eae355422b992aa80dcd6b59c2de1ee0d2949f62b3cf649cd62892976fee640358f7522867ed9d48a595d6f8f4e619df50
2024-07-15 13:42:53 -04:00
MarcoFalke
fa14e1d9d5 log: Fix __func__ in LogError in blockstorage module
These errors should never happen. However, when they do happen, it is
useful to log the correct error location (function name).

For example, this fixes an incorrect "ConnectBlock()" in
"WriteUndoDataForBlock".
2024-07-11 16:34:43 +02:00
MarcoFalke
fad59a2f0f log: LogError with FlatFilePos in UndoReadFromDisk
These errors should never happen in normal operation. If they do,
knowing the FlatFilePos may be useful to determine if data corruption
happened. Also, handle the error pos.IsNull() as part of OpenUndoFile,
because it may as well have happened due to data corruption.

This mirrors the LogError behavior from ReadBlockFromDisk.
2024-07-11 16:22:31 +02:00
MarcoFalke
aaaa3323f3 refactor: Mark IsBlockPruned const
Member fields are used read-only in this method.
2024-07-11 15:39:19 +02:00
Ava Chow
f4849f6922 Merge bitcoin/bitcoin#29668: prune, rpc: Check undo data when finding pruneheight
8789dc8f31 doc: Add note to getblockfrompeer on missing undo data (Fabian Jahr)
4a1975008b rpc: Make pruneheight also reflect undo data presence (Fabian Jahr)
96b4facc91 refactor, blockstorage: Generalize GetFirstStoredBlock (Fabian Jahr)

Pull request description:

  The function `GetFirstStoredBlock()` helps us find the first block for which we have data. So far this function only looked for a block with `BLOCK_HAVE_DATA`. However, this doesn't mean that we also have the undo data of that block, and undo data might be required for what a user would like to do with those blocks. One example of how this might happen is if some blocks were fetched using the `getblockfrompeer` RPC. Blocks fetched from a peer will have data but no undo data.

  The first commit here allows `GetFirstStoredBlock()` to check for undo data as well by passing a parameter. This alone is useful for #29553 and I would use it there.

  In the second commit I am applying the undo check to the RPCs that report `pruneheight` to the user. I find this much more intuitive because I think the user expects to be able to do all operations on blocks up until the `pruneheight` but that is not the case if undo data is missing. I personally ran into this once before and now again when testing for assumeutxo when I had used `getblockfrompeer`. The following commit adds test coverage for this change of behavior.

  The last commit adds a note in the docs of `getblockfrompeer` that undo data will not be available.

ACKs for top commit:
  achow101:
    ACK 8789dc8f31
  furszy:
    Code review ACK 8789dc8f31.
  stickies-v:
    ACK 8789dc8f31

Tree-SHA512: 90ae8bdd07a496ade579aa25240609c61c9ed173ad38d30533f6c631fe674e5a41727478ade69ca4b71a571ad94c9da4b33ebba6b5d8821109313c2de3bdfb3d
2024-07-10 15:27:05 -04:00
Fabian Jahr
96b4facc91 refactor, blockstorage: Generalize GetFirstStoredBlock
GetFirstStoredBlock is generalized to check for any data status with a
status mask that needs to be passed as a parameter. To reflect this the
function is also renamed to GetFirstBlock.

Co-authored-by: stickies-v <stickies-v@protonmail.com>
2024-06-21 15:00:16 +02:00
Ryan Ofsky
f68cba29b3 blockman: Replace m_reindexing with m_blockfiles_indexed
This is a just a mechanical change, renaming and inverting the meaning
of the indexing variable.

"m_blockfiles_indexed" is a more straightforward name for this variable
because this variable just indicates whether or not
<datadir>/blocks/blk?????.dat files have been indexed in the
<datadir>/blocks/index LevelDB database. The name "m_reindexing" was
more confusing, it could be true even if -reindex was not specified, and
false when it was specified. Also, the previous name unnecessarily
required thinking about the whole reindexing process just to understand
simple checks in validation code about whether blocks were indexed.

The motivation for this change is to follow up on previous commits,
moving away from having multiple variables called "reindex" internally,
and instead naming variables individually after what they do and
represent.
2024-06-07 19:18:46 +02:00
Ava Chow
058af75874 Merge bitcoin/bitcoin#29817: kernel: De-globalize fReindex
b47bd95920 kernel: De-globalize fReindex (TheCharlatan)

Pull request description:

  fReindex is one of the last remaining globals exposed by the kernel library, so move it into the blockstorage class to reduce the amount of global mutable state and make the kernel library a bit less awkward to use.

  ---

  This pull request is part of the [libbitcoinkernel project](https://github.com/bitcoin/bitcoin/issues/27587).

ACKs for top commit:
  achow101:
    ACK b47bd95920
  ryanofsky:
    Code review ACK b47bd95920. I rereviewed the whole PR, but the only change since last review was reverting the bugfix https://github.com/bitcoin/bitcoin/pull/29817#discussion_r1578327024 and make the change a pure refactoring.
  mzumsande:
    Code Review ACK b47bd95920
  stickies-v:
    ACK b47bd95920

Tree-SHA512: f7399d01f93bc0c0c7428fe95d19b9d29b4ed00a4f1deabca78fb0c4fecb434ec971e890feecb105938b5247c926850b1b7b4a4a9caa333a061e40777d0c8463
2024-05-17 15:50:56 -04:00
Ryan Ofsky
2f53f2273d Merge bitcoin/bitcoin#29975: blockstorage: Separate reindexing from saving new blocks
e41667b720 blockstorage: Don't move cursor backwards in UpdateBlockInfo (Ryan Ofsky)
17103637c6 blockstorage: Rename FindBlockPos and have it return a FlatFilePos (Martin Zumsande)
d9e477c4dc validation, blockstorage: Separate code paths for reindex and saving new blocks (Martin Zumsande)
064859bbad blockstorage: split up FindBlockPos function (Martin Zumsande)
fdae638e83 doc: Improve doc for functions involved in saving blocks to disk (Martin Zumsande)
0d114e3cb2 blockstorage: Add Assume for fKnown / snapshot chainstate (Martin Zumsande)

Pull request description:

  `SaveBlockToDisk` / `FindBlockPos` are used for two purposes, depending on whether they are called during reindexing (`dbp` set,  `fKnown = true`) or in the "normal" case when adding new blocks (`dbp == nullptr`,  `fKnown = false`).
  The actual tasks are quite different
  - In normal mode, preparations for saving a new block are made, which is then saved: find the correct position on disk (maybe skipping to a new blk file), check for available disk space, update the blockfile info db, save the block.
  - during reindex, most of this is not necessary (the block is already on disk after all), only the blockfile info needs to rebuilt because reindex wiped the leveldb it's saved in.

  Using one function with many conditional statements for this leads to code that is hard to read / understand and bug-prone:
  - many code paths in `FindBlockPos` are conditional on `fKnown` or `!fKnown`
  - It's not really clear what actually needs to be done during reindex (we don't need to "save a block to disk" or "find a block pos" as the function names suggest)
  - logic that should be applied to only one of the two modes is sometimes applied to both (see first commit, or #27039)

  #24858 and #27039 were recent bugs directly related to the differences between reindexing and normal mode, and in both cases the simple fix took a long time to be reviewed and merged.

  This PR proposes to clean this code up by splitting out the reindex logic into a separate function (`UpdateBlockInfo`) which will be called directly from validation. As a result, `SaveBlockToDisk` and `FindBlockPos` only need to cover the non-reindex logic.

ACKs for top commit:
  paplorinc:
    ACK e41667b720
  TheCharlatan:
    Re-ACK e41667b720
  ryanofsky:
    Code review ACK e41667b720. Just improvements to comments since last review.

Tree-SHA512: a14ff9a0facf6b1e3c1cd724a2d19a79a25d4b48de64398fdd172671532a472bc10a20cbb64ac3a3e55814dcc877d0597a3e1699cabc4f9d9a86b439b6eaba20
2024-05-16 11:16:08 -04:00
TheCharlatan
b47bd95920 kernel: De-globalize fReindex
fReindex is one of the last remaining globals exposed by the kernel
library, so move it into the blockstorage class to reduce the amount of
global mutable state and make the kernel library a bit less awkward to
use.
2024-05-16 11:28:46 +02:00
Ryan Ofsky
e41667b720 blockstorage: Don't move cursor backwards in UpdateBlockInfo
Previously, it was possible to move the cursor back to an older file
during reindex if blocks are enocuntered out of order during reindex.
This would mean that MaxBlockfileNum() would be incorrect, and
a wrong DB_LAST_BLOCK could be written to disk.

This improves the logic by only ever moving the cursor forward (if possible)
but not backwards.

Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
2024-05-14 14:54:27 -04:00