Merge bitcoin/bitcoin#33657: rest: allow reading partial block data from storage

07135290c1 rest: allow reading partial block data from storage (Roman Zeyde)
4e2af1c065 blockstorage: allow reading partial block data from storage (Roman Zeyde)
f2fd1aa21c blockstorage: return an error code from `ReadRawBlock()` (Roman Zeyde)

Pull request description:

  It allows fetching specific transactions using an external index, following https://github.com/bitcoin/bitcoin/pull/32541#issuecomment-3267485313.

  Currently, electrs and other indexers map an address/scripthash to the list of relevant transactions.

  However, in order to fetch those transactions from bitcoind, electrs relies on reading the whole block and post-filtering for a specific transaction[^1]. Other indexers use a `txindex` to fetch a transaction using its txid [^2][^3][^4].

  The above approach has significant storage and CPU overhead, since the `txid` is a pseudo-random 32-byte value. Also, mainnet `txindex` takes ~60GB today.

  This PR adds support for fetching a transaction directly by its position within its block through the [REST API](https://github.com/bitcoin/bitcoin/blob/master/doc/REST-interface.md), using the following HTTP request:

  ```
  GET /rest/blockpart/BLOCKHASH.bin?offset=OFFSET&size=SIZE
  ```

  - The offsets' index can be encoded much more efficiently ([~1.3GB today](https://github.com/romanz/bindex-rs/pull/66#issuecomment-3508476436)).

  - Address history query performance can be tested on mainnet using [1BitcoinEaterAddressDontSendf59kuE](https://mempool.space/address/1BitcoinEaterAddressDontSendf59kuE) - assuming warm OS block cache, [it takes <1s to fetch 5200 txs, i.e. <0.2ms per tx](https://github.com/romanz/bindex-rs/pull/66#issuecomment-3508476436) with [bindex](https://github.com/romanz/bindex-rs).

  - Only binary and hex response formats are supported.
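
A minimal client-side sketch of how an indexer might build the request path shown above. `BlockPartPath` is a hypothetical helper, not part of Bitcoin Core; the only things taken from this PR are the `/rest/blockpart/` path, the `offset` and `size` query parameters, and the `bin`/`hex` formats. Judging by the `ReadRawBlock()` change in this merge, `offset` is counted from the first byte of the serialized block.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (an assumption for illustration, not Bitcoin Core code):
// builds the request path for GET /rest/blockpart/BLOCKHASH.<format>?offset=..&size=..
std::string BlockPartPath(std::string_view block_hash_hex,
                          std::size_t offset,
                          std::size_t size,
                          std::string_view format = "bin")
{
    // A block hash is 32 bytes, i.e. 64 hex characters.
    assert(block_hash_hex.size() == 64);
    // The PR description states that only binary and hex responses are supported.
    assert(format == "bin" || format == "hex");

    std::string path{"/rest/blockpart/"};
    path.append(block_hash_hex);
    path.push_back('.');
    path.append(format);
    path += "?offset=" + std::to_string(offset);
    path += "&size=" + std::to_string(size);
    return path;
}
```

An external index would store `(block hash or height, offset, size)` per transaction and pass those values straight into such a helper, so the node never has to deserialize the whole block.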

  [^1]: https://github.com/romanz/electrs/blob/master/doc/schema.md
  [^2]: https://github.com/Blockstream/electrs/blob/new-index/doc/schema.md#txstore
  [^3]: https://github.com/spesmilo/electrumx/blob/master/docs/HOWTO.rst#prerequisites
  [^4]: https://github.com/cculianu/Fulcrum/blob/master/README.md#requirements

ACKs for top commit:
  maflcko:
    review ACK 07135290c1 🏪
  l0rinc:
    ACK 07135290c1
  hodlinator:
    re-ACK 07135290c1

Tree-SHA512: bcce7bf4b9a3e5e920ab5a83e656f50d5d7840cdde6b7147d329cf578f8a2db555fc1aa5334e8ee64d5630d25839ece77a2cf421c6c3ac1fa379bb453163bd4f
merge-script
2025-12-12 13:22:00 +00:00
11 changed files with 216 additions and 48 deletions

@@ -1006,14 +1006,14 @@ bool BlockManager::ReadBlock(CBlock& block, const FlatFilePos& pos, const std::o
     block.SetNull();
     // Open history file to read
-    std::vector<std::byte> block_data;
-    if (!ReadRawBlock(block_data, pos)) {
+    const auto block_data{ReadRawBlock(pos)};
+    if (!block_data) {
         return false;
     }
     try {
         // Read block
-        SpanReader{block_data} >> TX_WITH_WITNESS(block);
+        SpanReader{*block_data} >> TX_WITH_WITNESS(block);
     } catch (const std::exception& e) {
         LogError("Deserialize or I/O error - %s at %s while reading block", e.what(), pos.ToString());
         return false;
@@ -1048,19 +1048,19 @@ bool BlockManager::ReadBlock(CBlock& block, const CBlockIndex& index) const
     return ReadBlock(block, block_pos, index.GetBlockHash());
 }
-bool BlockManager::ReadRawBlock(std::vector<std::byte>& block, const FlatFilePos& pos) const
+BlockManager::ReadRawBlockResult BlockManager::ReadRawBlock(const FlatFilePos& pos, std::optional<std::pair<size_t, size_t>> block_part) const
 {
     if (pos.nPos < STORAGE_HEADER_BYTES) {
         // If nPos is less than STORAGE_HEADER_BYTES, we can't read the header that precedes the block data
         // This would cause an unsigned integer underflow when trying to position the file cursor
         // This can happen after pruning or default constructed positions
         LogError("Failed for %s while reading raw block storage header", pos.ToString());
-        return false;
+        return util::Unexpected{ReadRawError::IO};
     }
     AutoFile filein{OpenBlockFile({pos.nFile, pos.nPos - STORAGE_HEADER_BYTES}, /*fReadOnly=*/true)};
     if (filein.IsNull()) {
         LogError("OpenBlockFile failed for %s while reading raw block", pos.ToString());
-        return false;
+        return util::Unexpected{ReadRawError::IO};
     }
     try {
@@ -1072,23 +1072,31 @@ bool BlockManager::ReadRawBlock(std::vector<std::byte>& block, const FlatFilePos
         if (blk_start != GetParams().MessageStart()) {
             LogError("Block magic mismatch for %s: %s versus expected %s while reading raw block",
                      pos.ToString(), HexStr(blk_start), HexStr(GetParams().MessageStart()));
-            return false;
+            return util::Unexpected{ReadRawError::IO};
         }
         if (blk_size > MAX_SIZE) {
             LogError("Block data is larger than maximum deserialization size for %s: %s versus %s while reading raw block",
                      pos.ToString(), blk_size, MAX_SIZE);
-            return false;
+            return util::Unexpected{ReadRawError::IO};
         }
-        block.resize(blk_size); // Zeroing of memory is intentional here
-        filein.read(block);
+        if (block_part) {
+            const auto [offset, size]{*block_part};
+            if (size == 0 || offset >= blk_size || size > blk_size - offset) {
+                return util::Unexpected{ReadRawError::BadPartRange}; // Avoid logging - offset/size come from untrusted REST input
+            }
+            filein.seek(offset, SEEK_CUR);
+            blk_size = size;
+        }
+        std::vector<std::byte> data(blk_size); // Zeroing of memory is intentional here
+        filein.read(data);
+        return data;
     } catch (const std::exception& e) {
         LogError("Read from block file failed: %s for %s while reading raw block", e.what(), pos.ToString());
-        return false;
+        return util::Unexpected{ReadRawError::IO};
     }
-    return true;
 }
 FlatFilePos BlockManager::WriteBlock(const CBlock& block, int nHeight)
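
The `BadPartRange` check in the last hunk can be restated as a standalone predicate to show why it is written as `size > blk_size - offset` rather than `offset + size > blk_size`. This is only a sketch: `IsValidPartRange` and `NaiveIsValidPartRange` are hypothetical names introduced here, and only the boolean expression itself is taken from the diff.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical restatement of the condition used in ReadRawBlock().
// `offset >= blk_size` is evaluated before `blk_size - offset`, so the
// subtraction can never wrap around, even for attacker-chosen values.
constexpr bool IsValidPartRange(std::size_t offset, std::size_t size, std::size_t blk_size)
{
    return !(size == 0 || offset >= blk_size || size > blk_size - offset);
}

// Naive variant, shown only to illustrate the pitfall: `offset + size`
// wraps modulo 2^N for unsigned types, so huge sizes can pass the test.
constexpr bool NaiveIsValidPartRange(std::size_t offset, std::size_t size, std::size_t blk_size)
{
    return size != 0 && offset + size <= blk_size;
}

// Largest representable size, used to trigger the wrap-around case.
constexpr std::size_t kMaxSize{std::numeric_limits<std::size_t>::max()};
```

With `offset = 1` and `size = kMaxSize`, the naive sum wraps to zero and the range is wrongly accepted, while the form used in the diff rejects it. Since both values arrive from untrusted REST input, the overflow-safe ordering matters.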