Merge bitcoin/bitcoin#16981: Improve runtime performance of --reindex

db929893ef Faster -reindex by initially deserializing only headers (Larry Ruane)
c72de9990a util: add CBufferedFile::SkipTo() to move ahead in the stream (Larry Ruane)
48a68908ba Add LoadExternalBlockFile() benchmark (Larry Ruane)

Pull request description:

  ### Background
  During the first part of reindexing, `LoadExternalBlockFile()` sequentially reads raw blocks from the `blocks/blk00nnn.dat` files (rather than receiving them from peers, as with initial block download) and eventually adds all of them to the block index. When an individual block is initially read, it can't be immediately added unless all its ancestors have been added, which is rare (only about 8% of the time), because the blocks are not sorted by height. When the block can't be immediately added to the block index, its disk location is saved in a map so it can be added later. When its parent is later added to the block index, `LoadExternalBlockFile()` reads and deserializes the block from disk a second time and adds it to the block index. Most blocks (92%) get deserialized twice.

  ### This PR
  During the initial read, it's rarely useful to deserialize the entire block; only the header is needed to determine if the block can be added to the block index immediately. This change to `LoadExternalBlockFile()` initially deserializes only a block's header, then deserializes the entire block only if it can be added immediately. This reduces reindex time on mainnet by 7 hours on a Raspberry Pi, which translates to around a 25% reduction in the first part of reindexing (adding blocks to the index), and about a 6% reduction in overall reindex time.

  Summary: The performance gain is the result of deserializing each block only once, except its header which is deserialized twice, but the header is only 80 bytes.

ACKs for top commit:
  andrewtoth:
    ACK db929893ef
  achow101:
    ACK db929893ef
  aureleoules:
    ACK db929893ef - minor changes and new benchmark since last review
  theStack:
    re-ACK db929893ef
  stickies-v:
    re-ACK db929893e

Tree-SHA512: 5a5377192c11edb5b662e18f511c9beb8f250bc88aeadf2f404c92c3232a7617bade50477ebf16c0602b9bd3b68306d3ee7615de58acfd8cae664d28bb7b0136
This commit is contained in:
Andrew Chow
2022-11-15 19:07:35 -05:00
6 changed files with 238 additions and 37 deletions

View File

@@ -4389,6 +4389,8 @@ void Chainstate::LoadExternalBlockFile(
try {
// This takes over fileIn and calls fclose() on it in the CBufferedFile destructor
CBufferedFile blkdat(fileIn, 2*MAX_BLOCK_SERIALIZED_SIZE, MAX_BLOCK_SERIALIZED_SIZE+8, SER_DISK, CLIENT_VERSION);
// nRewind indicates where to resume scanning in case something goes wrong,
// such as a block fails to deserialize.
uint64_t nRewind = blkdat.GetPos();
while (!blkdat.eof()) {
if (ShutdownRequested()) return;
@@ -4412,28 +4414,30 @@ void Chainstate::LoadExternalBlockFile(
continue;
} catch (const std::exception&) {
// no valid block header found; don't complain
// (this happens at the end of every blk.dat file)
break;
}
try {
// read block
uint64_t nBlockPos = blkdat.GetPos();
// read block header
const uint64_t nBlockPos{blkdat.GetPos()};
if (dbp)
dbp->nPos = nBlockPos;
blkdat.SetLimit(nBlockPos + nSize);
std::shared_ptr<CBlock> pblock = std::make_shared<CBlock>();
CBlock& block = *pblock;
blkdat >> block;
nRewind = blkdat.GetPos();
uint256 hash = block.GetHash();
CBlockHeader header;
blkdat >> header;
const uint256 hash{header.GetHash()};
// Skip the rest of this block (this may read from disk into memory); position to the marker before the
// next block, but it's still possible to rewind to the start of the current block (without a disk read).
nRewind = nBlockPos + nSize;
blkdat.SkipTo(nRewind);
{
LOCK(cs_main);
// detect out of order blocks, and store them for later
if (hash != params.GetConsensus().hashGenesisBlock && !m_blockman.LookupBlockIndex(block.hashPrevBlock)) {
if (hash != params.GetConsensus().hashGenesisBlock && !m_blockman.LookupBlockIndex(header.hashPrevBlock)) {
LogPrint(BCLog::REINDEX, "%s: Out of order block %s, parent %s not known\n", __func__, hash.ToString(),
block.hashPrevBlock.ToString());
header.hashPrevBlock.ToString());
if (dbp && blocks_with_unknown_parent) {
blocks_with_unknown_parent->emplace(block.hashPrevBlock, *dbp);
blocks_with_unknown_parent->emplace(header.hashPrevBlock, *dbp);
}
continue;
}
@@ -4441,13 +4445,19 @@ void Chainstate::LoadExternalBlockFile(
// process in case the block isn't known yet
const CBlockIndex* pindex = m_blockman.LookupBlockIndex(hash);
if (!pindex || (pindex->nStatus & BLOCK_HAVE_DATA) == 0) {
BlockValidationState state;
if (AcceptBlock(pblock, state, nullptr, true, dbp, nullptr, true)) {
nLoaded++;
}
if (state.IsError()) {
break;
}
// This block can be processed immediately; rewind to its start, read and deserialize it.
blkdat.SetPos(nBlockPos);
std::shared_ptr<CBlock> pblock{std::make_shared<CBlock>()};
blkdat >> *pblock;
nRewind = blkdat.GetPos();
BlockValidationState state;
if (AcceptBlock(pblock, state, nullptr, true, dbp, nullptr, true)) {
nLoaded++;
}
if (state.IsError()) {
break;
}
} else if (hash != params.GetConsensus().hashGenesisBlock && pindex->nHeight % 1000 == 0) {
LogPrint(BCLog::REINDEX, "Block Import: already had block %s at height %d\n", hash.ToString(), pindex->nHeight);
}