Do merkle root and txid duplicates check simultaneously

Move the txid duplicates check into BuildMerkleTree, where it can be done much more efficiently (without needing to build a full txid set to detect duplicates). The previous version (using the std::set<uint256> to detect duplicates) was also slightly too weak. A block mined with actual duplicate transactions (which is invalid, due to the inputs of the duplicated transactions being seen as double spends) would trigger the duplicates logic, resulting in the block not being stored on disk, and rerequested. This change fixes that by only triggering in the case of duplicated transactions that can actually result in an identical merkle root.
2025-11-12 23:18:14 +01:00 · 2014-09-16 00:30:05 +02:00
parent 7a04f3d708
commit 584a358997
3 changed files with 57 additions and 17 deletions
--- a/src/core.cpp
+++ b/src/core.cpp
@@ -224,29 +224,66 @@ uint256 CBlockHeader::GetHash() const
    return Hash(BEGIN(nVersion), END(nNonce));
 }

-uint256 CBlock::BuildMerkleTree() const
+uint256 CBlock::BuildMerkleTree(bool* fMutated) const
 {
-    // WARNING! If you're reading this because you're learning about crypto
-    // and/or designing a new system that will use merkle trees, keep in mind
-    // that the following merkle tree algorithm has a serious flaw related to
-    // duplicate txids, resulting in a vulnerability. (CVE-2012-2459) Bitcoin
-    // has since worked around the flaw, but for new applications you should
-    // use something different; don't just copy-and-paste this code without
-    // understanding the problem first.
+    /* WARNING! If you're reading this because you're learning about crypto
+       and/or designing a new system that will use merkle trees, keep in mind
+       that the following merkle tree algorithm has a serious flaw related to
+       duplicate txids, resulting in a vulnerability (CVE-2012-2459).
+
+       The reason is that if the number of hashes in the list at a given time
+       is odd, the last one is duplicated before computing the next level (which
+       is unusual in Merkle trees). This results in certain sequences of
+       transactions leading to the same merkle root. For example, these two
+       trees:
+
+                    A               A
+                  /  \            /   \
+                B     C         B       C
+               / \    |        / \     / \
+              D   E   F       D   E   F   F
+             / \ / \ / \     / \ / \ / \ / \
+             1 2 3 4 5 6     1 2 3 4 5 6 5 6
+
+       for transaction lists [1,2,3,4,5,6] and [1,2,3,4,5,6,5,6] (where 5 and
+       6 are repeated) result in the same root hash A (because the hash of both
+       of (F) and (F,F) is C).
+
+       The vulnerability results from being able to send a block with such a
+       transaction list, with the same merkle root, and the same block hash as
+       the original without duplication, resulting in failed validation. If the
+       receiving node proceeds to mark that block as permanently invalid
+       however, it will fail to accept further unmodified (and thus potentially
+       valid) versions of the same block. We defend against this by detecting
+       the case where we would hash two identical hashes at the end of the list
+       together, and treating that identically to the block having an invalid
+       merkle root. Assuming no double-SHA256 collisions, this will detect all
+       known ways of changing the transactions without affecting the merkle
+       root.
+    */
    vMerkleTree.clear();
+    vMerkleTree.reserve(vtx.size() * 2 + 16); // Safe upper bound for the number of total nodes.
    BOOST_FOREACH(const CTransaction& tx, vtx)
        vMerkleTree.push_back(tx.GetHash());
    int j = 0;
+    bool mutated = false;
    for (int nSize = vtx.size(); nSize > 1; nSize = (nSize + 1) / 2)
    {
        for (int i = 0; i < nSize; i += 2)
        {
            int i2 = std::min(i+1, nSize-1);
+            if (i2 == i + 1 && i2 + 1 == nSize && vMerkleTree[j+i] == vMerkleTree[j+i2]) {
+                // Two identical hashes at the end of the list at a particular level.
+                mutated = true;
+            }
            vMerkleTree.push_back(Hash(BEGIN(vMerkleTree[j+i]),  END(vMerkleTree[j+i]),
                                       BEGIN(vMerkleTree[j+i2]), END(vMerkleTree[j+i2])));
        }
        j += nSize;
    }
+    if (fMutated) {
+        *fMutated = mutated;
+    }
    return (vMerkleTree.empty() ? 0 : vMerkleTree.back());
 }