This separates the existing fuzz test into:
* clusterlin_search_finder: establishes SearchCandidateFinder's correctness using the
simpler SimpleCandidateFinder.
* clusterlin_simple_finder: establishes SimpleCandidateFinder's correctness using the
(even) simpler ExhaustiveCandidateFinder.
rather than trying to do both at once.
Only count the number of actual new subsets added. If the queue contains
a work item that completely covers a component, no transaction can be added
to it without creating a disconnected component. In this case, also don't
count it as an iteration.
With this, the number of iterations performed by SimpleCandidateFinder is
bounded by the number of distinct connected topologically-valid subsets of
the cluster.
This can be reproduced according to the developer notes with something
like
( cd ./src/ && ../contrib/devtools/run-clang-tidy.py -p ../bld-cmake -fix -j $(nproc) )
Also, the header related changes were done manually.
This abstracts out the finding of the connected component that includes
a given element from FindConnectedComponent (which just finds any connected
component).
Use this in the txgraph fuzz test, which was effectively reimplementing this
logic. At the same time, improve its performance by replacing a vector with a
set.
Since cluster_linearize.h does not actually have a Cluster type anymore, it is more
appropriate to rename the index type to DepGraphIndex.
-BEGIN VERIFY SCRIPT-
sed -i 's/Data type to represent transaction indices in clusters./Data type to represent transaction indices in DepGraphs and the clusters they represent./' $(git grep -l 'using ClusterIndex')
sed -i 's|\<ClusterIndex\>|DepGraphIndex|g' $(git grep -l 'ClusterIndex')
-END VERIFY SCRIPT-
This function takes an existing ordering for transactions in a DepGraph, and
makes it a valid linearization for it (i.e., topological). Any topological
prefix of the input remains untouched.
This combines the clusterlin_add_dependency and clusterlin_cluster_serialization
fuzz tests into a single clusterlin_depgraph_sim fuzz test. This tests starts
from an empty DepGraph and performs a arbitrary number of AddTransaction,
AddDependencies, and RemoveTransactions operations on it, and compares the
resulting state with a naive reimplementation.
This commits introduces support in DepGraph for the transaction positions to be
non-continuous. Specifically, it adds:
* DepGraph::RemoveTransactions which removes 0 or more positions from a DepGraph.
* DepGraph::Positions() to get a set of which positions are in use.
* DepGraph::PositionRange() to get the highest used position in a DepGraph + 1.
In addition, it extends the DepGraphFormatter format to support holes in a
compatible way (it serializes non-holey DepGraphs identically to the old code,
and deserializes them the same way)
This changes DepGraph::AddDependency into DepGraph::AddDependencies, which takes
in a single child, but a set of parent transactions, making them all dependencies
at once.
This is important for performance. N transactions can have O(N^2) parents combined,
so constructing a full DepGraph using just AddDependency (which is O(N) on its own)
could take O(N^3) time, while doing the same with AddDependencies (also O(N) on its
own) only takes O(N^2).
Notably, this matters for DepGraphFormatter::Unser, which goes from O(N^3) to O(N^2).
Co-Authored-By: Greg Sanders <gsanders87@gmail.com>
A fuzz test already relies on these operations, and a future commit will need
the same logic too. Therefore, abstract them out into proper member functions,
with proper testing.
Automatically add topologically-valid subsets of the potential set pot
to inc. It can be proven that these must be part of the best reachable
topologically-valid set from that work item.
This is a crucial optimization that (apparently) reduces the maximum
number of iterations from ~2^(N-1) to ~sqrt(2^N).
Co-Authored-By: Suhas Daftuar <sdaftuar@gmail.com>
In each work item, keep track of a conservative overestimate of the best
possible feerate that can be reached from it, and then use these to avoid
exploring hopeless work items.
Before this commit, the worst case for linearization involves clusters which
break apart in several smaller components after the first candidate is
included in the output linearization.
Address this by never considering work items that span multiple components
of what remains of the cluster.
It encapsulates a given linearization in chunked form, permitting arbitrary
subsets of transactions to be removed from the linearization. Its purpose
is adding the Intersect function, which is a crucial operation that will
be used in a further commit to make Linearize improve existing linearizations.
This adds a first version of the overall linearization interface, which given
a DepGraph constructs a good linearization, by incrementally including good
candidate sets (found using AncestorCandidateFinder and SearchCandidateFinder).
This introduces a bespoke fuzzing-focused serialization format for DepGraphs,
and then tests that this format can represent any graph, roundtrips, and then
uses that to test the correctness of DepGraph itself.
This forms the basis for future fuzz tests that need to work with interesting
graphs.