Commit Graph

52 Commits

Author SHA1 Message Date
Pieter Wuille
91399a7912 clusterlin: remove unused MergeLinearizations (cleanup)
This ended up never being used in txgraph.
2025-12-18 16:01:31 -05:00
Pieter Wuille
13aad26b78 clusterlin: randomize various decisions in SFL (feature)
This introduces a local RNG inside the SFL state, which is used to randomize
various decisions inside the algorithm, in order to make it hard to create
pathological clusters which predictably have bad performance.

The decisions being randomized are:
* When deciding what chunk to attempt to split, the queue order is
  randomized.
* When deciding which dependency to split on, a uniformly random one is
  chosen among those with higher top feerate than bottom feerate within
  the chosen chunk.
* When deciding which chunks to merge, a uniformly random one among those
  with the higher feerate difference is picked.
* When merging two chunks, a uniformly random dependency between them is
  now activated.
* When making the state topological, the queue of chunks to process is
  randomized.
2025-12-18 16:01:31 -05:00
Pieter Wuille
ddbfa4dfac clusterlin: keep FIFO queue of improvable chunks (preparation)
This introduces a queue of chunks that still need processing, in both
MakeTopological() and OptimizationStep(). This is simultaneously:
* A preparation for introducing randomization, by allowing permuting the
  queue.
* An improvement to the fairness of suboptimal solutions, by distributing
  the work more fairly over chunks.
* An optimization, by avoiding retrying chunks over and over again which
  are already known to be optimal.
2025-12-18 16:01:31 -05:00
Pieter Wuille
3efc94d656 clusterlin: replace cluster linearization with SFL (feature)
This replaces the existing LIMO linearization algorithm (which internally uses
ancestor set finding and candidate set finding) with the much more performant
spanning-forest linearization algorithm.

This removes the old candidate-set search algorithm, and several of its tests,
benchmarks, and needed utility code.

The worst case time per cost is similar to the previous algorithm, so
ACCEPTABLE_ITERS is unchanged.
2025-12-18 16:01:31 -05:00
Pieter Wuille
6a8fa821b8 clusterlin: add support for loading existing linearization (feature) 2025-12-18 16:01:22 -05:00
Pieter Wuille
da48ed9f34 clusterlin: ReadLinearization for non-topological (tests)
Rather than using an ad-hoc no-dependency copy of the graph when a potentially
non-topological linearization is needed in the clusterlin fuzz test, add this
directly as a feature in ReadLinearization().

This is preparation for a later commit where another use for such a function
is added.
2025-12-18 15:49:07 -05:00
Pieter Wuille
c461259fb6 clusterlin: add class implementing SFL state (preparation)
This adds a data structure representing the optimization state for the spanning-forest
linearization algorithm (SFL), plus a fuzz test for its correctness.

This is preparation for switching over Linearize() to use this algorithm.

See https://delvingbitcoin.org/t/spanning-forest-cluster-linearization/1419 for
a description of the algorithm.
2025-12-18 15:49:01 -05:00
marcofleon
a70a14a3f4 refactor: Separate out logic for building a tree-shaped dependency graph 2025-12-12 16:09:53 +01:00
marcofleon
ce29d7d626 fuzz: Fix variable in clusterlin_postlinearize_tree check
The test intends to verify that running `PostLinearize` a
second time on a tree-structured graph doesn't change the
result. But `PostLinearize` was being called on the original
variable, not the copy. So the check was comparing the
unmodified copy against itself, which is useless.

Fix by post-linearizing the correct variable.
2025-12-12 15:04:10 +00:00
marcofleon
876e2849b4 fuzz: Fix incorrect loop bounds in clusterlin_postlinearize_tree
The dependency graphs generated by this test can have holes
(unused indices) in them. This means some of the transactions
were skipped when using `depgraph_gen.TxCount()` as the upper
bound of the loop. Switch to using `depgraph.Positions()` to
correctly handle sparse graphs.
2025-12-12 15:02:26 +00:00
Pieter Wuille
bb5cb222ae depgraph: add memory usage control (feature)
Co-Authored-By: Lőrinc <pap.lorinc@gmail.com>
2025-10-11 17:25:09 -04:00
merge-script
2f410ad78c Merge bitcoin/bitcoin#32263: cluster mempool: add TxGraph work controls
62ed1f92ef txgraph: check that DoWork finds optimal if given high budget (tests) (Pieter Wuille)
f3c2fc867f txgraph: add work limit to DoWork(), try optimal (feature) (Pieter Wuille)
e96b00d99e txgraph: make number of acceptable iterations configurable (feature) (Pieter Wuille)
cfe9958852 txgraph: track amount of work done in linearization (preparation) (Pieter Wuille)
6ba316eaa0 txgraph: 1-or-2-tx split-off clusters are optimal (optimization) (Pieter Wuille)
fad0eb091e txgraph: reset quality when merging clusters (bugfix) (Pieter Wuille)

Pull request description:

  Part of #30289. Builds on top of #31553.

  So far, the `TxGraph::DoWork()` function took no parameters, and just made all clusters reach the "acceptable" internal quality level by performing a minimum number of improvement iterations on it, but:
  * Did not attempt to go beyond that.
  * Was broken, as the QualityLevel of optimal clusters that merge together was not being reset.

  Fix this by adding an argument to `DoWork()` to control how much work it is allowed to do right now, which will first be used to get all clusters to the acceptable level, and if more budget remains, use it to try to get some or all clusters optimal. The function will now return `true` if all clusters are known to be optimal (and thus no further work remains). This is verified in the tests, by remembering whether the graph is optimal, and if it is at the end of the simulation run, verify that the overall linearization cannot be improved further.

ACKs for top commit:
  instagibbs:
    ACK 62ed1f92ef
  ismaelsadeeq:
    Code review ACK 62ed1f92ef
  glozow:
    ACK 62ed1f92ef

Tree-SHA512: 5f57d4052e369f3444e72e724f04c02004e0f66e365faa59c9f145323e606508380fc97bb038b68783a62ae9c10757f1b628b3b00b2ce9a46161fca2d4336d73
2025-07-29 09:07:10 -04:00
ismaelsadeeq
31c4e77a25 test: fix ReadTopologicalSet unsigned integer overflow 2025-07-18 14:57:39 +01:00
Pieter Wuille
62ed1f92ef txgraph: check that DoWork finds optimal if given high budget (tests) 2025-07-14 10:37:00 -04:00
Pieter Wuille
cfe9958852 txgraph: track amount of work done in linearization (preparation) 2025-07-14 09:41:17 -04:00
Pieter Wuille
d7fca5c171 clusterlin: add big comment explaning the relation between tests 2025-06-14 18:35:33 -04:00
Pieter Wuille
b64e61d2de clusterlin: abstract try-permutations into ExhaustiveLinearize function
Rather than this exhaustive linearization check happening inline inside
clusterlin_simple_linearize, abstract it out into a Linearize()-like
function for clarity.

Note that this isn't exactly a refactor, because the old code would compare the
found linearization against all (valid) permutations, while the new code instead
first computes the best linearization from all valid permutations, and then
compares it with the found one.
2025-06-14 18:28:13 -04:00
Pieter Wuille
1fa55a64ed clusterlin tests: verify that chunks are minimal 2025-06-14 18:27:24 -04:00
Pieter Wuille
da23ecef29 clusterlin tests: support non-empty ReadTopologicalSubset()
In several call sites for ReadTopologicalSubset, a non-empty result is
expected, necessitating a special case at the call site for empty results.

Fix this by adding a bool non_empty argument, which does this special
casing (more efficiently) inside ReadTopologicalSubset itself.
2025-06-14 18:27:24 -04:00
Pieter Wuille
94f3e17c33 clusterlin tests: compare with fuzz-provided linearizations 2025-06-14 18:27:24 -04:00
Pieter Wuille
5f92ebee0d clusterlin tests: compare with fuzz-provided topological sets 2025-06-14 18:27:24 -04:00
Pieter Wuille
6e37824ac3 clusterlin tests: optimize clusterlin_simple_linearize
Whenever a non-topological permutation is encountered, fast forward to the
last permutation with the same non-topological prefix, skipping over
potentially many permutations that are non-topological for the same reason.

With that, increase the checking of all permutations to clusters of size 8
instead of 7.
2025-06-14 18:27:24 -04:00
Pieter Wuille
98c1c88b6f clusterlin tests: separate testing of SimpleLinearize and Linearize
The separates the existing fuzz test into:

* clusterlin_linearize: establishes the correctness of Linearize() using the
                        simpler SimpleLinearize() function.
* clusterlin_simple_linearize: establishes the correctness of SimpleLinearize() by
                        comparing with all valid linearizations computed by
                        std::next_permutation.

rather than combining the first two into a single fuzz test.
2025-06-14 18:27:24 -04:00
Pieter Wuille
10e90f7aef clusterlin tests: make SimpleCandidateFinder always find connected
Make a small change to guarantee that SimpleCandidateFinder only ever returns
connected solutions, even when non-optimal. Then test this property.
2025-06-14 18:27:24 -04:00
Pieter Wuille
a38c38951e clusterlin tests: separate testing of Search- and SimpleCandidateFinder
This separates the existing fuzz test into:

* clusterlin_search_finder: establishes SearchCandidateFinder's correctness using the
                            simpler SimpleCandidateFinder.
* clusterlin_simple_finder: establishes SimpleCandidateFinder's correctness using the
                            (even) simpler ExhaustiveCandidateFinder.

rather than trying to do both at once.
2025-06-14 18:27:24 -04:00
Pieter Wuille
77a432ee70 clusterlin tests: count SimpleCandidateFinder iterations better
Only count the number of actual new subsets added. If the queue contains
a work item that completely covers a component, no transaction can be added
to it without creating a disconnected component. In this case, also don't
count it as an iteration.

With this, the number of iterations performed by SimpleCandidateFinder is
bounded by the number of distinct connected topologically-valid subsets of
the cluster.
2025-06-14 18:27:24 -04:00
MarcoFalke
fa9ca13f35 refactor: Sort includes of touched source files 2025-06-03 19:56:55 +02:00
MarcoFalke
fae71d30f7 clang-tidy: Apply modernize-deprecated-headers
This can be reproduced according to the developer notes with something
like

( cd ./src/ && ../contrib/devtools/run-clang-tidy.py -p ../bld-cmake -fix -j $(nproc) )

Also, the header related changes were done manually.
2025-06-03 15:13:54 +02:00
Pieter Wuille
a52b53926b clusterlin: add GetConnectedComponent
This abstracts out the finding of the connected component that includes
a given element from FindConnectedComponent (which just finds any connected
component).

Use this in the txgraph fuzz test, which was effectively reimplementing this
logic. At the same time, improve its performance by replacing a vector with a
set.
2025-03-27 15:48:44 -04:00
Pieter Wuille
d449773899 scripted-diff: (refactor) ClusterIndex -> DepGraphIndex
Since cluster_linearize.h does not actually have a Cluster type anymore, it is more
appropriate to rename the index type to DepGraphIndex.

-BEGIN VERIFY SCRIPT-
sed -i 's/Data type to represent transaction indices in clusters./Data type to represent transaction indices in DepGraphs and the clusters they represent./' $(git grep -l 'using ClusterIndex')
sed -i 's|\<ClusterIndex\>|DepGraphIndex|g' $(git grep -l 'ClusterIndex')
-END VERIFY SCRIPT-
2025-03-24 09:34:54 -04:00
Pieter Wuille
bfeb69f6e0 clusterlin: Make IsAcyclic() a DepGraph member function
... instead of being a separate test-only function.

Also add a fuzz test for it returning false.
2025-03-24 09:34:54 -04:00
Pieter Wuille
0aa874a357 clusterlin: Add FixLinearization function + fuzz test
This function takes an existing ordering for transactions in a DepGraph, and
makes it a valid linearization for it (i.e., topological). Any topological
prefix of the input remains untouched.
2025-03-24 09:34:54 -04:00
Pieter Wuille
1c24c62510 clusterlin: merge two DepGraph fuzz tests into simulation test
This combines the clusterlin_add_dependency and clusterlin_cluster_serialization
fuzz tests into a single clusterlin_depgraph_sim fuzz test. This tests starts
from an empty DepGraph and performs a arbitrary number of AddTransaction,
AddDependencies, and RemoveTransactions operations on it, and compares the
resulting state with a naive reimplementation.
2024-10-07 13:49:36 -04:00
Pieter Wuille
0606e66fdb clusterlin: add DepGraph::RemoveTransactions and support for holes in DepGraph
This commits introduces support in DepGraph for the transaction positions to be
non-continuous. Specifically, it adds:
* DepGraph::RemoveTransactions which removes 0 or more positions from a DepGraph.
* DepGraph::Positions() to get a set of which positions are in use.
* DepGraph::PositionRange() to get the highest used position in a DepGraph + 1.

In addition, it extends the DepGraphFormatter format to support holes in a
compatible way (it serializes non-holey DepGraphs identically to the old code,
and deserializes them the same way)
2024-10-07 13:49:35 -04:00
Pieter Wuille
75b5d42419 clusterlin: make DepGraph::AddDependency support multiple dependencies at once
This changes DepGraph::AddDependency into DepGraph::AddDependencies, which takes
in a single child, but a set of parent transactions, making them all dependencies
at once.

This is important for performance. N transactions can have O(N^2) parents combined,
so constructing a full DepGraph using just AddDependency (which is O(N) on its own)
could take O(N^3) time, while doing the same with AddDependencies (also O(N) on its
own) only takes O(N^2).

Notably, this matters for DepGraphFormatter::Unser, which goes from O(N^3) to O(N^2).

Co-Authored-By: Greg Sanders <gsanders87@gmail.com>
2024-10-07 13:47:52 -04:00
Pieter Wuille
5901cf7100 clusterlin: abstract out DepGraph::GetReduced{Parents,Children}
A fuzz test already relies on these operations, and a future commit will need
the same logic too. Therefore, abstract them out into proper member functions,
with proper testing.
2024-10-07 13:46:48 -04:00
Pieter Wuille
9ad2fe7e69 clusterlin: only start/use search when enough iterations left 2024-09-12 15:15:36 -04:00
Pieter Wuille
71f2629398 clusterlin: include topological pot subsets automatically (optimization)
Automatically add topologically-valid subsets of the potential set pot
to inc. It can be proven that these must be part of the best reachable
topologically-valid set from that work item.

This is a crucial optimization that (apparently) reduces the maximum
number of iterations from ~2^(N-1) to ~sqrt(2^N).

Co-Authored-By: Suhas Daftuar <sdaftuar@gmail.com>
2024-09-12 15:15:36 -04:00
Pieter Wuille
2965fbf203 clusterlin: track upper bound potential set for work items (optimization)
In each work item, keep track of a conservative overestimate of the best
possible feerate that can be reached from it, and then use these to avoid
exploring hopeless work items.
2024-09-12 15:15:36 -04:00
Pieter Wuille
85a285a306 clusterlin: separate initial search entries per component (optimization)
Before this commit, the worst case for linearization involves clusters which
break apart in several smaller components after the first candidate is
included in the output linearization.

Address this by never considering work items that span multiple components
of what remains of the cluster.
2024-09-12 15:15:36 -04:00
Pieter Wuille
04d7a04ea4 clusterlin: add MergeLinearizations function + fuzz test + benchmark 2024-08-01 16:03:34 -04:00
Pieter Wuille
4f8958d756 clusterlin: add PostLinearize + benchmarks + fuzz tests 2024-08-01 16:02:09 -04:00
Pieter Wuille
0e2812d293 clusterlin: add algorithms for connectedness/connected components
Add utility functions to DepGraph for finding connected components.
2024-08-01 15:43:59 -04:00
Pieter Wuille
0e52728a2d clusterlin: rename Intersect -> IntersectPrefixes
This makes it clearer what the function does.
2024-08-01 14:07:54 -04:00
Pieter Wuille
28549791b3 clusterlin: permit passing in existing linearization to Linearize
This implements the LIMO algorithm for linearizing by improving an existing
linearization. See
https://delvingbitcoin.org/t/limo-combining-the-best-parts-of-linearization-search-and-merging
for details.
2024-07-25 10:16:40 -04:00
Pieter Wuille
97d98718b0 clusterlin: add LinearizationChunking class
It encapsulates a given linearization in chunked form, permitting arbitrary
subsets of transactions to be removed from the linearization. Its purpose
is adding the Intersect function, which is a crucial operation that will
be used in a further commit to make Linearize improve existing linearizations.
2024-07-25 10:16:40 -04:00
Pieter Wuille
d5918dc3c6 clusterlin: randomize the SearchCandidateFinder search order
To make search non-deterministic, change the BFS logic from always picking
the first queue item to randomly picking the first or second queue item.
2024-07-25 10:16:40 -04:00
Pieter Wuille
46aad9b099 clusterlin: add Linearize function
This adds a first version of the overall linearization interface, which given
a DepGraph constructs a good linearization, by incrementally including good
candidate sets (found using AncestorCandidateFinder and SearchCandidateFinder).
2024-07-25 10:16:37 -04:00
Pieter Wuille
ee0ddfe4f6 clusterlin: add chunking algorithm
A fuzz test is added which verifies various of its expected properties, including
correctness
2024-07-25 10:16:37 -04:00
Pieter Wuille
2a41f151af clusterlin: add SearchCandidateFinder class
Similar to AncestorCandidateFinder, this encapsulates the state needed for
finding good candidate sets using a search algorithm.
2024-07-25 10:16:37 -04:00