The msgStream's backpressure queue previously used a drop predicate that
always returned false, meaning messages were never dropped based on
queue length.
This commit introduces a new drop predicate mechanism for the msgStream
queue, controlled by build tags.
For non-integration builds, the predicate combines a type-based check
with Random Early Detection (RED):
- Certain critical message types (`lnwire.LinkUpdater`,
`lnwire.AnnounceSignatures1`) are marked as protected and are never
dropped.
- For other message types, RED is applied based on the queue length,
using `redMinThreshold` and `redMaxThreshold` to determine the drop
probability.
For integration builds, the predicate always returns false, preserving
the previous behavior to avoid interfering with tests.
This change allows the msgStream queue to proactively drop less critical
messages under high load, preventing unbounded queue growth while
ensuring essential messages are prioritized.
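As a rough sketch of how such a build-tag-gated predicate could look (the file layout, function name, and threshold values here are illustrative assumptions, not necessarily lnd's actual code):

```go
//go:build !integration

package peer

import (
	"math/rand"

	"github.com/lightningnetwork/lnd/lnwire"
)

// Illustrative thresholds; real values would be tuned separately.
const (
	redMinThreshold = 100
	redMaxThreshold = 500
)

// msgDropPredicate decides whether an incoming message should be dropped
// given the current queue length. Protected message types are never
// dropped; everything else is subject to Random Early Detection (RED).
func msgDropPredicate(msg lnwire.Message, queueLen int) bool {
	switch msg.(type) {
	// Messages that update a channel link or carry announcement
	// signatures are critical and always kept.
	case lnwire.LinkUpdater, *lnwire.AnnounceSignatures1:
		return false
	}

	switch {
	case queueLen < redMinThreshold:
		return false
	case queueLen >= redMaxThreshold:
		return true
	default:
		// Drop probability ramps linearly from 0 to 1 between the
		// two thresholds.
		p := float64(queueLen-redMinThreshold) /
			float64(redMaxThreshold-redMinThreshold)
		return rand.Float64() < p
	}
}
```

```go
//go:build integration

package peer

import "github.com/lightningnetwork/lnd/lnwire"

// In integration builds the predicate never drops, preserving the previous
// behavior so tests are not affected.
func msgDropPredicate(_ lnwire.Message, _ int) bool {
	return false
}
```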
In this commit, we replace the old condition-variable-based msgStream
with the new backpressure queue. The implementation details at this
abstraction level have been greatly simplified. For now we just pass a
predicate that will never drop incoming packets.
In this commit, we add a new type of queue: the backpressure queue.
This is a bounded queue based on a simple channel that consults a
predicate to decide whether an incoming message should be preemptively
dropped.
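The queue itself can be sketched roughly as follows (the generic API and names are illustrative assumptions, not the actual lnd type):

```go
package queue

// DropPredicate reports whether a new item should be dropped given the
// current queue length.
type DropPredicate[T any] func(item T, queueLen int) bool

// BackPressureQueue is a bounded queue backed by a buffered channel. Before
// accepting a new item it consults a predicate that may decide to drop the
// item preemptively.
type BackPressureQueue[T any] struct {
	items chan T
	drop  DropPredicate[T]
}

// NewBackPressureQueue creates a queue with the given capacity and drop
// predicate.
func NewBackPressureQueue[T any](size int,
	drop DropPredicate[T]) *BackPressureQueue[T] {

	return &BackPressureQueue[T]{
		items: make(chan T, size),
		drop:  drop,
	}
}

// Enqueue adds an item unless the predicate decides to drop it, or the
// queue is already full. It reports whether the item was accepted.
func (q *BackPressureQueue[T]) Enqueue(item T) bool {
	if q.drop(item, len(q.items)) {
		return false
	}

	select {
	case q.items <- item:
		return true
	default:
		// Queue is full: apply backpressure by rejecting the item.
		return false
	}
}

// Dequeue blocks until an item is available.
func (q *BackPressureQueue[T]) Dequeue() T {
	return <-q.items
}
```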
We then provide a sample predicate for this use case, based on random
early dropping. Given a min and max threshold, we start to drop messages
randomly once the queue length passes the min threshold, with the drop
probability ramping up towards the max threshold, beyond which every
message is dropped.
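For example, with illustrative thresholds of min = 100 and max = 500, a queue length of 300 gives a drop probability of (300 - 100) / (500 - 100) = 0.5, while any length at or above 500 is always dropped.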
The output of the `lncli listchannels` command was changed, which may
break automation scripts that rely on the `chan_id` return value.
In this commit, we add a new CLI option to control whether we disconnect
a peer on slow pongs. Due to head-of-line blocking at various levels of
abstraction (application buffers, slow processing, TCP kernel buffers,
etc.), a flurry of gossip messages (e.g. 1K channel updates) may mean
that, even with reasonable processing latency, a peer still does not
read our ping in time.
To give users another option, we add a flag that allows them to disable
this behavior. The default behavior remains unchanged.
In this commit, we ensure that any topology update is forced to go via
the `handleTopologySubscriptions` handler so that client subscriptions
and updates are handled correctly and in the correct order.
This fixes a bug where a client could miss a notification about a
channel being closed: if a client subscribes and, shortly afterwards,
`PruneGraph` is called, all subscribed clients are notified, possibly
before the new client's subscription has actually been persisted.
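The ordering guarantee comes from funneling both subscription requests and topology updates through a single goroutine. Roughly, with illustrative type and field names rather than the actual router code:

```go
package graph

// TopologyChange describes a batch of topology updates (closed channels,
// new edges, etc.). Field details are omitted for brevity.
type TopologyChange struct{}

type topologyClientUpdate struct {
	clientID uint64
	ntfnChan chan *TopologyChange
	cancel   bool
}

type builder struct {
	ntfnClientUpdates chan topologyClientUpdate
	topologyUpdates   chan *TopologyChange
	quit              chan struct{}
}

// handleTopologySubscriptions is the only goroutine that touches the client
// set. Because subscriptions and updates arrive on the same select loop, a
// client that subscribed before a channel was pruned is guaranteed to be
// registered before that prune notification is fanned out.
func (b *builder) handleTopologySubscriptions() {
	clients := make(map[uint64]chan *TopologyChange)

	for {
		select {
		case sub := <-b.ntfnClientUpdates:
			// Register (or remove) the client before any further
			// topology updates are processed.
			if sub.cancel {
				delete(clients, sub.clientID)
				continue
			}
			clients[sub.clientID] = sub.ntfnChan

		case update := <-b.topologyUpdates:
			// Fan the update out to all registered clients.
			for _, ntfnChan := range clients {
				select {
				case ntfnChan <- update:
				default:
					// Slow client; a real implementation
					// would handle this more gracefully.
				}
			}

		case <-b.quit:
			return
		}
	}
}
```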
We remove the mutex that was previously held between DB calls and the
calls that update the graphCache. This is done so that the underlying DB
calls can make use of batch requests, which they currently cannot, since
the mutex prevents multiple requests from calling the methods at once.
The cacheMu was originally added during a code refactor that moved the
`graphCache` out of the `KVStore` and into the `ChannelGraph`; the aim
at the time was to have a best-effort way of keeping updates to the DB
and updates to the graphCache as consistent/atomic as possible.
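Schematically, the change looks like this (simplified, illustrative types rather than lnd's actual interfaces):

```go
package graphdb

import "sync"

// Illustrative, simplified types.
type edge struct{}

type store interface {
	// AddChannelEdge may internally coalesce concurrent calls into a
	// single batched DB transaction.
	AddChannelEdge(*edge) error
}

type cache interface {
	AddChannel(*edge)
}

type channelGraph struct {
	cacheMu    sync.Mutex // no longer held around DB writes
	db         store
	graphCache cache
}

// Before: the mutex serialized all callers, so the store never saw
// concurrent requests and could not batch them.
func (c *channelGraph) addChannelEdgeOld(e *edge) error {
	c.cacheMu.Lock()
	defer c.cacheMu.Unlock()

	if err := c.db.AddChannelEdge(e); err != nil {
		return err
	}
	c.graphCache.AddChannel(e)

	return nil
}

// After: callers reach the store concurrently and can be batched; the cache
// is updated once the DB call has succeeded.
func (c *channelGraph) addChannelEdgeNew(e *edge) error {
	if err := c.db.AddChannelEdge(e); err != nil {
		return err
	}
	c.graphCache.AddChannel(e)

	return nil
}
```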
In this commit, we update the `tlv` package version which includes type
constraints on the `tlv.SizeBigSize` method parameter. This exposes a
bug in the MilliSatoshi Record method which is fixed here.
This was not caught in tests before because currently only our TLV
encoding code makes use of this SizeFunc (so we would write a 0 size to
disk), while on the read path we don't use the SizeFunc at all: our
MilliSatoshi decode method calls the `tlv.DBigSize` function directly,
which _currently does not make use of the `l` length variable passed to
it_, so the data is still read correctly.
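To illustrate the class of bug (with purely illustrative types, not the actual `tlv` or `lnwire` code): a size helper that silently falls back to 0 for an unrecognized named type declares a wrong length, while a decoder that ignores the declared length still round-trips the value, hiding the problem:

```go
package main

import "fmt"

// milliSatoshi stands in for a named integer type such as
// lnwire.MilliSatoshi.
type milliSatoshi uint64

// looseSize mimics a size helper that silently returns 0 for types it does
// not recognize. A generic helper with a type constraint (e.g. ~uint64)
// would turn this silent bug into a compile-time error instead.
func looseSize(val interface{}) uint64 {
	switch val.(type) {
	case uint64, *uint64:
		return 8 // simplified: pretend a fixed encoded size
	default:
		return 0 // unrecognized named type: silently wrong
	}
}

func main() {
	m := milliSatoshi(1000)

	// The encoder would use this as the record length and write 0.
	fmt.Println("declared size:", looseSize(m))

	// A decoder that reads the (self-delimiting) value directly and
	// ignores the declared length still round-trips correctly, which is
	// why the wrong size went unnoticed in encode/decode tests.
}
```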
Unmarshalling large mission control data sets can take long enough that
the macaroon expires during that phase. We therefore postpone the
connection establishment to give the user more time to answer the
prompt.
Mission control may have outdated success/failure amounts for node pairs
that have channels with differing capacities. In that case we assume the
liquidity is still distributed as before and rescale the amounts to the
corresponding range.
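The rescaling idea, sketched with illustrative names (not the exact mission control code):

```go
package routing

// rescaleAmount maps an amount that was observed on a channel of capacity
// prevCapacity to the corresponding point in the range of a channel of
// capacity newCapacity, assuming the relative liquidity stays the same.
func rescaleAmount(amt, prevCapacity, newCapacity float64) float64 {
	if prevCapacity == 0 {
		return amt
	}

	return amt * newCapacity / prevCapacity
}
```

For example, a success amount of 600k msat recorded when the pair's capacity was 1M msat would be treated as 1.2M msat if the capacity under evaluation is 2M msat.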
We skip the evaluation of probabilities when the amount is lower than
the last success amount, as the probability would evaluate to 1 in that
case.
If the success and fail amounts indicate that a channel doesn't obey a
bimodal distribution, we fall back to a uniform/linear success
probability model. This also helps to avoid numerical normalization
issues with the bimodal model.
This is achieved by adding a very small summand, 1/c, to the balance
distribution P(x) ~ exp(-x/s) + exp((x-c)/s), which helps to regularize
the probability distribution. The distribution then stays finite for
intermediate balances where the exponentials would otherwise evaluate to
an exact zero in floating point. This regularization is effective in
edge cases and leads to falling back to a uniform model should the
bimodal model fail.
This changes the normalization to s * (-2 * exp(-c/s) + 2 + 1/s) and
adds an extra term x/(cs) to the primitive function.
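A small sketch (illustrative names and sample values, not the actual routing code) checking that the stated normalization agrees with the primitive:

```go
package main

import (
	"fmt"
	"math"
)

// primitive is the antiderivative H(x) of the regularized density
// P(x) = exp(-x/s) + exp((x-c)/s) + 1/c on [0, c]:
//
//	H(x) = s * (-exp(-x/s) + exp((x-c)/s) + x/(c*s))
func primitive(c, s, x float64) float64 {
	return s * (-math.Exp(-x/s) + math.Exp((x-c)/s) + x/(c*s))
}

func main() {
	// Illustrative capacity and scale (in msat).
	c, s := 5_000_000.0, 300_000.0

	// Closed-form normalization from the commit message.
	norm := s * (-2*math.Exp(-c/s) + 2 + 1/s)

	// Both ways of computing the total mass agree.
	fmt.Println(norm, primitive(c, s, c)-primitive(c, s, 0))
}
```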
The previously added fuzz seed is expected to be resolved with this.
This test demonstrates an error found via fuzzing by adding a previously
discovered seed; the error will be fixed in an upcoming commit.
The following fuzz test is expected to fail:
go test -v -fuzz=Prob ./routing/
The bimodal model doesn't depend on the unit, which is why updating to
more realistic values doesn't require changes in tests.
We pin the scale in the fuzz test so as not to invalidate the corpus.