In this commit, we add a new atomic bool to only permit a single gossip
backlog goroutine per peer. If we get a new request that needs a backlog
while we're still processing the other, then we'll drop that request.
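A minimal sketch of the gate, with illustrative names (the real field and method names may differ):

```go
package discovery

import "sync/atomic"

// peerSyncer and its method names are illustrative stand-ins for the
// real peer-level types.
type peerSyncer struct {
	// backlogActive is true while a backlog goroutine is running
	// for this peer.
	backlogActive atomic.Bool
}

// maybeStartBacklog launches the backlog worker only if none is
// already running for this peer; otherwise the request is dropped.
func (p *peerSyncer) maybeStartBacklog(process func()) {
	// CompareAndSwap succeeds for exactly one caller at a time.
	if !p.backlogActive.CompareAndSwap(false, true) {
		return
	}

	go func() {
		defer p.backlogActive.Store(false)
		process()
	}()
}
```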
In this commit, we complete the integration of the asynchronous
timestamp range queue by modifying ProcessRemoteAnnouncement to use
the new queuing mechanism instead of calling ApplyGossipFilter
synchronously.
This change ensures that when a peer sends a GossipTimestampRange
message, it is queued for asynchronous processing rather than
blocking the gossiper's main message processing loop. The modification
prevents the peer's readHandler from blocking on potentially slow
gossip filter operations, maintaining connection stability during
periods of high synchronization activity.
If the queue is full when attempting to enqueue a message, we log
a warning but return success to prevent peer disconnection. This
design choice prioritizes connection stability over guaranteed
delivery of every gossip filter request, which is acceptable since
peers can always resend timestamp range messages if needed.
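A sketch of the enqueue path, with illustrative names; the `timestampRangeQueue` channel itself is sketched under the next commit message below:

```go
// queueTimestampRange hands the message off without ever blocking the
// peer's readHandler.
func (g *GossipSyncer) queueTimestampRange(
	msg *lnwire.GossipTimestampRange) error {

	select {
	case g.timestampRangeQueue <- msg:

	default:
		// The queue is full: drop the message and warn. The
		// peer can always resend its timestamp range, so
		// returning nil here keeps the connection alive.
		log.Warnf("Timestamp range queue full, dropping message")
	}

	return nil
}
```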
In this commit, we introduce an asynchronous processing queue for
GossipTimestampRange messages in the GossipSyncer. This change addresses
a critical issue where the gossiper could block indefinitely when
processing timestamp range messages during periods of high load.
Previously, when a peer sent a GossipTimestampRange message, the
gossiper would synchronously call ApplyGossipFilter, which could block
on semaphore acquisition, database queries, and rate limiting. This
synchronous processing created a bottleneck where the entire peer
message processing pipeline would stall, potentially causing timeouts
and disconnections.
The new design adds a timestampRangeQueue channel with a capacity of 1
message and a dedicated goroutine for processing these messages
asynchronously. This follows the established pattern used for other
message types in the syncer. When the queue is full, we drop messages
and log a warning rather than blocking indefinitely, providing graceful
degradation under extreme load conditions.
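A fuller sketch of the design (illustrative names for everything except `lnwire.GossipTimestampRange`):

```go
package discovery

import (
	"log"

	"github.com/lightningnetwork/lnd/lnwire"
)

type GossipSyncer struct {
	// timestampRangeQueue holds at most one pending message; the
	// sender drops anything beyond that instead of blocking.
	timestampRangeQueue chan *lnwire.GossipTimestampRange

	quit chan struct{}
}

func newGossipSyncer() *GossipSyncer {
	return &GossipSyncer{
		timestampRangeQueue: make(
			chan *lnwire.GossipTimestampRange, 1,
		),
		quit: make(chan struct{}),
	}
}

// processTimestampRangeQueue runs in its own goroutine, so a slow
// ApplyGossipFilter call stalls only this loop, never the peer's
// readHandler.
func (g *GossipSyncer) processTimestampRangeQueue(
	apply func(*lnwire.GossipTimestampRange) error) {

	for {
		select {
		case msg := <-g.timestampRangeQueue:
			if err := apply(msg); err != nil {
				log.Printf("unable to apply gossip "+
					"filter: %v", err)
			}

		case <-g.quit:
			return
		}
	}
}
```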
In this commit, we make the gossip filter semaphore capacity configurable
through a new FilterConcurrency field. This change allows node operators
to tune the number of concurrent gossip filter applications based on
their node's resources and network position.
The previous hard-coded limit of 5 concurrent filter applications could
become a bottleneck when multiple peers attempt to synchronize
simultaneously. By making this value configurable via the new
gossip.filter-concurrency option, operators can increase this limit
for better performance on well-resourced nodes or maintain conservative
values on resource-constrained systems.
We keep the default value at 5 to maintain backward compatibility and
avoid unexpected resource usage increases for existing deployments. The
sample configuration file is updated to document this new option.
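As a sketch, with the semaphore modelled as a buffered channel (the `FilterConcurrency` field and `gossip.filter-concurrency` option are from this change; the surrounding code is illustrative):

```go
package discovery

// DefaultFilterConcurrency preserves the previous hard-coded limit.
const DefaultFilterConcurrency = 5

type gossiperCfg struct {
	// FilterConcurrency caps how many ApplyGossipFilter calls may
	// run at once; set via gossip.filter-concurrency.
	FilterConcurrency int
}

// newFilterSemaphore sizes the semaphore from the config, falling
// back to the default for zero values.
func newFilterSemaphore(cfg gossiperCfg) chan struct{} {
	n := cfg.FilterConcurrency
	if n <= 0 {
		n = DefaultFilterConcurrency
	}

	// Acquire a slot with `sem <- struct{}{}`, release with `<-sem`.
	return make(chan struct{}, n)
}
```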
In this commit, we move the serialisation details of a channel's
features to the DB layer and change the `models` field to instead use a
more useful `*lnwire.FeatureVector` type.
This makes the features easier to work with and moves the serialisation
to where it is actually used.
Create an abstract hashAccumulator interface for the Channel graph
bootstrapper so that we can later introduce a deterministic accumulator
to be used during testing.
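One plausible shape for the interface; the exact method set here is illustrative and may differ from the real one:

```go
package discovery

// hashAccumulator abstracts the source of randomness used when
// sampling candidate nodes, so a deterministic implementation can be
// substituted during testing.
type hashAccumulator interface {
	// rotate refreshes the accumulator's internal state between
	// sampling rounds.
	rotate()

	// skipNode reports whether the node with the given serialised
	// public key should be skipped in the current round.
	skipNode(pub [33]byte) bool
}
```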
If a method returns an error, we should assume all its other return
values are nil unless the documentation explicitly says otherwise. So
here, we fix a log line that dereferences an object which will be nil
due to an error being returned.
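The shape of the fix, with hypothetical names:

```go
package graph

import (
	"errors"
	"log"
)

type edgeInfo struct{ ChannelID uint64 }

func fetchEdge(chanID uint64) (*edgeInfo, error) {
	return nil, errors.New("edge not found")
}

func logFetch(chanID uint64) {
	edge, err := fetchEdge(chanID)
	if err != nil {
		// Before: edge is nil on this path, so edge.ChannelID
		// would panic.
		// log.Printf("no edge %d: %v", edge.ChannelID, err)

		// After: only reference values valid on the error path.
		log.Printf("no edge %d: %v", chanID, err)
		return
	}

	log.Printf("found edge %d", edge.ChannelID)
}
```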
Replace all usages of the "github.com/go-errors/errors" and
"github.com/pkg/errors" packages with the standard lib's "errors"
package. This ensures that error wrapping and `errors.Is` checks will
work as expected.
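A quick illustration of why, using an illustrative sentinel error:

```go
package main

import (
	"errors"
	"fmt"
)

var errEdgeNotFound = errors.New("edge not found")

func fetchEdge() error {
	// Stdlib wrapping with %w keeps the error chain intact.
	return fmt.Errorf("fetching edge: %w", errEdgeNotFound)
}

func main() {
	err := fetchEdge()

	// Wrappers that don't implement Unwrap break this check;
	// stdlib wrapping keeps it working.
	fmt.Println(errors.Is(err, errEdgeNotFound)) // true
}
```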
Remove the previously added TODOs which would extract InboundFee info
from the ExtraOpaqueData of a ChannelUpdate at the time of
ChannelEdgePolicy construction. These can now be replaced by using the
newly added InboundFee record on the ChannelUpdate message.
In this commit, we make sure to set the new field wherever appropriate.
This will be any place where the ChannelEdgePolicy is constructed other
than its disk deserialisation.
We add logging so we can draw conclusions about how long the processing
of gossip messages takes, and potentially see whether the syncer's
buffer channel size is a bottleneck in processing.
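A sketch of the instrumentation (names illustrative):

```go
package discovery

import (
	"log"
	"time"
)

// timeGossipMsg wraps a handler and logs how long processing took, so
// we can see whether messages are sitting behind a full syncer buffer
// channel.
func timeGossipMsg(name string, handle func() error) error {
	start := time.Now()
	err := handle()

	log.Printf("processed %s in %v (err=%v)", name,
		time.Since(start), err)

	return err
}
```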
The `GraphSource` interface in the `autopilot` package is directly
implemented by the `graphdb.KVStore` and so we will eventually thread
contexts through to this interface. So in this commit, we start updating
the autopilot system to thread contexts through in preparation for
passing the context through to any calls made to the GraphSource.
Two context.TODOs are added here which will be addressed in follow-up
commits.
For any method that takes a context and contains a select that listens
on the system's quit channel, we should also listen on the ctx's Done
channel, since we should not need to worry about whether the context is
derived internally or externally.
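Concretely, such methods end up with a select of this shape (sketch with illustrative names):

```go
package discovery

import (
	"context"
	"errors"
)

var errShuttingDown = errors.New("shutting down")

type syncManager struct {
	quit chan struct{}
}

// waitForResult honours both the caller's context and the system-wide
// quit channel.
func (s *syncManager) waitForResult(ctx context.Context,
	resultChan chan error) error {

	select {
	case <-ctx.Done():
		return ctx.Err()

	case <-s.quit:
		return errShuttingDown

	case err := <-resultChan:
		return err
	}
}
```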
With this, we move a context.TODO() out of the gossiper and into the
brontide package - this will be removed in a future PR which focuses on
threading contexts through that code.
The `GossipSyncer` makes various calls to the `ChannelGraphTimeSeries`
interface which threads through to the graph DB. So in preparation for
threading context through to all the methods on that interface, we
update the GossipSyncer accordingly by passing contexts through.
Two `context.TODO()`s are added in this commit. They will be removed in
the upcoming commits.
Pass the parent LND context to the gossiper, let it derive a child
context that gets cancelled on Stop. Pass the context through to any
methods that will eventually thread it through to any graph DB calls.
One `context.TODO()` is added here - this will be removed in the next
commit.
NOTE: for any internal methods that the context gets passed to, if those
methods already listen on the gossiper's `quit` channel, then they don't
need to also listen on the passed context's Done() channel, because the
quit channel is closed at the same time that the context is cancelled.
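A sketch of that lifecycle (field names illustrative):

```go
package discovery

import (
	"context"
	"sync"
)

type AuthenticatedGossiper struct {
	cancel context.CancelFunc
	quit   chan struct{}
	wg     sync.WaitGroup
}

func (d *AuthenticatedGossiper) Start(ctx context.Context) error {
	// Derive a child context that is cancelled on Stop.
	ctx, cancel := context.WithCancel(ctx)
	d.cancel = cancel

	d.wg.Add(1)
	go func() {
		defer d.wg.Done()
		d.networkHandler(ctx)
	}()

	return nil
}

func (d *AuthenticatedGossiper) Stop() {
	// The context is cancelled at the same moment quit is closed,
	// so internal selects on quit need not also watch ctx.Done().
	d.cancel()
	close(d.quit)
	d.wg.Wait()
}

func (d *AuthenticatedGossiper) networkHandler(ctx context.Context) {
	<-ctx.Done()
}
```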
In this commit, we revamp the old message based rate limiting. First, we
move to meter by bytes/s instead of messages/s. The old logic had an
error in that it limited groups of message replies, instead of each
message. With this new approach, we'll use the newly added
SerializedSize method to implement fine-grained bandwidth metering.
We need to pick two values: the burst rate and the msg bytes rate. The
burst rate is the max amount that can be sent in a given period of time.
We need to set this above 65 KB, the max message size, otherwise no
messages can be sent. The bucket starts with this many tokens (bytes).
As those are depleted, the amount of tokens is refilled at the msg
bytes rate.
As conservative values, we've chosen 200 KB as the burst rate, and 100
KB/s as the limit.
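With golang.org/x/time/rate this translates to roughly the following (the per-message size would come from the SerializedSize method mentioned above; surrounding names are illustrative):

```go
package discovery

import (
	"context"

	"golang.org/x/time/rate"
)

const (
	// msgBytesBurst is the bucket size: 200 KB, safely above the
	// 65 KB max message size so any single message can be sent.
	msgBytesBurst = 200 * 1024

	// msgBytesRate refills the bucket at 100 KB/s.
	msgBytesRate = 100 * 1024
)

var limiter = rate.NewLimiter(rate.Limit(msgBytesRate), msgBytesBurst)

// throttleReply blocks until enough byte-tokens are available for a
// reply of the given size, metering each message individually.
func throttleReply(ctx context.Context, msgSize uint32) error {
	return limiter.WaitN(ctx, int(msgSize))
}
```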
Here we introduce the access manager which has caches that will
determine the access control status of our peers. Peers that have
had their funding transaction confirm with us are protected. Peers
that only have pending-open channels with us are granted temporary
access, which can be revoked. The rest of the peers are granted
restricted access.
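The three tiers, sketched as an enum (names illustrative):

```go
package accessman

// peerAccessStatus ranks how much access an inbound peer is given.
type peerAccessStatus int

const (
	// peerStatusRestricted: no channels with us yet.
	peerStatusRestricted peerAccessStatus = iota

	// peerStatusTemporary: only pending-open channels; access can
	// be revoked.
	peerStatusTemporary

	// peerStatusProtected: at least one confirmed funding
	// transaction with us.
	peerStatusProtected
)
```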
Previously, we would set the state of the syncer after sending the msg,
which has the following flow,
1. In state `queryNewChannels`, we send the msg `QueryShortChanIDs`.
2. Once the msg is sent, we change to state `waitingQueryChanReply`.
But there's no guarantee the remote won't reply in between the two
steps. When that happens, our syncer would still be in state
`queryNewChannels`, causing the following error,
```
[ERR] DISC gossiper.go:873: Process query msg from peer [Alice] got unexpected msg *lnwire.ReplyShortChanIDsEnd received in state queryNewChannels
```
To fix it, we now make sure the state is updated before sending the msg.
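Sketched with illustrative helper names:

```go
// Before (racy):
//
//	g.sendMsg(queryMsg)                   // 1. send
//	g.setSyncState(waitingQueryChanReply) // 2. transition
//
// After: transition first, so a reply that arrives immediately is
// already expected.
g.setSyncState(waitingQueryChanReply)
if err := g.sendMsg(queryMsg); err != nil {
	return err
}
```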
This commit adds a test to demonstrate that if we receive two identical
updates (which can happen if we get the same update from two peers in
quick succession), then our rate limiting logic will be hit early as
both updates might be counted towards the rate limit. This will be fixed
in an upcoming commit.
This commit is a pure refactor. We move the transaction validation
(existence, spentness, correctness) from the `graph.Builder` to the
gossiper since this is where all protocol level checks should happen.
All tests involved are also updated/moved.
As we move the funding transaction validation logic out of the builder
and into the gossiper, we want to ensure that the behaviour stays
consistent with what we have today. So we should acquire this lock
before performing any expensive checks such as building the funding tx
or validating it.
In preparation for an upcoming commit which will move all channel
funding tx validation to the gossiper, we first move the helper method
which helps build the expected funding transaction script based on the
fields in the channel announcement. We will still want this script later
on in the builder for updating the ChainView though, and so we pass this
field along with the ChannelEdgeInfo. With this change, we can remove
the TapscriptRoot field from the ChannelEdgeInfo since the only reason
it was there was so that the builder could reconstruct the full funding
script.
Here, we add a new fundingTxOption modifier which will configure how we
set up expected calls to the mock Chain once we have moved funding tx
logic to the gossiper. Note that in this commit, these modifiers don't
yet do anything.
This is in preparation for the commit where we move across all the
funding tx validation so that we can test that we are correctly updating
the zombie index.