Commit Graph

539 Commits

Author SHA1 Message Date
Olaoluwa Osuntokun
ce4fdd3117 discovery: only permit a single gossip backlog goroutine per peer
In this commit, we add a new atomic bool to only permit a single gossip
backlog goroutine per peer. If we get a new request that needs a backlog
while we're still processing the other, then we'll drop that request.
2025-08-06 11:34:43 +02:00
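The single-goroutine guard described in ce4fdd3117 could be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `peerSyncer` type, its fields, and `tryStartBacklog` are hypothetical names, and only the idea of an atomic bool gating one backlog goroutine per peer comes from the message above.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// peerSyncer is a hypothetical per-peer syncer; names are illustrative only.
type peerSyncer struct {
	backlogActive atomic.Bool
	wg            sync.WaitGroup
}

// tryStartBacklog runs send in a goroutine unless a backlog goroutine is
// already running for this peer. It reports whether the request was accepted.
func (s *peerSyncer) tryStartBacklog(send func()) bool {
	// CompareAndSwap flips false -> true atomically, so only one caller wins.
	if !s.backlogActive.CompareAndSwap(false, true) {
		return false
	}

	s.wg.Add(1)
	go func() {
		// Deferred calls run last-in first-out: the flag is cleared
		// before the WaitGroup is released.
		defer s.wg.Done()
		defer s.backlogActive.Store(false)
		send()
	}()

	return true
}

func main() {
	var s peerSyncer
	release := make(chan struct{})

	first := s.tryStartBacklog(func() { <-release })
	second := s.tryStartBacklog(func() {})
	fmt.Println(first, second) // true false

	close(release)
	s.wg.Wait()

	fmt.Println(s.tryStartBacklog(func() {})) // true
	s.wg.Wait()
}
```

The second request is dropped rather than queued, which matches the behaviour the commit message describes.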
Olaoluwa Osuntokun
bb5825387e discovery: add tests for async timestamp range queue 2025-08-06 11:34:42 +02:00
Olaoluwa Osuntokun
8eda486227 discovery: integrate async queue in ProcessRemoteAnnouncement
In this commit, we complete the integration of the asynchronous
timestamp range queue by modifying ProcessRemoteAnnouncement to use
the new queuing mechanism instead of calling ApplyGossipFilter
synchronously.

This change ensures that when a peer sends a GossipTimestampRange
message, it is queued for asynchronous processing rather than
blocking the gossiper's main message processing loop. The modification
prevents the peer's readHandler from blocking on potentially slow
gossip filter operations, maintaining connection stability during
periods of high synchronization activity.

If the queue is full when attempting to enqueue a message, we log
a warning but return success to prevent peer disconnection. This
design choice prioritizes connection stability over guaranteed
delivery of every gossip filter request, which is acceptable since
peers can always resend timestamp range messages if needed.
2025-08-06 11:34:42 +02:00
Olaoluwa Osuntokun
80e0ea0d40 discovery: add async timestamp range queue to prevent blocking
In this commit, we introduce an asynchronous processing queue for
GossipTimestampRange messages in the GossipSyncer. This change addresses
a critical issue where the gossiper could block indefinitely when
processing timestamp range messages during periods of high load.

Previously, when a peer sent a GossipTimestampRange message, the
gossiper would synchronously call ApplyGossipFilter, which could block
on semaphore acquisition, database queries, and rate limiting. This
synchronous processing created a bottleneck where the entire peer
message processing pipeline would stall, potentially causing timeouts
and disconnections.

The new design adds a timestampRangeQueue channel with a capacity of 1
message and a dedicated goroutine for processing these messages
asynchronously. This follows the established pattern used for other
message types in the syncer. When the queue is full, we drop messages
and log a warning rather than blocking indefinitely, providing graceful
degradation under extreme load conditions.
2025-08-06 11:34:42 +02:00
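The drop-when-full behaviour described in 80e0ea0d40 can be illustrated with a buffered channel and a non-blocking send. This is only a sketch under assumptions: `rangeMsg`, `rangeQueue`, and `enqueue` are hypothetical names, and only the capacity of 1 and the drop-instead-of-block policy come from the commit message.

```go
package main

import "fmt"

// rangeMsg is a hypothetical stand-in for a GossipTimestampRange message.
type rangeMsg struct {
	firstTimestamp uint32
}

// rangeQueue is an illustrative queue that holds at most one pending message.
type rangeQueue struct {
	msgs chan rangeMsg
}

func newRangeQueue() *rangeQueue {
	return &rangeQueue{msgs: make(chan rangeMsg, 1)}
}

// enqueue never blocks. It reports false when the queue is full so that the
// caller can log a warning and carry on.
func (q *rangeQueue) enqueue(m rangeMsg) bool {
	select {
	case q.msgs <- m:
		return true
	default:
		return false
	}
}

func main() {
	q := newRangeQueue()

	fmt.Println(q.enqueue(rangeMsg{firstTimestamp: 1})) // true
	fmt.Println(q.enqueue(rangeMsg{firstTimestamp: 2})) // false

	// A dedicated goroutine would normally drain the queue; here we
	// receive once by hand to free the slot.
	m := <-q.msgs
	fmt.Println(m.firstTimestamp) // 1

	fmt.Println(q.enqueue(rangeMsg{firstTimestamp: 3})) // true
}
```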
Olaoluwa Osuntokun
57872b9cff discovery: make gossip filter semaphore capacity configurable
In this commit, we make the gossip filter semaphore capacity configurable
through a new FilterConcurrency field. This change allows node operators
to tune the number of concurrent gossip filter applications based on
their node's resources and network position.

The previous hard-coded limit of 5 concurrent filter applications could
become a bottleneck when multiple peers attempt to synchronize
simultaneously. By making this value configurable via the new
gossip.filter-concurrency option, operators can increase this limit
for better performance on well-resourced nodes or maintain conservative
values on resource-constrained systems.

We keep the default value at 5 to maintain backward compatibility and
avoid unexpected resource usage increases for existing deployments. The
sample configuration file is updated to document this new option.
2025-08-06 11:34:42 +02:00
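A configurable semaphore of the kind 57872b9cff describes is often built on a buffered channel in Go. The sketch below is an assumption about the general shape, not the code from the commit: `filterSema` and its methods are hypothetical, and only the default of 5 is taken from the message.

```go
package main

import "fmt"

// defaultFilterConcurrency mirrors the default of 5 from the commit message.
const defaultFilterConcurrency = 5

// filterSema is a hypothetical counting semaphore backed by a buffered channel.
type filterSema chan struct{}

// newFilterSema falls back to the default when the configured value is unset
// or invalid.
func newFilterSema(concurrency int) filterSema {
	if concurrency <= 0 {
		concurrency = defaultFilterConcurrency
	}
	return make(filterSema, concurrency)
}

// tryAcquire takes a slot if one is free and reports whether it did.
func (s filterSema) tryAcquire() bool {
	select {
	case s <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a previously acquired slot.
func (s filterSema) release() { <-s }

func main() {
	fmt.Println(cap(newFilterSema(0))) // 5

	sema := newFilterSema(2)
	fmt.Println(sema.tryAcquire(), sema.tryAcquire(), sema.tryAcquire()) // true true false

	sema.release()
	fmt.Println(sema.tryAcquire()) // true
}
```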
yyforyongyu
8746a6e204 discovery: increase default msg rates to 1MB 2025-08-06 11:30:37 +02:00
ziggie
a68dec8c19 discovery: add comments 2025-07-03 16:18:03 +02:00
Elle Mouton
9b877b94c3 multi: use the "errors" package everywhere
Replace all usages of the "github.com/go-errors/errors" and
"github.com/pkg/errors" packages with the standard lib's "errors"
package. This ensures that error wrapping and `errors.Is` checks will
work as expected.
2025-07-01 20:08:12 +02:00
Elle Mouton
4efcb075de discovery: fix log line panic
If a method returns an error, we should assume all other parameters to
be nil unless the documentation explicitly says otherwise. So here, we
fix a log line where a dereference is made to an object that will be nil
due to an error being returned.
2025-07-01 08:46:28 +02:00
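The class of bug fixed in 4efcb075de looks roughly like the following. The `edge` type and both functions are hypothetical; the point is only that the error path must not touch the returned pointer.

```go
package main

import (
	"errors"
	"fmt"
)

// edge is a hypothetical record type used for illustration.
type edge struct {
	chanID uint64
}

// fetchEdge returns a nil *edge whenever it returns an error.
func fetchEdge(id uint64) (*edge, error) {
	if id == 0 {
		return nil, errors.New("edge not found")
	}
	return &edge{chanID: id}, nil
}

// describe builds a log line for the lookup result.
func describe(id uint64) string {
	e, err := fetchEdge(id)
	if err != nil {
		// Use the input id here, not e.chanID: e is nil on this path
		// and dereferencing it would panic.
		return fmt.Sprintf("unable to fetch edge %d: %v", id, err)
	}

	return fmt.Sprintf("fetched edge %d", e.chanID)
}

func main() {
	fmt.Println(describe(0)) // unable to fetch edge 0: edge not found
	fmt.Println(describe(7)) // fetched edge 7
}
```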
ziggie
48e440e560 discovery: increase syncer gossip chan buffer 2025-06-04 12:21:00 +02:00
ziggie
6f8a94c094 brontide: increase logging when processing gossip msgs
We add logging so we can draw conclusions about how long the processing
of gossip messages lasts and potentially see whether the syncer
buffer channel size is a bottleneck in processing.
2025-06-04 12:21:00 +02:00
ziggie
45ebb9b900 discovery: add comments to the ctx creation
We highlight why we do not use the returned cancel method of the
context guard.
2025-06-04 10:55:22 +02:00
Elle Mouton
6202597eec discovery: revert passing ctx through to Start methods 2025-06-04 10:54:35 +02:00
Elle Mouton
cd4a59071d autopilot: start threading contexts through
The `GraphSource` interface in the `autopilot` package is directly
implemented by the `graphdb.KVStore` and so we will eventually thread
contexts through to this interface. So in this commit, we start updating
the autopilot system to thread contexts through in preparation for
passing the context through to any calls made to the GraphSource.

Two context.TODOs are added here which will be addressed in follow up
commits.
2025-06-04 10:54:34 +02:00
Elle Mouton
0ab61e08fa discovery: listen on ctx in any select
For any method that takes a context that has a select that listens on
the system's quit channel, we should also listen on the ctx since we
should not need to worry about if this context is derived internally or
externally.
2025-06-04 10:54:33 +02:00
Elle Mouton
f2fb4827c7 discovery: remove unnecessary context.Background() calls 2025-06-04 10:54:33 +02:00
Elle Mouton
6b95b7933c discovery: pass context through to bootstrapper SampleNodeAddrs
Since the ChannelGraphBootstrapper implementation makes a call to the
graph DB.
2025-06-04 10:54:33 +02:00
Elle Mouton
1a8e7587f9 discovery: pass context to ProcessRemoteAnnouncement
With this, we move a context.TODO() out of the gossiper and into the
brontide package - this will be removed in a future PR which focuses on
threading contexts through that code.
2025-06-04 10:54:33 +02:00
Elle Mouton
1a5821a873 discovery: thread contexts through sync manager
Here, we remove one context.TODO() by threading a context through to the
SyncManager.
2025-06-04 10:54:32 +02:00
Elle Mouton
a1a7d771da discovery: thread contexts to syncer
The `GossipSyncer` makes various calls to the `ChannelGraphTimeSeries`
interface which threads through to the graph DB. So in preparation for
threading context through to all the methods on that interface, we
update the GossipSyncer accordingly by passing contexts through.

Two `context.TODO()`s are added in this commit. They will be removed in
the upcoming commits.
2025-06-04 10:54:32 +02:00
Elle Mouton
5430157d0c discovery: pass context through to reliable sender
And remove a context.TODO() that was added in the previous commit.
2025-06-04 10:54:32 +02:00
Elle Mouton
e2c184f235 discovery: thread context through to gossiper
Pass the parent LND context to the gossiper, let it derive a child
context that gets cancelled on Stop. Pass the context through to any
methods that will eventually thread it through to any graph DB calls.

One `context.TODO()` is added here - this will be removed in the next
commit.

NOTE: for any internal methods that the context gets passed to, if those
methods already listen on the gossiper's `quit` channel, then they don't
need to also listen on the passed context's Done() channel because the
quit channel is closed at the same time that the context is cancelled.
2025-06-04 10:54:32 +02:00
Olaoluwa Osuntokun
30f3d7ce89 discovery: lower bandwidth rate limiting log to Debugf 2025-05-19 17:46:08 -07:00
Olaoluwa Osuntokun
c7ed5d65c6 multi: add new config options to tune gossip msg allocated bandwidth
We go with the defaults if no values are set.
2025-03-24 19:21:45 -07:00
Olaoluwa Osuntokun
05702d48b2 discovery: switch to bytes based rate limiting for outbound msgs
In this commit, we revamp the old message based rate limiting. First, we
move to meter by bytes/s instead of messages/s. The old logic had an
error in that it limited groups of message replies, instead of each
message. With this new approach, we'll use the newly added
SerializedSize method to implement fine grained bandwidth metering.

We need to pick two values, the burst rate, and the msg bytes rate. The
burst rate is the max amt that can be sent in a given period of time. We
need to set this above 65 KB, or the max msg limit, otherwise no
messages can be sent. The bucket starts with this many tokens (bytes).
As those are depleted, the amount of tokens is refilled at the msg
bytes rate.

As conservative values, we've chosen 200 KB as the burst rate, and 100
KB/s as the limit.
2025-03-24 19:21:45 -07:00
Eugene Siegel
6eb746fbba server.go+accessman.go: introduce caches for access permissions
Here we introduce the access manager which has caches that will
determine the access control status of our peers. Peers that have
had their funding transaction confirm with us are protected. Peers
that only have pending-open channels with us are granted temporary access
and can have their access revoked. The rest of the peers are granted
restricted access.
2025-03-11 20:42:34 -04:00
yyforyongyu
37799b95b7 discovery: fix state transition in GossipSyncer
Previously, we would set the state of the syncer after sending the msg,
which has the following flow,

1. In state `queryNewChannels`, we send the msg `QueryShortChanIDs`.
2. Once the msg is sent, we change to state `waitingQueryChanReply`.

But there's no guarantee the remote won't reply back in between the two
steps. When that happens, our syncer would still be in state
`queryNewChannels`, causing the following error,
```
[ERR] DISC gossiper.go:873: Process query msg from peer [Alice] got unexpected msg *lnwire.ReplyShortChanIDsEnd received in state queryNewChannels
```

To fix it, we now make sure the state is updated before sending the msg.
2025-03-10 16:58:16 +08:00
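The ordering fix in 37799b95b7 can be sketched as follows. The state names come from the commit message; the `syncer` type, its methods, and the callback that simulates an immediate reply are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type syncState int

const (
	queryNewChannels syncState = iota
	waitingQueryChanReply
)

// syncer is a hypothetical, minimal state machine guarded by a mutex.
type syncer struct {
	mu    sync.Mutex
	state syncState
}

func (s *syncer) setState(st syncState) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.state = st
}

func (s *syncer) getState() syncState {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.state
}

// handleReply accepts a reply only while we are waiting for one.
func (s *syncer) handleReply() error {
	if s.getState() != waitingQueryChanReply {
		return errors.New("unexpected reply in current state")
	}
	return nil
}

// queryChans updates the state before sending, so a reply that arrives
// while send is still running already sees the new state.
func (s *syncer) queryChans(send func() error) error {
	s.setState(waitingQueryChanReply)
	return send()
}

func main() {
	s := &syncer{state: queryNewChannels}

	// Simulate a peer that replies before send has even returned.
	var replyErr error
	err := s.queryChans(func() error {
		replyErr = s.handleReply()
		return nil
	})

	fmt.Println(err, replyErr) // <nil> <nil>
}
```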
Elle Mouton
2e85e08556 discovery: grab channel mutex before any DB calls
In `handleChanUpdate`, make sure to grab the `channelMtx` lock before
making any DB calls so that the logic remains consistent.
2025-03-05 14:12:55 +02:00
Elle Mouton
5e35bd8328 discovery: demonstrate channel update rate limiting bug
This commit adds a test to demonstrate that if we receive two identical
updates (which can happen if we get the same update from two peers in
quick succession), then our rate limiting logic will be hit early as
both updates might be counted towards the rate limit. This will be fixed
in an upcoming commit.
2025-03-05 14:12:23 +02:00
Elle Mouton
e5db0d6314 graph+discovery: move funding tx validation to gossiper
This commit is a pure refactor. We move the transaction validation
(existence, spentness, correctness) from the `graph.Builder` to the
gossiper since this is where all protocol level checks should happen.
All tests involved are also updated/moved.
2025-02-12 15:48:08 +02:00
Elle Mouton
39bb23ea5e discovery: lock the channelMtx before making the funding script
As we move the funding transaction validation logic out of the builder
and into the gossiper, we want to ensure that the behaviour stays
consistent with what we have today. So we should acquire this lock before
performing any expensive checks such as building the funding tx or
validating it.
2025-02-12 13:59:09 +02:00
Elle Mouton
7853e36488 graph+discovery: calculate funding tx script in gossiper
In preparation for an upcoming commit which will move all channel
funding tx validation to the gossiper, we first move the helper method
which helps build the expected funding transaction script based on the
fields in the channel announcement. We will still want this script later
on in the builder for updating the ChainView though, and so we pass this
field along with the ChannelEdgeInfo. With this change, we can remove
the TapscriptRoot field from the ChannelEdgeInfo since the only reason
it was there was so that the builder could reconstruct the full funding
script.
2025-02-12 13:15:54 +02:00
Elle Mouton
8a07bb0950 discovery: prepare tests for preparing the mock chain
Here, we add a new fundingTxOption modifier which will configure how we
set-up expected calls to the mock Chain once we have moved funding tx
logic to the gossiper. Note that in this commit, these modifiers don't
yet do anything.
2025-02-12 13:15:54 +02:00
Elle Mouton
22e391f055 discovery: add AssumeChannelValid config option
in preparation for later on when we need to know when to skip funding
transaction validation.
2025-02-12 13:15:54 +02:00
Elle Mouton
00f5fd9b7f graph: add IsZombieEdge method
This is in preparation for the commit where we move across all the
funding tx validation so that we can test that we are correctly updating
the zombie index.
2025-02-12 13:15:54 +02:00
Elle Mouton
870c865763 graph: export addZombieEdge and rename to MarkZombieEdge
The `graph.Builder`'s `addZombieEdge` method is currently called during
funding transaction validation for the case where the funding tx is not
found. In preparation for moving this code to the gossiper, we export
the method and add it to the ChannelGraphSource interface so that the
gossiper will be able to call it later on.
2025-02-12 13:15:53 +02:00
Elle Mouton
011d819315 discovery: update chanAnn creation methods to take modifier options
In preparation for adding more modifiers. We want to later add a
modifier that will tweak the errors returned by the mock chain once
funding transaction validation has been moved to the gossiper.
2025-02-07 16:29:19 +02:00
Elle Mouton
b6210632f2 discovery: prep testCtx with a mock Chain
This is in preparation for moving the funding transaction validation
code to the gossiper from the graph.Builder since then the gossiper will
start making GetBlockHash/GetBlock and GetUtxo calls.
2025-02-07 16:28:39 +02:00
Elle Mouton
8f37699db3 discovery: prepare tests for shared chain state
Convert a bunch of the helper functions to instead be methods on the
testCtx type. This is in preparation for adding a mockChain to the
testCtx that these helpers can then use to add blocks and utxos to.

See `notifications_test.go` for an idea of what we are trying to emulate
here. Once the funding tx code has moved to the gossiper, then the logic
in `notifications_test.go` will be removed.
2025-02-07 15:26:34 +02:00
Elle Mouton
b117daaa3c discovery+graph: convert errors from codes to variables
In preparation for moving funding transaction validation from the
Builder to the Gossiper in a later commit, we first convert these graph
Error Codes to normal error variables. This will help make the later
commit a pure code move.
2025-02-07 15:26:16 +02:00
Oliver Gugger
3c0350e481 Merge pull request #9476 from ellemouton/graph1
graph: refactor `graph.Builder` update handling
2025-02-07 07:23:41 -06:00
Elle Mouton
1974903fb2 multi: move node ann validation code to netann pkg
The `netann` package is a more appropriate place for this code to live.
Also, once the funding transaction code is moved out of the
`graph.Builder`, then no `lnwire` validation will occur in the `graph`
package.
2025-02-07 07:30:00 +02:00
Elle Mouton
b7509897d5 models: create a helper to convert wire NodeAnn to models.LNNode type
And use it in the gossiper. This helps ensure that we do this conversion
consistently.
2025-02-05 08:20:10 +02:00
Eugene Siegel
323b633895 graph -> discovery: move ValidationBarrier to discovery 2025-01-23 13:04:39 -05:00
Eugene Siegel
6a47a501c3 discovery+graph: track job set dependencies in ValidationBarrier
This commit does two things:
- removes the concept of allow / deny. Having this in place was a
  minor optimization and removing it makes the solution simpler.
- changes the job dependency tracking to track sets of abstract
  parent jobs rather than individual parent jobs.

As a note, the purpose of the ValidationBarrier is that it allows us
to launch gossip validation jobs in goroutines while still ensuring
that the validation order of these goroutines is adhered to when it
comes to validating ChannelAnnouncement _before_ ChannelUpdate and
_before_ NodeAnnouncement.
2025-01-23 11:43:07 -05:00
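The ordering guarantee that 6a47a501c3 attributes to the ValidationBarrier can be illustrated with a much simpler construct: a per-channel signal that child jobs wait on. The `barrier` type below is a hypothetical toy, far simpler than the real job-set tracking, and only shows a ChannelUpdate waiting for its ChannelAnnouncement.

```go
package main

import (
	"fmt"
	"sync"
)

// barrier is a hypothetical toy that lets child jobs wait for a parent job
// keyed by channel ID.
type barrier struct {
	mu   sync.Mutex
	done map[uint64]chan struct{}
}

func newBarrier() *barrier {
	return &barrier{done: make(map[uint64]chan struct{})}
}

// signal returns the completion channel for chanID, creating it on first use.
func (b *barrier) signal(chanID uint64) chan struct{} {
	b.mu.Lock()
	defer b.mu.Unlock()

	c, ok := b.done[chanID]
	if !ok {
		c = make(chan struct{})
		b.done[chanID] = c
	}
	return c
}

// waitForParent blocks until the parent job for chanID has finished.
func (b *barrier) waitForParent(chanID uint64) { <-b.signal(chanID) }

// parentDone releases all waiters. It must be called once per chanID.
func (b *barrier) parentDone(chanID uint64) { close(b.signal(chanID)) }

// validateInOrder runs a child job in a goroutine and returns the order in
// which the two jobs completed.
func validateInOrder() []string {
	var (
		b     = newBarrier()
		mu    sync.Mutex
		order []string
		wg    sync.WaitGroup
	)
	record := func(name string) {
		mu.Lock()
		defer mu.Unlock()
		order = append(order, name)
	}

	wg.Add(1)
	go func() {
		defer wg.Done()
		b.waitForParent(1)
		record("ChannelUpdate")
	}()

	record("ChannelAnnouncement")
	b.parentDone(1)
	wg.Wait()

	return order
}

func main() {
	fmt.Println(validateInOrder()) // [ChannelAnnouncement ChannelUpdate]
}
```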
Oliver Gugger
49affa2dc3 Merge pull request #9424 from yyforyongyu/fix-gossip-ann
multi: fix inconsistent state in gossip syncer
2025-01-23 05:25:01 -06:00
yyforyongyu
27a05694cb multi: make ProofMatureDelta configurable
We add a new config option to set the `ProofMatureDelta` so the users
can tune their graphs based on their own preference over the num of
confs found in the announcement signatures.
2025-01-17 21:44:23 +08:00
yyforyongyu
772a9d5f42 discovery: fix mocked peer in unit tests
The mocked peer used here blocks on `sendToPeer`, which is not the
behavior of the `SendMessageLazy` of `lnpeer.Peer`. To reflect the
reality, we now make sure the `sendToPeer` is non-blocking in the tests.
2025-01-17 17:59:06 +08:00
yyforyongyu
9fecfed3b5 discovery: fix race access to syncer's state
This commit fixes the following race,
1. syncer(state=syncingChans) sends QueryChannelRange
2. remote peer replies ReplyChannelRange
3. ProcessQueryMsg fails to process the remote peer's msg as its state
   is neither waitingQueryChanReply nor waitingQueryRangeReply.
4. syncer marks its new state waitingQueryChanReply, but too late.

The historical sync will now fail, and the syncer will be stuck at this
state. What's worse is it cannot forward channel announcements to other
connected peers now as it will skip the broadcasting during initial
graph sync.

This is now fixed to make sure the following two steps are atomic,
1. syncer(state=syncingChans) sends QueryChannelRange
2. syncer marks its new state waitingQueryChanReply.
2025-01-17 02:39:07 +08:00
yyforyongyu
4b30b09d1c discovery: add new method handleSyncingChans
This is a pure refactor to add a dedicated handler when the gossiper is
in state syncingChans.
2025-01-17 00:22:22 +08:00