587 Commits

Author SHA1 Message Date
Olaoluwa Osuntokun
b8ca876409 discovery/test: add comprehensive tests for state handler error exits
Add comprehensive test coverage to verify that state handler errors cause
the channelGraphSyncer goroutine to exit cleanly without entering endless
retry loops. These tests use mutation testing principles to ensure they
would fail if the fixes were removed.

TestGossipSyncerStateHandlerErrors is a table-driven test covering four
scenarios: context cancellation and peer disconnect during syncingChans
state, and context cancellation and network errors during queryNewChannels
state. Each test case verifies both attempt count (no endless loop) and
clean shutdown (no deadlock).

TestGossipSyncerProcessChanRangeReplyError verifies that errors from
processChanRangeReply in the waitingQueryRangeReply state cause clean
exit. This test sends multiple malformed messages and checks that only
the first is processed before the goroutine exits, using channel queue
depth to detect if the goroutine is still running.

All tests are race-detector clean and use mutation testing validation:
removing any of the error return statements causes the corresponding
tests to fail, confirming the tests properly verify the fixes.
2025-11-03 10:44:13 -08:00
Olaoluwa Osuntokun
06c7d60452 discovery: fix endless loop in gossip syncer on context cancellation
This commit fixes a critical bug where the channelGraphSyncer goroutine
would enter an endless loop when context cancellation or peer disconnect
errors occurred during the syncingChans or queryNewChannels states.

The root cause was that state handler functions (handleSyncingChans and
synchronizeChanIDs) did not return errors to the main goroutine loop.
When these functions encountered fatal errors like context cancellation,
they would log the error and return early without changing the syncer's
state. This caused the main loop to immediately re-enter the same state
handler, encounter the same error, and loop indefinitely while spamming
error logs.

The fix makes error handling explicit by having state handlers return
errors. The main channelGraphSyncer loop now checks these errors and
exits cleanly when fatal errors occur. We return any error (not just
context cancellation) because fatal errors can manifest in multiple
forms: context.Canceled, ErrGossipSyncerExiting from the rate limiter,
lnpeer.ErrPeerExiting from Brontide, or network errors like connection
closed. This approach matches the error handling pattern already used in
other goroutines like replyHandler.
2025-11-03 10:44:12 -08:00
Elle Mouton
b8abe130a5 multi: rename lnwire.NodeAnnouncement
In preparation for adding a NodeAnnouncement2 struct along with a
NodeAnnouncement interface, this commit renames the existing
NodeAnnouncement struct to NodeAnnouncement1.
2025-10-01 13:13:32 +02:00
Olaoluwa Osuntokun
f3a8fd842d graph+discovery: update graph/db gossip backlog interfaces to use iter.Seq2
This lets us emit a rich error if things fail when first creating the
iterator, or if any of the yield attempts fail.
2025-09-26 17:01:11 -07:00
Olaoluwa Osuntokun
0b3816afb9 discovery/test: update mock to support iterator-based UpdatesInHorizon
In this commit, we update the mockChannelGraphTimeSeries to implement
the new iterator-based UpdatesInHorizon interface. The mock maintains
its existing behavior of receiving messages through a channel and
returning them to the caller, but now wraps this in an iterator
function.

The implementation creates an iterator that pulls the entire message
slice from the mock's response channel, then yields each message
individually. This preserves the test semantics while conforming to the
new interface, ensuring all existing tests continue to pass without
modification.
2025-09-26 17:00:41 -07:00
Olaoluwa Osuntokun
77f3b35640 discovery: update ApplyGossipFilter to use lazy iterator with Pull2
In this commit, we update ApplyGossipFilter to leverage the new
iterator-based UpdatesInHorizon method. The key innovation here is using
iter.Pull2 to create a pull-based iterator that allows us to check if
any updates exist before launching the background goroutine.

This approach provides several benefits over the previous implementation.
First, we avoid the overhead of launching a goroutine when there are no
updates to send, which was previously unavoidable without materializing
the entire result set. Second, we maintain lazy loading throughout the
sending process, only pulling messages from the database as they're
needed for transmission.

The implementation uses Pull2 to peek at the first message, determining
whether to proceed with sending updates. If updates exist, ownership of
the iterator is transferred to the goroutine, which continues pulling
and sending messages until exhausted. This design ensures memory usage
remains bounded regardless of the number of updates being synchronized.
2025-09-26 17:00:11 -07:00
Olaoluwa Osuntokun
d8f6fd29f7 discovery: convert UpdatesInHorizon to return iter.Seq2[lnwire.Message, error]
In this commit, we complete the iterator conversion work started in PR
10128 by threading the iterator pattern through to the higher-level
UpdatesInHorizon method. This change converts the method from returning
a fully materialized slice of messages to returning a lazy iterator that
yields messages on demand.

The new signature uses iter.Seq2 to allow error propagation during
iteration, eliminating the need for a separate error return value. This
approach enables callers to handle errors as they occur during iteration
rather than failing upfront.

The implementation now lazily processes channel and node updates,
yielding them as they're generated rather than accumulating them in
memory. This maintains the same ordering guarantees (channels before
nodes) while significantly reducing memory pressure when dealing with
large update sets during gossip synchronization.
2025-09-26 16:59:41 -07:00
Olaoluwa Osuntokun
fda989da9c discovery+graph: update callers to use new iterator APIs
In this commit, we update all callers of NodeUpdatesInHorizon and
ChanUpdatesInHorizon to use the new iterator-based APIs. The changes
use fn.Collect to maintain existing behavior while benefiting from the
memory efficiency of iterators when possible.
2025-09-26 16:59:11 -07:00
Olaoluwa Osuntokun
069888b51a graph/db: convert ChanUpdatesInHorizon to use iterators
In this commit, we refactor the ChanUpdatesInHorizon method to return
an iterator instead of a slice. This change significantly reduces
memory usage when dealing with large result sets by allowing callers to
process items incrementally rather than loading everything into memory
at once.
2025-09-26 16:57:11 -07:00
Olaoluwa Osuntokun
1d6d54e5db graph/db: convert NodeUpdatesInHorizon to use iterators
In this commit, we refactor the NodeUpdatesInHorizon method to return
an iterator instead of a slice. This change significantly reduces
memory usage when dealing with large result sets by allowing callers to
process items incrementally rather than loading everything into memory
at once.

The new implementation uses Go 1.23's iter.Seq type to provide a
standard iterator interface. The method now supports configurable batch
sizes through functional options, allowing fine-tuned control over
memory usage and performance characteristics.

Rather than reading all the entries from disk into memory (before this
commit, we did consult the cache for most entries, skipping the disk
hits), we now expose a chunked iterator instead.

We also make the filtering of public nodes first class, which saves
creating many new db transactions later.
2025-09-26 16:56:41 -07:00
Elle
055fb436e1 Merge pull request #9175 from ellemouton/g175UpdateMessageStructure
lnwire+netann: update structure of g175 messages to be pure TLV
2025-09-22 10:04:44 +02:00
Elle Mouton
a5cf958c2c autopilot: fix nil map assignment
Use the `clear` call to reset a map on `reset` instead of assigning nil
to the entry.
2025-09-18 07:37:53 +02:00
Olaoluwa Osuntokun
2cc4079a0e routing+htlcswitch+discovery+peer+netann: optimize debug logging with lazy evaluation
In this commit, we update the network and routing layer components to use
lnutils.SpewLogClosure for debug logging.
2025-09-05 18:20:51 -07:00
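The lazy-evaluation idea behind 2cc4079a0e can be illustrated with a closure that implements `fmt.Stringer`, so the expensive string is only built if the logger actually formats it. `logClosure` and `leveledLogger` are hypothetical stand-ins; this is not the implementation of `lnutils.SpewLogClosure`.

```go
package main

import (
	"fmt"
	"strings"
)

// logClosure defers building an expensive string until it is formatted.
type logClosure func() string

// String makes logClosure a fmt.Stringer, so %v invokes the closure.
func (c logClosure) String() string { return c() }

// leveledLogger is a toy logger that skips formatting when debug is off.
type leveledLogger struct {
	debug bool
	out   strings.Builder
}

func (l *leveledLogger) Debugf(format string, args ...any) {
	if !l.debug {
		return // The closure in args is never called.
	}
	fmt.Fprintf(&l.out, format+"\n", args...)
}

func main() {
	dump := logClosure(func() string { return "expensive spew dump" })

	quiet := &leveledLogger{}
	quiet.Debugf("msg: %v", dump) // No formatting cost.

	loud := &leveledLogger{debug: true}
	loud.Debugf("msg: %v", dump)
	fmt.Print(loud.out.String())
}
```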
Elle Mouton
cd3bd05810 multi: rename FetchLightningNode to FetchNode
2025-09-03 10:14:35 +02:00
Elle Mouton
c663a557c4 multi: rename models.LightningNode to models.Node 2025-09-03 10:14:35 +02:00
Elle Mouton
70b5016bc8 discovery+gossip: make sure we don't advertise node anns with bad DNS
We may have already persisted node announcements that have multiple DNS
addresses since we may have received them before updating our code to
check for this. So here we just make sure not to send these on to our
peers.
2025-09-03 01:11:35 +00:00
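A sketch of the relay filter from 70b5016bc8, assuming simplified, hypothetical `nodeAnn` and `addrType` types rather than the real `lnwire` address types. Announcements persisted with more than one DNS address are dropped before being sent to peers.

```go
package main

import "fmt"

// addrType and nodeAnn are hypothetical simplifications of lnwire types.
type addrType int

const (
	addrIPv4 addrType = iota
	addrDNS
)

type nodeAnn struct {
	alias string
	addrs []addrType
}

// hasMultipleDNS reports whether ann carries more than one DNS address.
func hasMultipleDNS(ann nodeAnn) bool {
	count := 0
	for _, a := range ann.addrs {
		if a == addrDNS {
			count++
		}
	}
	return count > 1
}

// filterForPeers drops announcements persisted before the check existed.
func filterForPeers(anns []nodeAnn) []nodeAnn {
	out := make([]nodeAnn, 0, len(anns))
	for _, ann := range anns {
		if hasMultipleDNS(ann) {
			continue
		}
		out = append(out, ann)
	}
	return out
}

func main() {
	anns := []nodeAnn{
		{alias: "ok", addrs: []addrType{addrIPv4, addrDNS}},
		{alias: "bad", addrs: []addrType{addrDNS, addrDNS}},
	}
	for _, ann := range filterForPeers(anns) {
		fmt.Println(ann.alias) // ok
	}
}
```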
yyforyongyu
5bc4666efc multi: add new config peer-msg-rate-bytes 2025-09-02 21:23:07 +08:00
yyforyongyu
778456769a discovery: rate limiting sending msgs per peer
We now add another layer of rate limiting before sending the messages
inside `GossipSyncer`.
2025-09-02 21:23:07 +08:00
yyforyongyu
6826703c77 discovery: introduce rate limiter to GossipSyncer 2025-09-02 21:13:46 +08:00
yyforyongyu
19bc941cbd discovery: create common helper methods for rate limiter
This allows us to reuse them in the upcoming commits where we introduce
a rate limiter to the gossip syncer.
2025-09-02 21:13:46 +08:00
Elle Mouton
d68d1fb355 netann: update ChanAnn2 validation to work for P2WSH channels
This commit expands the ChannelAnnouncement2 validation for the case
where it is announcing a P2WSH channel.
2025-09-01 12:10:31 +02:00
Elle Mouton
06bf0c28b6 multi: let FetchPkScript take SCID by value
Instead of a pointer.
2025-09-01 12:10:30 +02:00
Boris Nagaev
dee8ad3754 multi: context.Background() -> t.Context()
Use the new feature of Go 1.24, fix linter warnings.

This change was produced by:
 - running golangci-lint run --fix
 - sed 's/context.Background/t.Context/' -i `git grep -l context.Background | grep test.go`
 - manually fixing broken tests
 - itest, lntest: use ht.Context() where ht or hn is available
 - in HarnessNode.Stop() we keep using context.Background(), because it is
   called from a cleanup handler in which t.Context() is canceled already.
2025-08-30 14:13:44 -03:00
Boris Nagaev
2a599bedc0 multi: don't pass err.Error() to Fatalf
It resulted in interpreting the error message as a format string.
Use Fatal(err) instead.
2025-08-30 14:13:44 -03:00
yyforyongyu
fc11e48585 discovery: fix make lint 2025-08-04 15:47:12 +08:00
yyforyongyu
60603f0854 multi: allow disable banning peers
When users set `gossip.ban-threshold` to 0, it's now treated as setting
the ban score to max uint64, which effectively disables the banning. We
still want to record the peer's ban score in case we need it for future
debugging.
2025-08-04 15:46:16 +08:00
yyforyongyu
a6f8617e7c multi: add new config ban-threshold 2025-08-04 15:45:47 +08:00
yyforyongyu
3109a8f3fe discovery: pass banThreshold to banman
So we can configure it via gossip's config in a following commit.
2025-08-04 15:44:35 +08:00
yyforyongyu
11c53b7212 discovery: increase peer's ban score when receiving a skewed channel_update 2025-08-04 15:44:34 +08:00
yyforyongyu
c3780a230c discovery: use handleBadPeer to increase peer's ban score 2025-08-04 15:44:34 +08:00
yyforyongyu
f1b2a47717 discovery: add new method handleBadPeer
So we can use the same piece of code elsewhere.
2025-08-04 15:44:34 +08:00
yyforyongyu
9cae62dcd7 discovery: only send node_announcement associated with channels
If a node doesn't have any channels, there's little point to send its
node_announcement as it cannot be used for routing.
2025-08-04 15:44:34 +08:00
yyforyongyu
f00073e62c discovery: skip sending channel_announcement with no channel_updates
Based on BOLT 7:
> If a channel_announcement has no corresponding channel_updates:
>   - MUST NOT send the channel_announcement.
2025-08-04 15:44:33 +08:00
Olaoluwa Osuntokun
7e767eac82 discovery: only permit a single gossip backlog goroutine per peer
In this commit, we add a new atomic bool to only permit a single gossip
backlog goroutine per peer. If we get a new request that needs a backlog
while we're still processing the other, then we'll drop that request.
2025-08-01 11:20:22 -05:00
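The single-backlog guard from 7e767eac82 maps naturally onto `sync/atomic.Bool.CompareAndSwap`. The `peerSyncer` type and `startBacklog` method are hypothetical; the sketch only shows that a second request is dropped while the first goroutine is still running.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// peerSyncer is a hypothetical holder of the per-peer backlog flag.
type peerSyncer struct {
	backlogActive atomic.Bool
	wg            sync.WaitGroup
}

// startBacklog launches work only if no backlog goroutine is running. A
// concurrent request is dropped and reported as false.
func (p *peerSyncer) startBacklog(work func()) bool {
	if !p.backlogActive.CompareAndSwap(false, true) {
		return false
	}
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		defer p.backlogActive.Store(false) // Runs before Done.
		work()
	}()
	return true
}

func main() {
	p := &peerSyncer{}
	release := make(chan struct{})
	fmt.Println(p.startBacklog(func() { <-release })) // true
	fmt.Println(p.startBacklog(func() {}))            // false: dropped
	close(release)
	p.wg.Wait()
}
```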
Olaoluwa Osuntokun
5fcd33c50c discovery: add tests for async timestamp range queue 2025-08-01 11:20:22 -05:00
Olaoluwa Osuntokun
f3ba372441 discovery: integrate async queue in ProcessRemoteAnnouncement
In this commit, we complete the integration of the asynchronous
timestamp range queue by modifying ProcessRemoteAnnouncement to use
the new queuing mechanism instead of calling ApplyGossipFilter
synchronously.

This change ensures that when a peer sends a GossipTimestampRange
message, it is queued for asynchronous processing rather than
blocking the gossiper's main message processing loop. The modification
prevents the peer's readHandler from blocking on potentially slow
gossip filter operations, maintaining connection stability during
periods of high synchronization activity.

If the queue is full when attempting to enqueue a message, we log
a warning but return success to prevent peer disconnection. This
design choice prioritizes connection stability over guaranteed
delivery of every gossip filter request, which is acceptable since
peers can always resend timestamp range messages if needed.
2025-08-01 11:20:21 -05:00
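The enqueue behavior described in f3ba372441 can be sketched with a `select` that has a `default` branch, so the caller never blocks. `timestampRange` and `processTimestampRange` are hypothetical stand-ins for the GossipTimestampRange handling. When the queue is full the message is logged and dropped, but `nil` is still returned so the peer is not disconnected.

```go
package main

import (
	"fmt"
	"log"
)

// timestampRange is a hypothetical stand-in for lnwire.GossipTimestampRange.
type timestampRange struct {
	first     uint32
	rangeSecs uint32
}

// processTimestampRange enqueues msg without blocking. A full queue drops
// the message with a warning and still reports success.
func processTimestampRange(queue chan<- timestampRange, msg timestampRange) error {
	select {
	case queue <- msg:
	default:
		log.Printf("timestamp range queue full, dropping %+v", msg)
	}
	return nil
}

func main() {
	queue := make(chan timestampRange, 1)
	for i := uint32(0); i < 3; i++ {
		err := processTimestampRange(queue, timestampRange{first: i})
		fmt.Println(err, len(queue))
	}
}
```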
Olaoluwa Osuntokun
7fb289f24f discovery: add async timestamp range queue to prevent blocking
In this commit, we introduce an asynchronous processing queue for
GossipTimestampRange messages in the GossipSyncer. This change addresses
a critical issue where the gossiper could block indefinitely when
processing timestamp range messages during periods of high load.

Previously, when a peer sent a GossipTimestampRange message, the
gossiper would synchronously call ApplyGossipFilter, which could block
on semaphore acquisition, database queries, and rate limiting. This
synchronous processing created a bottleneck where the entire peer
message processing pipeline would stall, potentially causing timeouts
and disconnections.

The new design adds a timestampRangeQueue channel with a capacity of 1
message and a dedicated goroutine for processing these messages
asynchronously. This follows the established pattern used for other
message types in the syncer. When the queue is full, we drop messages
and log a warning rather than blocking indefinitely, providing graceful
degradation under extreme load conditions.
2025-08-01 11:20:21 -05:00
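A sketch of the consumer side introduced in 7fb289f24f, assuming a hypothetical `rangeProcessor` with a capacity-1 queue and a quit channel. The dedicated goroutine applies each message, so slow work such as semaphore waits, DB queries or rate limiting happens off the caller's path. The `apply` callback is a placeholder for the real filter application.

```go
package main

import (
	"fmt"
	"sync"
)

// timestampRange is a hypothetical stand-in for the queued message type.
type timestampRange struct{ first uint32 }

// rangeProcessor drains its queue on a dedicated goroutine.
type rangeProcessor struct {
	queue chan timestampRange
	quit  chan struct{}
	wg    sync.WaitGroup
}

func newRangeProcessor(apply func(timestampRange)) *rangeProcessor {
	p := &rangeProcessor{
		queue: make(chan timestampRange, 1), // Capacity of one message.
		quit:  make(chan struct{}),
	}
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		for {
			select {
			case msg := <-p.queue:
				apply(msg) // Slow work runs here, not in the reader.
			case <-p.quit:
				return
			}
		}
	}()
	return p
}

// stop signals the goroutine to exit and waits for it.
func (p *rangeProcessor) stop() {
	close(p.quit)
	p.wg.Wait()
}

func main() {
	done := make(chan uint32, 1)
	p := newRangeProcessor(func(m timestampRange) { done <- m.first })
	p.queue <- timestampRange{first: 42}
	fmt.Println(<-done) // 42
	p.stop()
}
```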
Olaoluwa Osuntokun
694cc15a73 discovery: make gossip filter semaphore capacity configurable
In this commit, we make the gossip filter semaphore capacity configurable
through a new FilterConcurrency field. This change allows node operators
to tune the number of concurrent gossip filter applications based on
their node's resources and network position.

The previous hard-coded limit of 5 concurrent filter applications could
become a bottleneck when multiple peers attempt to synchronize
simultaneously. By making this value configurable via the new
gossip.filter-concurrency option, operators can increase this limit
for better performance on well-resourced nodes or maintain conservative
values on resource-constrained systems.

We keep the default value at 5 to maintain backward compatibility and
avoid unexpected resource usage increases for existing deployments. The
sample configuration file is updated to document this new option.
2025-08-01 11:20:19 -05:00
yyforyongyu
cadf870dcb discovery: increase default msg rates to 1MB 2025-07-25 17:40:59 +08:00
Elle Mouton
c32bf642d2 multi: pass reset to ForEachNodeCached 2025-07-15 11:23:27 +02:00
Elle Mouton
e5fbca8299 multi: let ForEachNodeChannel take a reset param 2025-07-15 11:23:27 +02:00
Elle Mouton
793c1057bb graph: remove one context.TODO
By threading a context through to the Builder's ForAllOutgoingChannels
method.
2025-07-08 15:11:02 +02:00
Yong
ffd944e8b7 Merge pull request #10012 from ziggie1984/fix-goroutine-leak
multi: prevent goroutine leak in brontide
2025-07-03 20:11:31 +08:00
ziggie
dedb75aea4 discovery: add comments 2025-07-03 06:27:38 +02:00
Elle Mouton
2f2845dfc0 refactor+multi: use *lnwire.FeatureVector for ChannelEdgeInfo features
In this commit, we move the serialisation details of a channel's
features to the DB layer and change the `models` field to instead use a
more useful `*lnwire.FeatureVector` type.

This makes the features easier to work with and moves the serialisation
to where it is actually used.
2025-07-01 17:02:07 +02:00
Elle Mouton
37d6390642 discovery: use a no-op hash accumulator for local networks
If LND is running on a local network, then use deterministic sampling so
that we can have deterministic peer bootstrapping.
2025-07-01 11:27:18 +02:00
Elle Mouton
339dd0c1a7 discovery: introduce hashAccumulator interface
Create an abstract hashAccumulator interface for the Channel graph
bootstrapper so that we can later introduce a deterministic accumulator
to be used during testing.
2025-07-01 11:26:14 +02:00
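A hedged sketch of the hashAccumulator abstraction from 339dd0c1a7 and the no-op variant from 37d6390642. The method names `rotate` and `skipNode`, the salted SHA-256 scheme and the `sample` helper are assumptions for illustration, not lnd's code. The random accumulator skips a pseudo-random subset that changes on every rotation, while the no-op accumulator never skips, so sampling is deterministic.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// hashAccumulator decides which candidate nodes the bootstrapper samples.
type hashAccumulator interface {
	rotate()
	skipNode(pub []byte) bool
}

// randomAccumulator salts a hash with fresh randomness on every rotation.
type randomAccumulator struct{ salt [32]byte }

func (r *randomAccumulator) rotate() { _, _ = rand.Read(r.salt[:]) }

func (r *randomAccumulator) skipNode(pub []byte) bool {
	h := sha256.Sum256(append(r.salt[:], pub...))
	return h[0]&1 == 1 // Skip roughly half of the candidates.
}

// noOpAccumulator never skips a node, making sampling deterministic.
type noOpAccumulator struct{}

func (noOpAccumulator) rotate()              {}
func (noOpAccumulator) skipNode([]byte) bool { return false }

// sample returns the candidates that acc does not skip, in order.
func sample(acc hashAccumulator, candidates [][]byte) [][]byte {
	acc.rotate()
	var out [][]byte
	for _, c := range candidates {
		if !acc.skipNode(c) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	nodes := [][]byte{{0x02, 0x01}, {0x03, 0x02}, {0x02, 0x03}}
	fmt.Println(len(sample(noOpAccumulator{}, nodes)))         // 3
	fmt.Println(len(sample(&randomAccumulator{}, nodes)) <= 3) // true
}
```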
Elle Mouton
25daf253c0 discovery: fix log line panic
If a method returns an error, we should assume all other parameters to
be nil unless the documentation explicitly says otherwise. So here, we
fix a log line where a dereference is made to an object that will be nil
due to an error being returned.
2025-06-30 18:27:56 +02:00
Elle Mouton
8cf567b948 multi: use the "errors" package everywhere
Replace all usages of the "github.com/go-errors/errors" and
"github.com/pkg/errors" packages with the standard lib's "errors"
package. This ensures that error wrapping and `errors.Is` checks will
work as expected.
2025-06-30 09:46:55 +02:00
Elle Mouton
e0801f5a5d discovery: use errors.Is for error check
Ensure that the check works for wrapped errors. Otherwise, this breaks
for a SQLStore graph backend which wraps the error it returns.
2025-06-26 10:11:02 +02:00