Add comprehensive test coverage to verify that state handler errors cause
the channelGraphSyncer goroutine to exit cleanly without entering endless
retry loops. These tests use mutation testing principles to ensure they
would fail if the fixes were removed.
TestGossipSyncerStateHandlerErrors is a table-driven test covering four
scenarios: context cancellation and peer disconnect during syncingChans
state, and context cancellation and network errors during queryNewChannels
state. Each test case verifies both attempt count (no endless loop) and
clean shutdown (no deadlock).
TestGossipSyncerProcessChanRangeReplyError verifies that errors from
processChanRangeReply in the waitingQueryRangeReply state cause clean
exit. This test sends multiple malformed messages and checks that only
the first is processed before the goroutine exits, using channel queue
depth to detect if the goroutine is still running.
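The queue-depth check can be sketched in isolation. This is a minimal illustration, not the actual test: processUntilError and queuedAfterExit are hypothetical names, and the real syncer message types are replaced by plain strings.

```go
package main

import (
	"errors"
	"fmt"
)

// processUntilError is a hypothetical consumer that drains msgs and returns
// on the first message it cannot process, mirroring a goroutine that exits
// on a handler error instead of retrying.
func processUntilError(msgs <-chan string, done chan<- error) {
	for msg := range msgs {
		if msg == "malformed" {
			done <- errors.New("malformed message")
			return
		}
	}
	done <- nil
}

// queuedAfterExit queues n malformed messages (n must be at least 1), runs
// the consumer, waits for it to report its result, and returns how many
// messages are still sitting in the queue.
func queuedAfterExit(n int) int {
	msgs := make(chan string, n)
	for i := 0; i < n; i++ {
		msgs <- "malformed"
	}

	done := make(chan error, 1)
	go processUntilError(msgs, done)
	<-done

	// If the consumer had kept looping, the queue would keep draining.
	return len(msgs)
}

func main() {
	fmt.Println(queuedAfterExit(3))
}
```

With three queued messages, a consumer that exits on the first error leaves two behind, while one that kept looping would drain them all.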
All tests are race-detector clean and use mutation testing validation:
removing any of the error return statements causes the corresponding
tests to fail, confirming the tests properly verify the fixes.
This commit fixes a critical bug where the channelGraphSyncer goroutine
would enter an endless loop when context cancellation or peer disconnect
errors occurred during the syncingChans or queryNewChannels states.
The root cause was that state handler functions (handleSyncingChans and
synchronizeChanIDs) did not return errors to the main goroutine loop.
When these functions encountered fatal errors like context cancellation,
they would log the error and return early without changing the syncer's
state. This caused the main loop to immediately re-enter the same state
handler, encounter the same error, and loop indefinitely while spamming
error logs.
The fix makes error handling explicit by having state handlers return
errors. The main channelGraphSyncer loop now checks these errors and
exits cleanly when fatal errors occur. We return any error (not just
context cancellation) because fatal errors can manifest in multiple
forms: context.Canceled, ErrGossipSyncerExiting from the rate limiter,
lnpeer.ErrPeerExiting from Brontide, or network errors like connection
closed. This approach matches the error handling pattern already used in
other goroutines like replyHandler.
This commit addresses a regression where Neutrino rescanning starts
from an outdated height (~100k blocks behind) instead of using the
current synced height.
Root Cause:
In commit 16a8b623b, the initialization order was changed so that
Chain Notifier starts before wallet syncing completes. This means
the rescan begins using the stale height from BuildChainControl
rather than the fully synced height.
Old behavior (commit 1dfb5a0c2):
1. RPC server starts
2. Headers sync as part of daemon server
3. Chain Notifier starts after sync completes
4. Rescan begins from current (synced) height
Current behavior (regression):
1. Chain Notifier starts in newServer (before RPC)
2. Wallet sync happens after RPC server starts
3. Rescan uses outdated height from BuildChainControl
Solution:
- Ensure headers are fully synced before starting the chain notifier,
and after starting the RPC server.
- Move chain notifier startup to its correct location after headers are
fully synced.
- Make sure the starting beat is fetched lazily, after the chain notifier
has started and before the starting beat result is first used.
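The lazy fetch in the last point can be sketched with sync.Once. This is an illustration only: lazyBeat, its fetch callback and the heights below are hypothetical and do not reflect the actual types used in the fix.

```go
package main

import (
	"fmt"
	"sync"
)

// lazyBeat is a hypothetical wrapper that defers reading the starting
// height until the first caller needs it, then caches the result.
type lazyBeat struct {
	once   sync.Once
	fetch  func() int32
	height int32
}

// Height runs fetch on first use and returns the cached value afterwards.
func (l *lazyBeat) Height() int32 {
	l.once.Do(func() {
		l.height = l.fetch()
	})

	return l.height
}

func main() {
	// bestHeight simulates the backend's view of the chain tip, which is
	// still stale when the beat is constructed.
	bestHeight := int32(800000)
	beat := &lazyBeat{fetch: func() int32 { return bestHeight }}

	// Headers finish syncing before the beat is first used.
	bestHeight = 900000

	fmt.Println(beat.Height())
}
```

Because fetch runs on first use rather than at construction, the beat observes the synced height instead of the stale one.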
My key recently expired. In this commit, we update the keys to the new
refreshed version. These are the same keys, but with an expiry further
out.
Here's a clearsigned copy of the latest Bitcoin block hash:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
000000000000000000013215ef7c32bc0427f388fc83623affe712f388
-----BEGIN PGP SIGNATURE-----
iHUEARYKAB0WIQQpYhJoGq3wVlaize6QUl997uCthgUCaPi6xwAKCRCQUl997uCt
hpqNAQC5VnnbO6h/PjywGhU4LLRvH8SdgdDEMSc7xrtWd1vgPgD+IDrHqiAb+h38
ORBnUVJCVuZrPebtdnYXVQhII91eaw4=
=WRbl
-----END PGP SIGNATURE-----
In this commit, we add a call to "go clean -cache" after each platform
build in the release script to prevent the Go build cache from accumulating
unbounded disk space during the sequential 15-platform build process.
When building for multiple platforms in sequence with "go build -v", Go
creates intermediate build artifacts and caches compiled packages for each
target platform. While this caching improves build performance within a
single platform build, it causes the cache to grow substantially when
building for many platforms sequentially. With 15 different platform/
architecture combinations, each with their own cached artifacts, this
accumulation was contributing to the disk space exhaustion.
By clearing the build cache after each platform completes, we prevent this
unbounded growth while still allowing each individual platform build to
benefit from caching during its own compilation. The module cache is
preserved (we only clear the build cache), so dependencies don't need to be
re-downloaded between platforms.
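The release script itself is a shell script. The Go sketch below only models the resulting command sequence, assuming platforms are named "os-arch"; buildPlan, the step type and the ./... build target are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// step is one command in the hypothetical build plan, with any extra
// environment variables it needs.
type step struct {
	env  []string
	args []string
}

// buildPlan returns, for each "os-arch" platform, a build step followed by
// a build-cache cleanup step. The module cache is never touched.
func buildPlan(platforms []string) ([]step, error) {
	var plan []step
	for _, p := range platforms {
		goos, goarch, ok := strings.Cut(p, "-")
		if !ok {
			return nil, fmt.Errorf("invalid platform %q", p)
		}

		env := []string{"GOOS=" + goos, "GOARCH=" + goarch}
		plan = append(plan,
			step{env: env, args: []string{"go", "build", "-v", "./..."}},
			// Clear only the build cache before the next platform.
			step{args: []string{"go", "clean", "-cache"}},
		)
	}

	return plan, nil
}

func main() {
	plan, err := buildPlan([]string{"linux-amd64", "windows-arm64"})
	if err != nil {
		fmt.Println(err)
		return
	}

	for _, s := range plan {
		fmt.Println(strings.Join(append(s.env, s.args...), " "))
	}
}
```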
In this commit, we replace the basic inline cleanup command in the release
workflow with the comprehensive cleanup-space action that was previously
only used in the main CI workflow. The previous release workflow cleanup
simply removed the hostedtoolcache directory, which freed only a few
gigabytes and proved insufficient for multi-platform release builds.
By switching to the cleanup-space action (now enhanced to free 20-25GB),
the release workflow will have substantially more disk space available
before beginning the build process. This should resolve the disk space
exhaustion issues that were occurring during the Windows ARM build phase,
which is one of the final platforms in the 15-platform build sequence.
In this commit, we significantly expand the cleanup-space GitHub Actions
workflow to free up substantially more disk space on GitHub runners. The
previous cleanup only removed three large toolsets (dotnet, android,
ghc), which should free ~14GB. This enhancement adds removal of several
additional large packages and caches, bringing the total freed space to
approximately 20-25GB.
The specific additions include removing Swift and Julia language runtimes,
the hosted toolcache directory, all Docker images, numerous large apt
packages (aspnetcore, llvm, php, mongodb, mysql, azure-cli, browsers, and
development tools), and various cache directories. We also add disk space
reporting before and after cleanup to provide visibility into how much
space is actually being freed during workflow runs.
This enhancement was motivated by release builds running out of disk space
when building for all 15 supported platforms (darwin, freebsd, linux,
netbsd, openbsd, windows across multiple architectures). The sequential
builds with verbose output were consuming more space than the basic cleanup
could provide.
Older LND versions had a bug that created HTLCs with a locktime of 0.
The utxonursery has problems dealing with such HTLC outputs because we
do not allow height hints of 0. We now fetch the closeSummary of the
channel and use a conservative height from it for rescanning.
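A minimal sketch of such a fallback, assuming the conservative height is the channel's close height taken from the close summary. rescanHeightHint and its parameters are hypothetical and may differ from the actual change.

```go
package main

import (
	"errors"
	"fmt"
)

// rescanHeightHint returns the height hint to use for a rescan. A non-zero
// hint is used as-is. A hint of 0 falls back to closeHeight, which is
// assumed here to come from the channel's close summary.
func rescanHeightHint(hint, closeHeight uint32) (uint32, error) {
	if hint != 0 {
		return hint, nil
	}

	if closeHeight == 0 {
		return 0, errors.New("no usable height hint")
	}

	// The output cannot exist on chain before the channel was closed, so
	// the close height is a safe lower bound for the rescan.
	return closeHeight, nil
}

func main() {
	hint, err := rescanHeightHint(0, 700000)
	fmt.Println(hint, err)
}
```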
It can happen that we handle two copies of the same node announcement in
the same batch transaction. In that case, our `UpsertNode` conflict
assertion may fail. We need to handle this gracefully.
On macOS, the default BSD gzip produces different output than GNU gzip
does on Linux. To ensure reproducible builds, we need to use GNU gzip.
This is similar to what we do to enforce GNU tar.
In this commit, the lnwire.NodeAnnouncement2 type is defined. This will
be used to represent the `node_announcement_2` message used in the
Gossip 2 (1.75) protocol.