server: stagger initial reconnects

This commit adds optional jitter to the initial reconnection to our persistent peers. Currently we attempt to reconnect to all peers simultaneously, which causes a large amount of contention as the number of channels a node has grows. We resolve this by adding a randomized delay between 0 and 30 seconds before reconnecting to each persistent peer. This spreads out the load and the contention on resources such as the database, the read/write pools, and memory allocations. On my node, this allows startup with about 80% of the memory burst of the all-at-once approach.

This also has a second-order effect: messages sent at constant intervals, such as pings, are better distributed over time. This reduces the number of concurrent jobs submitted to the read and write pools at any given time, resulting in better reuse of read/write buffers and fewer bursty allocation and garbage collection cycles.
2025-08-28 22:50:58 +02:00 · 2019-04-04 02:25:31 -07:00
parent 4de7d0c561
commit cf80476e01
2 changed files with 46 additions and 1 deletions
--- a/config.go
+++ b/config.go
@@ -252,6 +252,8 @@ type config struct {

 	RejectPush bool `long:"rejectpush" description:"If true, lnd will not accept channel opening requests with non-zero push amounts. This should prevent accidental pushes to merchant nodes."`

+	StaggerInitialReconnect bool `long:"stagger-initial-reconnect" description:"If true, will apply a randomized staggering between 0s and 30s when reconnecting to persistent peers on startup. The first 10 reconnections will be attempted instantly, regardless of the flag's value"`
+
 	net tor.Net

 	Routing *routing.Conf `group:"routing" namespace:"routing"`