Validator Graceful Handoff #505
realJDP started this conversation in XLS Ideas (pre standard proposal)
Replies: 1 comment
Possibly add to the information here: XRPLF/rippled#5755
Title: Validator Graceful Handoff
Revision: 1 (2026-03-24)
Type: Idea
Author: JDP
Abstract
This proposal introduces a tmHandoff peer protocol message and a corresponding warm-standby process mode for rippled. Together they enable a validator to signal a planned, time-bounded absence to its peers before going offline, allowing the network to temporarily exclude it from quorum calculations without penalizing its agreement score. The primary use case is near-zero-downtime software upgrades on a single server, but the mechanism is general enough to cover any planned maintenance window.
Motivation and Rationale
Upgrading a rippled validator currently requires restarting the process. Even a clean restart causes 30–60 seconds of missed validations. The network cannot distinguish between a deliberate maintenance window and an unexpected failure, so the validator’s agreement score degrades regardless of intent.
The two-server standby pattern solves this at significant operational cost. Smaller validators and independent operators are disproportionately impacted — they face a choice between delaying upgrades (a security and compatibility risk) or restarting promptly and incurring unnecessary agreement score penalties.
A lightweight signaling mechanism would allow validators to communicate intent before going offline. Peers can then exclude them gracefully from the current consensus round rather than waiting for a timeout. Combined with a warm-standby process mode in rippled itself, this enables a software swap that completes within a single ledger close (~3–4 seconds).
Amendment
This proposal adds two components to rippled: a new peer protocol message and a new process startup mode.
Component 1: tmHandoff Peer Protocol Message
A new signed peer message type broadcast before a planned shutdown:
message TMHandoff {
    required bytes validator_public_key = 1;
    required uint32 absent_ledgers = 2;   // max 10
    required uint32 ledger_sequence = 3;  // must be current +1
    required bytes signature = 4;
}
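As a rough illustration of how a sender might populate these fields, the Python sketch below mirrors the message as a dataclass. The names `HandoffRequest`, `build_handoff`, and `signing_payload`, the byte layout of the payload, and the lower bound of 1 on `absent_ledgers` are all assumptions made for illustration; the proposal does not specify which bytes the signature covers.

```python
import struct
from dataclasses import dataclass

MAX_ABSENT_LEDGERS = 10  # hard cap stated in the proposal


@dataclass(frozen=True)
class HandoffRequest:
    """Hypothetical in-memory mirror of the TMHandoff fields (signature excluded)."""
    validator_public_key: bytes
    absent_ledgers: int
    ledger_sequence: int

    def __post_init__(self):
        # The upper bound comes from the proposal; the lower bound of 1 is assumed.
        if not 1 <= self.absent_ledgers <= MAX_ABSENT_LEDGERS:
            raise ValueError("absent_ledgers must be between 1 and 10")

    def signing_payload(self) -> bytes:
        # Assumed layout: public key followed by two big-endian uint32 values.
        return self.validator_public_key + struct.pack(
            ">II", self.absent_ledgers, self.ledger_sequence
        )


def build_handoff(public_key: bytes, current_ledger: int, absent_ledgers: int) -> HandoffRequest:
    # Per the field comment, ledger_sequence names the ledger after the current one.
    return HandoffRequest(public_key, absent_ledgers, current_ledger + 1)
```

A real implementation would sign the payload with the manifest signing key and place the result in the `signature` field; that step is left out here because the signing scheme is not defined in this proposal.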
Peer behavior on receipt:
Safety constraints:
∙ Maximum absence: 10 ledger closes (~30–40 seconds at 3–4 seconds per ledger). Hard cap enforced by peers.
∙ Rate limit: A validator may broadcast tmHandoff at most once per 256 ledgers (~15 minutes). Peers track the last handoff sequence per key and reject any message that arrives sooner.
∙ Concurrent cap: Peers reject a message if accepting it would put more than floor((1 - quorum_threshold) × UNL_size) validators in handoff state at once. This protects quorum.
∙ Replay prevention: Messages must include the current ledger sequence as a nonce. Peers reject any message with a sequence older than current minus 3.
∙ Signature requirement: Messages must be signed by the validator’s current manifest signing key. Unsigned or incorrectly signed messages are silently dropped.
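The constraints above can be sketched as a single peer-side check. This is a minimal Python model, not rippled code: the `HandoffTracker` class, its field names, and the returned status strings are hypothetical, signature verification is reduced to a boolean passed in by the caller, and the concurrent cap is computed with an integer percentage as a stand-in for quorum_threshold.

```python
from dataclasses import dataclass, field
from typing import Dict

MAX_ABSENT_LEDGERS = 10    # maximum absence
RATE_LIMIT_LEDGERS = 256   # minimum spacing between handoffs per key
REPLAY_WINDOW = 3          # reject sequences older than current - 3


@dataclass
class HandoffTracker:
    """Hypothetical per-peer state used to evaluate handoff messages."""
    unl_size: int
    quorum_percent: int = 80
    last_handoff: Dict[bytes, int] = field(default_factory=dict)  # key -> ledger sequence
    absent_until: Dict[bytes, int] = field(default_factory=dict)  # key -> last absent ledger

    def max_concurrent(self) -> int:
        # Integer form of floor((1 - quorum_threshold) * UNL_size).
        return self.unl_size * (100 - self.quorum_percent) // 100

    def accept(self, key: bytes, absent_ledgers: int, ledger_sequence: int,
               current_ledger: int, signature_ok: bool) -> str:
        if not signature_ok:
            return "drop: bad signature"
        if not 1 <= absent_ledgers <= MAX_ABSENT_LEDGERS:
            return "reject: absence out of range"
        if ledger_sequence < current_ledger - REPLAY_WINDOW:
            return "reject: stale sequence"
        last = self.last_handoff.get(key)
        if last is not None and ledger_sequence - last < RATE_LIMIT_LEDGERS:
            return "reject: rate limited"
        # Count validators whose announced absence has not yet ended.
        active = sum(1 for end in self.absent_until.values() if end >= current_ledger)
        if active >= self.max_concurrent():
            return "reject: concurrent cap"
        self.last_handoff[key] = ledger_sequence
        self.absent_until[key] = ledger_sequence + absent_ledgers - 1
        return "accepted"
```

For example, `HandoffTracker(unl_size=35).accept(key, 3, 1001, 1000, True)` accepts a first handoff, while a second message from the same key fewer than 256 ledgers later is rejected by the rate limit. The order of the checks is a design choice of this sketch; the proposal does not prescribe one.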
Component 2: Warm Standby Process Mode
A new rippled startup flag --warm-standby launches the process in a non-signing state that connects to peers, fully syncs the ledger database, holds the signing key in memory without using it, and accepts an activate command via Unix socket to atomically promote itself to full validator mode.
The warm standby uses a separate peer port and a read-only snapshot of the primary database so it does not conflict with the running instance. Once fully synced it is ready to activate within milliseconds.
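The standby lifecycle described here can be pictured as a small state machine. The Python sketch below is only a model under stated assumptions: the `Mode` and `WarmStandby` names, the three states, and the reply strings are hypothetical, and a lock stands in for whatever mechanism rippled would use to make promotion atomic.

```python
import threading
from enum import Enum


class Mode(Enum):
    SYNCING = "syncing"  # connected to peers, catching up, not signing
    READY = "ready"      # fully synced, key held in memory, still not signing
    ACTIVE = "active"    # promoted to full validator mode


class WarmStandby:
    """Hypothetical model of the standby lifecycle; not rippled code."""

    def __init__(self):
        self._mode = Mode.SYNCING
        self._lock = threading.Lock()

    @property
    def mode(self) -> Mode:
        return self._mode

    def mark_synced(self) -> None:
        with self._lock:
            if self._mode is Mode.SYNCING:
                self._mode = Mode.READY

    def handle_command(self, line: str) -> str:
        # Only the single command named in the proposal is recognized.
        if line.strip() != "activate":
            return "error: unknown command"
        with self._lock:
            if self._mode is Mode.SYNCING:
                return "error: not synced"
            if self._mode is Mode.ACTIVE:
                return "ok: already active"
            self._mode = Mode.ACTIVE
            return "ok: activated"

    def may_sign(self) -> bool:
        # The signing key is used only after promotion.
        return self._mode is Mode.ACTIVE
```

Refusing activation before the sync completes is one possible policy; the proposal only says the standby is ready to activate once fully synced.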
Upgrade flow:
Example shell sequence:
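# 1. Install the new rippled package.
sudo apt-get install -y rippled
# 2. Start the new binary as a warm standby on a separate peer port.
rippled --warm-standby --peer-port 51236 &
# 3. Confirm the standby reports that it is synced.
rippled --warm-standby server_info | grep 'synced'
# 4. Broadcast tmHandoff announcing an absence of 3 ledgers.
rippled submit_handoff --ledgers 3
# 5. Promote the standby to full validator mode.
echo 'activate' | nc -U /var/run/rippled-standby.sock
# 6. Stop the old validator process.
sudo systemctl stop rippled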
Security Considerations
Quorum protection: The concurrent handoff cap ensures that coordinated handoff signals alone cannot take the network below quorum. For a typical UNL of 35 validators at an 80% quorum threshold, at most 7 simultaneous handoffs are permitted: floor((1 - 0.80) × 35) = 7, which leaves 28 validators, exactly 80% of the UNL.
Replay prevention: Messages are bound to a ledger sequence number. Peers reject anything older than current minus 3, preventing replay of captured handoff signals.
Key compromise: If a signing key is compromised, the rate limit means an attacker can force at most one 10-ledger absence (~30–40 seconds) per 256 ledgers (~15 minutes). This is a minor incremental risk and does not change the recommended key rotation response.
No new trust assumptions: Peers already track validator keys and manifests. tmHandoff extends this existing infrastructure with no new trusted parties, cryptographic primitives, or persistent state beyond a per-key record of the last handoff sequence.
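The two numeric bounds in this section can be checked with a few lines of Python. The function names are hypothetical, and expressing the quorum threshold as an integer percentage is an assumption of this sketch. It is used because a naive floating-point evaluation of floor((1 - 0.80) × 35) yields 6 rather than 7, since 1 - 0.80 is slightly below 0.2 in binary floating point.

```python
import math


def max_concurrent_handoffs(unl_size: int, quorum_percent: int) -> int:
    # floor((1 - quorum_threshold) * UNL_size), computed in integers.
    return unl_size * (100 - quorum_percent) // 100


def worst_case_absence_fraction(max_absent: int = 10, rate_limit: int = 256) -> float:
    # Share of ledgers a holder of the signing key could force the validator to miss.
    return max_absent / rate_limit


cap = max_concurrent_handoffs(35, 80)          # 7
remaining = 35 - cap                           # 28 validators, exactly 80% of 35
naive = math.floor((1 - 0.80) * 35)            # 6, due to floating-point rounding
forced_absence = worst_case_absence_fraction() # 0.0390625, just under 4% of ledgers
```

An implementation that computed the cap in floating point would therefore permit one fewer handoff than the text states, which is harmless for safety but would make peers disagree if some used integer arithmetic and others did not.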
Affected Components
Open Questions
References
∙ XRPL Consensus Research: https://arxiv.org/abs/1802.07242
∙ rippled peer protocol source: https://github.com/XRPLF/rippled/tree/develop/src/xrpld/overlay
∙ Validator setup: https://xrpl.org/docs/infrastructure/configuration/server-modes/run-rippled-as-a-validator
∙ XRPL Amendment process: https://xrpl.org/docs/concepts/networks-and-servers/amendments
∙ Auto-update on Linux: https://xrpl.org/docs/infrastructure/installation/update-rippled-automatically-on-linux