Cluster QUIC transport skips certificate verification and client auth by default; frames lack MAC (2 sub-items) #58

@hollanf

The production constructor for the Raft / cluster transport uses a TLS configuration that (a) accepts any server certificate on the client side (SkipServerVerification) and (b) disables client authentication on the server side (with_no_client_auth). Combined with Raft frames being covered only by CRC32C (a checksum, not a MAC), any network peer that can reach the QUIC port can forge or replay Raft and SWIM messages — win elections, overwrite the replicated log, install crafted snapshots.


1. make_raft_client_config uses SkipServerVerification; make_raft_server_config uses with_no_client_auth

File: nodedb-cluster/src/transport/config.rs:49-112, invoked by client.rs:70-111

// config.rs
pub fn make_raft_server_config(tuning: &ClusterTransportTuning) -> Result<quinn::ServerConfig> {
    let (cert, key) = nexar::transport::tls::generate_self_signed_cert()...;
    let mut tls_config = rustls::ServerConfig::builder_with_provider(...)
        .with_no_client_auth()                                // ← server trusts any client
        .with_single_cert(vec![cert], key)?;
    ...
}

pub fn make_raft_client_config(tuning: &ClusterTransportTuning) -> Result<quinn::ClientConfig> {
    let mut tls_config = rustls::ClientConfig::builder_with_provider(...)
        .dangerous()
        .with_custom_certificate_verifier(Arc::new(SkipServerVerification))  // ← client trusts any server
        .with_no_client_auth();
    ...
}

struct SkipServerVerification;
impl rustls::client::danger::ServerCertVerifier for SkipServerVerification {
    // returns Ok(ServerCertVerified::assertion()) for every cert
    ...
}
// client.rs::with_tuning — the production constructor
let server_config = config::make_raft_server_config(tuning)?;
...
let client_config = config::make_raft_client_config(tuning)?;

The mTLS variants (make_raft_server_config_mtls at line 119, make_raft_client_config_mtls at line 170) exist, but with_tuning unconditionally calls the non-mTLS pair. No runtime switch or config flag routes production builds to the mTLS path, so the cluster is open by default.
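One possible shape for the missing switch, as a minimal sketch only: the mode enum, the select_security function, and builder_pair are hypothetical names that do not exist in the cited code, and the flag name is the one proposed in the checklist. The sketch only decides which pair of builders a constructor such as with_tuning would call; it does not build any TLS configuration itself.

```rust
/// Hypothetical security mode for the cluster transport.
/// Nothing like this exists in the cited code; it is an illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TransportSecurity {
    /// Mutual TLS: client and server both verify certificates.
    Mtls,
    /// No verification in either direction; explicit opt-in only.
    Insecure,
}

/// Flag name taken from the checklist in this issue.
pub const INSECURE_FLAG: &str = "--insecure-cluster-transport";

/// Chooses the mode from CLI arguments.
/// mTLS is the default; the insecure mode requires the explicit flag.
pub fn select_security<I, S>(args: I) -> TransportSecurity
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    if args.into_iter().any(|arg| arg.as_ref() == INSECURE_FLAG) {
        // Loud startup warning so an open cluster is never silent.
        eprintln!("WARNING: cluster transport is running WITHOUT mutual TLS");
        TransportSecurity::Insecure
    } else {
        TransportSecurity::Mtls
    }
}

/// Names the (server, client) builder pair the constructor would call
/// for a given mode. The builder names are the ones cited in this issue.
pub fn builder_pair(mode: TransportSecurity) -> (&'static str, &'static str) {
    match mode {
        TransportSecurity::Mtls => (
            "make_raft_server_config_mtls",
            "make_raft_client_config_mtls",
        ),
        TransportSecurity::Insecure => (
            "make_raft_server_config",
            "make_raft_client_config",
        ),
    }
}
```

The point of the design is that the secure path needs no configuration at all, while the insecure path cannot be reached by accident.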

Concrete exploitation:

  • Any network peer that can send UDP to the QUIC port completes a QUIC handshake with a self-signed cert of its own choosing.
  • Certificate verification succeeds unconditionally in both directions: the client accepts any server certificate and the server requests no client certificate.
  • The peer can now send or receive Raft RPCs framed by rpc_codec.
  • Forged RequestVoteRequest with term = u64::MAX wins the next election.
  • Forged AppendEntriesRequest writes attacker-chosen entries into the replicated log.
  • Forged InstallSnapshotRequest replaces state wholesale — including credentials, RLS policies, and tenant catalog.

Repro:

# Any host that can reach the QUIC raft port:
# 1. Open a quinn client with the non-mTLS config builder.
# 2. Handshake succeeds because server uses with_no_client_auth().
# 3. Call write_frame(RPC_TYPE_REQUEST_VOTE, forged_payload, &mut buf), send.

2. Raft RPC frames are protected only by CRC32C — no MAC, no origin binding, no anti-replay

File: nodedb-cluster/src/rpc_codec/header.rs:26-88, rpc_codec/raft_rpc.rs:74-107

pub fn write_frame(rpc_type: u8, payload: &[u8], out: &mut Vec<u8>) -> Result<()> {
    ...
    let crc = crc32c::crc32c(payload);
    out.push(WIRE_VERSION as u8);
    out.push(rpc_type);
    out.extend_from_slice(&payload_len.to_le_bytes());
    out.extend_from_slice(&crc.to_le_bytes());
    out.extend_from_slice(payload);
    ...
}

CRC32C is a transmission checksum, not a MAC. Frames carry no from_node_id, no sequence number, no timestamp, and no signature. Even if an attacker cannot bypass TLS (after #1 is fixed), frame replay within a compromised TLS session, or between sessions that share a transport identity, is undetectable at the codec layer. The receiver has no way to detect that a captured AppendEntriesResponse{term, success} is being replayed.

This compounds with #1 but is independently exploitable in any scenario where the transport key rotates, one mTLS key leaks, a node is decommissioned, or a CA is misconfigured.
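To make the checksum-versus-MAC distinction concrete, here is a small dependency-free sketch. It assumes a 10-byte header (version, rpc_type, a u32 little-endian payload length, and the CRC), which mirrors the field order in write_frame; the real width of payload_len and the value of WIRE_VERSION are elided in the snippet above, so both are assumptions here. The bitwise crc32c function stands in for the crc32c crate, and encode_frame and checksum_ok are illustrative names rather than functions from the codebase.

```rust
/// Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
/// Slow but dependency-free; stands in for the `crc32c` crate here.
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            crc = if crc & 1 != 0 {
                (crc >> 1) ^ 0x82F6_3B78
            } else {
                crc >> 1
            };
        }
    }
    !crc
}

/// Assumed value; the real constant is not shown in the issue.
const WIRE_VERSION: u8 = 1;
/// version (1) + rpc_type (1) + payload_len (4, assumed u32) + crc (4).
const HEADER_LEN: usize = 10;

/// Builds a frame in the same field order as `write_frame`.
pub fn encode_frame(rpc_type: u8, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.push(WIRE_VERSION);
    out.push(rpc_type);
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&crc32c(payload).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Receiver-side check. It proves only that the payload matches the
/// checksum, not who produced the frame: no secret key is involved.
pub fn checksum_ok(frame: &[u8]) -> bool {
    if frame.len() < HEADER_LEN {
        return false;
    }
    let len = u32::from_le_bytes([frame[2], frame[3], frame[4], frame[5]]) as usize;
    let crc = u32::from_le_bytes([frame[6], frame[7], frame[8], frame[9]]);
    let payload = &frame[HEADER_LEN..];
    payload.len() == len && crc32c(payload) == crc
}
```

Because the checksum is unkeyed, anyone can call encode_frame with an arbitrary payload and obtain a frame that checksum_ok accepts. The check catches accidental corruption such as a flipped bit, but a forged frame and an honest frame are indistinguishable to it.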

Repro:

# Capture any AppendEntriesResponse frame on the wire.
# Replay N times; receiver processes each as a fresh valid response.

Checklist

  • 1. with_tuning (and all non-test constructors) must call the mTLS builders by default. Either remove the non-mTLS builders entirely or gate them behind an explicit --insecure-cluster-transport CLI flag that emits a large warning at startup.
  • 2. Extend the frame format with {from_node_id: u64, seq: u64, signature_or_mac}; reject frames whose origin or sequence is unexpected. Alternatively, ensure every RPC payload includes these fields (some already do at the app layer — verify and enforce).
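The origin and sequence check from item 2 could look roughly like the sketch below. ReplayGuard, Reject, and check are hypothetical names, and the strictly increasing per-peer sequence rule is one assumed policy among several (a sliding window would tolerate reordering). MAC or signature verification is deliberately not shown because it needs a cryptography crate; it would have to run before this check and cover from_node_id and seq, otherwise an attacker could simply rewrite those fields.

```rust
use std::collections::HashMap;

/// Why a frame was rejected by the guard.
#[derive(Debug, PartialEq, Eq)]
pub enum Reject {
    /// `from_node_id` is not a known cluster member.
    UnknownPeer,
    /// `seq` is not greater than the last accepted value for this peer.
    StaleSequence,
}

/// Hypothetical per-peer anti-replay state, kept by the receiver.
pub struct ReplayGuard {
    /// Last accepted sequence number per known peer (None = nothing yet).
    last_seen: HashMap<u64, Option<u64>>,
}

impl ReplayGuard {
    /// Creates a guard that accepts frames only from the listed peers.
    pub fn new(peers: &[u64]) -> Self {
        let last_seen = peers.iter().map(|&id| (id, None)).collect();
        ReplayGuard { last_seen }
    }

    /// Accepts a frame only if it comes from a known peer and carries a
    /// sequence number strictly greater than the last one accepted.
    /// Assumes the frame's MAC has already been verified by the caller.
    pub fn check(&mut self, from_node_id: u64, seq: u64) -> Result<(), Reject> {
        let last = self
            .last_seen
            .get_mut(&from_node_id)
            .ok_or(Reject::UnknownPeer)?;
        match *last {
            Some(prev) if seq <= prev => Err(Reject::StaleSequence),
            _ => {
                *last = Some(seq);
                Ok(())
            }
        }
    }
}
```

With this state in place, a captured AppendEntriesResponse replayed with its original sequence number is rejected as StaleSequence on the second delivery, and a frame claiming an unknown origin is rejected before it reaches the Raft state machine.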

Both items are independently verifiable by reading the cited source files.
