What version of nebula are you using? (nebula -version)
1.10.3
What operating system are you using?
Linux
Describe the Bug
Summary
The internal SSH server in Nebula (sshd package) handles the initial protocol handshake synchronously within the main TCP accept loop. A single TCP connection that connects but does not send the protocol version string can block the listener, preventing legitimate administrators from accessing the management interface.
Technical Analysis
I analyzed the sshd/server.go file and identified a concurrency issue in the run method. The server accepts a new TCP connection and immediately calls ssh.NewServerConn(c, s.config) in the same goroutine as the accept loop.
Vulnerable Code: sshd/server.go
func (s *SSHServer) run() {
    for {
        // 1. Accept new TCP connection
        c, err := s.listener.Accept()
        if err != nil { ... }

        // VULNERABILITY: Blocking call.
        // The loop halts here waiting for the SSH handshake to complete.
        // If the client does not send the version string, this blocks forever.
        conn, chans, reqs, err := ssh.NewServerConn(c, s.config)

        // The goroutine is started ONLY after the handshake returns
        go func() { ... }()
    }
}
The ssh.NewServerConn function waits for the client to send the SSH version string (e.g., SSH-2.0-Client...). If a client connects (completes TCP handshake) but never sends this version string, the NewServerConn function blocks indefinitely. Because this happens inside the main for loop, s.listener.Accept() is never called for subsequent connections, effectively locking out all other users.
Steps to Reproduce
- Setup Environment
Generate the necessary CA, certificates, and a dummy SSH host key to run Nebula.
# Generate CA and Host Certificate
./nebula-cert ca -name "TestOrg"
./nebula-cert sign -name "host" -ip "192.168.100.1/24"
# Generate a dummy host key for SSHD
ssh-keygen -t ed25519 -f ssh_host_key -N ""
- Run Nebula
Start the nebula instance:
./nebula -config config.yml
- Attack (The Exploit)
Open a terminal and connect using netcat, but do not send any data. This simulates a hung client:
nc 127.0.0.1 2222
(Leave this command running. The server sends its banner, but we send nothing back.)
- Verification
Open a second terminal and attempt to connect as a legitimate administrator:
ssh -v -p 2222 admin@127.0.0.1
Result:
The legitimate SSH client successfully establishes a TCP connection (debug1: Connection established) but hangs indefinitely at debug1: Local version string.... It never receives the remote protocol version because the server's accept loop is blocked by the netcat connection.
Impact
Primary Impact: Denial of Service (DoS) of Management Interface
Loss of Administrative Access: An unauthenticated attacker can render the SSH management interface completely unavailable to legitimate administrators. This prevents the operations team from logging in to monitor, debug, reconfigure, or restart the node remotely.
Operational Blindness: During an active attack, administrators effectively lose control over the node's configuration and status monitoring via SSH.
Ease of Exploitation: The attack requires minimal resources (a single standard TCP connection using netcat) and no authentication. It can be sustained indefinitely with negligible cost to the attacker.
Scope: This vulnerability specifically impacts the Management Plane. While the overlay VPN traffic (Data Plane) may continue to function, the inability to manage the node represents a significant availability risk in production environments.
Recommended Fix
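The handshake should run in its own goroutine so that the main accept loop is never blocked. A deadline should also be set on the connection for the duration of the handshake; otherwise every silent client still holds a goroutine and an open socket indefinitely.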
Suggested Patch Pattern:
func (s *SSHServer) run() {
    for {
        c, err := s.listener.Accept()
        if err != nil { continue }

        // Fix: Hand off to goroutine immediately
        go s.handleHandshake(c)
    }
}

func (s *SSHServer) handleHandshake(c net.Conn) {
    // Set deadline to prevent resource leaks
    c.SetDeadline(time.Now().Add(10 * time.Second))

    conn, chans, reqs, err := ssh.NewServerConn(c, s.config)
    if err != nil {
        c.Close()
        return
    }

    // Clear the deadline so it does not cut off the established session
    c.SetDeadline(time.Time{})

    // ... handle connection ...
}
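To check the pattern in isolation, here is a self-contained sketch using only the standard library. It is not the Nebula patch: handleConn, serveConcurrent, tryFix and the 300 ms handshakeTimeout are illustrative assumptions, and the SSH handshake is again simulated as "send a banner, then wait for one line". It shows two things: a silent first client no longer delays the second client, and the silent connection is dropped once the deadline passes.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"time"
)

// handshakeTimeout is an illustrative value, not a Nebula setting.
const handshakeTimeout = 300 * time.Millisecond

// handleConn simulates the per-connection handshake. The deadline bounds
// the handshake only and is cleared once the client has answered.
func handleConn(c net.Conn) {
	defer c.Close()
	c.SetDeadline(time.Now().Add(handshakeTimeout))
	fmt.Fprint(c, "SSH-2.0-Demo\r\n")
	if _, err := bufio.NewReader(c).ReadString('\n'); err != nil {
		return // silent or broken client: drop it
	}
	c.SetDeadline(time.Time{}) // handshake done, lift the deadline
	// ... a real server would serve the session here ...
}

// serveConcurrent keeps the accept loop free by handing every
// connection to its own goroutine.
func serveConcurrent(l net.Listener) {
	for {
		c, err := l.Accept()
		if err != nil {
			return // listener closed
		}
		go handleConn(c)
	}
}

// tryFix replays the scenario from the report: a silent client connects
// first, then a second client completes the simulated handshake.
func tryFix() (adminServed, silentDropped bool) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, false
	}
	defer l.Close()
	go serveConcurrent(l)

	silent, err := net.Dial("tcp", l.Addr().String())
	if err != nil {
		return false, false
	}
	defer silent.Close()

	admin, err := net.Dial("tcp", l.Addr().String())
	if err != nil {
		return false, false
	}
	defer admin.Close()
	fmt.Fprint(admin, "SSH-2.0-Admin\r\n")
	admin.SetReadDeadline(time.Now().Add(time.Second))
	_, err = bufio.NewReader(admin).ReadString('\n')
	adminServed = err == nil

	// Once the server gives up on the silent client it closes the socket;
	// ReadAll then returns a nil error at EOF instead of timing out.
	silent.SetReadDeadline(time.Now().Add(2 * time.Second))
	_, err = io.ReadAll(silent)
	silentDropped = err == nil
	return adminServed, silentDropped
}

func main() {
	served, dropped := tryFix()
	fmt.Println("second client served:", served)
	fmt.Println("silent client dropped:", dropped)
}
```

The deadline is cleared after the handshake on purpose. SetDeadline applies to all future reads and writes on the connection, so leaving it in place would also terminate a legitimate session once the timeout elapsed.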
Logs from affected hosts
1️⃣ Nebula Startup Logs
$ ./nebula -config config.yml
INFO[0000] Firewall rule added
INFO[0000] Firewall rule added
INFO[0000] Firewall started
INFO[0000] no ssh users to authorize
INFO[0000] listening on 0.0.0.0:4242
INFO[0000] Main HostMap created
INFO[0000] punchy disabled
WARN[0000] No lighthouse.hosts configured, this host will only be able to initiate tunnels with static_host_map entries
INFO[0000] Loaded send_recv_error config
INFO[0000] Loaded accept_recv_error config
INFO[0000] Nebula interface is active
INFO[0000] SSH server is listening sshListener="127.0.0.1:2222" subsystem=sshd
2️⃣ Attacker Connection (Stalled Handshake)
$ nc 127.0.0.1 2222
SSH-2.0-Nebula???
3️⃣ Legitimate Administrator Attempt (Blocked)
$ ssh -v -p 2222 admin@127.0.0.1
debug1: OpenSSH_10.2p1, OpenSSL 3.6.1 27 Jan 2026
debug1: Reading configuration data
debug1: Connecting to 127.0.0.1 [127.0.0.1] port 2222.
debug1: Connection established.
debug1: Local version string SSH-2.0-OpenSSH_10.2
Config files from affected hosts
Create config.yml
Save the following configuration in the same directory:
pki:
  ca: ca.crt
  cert: host.crt
  key: host.key
sshd:
  enabled: true
  listen: 127.0.0.1:2222
  host_key: ./ssh_host_key
tun:
  # Disabled to simplify reproduction (no root required)
  disabled: true
listen:
  host: 0.0.0.0
  port: 4242
firewall:
  outbound: [{port: any, proto: any, host: any}]
  inbound: [{port: any, proto: any, host: any}]