What version of nebula are you using? (nebula -version)
1.10.3
What operating system are you using?
Rocky Linux 8 (RHEL 8 clone)
Describe the Bug
I use nebula on about 100 hosts to run a p2p app inside the mesh. Each host is typically connected to at most about 50 other hosts.
I've been running this setup for a few years, but I only noticed today, while looking at Prometheus stats, that nebula slowly leaks memory over many months until it gets restarted. This happened with 1.9.7, and since upgrading all my nodes to 1.10.3 today I can see memory slowly going up as well.
nebula_runtime_MemStats_Alloc for the last 5 months using v1.9.7 (each drop to 0 is a host restarting):
and the last 5h with v1.10.3 (memory trend going up slowly too):
Interestingly enough, my 3 lighthouses do NOT show this leak, but they also don't carry any p2p traffic, so maybe that explains the difference.
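To put a number on the trend rather than eyeballing the graphs, something like the following could be run against scraped samples. This is only a sketch: it assumes the stats listener serves the standard Prometheus text format, and `parse_metric` and `growth_per_hour` are hypothetical helper names, not part of nebula.

```python
import re

# Metric name taken from this report; the helpers below are illustrative only.
METRIC = "nebula_runtime_MemStats_Alloc"


def parse_metric(text, name=METRIC):
    """Return the first sample value for `name` in Prometheus text output, or None."""
    # Optional {labels} after the name, then whitespace, then the value.
    pattern = re.compile(r"^" + re.escape(name) + r"(?:\{[^}]*\})?\s+(\S+)")
    for line in text.splitlines():
        if line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        match = pattern.match(line)
        if match:
            return float(match.group(1))
    return None


def growth_per_hour(samples):
    """Least-squares slope of (unix_seconds, bytes) samples, in bytes per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one timestamp")
    return num / den * 3600.0
```

Since `Alloc` moves up and down with every garbage collection cycle, a slope fitted over many hours of samples is more meaningful than the difference between any two individual readings.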
Logs from affected hosts
I'm not sure which logs, and how much of them, would be relevant to the memory leak...
At info level I mostly see Handshake timed out messages (10-20 per min), but I think that's normal since not all my nodes are always reachable. I also get an ssh user logged in message every minute (a cronjob fetches the nebula hostmaps over ssh each minute).
Here are a couple of minutes of logs from one of the hosts:
Mar 17 17:54:00 host1 nebula[2606833]: time="2026-03-17T17:54:00+01:00" level=info msg="Handshake timed out" durationNs=6856008331 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=353>
Mar 17 17:54:00 host1 nebula[2606833]: time="2026-03-17T17:54:00+01:00" level=info msg="Handshake timed out" durationNs=6743008693 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=254>
Mar 17 17:54:01 host1 nebula[2606833]: time="2026-03-17T17:54:01+01:00" level=info msg="ssh user logged in" remoteAddress="127.0.0.1:59224" sshFingerprint="SHA256:xxxxxxxxxxxx>
Mar 17 17:54:02 host1 nebula[2606833]: time="2026-03-17T17:54:02+01:00" level=info msg="Handshake timed out" durationNs=6795351794 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=400>
Mar 17 17:54:07 host1 nebula[2606833]: time="2026-03-17T17:54:07+01:00" level=info msg="Handshake timed out" durationNs=7088424603 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=370>
Mar 17 17:54:07 host1 nebula[2606833]: time="2026-03-17T17:54:07+01:00" level=info msg="Handshake timed out" durationNs=6606159160 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=247>
Mar 17 17:54:11 host1 nebula[2606833]: time="2026-03-17T17:54:11+01:00" level=info msg="Handshake timed out" durationNs=6661422710 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=134>
Mar 17 17:54:16 host1 nebula[2606833]: time="2026-03-17T17:54:16+01:00" level=info msg="Handshake timed out" durationNs=6861751980 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=359>
Mar 17 17:54:16 host1 nebula[2606833]: time="2026-03-17T17:54:16+01:00" level=info msg="Handshake timed out" durationNs=6750871014 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=238>
Mar 17 17:54:17 host1 nebula[2606833]: time="2026-03-17T17:54:17+01:00" level=info msg="Handshake timed out" durationNs=6693127148 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=902>
Mar 17 17:54:23 host1 nebula[2606833]: time="2026-03-17T17:54:23+01:00" level=info msg="Handshake timed out" durationNs=6761373363 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=151>
Mar 17 17:54:23 host1 nebula[2606833]: time="2026-03-17T17:54:23+01:00" level=info msg="Handshake timed out" durationNs=6648436805 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=145>
Mar 17 17:54:25 host1 nebula[2606833]: time="2026-03-17T17:54:25+01:00" level=info msg="Handshake timed out" durationNs=6663000386 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=212>
Mar 17 17:54:30 host1 nebula[2606833]: time="2026-03-17T17:54:30+01:00" level=info msg="Handshake timed out" durationNs=6659522520 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=796>
Mar 17 17:54:30 host1 nebula[2606833]: time="2026-03-17T17:54:30+01:00" level=info msg="Handshake timed out" durationNs=6644052190 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=357>
Mar 17 17:54:32 host1 nebula[2606833]: time="2026-03-17T17:54:32+01:00" level=info msg="Handshake timed out" durationNs=6796309453 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=238>
Mar 17 17:54:37 host1 nebula[2606833]: time="2026-03-17T17:54:37+01:00" level=info msg="Handshake timed out" durationNs=6808914034 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=278>
Mar 17 17:54:37 host1 nebula[2606833]: time="2026-03-17T17:54:37+01:00" level=info msg="Handshake timed out" durationNs=6990890344 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=279>
Mar 17 17:54:41 host1 nebula[2606833]: time="2026-03-17T17:54:41+01:00" level=info msg="Handshake timed out" durationNs=6961631643 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=302>
Mar 17 17:54:44 host1 nebula[2606833]: time="2026-03-17T17:54:44+01:00" level=info msg="Handshake timed out" durationNs=6733695836 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=228>
Mar 17 17:54:44 host1 nebula[2606833]: time="2026-03-17T17:54:44+01:00" level=info msg="Handshake timed out" durationNs=6733697616 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=112>
Mar 17 17:54:44 host1 nebula[2606833]: time="2026-03-17T17:54:44+01:00" level=info msg="Handshake timed out" durationNs=6733715816 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=350>
Mar 17 17:54:44 host1 nebula[2606833]: time="2026-03-17T17:54:44+01:00" level=info msg="Handshake timed out" durationNs=6707769459 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=389>
Mar 17 17:54:46 host1 nebula[2606833]: time="2026-03-17T17:54:46+01:00" level=info msg="Handshake timed out" durationNs=6660714223 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=332>
Mar 17 17:54:48 host1 nebula[2606833]: time="2026-03-17T17:54:48+01:00" level=info msg="Handshake timed out" durationNs=6849693245 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=292>
Mar 17 17:54:51 host1 nebula[2606833]: time="2026-03-17T17:54:51+01:00" level=info msg="Handshake timed out" durationNs=6748233156 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=162>
Mar 17 17:54:51 host1 nebula[2606833]: time="2026-03-17T17:54:51+01:00" level=info msg="Handshake timed out" durationNs=6703145971 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=102>
Mar 17 17:54:51 host1 nebula[2606833]: time="2026-03-17T17:54:51+01:00" level=info msg="Handshake timed out" durationNs=6703208511 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=306>
Mar 17 17:54:51 host1 nebula[2606833]: time="2026-03-17T17:54:51+01:00" level=info msg="Handshake timed out" durationNs=6703229851 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=225>
Mar 17 17:54:53 host1 nebula[2606833]: time="2026-03-17T17:54:53+01:00" level=info msg="Handshake timed out" durationNs=6748145156 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=256>
Mar 17 17:54:55 host1 nebula[2606833]: time="2026-03-17T17:54:55+01:00" level=info msg="Handshake timed out" durationNs=6746614980 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=151>
Mar 17 17:55:00 host1 nebula[2606833]: time="2026-03-17T17:55:00+01:00" level=info msg="Handshake timed out" durationNs=6758704525 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=596>
Mar 17 17:55:00 host1 nebula[2606833]: time="2026-03-17T17:55:00+01:00" level=info msg="Handshake timed out" durationNs=6644429325 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=138>
Mar 17 17:55:01 host1 nebula[2606833]: time="2026-03-17T17:55:01+01:00" level=info msg="ssh user logged in" remoteAddress="127.0.0.1:51440" sshFingerprint="SHA256:xxxxxxxxxxxx>
Mar 17 17:55:02 host1 nebula[2606833]: time="2026-03-17T17:55:02+01:00" level=info msg="Handshake timed out" durationNs=6895768992 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=327>
Mar 17 17:55:07 host1 nebula[2606833]: time="2026-03-17T17:55:07+01:00" level=info msg="Handshake timed out" durationNs=6690390610 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=231>
Mar 17 17:55:08 host1 nebula[2606833]: time="2026-03-17T17:55:08+01:00" level=info msg="Handshake timed out" durationNs=6669996449 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=338>
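For anyone wanting to check the handshake timeout rate on their own hosts, a minimal sketch for tallying messages per minute from lines in the format above. It assumes the `time="..."` and `msg="..."` fields appear exactly as in this excerpt; `count_per_minute` is a hypothetical helper name.

```python
import re
from collections import Counter

# Field patterns based on the log excerpt in this report.
TIME_RE = re.compile(r'time="(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2}')
MSG_RE = re.compile(r'msg="([^"]*)"')


def count_per_minute(lines, message="Handshake timed out"):
    """Count log lines whose msg field equals `message`, keyed by minute."""
    counts = Counter()
    for line in lines:
        t = TIME_RE.search(line)
        m = MSG_RE.search(line)
        if t and m and m.group(1) == message:
            counts[t.group(1)] += 1  # key looks like "2026-03-17T17:54"
    return counts
```

Feeding it the journal output for the nebula service (for example via `sys.stdin`) gives one counter entry per minute, which makes it easy to compare the timeout rate across hosts.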
Config files from affected hosts
configs for the p2p hosts:
pki:
  ca: /opt/nebula/ca.crt
  cert: /opt/nebula/host.crt
  key: /opt/nebula/host.key
  blocklist:
  disconnect_invalid: true
static_host_map:
  "100.96.0.1":
    - "lh1:4242"
  "100.96.0.2":
    - "lh2:4242"
  "100.96.0.3":
    - "lh3:4242"
lighthouse:
  am_lighthouse: false
  serve_dns: false
  dns:
    host: 0.0.0.0
    port: 15353
  interval: 60
  hosts:
    - "100.96.0.3"
    - "100.96.0.1"
    - "100.96.0.2"
  local_allow_list:
    interfaces:
      "docker.*": false
      "br-.*": false
      "nebula.*": false
  remote_allow_list:
    "0.0.0.0/0": true
    "::/0": false
listen:
  host: 0.0.0.0
  port: 4242
  read_buffer: 20000000
  write_buffer: 20000000
punchy:
  punch: true
  respond: true
cipher: aes
sshd:
  enabled: true
  listen: 127.0.0.1:12222
  host_key: /opt/nebula/ssh_host.key
  authorized_users:
    - user: root
      keys:
        - "xxxxxxxxxxxxxxxxxxxx"
relay:
  am_relay: false
  use_relays: true
tun:
  disabled: false
  dev: nebula1
  drop_local_broadcast: false
  drop_multicast: false
  tx_queue: 5000
  mtu: 1300
  routes:
  unsafe_routes:
logging:
  level: info
  format: text
stats:
  type: prometheus
  listen: 0.0.0.0:18888
  path: /metrics
  subsystem: nebula
  interval: 10s
  lighthouse_metrics: true
firewall:
  outbound_action: drop
  inbound_action: drop
  conntrack:
    tcp_timeout: 12m
    udp_timeout: 3m
    default_timeout: 10m
  outbound:
    - port: any
      proto: any
      host: any
  inbound:
    ...
configs for the lighthouses:
pki:
  ca: /config/ca.crt
  cert: /config/host.crt
  key: /config/host.key
  disconnect_invalid: true
lighthouse:
  am_lighthouse: true
  serve_dns: true
  dns:
    host: 0.0.0.0
    port: 53
  interval: 60
listen:
  host: 0.0.0.0
  port: 4242
punchy:
  punch: true
relay:
  am_relay: true
  use_relays: true
tun:
  disabled: false
  dev: nebula1
  drop_local_broadcast: false
  drop_multicast: false
  tx_queue: 500
  mtu: 1440
  routes:
  unsafe_routes:
logging:
  level: info
  format: text
sshd:
  enabled: true
  listen: 0.0.0.0:12222
  host_key: /ssh/ssh_host.key
  authorized_users:
    - user: root
      keys:
        - "xxxxxxxxxxxxxx"
stats:
  type: prometheus
  listen: 0.0.0.0:8080
  path: /metrics
  subsystem: nebula
  interval: 30s
  lighthouse_metrics: true
firewall:
  outbound_action: drop
  inbound_action: drop
  conntrack:
    tcp_timeout: 12m
    udp_timeout: 3m
    default_timeout: 10m
  outbound:
    - port: any
      proto: any
      host: any
  inbound:
    ...