batch packets #1599

Draft
jrwren wants to merge 5 commits into master from jay.wren-batch-packets

@jrwren jrwren commented Feb 4, 2026

I used Claude to come up with this plan and implement it.

Each commit here is a stage of the plan. The last step adds tun virtio header support.

Each commit is usable. I tested each commit on a memcached server with load applied. Surprisingly, there was almost no difference in CPU usage, nebula pps, or memcached commands per second.

I don't know if I'm missing some required tuning for the newer code or something else. The exact same configuration file was used throughout.


Plan: Implement UDP GSO (Generic Segmentation Offload)

Goal

Add UDP GSO support to reduce per-packet kernel overhead beyond what sendmmsg provides.

Why GSO Over sendmmsg?

| Approach | How It Works | Kernel Work |
|---|---|---|
| sendmmsg | 64 separate packets in one syscall | Kernel processes each packet individually |
| GSO | Coalesced buffer + segment size | Kernel does ONE segmentation pass |

GSO is expected to reduce CPU usage by 30-50% at high packet rates and to improve throughput 2-3x. These figures are the plan's targets, not measurements from this branch.

Current State

  • sendmmsg batching implemented (commit 15333f9)
  • TUN batch reads implemented (commit 30db76e)
  • listen.batch config controls both

GSO Implementation Plan

Phase 1: GSO Detection & Socket Setup

File: udp/udp_linux.go

Add GSO capability detection (reference: WireGuard's features_linux.go):

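// supportsUDPOffload reports whether fd can use UDP GSO (TX) and GRO (RX).
// Requires golang.org/x/sys/unix.
func supportsUDPOffload(fd int) (txOffload, rxOffload bool) {
    // TX: the kernel knows UDP_SEGMENT if the option can be read at all.
    _, err := unix.GetsockoptInt(fd, unix.IPPROTO_UDP, unix.UDP_SEGMENT)
    txOffload = err == nil

    // RX: UDP_GRO must read back as 1, so it has to be enabled on fd
    // before this check runs; otherwise rxOffload is always false.
    opt, err := unix.GetsockoptInt(fd, unix.IPPROTO_UDP, unix.UDP_GRO)
    rxOffload = err == nil && opt == 1
    return
}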

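Enable GRO on socket creation, before running the detection above, and treat a failure as "GRO unavailable" instead of discarding the error:

if err := unix.SetsockoptInt(fd, unix.IPPROTO_UDP, unix.UDP_GRO, 1); err != nil {
    // Kernel lacks UDP_GRO: keep the existing receive path.
}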

Phase 2: TX Path - Coalesce Packets Before Send

File: udp/udp_linux.go

Modify WriteBatch() to coalesce same-destination packets:

  1. Group packets by destination address
  2. For each group with multiple packets of same size:
    • Concatenate payloads into single buffer
    • Set UDP_SEGMENT control message with packet size
  3. Fall back to regular sendmmsg for mixed sizes or non-GSO
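
To make step 2 concrete, here is a minimal sketch of the ancillary data that carries the segment size. It is not taken from this PR: gsoControl and the constants are illustrative names, and the hand-written byte layout assumes 64-bit little-endian Linux (amd64, arm64). Real code would more likely build the header with unix.CmsgSpace, unix.CmsgLen and unix.Cmsghdr from golang.org/x/sys/unix.

```go
package main

import "encoding/binary"

const (
	solUDP     = 17  // SOL_UDP / IPPROTO_UDP
	udpSegment = 103 // UDP_SEGMENT from linux/udp.h
	cmsgHdrLen = 16  // sizeof(struct cmsghdr); assumes 64-bit Linux
	cmsgAlign  = 8   // cmsg data is padded to 8 bytes; assumes 64-bit Linux
)

// gsoControl returns the control message that asks the kernel to split the
// payload of one sendmsg call into datagrams of segSize bytes.
// Layout assumption: 64-bit little-endian Linux.
func gsoControl(segSize uint16) []byte {
	const dataLen = 2 // the segment size is a uint16
	space := cmsgHdrLen + (dataLen+cmsgAlign-1)/cmsgAlign*cmsgAlign
	b := make([]byte, space)
	binary.LittleEndian.PutUint64(b[0:8], uint64(cmsgHdrLen+dataLen)) // cmsg_len
	binary.LittleEndian.PutUint32(b[8:12], solUDP)                    // cmsg_level
	binary.LittleEndian.PutUint32(b[12:16], udpSegment)               // cmsg_type
	binary.LittleEndian.PutUint16(b[16:18], segSize)                  // payload
	return b
}
```

The returned slice would be passed as the out-of-band argument of the send call, alongside the concatenated payload.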

Coalescing constraints (from WireGuard):

  • Max 64 packets per GSO message (udpSegmentMaxDatagrams)
  • All packets except the last must be the same size; the last may be shorter
  • Max payload: 65507 bytes (IPv4) / 65527 bytes (IPv6)
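
The constraints above can be sketched as a small packing loop. This is an illustration under assumptions, not the code in this PR: coalesce and gsoBatch are hypothetical names, the input is assumed to be non-empty packets that already share one destination, and the caller picks maxPayload for the address family.

```go
package main

const (
	maxSegments    = 64    // datagrams per GSO send, per the constraint above
	maxPayloadIPv4 = 65507 // 65535 - 20 (IPv4 header) - 8 (UDP header)
)

// gsoBatch is one send: a coalesced payload plus its segment size.
type gsoBatch struct {
	buf      []byte
	segSize  int
	segments int
}

// coalesce packs packets bound for a single destination into GSO batches.
// A batch is closed when it is full, when the next packet is larger than
// the segment size, or right after a shorter packet (only the last segment
// of a batch may be short).
func coalesce(pkts [][]byte, maxPayload int) []gsoBatch {
	var out []gsoBatch
	open := false // whether the newest batch can still accept packets
	for _, p := range pkts {
		if open {
			b := &out[len(out)-1]
			open = b.segments < maxSegments &&
				len(p) <= b.segSize &&
				len(b.buf)+len(p) <= maxPayload
		}
		if !open {
			out = append(out, gsoBatch{segSize: len(p)})
			open = true
		}
		b := &out[len(out)-1]
		b.buf = append(b.buf, p...)
		b.segments++
		if len(p) < b.segSize {
			open = false // a short packet must be the last segment
		}
	}
	return out
}
```

A batch that ends up with a single segment gains nothing from GSO and can go out through the regular sendmmsg path.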

Phase 3: RX Path - Split Coalesced Packets

File: udp/udp_linux.go

Modify receive path to handle GRO-coalesced packets:

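  1. Pass a control-message (OOB) buffer to each receive call; with UDP_GRO enabled on the socket, the kernel attaches the segment size as a UDP_GRO control message (IP_RECVORIGDSTADDR is not needed for this)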
  2. Parse UDP_GRO from control message to get segment size
  3. Split coalesced buffer into individual packets
  4. Process each packet through existing handler
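
Step 3 is mostly slicing. A minimal sketch, assuming the segment size has already been parsed from the UDP_GRO control message; splitGRO is a hypothetical helper name, not a function from this PR:

```go
package main

// splitGRO slices a GRO-coalesced receive buffer into the original
// datagrams. segSize is the value from the UDP_GRO control message; zero
// means the kernel did not coalesce and buf is a single datagram.
// The returned slices alias buf, so no payload bytes are copied.
func splitGRO(buf []byte, segSize int) [][]byte {
	if segSize <= 0 || len(buf) <= segSize {
		return [][]byte{buf}
	}
	out := make([][]byte, 0, (len(buf)+segSize-1)/segSize)
	for len(buf) > 0 {
		n := segSize
		if n > len(buf) {
			n = len(buf) // the final segment may be shorter
		}
		out = append(out, buf[:n])
		buf = buf[n:]
	}
	return out
}
```

Because the slices share the receive buffer, each one must be handed to the existing handler before that buffer is reused for the next read.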

Phase 4: Graceful Fallback

  • Detect GSO support at startup
  • If GSO unavailable (older kernel, no NIC support): use current sendmmsg
  • If GSO fails with EIO (NIC doesn't support checksum offload): fall back

Files to Modify

  1. udp/udp_linux.go - Main GSO implementation
    • Add supportsUDPOffload() detection
    • Modify WriteBatch() to coalesce + use UDP_SEGMENT
    • Modify ListenOut() to handle UDP_GRO coalesced receives
  2. udp/conn.go - Interface updates
    • Add SupportsGSO() bool method to interface
    • Potentially add a new WriteBatchGSO(), or modify the existing WriteBatch()
  3. udp/udp_generic.go / udp/udp_darwin.go - Stubs
    • Return false for GSO support (Linux-only)

Reference Implementation

WireGuard's GSO code in golang.zx2c4.com/wireguard/conn:

  • gso_linux.go - GSO control message helpers
  • features_linux.go - Capability detection
  • bind_std.go lines 450-544 - Coalesce/split logic

Verification

  1. Build: go build ./...

  2. Test: go test ./...

  3. Benchmark: Compare with/without GSO

    # Without GSO (set env to disable)
    NEBULA_DISABLE_GSO=1 ./nebula ...
    
    # With GSO (default if supported)
    ./nebula ...
  4. Check kernel support: UDP_SEGMENT requires Linux 4.18+; UDP_GRO requires Linux 5.0+
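
The benchmark toggle in step 3 implies a gate that combines the environment variable with runtime detection. A minimal sketch of such a gate follows; gsoEnabled and gsoEnabledFromEnv are hypothetical names, and treating an empty value or "0" as "not disabled" is an assumption, not documented behavior.

```go
package main

import "os"

// gsoEnabled reports whether the GSO send path should be used: the kernel
// must support it and NEBULA_DISABLE_GSO must not be set.
// Assumption: an empty value or "0" does not disable GSO.
func gsoEnabled(kernelSupport bool, lookup func(string) (string, bool)) bool {
	if v, ok := lookup("NEBULA_DISABLE_GSO"); ok && v != "" && v != "0" {
		return false
	}
	return kernelSupport
}

// gsoEnabledFromEnv applies the gate to the process environment.
func gsoEnabledFromEnv(kernelSupport bool) bool {
	return gsoEnabled(kernelSupport, os.LookupEnv)
}
```

Taking the lookup function as a parameter keeps the gate testable without mutating the process environment.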

Risks & Mitigations

| Risk | Mitigation |
|---|---|
| Older kernels | Runtime detection, fallback to sendmmsg |
| NIC without checksum offload | Catch EIO, disable GSO for session |
| Mixed packet sizes | Only coalesce uniform sizes, sendmmsg for rest |
| IPv4/IPv6 mixing | Separate coalescing per address family |

Scope Decision

TX GSO only - Start with send-side GSO, validate performance, add RX GRO later.

Out of Scope (Future Work)

  • RX GRO (receive-side coalescing) - add after TX GSO proven
  • TUN GSO/GRO (virtio headers) - different optimization
