Draft
I used Claude to come up with this plan and implement it.
Each commit here is a stage of the plan. The last step adds tun virtio header support.
Each commit is usable. I tested each commit on a memcached server with load applied. Surprisingly, there was almost no difference in CPU usage, Nebula pps, or memcached commands per second.
I don't know if I'm missing some required tuning for the newer code or something else. The exact same configuration file was used throughout.
Plan: Implement UDP GSO (Generic Segmentation Offload)
Goal
Add UDP GSO support to reduce per-packet kernel overhead beyond what sendmmsg provides.
Why GSO Over sendmmsg?
GSO reduces CPU usage 30-50% at high packet rates and improves throughput 2-3x.
Current State
- sendmmsg batching implemented (commit 15333f9)
- listen.batch config controls both
GSO Implementation Plan
Phase 1: GSO Detection & Socket Setup
File: udp/udp_linux.go
Add GSO capability detection (reference: WireGuard's features_linux.go):
Enable GRO on socket creation if supported:
Phase 2: TX Path - Coalesce Packets Before Send
File: udp/udp_linux.go
Modify WriteBatch() to coalesce same-destination packets:
- UDP_SEGMENT control message with packet size
Coalescing constraints (from WireGuard):
- udpSegmentMaxDatagrams)
Phase 3: RX Path - Split Coalesced Packets
File: udp/udp_linux.go
Modify receive path to handle GRO-coalesced packets:
- IP_RECVORIGDSTADDR to get control messages
- UDP_GRO from control message to get segment size
Phase 4: Graceful Fallback
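One way the graceful fallback could work, as a sketch under assumptions: `batchConn` is a hypothetical, pared-down stand-in for the interface in udp/conn.go (only `SupportsGSO()` and `WriteBatchGSO()` are named in this plan, and the `WriteBatch` signature here is guessed), `errGSOUnsupported` is an invented sentinel, and `fakeConn` is a test double. The idea is to try the GSO path first and permanently drop back to the existing sendmmsg path the first time the kernel or device rejects it.

```go
package main

import "errors"

// batchConn is a hypothetical subset of the UDP connection interface.
type batchConn interface {
	SupportsGSO() bool
	WriteBatch(pkts [][]byte) error    // existing sendmmsg path
	WriteBatchGSO(pkts [][]byte) error // coalescing path from Phase 2
}

// errGSOUnsupported is an assumed sentinel for "GSO rejected at send time".
var errGSOUnsupported = errors.New("udp: GSO rejected by kernel or device")

// sender remembers whether GSO has already failed so it is not retried
// on every batch.
type sender struct {
	conn        batchConn
	gsoDisabled bool
}

// send prefers GSO and falls back to plain batching when GSO is unavailable.
func (s *sender) send(pkts [][]byte) error {
	if !s.gsoDisabled && s.conn.SupportsGSO() {
		err := s.conn.WriteBatchGSO(pkts)
		if !errors.Is(err, errGSOUnsupported) {
			return err // success, or an error unrelated to GSO support
		}
		s.gsoDisabled = true // stop trying GSO on this socket
	}
	return s.conn.WriteBatch(pkts)
}

// fakeConn is a test double. With gso=false it behaves like the stubs
// planned for udp_generic.go and udp_darwin.go.
type fakeConn struct {
	gso                   bool
	gsoErr                error
	plainWrites, gsoWrites int
}

func (c *fakeConn) SupportsGSO() bool { return c.gso }

func (c *fakeConn) WriteBatch(pkts [][]byte) error {
	c.plainWrites += len(pkts)
	return nil
}

func (c *fakeConn) WriteBatchGSO(pkts [][]byte) error {
	if c.gsoErr != nil {
		return c.gsoErr
	}
	c.gsoWrites += len(pkts)
	return nil
}
```

With this shape the non-Linux stubs only need `SupportsGSO()` to return false, and callers never have to know which path was taken.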
Files to Modify
- udp/udp_linux.go - Main GSO implementation
  - supportsUDPOffload() detection
  - WriteBatch() to coalesce + use UDP_SEGMENT
  - ListenOut() to handle UDP_GRO coalesced receives
- udp/conn.go - Interface updates
  - SupportsGSO() bool method to interface
  - WriteBatchGSO() or modify existing
- udp/udp_generic.go / udp/udp_darwin.go - Stubs
Reference Implementation
WireGuard's GSO code in golang.zx2c4.com/wireguard/conn:
- gso_linux.go - GSO control message helpers
- features_linux.go - Capability detection
- bind_std.go lines 450-544 - Coalesce/split logic
Verification
Build: go build ./...
Test: go test ./...
Benchmark: Compare with/without GSO
Check kernel support: UDP_SEGMENT requires Linux 4.18+; UDP_GRO requires Linux 5.0+
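For the kernel check, a rough sketch of comparing the running release against a minimum version. All three function names are hypothetical, and the minimum is passed in rather than hard-coded. Probing the socket option directly is usually more reliable than comparing versions, since distributions backport features, so treat this as a diagnostic aid for test runs rather than a gate.

```go
package main

import (
	"errors"
	"strconv"
	"strings"
	"syscall"
)

// parseRelease extracts major and minor from a uname release string such
// as "5.15.0-91-generic". The third result is false if parsing fails.
func parseRelease(release string) (int, int, bool) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, false
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, false
	}
	// The minor field may carry a suffix, e.g. "18-rc1"; keep leading digits.
	digits := parts[1]
	if i := strings.IndexFunc(digits, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		digits = digits[:i]
	}
	minor, err := strconv.Atoi(digits)
	if err != nil {
		return 0, 0, false
	}
	return major, minor, true
}

// versionAtLeast compares (major, minor) against a required minimum.
func versionAtLeast(major, minor, wantMajor, wantMinor int) bool {
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

// kernelAtLeast reports whether the running Linux kernel meets the minimum.
func kernelAtLeast(wantMajor, wantMinor int) (bool, error) {
	var uts syscall.Utsname
	if err := syscall.Uname(&uts); err != nil {
		return false, err
	}
	// Release is a NUL-terminated array whose element type varies by
	// architecture, so convert element by element.
	release := make([]byte, 0, len(uts.Release))
	for _, c := range uts.Release {
		if c == 0 {
			break
		}
		release = append(release, byte(c))
	}
	major, minor, ok := parseRelease(string(release))
	if !ok {
		return false, errors.New("unrecognised kernel release: " + string(release))
	}
	return versionAtLeast(major, minor, wantMajor, wantMinor), nil
}
```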
Risks & Mitigations
Scope Decision
TX GSO only - Start with send-side GSO, validate performance, add RX GRO later.
Out of Scope (Future Work)