netstack: let ICMP echo handlers consume requests#13189
Conversation
IPv4 echo requests are delivered to the transport dispatcher before netstack builds the in-stack echo reply, but a per-stack default handler cannot tell the ICMP endpoint that it has taken ownership of the request. This leaves embedders that install a custom ICMP handler with no direct way to suppress the built-in reply. Add a transport dispatcher extension that reports whether the per-stack default transport handler consumed the packet, and use that signal in the IPv4 and IPv6 ICMP echo paths. For IPv6, deliver echo requests through the same endpoint/default-handler path before deciding whether to synthesize the built-in reply. Keep ordinary endpoint delivery distinct from default-handler ownership so registered ICMP endpoints and raw sockets can observe echo requests without suppressing the normal reply path. This also preserves the IPv4 temporary-address behavior introduced by the earlier echo-handling fix. Signed-off-by: Zi Li <zi.li@linux.dev> Signed-off-by: Amaindex <amaindex@outlook.com>
|
cc for gVisor context and review: @nybidari @ayushr2 @avagin Hi all, sorry for taking so long to come back to this. After #11609 I was focused on other work for a while, but I recently revisited the remaining ICMP Echo handler semantics from #8657 and prepared a smaller follow-up change. The scope is intentionally narrower than the full It also keeps the temporary-address behavior from #11609, so the existing promiscuous-mode fix is not rolled back. While preparing the change, I discussed the direction with several downstream/heavy netstack users to validate the embedding use case and compatibility expectations. I also used AI-assisted review as an additional pass to look for missing edge cases. The technical argument is still based on the source-level reasoning and tests in the description. The unit tests exercise the behavior directly without setting up a TUN device: make test TARGETS=//pkg/tcpip/network/ipv4:ipv4_test OPTIONS=--test_filter=TestICMPEcho
make test TARGETS=//pkg/tcpip/network/ipv6:ipv6_test OPTIONS=--test_filter=TestICMPEchoThe expected matrix is: I also ran TUN-based manual tests. They are not required for review, but they make the behavior visible with Optional IPv4 manual TUN testUse the synthetic Go branch prepared from this change. It mirrors git clone -b go-netstack-icmp-echo-default-handler https://github.com/Amaindex/gvisor.git
cd gvisor
cat >./icmp_echo_tun_testprog.go <<'EOF'
package main
import (
"flag"
"fmt"
"log"
"net"
"gvisor.dev/gvisor/pkg/rawfile"
"gvisor.dev/gvisor/pkg/tcpip"
"gvisor.dev/gvisor/pkg/tcpip/header"
"gvisor.dev/gvisor/pkg/tcpip/link/fdbased"
"gvisor.dev/gvisor/pkg/tcpip/link/tun"
"gvisor.dev/gvisor/pkg/tcpip/network/ipv4"
"gvisor.dev/gvisor/pkg/tcpip/stack"
"gvisor.dev/gvisor/pkg/tcpip/transport/icmp"
)
func main() {
tunName := flag.String("tun", "gvicmp0", "TUN device name")
handlerMode := flag.String("handler", "false", "none, false, or true")
flag.Parse()
s := stack.New(stack.Options{
NetworkProtocols: []stack.NetworkProtocolFactory{ipv4.NewProtocol},
TransportProtocols: []stack.TransportProtocolFactory{icmp.NewProtocol4},
})
mtu, err := rawfile.GetMTU(*tunName)
if err != nil {
log.Fatal(err)
}
fd, err := tun.Open(*tunName)
if err != nil {
log.Fatal(err)
}
linkEP, err := fdbased.New(&fdbased.Options{
FDs: []int{fd},
MTU: mtu,
EthernetHeader: false,
})
if err != nil {
log.Fatal(err)
}
const nicID = 1
if err := s.CreateNIC(nicID, linkEP); err != nil {
log.Fatal(err)
}
if err := s.SetPromiscuousMode(nicID, true); err != nil {
log.Fatal(err)
}
if err := s.SetSpoofing(nicID, true); err != nil {
log.Fatal(err)
}
stackAddr := tcpip.AddrFromSlice(net.IPv4(11, 0, 0, 2).To4())
if err := s.AddProtocolAddress(nicID, tcpip.ProtocolAddress{
Protocol: ipv4.ProtocolNumber,
AddressWithPrefix: tcpip.AddressWithPrefix{
Address: stackAddr,
PrefixLen: 24,
},
}, stack.AddressProperties{}); err != nil {
log.Fatal(err)
}
s.SetRouteTable([]tcpip.Route{{
Destination: header.IPv4EmptySubnet,
NIC: nicID,
}})
switch *handlerMode {
case "none":
case "false", "true":
handled := *handlerMode == "true"
s.SetTransportProtocolHandler(icmp.ProtocolNumber4, func(id stack.TransportEndpointID, pkt *stack.PacketBuffer) bool {
h := header.ICMPv4(pkt.TransportHeader().Slice())
fmt.Printf("handler: %s -> %s type=%d temporary=%t returning=%t\n",
id.RemoteAddress, id.LocalAddress, h.Type(), pkt.NetworkPacketInfo.LocalAddressTemporary, handled)
return handled
})
default:
log.Fatalf("unknown -handler=%q; want none, false, or true", *handlerMode)
}
fmt.Printf("stack address: 11.0.0.2/24, handler=%s\n", *handlerMode)
select {}
}
EOF
sudo ip tuntap del dev gvicmp0 mode tun 2>/dev/null || true
sudo ip tuntap add dev gvicmp0 mode tun
sudo ip addr add 11.0.0.1/24 dev gvicmp0
sudo ip link set gvicmp0 upTerminal 1: sudo go run ./icmp_echo_tun_testprog.go -tun=gvicmp0 -handler=falseTerminal 2: ping -c 2 11.0.0.2Expected result: the handler logs the Echo Requests and Then restart Terminal 1 with: sudo go run ./icmp_echo_tun_testprog.go -tun=gvicmp0 -handler=trueRun the same Expected result: the handler logs the Echo Requests and The existing #11609 temporary-address behavior can be verified with: sudo go run ./icmp_echo_tun_testprog.go -tun=gvicmp0 -handler=false
ping -c 2 11.0.0.99Expected result: the handler logs Cleanup: sudo ip tuntap del dev gvicmp0 mode tun
rm -f ./icmp_echo_tun_testprog.goI also exercised the same TUN-style flow for IPv6 with Related regression tests also pass: make test TARGETS=//pkg/tcpip/network/ipv4:ipv4_test OPTIONS=--test_filter=TestIcmpRateLimit
make test TARGETS=//pkg/tcpip/network/ipv6:ipv6_test OPTIONS=--test_filter=TestNeighborSolicitationResponse
make test TARGETS="//pkg/tcpip/network/ipv4:ipv4_test //pkg/tcpip/network/ipv6:ipv6_test"
make test TARGETS=//pkg/tcpip/stack:stack_testI would especially appreciate feedback on two design points:
Thanks again for the earlier discussion and for your patience. |
|
@Amaindex thanks for the CC. For our use case the LocalAddressTemporary suppression has been a blocker. We use gVisor for many addresses. Because of the LAT suppression we need to add all of these to the gVisor stack, but certain code paths in gVisor (e.g., https://github.com/google/gvisor/blob/master/pkg/tcpip/stack/addressable_endpoint_state.go#L567) are O(n) with respect to number of registered addresses. The ideal solution would be to support ICMP replies on temporary addresses, or to allow handlers to explicitly invoke the default stack response. |
@ericpauley thanks, this is helpful context. I agree that My current thinking is to keep this PR focused on the default-handler ownership rule and preserve the existing That follow-up API should let a handler explicitly delegate an Echo Request back to netstack, e.g. with something like Once that explicit delegation API exists, I think we can also discuss making |
Overview
This change makes ICMP Echo Request handling follow the usual
SetTransportProtocolHandlerdefault-handler contract: if an ICMP default handler returns true, the stack treats the request as consumed and does not also synthesize a built-in Echo Reply.The default behavior is preserved when there is no ICMP default handler, or when the handler returns
false.The intended scope is small:
LocalAddressTemporarybehavior from netstack: allow defaultHandler respond ICMPv4Echo in promiscuous mode #11609Echo Requestdefault-handler contract for IPv6ICMPv6 NDPand other ICMP control paths unchangedRelated discussion:
Context
TCP and UDP default handlers already have a clear ownership signal:
ICMP Echois different today becauseEcho Requests are intercepted by IPv4/IPv6 network endpoints before the built-in reply is generated.That makes it possible for an embedder to receive an
Echo Requestand still get a second reply from netstack.The earlier discussion started in #8657, where @fredwangwang reported that
ICMP Echocould still be answered by the stack even when an ICMP handler was installed withSetTransportProtocolHandler. @deepcode2019 noted the same need, and @kevinGC suggested that a tested PR would be welcome.#11609 took a narrow first step for IPv4 promiscuous-mode temporary addresses. After it merged, @ericpauley pointed out that restoring the old behavior could require reimplementing ICMP in a custom handler, and @dyhkwong noted the remaining IPv4/IPv6 inconsistency.
This change addresses that follow-up without widening the behavior change beyond Echo Request default-handler ownership.
The compatibility-sensitive case is
runsc. It registers ICMP as a normal transport protocol, but does not install an ICMP default handler for this path:runsc/boot/loader.goSo the no-default-handler path remains the ordinary built-in
Echo Replypath.Why This Matters
The main beneficiary is not ICMP itself, but embedders that use netstack inside user-space tunneling or virtual-networking software.
Several widely used downstream or adjacent projects use netstack, or downstream forks of it, in this role, including Tailscale's userspace networking, wireguard-go's netstack TUN backend, sing-box/mihomo-style TUN stacks, and tun2socks.
For those programs, an
Echo Replyis often a policy decision:If netstack synthesizes a local
Echo Replyafter the embedder has consumed the request, unreachable or policy-blocked addresses can appear reachable.After this change, ICMP default handlers can return
trueand take responsibility for Echo behavior. That lets tunneling software return replies that reflect its own forwarding policy and the current state of the tunnel.Flow
Current behavior:
New behavior:
The asymmetry above is intentional: the new default-handler ownership rule is shared by IPv4 and IPv6, while
LocalAddressTemporaryis a pre-existing IPv4 compatibility guard from #11609.Implementation
Relevant code points:
SetTransportProtocolHandlerstores a per-stack default handler.TransportDispatcherWithDefaultHandlerResultcarries the narrower ownership signal.DeliverTransportPacketWithDefaultHandlerResultshares the NIC transport delivery path.defaultHandlerresult propagation reports true only when the default handler consumed the packet.network/ipv4/icmp.gouses that signal while preservingLocalAddressTemporary.network/ipv6/icmp.godelivers Echo Requests through the ICMP endpoint/default-handler path before deciding whether to synthesize the built-in reply.TransportPacketHandledis too broad for this decision:Only one branch should suppress the built-in Echo Reply:
This change therefore adds a narrow dispatcher extension:
The second result is
trueonly when the per-stackdefaultHandlerreturnedtrue:This keeps endpoint visibility separate from default-handler ownership.
For IPv6, the built-in reply path now saves the data needed for the reply before transport delivery, because
DeliverTransportPacketmay modify thePacketBuffer:This also makes IPv6 Echo Requests observable through the same registered-endpoint/default-handler delivery path used by IPv4. A registered ICMP endpoint does not suppress the built-in reply; only a per-stack default handler returning
truedoes.Compatibility
Default runsc/netstack behavior should remain unchanged.
This change separates two decisions:
Behavior matrix:
ICMPv6 scope:
I am not trying to make IPv6 mirror the IPv4
LocalAddressTemporarypath in this change. That would be a separate behavior change. Here, IPv6 gains the Echo Request endpoint/default-handler delivery path and the default-handler ownership rule.LocalAddressTemporaryremains useful as an IPv4 signal for embedders that already need to distinguish promiscuous-mode temporary local delivery from ordinary local delivery.Follow-ups
I plan to keep the remaining work split into smaller follow-ups.
If this direction looks acceptable, I can follow up promptly with the mechanical helper extraction first. The
ICMP ForwarderAPI can then be discussed on top of that smaller refactor, so the public API shape does not have to be decided in this change.First, factor the built-in IPv4/IPv6
Echo Replyconstruction into an internal helper while preserving route lookup, rate limiting, stats, checksums, IPv4 options, IPv6 traffic class handling, and output hooks.Then add an
ICMP ForwarderAPI, analogous to the TCP/UDP forwarders:The goal of
ForwarderRequest.Reply()is to let embedders explicitly delegateEcho Replygeneration back to the stack, without copying netstack's reply construction logic downstream. That should also provide a cleaner way to restore the old "let netstack reply" behavior when a custom handler decides not to own a particular Echo Request.Tests
The new tests cover the behavior matrix directly.
IPv4:
IPv6:
Commands:
All of the commands above pass in my fork branch,
netstack-icmp-echo-default-handler.Review Context
While preparing the change, I discussed the direction with several downstream/heavy netstack users to validate the embedding use case and compatibility expectations.
I also used AI-assisted review as an additional pass to look for missing edge cases. The technical argument here is still based on the source links, behavior matrix, and tests above.
Fixes #8657.