Managing TCP window size and resource (memory) usage

Memory usage of TCP applications in Mirage is a well-known problem [^1][^2]. There are workarounds (such as the restart-on-failure mode mentioned in the linked article), but they are not guaranteed to always work.

Defending against this with hardcoded limits is difficult, because packets vary in size and it is hard to predict how much memory they will use once decoded and handled by the Mirage stack (and the user application).
Instead, I propose to (approximately) track the memory used by a connection and take action based on that. The tracking doesn't have to be precise, as long as the real memory used is within some constant factor of the desired target.
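
To make the idea concrete, here is a minimal sketch of approximate per-connection accounting. Everything in it is hypothetical: the type, the function names and the `segment_overhead` constant are assumptions for illustration, not existing mirage-tcpip APIs. The per-segment overhead is there so that many small packets are charged more than their payload alone.

```ocaml
(* Hypothetical sketch: none of these names exist in mirage-tcpip. *)

(* Assumed bookkeeping cost per queued segment, deliberately an overestimate. *)
let segment_overhead = 64

type conn_usage = {
  mutable buffered_bytes : int;  (* payload bytes currently queued *)
  mutable segments : int;        (* number of queued segments *)
}

let create () = { buffered_bytes = 0; segments = 0 }

(* Record a segment with [len] payload bytes being queued. *)
let on_enqueue u len =
  u.buffered_bytes <- u.buffered_bytes + len;
  u.segments <- u.segments + 1

(* Record a segment with [len] payload bytes being consumed. *)
let on_dequeue u len =
  u.buffered_bytes <- max 0 (u.buffered_bytes - len);
  u.segments <- max 0 (u.segments - 1)

(* Estimated number of bytes held by this connection. *)
let estimate u = u.buffered_bytes + (u.segments * segment_overhead)
```

The estimate only needs to stay within a constant factor of the real usage, so counting payload bytes plus a fixed overhead per segment may be good enough.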

### mirage-tcpip could implement soft limits and window shrinking:

* allow the user to specify the desired memory target (especially important for unikernels, which may have hard limits on how much memory you can assign to them)

* have various parts of the code track (approximately) how much memory they are using (e.g. size of buffers, count of entries in various tables). This doesn't have to be entirely accurate (e.g. it could be an overestimate), but it would at least ensure that memory usage is bounded by a constant factor of the configured memory target.

* slow down the rate at which new connections are accepted if we get close to that memory limit, starting to drop packets once we actually hit it

* reduce the size of the TCP window for connections that have too large a backlog of pending packets (this puts the burden on the other side to retry, see below about window shrinking and zero window). [^3]

* ensure keepalives are used in all corner cases. This is needed because otherwise certain TCP states would require us to keep state forever, see [^6]

* dropping packets should only be done as a last resort (hard limit), see [^4] for when dropping packets is not ideal (or not even allowed by the RFC)
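
The policy in the list above could be sketched roughly as follows. This is an illustration under assumptions, not a proposed API: the names, the 80% soft threshold and the per-connection budget are all made up for the example. It only computes a target window; how quickly the advertised window may move towards that target is left open.

```ocaml
(* Hypothetical sketch: names and thresholds are assumptions. *)

type pressure = Normal | Soft | Hard

(* Assumed policy: the soft limit sits at 80% of the configured target,
   the hard limit at the target itself. *)
let pressure ~target ~used =
  if used >= target then Hard
  else if used * 5 >= target * 4 then Soft
  else Normal

(* Target window for one connection: whatever is left of its budget,
   clamped to [0, max_window]. A result of 0 means a zero window. *)
let advertised_window ~max_window ~conn_budget ~conn_used =
  min max_window (max 0 (conn_budget - conn_used))

type accept = Accept | Delay | Refuse

(* Slow down accepts under soft pressure, refuse under hard pressure. *)
let on_new_connection = function
  | Normal -> Accept
  | Soft -> Delay
  | Hard -> Refuse
```

For example, with a per-connection budget of 100_000 bytes and 90_000 bytes already queued, `advertised_window` yields 10_000, and once the backlog reaches the budget it yields 0.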
 
### Hard limits

I have patches that implement some missing hard limits, and they improve the availability of Mirage applications.
But for best results they should be combined with some soft limits as described above (I don't yet have patches for that).

Nevertheless hard limits are useful as a last line of defense against bugs or inaccuracies in the soft limit implementation.
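
As an illustration of what a hard limit can look like (this is not taken from my patches; `bounded`, `create_bounded` and `add` are hypothetical names), a table can simply refuse new entries once it reaches a fixed cap, leaving it to the caller to drop the packet:

```ocaml
(* Hypothetical sketch: a table with a fixed cap on its number of entries. *)
type ('k, 'v) bounded = {
  max_entries : int;
  table : ('k, 'v) Hashtbl.t;
}

let create_bounded max_entries =
  { max_entries; table = Hashtbl.create 16 }

(* Insert or update [k]. Returns [false] without modifying the table
   when the cap is reached and [k] is not already present. *)
let add t k v =
  if Hashtbl.mem t.table k || Hashtbl.length t.table < t.max_entries
  then begin
    Hashtbl.replace t.table k v;
    true
  end
  else false
```

Updates to existing keys are always allowed, so the cap only bounds the number of distinct entries.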

### Testing

I have some code that triggers various issues in the Mirage stack, which can be used to test the effectiveness of various solutions.
For obvious reasons I won't be publishing that code.

### Similar approaches

I've successfully implemented a similar approach (tracking resource usage at runtime) to defend `oxenstored` against out-of-memory issues:
https://xenbits.xen.org/xsa/advisory-326.html

### Next steps

I propose to open some PRs which implement the fallback hard limits as a starting point (once we find a solution for https://github.com/mirage/mirage-tcpip/issues/533).

I'd be happy to hear your thoughts on the suggested window shrinking defense (or if you have any other defenses in mind).

cc @Firobe 

### Background reading:

[^1]: https://tarides.com/blog/2024-01-24-mirageos-designing-a-more-resilient-networking-stack-with-tcp/
[^2]: https://hannes.robur.coop/Posts/TCP-ns
[^3]: https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it/#shrinking-the-window
[^4]: https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it/#drop-incoming-packets
[^5]: https://blog.cloudflare.com/when-the-window-is-not-fully-open-your-tcp-stack-is-doing-more-than-you-think/
[^6]: https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/#idle-estab-is-forever

