We see exceptions like this in our logs after bumping Netty from 4.2.10.Final to 4.2.12.Final:
{"@timestamp":"2026-04-22T03:43:25,124","level":"ERROR","thread":"beaconchain-async-4","class":"teku-status-log","message":"PLEASE FIX OR REPORT | Unexpected exception thrown for beaconchain-async-4","throwable":"java.util.concurrent.CompletionException: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 131072 byte(s) of direct memory (used: 33554432, max: 33554432)\n\tat java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315)\n\tat java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320)\n\tat java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:936)\n\tat java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)\n\tat java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)\n\tat java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)\n\tat tech.pegasys.teku.infrastructure.async.SafeFuture.lambda$propagateResult$2(SafeFuture.java:150)\n\tat java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)\n\tat java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)\n\tat java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)\n\tat java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)\n\tat tech.pegasys.teku.infrastructure.async.SafeFuture.lambda$propagateResult$2(SafeFuture.java:150)\n\tat java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)\n\tat java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)\n\tat java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)\n\tat java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)\n\tat 
tech.pegasys.teku.infrastructure.async.SafeFuture.lambda$propagateToAsync$29(SafeFuture.java:461)\n\tat tech.pegasys.teku.infrastructure.async.SafeFuture.of(SafeFuture.java:82)\n\tat tech.pegasys.teku.infrastructure.async.AsyncRunner.lambda$runAsync$2(AsyncRunner.java:47)\n\tat tech.pegasys.teku.infrastructure.async.SafeFuture.of(SafeFuture.java:74)\n\tat tech.pegasys.teku.infrastructure.async.ScheduledExecutorAsyncRunner.lambda$createRunnableForAction$1(ScheduledExecutorAsyncRunner.java:124)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1583)\nCaused by: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 131072 byte(s) of direct memory (used: 33554432, max: 33554432)\n\tat io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:1102)\n\tat io.netty.util.internal.CleanerJava9$CleanableDirectBufferImpl.<init>(CleanerJava9.java:136)\n\tat io.netty.util.internal.CleanerJava9$CleanableDirectBufferImpl.<init>(CleanerJava9.java:131)\n\tat io.netty.util.internal.CleanerJava9.allocate(CleanerJava9.java:86)\n\tat io.netty.util.internal.PlatformDependent.allocateDirect(PlatformDependent.java:633)\n\tat io.netty.buffer.UnpooledDirectByteBuf.allocateDirectBuffer(UnpooledDirectByteBuf.java:129)\n\tat io.netty.buffer.UnpooledDirectByteBuf.<init>(UnpooledDirectByteBuf.java:70)\n\tat io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:50)\n\tat io.netty.buffer.UnsafeByteBufUtil.newDirectByteBuf(UnsafeByteBufUtil.java:734)\n\tat io.netty.buffer.AdaptiveByteBufAllocator$DirectChunkAllocator.allocate(AdaptiveByteBufAllocator.java:114)\n\tat io.netty.buffer.AdaptivePoolingAllocator$SizeClassChunkController.newChunkAllocation(AdaptivePoolingAllocator.java:734)\n\tat 
io.netty.buffer.AdaptivePoolingAllocator$Magazine.allocate(AdaptivePoolingAllocator.java:957)\n\tat io.netty.buffer.AdaptivePoolingAllocator$Magazine.tryAllocate(AdaptivePoolingAllocator.java:854)\n\tat io.netty.buffer.AdaptivePoolingAllocator$MagazineGroup.allocate(AdaptivePoolingAllocator.java:421)\n\tat io.netty.buffer.AdaptivePoolingAllocator.allocate(AdaptivePoolingAllocator.java:269)\n\tat io.netty.buffer.AdaptivePoolingAllocator.allocate(AdaptivePoolingAllocator.java:255)\n\tat io.netty.buffer.AdaptiveByteBufAllocator.newDirectBuffer(AdaptiveByteBufAllocator.java:67)\n\tat io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:168)\n\tat io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:154)\n\tat io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:88)\n\tat tech.pegasys.teku.networking.p2p.libp2p.rpc.LibP2PRpcStream.writeBytes(LibP2PRpcStream.java:47)\n\tat tech.pegasys.teku.networking.eth2.rpc.core.RpcResponseCallback.completeWithErrorResponse(RpcResponseCallback.java:64)\n\tat tech.pegasys.teku.networking.eth2.rpc.core.RpcResponseCallback.completeWithUnexpectedError(RpcResponseCallback.java:81)\n\tat tech.pegasys.teku.networking.eth2.rpc.beaconchain.methods.LoggingResponseCallback.completeWithUnexpectedError(LoggingResponseCallback.java:52)\n\tat tech.pegasys.teku.networking.eth2.rpc.beaconchain.methods.CompletionAwareResponseCallback.completeWithUnexpectedError(CompletionAwareResponseCallback.java:75)\n\tat tech.pegasys.teku.networking.eth2.rpc.core.PeerRequiredLocalMessageHandler.handleError(PeerRequiredLocalMessageHandler.java:73)\n\tat tech.pegasys.teku.networking.eth2.rpc.beaconchain.methods.DataColumnSidecarsByRangeMessageHandler.lambda$onIncomingMessage$2(DataColumnSidecarsByRangeMessageHandler.java:200)\n\tat tech.pegasys.teku.infrastructure.async.SafeFuture.lambda$finish$39(SafeFuture.java:503)\n\tat 
java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)\n\t... 21 more\n"}
By default Netty has a 32 MB limit for direct memory allocation (the `max: 33554432` in the error above).
Options to fix it:
- Force the pooled allocator at runtime
  Netty 4.2.x added the adaptive allocator as the default; the old PooledByteBufAllocator is still shipped and is battle-tested. Flip the default back for the Teku process only:
  // build.gradle applicationDefaultJvmArgs
  "-Dio.netty.allocator.type=pooled",
  This avoids the adaptive allocator entirely: you keep 4.2.12's CVE and bug fixes but sidestep the allocator regression. Recommended first step.
- Disable the adaptive allocator directly
  There's also a narrower switch that only controls whether the adaptive allocator is used:
  "-Dio.netty.allocator.useAdaptiveAllocator=false",
- Increase the direct memory limit from 32 MB to a larger value, say 64 MB
  // in build.gradle applicationDefaultJvmArgs:
  // 64 MB for Netty direct ByteBufs
  "-Dio.netty.maxDirectMemory=67108864",
The difference between the pooling strategies:
Adaptive (AdaptiveByteBufAllocator) — Netty 4.2's default, and what bit us
- Pools buffers in per-thread magazines. Each magazine holds a handful of large chunks (up to 4 MB each). Allocations carve slices out of a chunk; releases return the slice to the magazine.
- Self-tunes. Each magazine sizes its chunks dynamically based on recent allocation patterns — hot threads get bigger chunks, quiet threads shrink.
- Fast path is lock-free. Most allocations are thread-local, with no contention.
- Downside: retention. A magazine keeps its chunks even after buffers are released, so idle memory stays allocated. With many event-loop threads and bursty traffic, total retained direct memory can be far larger than the sum of currently-live buffers. That's exactly what hit our 32 MB ceiling.
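To make the retention failure mode concrete, here's a back-of-the-envelope sketch. The thread count is an assumption for illustration, not measured from Teku; only the 4 MB chunk size and 32 MB cap come from the discussion above:

```java
// Back-of-the-envelope: why adaptive's per-thread retention can exhaust
// a small direct-memory cap. Thread count is an illustrative assumption.
public class RetentionEstimate {
    public static void main(String[] args) {
        long chunkSize  = 4L * 1024 * 1024;  // adaptive's max chunk size (4 MiB)
        int  eventLoops = 8;                 // assumed event-loop thread count
        long retained   = chunkSize * eventLoops;
        long maxDirect  = 32L * 1024 * 1024; // the 32 MiB ceiling from the log

        // Eight idle magazines each holding one 4 MiB chunk already consume
        // the whole 32 MiB budget, even with zero live buffers.
        System.out.println(retained >= maxDirect); // true
    }
}
```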
Unpooled (UnpooledByteBufAllocator)
- No pool. Every allocate() creates a fresh native direct buffer; every release() frees it (via the JDK Cleaner).
- No retention. Direct-memory usage ≈ currently-live buffers; near-zero steady-state overhead.
- Slow allocation/free. Each allocation involves a syscall-ish malloc; each free creates a PhantomReference for the Cleaner, so you also get more GC pressure from reference processing.
- Not viable for Teku's hot paths. Gossip, RPC streaming, and discovery allocate buffers at high rates. Unpooled in those paths will tank throughput.
PooledByteBufAllocator sits between the two:
- Pools, like adaptive, but with fixed-shape arenas (pre-Netty-4.2's tried-and-true design).
- One arena per event-loop thread, buffers come from size classes.
- Lower per-thread retention than adaptive because arenas don't inflate chunk size dynamically.
- Well-understood memory profile — the one pre-4.2 Teku was implicitly using for years.
I've tried "-Dio.netty.allocator.type=pooled" (PooledByteBufAllocator) on one of the nightly nodes and the exceptions went away. But increasing the direct memory limit could be the better long-term strategy. Either way, we should commit one of these to the default JVM args so it applies to every Teku run.
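Committing it would look roughly like this in the application plugin block (a sketch; the exact placement and how it merges with the args Teku's build.gradle already sets would need checking):

```groovy
// Sketch: adding the allocator override to Teku's default JVM args.
// Merge with the existing applicationDefaultJvmArgs list rather than replacing it.
application {
    applicationDefaultJvmArgs = [
        "-Dio.netty.allocator.type=pooled",
        // or, if we prefer the larger-limit route instead:
        // "-Dio.netty.maxDirectMemory=67108864",
    ]
}
```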