erts: add chunked process_info backtrace BIFs #11095
Adds three new BIFs to allow reading a process backtrace incrementally
(in fixed-size chunks) without ever materialising the full backtrace as
a single binary in memory:
erlang:process_info_backtrace_start(Pid, Options)
-> {ok, Handle, FirstChunk} | done | badarg
erlang:process_info_backtrace_next(Handle)
-> {more, Chunk} | done
erlang:process_info_backtrace_stop(Handle)
-> ok
Options: [{chunk_size, Bytes}] (default: 1024)
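The start/next/stop contract above can be modelled in plain C. This is a sketch only: bt_session, bt_start, bt_next, bt_stop and the in-memory string standing in for the backtrace are all hypothetical. They illustrate the call sequence and return shapes (first chunk or done, then more or done, then stop). They are not the ERTS implementation, which generates the text lazily and keeps the target suspended.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the start/next/stop protocol, NOT the ERTS code:
 * the "backtrace" is a plain string and each chunk is a slice of it. */
typedef enum { BT_OK, BT_MORE, BT_DONE } bt_status;

typedef struct {
    const char *text;   /* stands in for the lazily produced backtrace */
    size_t len;
    size_t pos;         /* bytes already delivered */
    size_t chunk_size;
    int active;         /* cleared by bt_stop() or when exhausted */
} bt_session;

/* Copies up to chunk_size bytes into out; returns the number copied. */
static size_t bt_take(bt_session *s, char *out) {
    size_t n = s->len - s->pos;
    if (n > s->chunk_size)
        n = s->chunk_size;
    memcpy(out, s->text + s->pos, n);
    s->pos += n;
    return n;
}

/* Mirrors start -> {ok, Handle, FirstChunk} | done. */
static bt_status bt_start(bt_session *s, const char *text, size_t chunk_size,
                          char *out, size_t *out_len) {
    assert(chunk_size > 0);
    s->text = text;
    s->len = strlen(text);
    s->pos = 0;
    s->chunk_size = chunk_size;
    s->active = 1;
    if (s->len == 0) {
        s->active = 0;
        return BT_DONE;
    }
    *out_len = bt_take(s, out);
    return BT_OK;
}

/* Mirrors next -> {more, Chunk} | done. */
static bt_status bt_next(bt_session *s, char *out, size_t *out_len) {
    if (!s->active || s->pos == s->len) {
        s->active = 0;
        return BT_DONE;
    }
    *out_len = bt_take(s, out);
    return BT_MORE;
}

/* Mirrors stop -> ok; harmless to call more than once. */
static void bt_stop(bt_session *s) { s->active = 0; }
```

A caller drains the session by calling bt_next until it returns BT_DONE, appending each chunk; concatenating all chunks reproduces the full text.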
Design highlights
-----------------
* The target process is suspended for the duration of the session so
that the backtrace view is consistent across all chunks.
* An ERTS_MON_TYPE_SUSPEND monitor is created between the caller and
the target; if the caller exits mid-session the demonitor signal path
automatically resumes the target.
* Timers on the target are paused/resumed symmetrically with the
suspend, so no timer fires are missed.
* Backtrace generation is fully lazy via a new erts_stack_dump_step()
function that emits one logical unit (PC info or one stack element)
per call. Peak C-heap pressure is bounded to chunk_size + one step's
output — never the full backtrace.
* The lazy output is byte-for-byte identical to process_info(Pid,
backtrace) (verified by T9 in the test suite).
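To illustrate the memory bound claimed for erts_stack_dump_step(), here is a minimal C sketch of a step-then-drain loop. The names dump_cursor, dump_step and dump_fill_chunk are hypothetical, and stack elements are pre-rendered strings rather than real stack words. The point is only that the chunk (at most chunk_size bytes) plus one step's output is all that is ever buffered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a stack: each element renders to one string. */
typedef struct {
    const char *const *elems;
    size_t n_elems;
    size_t next_elem;      /* cursor: next element to render */
    char step[128];        /* output of the most recent step only */
    size_t step_len, step_off;
} dump_cursor;

/* Emits one logical unit per call; returns 0 when the stack is exhausted. */
static int dump_step(dump_cursor *c) {
    if (c->next_elem == c->n_elems)
        return 0;
    const char *e = c->elems[c->next_elem++];
    size_t len = strlen(e);
    assert(len < sizeof c->step);
    memcpy(c->step, e, len);
    c->step_len = len;
    c->step_off = 0;
    return 1;
}

/* Fills at most chunk_size bytes; unconsumed step output stays in the
 * cursor for the next chunk. Returns the number of bytes written. */
static size_t dump_fill_chunk(dump_cursor *c, char *chunk, size_t chunk_size) {
    size_t used = 0;
    while (used < chunk_size) {
        if (c->step_off == c->step_len && !dump_step(c))
            break;                          /* nothing left to render */
        size_t avail = c->step_len - c->step_off;
        size_t n = chunk_size - used < avail ? chunk_size - used : avail;
        memcpy(chunk + used, c->step + c->step_off, n);
        c->step_off += n;
        used += n;
    }
    return used;
}
```

Because dump_step is only called when the previous step's output has been fully copied out, no more than one unit is ever held in addition to the chunk itself.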
Motivation: in memory-tight production deployments the existing
process_info(Pid, backtrace) BIF can require 2× the full backtrace
size in C-heap (dsbuf + refc binary simultaneously), making it
impractical to dump very large processes — exactly the ones most
interesting when diagnosing memory growth. The chunked interface
removes that constraint.
Files changed
-------------
erts/emulator/beam/atom.names new atom: process_info_backtrace_start
erts/emulator/beam/bif.tab BIF declarations
erts/emulator/beam/erl_bif_info.c BIF implementations + RPC callbacks
erts/emulator/beam/erl_proc_sig_queue.c hook caller-exit cleanup into SUSPEND demonitor path
erts/emulator/beam/erl_process.c erts_stack_dump_step (lazy step function)
erts/emulator/beam/erl_process.h ErtsStackDumpCursor, ErtsBacktraceSession structs
erts/preloaded/src/erlang.erl BIF declarations + specs
erts/preloaded/ebin/erlang.beam recompiled
Test script: /tmp/full_test.escript (15 tests, all passing)
This is a prototype / proof-of-concept intended to accompany an EEP.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tests that the step buffer in the chunked backtrace BIF survives the
target process migrating to another scheduler between backtrace_start_cb
and backtrace_next_cb.
Before the fix (ERTS_ALC_T_TMP_DSBUF), allocating on scheduler X and
freeing on scheduler Y causes a NULL-pointer crash inside
unlink_carrier() because the carrier is on allctr[X]'s list, not
allctr[Y]'s.
The test engineers the migration by:
- spawning each target bound to scheduler 1 (ensuring start_cb
allocates on allctr[1]), then immediately unbinding it so it is
eligible for migration;
- flooding scheduler 1 with low-priority spinners so
erts_check_emigration_need() moves unbound targets to other
schedulers when next_cb signals arrive.
On unfixed code (TMP_DSBUF) this reliably crashes beam.smp via SIGSEGV.
On fixed code (SIG_DATA) all N*4 concurrent sessions complete normally.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The stepbuf (erts_dsprintf_buf_t) in ErtsBacktraceSession was allocated using ERTS_ALC_T_TMP_DSBUF, which uses a thread-specific (thr_spec) allocator. This means allocations and frees are expected to happen on the same scheduler thread.

However, the stepbuf is allocated in backtrace_start_cb (on whatever scheduler picks up the RPC for the target process at start time) and freed in backtrace_next_cb / backtrace_stop_cb (on whatever scheduler picks up a subsequent RPC for the same target). Because the target process can migrate between schedulers while suspended, these two callbacks may execute on different schedulers.

When the stepbuf's backing carrier is freed on a different scheduler than the one that allocated it, mbc_free() calls destroy_carrier() with the current scheduler's allctr. destroy_carrier() calls unlink_carrier(&allctr->mbc_list, crr), but crr is in the HOME scheduler's mbc_list, not the current one. Since crr is not in the current list, cl->first != crr, so the else branch executes crr->prev->next = crr->next. But crr->prev == NULL (it was the head of its home list), which causes the NULL-pointer segfault observed at erl_alloc_util.c:1286.

Fix: replace ERTS_ALC_T_TMP_DSBUF with ERTS_ALC_T_SIG_DATA for the stepbuf. SIG_DATA uses a thread-safe allocator that is explicitly designed for cross-scheduler signal data (the ErtsBacktraceSession struct itself already uses SIG_DATA for the same reason). Three local helper functions (grow/create/destroy_backtrace_dsbuf) mirror the tmp_dsbuf variants but use SIG_DATA throughout.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
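A simplified C model of the failure mode: a carrier that heads scheduler X's list is unlinked via scheduler Y's list. The types and helpers below (carrier, carrier_list, link_carrier, unlink_carrier_unchecked, unlink_would_deref_null) are hypothetical and far simpler than erl_alloc_util.c, but the unlink follows the same first-or-else shape described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified per-scheduler carrier list. */
typedef struct carrier {
    struct carrier *prev, *next;
} carrier;

typedef struct {
    carrier *first, *last;
} carrier_list;

static void link_carrier(carrier_list *cl, carrier *c) {
    c->prev = cl->last;
    c->next = NULL;
    if (cl->last)
        cl->last->next = c;
    else
        cl->first = c;
    cl->last = c;
}

/* Trusts that c is in cl. If c actually heads ANOTHER list, cl->first != c,
 * so the else branch dereferences c->prev, which is NULL. */
static void unlink_carrier_unchecked(carrier_list *cl, carrier *c) {
    if (cl->first == c)
        cl->first = c->next;
    else
        c->prev->next = c->next;
    if (cl->last == c)
        cl->last = c->prev;
    else
        c->next->prev = c->prev;
}

/* Diagnostic helper: would unlinking c via cl take the NULL-prev branch? */
static int unlink_would_deref_null(const carrier_list *cl, const carrier *c) {
    return cl->first != c && c->prev == NULL;
}
```

In this model, unlinking from the home list is safe, while the same call against another scheduler's list takes the else branch with a NULL prev pointer, which is the crash the commit describes.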
The chunked backtrace BIF (erts_stack_dump_step) previously called stack_element_dump with precision=-1, which causes erts_printf_term to use a default limit of 100,000 dcount units before emitting "... ". For the non-chunked path (erts_stack_dump / process_info(Pid, backtrace)) this is a reasonable safety limit. For the chunked path the output is serialised incrementally via a dsbuf, so there is no risk of a single large allocation. Pass INT_MAX instead so the format handler uses LONG_MAX as the depth counter (unlimited).

Verified against a production system with deeply nested gen_server State values (FXS schema records with long URI atoms whose dcount exceeds 100k due to atom-byte-length accounting in erts_printf_term):

- Old (100k limit): 400 "... " truncation markers in rendered backtrace
- New (unlimited): 0 truncation markers
- Wall time: 5.1 s | File: 4.1 MB | Render RSS: +234 MB
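The precision-to-budget behaviour described above can be sketched in C. This assumes, as the commit message states, that each atom is charged its byte length. dcount_for_precision and print_atoms are hypothetical helpers that model the budget and the "... " marker, not the actual erts_printf_term code.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

#define DEFAULT_DCOUNT 100000L   /* default budget described above */

/* Hypothetical mapping: INT_MAX selects an effectively unlimited budget;
 * in this sketch every other value (e.g. -1) selects the default. */
static long dcount_for_precision(int precision) {
    return precision == INT_MAX ? LONG_MAX : DEFAULT_DCOUNT;
}

/* Appends atoms separated by ',' while the budget lasts, charging each atom
 * its byte length; writes "... " and stops once the budget runs out. */
static void print_atoms(const char *const *atoms, size_t n, long budget,
                        char *out, size_t out_size) {
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < n && used < out_size; i++) {
        long cost = (long)strlen(atoms[i]);
        const char *sep = i ? "," : "";
        if (cost > budget) {
            snprintf(out + used, out_size - used, "%s... ", sep);
            return;
        }
        budget -= cost;
        int w = snprintf(out + used, out_size - used, "%s%s", sep, atoms[i]);
        if (w < 0)
            return;
        used += (size_t)w;
    }
}
```

With a small budget the output ends in "... ", while a LONG_MAX budget prints every atom, mirroring the old and new behaviour reported above.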
The session handle is now a magic binary reference instead of an opaque
{Ref, Pid} tuple. When the caller lets the handle go out of scope and it
is garbage collected, the destructor fires automatically and resumes the
suspended target process — no explicit _stop call needed and no SUSPEND
monitor required.
Key changes:
- ErtsBacktraceSession is now embedded inside a magic binary
(erts_create_magic_binary / ERTS_MAGIC_BIN_DATA), not separately
allocated on ERTS_ALC_T_SIG_DATA.
- backtrace_session_destructor: CAS active 1→0, acquire target MAIN lock,
free stepbuf, clear chunked_backtrace, resume timers + process.
- _start_2 now passes a lightweight BacktraceStartArgs (chunk_size only).
backtrace_start_cb uses a stack-local session to fill the first chunk;
only creates the magic binary when there is data to deliver.
- build_backtrace_reply uses erts_mk_magic_ref(&hp, &bp->off_heap, mbin)
for the handle, sized with ERTS_MAGIC_REF_THING_SIZE + 4.
- _next_1 / _stop_1 validate with is_internal_magic_ref +
ERTS_MAGIC_BIN_DESTRUCTOR type check; extract target_pid from ses.
- Removed: SUSPEND monitor, cleanup_origin_cb,
backtrace_session_monitor_cleanup, erts_backtrace_session_cleanup.
- erl_proc_sig_queue.c: removed erts_backtrace_session_cleanup call.
- erlang.erl: opaque handle type is now reference().
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
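Because both an explicit _stop and the magic-binary destructor can end a session, the "CAS active 1→0" step is what makes teardown happen exactly once. A minimal C11 sketch, where bt_handle and session_teardown are hypothetical names and resume_count stands in for resuming the target:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical session state; only the 'active' flag matters here. */
typedef struct {
    atomic_int active;      /* 1 while the target is suspended */
    int resume_count;       /* how many times the target was resumed */
} bt_handle;

/* Whoever flips active 1 -> 0 owns the teardown; the loser backs off.
 * Both the explicit stop path and the GC destructor would call this. */
static bool session_teardown(bt_handle *h) {
    int expected = 1;
    if (!atomic_compare_exchange_strong(&h->active, &expected, 0))
        return false;       /* already torn down by the other path */
    /* ...free buffers and resume timers here... */
    h->resume_count++;      /* stands in for resuming the target */
    return true;
}
```

If _stop runs first, the later destructor call observes active == 0 and does nothing, so the target is never resumed twice.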
Tests that a dropped handle (going out of scope via tail call) causes the GC destructor to fire and resume the suspended target process. On branches without the magic binary handle (otp-22 through otp-27) the target remains suspended after GC, demonstrating that the test would have caught the missing cleanup.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
All 15 chunked_backtrace_* CT tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
erts_factory_trim_and_close calls erts_resize_message_buffer, which may realloc the ErlHeapFragment to a new address. The old local 'bp' pointer becomes stale after the trim. Fix: read the (possibly relocated) fragment from hfact.heap_frags after the trim, instead of using the original 'bp'. Found via valgrind; previously manifested as invalid reads/writes in erts_save_message_in_proc after the caller received the reply.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
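The stale-pointer bug is the usual realloc hazard. A minimal C sketch, with a hypothetical factory struct standing in for the heap factory and its frag field playing the role of hfact.heap_frags, shows the corrected pattern of re-reading the fragment from its owner after the trim:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a heap factory that owns one fragment. */
typedef struct {
    char *frag;     /* always the current address, like hfact.heap_frags */
    size_t size;
} factory;

/* Like a trim step: resizing may move the block to a new address. */
static int factory_trim(factory *f, size_t new_size) {
    char *p = realloc(f->frag, new_size);
    if (p == NULL)
        return -1;          /* old block is still valid and still owned */
    f->frag = p;            /* the owner is updated; callers' copies are not */
    f->size = new_size;
    return 0;
}

/* Correct pattern: take the fragment from its owner AFTER trimming,
 * never from a local pointer saved before the trim. */
static char *trim_and_get(factory *f, size_t new_size) {
    if (factory_trim(f, new_size) != 0)
        return NULL;
    return f->frag;
}
```

Any pointer captured before factory_trim may be dangling afterwards, which is exactly what happened to the local 'bp'.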
The ErtsPrintTermCursor struct and its three API functions are declared in erl_printf_term.h; erl_process.h gains the term_cursor, in_term and term_max_bytes fields in ErtsStackDumpCursor. process_SUITE.erl gains 15 new chunked_backtrace_* test cases covering basic multi-chunk round-trips, cross-scheduler migration, dropped handles, byte equality against classic process_info/2, and edge cases. The erlang.erl BIF declaration is updated for process_info_backtrace_start/2. All 15 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
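The idea behind a resumable term printer can be sketched in C with an explicit stack in place of recursion, so printing can stop after a byte budget and continue later from the same position. The term model and all names (term, frame, print_cursor, next_token, print_some) are hypothetical; the three real ErtsPrintTermCursor API functions are not named in this text and are not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical term model: an integer leaf or a list of children. */
typedef struct term {
    int is_list;
    int value;                          /* used when is_list == 0 */
    const struct term *const *kids;     /* used when is_list == 1 */
    size_t n_kids;
} term;

typedef struct {
    const term *node;
    size_t next_kid;    /* index of the next child to descend into */
    int opened;         /* '[' already emitted */
    int comma_done;     /* ',' before next_kid already emitted */
} frame;

/* Explicit stack instead of C recursion, so printing can stop anywhere. */
typedef struct {
    frame stack[32];
    size_t depth;
    char pending[16];   /* current token, possibly partly emitted */
    size_t pend_len, pend_off;
} print_cursor;

static void cursor_init(print_cursor *c, const term *root) {
    memset(c, 0, sizeof *c);
    c->stack[0].node = root;
    c->depth = 1;
}

/* Puts the next token into c->pending; returns 0 when the term is done. */
static int next_token(print_cursor *c) {
    while (c->depth > 0) {
        frame *f = &c->stack[c->depth - 1];
        const term *t = f->node;
        if (!t->is_list) {
            c->pend_len = (size_t)snprintf(c->pending, sizeof c->pending, "%d", t->value);
            c->depth--;
        } else if (!f->opened) {
            f->opened = 1;
            c->pend_len = (size_t)snprintf(c->pending, sizeof c->pending, "[");
        } else if (f->next_kid < t->n_kids) {
            if (f->next_kid > 0 && !f->comma_done) {
                f->comma_done = 1;
                c->pend_len = (size_t)snprintf(c->pending, sizeof c->pending, ",");
            } else {
                assert(c->depth < sizeof c->stack / sizeof c->stack[0]);
                c->stack[c->depth++] = (frame){t->kids[f->next_kid], 0, 0, 0};
                f->next_kid++;
                f->comma_done = 0;
                continue;       /* descend; emit the child's first token */
            }
        } else {
            c->pend_len = (size_t)snprintf(c->pending, sizeof c->pending, "]");
            c->depth--;
        }
        c->pend_off = 0;
        return 1;
    }
    return 0;
}

/* Emits at most max bytes, then yields; the next call resumes mid-token. */
static size_t print_some(print_cursor *c, char *out, size_t max) {
    size_t used = 0;
    while (used < max) {
        if (c->pend_off == c->pend_len && !next_token(c))
            break;
        size_t avail = c->pend_len - c->pend_off;
        size_t n = max - used < avail ? max - used : avail;
        memcpy(out + used, c->pending + c->pend_off, n);
        c->pend_off += n;
        used += n;
    }
    return used;
}
```

Calling print_some repeatedly with a small buffer yields the term in pieces whose concatenation equals the full rendering, so no single call needs memory proportional to the term size.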
CT Test Results: 4 files, 199 suites, 1h 53m 58s
For more details on these failures, see this check.
Results for commit 5de641b.
To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass. See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.
// Erlang/OTP Github Action Bot
The magic binary destructor only freed stepbuf/cursor.wstack inside the "if (target->chunked_backtrace == ses)" branch. When the target died first, the target lookup returned NULL, so the branch was never taken and the buffers leaked when the handle was eventually GC'd.

Move destroy_backtrace_session_buffers(ses) ahead of the target lookup so the buffers are always freed once the destructor wins the CAS. The target-resume logic remains correctly gated on chunked_backtrace == ses.

Add test chunked_backtrace_target_exit: start a session, exit the target, verify _next returns done, _stop returns ok, and the handle can be GC'd cleanly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
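The corrected destructor ordering, as a minimal C sketch. The structs, destroy_session_buffers and session_destructor are hypothetical simplifications: stepbuf stands in for stepbuf plus cursor.wstack, and a NULL target models a target that exited before the handle was collected.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, simplified process and session. */
typedef struct process {
    const void *chunked_backtrace;  /* session currently owning the target */
    int suspended;
} process;

typedef struct {
    atomic_int active;
    char *stepbuf;      /* stands in for stepbuf + cursor.wstack */
    process *target;    /* NULL models "target lookup found nothing" */
} bt_session;

static void destroy_session_buffers(bt_session *s) {
    free(s->stepbuf);
    s->stepbuf = NULL;
}

/* Fixed ordering: buffers are freed as soon as the CAS is won, before the
 * target lookup; only the resume is gated on the target owning the session. */
static void session_destructor(bt_session *s) {
    int expected = 1;
    if (!atomic_compare_exchange_strong(&s->active, &expected, 0))
        return;
    destroy_session_buffers(s);
    process *t = s->target;
    if (t != NULL && t->chunked_backtrace == s) {
        t->chunked_backtrace = NULL;
        t->suspended = 0;           /* stands in for resuming the target */
    }
}
```

With the free placed before the lookup, the "target already dead" path no longer skips cleanup, while a live target is still resumed only when it is bound to this session.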
Thanks for the PR, we'll look into it. 🙂
A backtrace is mostly just expensive when the stack is really deep, and in that case you want it to be complete because the most interesting things usually are in the beginning ("how did we enter this catastrophic stack growth?"). Do you think a "front-trace" giving the first N frames would be sufficient in most cases? It would be a much smaller change.
That's highly appreciated. 🙏
We take what we get, of course. In the cases where we now have to skip processes from the dump, we would use that functionality instead. Maybe a reentrant term printer could be useful elsewhere? If you implemented that in a more trustworthy/robust way than what is in this PR, then "the rest" is not so much. 😉

On numerous occasions we have suffered from the important data being hidden "later" in very deep terms in the

This is a different situation from "if we print it we die", but it leads to the same problem: we don't always get to
We've taken a glance, and it seems like we won't be able to take a deeper look until after summer, so it'll have to wait.
Summary
Add three new BIFs for reading a process backtrace incrementally in
fixed-size chunks, without materialising the full backtrace in memory
at once:
erlang:process_info_backtrace_start(Pid, ChunkSize) -> Handle
erlang:process_info_backtrace_next(Handle) -> {{backtrace, Chunk}} | done
erlang:process_info_backtrace_stop(Handle) -> ok

Motivation
We are stuck trying to make it possible to dump complete backtraces of
certain Erlang processes in production systems.
When our customers deploy our products and run into issues (often our
own bugs or architectural mistakes in Erlang code) where their Erlang node
grows non-catastrophically but approaches an out-of-memory (OOM) situation,
we provide a built-in "debug dump" function that writes diagnostics to a file.
Those diagnostics include information about allocators, process info, etc.
This is similar to a crash dump but intended to be non-destructive.
Unfortunately, we have run into situations where our debug dump has tipped
the scales, due to the amount of memory required to run
process_info(Pid, backtrace) -> BigBinary with a big process as target.
For the time being, we choose to skip processes over a certain stack depth
and/or heap size.
However, those processes are usually the most interesting ones given the
problem statement.
So, to make it possible for us to capture the backtraces of really big
processes, and without any truncation (unless decided by our debug dump
collector on a timeout basis) we would like to suggest a chunked API
for reading process backtraces.
With this feature, plus the new process_iterator APIs, users would be
able to reduce the memory footprint of system diagnostics tools built
on top of process_info.

This is a fair amount of code in erts, more than the PR author has fully
grasped, so please regard this PR more as a request for functionality
than as a proposal for inclusion in the OTP code base as-is.
Implementation notes
- The ErtsPrintTermCursor term printer (erl_printf_term.c/h) yields after
  printing at most ChunkSize bytes of any single term, then resumes from
  the same position on the next call.
- The suspended target process is automatically unblocked if the caller
  drops the handle without calling _stop.
- The step buffer allocator changed from ERTS_ALC_T_TMP_DSBUF
  (scheduler-local) to ERTS_ALC_T_SIG_DATA (scheduler-safe) to handle the
  case where _next/_stop runs on a different scheduler than _start.

Tests
15 test cases in process_SUITE, group chunked_backtrace_bif.

Checklist
- git diff --check whitespace clean
- otp_build check --no-dialyzer --no-docs --no-tests passes
- -doc since tags: ~"OTP @OTP-XXXXX@" (replaced automatically by OTP release tooling)