
erts: add chunked process_info backtrace BIFs #11095

Draft: lmth wants to merge 10 commits into erlang:master from lmth:process-info-chunked-prototype


@lmth lmth commented May 7, 2026

Summary

Add three new BIFs for reading a process backtrace incrementally in
fixed-size chunks, without materialising the full backtrace in memory
at once:

  • erlang:process_info_backtrace_start(Pid, ChunkSize) -> Handle
  • erlang:process_info_backtrace_next(Handle) -> {{backtrace, Chunk}} | done
  • erlang:process_info_backtrace_stop(Handle) -> ok

Motivation

We are stuck trying to make it possible to dump out the backtraces of
certain erlang processes in production systems, and for those to be complete.

When our customers deploy our products and run into issues (often our own
bugs or architectural mistakes in Erlang code) where their Erlang node grows
non-catastrophically but approaches an OOM situation, we provide a built-in
"debug dump" functionality, which writes diagnostics to a file.
Those diagnostics include information about allocators, process info, etc.
This is similar to a crash dump but intended to be non-destructive.
Unfortunately, we have run into situations where our debug dump has tipped
the scales, due to the amount of memory required to run
process_info(Pid,backtrace) -> BigBinary
with a big process as target.

For the time being, we choose to skip processes over a certain stack depth
and/or heap size.

But, those processes are usually the most interesting processes given the
problem statement.

So, to make it possible for us to capture the backtraces of really big
processes, and without any truncation (unless decided by our debug dump
collector on a timeout basis) we would like to suggest a chunked API
for reading process backtraces.

With this feature, plus the new process_iterator APIs, users would be
able to reduce the memory footprint of their system diagnostics tools built
on top of process_info.

This is a fair amount of code in erts, and it is more code than the PR
author has managed to get the full grasp of, so look upon this PR more
as a request for functionality than a suggestion for inclusion in the
OTP code base as-is.

Implementation notes

  • A resumable ErtsPrintTermCursor term printer (erl_printf_term.c/h)
    yields after printing at most ChunkSize bytes of any single term, then
    resumes from the same position on the next call.
  • The session handle is a magic binary so the target process is automatically
    unblocked if the caller drops the handle without calling _stop.
  • The stepbuf allocator was changed from ERTS_ALC_T_TMP_DSBUF (scheduler-
    local) to ERTS_ALC_T_SIG_DATA (scheduler-safe) to handle the case where
    _next/_stop runs on a different scheduler than _start.
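
The resumable printer described in the first bullet can be pictured with a small stand-alone sketch. This is not the PR's ErtsPrintTermCursor: `ToyCursor`, `toy_cursor_init` and `toy_print_chunk` are hypothetical names, and the "term" is just a list of integers rendered as `[1,22,333]`. It only shows the idea of remembering the position inside a term so that a later call continues where the previous one stopped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical cursor: remembers where printing stopped. */
typedef struct {
    const int *items;   /* elements of the list being printed */
    size_t n;           /* number of elements */
    size_t next;        /* index of the next element to format */
    int opened;         /* has '[' been formatted yet? */
    int closed;         /* has ']' been formatted yet? */
    char pend[32];      /* formatted token not yet fully emitted */
    size_t pend_len;    /* bytes in pend */
    size_t pend_off;    /* bytes of pend already emitted */
} ToyCursor;

void toy_cursor_init(ToyCursor *c, const int *items, size_t n)
{
    memset(c, 0, sizeof *c);
    c->items = items;
    c->n = n;
}

/* Format the next token into c->pend; return 0 when nothing is left. */
static int toy_refill(ToyCursor *c)
{
    if (!c->opened) {
        c->opened = 1;
        c->pend_len = (size_t)snprintf(c->pend, sizeof c->pend, "[");
    } else if (c->next < c->n) {
        const char *sep = (c->next == 0) ? "" : ",";
        c->pend_len = (size_t)snprintf(c->pend, sizeof c->pend, "%s%d",
                                       sep, c->items[c->next]);
        c->next++;
    } else if (!c->closed) {
        c->closed = 1;
        c->pend_len = (size_t)snprintf(c->pend, sizeof c->pend, "]");
    } else {
        return 0;
    }
    c->pend_off = 0;
    return 1;
}

/* Write at most max bytes to out. Returns the number of bytes written;
 * 0 means the whole term has been printed. A token that does not fit is
 * split, and the remainder is emitted by the next call. */
size_t toy_print_chunk(ToyCursor *c, char *out, size_t max)
{
    size_t written = 0;
    assert(max > 0);
    while (written < max) {
        size_t avail, room, take;
        if (c->pend_off == c->pend_len && !toy_refill(c))
            break;
        avail = c->pend_len - c->pend_off;
        room = max - written;
        take = avail < room ? avail : room;
        memcpy(out + written, c->pend + c->pend_off, take);
        c->pend_off += take;
        written += take;
    }
    return written;
}
```

The caller only ever needs a buffer of `max` bytes, regardless of how large the printed term is; all state needed to resume lives in the cursor.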

Tests

15 test cases in process_SUITE, group chunked_backtrace_bif.

Checklist

  • Commit subjects ≤ 72 chars, no trailing period
  • SPDX license headers on new files
  • git diff --check whitespace clean
  • otp_build check --no-dialyzer --no-docs --no-tests passes
  • -doc since tags: ~"OTP @OTP-XXXXX@" — replaced automatically by OTP release tooling
  • Full CI (dialyzer + docs) via GitHub Actions

lmth and others added 9 commits May 6, 2026 17:05
Adds three new BIFs to allow reading a process backtrace incrementally
(in fixed-size chunks) without ever materialising the full backtrace as
a single binary in memory:

  erlang:process_info_backtrace_start(Pid, Options)
      -> {ok, Handle, FirstChunk} | done | badarg

  erlang:process_info_backtrace_next(Handle)
      -> {more, Chunk} | done

  erlang:process_info_backtrace_stop(Handle)
      -> ok

Options: [{chunk_size, Bytes}]  (default: 1024)

Design highlights
-----------------
* The target process is suspended for the duration of the session so
  that the backtrace view is consistent across all chunks.
* An ERTS_MON_TYPE_SUSPEND monitor is created between the caller and
  the target; if the caller exits mid-session the demonitor signal path
  automatically resumes the target.
* Timers on the target are paused/resumed symmetrically with the
  suspend, so no timer fires are missed.
* Backtrace generation is fully lazy via a new erts_stack_dump_step()
  function that emits one logical unit (PC info or one stack element)
  per call. Peak C-heap pressure is bounded to chunk_size + one step's
  output — never the full backtrace.
* The lazy output is byte-for-byte identical to process_info(Pid,
  backtrace) (verified by T9 in the test suite).
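
A rough model of the session lifecycle described above, as a stand-alone sketch. All names (`ToyProc`, `ToySession`, `toy_start`, `toy_next`, `toy_stop`) are hypothetical, the backtrace is a pre-built string standing in for the lazily generated output, and resuming the target when the last chunk has been consumed is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the target process. */
typedef struct {
    int suspended;
} ToyProc;

/* Hypothetical session state kept between calls. */
typedef struct {
    ToyProc *target;
    const char *text;    /* stands in for lazily generated output */
    size_t len;          /* total bytes available */
    size_t off;          /* bytes already handed out */
    size_t chunk_size;   /* upper bound for one chunk */
    int active;          /* 1 while the target is held suspended */
} ToySession;

/* Suspend the target and open a session. */
void toy_start(ToySession *s, ToyProc *target, const char *text,
               size_t chunk_size)
{
    assert(chunk_size > 0);
    target->suspended = 1;
    s->target = target;
    s->text = text;
    s->len = strlen(text);
    s->off = 0;
    s->chunk_size = chunk_size;
    s->active = 1;
}

/* End the session and resume the target; safe to call more than once. */
void toy_stop(ToySession *s)
{
    if (s->active) {
        s->active = 0;
        s->target->suspended = 0;
    }
}

/* Copy the next chunk into out, which must hold chunk_size bytes.
 * Returns the number of bytes copied; 0 means "done", at which point
 * the target has been resumed. */
size_t toy_next(ToySession *s, char *out)
{
    size_t left, take;
    if (!s->active)
        return 0;
    left = s->len - s->off;
    if (left == 0) {
        toy_stop(s);
        return 0;
    }
    take = left < s->chunk_size ? left : s->chunk_size;
    memcpy(out, s->text + s->off, take);
    s->off += take;
    return take;
}
```

Because the target stays suspended between `toy_start` and the final `toy_next` or `toy_stop`, every chunk is cut from the same snapshot, which is the consistency property the first bullet asks for.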

Motivation: in memory-tight production deployments the existing
process_info(Pid, backtrace) BIF can require 2× the full backtrace
size in C-heap (dsbuf + refc binary simultaneously), making it
impractical to dump very large processes — exactly the ones most
interesting when diagnosing memory growth. The chunked interface
removes that constraint.

Files changed
-------------
  erts/emulator/beam/atom.names           new atom: process_info_backtrace_start
  erts/emulator/beam/bif.tab              BIF declarations
  erts/emulator/beam/erl_bif_info.c       BIF implementations + RPC callbacks
  erts/emulator/beam/erl_proc_sig_queue.c hook caller-exit cleanup into SUSPEND demonitor path
  erts/emulator/beam/erl_process.c        erts_stack_dump_step (lazy step function)
  erts/emulator/beam/erl_process.h        ErtsStackDumpCursor, ErtsBacktraceSession structs
  erts/preloaded/src/erlang.erl           BIF declarations + specs
  erts/preloaded/ebin/erlang.beam         recompiled

Test script: /tmp/full_test.escript (15 tests, all passing)

This is a prototype / proof-of-concept intended to accompany an EEP.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Tests that the step buffer in the chunked backtrace BIF survives the
target process migrating between schedulers between backtrace_start_cb
and backtrace_next_cb.

Before the fix (ERTS_ALC_T_TMP_DSBUF), allocating on scheduler X and
freeing on scheduler Y causes a NULL-pointer crash inside
unlink_carrier() because the carrier is on allctr[X]'s list, not
allctr[Y]'s.

The test engineers the migration by:
  - spawning each target bound to scheduler 1 (ensuring start_cb
    allocates on allctr[1]), then immediately unbinding it so it is
    eligible for migration;
  - flooding scheduler 1 with low-priority spinners so
    erts_check_emigration_need() moves unbound targets to other
    schedulers when next_cb signals arrive.

On unfixed code (TMP_DSBUF) this reliably crashes beam.smp via SIGSEGV.
On fixed code (SIG_DATA) all N*4 concurrent sessions complete normally.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The stepbuf (erts_dsprintf_buf_t) in ErtsBacktraceSession was allocated
using ERTS_ALC_T_TMP_DSBUF, which uses a thread-specific (thr_spec)
allocator.  This means allocations and frees are expected to happen on
the same scheduler thread.

However, the stepbuf is allocated in backtrace_start_cb (on whatever
scheduler picks up the RPC for the target process at start time) and
freed in backtrace_next_cb / backtrace_stop_cb (on whatever scheduler
picks up a subsequent RPC for the same target).  Because the target
process can migrate between schedulers while suspended, these two
callbacks may execute on different schedulers.

When the stepbuf backing carrier is freed on a different scheduler than
the one that allocated it, mbc_free() calls destroy_carrier() with the
current scheduler's allctr.  destroy_carrier() calls
unlink_carrier(&allctr->mbc_list, crr), but crr is in the HOME
scheduler's mbc_list, not the current one.  Since crr is not in the
current list, cl->first != crr, so the else branch executes
crr->prev->next = crr->next -- but crr->prev == NULL (it was the head
of its home list) -- causing the NULL-pointer segfault observed at
erl_alloc_util.c:1286.

Fix: replace ERTS_ALC_T_TMP_DSBUF with ERTS_ALC_T_SIG_DATA for the
stepbuf.  SIG_DATA uses a thread-safe allocator that is explicitly
designed for cross-scheduler signal data (the ErtsBacktraceSession
struct itself already uses SIG_DATA for the same reason).  Three local
helper functions (grow/create/destroy_backtrace_dsbuf) mirror the
tmp_dsbuf variants but use SIG_DATA throughout.
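
The failure mode described above can be reproduced in miniature with a doubly linked list. The sketch below is not the code in erl_alloc_util.c: `ToyCarrier`, `ToyCarrierList` and the `toy_*` functions are hypothetical, and the unlink logic only mirrors the branch structure described in this commit message. A predicate reports the dangerous case instead of actually dereferencing NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical carrier node and per-allocator list head. */
typedef struct ToyCarrier {
    struct ToyCarrier *prev;
    struct ToyCarrier *next;
} ToyCarrier;

typedef struct {
    ToyCarrier *first;
    ToyCarrier *last;
} ToyCarrierList;

/* Insert crr at the head of cl. */
void toy_link_first(ToyCarrierList *cl, ToyCarrier *crr)
{
    crr->prev = NULL;
    crr->next = cl->first;
    if (cl->first)
        cl->first->prev = crr;
    else
        cl->last = crr;
    cl->first = crr;
}

/* Returns 1 when unlinking crr through cl would take the "not the head"
 * branch although crr has no predecessor, i.e. crr is the head of some
 * other list and crr->prev->next would dereference NULL. */
int toy_unlink_would_deref_null(const ToyCarrierList *cl,
                                const ToyCarrier *crr)
{
    return cl->first != crr && crr->prev == NULL;
}

/* Unlink crr from cl. Only valid when crr really is on cl. */
void toy_unlink(ToyCarrierList *cl, ToyCarrier *crr)
{
    assert(!toy_unlink_would_deref_null(cl, crr));
    if (cl->first == crr)
        cl->first = crr->next;
    else
        crr->prev->next = crr->next;
    if (cl->last == crr)
        cl->last = crr->prev;
    else
        crr->next->prev = crr->prev;
}
```

With a carrier `a` linked at the head of a "home" list, `toy_unlink_would_deref_null(&other, &a)` is 1 for any other list, which is the situation reached when the free runs against a different scheduler's allocator instance.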

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The chunked backtrace BIF (erts_stack_dump_step) previously called
stack_element_dump with precision=-1, which causes erts_printf_term to
use a default limit of 100,000 dcount units before emitting "... ".

For the non-chunked path (erts_stack_dump / process_info(Pid,backtrace))
this is a reasonable safety limit.  For the chunked path the output is
serialised incrementally via a dsbuf, so there is no single large
allocation risk.  Pass INT_MAX instead so the format handler uses
LONG_MAX as the depth counter (unlimited).
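
As an illustration of how a depth counter produces the "... " marker, here is a stand-alone sketch. It is not erts_printf_term: `toy_budget_from_precision` and `toy_print_list` are hypothetical, each element is assumed to cost exactly one unit (the real accounting also counts atom bytes, per the note below), and the mapping of other positive precisions to a budget is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

#define TOY_DEFAULT_DCOUNT 100000L   /* default limit named in the text */

/* Map a printf-style precision to a depth budget. */
long toy_budget_from_precision(int precision)
{
    if (precision < 0)
        return TOY_DEFAULT_DCOUNT;   /* precision = -1: default limit */
    if (precision == INT_MAX)
        return LONG_MAX;             /* effectively unlimited */
    return (long)precision;          /* assumption: used as given */
}

/* Print items as "1 2 3 " into out, spending one budget unit per item.
 * Returns 0 when everything was printed, 1 when the budget ran out and
 * "... " was appended, -1 when out is too small. */
int toy_print_list(char *out, size_t cap, const int *items, size_t n,
                   long dcount)
{
    size_t used = 0;
    assert(cap > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w;
        if (dcount <= 0) {
            snprintf(out + used, cap - used, "... ");
            return 1;
        }
        dcount--;
        w = snprintf(out + used, cap - used, "%d ", items[i]);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
    }
    return 0;
}
```

With a budget of 2, the list 1, 2, 3 is rendered as `1 2 ... `; with the budget obtained from `INT_MAX` it is rendered in full.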

Verified against a production system with deeply nested gen_server State
values (FXS schema records with long URI atoms whose dcount exceeds 100k
due to atom-byte-length accounting in erts_printf_term):
  Old (100k limit): 400 "... " truncation markers in rendered backtrace
  New (unlimited):    0 truncation markers
  Wall time: 5.1 s  |  File: 4.1 MB  |  Render RSS: +234 MB

The session handle is now a magic binary reference instead of an opaque
{Ref, Pid} tuple.  When the caller lets the handle go out of scope and it
is garbage collected, the destructor fires automatically and resumes the
suspended target process — no explicit _stop call needed and no SUSPEND
monitor required.

Key changes:
- ErtsBacktraceSession is now embedded inside a magic binary
  (erts_create_magic_binary / ERTS_MAGIC_BIN_DATA), not separately
  allocated on ERTS_ALC_T_SIG_DATA.
- backtrace_session_destructor: CAS active 1→0, acquire target MAIN lock,
  free stepbuf, clear chunked_backtrace, resume timers + process.
- _start_2 now passes a lightweight BacktraceStartArgs (chunk_size only).
  backtrace_start_cb uses a stack-local session to fill the first chunk;
  only creates the magic binary when there is data to deliver.
- build_backtrace_reply uses erts_mk_magic_ref(&hp, &bp->off_heap, mbin)
  for the handle, sized with ERTS_MAGIC_REF_THING_SIZE + 4.
- _next_1 / _stop_1 validate with is_internal_magic_ref +
  ERTS_MAGIC_BIN_DESTRUCTOR type check; extract target_pid from ses.
- Removed: SUSPEND monitor, cleanup_origin_cb,
  backtrace_session_monitor_cleanup, erts_backtrace_session_cleanup.
- erl_proc_sig_queue.c: removed erts_backtrace_session_cleanup call.
- erlang.erl: opaque handle type is now reference().
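
The "CAS active 1→0" step in the destructor is what makes cleanup happen exactly once when an explicit _stop and the garbage-collected handle both try to end the session. A minimal sketch with C11 atomics, using hypothetical names (`ToySession`, `toy_session_release`, `toy_stop`, `toy_destructor`) and a plain flag in place of the real resume logic:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical session; only the fields needed for the sketch. */
typedef struct {
    atomic_int active;      /* 1 while the session holds the target */
    int target_suspended;   /* stands in for the real suspend state */
    int cleanups;           /* how many times cleanup actually ran */
} ToySession;

void toy_session_init(ToySession *s)
{
    assert(s != NULL);
    atomic_init(&s->active, 1);
    s->target_suspended = 1;
    s->cleanups = 0;
}

/* Returns 1 for the single caller that wins the 1 -> 0 transition and
 * therefore performs the cleanup; every later caller gets 0. */
int toy_session_release(ToySession *s)
{
    int expected = 1;
    if (!atomic_compare_exchange_strong(&s->active, &expected, 0))
        return 0;
    s->target_suspended = 0;
    s->cleanups++;
    return 1;
}

/* Explicit stop requested by the caller. */
int toy_stop(ToySession *s)
{
    return toy_session_release(s);
}

/* Destructor run when the handle is garbage collected. */
void toy_destructor(ToySession *s)
{
    (void)toy_session_release(s);
}
```

Whichever of `toy_stop` and `toy_destructor` runs first resumes the target; the other becomes a no-op, so a handle that is stopped and later collected does not resume or free anything twice.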

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Tests that a dropped handle (going out of scope via tail call)
causes the GC destructor to fire and resume the suspended target
process.  On branches without the magic binary handle (otp-22 through
otp-27) the target remains suspended after GC, demonstrating that
the test would have caught the missing cleanup.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

All 15 chunked_backtrace_* CT tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

erts_factory_trim_and_close calls erts_resize_message_buffer
which may realloc the ErlHeapFragment to a new address.
The old local 'bp' pointer becomes stale after the trim.

Fix: read the (possibly relocated) fragment from hfact.heap_frags
after the trim, instead of using the original 'bp'.

Found via valgrind; previously manifested as invalid reads/writes
in erts_save_message_in_proc after the caller received the reply.
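
The same class of bug can be shown with plain realloc. The sketch below uses a hypothetical `ToyFactory` in place of the heap factory and `toy_factory_trim` in place of erts_factory_trim_and_close; it only illustrates the rule that a pointer taken before a resizing call must be re-read from its owner afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical owner of a resizable fragment. */
typedef struct {
    char *frag;     /* current address of the fragment */
    size_t size;    /* current size in bytes */
} ToyFactory;

/* Allocate a fragment of the given size; returns 0 on failure. */
int toy_factory_open(ToyFactory *f, size_t size)
{
    f->frag = malloc(size);
    f->size = f->frag ? size : 0;
    return f->frag != NULL;
}

/* Shrink the fragment to used bytes. realloc may move the block, so
 * any pointer obtained before this call may be stale afterwards. */
int toy_factory_trim(ToyFactory *f, size_t used)
{
    char *p;
    assert(used > 0 && used <= f->size);
    p = realloc(f->frag, used);
    if (p == NULL)
        return 0;               /* the original block is still valid */
    f->frag = p;
    f->size = used;
    return 1;
}

/* Build a reply in an over-sized fragment, trim it, and return the
 * fragment's current address. Returns NULL on allocation failure. */
const char *toy_build_reply(ToyFactory *f, const char *msg)
{
    size_t len = strlen(msg) + 1;
    char *bp;
    if (!toy_factory_open(f, len + 1024))
        return NULL;
    bp = f->frag;
    memcpy(bp, msg, len);
    if (!toy_factory_trim(f, len)) {
        free(f->frag);
        f->frag = NULL;
        f->size = 0;
        return NULL;
    }
    /* bp may now point at freed memory: re-read it from the owner. */
    bp = f->frag;
    return bp;
}
```

Returning the old `bp` would often appear to work, because shrinking in place is common, which is why this kind of error tends to surface only under tools such as valgrind.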

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ErtsPrintTermCursor struct and its three API functions declared in
erl_printf_term.h; erl_process.h gains the term_cursor, in_term and
term_max_bytes fields in ErtsStackDumpCursor.

process_SUITE.erl gains 15 new chunked_backtrace_* test cases covering
basic multi-chunk round-trips, cross-scheduler, dropped handle, byte
equality against classic process_info/2, and edge cases.

erlang.erl BIF declaration updated for process_info_backtrace_start/2.

All 15 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions Bot commented May 7, 2026

CT Test Results

    4 files      199 suites    1h 53m 58s ⏱️
3 375 tests    2 902 ✅    472 💤    1 ❌
4 275 runs     3 725 ✅    549 💤    1 ❌

For more details on these failures, see this check.

Results for commit 5de641b.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

The magic binary destructor only freed stepbuf/cursor.wstack inside
the "if (target->chunked_backtrace == ses)" branch.  When the target
died first, the target lookup returned NULL so the branch was never
taken, leaking the buffers when the handle was eventually GC'd.

Move destroy_backtrace_session_buffers(ses) ahead of the target lookup
so the buffers are always freed once the destructor wins the CAS.  The
target-resume logic remains correctly gated on chunked_backtrace == ses.

Add test chunked_backtrace_target_exit: start a session, exit the
target, verify _next returns done, _stop returns ok, and the handle
can be GC'd cleanly.
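
The ordering fix can be sketched as follows. The names are hypothetical (`ToyTarget`, `ToySession`, `toy_lookup`, `toy_destructor`), the process table is a plain array, and the CAS guard that precedes this code in the real destructor is left out. The point is only that freeing the buffers does not depend on the target still existing, while resuming does.

```c
#include <assert.h>
#include <stdlib.h>

struct ToySession;

/* Hypothetical target process. */
typedef struct ToyTarget {
    struct ToySession *chunked_backtrace;   /* session holding it, if any */
    int suspended;
} ToyTarget;

/* Hypothetical session with one owned buffer. */
typedef struct ToySession {
    char *stepbuf;      /* heap buffer owned by the session */
    int target_id;      /* index of the target in the table */
} ToySession;

/* Returns NULL when the target has already exited. */
static ToyTarget *toy_lookup(ToyTarget *table[], int n, int id)
{
    return (id >= 0 && id < n) ? table[id] : NULL;
}

void toy_destructor(ToySession *ses, ToyTarget *table[], int n)
{
    ToyTarget *t;
    assert(ses != NULL);

    /* Buffers first: this must happen whether or not the target lives. */
    free(ses->stepbuf);
    ses->stepbuf = NULL;

    /* Resume only a live target that still belongs to this session. */
    t = toy_lookup(table, n, ses->target_id);
    if (t != NULL && t->chunked_backtrace == ses) {
        t->chunked_backtrace = NULL;
        t->suspended = 0;
    }
}
```

In the pre-fix arrangement the `free` sat inside the `if`, so a NULL lookup result skipped it and the buffer leaked.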

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

jhogberg commented May 8, 2026

Thanks for the PR, we'll look into it. 🙂

> We are stuck trying to make it possible to dump out the backtraces of
> certain erlang processes in production systems, and for those to be complete.

A backtrace is mostly just expensive when the stack is really deep, and in that case you want it to be complete because the most interesting things usually are in the beginning ("how did we enter this catastrophic stack growth?"). Do you think a "front-trace" giving the first N frames would be sufficient in most cases? It would be a much smaller change.


lmth commented May 8, 2026

> Thanks for the PR, we'll look into it. 🙂

That's highly appreciated. 🙏

> > We are stuck trying to make it possible to dump out the backtraces of
> > certain erlang processes in production systems, and for those to be complete.
>
> A backtrace is mostly just expensive when the stack is really deep, and in that case you want it to be complete because the most interesting things usually are in the beginning ("how did we enter this catastrophic stack growth?"). Do you think a "front-trace" giving the first N frames would be sufficient in most cases? It would be a much smaller change.

We take what we get, of course. In the cases where we now have to skip processes from the dump, we would use that functionality instead.

Maybe a reentrant term printer could be useful elsewhere? If you implemented that in a more trustworthy/robust way than what is in this PR, then "the rest" is not so much. 😉

We have suffered on numerous occasions from the important data being hidden "later" in very deep terms in the
backtraces. In response, we have changed the records of our gen_... processes to put the terms that are "not expected
to be very interesting, but potentially taking up a lot of text real estate when printed" later in the process' state record.

This is a different situation from "if we print it we die", but it leads to the same problem - we don't always get to
see the interesting data, when we didn't guess (not a good word for describing how experience leads to better
design) a good layout in advance. So, the suggested capability is not just taken out of thin air, even though the
function names are too long. Sorry about those.

jhogberg added the stalled (waiting for input by the Erlang/OTP team) and team:VM (Assigned to OTP team VM) labels May 11, 2026
jhogberg self-assigned this May 11, 2026
@jhogberg

> That's highly appreciated. 🙏

We've taken a glance, and it seems like we won't be able to take a deeper look until after summer, so it'll have to wait.
