Sub-quadratic bignum integer-to-string#11074

Open
ericmj wants to merge 2 commits into
erlang:masterfrom
ericmj:ericmj/bignum-render-dc-bz-barrett

@ericmj ericmj commented Apr 29, 2026

Render-side `integer_to_list/1` and `integer_to_binary/1` of large bignums
were O(N²) in the number of decimal digits, exposing a DoS surface for any
library that converts attacker-controlled bignums to strings (cf. the
Elixir `decimal` advisory GHSA-rhv4-8758-jx7v). This rewrites the bignum
render path with three layered optimizations.

## Algorithms

1. **Divide-and-conquer wrapper around `write_big`.** Splits an N-digit
   bignum at `base^(N/2)` via a single bignum divmod, recurses on each
   half, and writes the halves into adjacent positions in the output
   buffer (low half zero-padded). Below WRITE_BIG_DC_THRESHOLD (250
   decimal digits, picked from a sweep) the original schoolbook
   single-digit-extraction loop is used. A power-of-base cache is
   built once per top-level call and shared across the recursion.

2. **Burnikel-Ziegler recursive division.** The D&C wrapper above is
   bounded by the cost of bignum divmod. With OTP's existing Knuth-D
   I_div (O(xl·yl)), the wrapper stays O(N²) — only a constant-factor
   improvement. BZ replaces I_div for divisors above
   BZ_DIV_THRESHOLD=8 ErtsDigits, giving O(M(n)·log n) per divmod
   (sub-quadratic with the existing Karatsuba multiplication). Wired
   through I_div_dispatch so big_div_rem, big_div, and the render
   D&C all benefit. One subtle dependency: I_mul_karatsuba assumes
   normalized inputs (a single zero digit, or a non-zero top digit),
   which BZ's recursive Q sometimes isn't, so bz_div_3n_2n trims Q
   and B2 before each multiplication.

3. **Barrett reciprocals for the power cache.** The render D&C
   divides by the same cached `base^k` divisors many times across
   the recursion tree. Pre-computing
   `mu_i = floor(beta^(2*sizes[i] + 1) / vals[i])` once per cache
   level (skipped for levels under BARRETT_LEVEL_THRESHOLD=100
   ErtsDigits where mu-build cost wouldn't pay back) lets each
   divmod become a multiplication plus a small correction loop,
   saving a log-N factor per call vs BZ. The +1 ErtsDigit of
   precision in mu keeps the correction within 2 iterations even
   though `base^k` isn't normalized (its top bit is generally not set).
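
The divide-and-conquer split from item 1 can be modelled in a few lines of Python. This is only an illustrative sketch, not the C code in big.c: the names `render`, `schoolbook`, `power`, and `go` are hypothetical, the digit-count estimate is an assumption of this sketch, and `DC_THRESHOLD` simply mirrors the 250-digit value quoted above.

```python
import math

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"
DC_THRESHOLD = 250  # digits; mirrors WRITE_BIG_DC_THRESHOLD from the description


def schoolbook(n, width, base):
    """Digit-at-a-time extraction, zero-padded on the left to `width`."""
    out = []
    while n:
        n, d = divmod(n, base)
        out.append(DIGITS[d])
    return "".join(reversed(out)).rjust(width, "0")


def render(n, base=10):
    """Render integer n in the given base by recursive splitting."""
    if n < 0:
        return "-" + render(-n, base)

    cache = {}  # power-of-base cache, built once per top-level call

    def power(k):
        if k not in cache:
            cache[k] = base ** k
        return cache[k]

    def go(x, width):
        # Invariant: 0 <= x < base**width.
        if width <= DC_THRESHOLD:
            return schoolbook(x, width, base)
        low_width = width // 2
        hi, lo = divmod(x, power(low_width))  # one bignum divmod per split
        # The low half is zero-padded to exactly low_width digits.
        return go(hi, width - low_width) + go(lo, low_width)

    # Upper bound on the digit count (assumption: +2 of slack covers
    # floating-point rounding); surplus leading zeros are stripped below.
    width = int(n.bit_length() * math.log(2, base)) + 2
    return go(n, width).lstrip("0") or "0"
```

As item 2 points out, a wrapper like this is only as fast as the divmod underneath it, so the sketch shows the structure of the split rather than the final complexity.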

References: Brent & Zimmermann *Modern Computer Arithmetic*; Burnikel
& Ziegler MPI-I-98-1-022 (1998); CPython _pylong; V8 src/bigint.
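
The Barrett step from item 3 can be sketched the same way. This is a minimal Python model under stated assumptions, not the patch's `barrett_divmod`: the function names and the precondition `x < beta^(2k)` (with `k` the divisor size in 64-bit digits) are choices made for this illustration.

```python
DIGIT_BITS = 64  # models a 64-bit ErtsDigit, so beta = 2**64


def size_in_digits(d):
    """Number of beta-sized digits needed to hold d."""
    return (d.bit_length() + DIGIT_BITS - 1) // DIGIT_BITS


def barrett_precompute(d):
    """Return (mu, shift) with mu = floor(beta**(2k + 1) / d)."""
    k = size_in_digits(d)
    shift = DIGIT_BITS * (2 * k + 1)
    return (1 << shift) // d, shift


def barrett_divmod(x, d, mu, shift):
    """Divide x by d using the precomputed reciprocal mu."""
    # Assumed precondition for this sketch: 0 <= x < beta**(2k).
    assert 0 <= x < (1 << (shift - DIGIT_BITS))
    q = (x * mu) >> shift  # estimate; never exceeds the true quotient
    r = x - q * d
    corrections = 0
    while r >= d:  # small correction loop
        r -= d
        q += 1
        corrections += 1
    assert corrections <= 2
    return q, r
```

The reciprocal is computed once per divisor and reused for every division by that divisor, which is what makes it a fit for the cached `base^k` values.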

## Benchmarks (single-process min µs, macOS aarch64, 64-bit ErtsDigit)

`integer_to_list/1` and `integer_to_binary/1`:

  Decimal digits | Baseline (µs)  | Final (µs) | Speedup
  ---------------|----------------|------------|--------
  100            | 0              | 0          | —
  1 000          | 12             | 11         | 1.1×
  10 000         | 1 923          | 512        | 3.8×
  100 000        | 204 451        | 17 742     | 11.5×
  300 000        | 1 934 852      | 131 916    | 14.7×
  1 000 000      | 21 989 158     | 571 683    | 38.5×

  Asymptotic class fit (100k → 1M, 10× input):
    Baseline: 107.5× → slope 2.03 → O(N²)
    Final:     32.2× → slope 1.51 → sub-quadratic, near M(N) bound
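
The slopes above are the exponent of a power-law fit between two measurements, i.e. log(time ratio) / log(size ratio). A short sketch of that arithmetic, using the 100k and 1M rows of the table (the helper name `loglog_slope` is hypothetical):

```python
import math


def loglog_slope(n1, t1, n2, t2):
    """Exponent p in t ~ n**p, fitted through two (size, time) points."""
    return math.log(t2 / t1) / math.log(n2 / n1)


baseline = loglog_slope(100_000, 204_451, 1_000_000, 21_989_158)
final = loglog_slope(100_000, 17_742, 1_000_000, 571_683)
print(f"baseline slope {baseline:.2f}, final slope {final:.2f}")
```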

`integer_to_list` of a 1M-digit bignum: 22.0 s → 0.57 s.

Parse-side `binary_to_integer/1` and `list_to_integer/1` are unchanged;
the existing Erlang-level segmentize+pairwise-combine in big_binary_to_int
was already Karatsuba-bound (slope 1.55 across the full curve) and a C
port did not move the numbers.

## Tests

- All existing OTP `num_bif_SUITE` and `big_SUITE` cases pass on the
  opt and asan builds.
- The patch was developed iteratively under AddressSanitizer; the BZ
  recursion's I_mul_karatsuba normalization assumption was found by
  ASan via a heap-buffer-overflow read in I_mul_karatsuba's I_sub
  (big.c:991) caused by Q with leading-zero cells.

## Review follow-ups

Follow-ups from review of the sub-quadratic integer-to-string rewrite:

* Document I_mul_karatsuba's normalized-input precondition and assert it
  at entry, so future callers that pass zero-padded digits fail loudly
  rather than reading one cell past the internal scratch.
* Size barrett_divmod's prod buffer with BARRETT_MAX_CORRECTIONS slack
  to eliminate a latent off-by-one if the correction loop's D_add ever
  carries into a new cell, and assert the bound before each carry.
* Lift BARRETT_LEVEL_THRESHOLD next to the other tuning knobs, and drop
  the unused BZ_DEBUG/BZ_TRACE macros and orig_x_padded variable.
* Add t_integer_to_string_large, round-tripping integers across each
  threshold boundary (WRITE_BIG_DC_THRESHOLD, BARRETT_LEVEL_THRESHOLD,
  deep cache levels) in bases 2/8/10/16/36, including negatives and a
  power-of-10 to exercise the high-half-zero split branch.

github-actions Bot commented Apr 29, 2026

CT Test Results

    3 files    136 suites   50m 49s ⏱️
1 677 tests 1 620 ✅ 57 💤 0 ❌
2 319 runs  2 244 ✅ 75 💤 0 ❌

Results for commit c74f81f.

- big.c: move declarations above early return in barrett_divmod
  to satisfy -Werror=declaration-after-statement; drop unused 'p'
  in write_big_dc_top
- big.c, num_bif_SUITE.erl: convert tabs to spaces on new lines
- license-header.es: allowlist big_SUITE_data/karatsuba.dat
@bjorng bjorng self-assigned this Apr 29, 2026
@bjorng bjorng added the team:VM Assigned to OTP team VM label Apr 29, 2026

bjorng commented Apr 29, 2026

Thanks! Looks good to me after a very quick glance. It is too late to include in Erlang/OTP 29.0, so we will aim to include it in a patch release for OTP 29.

@ericmj ericmj marked this pull request as ready for review April 29, 2026 15:01