Sub-quadratic bignum integer-to-string #11074
Open
ericmj wants to merge 2 commits into
Render-side `integer_to_list/1` and `integer_to_binary/1` of large bignums were O(N²) in the number of decimal digits, exposing a DoS surface for any library that converts attacker-controlled bignums to strings (cf. the Elixir `decimal` advisory GHSA-rhv4-8758-jx7v). This rewrites the bignum render path with three layered optimizations.

## Algorithms

1. **Divide-and-conquer wrapper around `write_big`.** Splits an N-digit bignum at `base^(N/2)` via a single bignum divmod, recurses on each half, and writes the halves into adjacent positions in the output buffer (low half zero-padded). Below WRITE_BIG_DC_THRESHOLD (250 decimal digits, picked from a sweep) the original schoolbook single-digit-extraction loop is used. A power-of-base cache is built once per top-level call and shared across the recursion.

2. **Burnikel-Ziegler recursive division.** The D&C wrapper above is bounded by the cost of bignum divmod. With OTP's existing Knuth-D I_div (O(xl·yl)), the wrapper stays O(N²), which is only a constant-factor improvement. BZ replaces I_div for divisors above BZ_DIV_THRESHOLD=8 ErtsDigits, giving O(M(n)·log n) per divmod (sub-quadratic with the existing Karatsuba multiplication). It is wired through I_div_dispatch so that big_div_rem, big_div, and the render D&C all benefit. One subtle dependency: I_mul_karatsuba assumes normalized inputs (single zero digit, or top digit non-zero), which BZ's recursive Q sometimes isn't, so bz_div_3n_2n trims Q and B2 before each multiplication.

3. **Barrett reciprocals for the power cache.** The render D&C divides by the same cached `base^k` divisors many times across the recursion tree. Pre-computing `mu_i = floor(beta^(2*sizes[i] + 1) / vals[i])` once per cache level (skipped for levels under BARRETT_LEVEL_THRESHOLD=100 ErtsDigits, where the mu-build cost wouldn't pay back) lets each divmod become a multiplication plus a small correction loop, saving a log-N factor per call vs BZ. The +1 ErtsDigit of precision in mu keeps the correction within 2 iterations even though `base^k` isn't normalized (its top bit is generally not set).

References: Brent & Zimmermann *Modern Computer Arithmetic*; Burnikel & Ziegler MPI-I-98-1-022 (1998); CPython _pylong; V8 src/bigint.

## Benchmarks (single-process min µs, macOS aarch64, 64-bit ErtsDigit)

`integer_to_list/1` and `integer_to_binary/1`:

| Decimal digits | Baseline (µs) | Final (µs) | Speedup |
|---------------:|--------------:|-----------:|--------:|
| 100            | 0             | 0          | n/a     |
| 1 000          | 12            | 11         | 1.1×    |
| 10 000         | 1 923         | 512        | 3.8×    |
| 100 000        | 204 451       | 17 742     | 11.5×   |
| 300 000        | 1 934 852     | 131 916    | 14.7×   |
| 1 000 000      | 21 989 158    | 571 683    | 38.5×   |

Asymptotic class fit (100k → 1M, 10× input):

- Baseline: 107.5× → slope 2.03 → O(N²)
- Final: 32.2× → slope 1.51 → sub-quadratic, near M(N) bound

`integer_to_list` of a 1M-digit bignum: 22.0 s → 0.57 s.

Parse-side `binary_to_integer/1` and `list_to_integer/1` are unchanged; the existing Erlang-level segmentize+pairwise-combine in big_binary_to_int was already Karatsuba-bound (slope 1.55 across the full curve) and a C port did not move the numbers.

## Tests

- All existing OTP `num_bif_SUITE` and `big_SUITE` cases pass on the opt and asan builds.
- The patch was developed iteratively under AddressSanitizer; the BZ recursion's I_mul_karatsuba normalization assumption was found by ASan via a heap-buffer-overflow read in I_mul_karatsuba's I_sub (big.c:991) caused by Q with leading-zero cells.

## Review follow-ups

Follow-ups from review of the sub-quadratic integer-to-string rewrite:

* Document I_mul_karatsuba's normalized-input precondition and assert it at entry, so future callers that pass zero-padded digits fail loudly rather than reading one cell past the internal scratch.
* Size barrett_divmod's prod buffer with BARRETT_MAX_CORRECTIONS slack to eliminate a latent off-by-one if the correction loop's D_add ever carries into a new cell, and assert the bound before each carry.
* Lift BARRETT_LEVEL_THRESHOLD next to the other tuning knobs, and drop the unused BZ_DEBUG/BZ_TRACE macros and orig_x_padded variable.
* Add t_integer_to_string_large, round-tripping integers across each threshold boundary (WRITE_BIG_DC_THRESHOLD, BARRETT_LEVEL_THRESHOLD, deep cache levels) in bases 2/8/10/16/36, including negatives and a power-of-10 to exercise the high-half-zero split branch.
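For readers who want the shape of the divide-and-conquer render path without reading `big.c`, here is a minimal Python sketch in the spirit of the CPython `_pylong` reference. It is not the patch's code. The names `dc_to_string`, `build_power_cache` and `render` are hypothetical, the split uses a fixed ladder of squared powers rather than `base^(N/2)`, and Python's built-in `divmod` stands in for the bignum division, so it illustrates the recursion, the shared power cache and the zero-padding of low halves, not the performance.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"
DC_THRESHOLD = 250  # leaf width in digits, echoing WRITE_BIG_DC_THRESHOLD


def schoolbook(n, base):
    """Quadratic leaf case: extract one digit per division."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, base)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def build_power_cache(n, base):
    """Return [base**250, base**500, base**1000, ...], ending at the first power > n."""
    cache = [base ** DC_THRESHOLD]
    while cache[-1] <= n:
        cache.append(cache[-1] * cache[-1])
    return cache


def render(n, level, pad, cache, base):
    """Render 0 <= n < cache[level]; zero-pad to the full level width if pad is set."""
    if level == 0:
        s = schoolbook(n, base)
        return s.zfill(DC_THRESHOLD) if pad else s
    hi, lo = divmod(n, cache[level - 1])  # one bignum divmod per split
    if hi == 0 and not pad:
        # High half is empty and no padding is required: descend into the low half only.
        return render(lo, level - 1, False, cache, base)
    # The low half is always padded so that the two halves sit in adjacent positions.
    return (render(hi, level - 1, pad, cache, base)
            + render(lo, level - 1, True, cache, base))


def dc_to_string(n, base=10):
    """Convert an integer to its string form in the given base (2..36)."""
    if n < 0:
        return "-" + dc_to_string(-n, base)
    cache = build_power_cache(n, base)  # built once, shared by the whole recursion
    return render(n, len(cache) - 1, False, cache, base)
```

A power of the base such as `10 ** 1000` takes the path where the low half is all zeros and must still be emitted at full width, which is the kind of split the proposed `t_integer_to_string_large` case is meant to cover.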
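The Barrett step from item 3 can be sketched the same way. This is an illustration under stated assumptions, not the patch's `barrett_divmod`: `DIGIT_BITS`, `size_in_digits` and `barrett_mu` are hypothetical names, Python integers replace ErtsDigit arrays, and the full product `x * mu` is computed rather than a truncated one. The reciprocal follows the formula in the description, `mu = floor(beta^(2*s + 1) / d)` with `beta = 2^64`, and the dividend is assumed to satisfy `0 <= x < beta^(2*s)`, which holds in the render recursion because each dividend is below the square of its divisor.

```python
DIGIT_BITS = 64  # one ErtsDigit on the benchmarked 64-bit build


def size_in_digits(v):
    """Number of DIGIT_BITS-wide cells needed to hold v (at least 1)."""
    return max(1, -(-v.bit_length() // DIGIT_BITS))


def barrett_mu(d):
    """Precompute mu = floor(beta^(2*s + 1) / d), where s is the digit size of d."""
    s = size_in_digits(d)
    return (1 << (DIGIT_BITS * (2 * s + 1))) // d


def barrett_divmod(x, d, mu):
    """Return (q, r, corrections) with x == q*d + r and 0 <= r < d.

    Precondition: 0 <= x < beta^(2*s), and mu was built from the same d.
    """
    s = size_in_digits(d)
    assert 0 <= x < (1 << (DIGIT_BITS * 2 * s))
    # mu never exceeds the true reciprocal, so q can only be too small.
    q = (x * mu) >> (DIGIT_BITS * (2 * s + 1))
    r = x - q * d
    corrections = 0
    while r >= d:  # small correction loop
        r -= d
        q += 1
        corrections += 1
    return q, r, corrections
```

In a cache-based render, `barrett_mu` would run once per cache level and `barrett_divmod` once per split at that level, which is where the precomputation pays for itself.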
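The slopes quoted in the asymptotic fit can be reproduced from the 100 000-digit and 1 000 000-digit rows of the benchmark table. The snippet below is only a worked check of that arithmetic; `loglog_slope` is a hypothetical helper name, and the slope is the exponent `k` in `time ∝ N^k` estimated from two points.

```python
import math


def loglog_slope(n_small, t_small, n_large, t_large):
    """Estimate k in time ~ N**k from two (size, time) measurements."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)


# Timings in microseconds, taken from the benchmark table.
baseline = loglog_slope(100_000, 204_451, 1_000_000, 21_989_158)
final = loglog_slope(100_000, 17_742, 1_000_000, 571_683)
print(f"baseline slope {baseline:.2f}, final slope {final:.2f}")
```

For comparison, Karatsuba multiplication alone has exponent log2(3) ≈ 1.585, so a fitted slope of about 1.5 over this range is in the same neighbourhood as the multiplication cost.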
Contributor
CT Test Results: 3 files, 136 suites, 50m 49s. Results for commit c74f81f. This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass. See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.

// Erlang/OTP Github Action Bot
- big.c: move declarations above early return in barrett_divmod to satisfy -Werror=declaration-after-statement; drop unused 'p' in write_big_dc_top
- big.c, num_bif_SUITE.erl: convert tabs to spaces on new lines
- license-header.es: allowlist big_SUITE_data/karatsuba.dat
Contributor
Thanks! Looks good to me after a very quick glance. It is too late to include in Erlang/OTP 29.0, so we will aim to include it in a patch release for OTP 29.