e/matrices: fix SIGSEGV in coeffs_of_vec for inexact coefficient rings#4429

Open
d-torrance wants to merge 1 commit into Macaulay2:development from d-torrance:noetherian-operators

@d-torrance


In the last couple days, I started seeing the following segfault in the PPA builds (specifically, Ubuntu 22.04 amd64):

 -- capturing check(1, "NoetherianOperators")-- warning: experimental computation over inexact field begun
--          results not reliable (one warning given per session)
-- SIGSEGV
-* stack trace, pid: 72045
 0# profiler_stacktrace(std::ostream&, int) at ./M2/Macaulay2/d/main.cpp:131
 1# segv_handler at ./M2/Macaulay2/d/main.cpp:247
 2# 0x0000710C0B642520 in /lib/x86_64-linux-gnu/libc.so.6
 3# Matrix::coeffs(M2_arrayint_struct*, Matrix const*) const at matrices/matrix.cpp:1910
 4# interface_rawCoefficients at ./M2/Macaulay2/d/interface.dd:2540
 5# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1662
 6# evaluate_evalSequence at ./M2/Macaulay2/d/evaluate.d:474
 7# evaluate_applyFCCS at ./M2/Macaulay2/d/evaluate.d:832
 8# evaluate_evalraw at ./M2/Macaulay2/d/evaluate.d:1669
...

Full log: https://launchpadlibrarian.net/864313612/buildlog_ubuntu-jammy-amd64.macaulay2_1.26.05+git202606070040-0ppa202606080242~ubuntu22.04.1_BUILDING.txt.gz

I had Claude check it out -- here is its proposed fix, which seems quite reasonable!

AI Disclosure

This was all Claude. Here's its commit message:

coeffs_of_vec iterates over the Nterms of a polynomial and, for each term whose monomial appears in the exponent table, calls:

    ring_elem t = P->make_flat_term(h.coeff, mon);
    vec v = P->make_vec(val - 1, t);
    v->next = result;          // <-- crashed here

make_flat_term returns ZERO_RINGELEM (a null Nterm pointer) when the coefficient is zero under K_->is_zero. make_vec in turn returns nullptr for a zero polynomial. The subsequent unconditional dereference of v is therefore a null pointer dereference, caught as SIGSEGV.
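
The null propagation described above can be reproduced in miniature. The following is only a sketch under stated assumptions: `Term`, `VecNode`, `make_flat_term_sketch`, `make_vec_sketch`, and `term_to_vec_sketch` are hypothetical stand-ins written for illustration, not the Macaulay2 engine types or signatures, and a plain `double` replaces the ring coefficient.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-ins for illustration; NOT the Macaulay2 engine types.
struct Term {                       // plays the role of an Nterm
  double coeff;
  int monomial;
};

struct VecNode {                    // plays the role of one vec entry
  int component = 0;
  std::unique_ptr<Term> term;
  std::unique_ptr<VecNode> next;
};

// Mirrors the described behaviour of make_flat_term: a zero coefficient
// produces the null "zero element" instead of a term.
std::unique_ptr<Term> make_flat_term_sketch(double coeff, int monomial) {
  if (coeff == 0.0) return nullptr;
  return std::make_unique<Term>(Term{coeff, monomial});
}

// Mirrors the described behaviour of make_vec: a zero polynomial produces
// a null vec.
std::unique_ptr<VecNode> make_vec_sketch(int component, std::unique_ptr<Term> t) {
  if (!t) return nullptr;
  auto v = std::make_unique<VecNode>();
  v->component = component;
  v->term = std::move(t);
  return v;
}

// Chains the two calls. The result is null exactly when coeff is zero, so a
// caller that writes `v->next = ...` without checking dereferences null.
std::unique_ptr<VecNode> term_to_vec_sketch(double coeff, int monomial, int component) {
  return make_vec_sketch(component, make_flat_term_sketch(coeff, monomial));
}
```

In this toy model, `term_to_vec_sketch(0.0, 3, 0)` returns a null pointer while any nonzero coefficient returns a usable node, which is the situation the unguarded `v->next = result` cannot handle.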

For exact coefficient rings (ZZ, QQ, finite fields) a polynomial never carries a zero-coefficient Nterm in practice, so the bug lay dormant. For inexact fields (RR, CC / MPFR-backed) it surfaces non-deterministically. The polyheap that accumulates products in Matrix multiplication removes cancelling terms through add_to's case-0 branch only when both sides present the same monomial simultaneously. In edge cases a zero-coefficient Nterm can therefore reach the final polynomial without being cleaned up, e.g. when mult_by_term is called with a scalar that is exactly 0.0 in MPFR, or when terms with a unique monomial survive from an intermediate polynomial that was multiplied by zero.

The symptom appeared in the NoetherianOperators test suite after new tests were added that call truncatedDual / eliminatingDual over CC[x,y]. Those functions build the matrix product E * H.BMcoefs (a CC[x,y] polynomial matrix times a numerical CC kernel). Floating-point sums of products can land on exactly 0.0+0.0i non-deterministically depending on arithmetic order and memory layout, producing the latent zero-coefficient Nterms that trigger this path.
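
To illustrate how the order of floating-point additions alone can decide whether a sum is exactly zero, here is a minimal sketch. It assumes IEEE 754 `double` arithmetic as a stand-in for the MPFR values used by the engine, and `sum_in_order` is a hypothetical helper, not engine code.

```cpp
// Adds three doubles strictly left to right. The volatile temporary forces
// each partial sum to be rounded to double before the next addition.
double sum_in_order(double a, double b, double c) {
  volatile double partial = a + b;
  partial = partial + c;
  return partial;
}
```

With these inputs, `sum_in_order(1e16, 1.0, -1e16)` gives exactly `0.0`, because `1e16 + 1.0` rounds back to `1e16`, whereas `sum_in_order(1e16, -1e16, 1.0)` gives `1.0`. The same three summands therefore yield a zero or a nonzero coefficient depending only on the order in which they are accumulated.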

Fix: guard the vec construction with a null check. A zero-coefficient term contributes nothing to the coefficient matrix, so skipping it is both safe and correct.
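
The guard can be sketched as follows. This is not the diff from the commit, which is not reproduced here. It is a self-contained toy under stated assumptions: `Term`, `VecNode`, `make_vec_sketch`, `coeffs_of_terms_sketch`, and `length` are hypothetical names, the exponent table is modelled as a `std::map` from monomial to a 1-based index, and coefficients are plain `double` values.

```cpp
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; NOT the Macaulay2 engine types.
struct Term {                       // plays the role of an Nterm
  double coeff;
  int monomial;
};

struct VecNode {                    // plays the role of one vec entry
  int component = 0;
  double coeff = 0.0;
  std::unique_ptr<VecNode> next;
};

// Returns null for a zero coefficient, as make_vec is described to do.
std::unique_ptr<VecNode> make_vec_sketch(int component, double coeff) {
  if (coeff == 0.0) return nullptr;
  auto v = std::make_unique<VecNode>();
  v->component = component;
  v->coeff = coeff;
  return v;
}

// Builds a linked list of entries for the terms whose monomial is in the
// table, skipping any term for which no vec could be constructed.
std::unique_ptr<VecNode> coeffs_of_terms_sketch(
    const std::vector<Term>& terms, const std::map<int, int>& exponent_table) {
  std::unique_ptr<VecNode> result;
  for (const Term& h : terms) {
    auto it = exponent_table.find(h.monomial);
    if (it == exponent_table.end()) continue;   // monomial not requested
    auto v = make_vec_sketch(it->second - 1, h.coeff);
    if (!v) continue;                           // the null guard: zero term, skip it
    v->next = std::move(result);                // safe, v is known to be non-null
    result = std::move(v);
  }
  return result;
}

// Counts the entries in the list.
int length(const VecNode* v) {
  int n = 0;
  for (; v != nullptr; v = v->next.get()) ++n;
  return n;
}
```

In this model a term list containing one zero coefficient simply produces a shorter result list, and the remaining entries are unchanged, which matches the argument that skipping the term is safe.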

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@d-torrance d-torrance added the Engine Macaulay2/e label Jun 9, 2026