fix(parser): recover from unexpected tokens in expression position#1709
fix(parser): recover from unexpected tokens in expression position#1709ghaith wants to merge 6 commits into
Conversation
Build Artifacts🐧 Linux
From workflow run 🪟 Windows
From workflow run |
mhasel
left a comment
There was a problem hiding this comment.
Note: this review was generated by Claude (via Claude Code) and posted on my behalf. Treat the suggestions as machine-generated input — verify before acting. — Michael
Overview
Fixes #1306: parse_atomic_leaf_expression's fallback diagnosed E007 but didn't advance the lexer, so binary-op parsers higher in the cascade re-consumed the bad token (e.g. & re-parsed as binary-AND with EmptyStatement LHS). The malformed AST then flowed into codegen, producing a confusing EmptyStatement type-hint error while compilation reported success. The fix advances past binary-only operators used as prefixes and narrows the foo(p := ) empty-parameter carve-out to only fire when the next token is actually ) or ,.
What I like
- The diagnosis is sharp. The PR body correctly identifies why the AST was malformed: silent token re-consumption, plus an over-broad empty-parameter carve-out. Both root causes are addressed.
- The carve-out tightening is the cleaner of the two changes. Adding
&& matches!(lexer.token, KeywordParensClose | KeywordComma)is a one-line invariant that prevents a recovery path from masking real errors. This is a strict improvement. - Diagnostic wording.
"Literal"→"expression"is more accurate. The 16 mechanical snapshot updates all preserve location/column markers and only change the wording, which is exactly the right blast radius. - Test coverage. Three new tests pin: the #1306 regression (
&gin a call arg), bare&yin an assignment, andMOD y(generality witness — proves the fix isn't&-specific). The retainedamp_as_and_testconfirmsb & cstill parses correctly. The combination is convincing. - Resolver-test migration to
REF(x)is the right call given that&xwas never first-class syntax — only working via the buggy path. The migrated tests still exercise the underlying pointer-arithmetic resolver logic.
Suggestions
- Recursion after
advance.return parse_atomic_leaf_expression(lexer)is a tail call that recurses for each consecutive bad operator. Pathological input like& & & gwould emit one diagnostic per&and recurse N times. The lexer is bounded so this won't OOM, but converting to a loop would (a) avoid burning stack on adversarial input and (b) make the bound visible:Optional — current form is fine in practice.while to_operator(&lexer.token).is_some() { lexer.advance(); } // then fall through to a single retry
- What counts as "binary-only" is implicit.
to_operator(&lexer.token).is_some()returns true for unary-eligible tokens too (-,+,NOT). Those shouldn't reach this fallback because the prefix-operator parser handles them earlier — but the comment frames this as recovery for "binary-only operators misused as a prefix," which doesn't quite match what the predicate actually does. A one-liner clarifying "any token that has an operator interpretation; unary-eligible operators normally don't reach here" would close the gap. - A negative test for the narrowed carve-out would help. The fix tightens
foo(p := )recovery to only fire for)or,. The new tests cover the case where the carve-out should not fire (foo(p := &g)now produces E007 instead of silent empty-param), but there's no positive test that the carve-out still works for the originalfoo(p := )shape. If one already exists, no action; otherwise consider adding one to anchor that the legitimate recovery path is preserved. - Behavior-change call-out. The PR body lists the user-visible change (
plc --checknow exits 1 on previously-silent input). This belongs in release notes too — projects with broken-but-compiling code will see a new build failure, and they should know it's the parser becoming honest, not a regression.
Risks
- Loud failure for previously-silent code. Any project that had
&x(or other binary-only-as-prefix) buried in code now getsE007and fails compilation. Strictly an improvement, but worth a heads-up in release notes. - The resolver-test migration is the most fragile part. Two tests changed semantics from "parse
&xand resolve through the buggy AST shape" to "parseREF(x)cleanly." The behavior being asserted (pointer arithmetic resolves to a pointer) is preserved, but if anyone has external lit/integration tests that relied on&xparsing, they'll break — please grep for&followed by an identifier intests/lit/andtests/integration/to confirm no other coverage was leaning on the dead path. to_operatorcoverage. Ifto_operatorreturnsNonefor any token that should be recovered (e.g., a punctuation-only token that intuitively reads as binary), the recovery won't fire and we fall back to leaving the token in the stream. That's the conservative direction (no false advance), so I don't think it's a real risk, just worth noting.
Verdict
LGTM. Well-scoped fix, good test coverage, appropriate snapshot churn, and the carve-out tightening is genuinely valuable on its own. The recursion-vs-loop point is a nit; the comment-clarification and release-notes points are small. Approve once the lit-test grep confirms no external &x usage.
…-out test Review followups on PR #1709: - Replace the tail-recursing recovery with a loop drain. On adversarial input like `& & & g` the previous form emitted one diagnostic per `&` and recursed N times; the loop drains the operator run and retries once, producing exactly one diagnostic. Guards against infinite retry on non-operator bad tokens via an `advanced` flag — the reviewer's bare-while suggestion would have looped forever on inputs like `END_VAR` directly. - Sharpen the recovery comment: `to_operator()` also matches unary-eligible operators (`-`, `+`, `NOT`), but those don't reach this fallback because the prefix-operator parser handles them earlier in the cascade. - Add `consecutive_bad_operators_emit_a_single_diagnostic` to lock the N→1 reduction. - Add `empty_parameter_assignment_carve_out_still_fires` to anchor the positive complement of the narrowed `foo(p := )` carve-out. Verified during followup that no `.st` file under tests/lit, tests/integration, or libs relied on the previously-tolerated `&<ident>` shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…1306) `parse_atomic_leaf_expression`'s fallback emitted an `E007` diagnostic but left the bad token in the stream, so binary-op parsers higher in the cascade re-consumed it (e.g. `&` taken as binary AND with an `EmptyStatement` LHS). The malformed AST flowed into codegen, producing a confusing "no type hint available for EmptyStatement" while compilation reported success. `prog(p := &g)` from #1306 was the salient case; any binary-only operator misused as a unary prefix had the same shape. Recovery now: - emits a clearer "expected expression" diagnostic (was "Literal"); - when the bad token is itself an operator, advances past it and retries the leaf so the operand the user wrote is captured (`&g` -> `g`); - otherwise leaves the token in place so outer parsers can use it for synchronization (e.g. `END_CASE`, `END_VAR`). The missing-parameter-assignment carve-out (`foo(p := )`) is narrowed to fire only when the next token is actually `)` or `,`; previously any unrecognised token after `:=` was silently treated as an empty parameter, masking real errors. Resolver tests that exercised pointer arithmetic via the dead-code `&x` path are migrated to `REF(x)`. Fixes #1306 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-out test Review followups on PR #1709: - Replace the tail-recursing recovery with a loop drain. On adversarial input like `& & & g` the previous form emitted one diagnostic per `&` and recursed N times; the loop drains the operator run and retries once, producing exactly one diagnostic. Guards against infinite retry on non-operator bad tokens via an `advanced` flag — the reviewer's bare-while suggestion would have looped forever on inputs like `END_VAR` directly. - Sharpen the recovery comment: `to_operator()` also matches unary-eligible operators (`-`, `+`, `NOT`), but those don't reach this fallback because the prefix-operator parser handles them earlier in the cascade. - Add `consecutive_bad_operators_emit_a_single_diagnostic` to lock the N→1 reduction. - Add `empty_parameter_assignment_carve_out_still_fires` to anchor the positive complement of the narrowed `foo(p := )` carve-out. Verified during followup that no `.st` file under tests/lit, tests/integration, or libs relied on the previously-tolerated `&<ident>` shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…test
Spell out that `p := )` and `p := ,` are explicitly allowed by the parser
(land as `Assignment { right: EmptyStatement }`, no diagnostic) and that
the test only locks in the carve-out — i.e. the new `parse_atomic_leaf_
expression` recovery must not consume the trailing `)` or `,`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0bba529 to
4b3fd5c
Compare
Summary
parse_atomic_leaf_expression's fallback emitted anE007 unexpected_token_founddiagnostic but did not advance the lexer, leaving the bad token in the stream for binary-op parsers higher in the cascade to re-consume. Forprog(p := &g)from #1306 this meant the AST becameBinaryExpression { And, EmptyStatement, g }; the empty LHS reached codegen and produced a confusingno type hint available for EmptyStatementwhile compilation reported success. The same shape bit any other binary-only operator misused as a unary prefix (MOD x,OR x, …).Recovery now:
"expected expression"diagnostic (was"Literal");&g→g);END_CASE,END_VAR).The missing-parameter-assignment carve-out (
foo(p := )) is narrowed to fire only when the next token actually closes the parameter list ()or,); previously any unrecognised token after:=was silently treated as an empty parameter, masking real errors like the one in #1306.Resolver tests that exercised pointer arithmetic via the dead-code
&xpath are migrated toREF(x).Fixes #1306
Behaviour change for users
Before:
After:
Test plan
expressions_parser_tests.rscoveringprog(p := &g)(the Using the removed&- operator in a formal parameter assignment will cause an error during codegen #1306 regression), bare&y, andMOD y(generality witness).amp_as_and_test(binaryb & c) stays green."Literal"→"expression"wording. Locations and column markers are unchanged across all of them.qualified_reference_location_testno longer covers&aaa.bbb.cccsince that case was anchoring the buggyEmptyStatement AND exprshape.&xtoREF(x)(the&path was dead code that fell through the buggy parser recovery).cargo test --workspaceclean (2466/2466).cargo xtask litclean (347/347).cargo fmt --all+cargo clippy --workspace -- -Dwarningsclean.plc <file> --checkon the issue's input now exits 1 with E007; full compile aborts before codegen with the same diagnostic.🤖 Generated with Claude Code