
[php] add Semgrep grammar augmentation (was empty) #583

Draft
brandonspark wants to merge 2 commits into main from fix/php-pattern-augmentation

brandonspark commented Apr 30, 2026

Summary

The PHP Semgrep grammar augmentation file at lang/semgrep-grammars/src/semgrep-php/grammar.js was a no-op: its entire rules block was commented out. Because PHP variables natively start with $, simple $X-style metavariable patterns already parsed as PHP variables. Every other Semgrep construct failed to parse: ellipsis (...) outside variadic positions, $...ARGS, and metavariables used as class, function, type, or attribute names.
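To see why plain $X patterns worked before this PR while the other constructs did not, the sketch below compares token shapes. It assumes Semgrep's conventional metavariable shape `$[A-Z_][A-Z_0-9]*` and variadic shape `$...NAME`, and uses the variable-name regex from the PHP manual. These regexes are illustrative and are not copied from grammar.js.

```python
import re

# Base-PHP variable token: '$' plus a name matching the PHP manual's
# variable-name regex [a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*.
PHP_VARIABLE = re.compile(r"\$[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*")

# Assumed Semgrep token shapes (illustrative; not taken from grammar.js).
SEMGREP_METAVAR = re.compile(r"\$[A-Z_][A-Z_0-9]*")
SEMGREP_VARIADIC = re.compile(r"\$\.\.\.[A-Z_][A-Z_0-9]*")


def parses_as_php_variable(token: str) -> bool:
    """True if the base PHP grammar would already lex `token` as a variable."""
    return PHP_VARIABLE.fullmatch(token) is not None


for token in ["$X", "$FOO", "$...ARGS", "..."]:
    print(
        token,
        "metavar:", SEMGREP_METAVAR.fullmatch(token) is not None,
        "variadic:", SEMGREP_VARIADIC.fullmatch(token) is not None,
        "php_variable:", parses_as_php_variable(token),
    )
```

Every string of the metavariable shape is also a well-formed PHP variable, so $X needed no grammar support in variable positions. $...ARGS and a bare ... are not PHP variables, which is why they needed new rules.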

This PR builds out the PHP augmentation, modeled on semgrep-java and semgrep-kotlin. Closes LANG-474, LANG-475.

Companion release PR: semgrep/semgrep-php#4

What's added

  • semgrep_ellipsis (...) wired into expression, statement, member-declaration (class/interface/trait), enum-body, formal-parameter, and match-arm positions.
  • semgrep_deep_ellipsis (<... expr ...>) in expression position.
  • semgrep_variadic_metavariable ($...ARGS) in formal-parameter and argument positions.
  • semgrep_metavar_ident ($FOO in identifier position) wired into class/interface/trait/enum/function/method names, named types, base/interface clauses, and attribute names.
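The list above can be read as a table from each Semgrep construct to the syntactic positions that accept it. The sketch below transcribes it that way. The position labels are hypothetical shorthand, not rule names from grammar.js, and the lookup is only a model of the PR's scope, not of how tree-sitter resolves the grammar.

```python
from enum import Enum


class Construct(Enum):
    ELLIPSIS = "semgrep_ellipsis"                       # ...
    DEEP_ELLIPSIS = "semgrep_deep_ellipsis"             # <... expr ...>
    VARIADIC_METAVAR = "semgrep_variadic_metavariable"  # $...ARGS
    METAVAR_IDENT = "semgrep_metavar_ident"             # $FOO as an identifier


# Positions per construct, transcribed from the PR's "What's added" list.
# Labels are hypothetical shorthand, not grammar.js rule names.
ACCEPTED_POSITIONS = {
    Construct.ELLIPSIS: {
        "expression", "statement", "member_declaration",
        "enum_body", "formal_parameter", "match_arm",
    },
    Construct.DEEP_ELLIPSIS: {"expression"},
    Construct.VARIADIC_METAVAR: {"formal_parameter", "argument"},
    Construct.METAVAR_IDENT: {
        "class_name", "interface_name", "trait_name", "enum_name",
        "function_name", "method_name", "named_type",
        "base_clause", "interface_clause", "attribute_name",
    },
}


def accepts(position: str, construct: Construct) -> bool:
    """True if this PR wires `construct` into `position`."""
    return position in ACCEPTED_POSITIONS[construct]
```

Note that no entry lists a variable position for semgrep_metavar_ident, so `accepts("variable", Construct.METAVAR_IDENT)` is False. That absence is what preserves PHP's native $$F behavior.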

$$F lexer interaction

PHP's variable-variable form $$F keeps its native meaning (dynamic_variable_name($variable_name(F))) because semgrep_metavar_ident is accepted only in identifier positions, never in variable positions. As a result, the LANG-475 sub-case "metavar property name with $$F" (class $C { public $T $$F = $V; }) is NOT addressed in this PR: $$F continues to parse as a variable-variable. That edge case can be handled in a follow-up if needed; this PR already unblocks the much larger set of patterns covered in LANG-474's triage. A regression test verifies that $$F still parses correctly.
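A toy recursive-descent model makes the position dependence concrete. It is a hypothetical sketch, not the tree-sitter grammar: it assumes variable_name is '$' followed by a name, that dynamic_variable_name is '$' followed by another variable, and that metavariables use the `$[A-Z_][A-Z_0-9]*` shape. The point is that the variable-position parser never tries the metavariable rule.

```python
import re
from typing import Union

# Toy parse tree: (node_type, child) tuples with string leaves.
Tree = Union[str, tuple]

NAME = re.compile(r"[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*")
METAVAR_IDENT = re.compile(r"\$[A-Z_][A-Z_0-9]*")  # assumed Semgrep shape


def parse_variable(src: str) -> Tree:
    """Parse `src` in a variable position, mirroring base PHP only.

    semgrep_metavar_ident is deliberately never tried here.
    """
    if src.startswith("$$"):
        # '$' applied to another variable: a variable-variable.
        return ("dynamic_variable_name", parse_variable(src[1:]))
    if src.startswith("$") and NAME.fullmatch(src[1:]):
        return ("variable_name", ("name", src[1:]))
    raise ValueError(f"not a variable: {src!r}")


def parse_identifier(src: str) -> Tree:
    """Parse `src` in an identifier position (e.g. a class name),
    where the augmentation adds semgrep_metavar_ident."""
    if METAVAR_IDENT.fullmatch(src):
        return ("semgrep_metavar_ident", src)
    if NAME.fullmatch(src):
        return ("name", src)
    raise ValueError(f"not an identifier: {src!r}")


print(parse_variable("$$F"))
print(parse_identifier("$C"))
```

Because parse_variable has no metavariable branch, $$F can only come out as a variable-variable, which is the behavior the regression test pins down.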

Test plan

  • make build && make test in lang/semgrep-grammars/src/semgrep-php/ (all 105 tests pass: 81 inherited + 24 new semgrep tests, including a regression test that $$F still parses as a variable-variable).
  • Companion PR in semgrep/semgrep-php to release the regenerated parser: semgrep-php#4, "[php] add Semgrep grammar augmentation (was empty)".

🤖 Generated with Claude Code

brandonspark and others added 2 commits April 29, 2026 17:34
Wire up `semgrep_ellipsis`, `semgrep_deep_ellipsis`,
`semgrep_variadic_metavariable` (`$...ARGS`), and `semgrep_metavar_ident`
into the previously-empty semgrep-php grammar so PHP patterns can use
metavariables in non-variable positions and ellipses in any expression,
statement, member, parameter, argument, and match-arm position.

PHP's native `$FOO` parses as a `variable_name`, so we only added a
metavariable token for *identifier* positions (class/interface/trait/
enum/function/method names, type references, base/interface clauses,
and attribute names). The variable-variable form `$$F` keeps its native
meaning since `semgrep_metavar_ident` is never accepted in variable
positions.

Closes LANG-474, LANG-475.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>