api/Mail/Sieve: tokenise filter rule values to align with EGroupware … by CActor · Pull Request #241 · EGroupware/egroupware

CActor · 2026-05-14T07:32:42Z

Summary

Extends the IMAP-search tokenisation introduced in #240 to the Mail filter rules (Sieve scripts generated by EGroupware), gated behind a per-rule opt-in checkbox. Existing rules are unaffected at deploy time — zero regression.

Depends on / pairs with #240.

Discussion: https://help.egroupware.org/t/79137

Why a second patch

#240 makes the Mail search box accept the EGroupware-standard +token, -token, and, or, "..." syntax (same as Addressbook / Calendar / InfoLog). But the same user-facing limitation also applies to Mail filter rules: a filter "Subject contains invoice overdue" still produces a Sieve header :contains "subject" "invoice overdue" test, which only matches the literal contiguous substring.

From the end-user point of view it is unintuitive that the search box and the filter rules of the same module follow two different syntaxes. This PR closes the loop so users get one mental model across the whole Mail module.

Design — opt-in per rule

Following the request in https://help.egroupware.org/t/79137 to "not change existing rules silently", the new behaviour is fully gated.

New per-rule checkbox in mail/templates/default/sieve.edit.xet:

☐ Use tokenised search syntax (+word required, -word forbidden, "..." literal phrase; same as the EGroupware search box)

Persistence: stored as the next free bit (value 256) in the existing flg integer column of the rule, so there is no schema migration, no new column, and no database change.

Generator gating in api/src/Mail/Sieve/Script.php: the tokenised branch only fires when ($rule['flg'] & 256) AND !$rule['regexp'] AND the value has no */? wildcards. All other modes (regex, wildcards, plain unchecked) are emitted exactly as before.

Small UX touch in mail/inc/class.mail_sieve.inc.php: the last state of the checkbox is remembered as a per-user preference (mail/sieve_last_tokenized) and used as the default for the next new rule that user creates. The preference does not affect existing rules; it only sets the default for new ones.
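The gating condition and the flag handling described above can be sketched as follows. This is an illustrative Python sketch, not the PHP code from Script.php: the function names `uses_tokenised_branch` and `set_tokenised` are hypothetical, and the rule is assumed to be a dict with an integer `flg` and a boolean `regexp` entry.

```python
TOKENIZED_BIT = 256  # bit value named in the PR description


def uses_tokenised_branch(rule: dict, value: str) -> bool:
    """Return True only when all three gating conditions hold."""
    if not rule.get("flg", 0) & TOKENIZED_BIT:
        return False  # checkbox not ticked: legacy output
    if rule.get("regexp"):
        return False  # regex mode is bypassed
    # wildcard mode is bypassed as well
    return not any(ch in value for ch in "*?")


def set_tokenised(flg: int, enabled: bool) -> int:
    """Set or clear the bit without touching the other flag bits."""
    return flg | TOKENIZED_BIT if enabled else flg & ~TOKENIZED_BIT
```

Using a bitwise OR rather than an addition keeps the operation idempotent: saving an already-ticked rule a second time leaves `flg` unchanged.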

User-facing syntax (only when the checkbox is ticked)

| Input in any filter value field | Meaning |
| --- | --- |
| `invoice` | substring "invoice" anywhere |
| `invoice overdue` | "invoice" OR "overdue" (default for whitespace) |
| `invoice and overdue` | "invoice" AND "overdue" |
| `invoice +overdue` | "invoice" AND "overdue" |
| `invoice -spam` | "invoice" AND NOT "spam" |
| `"invoice overdue"` | literal phrase as one token (= legacy behaviour) |

Affected condition rows (in plain :contains mode only): From, To, Subject, custom header, body (:text / :raw). Numeric comparators, size check, attachment-type filter are untouched.
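To make the syntax table concrete, here is a minimal Python sketch of a tokeniser that classifies each token as optional, required or forbidden. It is not the PR's parseSieveTokens(); the names `Token` and `parse_tokens` are hypothetical, and the way `and` promotes its two neighbours to required is an assumption chosen so that the rows of the table come out as documented.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    kind: str  # "optional", "required" or "forbidden"


# optional +/- prefix, then either a quoted phrase or a run of non-blanks
_TOKEN_RE = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def parse_tokens(value: str) -> list[Token]:
    tokens: list[Token] = []
    promote_next = False  # True right after an unquoted "and"
    for match in _TOKEN_RE.finditer(value):
        prefix, phrase, word = match.groups()
        if phrase is None and not prefix and word.lower() in ("and", "or"):
            is_and = word.lower() == "and"
            # "and" turns the preceding plain token into a required one
            if is_and and tokens and tokens[-1].kind == "optional":
                tokens[-1] = tokens[-1]._replace(kind="required")
            promote_next = is_and
            continue
        text = phrase if phrase is not None else word
        if not text:
            continue  # skip empty quotes
        if prefix == "+":
            kind = "required"
        elif prefix == "-":
            kind = "forbidden"
        else:
            kind = "required" if promote_next else "optional"
        tokens.append(Token(text, kind))
        promote_next = False
    return tokens
```

A quoted value such as `"invoice overdue"` yields a single optional token, which is why it reproduces the legacy contiguous-substring behaviour.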

Generated Sieve

For a rule with the checkbox ticked, Subject contains invoice +overdue:

if allof (
    allof (
        header :contains "subject" "invoice",
        header :contains "subject" "overdue"
    )
) {
    fileinto "Reminders";
}

For an unticked rule (default), output is byte-identical to the unpatched generator — no regression risk for existing filters.
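The mapping from classified tokens to a composite Sieve test can be sketched like this. It is an illustrative Python sketch rather than the PR's buildTokenizedSieveTest(); `sieve_quote`, `header_contains` and `build_test` are hypothetical names, and grouping several optional tokens into one `anyof` inside the `allof` is an assumption for mixed inputs that the description does not spell out.

```python
def sieve_quote(text: str) -> str:
    """Quote a Sieve string, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def header_contains(header: str, term: str) -> str:
    return f"header :contains {sieve_quote(header)} {sieve_quote(term)}"


def build_test(header, optional=(), required=(), forbidden=()) -> str:
    """Combine per-token tests into a single Sieve test expression."""
    parts = []
    any_tests = [header_contains(header, t) for t in optional]
    if len(any_tests) == 1:
        parts.append(any_tests[0])
    elif any_tests:
        parts.append("anyof (" + ", ".join(any_tests) + ")")
    parts += [header_contains(header, t) for t in required]
    parts += ["not " + header_contains(header, t) for t in forbidden]
    if not parts:
        raise ValueError("at least one token is required")
    if len(parts) == 1:
        return parts[0]  # single token: same text as the legacy test
    return "allof (" + ", ".join(parts) + ")"


print(build_test("subject", optional=["invoice"], required=["overdue"]))
```

For `invoice +overdue` this prints the inner `allof` of the example above on one line; the outer `allof` in the generated script is the rule-level wrapper and is not produced by this sketch.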

Files changed (3)

api/src/Mail/Sieve/Script.php — adds two static helpers (buildTokenizedSieveTest, parseSieveTokens), the $tokenizedbit = 256 constant, the tokenised branch in 5 case blocks (FROM / TO / SUBJECT / custom header / body).
mail/templates/default/sieve.edit.xet — adds the new <et2-checkbox id="tokenized"> after the existing regexp checkbox.
mail/inc/class.mail_sieve.inc.php — load/save of flg & 256, plus the per-user sieve_last_tokenized preference.

~204 lines effective delta, no new dependencies, no schema migration.

Backward compatibility — zero deploy-time regression

Because the patch is gated on flg & 256, no existing rule changes behaviour at deploy time. Rules saved before the patch have flg & 256 == 0, so the generator emits the historical contiguous :contains Sieve, byte-identical to pre-patch. The editor renders existing rules with the new checkbox unchecked, which is the correct default for legacy rules.

The first time a user explicitly opens an existing rule, ticks the checkbox, and saves, that single rule is regenerated under the patched generator with bit 256 set in flg. From that moment on the rule uses tokenised matching. No data migration or admin action is required.

For instances that prefer to bulk-normalise the stored Sieve scripts at deploy time anyway (e.g. to validate the patched parser end-to-end across all users), an optional CLI helper resave-sieve-rules.php is provided in the companion gist below. It iterates all active users and round-trips their rules through retrieveRules() → setRules(). Pure bookkeeping, no semantic effect.

Test plan

- Manual UI testing on EGroupware 26.1 (Docker image), Dovecot with fts_flatcurve enabled. Patched files bind-mounted source-side to survive Watchtower image updates.
- 34 functional test cases via SMTP from a separate Gmail account:
  - 8 tokenised positives (checkbox ticked: AND / OR / +token / -token / quoted phrase / case-insensitive / cross-position / substring-of-larger-word): all matched as expected
  - 4 tokenised negatives (missing required token, forbidden token present, spaced-out non-substring): correctly stayed in INBOX
  - 22 legacy rules (checkbox unticked): routed identically to pre-patch behaviour for every message, zero regression
- Re-save migration script (resave-sieve-rules.php) tested live on 48 existing user rules; produces no diff vs. baseline for legacy-mode rules.
- Unit tests for buildTokenizedSieveTest() and parseSieveTokens(): happy to add as part of review.

Code-sharing note (for review-time discussion)

The tokeniser in this PR (buildTokenizedSieveTest / parseSieveTokens) is structurally identical to the one in #240 (buildTokenizedSearch / parseSearchTokens). I deliberately left both as their own static methods on each class to keep this PR minimal-impact. If you prefer a shared EGroupware\Api\Mail\SearchTokeniserTrait factored out before either PR is merged, I am happy to do the refactor on the search PR first, then this one will reuse the trait. Just let me know on either thread.

…syntax

BREAKING CHANGE: pre-existing filter rules with multi-word values in plain :contains mode change semantics. See migration notes below.

Adds two static helpers, buildTokenizedSieveTest() and parseSieveTokens(), and rewrites five case branches of Script::generate() (FROM, TO, SUBJECT, custom-header :contains, body :text/:raw) to tokenise the value and emit a composite Sieve test (allof/anyof/not). Wildcard and regex modes are explicitly bypassed and retain their historical output.

User-facing syntax matches the search-side patch (companion PR):

    foo bar     -> anyof (test foo, test bar)      [OR, default]
    foo +bar    -> allof (test foo, test bar)      [required]
    foo -bar    -> allof (test foo, not test bar)  [forbidden]
    foo or bar  -> anyof (test foo, test bar)
    foo and bar -> allof (test foo, test bar)
    "foo bar"   -> literal phrase as single token

Single-token input produces byte-identical Sieve output to the previous implementation. Multi-word input previously tried to match a literal contiguous substring; now it applies the documented EGroupware syntax.

MIGRATION FOR END-USERS: existing filter rules with whitespace-bearing values that are intended as a literal phrase should be updated to wrap the value in double quotes (e.g. "Project 70" instead of Project 70). Filters with wildcards or with the regex checkbox enabled are unaffected.

Forum discussion: https://help.egroupware.org/t/79137
Companion PR (prerequisite): #<TBD>

ralfbecker · 2026-05-15T06:08:00Z

Thx for your pull request :)

Ralf

The original tokenisation patch (EGroupware#240) covered the SUBJECT/FROM/TO/CC/BCC/BODY/TEXT case branches of createIMAPFilter(), but left the multi-header "Quick" search (case BYDATE / QUICK / QUICKWITHCC at L2251) untouched. Those still called headerText() directly with the raw user string, so multi-word queries with '+token', '-token', '"phrase"' or AND/OR operators were sent to IMAP as a single literal substring, defeating the user-facing syntax everywhere except the dedicated per-field modes.

This commit applies the same buildTokenizedSearch() helper from EGroupware#240 to that case branch. For each token from parseSearchTokens():

- positive token: (SUBJECT OR FROM/TO [OR CC]) contains term
- negative token: (SUBJECT AND FROM/TO [AND CC]) does NOT contain term
- tokens are combined by buildTokenizedSearch() with the operator precedence already validated for the other case branches

Legacy single-token queries produce IMAP queries semantically equivalent to the previous code path, so there is no regression for users who just type one word in the Quick search box. Multi-word inputs now behave like the documented EGroupware search syntax used everywhere else in the app.

Tested:

- Single-token Quick search ('fattura') -> matches subject/from/to as before
- Multi-token AND ('+fattura +dicembre') -> only mails with both terms
- Multi-token NOT ('+fattura -spam') -> excludes mails containing spam
- Multi-token OR ('fattura ricevuta') -> mails with either term
- Quoted phrase ('"fattura di dicembre"') -> contiguous-substring match
- QUICKWITHCC variant adds CC to the headers visited per token

Forum discussion: https://help.egroupware.org/t/79137/19
Companion to: EGroupware#240, EGroupware#241
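The per-token expansion for the Quick search can be illustrated with standard IMAP SEARCH keys. This is a Python sketch under assumptions, not the code in createIMAPFilter(): `imap_quote` and `quick_token_keys` are hypothetical names, the `address_key` parameter stands in for the FROM/TO choice, and terms are assumed to be ASCII so they can be sent as quoted strings.

```python
def imap_quote(text: str) -> str:
    """Quote an IMAP string, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def quick_token_keys(term, negative=False, address_key="FROM", with_cc=False):
    """Expand one token into IMAP SEARCH keys over the Quick-search headers."""
    keys = ["SUBJECT", address_key] + (["CC"] if with_cc else [])
    tests = [f"{key} {imap_quote(term)}" for key in keys]
    if negative:
        # adjacent search keys are ANDed: no header may contain the term
        return " ".join("NOT " + test for test in tests)
    # OR takes exactly two search keys, so nest it from the right
    expr = tests[-1]
    for test in reversed(tests[:-1]):
        expr = f"OR {test} {expr}"
    return expr
```

How the resulting per-token expressions are then combined across tokens is left to buildTokenizedSearch() and is not modelled here.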

CActor marked this pull request as ready for review May 14, 2026 17:22

ralfbecker merged commit f3817b5 into EGroupware:master May 15, 2026
3 of 4 checks passed

ralfbecker merged 1 commit into EGroupware:master from CActor:feature/tokenized-sieve-filters
