api/Mail/Sieve: tokenise filter rule values to align with EGroupware …#241
Merged
ralfbecker merged 1 commit on May 15, 2026
…syntax

BREAKING CHANGE: pre-existing filter rules with multi-word values in plain :contains mode change semantics. See migration notes below.

Adds two static helpers, buildTokenizedSieveTest() and parseSieveTokens(), and rewrites five case branches of Script::generate() (FROM, TO, SUBJECT, custom-header :contains, body :text/:raw) to tokenise the value and emit a composite Sieve test (allof/anyof/not). Wildcard and regex modes are explicitly bypassed and retain their historical output.

User-facing syntax matches the search-side patch (companion PR):

  foo bar      -> anyof (test foo, test bar)        [OR, default]
  foo +bar     -> allof (test foo, test bar)        [required]
  foo -bar     -> allof (test foo, not test bar)    [forbidden]
  foo or bar   -> anyof (test foo, test bar)
  foo and bar  -> allof (test foo, test bar)
  "foo bar"    -> literal phrase as single token

Single-token input produces byte-identical Sieve output to the previous implementation. Multi-word input previously tried to match a literal contiguous substring; now it applies the documented EGroupware syntax.

MIGRATION FOR END-USERS: existing filter rules with whitespace-bearing values that are intended as a literal phrase should be updated to wrap the value in double quotes (e.g. "Project 70" instead of Project 70). Filters with wildcards or with the regex checkbox enabled are unaffected.

Forum discussion: https://help.egroupware.org/t/79137
Companion PR (prerequisite): #<TBD>
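The mapping above can be illustrated with a small sketch. It is written in Python for brevity and is not the PHP code from this commit: parse_tokens and build_header_test are hypothetical stand-ins for parseSieveTokens() and buildTokenizedSieveTest(), whose real signatures are not shown here. It also assumes a simplified combination rule (any +token, -token or "and" switches the whole test to allof, otherwise anyof), which reproduces the six examples listed but may differ from the actual operator precedence.

```python
import re

# Optional +/- sign followed by a double-quoted phrase, or any run of non-whitespace.
TOKEN_RE = re.compile(r'([+-]?)"([^"]*)"|(\S+)')


def parse_tokens(value):
    """Return (mode, terms); terms is a list of (negated, text) pairs."""
    terms = []
    use_allof = False
    for match in TOKEN_RE.finditer(value):
        sign, phrase, word = match.groups()
        if phrase is not None:
            text = phrase                      # quoted phrase stays one token
        elif word.lower() in ("and", "or"):
            use_allof = use_allof or word.lower() == "and"
            continue                           # operators are not search terms
        elif word[0] in "+-" and len(word) > 1:
            sign, text = word[0], word[1:]
        else:
            sign, text = "", word
        if not text:
            continue
        if sign:
            use_allof = True                   # assumption: +/- implies allof
        terms.append((sign == "-", text))
    return ("allof" if use_allof else "anyof"), terms


def sieve_quote(text):
    """Quote a string for Sieve, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_header_test(header, value):
    """Render a header :contains test, composite when there are several tokens."""
    mode, terms = parse_tokens(value)
    tests = []
    for negated, text in terms:
        test = "header :contains %s %s" % (sieve_quote(header), sieve_quote(text))
        tests.append("not " + test if negated else test)
    if not tests:
        return None
    if len(tests) == 1:
        return tests[0]                        # single token: plain test, no wrapper
    return "%s (%s)" % (mode, ", ".join(tests))


print(build_header_test("subject", "foo -bar"))
```

A single token falls through to the plain test without an allof/anyof wrapper, which mirrors the commit's claim that single-token input keeps its previous output.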
Thx for your pull request :) Ralf
CActor added a commit to CActor/egroupware that referenced this pull request on May 15, 2026
The original tokenisation patch (EGroupware#240) covered the SUBJECT/FROM/TO/CC/BCC/BODY/TEXT case branches of createIMAPFilter(), but left the multi-header "Quick" search (case BYDATE / QUICK / QUICKWITHCC at L2251) untouched. Those branches still called headerText() directly with the raw user string, so multi-word queries with '+token', '-token', '"phrase"' or AND/OR operators were sent to IMAP as a single literal substring, defeating the user-facing syntax everywhere except the dedicated per-field modes.

This commit applies the same buildTokenizedSearch() helper from EGroupware#240 to that case branch. For each token from parseSearchTokens():
- positive token: (SUBJECT OR FROM/TO [OR CC]) contains term
- negative token: (SUBJECT AND FROM/TO [AND CC]) does NOT contain term
- tokens are combined by buildTokenizedSearch() with the operator precedence already validated for the other case branches

Legacy single-token queries produce IMAP queries semantically equivalent to the previous code path, so there is no regression for users who just type one word in the Quick search box. Multi-word inputs now behave like the documented EGroupware search syntax used everywhere else in the app.

Tested:
- Single-token Quick search ('fattura') -> matches subject/from/to as before
- Multi-token AND ('+fattura +dicembre') -> only mails with both terms
- Multi-token NOT ('+fattura -spam') -> excludes mails containing spam
- Multi-token OR ('fattura ricevuta') -> mails with either term
- Quoted phrase ('"fattura di dicembre"') -> contiguous-substring match
- QUICKWITHCC variant adds CC to the headers visited per token

Forum discussion: https://help.egroupware.org/t/79137/19
Companion to: EGroupware#240, EGroupware#241
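The per-token rule in this commit message (a positive token may match any of the headers, a negative token must be absent from all of them) can be sketched as follows. This is a Python illustration, not the PHP from the commit: it emits raw RFC 3501 SEARCH key strings as a stand-in, whereas the real code goes through headerText() and buildTokenizedSearch(), whose signatures are not shown here. The names token_key, or_chain and combine are hypothetical.

```python
def quote(term):
    """Quote a term as an IMAP quoted string."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def or_chain(keys):
    """Fold search keys into IMAP's binary, prefix-style OR."""
    if len(keys) == 1:
        return keys[0]
    return "OR %s %s" % (keys[0], or_chain(keys[1:]))


def token_key(term, negated, headers):
    """Search key for one token across several headers."""
    if negated:
        # Must be absent from every header: juxtaposed keys are ANDed in IMAP.
        return " ".join("NOT %s %s" % (h, quote(term)) for h in headers)
    # May appear in any header.
    return or_chain(["%s %s" % (h, quote(term)) for h in headers])


def combine(keys, mode):
    """Join per-token keys; mode is 'allof' (AND) or 'anyof' (OR)."""
    groups = ["(%s)" % key for key in keys]
    if mode == "allof":
        return " ".join(groups)
    return or_chain(groups)


headers = ["SUBJECT", "FROM"]  # QUICKWITHCC would add "CC"
print(combine(
    [token_key("fattura", False, headers), token_key("spam", True, headers)],
    "allof",
))
```

For the tested query '+fattura -spam' this yields one parenthesised OR group for fattura and one group of NOT keys for spam, joined by IMAP's implicit AND.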
Summary
Extends the IMAP-search tokenisation introduced in #240 to the Mail filter rules (Sieve scripts generated by EGroupware), gated behind a per-rule opt-in checkbox. Existing rules are unaffected at deploy time — zero regression.
Depends on / pairs with #240.
Discussion: https://help.egroupware.org/t/79137
Why a second patch
#240 makes the Mail search box accept the EGroupware-standard +token, -token, and, or, "..." syntax (same as Addressbook / Calendar / InfoLog). But the same user-facing limitation also applies to Mail filter rules: a filter "Subject contains invoice overdue" still produces a Sieve header :contains "subject" "invoice overdue" test, which only matches the literal contiguous substring.

From the end-user point of view it is unintuitive that the search box and the filter rules of the same module follow two different syntaxes. This PR closes the loop so users get one mental model across the whole Mail module.
Design — opt-in per rule
Following the request in https://help.egroupware.org/t/79137 to "not change existing rules silently", the new behaviour is fully gated.
- New per-rule checkbox in mail/templates/default/sieve.edit.xet.
- Persistence: stored as the next free bit 256 in the existing flg integer column of the rule. No schema migration, no new column, no DB touch.
- Generator gating in api/src/Mail/Sieve/Script.php: the tokenised branch only fires when ($rule['flg'] & 256) AND !$rule['regexp'] AND the value has no * / ? wildcards. All other modes (regex, wildcards, plain unchecked) are emitted exactly as before.
- Small UX touch in mail/inc/class.mail_sieve.inc.php: the last state of the checkbox is remembered as a per-user preference (mail/sieve_last_tokenized) and used as the default for the next new rule that user creates. Existing rules are unaffected by the preference; it only sets the default for new ones.

User-facing syntax (only when the checkbox is ticked)
Example inputs:
- invoice
- invoice overdue
- invoice and overdue
- invoice +overdue
- invoice -spam
- "invoice overdue"

Affected condition rows (in plain :contains mode only): From, To, Subject, custom header, body (:text/:raw). Numeric comparators, size check, attachment-type filter are untouched.

Generated Sieve
For a rule with the checkbox ticked, Subject contains invoice +overdue produces a composite allof test with one :contains test per token.

For an unticked rule (default), output is byte-identical to the unpatched generator, so there is no regression risk for existing filters.
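The gate that decides between the two outputs can be sketched as a small predicate. This is a Python illustration of the three conditions stated under "Design", not the PHP in Script.php: the flg and regexp keys come from the description, while the value key and the function names are assumptions made for the example.

```python
TOKENIZED_FLAG = 256  # bit named in the PR description


def uses_tokenised_branch(rule):
    """True only for a ticked, non-regex rule whose value has no wildcards."""
    ticked = bool(rule.get("flg", 0) & TOKENIZED_FLAG)
    is_regex = bool(rule.get("regexp"))
    has_wildcard = any(ch in rule.get("value", "") for ch in "*?")
    return ticked and not is_regex and not has_wildcard


def tick(rule):
    """Return a copy of the rule with the tokenised bit set."""
    updated = dict(rule)
    # Bitwise OR rather than += 256, so ticking an already ticked rule is a no-op.
    updated["flg"] = updated.get("flg", 0) | TOKENIZED_FLAG
    return updated


legacy = {"flg": 0, "regexp": False, "value": "invoice +overdue"}
print(uses_tokenised_branch(legacy))        # False: historical output
print(uses_tokenised_branch(tick(legacy)))  # True: tokenised output
```

Any rule that fails the predicate falls through to the historical generator path, which is what keeps legacy rules unchanged.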
Files changed (3)
- api/src/Mail/Sieve/Script.php: adds two static helpers (buildTokenizedSieveTest, parseSieveTokens), the $tokenized bit = 256 constant, the tokenised branch in 5 case blocks (FROM / TO / SUBJECT / custom header / body).
- mail/templates/default/sieve.edit.xet: adds the new <et2-checkbox id="tokenized"> after the existing regexp checkbox.
- mail/inc/class.mail_sieve.inc.php: load/save of flg & 256, plus the per-user sieve_last_tokenized preference.

~204 lines effective delta, no new dependencies, no schema migration.
Backward compatibility — zero deploy-time regression
Because the patch is gated on flg & 256, no existing rule changes behaviour at deploy time. Rules saved before the patch have flg & 256 == 0, so the generator emits the historical contiguous :contains Sieve, byte-identical to pre-patch. The editor renders existing rules with the new checkbox unchecked, which is the correct default for legacy rules.

The first time a user explicitly opens an existing rule, ticks the checkbox, and saves, that single rule is regenerated under the patched generator with flg += 256. From that moment on the rule uses tokenised matching. No data migration, no admin action required.

For instances that prefer to bulk-normalise the stored Sieve scripts at deploy time anyway (e.g. to validate the patched parser end-to-end across all users), an optional CLI helper resave-sieve-rules.php is provided in the companion gist below. It iterates all active users and round-trips their rules through retrieveRules() → setRules(). Pure bookkeeping, no semantic effect.

Test plan
- … fts_flatcurve enabled. Patched files bind-mounted source-side to survive Watchtower image updates.
- … +token / -token / quoted phrase / case-insensitive / cross-position / substring-of-larger-word): all matched as expected
- … resave-sieve-rules.php) tested live on 48 existing user rules; produces no diff vs. baseline for legacy-mode rules.
- … buildTokenizedSieveTest() and parseSieveTokens(): happy to add as part of review.

Code-sharing note (for review-time discussion)
The tokeniser in this PR (buildTokenizedSieveTest / parseSieveTokens) is structurally identical to the one in #240 (buildTokenizedSearch / parseSearchTokens). I deliberately left both as their own static methods on each class to keep this PR minimal-impact. If you prefer a shared EGroupware\Api\Mail\SearchTokeniserTrait factored out before either PR is merged, I am happy to do the refactor on the search PR first, then this one will reuse the trait. Just let me know on either thread.

Related
This PR is marked as Draft because #240 is its functional prerequisite (both should be reviewed together; this one will be converted to ready-for-review once #240 is mergeable).