
api/Mail: tokenise search input to align with EGroupware syntax #240

Merged
ralfbecker merged 1 commit into EGroupware:master from CActor:feature/tokenized-mail-search on May 14, 2026


@CActor (Contributor) commented May 14, 2026

Summary

Aligns the Mail module search-box behaviour with the documented EGroupware search syntax used in Addressbook / Calendar / InfoLog. Multi-word queries now match messages whose terms can appear in any order and with arbitrary text between them.

Problem this solves

Documented in detail in the forum thread (linked below). The Mail search field is the only one in EGroupware that forwards the entire input string to the IMAP server as a single substring match. A query like invoice overdue becomes SEARCH TEXT "invoice overdue", so a message body saying "the invoice from 02/2026 is overdue" is not matched, even though both words are clearly present.

Even with fts_flatcurve enabled on Dovecot, the phrase is matched as a contiguous bigram of tokens, so non-adjacent words remain a hard miss.

Solution

Two new protected static helpers are added to the Mail class:

  • buildTokenizedSearch($string, callable $factory): ?Horde_Imap_Client_Search_Query
  • parseSearchTokens($string): array

The relevant case branches (SUBJECT, FROM, TO, CC, BCC, BODY, TEXT) in createIMAPFilter() now tokenise the input and call the factory for each token, then combine the sub-queries with Horde_Imap_Client_Search_Query::andSearch() / orSearch() / negation according to the EGroupware-standard operator on each token.
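The combining step can be illustrated with a minimal sketch. It is written in Python rather than PHP so that it runs on its own, and it rests on three assumptions. First, tokens arrive as hypothetical (operator, term) pairs. Second, each token's operator combines it with everything to its left, which makes this a left-to-right fold. Third, instead of building Horde_Imap_Client_Search_Query objects through andSearch()/orSearch(), the sketch renders an IMAP SEARCH string directly. The names build_tokenized_search, imap_quote and body_key are illustrative and are not taken from the patch.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical token shape: (operator, term), operator in {"OR", "AND", "NOT"}.
Token = Tuple[str, str]


def build_tokenized_search(
    tokens: List[Token], factory: Callable[[str], str]
) -> Optional[str]:
    """Fold per-token search keys left to right into one IMAP SEARCH string.

    Each token's operator says how its key combines with everything to its
    left. A lone token is returned exactly as the factory produced it.
    Returns None when there are no tokens (mirroring the nullable return).
    """
    query: Optional[str] = None
    for op, term in tokens:
        key = factory(term)
        if query is None:
            # First term: only negation changes it.
            query = f"NOT ({key})" if op == "NOT" else key
        elif op == "AND":
            # Juxtaposed search keys are ANDed by the IMAP server.
            query = f"({query}) ({key})"
        elif op == "NOT":
            # AND NOT: keep the left side, exclude the new key.
            query = f"({query}) NOT ({key})"
        else:
            # Default: IMAP's prefix OR takes exactly two search keys.
            query = f"OR ({query}) ({key})"
    return query


def imap_quote(term: str) -> str:
    """Return term as an IMAP quoted string with backslashes and quotes escaped."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def body_key(term: str) -> str:
    """Hypothetical factory that emits one BODY search key per token."""
    return f"BODY {imap_quote(term)}"
```

With body_key as the factory, [("OR", "invoice"), ("AND", "overdue")] yields (BODY "invoice") (BODY "overdue"). A single token yields the factory's output unchanged, and this is the property the single-word compatibility claim depends on. The description does not say whether the real patch folds left to right or groups required and forbidden terms separately. The two approaches give different results for inputs such as a +b c.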

User-facing syntax (matches Addressbook/Calendar/InfoLog):

  Input                  Meaning
  invoice                substring "invoice" anywhere
  invoice overdue        "invoice" OR "overdue" (default)
  invoice and overdue    "invoice" AND "overdue"
  invoice +overdue       "invoice" AND "overdue"
  invoice -spam          "invoice" AND NOT "spam"
  "invoice overdue"      literal phrase as a single token
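The syntax above can be read as a small grammar. The following hypothetical Python tokenizer is written in the spirit of parseSearchTokens() but is not its actual code. Each match is an optional +/- prefix followed by either a double-quoted phrase or a bare word. Bare and/or keywords (case-insensitive) only set the operator for the next term, and the operator otherwise defaults to OR.

```python
import re
from typing import List, Tuple

# Hypothetical token shape: (operator, term), where operator is "OR"
# (default), "AND" (+prefix or "and" keyword) or "NOT" (-prefix).
Token = Tuple[str, str]

# An optional +/- prefix, then either a "quoted phrase" or a bare word.
_TOKEN_RE = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def parse_search_tokens(text: str) -> List[Token]:
    tokens: List[Token] = []
    pending = "OR"  # operator to attach to the next term
    for m in _TOKEN_RE.finditer(text):
        prefix, quoted, bare = m.group(1), m.group(2), m.group(3)
        if quoted is None and not prefix and bare.lower() in ("and", "or"):
            # Keywords are not search terms; they only affect the next term.
            pending = bare.upper()
            continue
        term = quoted if quoted is not None else bare
        if not term:
            continue  # ignore empty "" phrases
        # An explicit +/- prefix wins over a preceding and/or keyword.
        op = {"+": "AND", "-": "NOT"}.get(prefix, pending)
        tokens.append((op, term))
        pending = "OR"
    return tokens
```

A quoted phrase stays one token, so it reaches the per-field factory intact and keeps the old contiguous match. An unbalanced quote falls back to bare words with the quote character kept. A production parser may want to handle that case differently.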

The resulting IMAP query for invoice +overdue with the TEXT search type is (BODY "invoice") (BODY "overdue"). Dovecot intersects the results of the two lookups, which is sub-millisecond with Flatcurve and two linear scans without it.

A single-token input produces byte-identical output to the historical generator, so there is no regression for users who type a single word.

Files changed

  • api/src/Mail.php — ~130 lines of effective delta

Test plan

  • Manual UI testing on EGroupware 26.1 (Docker image) against Dovecot with fts_flatcurve
  • Multi-word search with words in arbitrary order: verified match
  • Single-word search: verified output identical to before the patch
  • Negation (-term): verified AND NOT semantics
  • Quoted phrase "foo bar": verified that the historical contiguous match is preserved
  • Unit tests for buildTokenizedSearch() and parseSearchTokens(): happy to add as part of review

Backward compatibility

Fully preserved for single-word inputs. Multi-word inputs change semantics. Previously they tried to match a literal contiguous substring, which almost never succeeded. Now they apply the documented rule "A B" = A OR B. Users who relied on a contiguous match (rare) can wrap the value in double quotes to keep the old semantics.

Related

Forum discussion: https://help.egroupware.org/t/79137

Commit message:

Adds two protected static helpers — buildTokenizedSearch() and
parseSearchTokens() — and rewrites the SUBJECT/FROM/TO/CC/BCC/BODY/TEXT
branches of createIMAPFilter() to tokenise multi-word input and combine
per-token Horde sub-queries with andSearch()/orSearch() and negation.

Brings the Mail search behaviour in line with the documented EGroupware
syntax already used in Addressbook, Calendar and InfoLog:

  foo bar     -> contains foo OR contains bar (default)
  foo +bar    -> contains foo AND contains bar (required)
  foo -bar    -> contains foo AND NOT contains bar (forbidden)
  foo or bar  -> contains foo OR contains bar
  foo and bar -> contains foo AND contains bar
  "foo bar"   -> literal phrase as single token (preserves legacy behaviour)

Single-word input produces byte-identical IMAP queries to the previous
implementation. Multi-word input previously tried to match a literal
contiguous substring (which almost never succeeded); now it applies the
documented A B = A or B rule. Users who rely on contiguous match can
wrap the value in double quotes.

The patch leverages the existing Horde_Imap_Client_Search_Query primitives
(andSearch / orSearch / negation flag) that the codebase already uses in
the QUICK/QUICKWITHCC branches, so there is no new dependency.
@ralfbecker merged commit cdcaae7 into EGroupware:master on May 14, 2026
4 of 6 checks passed