
fix(markdown/telegram): stop one-letter Telegram messages under high HTML expansion ratios#39

Open
hah23255 wants to merge 2 commits into op7418:main from hah23255:fix/telegram-chunk-min-size

Conversation


@hah23255 hah23255 commented May 4, 2026

Problem (reproducible)

When markdownToTelegramChunks() is called with markdown that renders to disproportionately large HTML — heavy nested inline formatting, many HTML escapes, links wrapped over long text — splitTelegramChunkByHtmlLimit recursively splits down to single-character chunks, each becoming its own Telegram message. From the user's side this looks like "messages break randomly, sometimes only one letter".

The included regression test (bridge-markdown-telegram-chunks.test.ts) reproduces the symptom on three pathological-but-plausible inputs.

Root cause

splitTelegramChunkByHtmlLimit computed proportionalLimit = (textLength × htmlLimit) / renderedHtmlLength. When the HTML expansion ratio is high, proportionalLimit collapses toward zero. Two paths to single-letter output:

  1. splitMarkdownIRPreserveWhitespace used fixed-stride slicing — N=4096 with limit=441 produced 9×441 + 1×127, the 127-char tail bypassed the previous 1-char early-return floor.
  2. Recursive splitting on chunks just above the floor — e.g. text=257 with splitLimit=256 → 256+1, the 1-char remainder was accepted by the outer-loop's chunk.text.length <= 1 accept-as-is branch.
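To make the collapse concrete, here is a minimal sketch of the proportional-limit formula as described above (the function name and the example numbers are illustrative, not taken from the source):

```typescript
// Sketch of the problematic formula described in the PR; names are illustrative.
// The split limit shrinks in proportion to how much the HTML expands over the text.
function proportionalLimit(
  textLength: number,
  htmlLimit: number,
  renderedHtmlLength: number,
): number {
  return Math.floor((textLength * htmlLimit) / renderedHtmlLength);
}

// A 300-char chunk whose HTML renders to 60 000 chars (200x expansion)
// gets a split limit of only 20 characters, and recursion shreds it further:
console.log(proportionalLimit(300, 4096, 60_000)); // 20
```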

Fix (two commits, both small)

545e6a5 — introduces an exported MIN_CHUNK_TEXT_LENGTH constant (256) and uses it as a basic floor.

bff8556 — closes the two paths above:

  1. splitMarkdownIRPreserveWhitespace: switch from fixed-stride to equal-split (K = ceil(N/limit) chunks of ceil(N/K) each). Every chunk is within ±1 of N/K, so when limit ≥ MIN, no chunk falls below MIN.
  2. splitTelegramChunkByHtmlLimit: refuse to split when N ≤ 2×MIN (such chunks unavoidably leave a sub-MIN tail). Outer loop's accept-as-is threshold raised to match.
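A minimal sketch of the equal-split arithmetic (names are illustrative; the real splitter operates on MarkdownIR nodes, not raw sizes — this only models the chunk-size math, balancing remainders so sizes stay within ±1 of N/K):

```typescript
// Equal-split: K = ceil(N / limit) chunks; each chunk takes
// ceil(remaining / chunksLeft) chars, which keeps every size within
// ±1 of N / K and never leaves a tiny tail.
function equalSplitSizes(N: number, limit: number): number[] {
  const K = Math.ceil(N / limit);
  const sizes: number[] = [];
  let remaining = N;
  for (let chunksLeft = K; chunksLeft > 0; chunksLeft--) {
    const size = Math.ceil(remaining / chunksLeft);
    sizes.push(size);
    remaining -= size;
  }
  return sizes;
}

// Fixed-stride slicing of 4096 at limit 441 gave 9×441 + 1×127 (a sub-MIN tail);
// equal-split yields ten chunks of 410 or 409 — all comfortably above MIN = 256.
console.log(equalSplitSizes(4096, 441)); // [410, 410, 410, 410, 410, 410, 409, 409, 409, 409]
```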

Tests

  • bridge-markdown-telegram-chunks.test.ts (4 cases): exports a sensible MIN; never produces sub-MIN chunks when split happens; never produces <32-char chunks across 3 pathological inputs (heavy code-fence + escapes, deeply nested inline + links, pure HTML-escape soup); normal long-form docs still split correctly.
  • All existing tests still pass: 73/73 (tsc --noEmit clean).

Side note (filename glob)

The original new test was named markdown-telegram-chunks.test.ts — that filename silently doesn't match the bridge-*.test.ts glob in package.json:test:unit. Renamed to bridge-markdown-telegram-chunks.test.ts so it actually runs. The glob may want broadening to **/*.test.ts in a follow-up — flagged separately.

Bonus (separate concern, also in this PR)

Adds sanitizeModelName() in conversation-engine.ts (also exported, also tested). Strips trailing bracketed metadata from model names — the Claude Code CLI emits claude-opus-4-7[1m] on status SSE events to indicate the 1M-context tier; the [1m] was being stored verbatim and then passed back as --model next turn, which the CLI rejects. 6 tests cover Claude tiers, arbitrary providers, defensive null handling, and a trim of trailing whitespace.
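A sketch of the sanitiser's contract as described (the regex here is an assumption for illustration; the actual implementation in conversation-engine.ts may differ):

```typescript
// Strips one trailing [bracketed] metadata suffix and trims whitespace,
// per the behavior described in the PR. Defensive: non-string inputs
// pass through unchanged.
function sanitizeModelName(model: string): string {
  if (typeof model !== "string") return model;
  return model.replace(/\s*\[[^\]]*\]\s*$/, "").trim();
}

console.log(sanitizeModelName("claude-opus-4-7[1m]")); // "claude-opus-4-7"
console.log(sanitizeModelName("some-provider/model [beta] ")); // "some-provider/model"
```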

If you'd prefer to keep this PR scoped strictly to the Telegram chunker, I'm happy to split the sanitiser into its own PR — let me know.

🤖 Generated with Claude Code

hah23255 and others added 2 commits May 3, 2026 20:38
…r messages

Symptom: Telegram bridge messages arrived fragmented into many tiny
messages, sometimes one character per message ("messages break randomly,
sometimes only one letter").

Root cause in src/lib/bridge/markdown/telegram.ts:
splitTelegramChunkByHtmlLimit. The function computed a `proportionalLimit`
of (currentTextLength * htmlLimit) / renderedHtmlLength. When markdown
rendered to a much larger HTML payload than its source — heavy nested
formatting, many HTML escapes, links wrapped around long text — that
ratio drove the split limit toward zero. The recursive splitter then
produced 1-character MarkdownIR chunks; each 1-char chunk re-rendered
short HTML that fit the limit, was accepted, and became its own
TelegramChunk → its own Telegram message.

The 1-char early-return in splitTelegramChunkByHtmlLimit prevented
infinite recursion but didn't prevent the cascade above that point.

Fix:
- Export MIN_CHUNK_TEXT_LENGTH (256) constant for tests.
- splitTelegramChunkByHtmlLimit: early-return when text <= floor;
  candidateLimit and fallback are floored at MIN_CHUNK_TEXT_LENGTH.
- renderTelegramChunksWithinHtmlLimit: accept-as-is condition raised
  from `chunk.text.length <= 1` to `<= MIN_CHUNK_TEXT_LENGTH`. Oversized
  HTML on a small text chunk falls back to plain-text via the existing
  delivery-layer parse-error handler in sendWithRetry.

Tests:
- New src/__tests__/unit/markdown-telegram-chunks.test.ts (4 cases):
  exports the constant; never produces sub-floor chunks when split
  happens; never produces <32-char chunks across 3 pathological inputs
  (heavy code-fence + escapes, deeply nested inline + links, pure HTML-
  escape soup); normal long-form docs still split correctly.
- 63/63 unit tests pass; tsc --noEmit clean.

Operational:
- Already deployed via local promotion: fork dist/ → skill node_modules/
  claude-to-im/dist/ → esbuild rebundle of dist/daemon.mjs. Daemon at
  PID 104229 (started after rebuild) is running the fix. Zero chunk
  failures since restart.
- Independently fixed: 9 sessions in ~/.claude-to-im/data/sessions.json
  had model field "claude-opus-4-7[1m]" (CLI 1M-context tag the bridge
  auto-stored from status SSE events but cannot pass back as --model).
  Stripped via python+os.replace atomic rewrite. All 12 sessions
  preserved, no losses.

Followup (not this commit):
- Same fix should land upstream at op7418/claude-to-im so it survives
  skill reinstalls — currently the skill imports from there, not from
  this fork.
- The bridge should sanitise model names before storing them on
  status events (strip [..] suffix) so this can't recur.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…del sanitiser

Builds on 545e6a5. The earlier MIN_CHUNK_TEXT_LENGTH=256 floor was partial:

1. splitMarkdownIRPreserveWhitespace still emitted sub-MIN remainders
   (e.g., splitting 4096 chars at limit 441 produced 9×441 + 1×127, and the
   127-char tail surfaced as a tiny Telegram message).
2. splitTelegramChunkByHtmlLimit still recursed on chunks just above MIN
   (splitting 257 at 256 produced 256+1).
3. The new regression test silently wasn't running — its filename
   (markdown-telegram-chunks.test.ts) didn't match the bridge-*.test.ts
   glob in package.json:test:unit. Caught + fixed in this commit.

Two-part deeper fix:

- splitMarkdownIRPreserveWhitespace: switch from fixed-stride to equal-split
  (K = ceil(N/limit) chunks of ceil(N/K) each). Every chunk size lands within
  ±1 of N/K, so when limit ≥ MIN, no chunk falls below MIN.
- splitTelegramChunkByHtmlLimit: refuse to split when N ≤ 2×MIN. Such chunks
  unavoidably leave a sub-MIN tail; outer loop accepts them as-is and the
  delivery layer's HTML→plain fallback handles oversized HTML.
- Outer renderTelegramChunksWithinHtmlLimit: accept-as-is threshold raised
  from `text ≤ MIN` to `text ≤ 2×MIN` to align with the new splitter contract.

Plus persist-time model sanitiser (prevents "[1m]" data corruption recurring):

- New exported sanitizeModelName() in conversation-engine.ts strips trailing
  bracketed metadata (e.g., the `[1m]` 1M-context tier suffix the Claude
  Code CLI emits on its `status` SSE event). Without this, the bridge stored
  "claude-opus-4-7[1m]" verbatim and then passed it back as `--model` on
  the next turn, where the CLI rejected it and the LLM provider fell back
  to the default. Sanitiser runs on every status event before
  updateSessionModel.

Tests:
- bridge-markdown-telegram-chunks.test.ts (4 cases): exports a sensible MIN
  value; never produces sub-MIN chunks when split happens; never produces
  <32-char chunks across 3 pathological inputs (heavy code-fence + escapes,
  deeply nested inline + links, pure HTML-escape soup); normal long-form
  docs still split correctly. Renamed from markdown-telegram-chunks.test.ts
  so it matches the bridge-*.test.ts glob and actually runs.
- bridge-conversation-engine-sanitize.test.ts (6 cases): strip [1m] for
  Claude opus/sonnet, strip arbitrary [...] suffixes for any provider,
  leave clean names untouched, only strip trailing brackets, trim
  whitespace, defensive null/undefined handling.
- 73/73 unit tests pass; tsc --noEmit clean.

Operational:
- Already deployed via local promotion (fork dist → skill node_modules →
  esbuild rebundle of dist/daemon.mjs). Skill bundle has both new symbols
  (MIN_CHUNK_TEXT_LENGTH ×6, sanitizeModelName ×2 in the bundled output).
  Daemon at PID 148375 (started 06:51 BST after rebuild) is running the
  full fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 4, 2026 05:54

Copilot AI left a comment


Pull request overview

This PR addresses a Telegram bridge chunking failure mode where extreme Markdown→HTML expansion caused recursive splitting down to tiny (sometimes 1-character) messages, and adds a small fix to prevent persisting vendor metadata in streamed model identifiers.

Changes:

  • Introduce MIN_CHUNK_TEXT_LENGTH and update Telegram render-first chunking/splitting logic to avoid tiny text chunks under high HTML expansion ratios.
  • Add regression coverage for pathological Markdown inputs and ensure the new test file matches the unit-test glob.
  • Add sanitizeModelName() and apply it to streamed status events before persisting the session model.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

  • src/lib/bridge/markdown/telegram.ts — Adds a minimum chunk-text floor and revises the split strategy to prevent runaway splitting into tiny Telegram messages.
  • src/lib/bridge/conversation-engine.ts — Adds and uses sanitizeModelName() when persisting model names from SSE status events.
  • src/__tests__/unit/bridge-markdown-telegram-chunks.test.ts — Adds regression tests for the Telegram chunker's "tiny messages" failure mode.
  • src/__tests__/unit/bridge-conversation-engine-sanitize.test.ts — Adds unit tests for sanitizeModelName() behavior across providers and edge cases.


Comment on lines +344 to 350:

```ts
// Accept the chunk as-is if it fits OR if it's at-or-below 2×MIN
// (the splitter refuses to split below this threshold to avoid sub-MIN
// remainders that surfaced as one-letter Telegram messages). Oversized
// HTML on a small-text chunk is handled by the delivery layer's HTML→
// plain fallback when Telegram rejects it.
if (html.length <= normalizedLimit || chunk.text.length <= MIN_CHUNK_TEXT_LENGTH * 2) {
  rendered.push({ html, text: chunk.text });
```

Comment context (old vs. new guard):

```diff
  const normalizedLimit = Math.max(1, Math.floor(limit));
- if (normalizedLimit <= 0 || ir.text.length <= normalizedLimit) {
+ const N = ir.text.length;
+ if (normalizedLimit <= 0 || N <= normalizedLimit) {
```

Comment on lines +36 to +37:

```ts
export function sanitizeModelName(model: string): string {
  if (typeof model !== 'string') return model;
```

Comment on lines 343 to 345:

```diff
  if (statusData.model) {
-   store.updateSessionModel(sessionId, statusData.model);
+   store.updateSessionModel(sessionId, sanitizeModelName(statusData.model));
  }
```

Comment on lines +260 to +266:

```ts
// Equal-split: divide into K = ceil(N / limit) chunks of ceil(N / K) chars
// each. Avoids the prior fixed-stride approach which left a small remainder
// (e.g., splitting 4096 at 441 → 9×441 + 1×127); the 127-char tail then
// bypassed the MIN_CHUNK_TEXT_LENGTH floor and surfaced as a tiny Telegram
// message. Equal-split keeps every chunk size within `±1` of N/K, so when
// the caller's `limit` itself is ≥ MIN we never produce sub-MIN remainders.
const K = Math.ceil(N / normalizedLimit);
```
@hah23255 hah23255 closed this May 4, 2026
@hah23255 hah23255 deleted the fix/telegram-chunk-min-size branch May 4, 2026 06:40
@hah23255 hah23255 restored the fix/telegram-chunk-min-size branch May 4, 2026 06:43
@hah23255 hah23255 reopened this May 4, 2026