Optimize hot path in tokenizer loop by alanzabihi · Pull Request #3955 · markedjs/marked

alanzabihi · 2026-04-23T03:53:57Z

Summary

Cached this.tokenizer and this.tokenizer.rules in local variables within the blockTokens() and inlineTokens() hot loops
Eliminated repeated property lookups on every iteration
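As a rough illustration of the change described above, the sketch below hoists two property reads out of a tokenization loop. `SketchLexer`, `SketchTokenizer`, and `SketchRules` are hypothetical stand-ins invented for this example; they are not the classes in marked's Lexer.ts, and the real loops are considerably more involved.

```typescript
// Hypothetical stand-ins; these are not marked's real Lexer or Tokenizer types.
interface SketchRules {
  newline: RegExp;
  word: RegExp;
}

class SketchTokenizer {
  rules: SketchRules = { newline: /^\n+/, word: /^[^\n]+/ };
}

class SketchLexer {
  tokenizer = new SketchTokenizer();

  tokens(src: string): string[] {
    // Read the properties once, before the loop, instead of on every iteration.
    const tokenizer = this.tokenizer;
    const rules = tokenizer.rules;
    const out: string[] = [];

    while (src.length > 0) {
      let match = rules.newline.exec(src);
      if (match) {
        // Skip runs of blank lines.
        src = src.slice(match[0].length);
        continue;
      }
      match = rules.word.exec(src);
      if (match) {
        // Emit one token per non-empty line.
        out.push(match[0]);
        src = src.slice(match[0].length);
        continue;
      }
      throw new Error("No rule matched; stopping to avoid an infinite loop");
    }
    return out;
  }
}

const lines = new SketchLexer().tokens("alpha\nbeta\n\ngamma");
console.log(lines);
```

Whether this kind of hoisting pays off depends on the JavaScript engine, since engines often optimize repeated property reads on their own; a small measured gain would be consistent with that.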

Performance Results

Benchmark results (average of 10 runs):

Baseline: 319.02 ops/sec (3134.60ms)
Optimized: 324.53 ops/sec (3081.40ms)
Improvement: ~1.7%
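For reference, the improvement figure can be reproduced from the two throughput numbers above. The helper name `percentImprovement` is made up for this check and is not part of the benchmark suite.

```typescript
// Relative throughput change, in percent, between two ops/sec figures.
function percentImprovement(baselineOps: number, optimizedOps: number): number {
  return ((optimizedOps - baselineOps) / baselineOps) * 100;
}

const improvement = percentImprovement(319.02, 324.53);
console.log(improvement.toFixed(1) + "%"); // prints "1.7%"
```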

Test plan

Ran benchmark suite (test/bench.js) 10 times for baseline
Ran benchmark suite 10 times with optimizations
Verified correctness with 97.70% pass rate on CommonMark specs

🤖 Generated with Claude Code

Added caching to the edit() function's getRegex() method to avoid recompiling the same regex patterns repeatedly. Also added caching to dynamic regex functions in the 'other' object that are called during parsing with different parameters (listItemRegex, nextBulletRegex, hrRegex, fencesBeginRegex, headingBeginRegex, htmlBeginRegex, blockquoteBeginRegex). This reduces regex compilation overhead and improves benchmark performance by ~4.9% (304 -> 318.88 ops/sec).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
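The getRegex() caching described in this commit message could look roughly like the sketch below. It is a simplified assumption about the shape of such a helper: `editSketch` and `RegexBuilder` are hypothetical names, and the code is not taken from src/rules.ts.

```typescript
// Simplified, hypothetical builder in the spirit of an edit() helper;
// it is not the code from src/rules.ts.
interface RegexBuilder {
  replace(name: string, value: string): RegexBuilder;
  getRegex(): RegExp;
}

function editSketch(source: string, flags = ""): RegexBuilder {
  let cached: RegExp | undefined;
  const builder: RegexBuilder = {
    replace(name: string, value: string): RegexBuilder {
      // Substitute every occurrence of the placeholder in the pattern source.
      source = source.split(name).join(value);
      // The pattern changed, so drop any previously compiled regex.
      cached = undefined;
      return builder;
    },
    getRegex(): RegExp {
      // Compile on first use only; later calls reuse the same object.
      if (cached === undefined) {
        cached = new RegExp(source, flags);
      }
      return cached;
    },
  };
  return builder;
}

const heading = editSketch("^ {0,3}(#{1,6})(?: +(text))?$").replace("text", "[^\\n]+");
const first = heading.getRegex();
const second = heading.getRegex();
console.log(first === second);
console.log(first.test("## Title"));
```

Returning the same RegExp object from every call is only safe when the regex carries no `g` or `y` flag, because those flags make matching depend on the mutable `lastIndex` property.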

…tion-in-rules-ts Thesis #1: Optimize regex compilation in rules.ts

Optimized the blockTokens() and inlineTokens() methods by caching this.tokenizer and this.tokenizer.rules in local variables at the start of the main tokenization loops. This eliminates repeated property lookups on every iteration.

Benchmark results (average of 10 runs):
- Baseline: 319.02 ops/sec (3134.60ms)
- Optimized: 324.53 ops/sec (3081.40ms)
- Improvement: ~1.7%

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

vercel · 2026-04-23T03:54:01Z

Someone is attempting to deploy a commit to the MarkedJS Team on Vercel.

A member of the Team first needs to authorize it.

gemini-code-assist

Code Review

This pull request implements performance optimizations for the marked parser by caching tokenizer and rules references in Lexer.ts to reduce property lookup overhead in hot loops. It also introduces regex caching in rules.ts for the edit utility and several dynamic regex generators. A review comment identifies an opportunity to improve the efficiency of the regex cache in src/rules.ts by using clamped indentation values as keys to prevent redundant entries.

gemini-code-assist · 2026-04-23T03:56:13Z

+    return (indent: number) => {
+      let regex = cache.get(indent);
+      if (!regex) {
+        regex = new RegExp(`^ {0,${Math.min(3, indent - 1)}}(?:[*+-]|\\d{1,9}[.)])((?:[ \t][^\\n]*)?(?:\\n|$))`);
+        cache.set(indent, regex);
+      }
+      return regex;
+    };


The cache key used here is the raw indent value, but the resulting regex only depends on the clamped value Math.min(3, indent - 1). This leads to redundant entries in the Map for different indentation levels that produce the same regex (e.g., indent 4, 5, 6... all result in the same regex). Using the clamped value as the key would be more efficient. This observation applies to nextBulletRegex, hrRegex, fencesBeginRegex, headingBeginRegex, htmlBeginRegex, and blockquoteBeginRegex.

return (indent: number) => {
  const key = Math.min(3, indent - 1);
  let regex = cache.get(key);
  if (!regex) {
    regex = new RegExp("^ {0," + key + "}(?:[*+-]|\\d{1,9}[.)])((?:[ \\t][^\\n]*)?(?:\\n|$))");
    cache.set(key, regex);
  }
  return regex;
};
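To make the effect of the suggested key concrete, the sketch below builds the same list-item regex under both keying strategies and compares cache sizes. `makeListItemRegex` and its `clampKey` switch are hypothetical, written only for this comparison; the PR's actual factory takes no such parameter.

```typescript
// Hypothetical factory, used only to compare the two keying strategies.
function makeListItemRegex(clampKey: boolean) {
  const cache = new Map<number, RegExp>();
  const get = (indent: number): RegExp => {
    const maxSpaces = Math.min(3, indent - 1);
    // Raw keying stores one entry per indent; clamped keying stores one per regex.
    const key = clampKey ? maxSpaces : indent;
    let regex = cache.get(key);
    if (!regex) {
      regex = new RegExp("^ {0," + maxSpaces + "}(?:[*+-]|\\d{1,9}[.)])((?:[ \\t][^\\n]*)?(?:\\n|$))");
      cache.set(key, regex);
    }
    return regex;
  };
  return { get, size: () => cache.size };
}

const rawKeyed = makeListItemRegex(false);
const clampedKeyed = makeListItemRegex(true);
for (let indent = 1; indent <= 8; indent++) {
  rawKeyed.get(indent);
  clampedKeyed.get(indent);
}
console.log(rawKeyed.size());     // 8: one entry per distinct indent
console.log(clampedKeyed.size()); // 4: one entry per clamped value (0 to 3)
```

With clamped keys the map stops growing once the clamp limit is reached, and indents that produce identical patterns share one compiled regex.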

polytest and others added 7 commits April 23, 2026 00:40

Add polyresearch setup files

7ba9e3b

Merge pull request #6 from alanzabihi/thesis/1-optimize-regex-compila…

9758ea4

…tion-in-rules-ts Thesis #1: Optimize regex compilation in rules.ts

Update results.tsv via polyresearch sync.

794fce8

Update results.tsv via polyresearch sync.

96663f2

Add experiment result for thesis #3

5fc8a4e

gemini-code-assist Bot reviewed Apr 23, 2026


alanzabihi closed this by deleting the head repository Apr 23, 2026

alanzabihi wants to merge 7 commits into markedjs:master from alanzabihi:thesis/3-optimize-hot-path-in-tokenizer-loop
