perf: speed up flat tree traversal in modular decode #716

Merged: jonsneyers merged 1 commit into libjxl:main from hjanuschka:perf/pr10-flat-tree-traversal on Jun 4, 2026
@hjanuschka

Collaborator

This stores leaf predictors directly in flat tree nodes and uses an unrolled traversal macro in the hot prediction loop. It removes repeated enum conversion and reduces branching while keeping prediction behavior unchanged. Unsafe is used only for unchecked tree/property indexing where indices are validated during tree decode/flattening and guarded by invariants.
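To make the idea concrete, here is a minimal sketch of a flat tree whose leaves carry the predictor directly, so the hot loop needs no enum conversion after the traversal. It is not the PR's code: the names `Predictor`, `FlatNode`, `validate`, `find_leaf` and `predict`, the node layout, and the "greater than threshold goes left" convention are all assumptions made for illustration. It uses a plain loop rather than an unrolled macro, and it uses checked indexing throughout instead of the unchecked indexing described above.

```rust
/// Hypothetical subset of predictors, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Predictor {
    Zero,
    West,
    North,
    Gradient,
}

/// A flattened tree node. Leaves store the predictor itself, so no
/// integer-to-enum conversion is needed once a leaf is reached.
#[derive(Clone, Copy, Debug)]
enum FlatNode {
    Split { property: u32, threshold: i32, left: u32, right: u32 },
    Leaf { predictor: Predictor, offset: i32 },
}

/// Checks the invariants once, at flattening time: every property index is
/// in range, and every child index points forward and inside the array
/// (forward-only children guarantee that traversal terminates).
fn validate(nodes: &[FlatNode], num_properties: usize) -> bool {
    nodes.iter().enumerate().all(|(i, node)| match *node {
        FlatNode::Split { property, left, right, .. } => {
            let (l, r) = (left as usize, right as usize);
            (property as usize) < num_properties
                && l > i && l < nodes.len()
                && r > i && r < nodes.len()
        }
        FlatNode::Leaf { .. } => true,
    })
}

/// Walks from the root to a leaf using checked indexing only.
/// Returns `None` if an index is out of range.
fn find_leaf(nodes: &[FlatNode], props: &[i32]) -> Option<(Predictor, i32)> {
    let mut idx = 0usize;
    loop {
        match *nodes.get(idx)? {
            FlatNode::Split { property, threshold, left, right } => {
                let value = *props.get(property as usize)?;
                // Assumed convention: values above the threshold go left.
                let next = if value > threshold { left } else { right };
                idx = next as usize;
            }
            FlatNode::Leaf { predictor, offset } => return Some((predictor, offset)),
        }
    }
}

/// Applies the predictor stored in the leaf to the neighbouring samples.
fn predict(p: Predictor, west: i32, north: i32, northwest: i32) -> i32 {
    match p {
        Predictor::Zero => 0,
        Predictor::West => west,
        Predictor::North => north,
        // Clamped gradient: west + north - northwest, limited to [min, max].
        Predictor::Gradient => {
            (west + north - northwest).clamp(west.min(north), west.max(north))
        }
    }
}
```

In this sketch, `validate` runs once when the tree is built, while `find_leaf` and `predict` run per pixel. Whether the bounds checks in `find_leaf` cost anything measurable depends on the compiler and the workload.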

@github-actions

Benchmark @ 05fe313

MULTI-FILE BENCHMARK RESULTS (4 files)
  CPU architecture: x86_64
  WARNING: System appears noisy: high system load (2.46). Results may be unreliable.
Statistics:
  Confidence:               99.0%
  Max relative error:        3.0%

Comparing: e883140e (Base) vs 7d77dfd6 (PR)

| File | Base (MP/s) | PR (MP/s) | Δ% |
| --- | --- | --- | --- |
| bike.jxl | 24.238 | 24.374 | +0.56% ±2.7% |
| green_queen_modular_e3.jxl | 7.867 | 7.811 | -0.71% ±0.3% |
| green_queen_vardct_e3.jxl | 23.852 | 23.817 | -0.15% ±2.2% |
| sunset_logo.jxl | 2.787 | 2.926 | +5.02% ±1.3% |

@veluca93

Member

Same comment as on #714 and #715 :-)

@veluca93 veluca93 force-pushed the perf/pr10-flat-tree-traversal branch from 05fe313 to e689173 Compare June 2, 2026 23:00
@veluca93

veluca93 commented Jun 2, 2026

Member

I modified the implementation not to use unsafe; the PR includes #787.

@jonsneyers

Member

Could you rebase now that #787 is merged, and do a benchmark to see the isolated effect of the tree traversal speedup compared to that new baseline?

On my machine, this speeds up sunset-logo.jxl from ~7.7 MP/s to ~8.4 MP/s (about 10% improvement).

No unsafe required.
@veluca93 veluca93 force-pushed the perf/pr10-flat-tree-traversal branch from e689173 to 1c05b2b Compare June 3, 2026 06:40
@veluca93

veluca93 commented Jun 3, 2026

Member

Performance Summary (Commit 1c05b2b)

| Machine | Threading | Base MP/s | PR MP/s | Avg Improvement |
| --- | --- | --- | --- | --- |
| desktop | Single | 92.09 | 91.92 | +1.70% ± 0.32% |
| desktop | Multi | 91.94 | 91.99 | +1.78% ± 0.39% |
| framework-desktop | Single | 109.22 | 109.54 | +1.51% ± 0.33% |
| framework-desktop | Multi | 110.02 | 109.76 | +1.50% ± 0.23% |
| pixel7a | Single (Fast) | 32.59 | 32.71 | +1.11% ± 0.38% |
| pixel7a | Single (Mid) | 23.61 | 23.61 | +1.18% ± 0.32% |
| pixel7a | Multi | 33.12 | 33.15 | +1.01% ± 0.39% |

Detailed per-image results

@jonsneyers jonsneyers merged commit ad5ead5 into libjxl:main Jun 4, 2026
21 checks passed