perf: use direct gradient decode path in SingleGradientOnly #738

Merged: veluca93 merged 1 commit into libjxl:main from hjanuschka:perf/pr21-singlegradient-direct-decode on Apr 13, 2026
@hjanuschka (Collaborator) commented Mar 20, 2026

Carved out of the ongoing perf/autoresearch stream after the PR #705 work was split.

Change:

  • In `SingleGradientOnly::decode_one`, call `clamped_gradient` and `wrapping_add` directly instead of routing through the generic predictor helper path.

Why:

  • trims hot-loop overhead in modular decode
  • no behavior change intended
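
The direct path described above can be sketched roughly as follows. This is a minimal illustration, not the jxl-rs source: the free function `decode_one_direct` is a hypothetical stand-in for `SingleGradientOnly::decode_one`, and the argument types and the body of `clamped_gradient` are assumptions based on the usual definition of the clamped gradient predictor (`left + top - topleft`, clamped to the range spanned by `left` and `top`).

```rust
/// Gradient prediction clamped to the range spanned by `left` and `top`.
/// Assumed signature; the real helper in jxl-rs may differ.
fn clamped_gradient(left: i64, top: i64, topleft: i64) -> i64 {
    let lo = left.min(top);
    let hi = left.max(top);
    // lo <= hi always holds, so `clamp` cannot panic here.
    (left + top - topleft).clamp(lo, hi)
}

/// Hypothetical stand-in for the direct decode path: predict with the
/// clamped gradient, then add the decoded residual with wrapping arithmetic.
fn decode_one_direct(dec: i32, left: i32, top: i32, topleft: i32) -> i32 {
    let pred = clamped_gradient(left as i64, top as i64, topleft as i64);
    // `pred` lies between `left` and `top`, so the cast to i32 is lossless.
    dec.wrapping_add(pred as i32)
}
```

For example, with `left = 10`, `top = 20`, `topleft = 15` the prediction is 15, so a decoded residual of 3 yields the pixel value 18.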

github-actions (Bot) commented Mar 20, 2026

Benchmark @ a006d4d

MULTI-FILE BENCHMARK RESULTS (8 files)
  CPU architecture: x86_64
  WARNING: System appears noisy: high system load (2.36). Results may be unreliable.
Statistics:
  Confidence:               99.0%
  Max relative error:        3.0%

Comparing: 7cf3a662 (Base) vs 9e878745 (PR)

| File | Base (MP/s) | PR (MP/s) | Δ% |
|---|---|---|---|
| bicycles.jxl | 7.517 | 7.553 | +0.47% ±1.3% |
| bike.jxl | 25.043 | 25.302 | +1.04% ±1.5% |
| delta_palette.jxl | 6.195 | 6.210 | +0.23% ±0.5% |
| green_queen_modular_e3.jxl | 8.405 | 8.350 | -0.66% ±0.5% |
| green_queen_vardct_e3.jxl | 25.855 | 25.828 | -0.10% ±0.7% |
| lz77_flower.jxl | 3.438 | 3.436 | -0.07% ±0.6% |
| patches_lossless.jxl | 3.252 | 3.255 | +0.09% ±0.6% |
| sunset_logo.jxl | 2.892 | 2.865 | -0.95% ±0.8% |

@veluca93 (Member) left a comment

Please rebase!

@hjanuschka hjanuschka force-pushed the perf/pr21-singlegradient-direct-decode branch from 9c8585e to a006d4d Compare April 2, 2026 06:53

@veluca93 (Member)

Performance Summary (Commit a006d4d)

| Machine | Threading | Base MP/s | Head MP/s | Avg Improvement |
|---|---|---|---|---|
| desktop | Single | 84.24 | 83.82 | -0.19% ± 0.21% |
| desktop | Multi | 84.18 | 84.10 | -0.09% ± 0.16% |
| framework-desktop | Single | 103.79 | 102.43 | -0.96% ± 0.17% |
| framework-desktop | Multi | 103.75 | 103.37 | -0.59% ± 0.17% |
| pixel7a | Single (Fast) | 34.57 | 33.65 | -1.46% ± 0.69% |
| pixel7a | Single (Mid) | 24.78 | 24.90 | +0.04% ± 0.17% |
| pixel7a | Multi | 34.19 | 33.63 | -1.18% ± 0.27% |

@veluca93 veluca93 merged commit 371f033 into libjxl:main Apr 13, 2026
19 checks passed
lilith added a commit to imazen/zenjxl-decoder that referenced this pull request Jun 1, 2026
…jxl-rs#738)

SingleGradientOnly::decode_one called Predictor::Gradient.predict_one (enum
dispatch) then make_pixel(dec, 1, pred) (an i64 multiply by 1). For this
single-predictor tree both reduce to clamped_gradient(left, top, topleft) and
dec.wrapping_add(pred as i32) respectively. Output is bit-identical (verified
against predict.rs:332 and common.rs:86-88); the inline drops the dispatch and
multiply from a hot modular-decode loop.

Upstream change by Helmut Januschka.
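
The equivalence claimed in the commit message can be illustrated with a small sketch. Everything here is an assumption made for illustration rather than the code in predict.rs or common.rs: the `Predictor` enum is reduced to a few variants, `predict_one` takes only the three neighbours, and `make_pixel` is modelled as an i64 multiply-and-add truncated to i32. Under those assumptions, the generic path with a multiplier of 1 and the direct path produce the same bits, including when the addition wraps.

```rust
/// Assumed definition of the clamped gradient predictor.
fn clamped_gradient(left: i64, top: i64, topleft: i64) -> i64 {
    (left + top - topleft).clamp(left.min(top), left.max(top))
}

/// Illustrative subset of predictors; the real enum has more variants.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Predictor {
    Zero,
    West,
    North,
    Gradient,
}

impl Predictor {
    /// Generic path: a `match` (enum dispatch) on every call.
    fn predict_one(self, left: i64, top: i64, topleft: i64) -> i64 {
        match self {
            Predictor::Zero => 0,
            Predictor::West => left,
            Predictor::North => top,
            Predictor::Gradient => clamped_gradient(left, top, topleft),
        }
    }
}

/// Hypothetical model of `make_pixel`: scale the residual, add the
/// prediction in i64, then truncate to i32.
fn make_pixel(dec: i32, multiplier: u32, pred: i64) -> i32 {
    (dec as i64)
        .wrapping_mul(multiplier as i64)
        .wrapping_add(pred) as i32
}

/// Generic decode: enum dispatch plus a multiply by 1.
fn decode_generic(dec: i32, left: i32, top: i32, topleft: i32) -> i32 {
    let pred = Predictor::Gradient.predict_one(left as i64, top as i64, topleft as i64);
    make_pixel(dec, 1, pred)
}

/// Direct decode: no dispatch, no multiply.
fn decode_direct(dec: i32, left: i32, top: i32, topleft: i32) -> i32 {
    let pred = clamped_gradient(left as i64, top as i64, topleft as i64);
    dec.wrapping_add(pred as i32)
}
```

The two paths agree because truncating an i64 sum to i32 gives the same result as a wrapping i32 addition, and the prediction always fits in i32 since it lies between `left` and `top`.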