Skip to content

do two TODOs#802

Open
jonsneyers wants to merge 4 commits into
libjxl:mainfrom
jonsneyers:hf_16bit_coeffs
Open

do two TODOs#802
jonsneyers wants to merge 4 commits into
libjxl:mainfrom
jonsneyers:hf_16bit_coeffs

Conversation

@jonsneyers

Copy link
Copy Markdown
Member

Does two TODOs, one small one in tf.rs, and a bigger one to use 16-bit HF coeff buffers (only relevant in case of progressive passes) instead of 32-bit ones.

Gives only an almost immeasurable speedup on my laptop, but a substantial memory reduction in case of multiple HF passes:

Before:

Decoded 333921030 pixels in 6.153 seconds: 54.271 MP/s
        6.39s real              6.35s user              0.02s sys
           236355584  maximum resident set size

After:

Decoded 333921030 pixels in 6.147 seconds: 54.322 MP/s
        6.38s real              6.35s user              0.01s sys
           154320896  maximum resident set size

@jonsneyers jonsneyers requested a review from veluca93 June 12, 2026 20:22

@veluca93 veluca93 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to get benchmark results as usual ;)

Comment thread jxl/src/frame/mod.rs Outdated
passes: Vec<PassState>,
dequant_matrices: DequantMatrices,
hf_coefficients: Option<(Image<i32>, Image<i32>, Image<i32>)>,
pub(super) hf_coefficients: Option<HfCoefficients>,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're doing a match anyway, why not make HfCoefficients have a None / I16(...) / I32(...) enum? Also, any reason not to make the tuple into an array of size 3?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, have you considered using three OwnedRawImages and a boolean flag to remember whether it is i32 or i16? This should have a bit less code.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did the enum thing. With OwnedRawImage it seems hard to avoid unsafe reinterpret casts, no?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be, there is a safe way to get a typed image view out of a OwnedRawImage ;)
See

pub fn from_raw(raw: OwnedRawImage) -> Self {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and

pub fn from_raw(raw: RawImageRectMut<'a>) -> Self {
for the rect version

Comment thread jxl_simd/src/aarch64/neon.rs Outdated
Comment thread jxl/src/frame/group.rs Outdated
Comment thread jxl/src/frame/group.rs Outdated
Comment thread jxl/src/frame/group.rs Outdated
Comment thread jxl/src/frame/group.rs Outdated
@veluca93

veluca93 commented Jun 12, 2026

Copy link
Copy Markdown
Member

Performance Summary (Commit d4958fd)

Machine Threading Base MP/s PR MP/s Avg Improvement
desktop Single 79.50 79.36 -0.43% ± 0.33%
desktop Multi 79.11 79.11 -0.23% ± 0.39%
framework-desktop Single 92.68 96.39 -0.32% ± 0.30%
framework-desktop Multi 93.50 97.07 -0.32% ± 0.20%
pixel7a Single (Fast) 28.57 28.29 +0.13% ± 0.33%
pixel7a Single (Mid) 20.37 20.86 +1.00% ± 0.37%
pixel7a Multi 28.47 28.67 +0.51% ± 0.54%

Detailed per-image results

@jonsneyers

Copy link
Copy Markdown
Member Author

Performance Summary (Commit 0aabc58)

[...]

Probably this is just noise since there's no image where this should make a difference (requires multiple HF passes).
Let's take another look after adding a multi-pass test image.

@veluca93

Copy link
Copy Markdown
Member

Performance Summary (Commit 0aabc58)

[...]

Probably this is just noise since there's no image where this should make a difference (requires multiple HF passes). Let's take another look after adding a multi-pass test image.

It seems to help a bit with progressive images.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants