perf: add palette transform fast paths#713
Open
hjanuschka wants to merge 3 commits into
Open
Conversation
Benchmark @ 3ca7625Comparing: e883140e (Base) vs cba318d6 (PR)
|
17 tasks
veluca93
reviewed
May 29, 2026
Address review feedback from veluca93: replace the get_unchecked calls in the no-delta palette fast paths with safe indexing, gated by an assert!(palette_size <= pal_row.len()) outside the inner loop. Combined with the existing 'if idx < palette_size' guard, the compiler can prove in-bounds access and elide the check while still keeping the code unsafe-free. Also drop #[inline(always)] from the large body functions (do_palette_step_general, do_palette_step_one_group, do_palette_step_group_row, get_prediction_data); keep it only on the small helpers (scale, get_palette_value_with_row). The original commit's benchmark showed a regression on all four files (-0.74% .. -3.07%), and aggressive inlining of these large bodies is a plausible contributor via icache pressure. The structural wins (hoisting palette.row(c) out of the inner loops, splitting the no-delta case) are retained.
veluca93
reviewed
Jun 2, 2026
| palette: &Image<i32>, | ||
| /// Look up palette value. `pal_row` is the pre-fetched palette row for channel `c`. | ||
| #[inline(always)] | ||
| fn get_palette_value_with_row( |
Member
There was a problem hiding this comment.
Please rename this back to get_palette_value.
| let index_img = buf_in.data.row(y); | ||
| let palette_size = num_colors + num_deltas; | ||
|
|
||
| if num_deltas == 0 { |
Member
There was a problem hiding this comment.
In do_palette_step_general, this is gated by the predictor also being zero - why is this not the same here?
If the fast path is basically equivalent, shouldn't we factor it out to a function?
What images trigger this fast path?
Address review feedback from veluca93: * Rename get_palette_value_with_row back to get_palette_value. The function's role hasn't changed; it just now takes a pre-fetched palette row (`&[i32]`) instead of looking it up via `palette.row(c)` on every call. Updated the doc comment to reflect that. * Drop the predictor == Predictor::Zero half of the fast-path gate in do_palette_step_general. When num_deltas == 0, the `if index < num_deltas as i32` branch in the weighted/general arms is never taken, so predictor.predict_one is never called and the WP state's update_errors writes are dead. The fast path is therefore correct for any predictor when num_deltas == 0, matching what do_palette_step_one_group was already doing. * Factor the common inner loop into apply_palette_lookup_row. Both fast paths now call it, removing the duplicated direct-lookup / out-of-range-fallback / assert pattern. Behavior unchanged. 639 unit tests pass.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This optimizes palette transforms by prefetching palette rows and adding a direct no-delta lookup fast path, while keeping delta handling behavior unchanged. It also keeps delta prediction on the cross-grid-safe path to preserve conformance correctness. Unsafe is used only for unchecked palette index reads in the proven in-bounds fast path where idx < palette_size and palette_size <= pal_row.len().