do two TODOs #802
veluca93 left a comment
Approving to get benchmark results as usual ;)
      passes: Vec<PassState>,
      dequant_matrices: DequantMatrices,
    - hf_coefficients: Option<(Image<i32>, Image<i32>, Image<i32>)>,
    + pub(super) hf_coefficients: Option<HfCoefficients>,
If we're doing a match anyway, why not make HfCoefficients have a None / I16(...) / I32(...) enum? Also, any reason not to make the tuple into an array of size 3?
Alternatively, have you considered using three OwnedRawImages and a boolean flag to remember whether it is i32 or i16? This should have a bit less code.
Did the enum thing. With OwnedRawImage it seems hard to avoid unsafe reinterpret casts, no?
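For readers following the thread, here is a minimal sketch of what a `None / I16(...) / I32(...)` enum with three-element arrays could look like. It is not the code from this PR: the `Image<T>` stand-in, the `new` constructor, the `fits_in_i16` flag, and the `bytes` helper are all hypothetical, and the rule that buffers are only allocated for more than one HF pass is an assumption based on the PR description.

```rust
use std::mem::size_of;

// Hypothetical stand-in for the crate's image type (not the real `Image<T>`).
#[derive(Debug, Clone)]
struct Image<T> {
    data: Vec<T>,
}

impl<T: Clone + Default> Image<T> {
    fn new(width: usize, height: usize) -> Self {
        Image { data: vec![T::default(); width * height] }
    }
}

// Hypothetical enum: one variant per storage width, three planes each.
#[derive(Debug)]
enum HfCoefficients {
    None,
    I16([Image<i16>; 3]),
    I32([Image<i32>; 3]),
}

impl HfCoefficients {
    // Assumption: buffers are only needed when there is more than one HF pass,
    // and the caller already knows whether the coefficients fit in 16 bits.
    fn new(width: usize, height: usize, num_passes: usize, fits_in_i16: bool) -> Self {
        if num_passes <= 1 {
            return HfCoefficients::None;
        }
        if fits_in_i16 {
            HfCoefficients::I16(std::array::from_fn(|_| Image::new(width, height)))
        } else {
            HfCoefficients::I32(std::array::from_fn(|_| Image::new(width, height)))
        }
    }

    // Total bytes held by the coefficient planes.
    fn bytes(&self) -> usize {
        match self {
            HfCoefficients::None => 0,
            HfCoefficients::I16(planes) => planes
                .iter()
                .map(|p| p.data.len() * size_of::<i16>())
                .sum(),
            HfCoefficients::I32(planes) => planes
                .iter()
                .map(|p| p.data.len() * size_of::<i32>())
                .sum(),
        }
    }
}
```

With this shape, callers match once on the variant and then work with a fully typed array, so no reinterpret casts are needed.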
It shouldn't be, there is a safe way to get a typed image view out of an OwnedRawImage ;)
See
Line 113 in 36c9c3b
Performance Summary (Commit

| Machine | Threading | Base MP/s | PR MP/s | Avg Improvement |
|---|---|---|---|---|
| desktop | Single | 79.50 | 79.36 | -0.43% ± 0.33% |
| desktop | Multi | 79.11 | 79.11 | -0.23% ± 0.39% |
| framework-desktop | Single | 92.68 | 96.39 | -0.32% ± 0.30% |
| framework-desktop | Multi | 93.50 | 97.07 | -0.32% ± 0.20% |
| pixel7a | Single (Fast) | 28.57 | 28.29 | +0.13% ± 0.33% |
| pixel7a | Single (Mid) | 20.37 | 20.86 | +1.00% ± 0.37% |
| pixel7a | Multi | 28.47 | 28.67 | +0.51% ± 0.54% |
Probably this is just noise since there's no image where this should make a difference (requires multiple HF passes).
It seems to help a bit with progressive images.
Does two TODOs: a small one in tf.rs, and a bigger one that uses 16-bit HF coefficient buffers (only relevant for progressive passes) instead of 32-bit ones.
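As a rough illustration of why the buffer width matters, the sketch below estimates the size of the HF coefficient buffers. It assumes three planes with one coefficient per pixel and ignores any padding; `hf_buffer_bytes` is a hypothetical helper, not a function from this codebase.

```rust
// Hypothetical helper: bytes needed for three coefficient planes,
// assuming one coefficient per pixel and no padding.
fn hf_buffer_bytes(width: usize, height: usize, bytes_per_coeff: usize) -> usize {
    3 * width * height * bytes_per_coeff
}
```

Under these assumptions a 4096×4096 image needs 192 MiB with 32-bit coefficients (`hf_buffer_bytes(4096, 4096, 4)`) and 96 MiB with 16-bit ones (`hf_buffer_bytes(4096, 4096, 2)`).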
Gives an almost immeasurable speedup on my laptop, but a substantial memory reduction when there are multiple HF passes:
Before:
After: