Improve serialization performance with memchr #740
I plan to extend support for this optimization to the XML serializer in a followup. In order to do that I need to make some XML benchmarks. That seems like enough to split into a separate PR.
Very nice speedup!
We solve a very similar problem for the data state in the HTML tokenizer with SIMD intrinsics. I wonder how that compares to two memchr calls in terms of performance.
html5ever/html5ever/src/tokenizer/mod.rs
Lines 1945 to 1990 in 201534e
    let result2 = memchr2(b'&', 0xC2, &slice[..result]).unwrap_or(slice.len());
    result.min(result2)
nit: If you do unwrap_or(result) for the memchr2 call then you don't need result.min(result2)
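To illustrate the nit, here is a minimal sketch of both variants. It is not the PR's code: `memchr2` below is a std-only stand-in with the same shape as `memchr::memchr2`, the function names are hypothetical, and the first search for `<` and `>` is an assumption, since the quoted lines show only the second call. Because the second search only looks at `slice[..result]`, any hit is already smaller than `result`, so falling back to `result` makes the `min` redundant.

```rust
// Std-only stand-in for `memchr::memchr2` (same argument order and return
// type), used here so the sketch builds without external crates.
fn memchr2(a: u8, b: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&x| x == a || x == b)
}

// Variant as quoted in the review: fall back to `slice.len()`, then take the min.
// The first search (for `<` and `>`) is an assumption made for illustration.
fn first_special_with_min(slice: &[u8]) -> usize {
    let result = memchr2(b'<', b'>', slice).unwrap_or(slice.len());
    let result2 = memchr2(b'&', 0xC2, &slice[..result]).unwrap_or(slice.len());
    result.min(result2)
}

// Suggested variant: the second search is limited to `slice[..result]`, so a
// hit is always below `result`, and a miss can simply fall back to `result`.
fn first_special_without_min(slice: &[u8]) -> usize {
    let result = memchr2(b'<', b'>', slice).unwrap_or(slice.len());
    memchr2(b'&', 0xC2, &slice[..result]).unwrap_or(result)
}
```

Both functions return the same index for every input; the second just skips one comparison.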
Thanks for the review!
I've been doing a lot of rough experimentation over the past couple of days with SSE3 and AVX2 versions of the parser optimization. It's quite possible that we could use a similar technique here (and it would benefit from not having to count newlines). It would be nice to move some of these routines into markup5ever utilities and to make them more general, though very carefully in order to avoid hurting performance. I think you are ultimately correct in #703, though, that bigger wins are likely to come from structural changes to the API, such as supporting a mode that doesn't count newlines.
This change greatly improves the performance of serialization (up to 95% on some benchmarks) by changing the way that escaping of HTML entities works. It uses memchr to avoid creating a `chars()` iterator on the output stream. When run with the benchmark from #739, I see these results:

```
serialize "lipsum.html" time:   [6.4817 µs 6.5021 µs 6.5212 µs]
                        change: [−95.179% −95.013% −94.846%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) low severe
  1 (1.00%) low mild
serialize "lipsum-zh.html"
                        time:   [2.0815 µs 2.0888 µs 2.0947 µs]
                        change: [−91.533% −90.940% −90.407%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  15 (15.00%) low severe
serialize "medium-fragment.html"
                        time:   [7.7625 µs 7.7927 µs 7.8147 µs]
                        change: [−84.424% −83.952% −83.486%] (p = 0.00 < 0.05)
                        Performance has improved.
serialize "small-fragment.html"
                        time:   [879.01 ns 886.43 ns 892.78 ns]
                        change: [−89.813% −89.711% −89.610%] (p = 0.00 < 0.05)
                        Performance has improved.
serialize "tiny-fragment.html"
                        time:   [332.13 ns 332.78 ns 333.60 ns]
                        change: [−27.768% −27.617% −27.457%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe
serialize "strong.html" time:   [5.4946 µs 5.4988 µs 5.5030 µs]
                        change: [−0.3133% −0.0322% +0.2349%] (p = 0.83 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
```

In this case `lipsum.html` serialization time dropped from 122.81 µs to 6.5021 µs.

Signed-off-by: Martin Robinson <mrobinson@igalia.com>
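As a rough illustration of the approach described in the commit message, the sketch below escapes text by jumping to the next byte that needs attention and copying everything before it in one `push_str`, rather than visiting each `char`. It is not the code from this PR: `find_special` and `escape_text` are hypothetical names, the scan is a std-only stand-in for the memchr calls, and the escaped set (`&`, `<`, `>`, U+00A0) is an assumption for text content.

```rust
// Std-only stand-in for the memchr-based scan: index of the next byte that may
// need escaping. 0xC2 is the UTF-8 lead byte of U+00A0 (encoded as 0xC2 0xA0).
fn find_special(bytes: &[u8]) -> Option<usize> {
    bytes
        .iter()
        .position(|&b| matches!(b, b'&' | b'<' | b'>' | 0xC2))
}

// Hypothetical text escaper (assumed escape set: `&`, `<`, `>`, U+00A0).
fn escape_text(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = String::with_capacity(input.len());
    let mut start = 0;
    while let Some(offset) = find_special(&bytes[start..]) {
        let pos = start + offset;
        // Copy the run of unremarkable bytes in one go. `pos` is a char
        // boundary: it is either an ASCII byte or a UTF-8 lead byte.
        out.push_str(&input[start..pos]);
        start = match bytes[pos] {
            b'&' => {
                out.push_str("&amp;");
                pos + 1
            }
            b'<' => {
                out.push_str("&lt;");
                pos + 1
            }
            b'>' => {
                out.push_str("&gt;");
                pos + 1
            }
            _ => {
                // 0xC2 starts a two-byte character; only U+00A0 is escaped,
                // anything else in that range is copied through unchanged.
                if bytes[pos + 1] == 0xA0 {
                    out.push_str("&nbsp;");
                } else {
                    out.push_str(&input[pos..pos + 2]);
                }
                pos + 2
            }
        };
    }
    // Copy whatever remains after the last special byte.
    out.push_str(&input[start..]);
    out
}
```

For input with few special characters, most of the work becomes a handful of bulk copies, which is consistent with the larger documents in the benchmark seeing the largest improvement.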
e4688ab to 630777d