Fix excessive usage of malloc/free #99
Split ZopfliInitHash into:
- ZopfliAllocHash - allocates hash memory,
- ZopfliResetHash - resets hash values.

Allocate the hash outside of ZopfliLZ77Greedy and ZopfliLZ77OptimalRun, which pass it on to the functions that previously allocated it. Do the same for the malloc'd costs array.

Reasons for this change:
- the size of the allocation doesn't change,
- speeds up Zopfli*,
- fixes a crash on certain devices**.

* Speeds up Zopfli (especially on smaller blocks) by reducing sys time from ~7s to 0.1s on x64 Linux for a ~5m compression run, and from ~1m to 0.1s on ARMv7 Linux for a 13m compression run.

** Fixes a crash at large iteration counts on some ARM devices that, due to architecture or an older kernel (not sure which), don't handle overly aggressive heap allocation and freeing well.
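The alloc/reset split described above can be sketched roughly as follows. This is only an illustration of the pattern, not the actual Zopfli code: the `DemoHash` struct, its fields, and every `Demo*` function are hypothetical names made up for this example, and the real hash holds more state than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_HASH_SLOTS 65536

/* Hypothetical, simplified stand-in for the hash state (not Zopfli's struct). */
typedef struct {
  int* head;            /* hash value -> most recent position, -1 if none */
  unsigned short* prev; /* position -> previous position with the same hash */
  size_t window_size;
} DemoHash;

/* Allocate the buffers once; their size does not change between runs.
   Returns 1 on success, 0 on allocation failure. */
static int DemoAllocHash(size_t window_size, DemoHash* h) {
  assert(h != NULL && window_size > 0 && window_size <= 65536);
  h->window_size = window_size;
  h->head = (int*)malloc(sizeof(*h->head) * DEMO_HASH_SLOTS);
  h->prev = (unsigned short*)malloc(sizeof(*h->prev) * window_size);
  if (!h->head || !h->prev) {
    free(h->head);
    free(h->prev);
    h->head = NULL;
    h->prev = NULL;
    return 0;
  }
  return 1;
}

/* Reset the values only; no malloc/free happens here. */
static void DemoResetHash(DemoHash* h) {
  size_t i;
  for (i = 0; i < DEMO_HASH_SLOTS; i++) h->head[i] = -1;
  for (i = 0; i < h->window_size; i++) h->prev[i] = (unsigned short)i;
}

/* Free the buffers once, after the last run. */
static void DemoCleanHash(DemoHash* h) {
  free(h->head);
  free(h->prev);
  h->head = NULL;
  h->prev = NULL;
}

/* Hypothetical per-iteration worker: uses a caller-owned hash. */
static int DemoRun(DemoHash* h, int iteration) {
  DemoResetHash(h);
  h->head[iteration & 0xFFFF] = iteration;
  return h->head[iteration & 0xFFFF];
}

/* Caller owns the hash: one allocation for all iterations.
   Returns the result of the last iteration, or -1 on failure. */
static int DemoCompress(int iterations) {
  DemoHash h;
  int i, last = -1;
  if (!DemoAllocHash(32768, &h)) return -1;
  for (i = 0; i < iterations; i++) last = DemoRun(&h, i);
  DemoCleanHash(&h);
  return last;
}
```

With this shape, `DemoCompress(5000)` performs two `malloc` calls and two `free` calls in total, instead of a pair per iteration.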
|
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed, please reply here (e.g.
|
|
I signed it! Lol, only because these are two serious bugs, imo, that should be fixed ASAP (my last two pull requests). |
|
CLAs look good, thanks! |
|
Is there a reason this hasn't been merged? |
|
Merging it in right now. I measure no real speed difference with gcc on Intel, but it's useful if it fixes an ARM problem. |
|
Merged, thanks for your work! |
|
The speed difference is very small on x86. If you compress the lodepng.cpp file with 5000 iterations using an x64 binary, you should see ~3s less sys time; on ARMv7, however, it's almost 1m faster. It may depend on the malloc/free implementation or the kernel heap manager. EDIT: Follow-up pull request with further improvement: #106 |
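For anyone who wants to check the sys time effect on their own device, here is a minimal sketch, assuming a POSIX system with `getrusage`. The function names are made up for this example, and it only mimics the allocation pattern, not Zopfli itself, so the numbers it gives will depend on the allocator and kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Kernel (sys) CPU time used by this process so far, in seconds.
   Returns -1.0 if getrusage fails. */
static double SysSeconds(void) {
  struct rusage ru;
  if (getrusage(RUSAGE_SELF, &ru) != 0) return -1.0;
  return (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec / 1e6;
}

/* Old pattern: allocate and free the buffer on every round. */
static unsigned long ChurnAllocations(int rounds, size_t size) {
  unsigned long sum = 0;
  int i;
  assert(size > 0);
  for (i = 0; i < rounds; i++) {
    unsigned char* buf = (unsigned char*)malloc(size);
    if (!buf) return 0;
    memset(buf, i & 0xFF, size); /* touch the memory so it is really used */
    sum += buf[size - 1];
    free(buf);
  }
  return sum;
}

/* New pattern: allocate once, reset the contents on every round. */
static unsigned long ReuseAllocation(int rounds, size_t size) {
  unsigned long sum = 0;
  unsigned char* buf;
  int i;
  assert(size > 0);
  buf = (unsigned char*)malloc(size);
  if (!buf) return 0;
  for (i = 0; i < rounds; i++) {
    memset(buf, i & 0xFF, size);
    sum += buf[size - 1];
  }
  free(buf);
  return sum;
}
```

To compare, take `SysSeconds()` before and after a call to each function with the same `rounds` and `size`, and look at the two differences. Both functions do the same work on the buffer, so any gap in sys time comes from the allocation pattern alone.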
Split ZopfliInitHash into:
- ZopfliAllocHash - allocates hash memory,
- ZopfliResetHash - resets hash values.
Allocate the hash outside of ZopfliLZ77Greedy and ZopfliLZ77OptimalRun, which
pass it on to the functions that previously allocated it. Do the same for
the malloc'd costs array.
Reasons for this change:
- the size of the allocation doesn't change,
- speeds up Zopfli^,
- fixes a crash on certain devices^^.
^ speeds up Zopfli (especially on smaller blocks) by reducing
sys time from ~7s to 0.1s on x64 Linux for a ~5m compression run, and from
~1m to 0.1s on ARMv7 Linux for a 13m compression run.
^^ fixes a crash at large iteration counts on some ARM devices that, due
to architecture or an older kernel (not sure which), don't handle overly
aggressive heap allocation and freeing well.
PS. You don't want to know how many hours I wasted guessing what was wrong on the Odroid U3 when gdb showed false positives: without pthreads it was hitting the assert in ZopfliVerifyLenDist, with pthreads it was crashing on free in ZopfliCleanHash. :)
PS2: TCMalloc didn't help on the Odroid U3; before this fix, the crash due to heap problems still occurred.