Inflate fast NEON optimization #345
In inflate_fast() the output pointer always has plenty of room to write. This means that, so long as the target is capable, wide unaligned loads and stores can be used to transfer several bytes at once. When the reference distance is too short, simply unroll the data a little to increase the distance.

For reference, please see: https://chromium.googlesource.com/chromium/src/+/78104f4d73e3bbb4155fa804d00ed66682180556

PS: this is still missing the fix for the inflate_back corner case.

Change-Id: I5216424ab584e069b77ddf04000a313d5ca99839
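To illustrate the idea described in the commit message above, here is a minimal portable sketch. It is not the code from the patch: `CHUNK`, `copy_chunk()` and `match_copy()` are hypothetical names, and a fixed-size `memcpy` stands in for the unaligned NEON load/store pair. It assumes the caller guarantees at least `CHUNK - 1` writable bytes past the end of the match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 16 /* assumed chunk width: one 128-bit NEON register */

/* Copy one chunk. The fixed-size memcpy pair stands in for an unaligned
 * wide load followed by a wide store; the whole chunk is loaded before
 * any byte is stored, so overlapping source and destination are fine. */
static void copy_chunk(unsigned char *dst, const unsigned char *src) {
    unsigned char tmp[CHUNK];
    memcpy(tmp, src, CHUNK);
    memcpy(dst, tmp, CHUNK);
}

/* Copy an LZ77 match of `len` bytes that starts `dist` bytes behind `out`.
 * Assumption: the buffer has at least CHUNK - 1 bytes of slack after
 * out + len, because every store writes a full chunk.
 * Returns the new output position. */
static unsigned char *match_copy(unsigned char *out, size_t dist, size_t len) {
    unsigned char *end = out + len;
    const unsigned char *from = out - dist;

    assert(dist > 0);
    if (len == 0)
        return out;

    /* Short distance: each chunk store yields only `dist` valid bytes.
     * Keep the source fixed, advance the output by `dist`, and double the
     * distance, which "unrolls" the repeating pattern. */
    while (dist < CHUNK && dist < len) {
        copy_chunk(out, from);
        out += dist;
        len -= dist;
        dist += dist;
    }

    /* Here either dist >= CHUNK (no chunk overlaps its own source) or
     * dist >= len (a single chunk already covers the remainder). */
    for (;;) {
        copy_chunk(out, from);
        if (len <= CHUNK)
            break;
        out += CHUNK;
        from += CHUNK;
        len -= CHUNK;
    }
    return end;
}
```

Note that the final store may write up to `CHUNK - 1` bytes of junk past the end of the match. That is harmless only when the output buffer has that much slack, which is the entry assumption in inflate_fast().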
This handles the case where a zlib user relies on the InflateBack API to decompress content. The NEON optimization assumes that it can perform wide stores, sometimes overwriting data at the output pointer (but never overflowing the buffer end, as there is enough room for the write). For infback there are no such guarantees (i.e. no extra wiggle room), which can result in illegal operations. This patch fixes the potential issue by falling back to the non-optimized code in such cases.

It also adds some comments about the entry assumptions in inflate, and writes a defined value to the write buffer to mark where the real data has ended (helpful while debugging).

For reference, please see: https://chromium.googlesource.com/chromium/src/+/0bb11040792edc5b28fcb710fc4c01fedd98c97c

Change-Id: Iffbda9eb5e08a661aa15c6e3d1c59b678cc23b2c
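The fallback described above can be pictured as a guard in front of the wide copy. The sketch below is an assumption for illustration, not the patch itself: the patch may select the code path differently, and `copy_bytewise()`, `copy_wide()`, `copy_match_guarded()` and the `room` parameter are hypothetical. For brevity the wide path here only handles distances of at least `CHUNK` bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 16 /* assumed width of one wide store */

/* Non-optimized path: never writes past out + len. */
static void copy_bytewise(unsigned char *out, size_t dist, size_t len) {
    for (size_t i = 0; i < len; i++)
        out[i] = out[i - dist];
}

/* Wide path: always stores whole chunks, so it may write up to
 * CHUNK - 1 bytes past out + len. Requires dist >= CHUNK so that a
 * chunk never overlaps its own source. */
static void copy_wide(unsigned char *out, size_t dist, size_t len) {
    assert(dist >= CHUNK);
    for (size_t done = 0; done < len; done += CHUNK)
        memcpy(out + done, out + done - dist, CHUNK);
}

/* `room` is the number of writable bytes starting at `out`.
 * Returns 1 if the wide path was taken, 0 if it fell back. */
static int copy_match_guarded(unsigned char *out, size_t dist, size_t len,
                              size_t room) {
    assert(dist > 0 && len <= room);
    /* Bytes the wide path would actually touch: len rounded up to CHUNK. */
    size_t touched = (len + CHUNK - 1) / CHUNK * CHUNK;
    if (dist >= CHUNK && touched <= room) {
        copy_wide(out, dist, len);
        return 1;
    }
    copy_bytewise(out, dist, len); /* no wiggle room: stay exact */
    return 0;
}
```

With an exact-fit window, as inflateBack() may provide, `room` equals `len` and the guard always selects the byte-wise path, so nothing beyond the match is written.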
Ideally this should be applied first, followed by updated (WIP) versions of the checksum patches (i.e. optimized crc32 and adler32).
@madler any suggestions?
For further details concerning the optimization, please see:
@@ -0,0 +1,311 @@
/* inffast.c -- fast decoding
 * Copyright (C) 1995-2017 Mark Adler
 * For conditions of distribution and use, see copyright notice in zlib.h
I wonder if we should clearly indicate that this is a modified inffast.c (i.e. by adding the respective copyright notice).
@@ -0,0 +1,1582 @@
/* inflate.c -- zlib decompression
 * Copyright (C) 1995-2016 Mark Adler
I wonder if we should clearly indicate that this is a modified inflate.c (i.e. by adding the respective copyright notice).
Some benchmarking data from an ARM CPU (big core A72, snappy data set) shows an average performance improvement of 31%: a) Vanilla b) inflate_fast
@madler any comment?
@madler ping?
This introduces ARM/NEON optimizations to zlib. The first two patches are a NEON optimization of zlib's inflate function; they increase decompression speed and have been shipping in Chromium since release 62 (Oct. 2017). They were pulled from a PR to zlib upstream: madler/zlib#345. Patches 003 and 004 were pulled from Fedora Core's aarch64 zlib package. They improve zlib compression speed and have been there for 4 months. Patch 005 is pulled from a PR to zlib upstream: madler/zlib#251. It has been shipping in Chromium since release 63 and increases decompression speed. Patch 006 is my own, to allow 005 to merge without conflicting with the previous patches. Signed-off-by: Ian Leonard <[email protected]>
* Remove old zlib readme.
* Remove old zlib change history from inflate.c.
* Remove old treebuild.xml and zlib pdf.
Can you rebase on the latest master? :)
This uses SIMD to perform wide loads/stores in inflate_fast, which should improve performance on ARM by
18% to 30%, depending on the data.
It also includes the fix for the InflateBack() corner case (details in: https://bugs.chromium.org/p/chromium/issues/detail?id=769880).
This optimization has been shipping in Chromium since M62 (it landed in the repository around September/October 2017).