benchmark comparisons against ISA-L #1478
Issues probably isn't the correct place to put this question. I have run benchmarks against their adler32 checksum implementation, and there, at least, we are several times faster. That is only a small portion of the deflate/inflate algorithm, though. Does ISA-L provide a zlib-compatible interface to test against?
https://github.com/pycompression/Python-isal provides zlib functions
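For anyone who wants to try the drop-in claim, here is a minimal sketch (assuming python-isal is installed; it falls back to the stdlib zlib otherwise):

```python
# Sketch of using python-isal as a drop-in for the stdlib zlib module.
# Falls back to stdlib zlib when python-isal is not installed.
try:
    from isal import isal_zlib as zlib_impl  # ISA-L-backed, levels 0-3
except ImportError:
    import zlib as zlib_impl

data = b"The quick brown fox jumps over the lazy dog" * 100
compressed = zlib_impl.compress(data, 1)  # level 1 is valid for both backends
assert zlib_impl.decompress(compressed) == data
```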
We'd need C bindings to directly plug this into our existing benchmarks. That having been said, it's definitely possible to leverage Python's zlib interfaces with zlib-ng for a direct comparison. That, however, would not be PR-worthy for the repo, unless you're suggesting we add benchmark data into the repo.
For your edification, I managed to build this project (though it seems to rely on Makefiles that aren't present, so I had to switch it to point to my distro's package of isa-l). Here's their benchmark output:
So, what's apparent is that we're winning handily on the crc and adler checksums (which is funny, because I think a lot of distributions use ISA-L as a "fast" implementation for things like block devices, specifically for the CRC checksum implementation). They are winning, seemingly, on the compression and decompression tests. However, I did not write these benchmarks, nor do I know what type of data they are compressing. Compression and decompression speeds are very data-dependent, so it's not exactly a definitive test. There's also the caveat that isa-l shrinks 9 levels down to only 3 and, it would seem, doesn't offer anywhere near the same compression ratios. Given how the encoding works, it wouldn't be surprising that isa-l is faster with poorer compression ratios.
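The data-dependence point is easy to demonstrate with the stdlib zlib module alone (illustrative only, not ISA-L or the thread's benchmark):

```python
import os
import zlib

# Compression ratio (and hence speed) is strongly data-dependent,
# so any single-corpus benchmark is not definitive.
repetitive = b"ACGT" * 65536          # highly compressible
random_ish = os.urandom(4 * 65536)    # essentially incompressible

for name, payload in [("repetitive", repetitive), ("random", random_ish)]:
    for level in (1, 6, 9):
        ratio = len(payload) / len(zlib.compress(payload, level))
        print(f"{name:10s} level {level}: ratio {ratio:7.1f}x")
```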
Another interesting observation: if I swap out their genome-based test dataset for the one we commonly use (the large Silesia test corpus), we start winning a few of the decompression benchmarks:
This is, perhaps, a slightly more compressible dataset? In the past I've noticed that we do pretty well on sequences that aren't mostly encoding literals.
Honestly, I think the testing methodology of calling through Python's interfaces is perhaps flawed as well, but I can't really assess that fairly without understanding the user's actual use case. Here's me invoking pigz with 8 threads (well, processes) at compression level 4, with zlib-ng injected:
Here's igzip, level 3, with 8 threads:
Granted, there's also a lot of other overhead in process launch, so that anecdote should be taken with a grain of salt, but I don't think that isa-l walks away with that many wins. And in all fairness, I'm testing their latest release code against our develop branch (which is about to become the release soon). There might be quite a few improvements in isa-l as well. I think ISA-L's approach is similar to IPP's, no? Custom deflate dictionaries, if I'm reading this correctly (with some SIMD acceleration for some of the hot spots).

That having been said, their adler32 code hasn't improved, and from what I remember, we were winning significantly on the internal benchmark distributed with ISA-L for that one. Performance wins were limited by total memory bandwidth, but when doing the "hot in cache" test, we were winning by significant margins. It's also interesting to see their "fast zero" code use AVX512, when in most of our measurements the clock penalty made it more expensive to switch even short zeroing sequences (or anything too compute-lite) to 512-bit operations.
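For reference, the two checksums under discussion are exposed by Python's stdlib zlib bindings (which may or may not be backed by zlib-ng depending on how the interpreter is linked), and both have well-known check values:

```python
import zlib

# CRC-32 of "123456789" is 0xCBF43926, the standard check value;
# Adler-32 of "Wikipedia" is 0x11E60398 (the textbook example).
assert zlib.crc32(b"123456789") == 0xCBF43926
assert zlib.adler32(b"Wikipedia") == 0x11E60398

# Both support incremental updates, which is how streaming tools use them:
c = zlib.crc32(b"1234")
c = zlib.crc32(b"56789", c)
assert c == 0xCBF43926
```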
@KungFuJesus For an entirely fair comparison, ISA-L level 1 should be compared with zlib-ng level 2, or ISA-L level 0 with zlib-ng level 1. The compression ratio at ISA-L level 1 is much better than that of zlib-ng level 1.

The python-isal benchmark suite tests ISA-L against zlib; this is mostly a test to see if the bindings are any good. ISA-L should of course be faster than zlib, but sometimes there is an initialisation overhead as well as an overhead from the bindings themselves. If I had messed up the bindings, the overhead would be so large that zlib would actually win for most of the small data sizes. So I have this benchmark to make sure the bindings are really capable of being a drop-in replacement. The python zlib-ng bindings are also written by me and are basically a copy of the python-isal bindings, so that is actually more of an apples-to-apples comparison.

The size to look for is 128KB; these are the chunk sizes that are typically (de)compressed (used internally in pigz). I see ISA-L still wins there. This is not really surprising given the amount of custom assembly that was written.
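The 128KB chunking pattern described above can be sketched with zlib's streaming objects (an illustrative sketch, not pigz's actual implementation):

```python
import io
import zlib

CHUNK = 128 * 1024  # the 128 KB chunk size mentioned above

def compress_stream(src: io.BufferedIOBase, dst: io.BufferedIOBase,
                    level: int = 6) -> None:
    """Compress src to dst in CHUNK-sized pieces via a streaming object."""
    c = zlib.compressobj(level)
    while chunk := src.read(CHUNK):
        dst.write(c.compress(chunk))
    dst.write(c.flush())

src = io.BytesIO(b"some moderately repetitive payload " * 10000)
dst = io.BytesIO()
compress_stream(src, dst)
assert zlib.decompress(dst.getvalue()) == src.getvalue()
```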
Extended benchmark TurboBench: Dynamic/Static web content compression benchmark including zstd and memory usage.
@KungFuJesus At least for me, decompressing big gzip files with igzip is a lot faster than with zlib-ng, although the 2.1.3 release got twice as fast as the 2.0.7 release when using minigzip (with pigz the speed difference is smaller):

```
# Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz = 814M
❯ time igzip -c -d Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz > /dev/null
real    0m4.745s
user    0m4.516s
sys     0m0.206s

❯ module load pigz
❯ module load zlib-ng/2.0.7-GCCcore-10.3.0

# pigz with zlib-ng 2.0.7:
❯ time pigz -c -d Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz > /dev/null
real    0m14.172s
user    0m17.501s
sys     0m1.189s

# minigzip with zlib-ng 2.0.7:
❯ time minigzip -c -d Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz > /dev/null
real    0m14.076s
user    0m13.721s
sys     0m0.320s

❯ module load zlib-ng/2.1.3-GCCcore-10.3.0

# pigz with zlib-ng 2.1.3:
❯ time pigz -c -d Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz > /dev/null
real    0m10.280s
user    0m10.573s
sys     0m1.202s

# minigzip with zlib-ng 2.1.3:
❯ time minigzip -c -d Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz > /dev/null
real    0m7.172s
user    0m6.915s
sys     0m0.234s

❯ module unload zlib-ng
❯ module load zlib/1.2.13

# pigz with just zlib 1.2.13:
❯ time pigz -c -d Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz > /dev/null
real    0m12.016s
user    0m13.206s
sys     0m1.186s
```

igzip is by default compiled without pthreads support, unless you build it with `make -f Makefile.unx`:

```
# igzip compiled with pthreads support (when running make -f Makefile.unx):
❯ time programs/igzip -T 8 -3 -k -z hs.fa
real    0m2.086s
user    0m9.264s
sys     0m1.403s

# igzip compiled without pthreads support (default if running just make):
❯ time igzip -T 8 -3 -k -z hs.fa
real    0m10.289s
user    0m8.890s
sys     0m1.354s

❯ time pigz -3 -k -p 8 hs.fa
real    0m8.491s
user    0m48.517s
sys     0m1.736s

❯ time pigz -2 -k -p 8 hs.fa
real    0m5.076s
user    0m38.331s
sys     0m1.885s

# File sizes
3151425857  hs.fa
 927950066  hs.pigz_l3.fa.gz
 998514930  hs.pigz_l2.fa.gz
 977666017  hs.igzip_l3.fa.gz
```
Could you share which architecture this benchmark ran on? Also, how did you plug in zlib-ng instead of zlib? Thanks!
Thanks for the quick response. Did you have to make any code changes? The benchmark script uses libz.so, so how do you use zlib_ng.so instead?
The same way you make any project use zlib-ng in place of zlib:
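In case it helps others, one common approach looks like this (assuming zlib-ng was built in zlib-compat mode; the paths and the pigz invocation below are illustrative, not taken from this thread):

```
# Assumption: zlib-ng built with its zlib-compatible ABI
# (cmake -DZLIB_COMPAT=ON), producing a drop-in libz.so.1.

# Inject zlib-ng into any dynamically linked program for one run:
LD_PRELOAD=/path/to/zlib-ng/libz.so.1 pigz -p 8 -4 bigfile

# Verify which libz a binary would otherwise pick up:
ldd "$(command -v pigz)" | grep libz
```

Alternatively, install the compat-mode libz.so.1 into the library search path so every zlib consumer picks it up without per-process injection.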
Thanks a lot for sharing. I can confirm the Adler-32 performance difference, but not the CRC32. Do you have turbo boost enabled? Also, it would be a good idea to increase the number of iterations in benchmark.py to get more stable results.
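A hypothetical sketch of the repeat-and-take-the-minimum approach (not the thread's actual benchmark.py), using Python's timeit module:

```python
import os
import timeit
import zlib

# Repeat each measurement several times and keep the minimum: the minimum
# is the least contaminated by scheduling noise and frequency scaling.
payload = os.urandom(128 * 1024)  # the 128 KB size discussed above
runs = timeit.repeat(lambda: zlib.adler32(payload), number=1000, repeat=5)
best = min(runs) / 1000  # seconds per call
print(f"adler32 over 128 KB: {best * 1e6:.1f} us/call")
```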
Yes. Then again, my isa-l was provided by the distro, so it could have been a version or two behind. As far as I'm aware, ISA-L leverages (v)pclmulqdq in an extremely similar manner to us, so I wouldn't be surprised if those were basically on par. Their adler32 implementation is pretty poor for a number of reasons, though.
Hello all,
Do you have comparison benchmarks against https://github.com/intel/isa-l ?