Compressed Integer Conversion #16
Comments
Thanks, I'd never even heard of LEB128. It's unfortunate that I'm only finding out about it now, after the format is fixed, but it's probably not a huge deal. You're right about the advantage of using a bitmask. I'll take a look at refactoring that part of my code. Just to note, integers can be more than five bytes on 64-bit architectures.
Yes, I already changed it to long in my code and added some checks (10 bytes max). I think this is what you also use in your code (implicitly on 64-bit systems). Btw: why not reserve LEB128 for a v2.0 header? ;-)
On a 32-bit system that isn't using 64-bit size_t's, zchunk should fail to read any sizes that require more than 32 bits. But, there is a define that you can use on 32 bit to set size_t to 64 bits (I've temporarily forgotten what it is), which would allow zchunk to work with a significantly larger maximum chunk size. In the real world, I wouldn't expect that anybody would ever need >4GB chunk sizes, but one never knows. After all, nobody needs more than 640K, right? ;)
We could, but I'm not sure what the point would be. We will always need to support reading the v1 header, so it's not like we could throw away the current code.
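To illustrate the 32-bit behaviour described above, here is a minimal Java sketch of a reader that refuses any value needing more than 32 bits. It is not zchunk's C code: the class `NarrowReader`, the method `readUnsigned32` and the constant names are hypothetical, and it only assumes the encoding discussed in this issue (7 payload bits per byte, least significant group first, high bit set on the last byte).

```java
/** Hypothetical illustration, not taken from zchunk. */
final class NarrowReader {
    static final int FLAG_LAST_BYTE = 0b1000_0000; // high bit marks the final byte
    static final int PAYLOAD_MASK = 0b0111_1111;   // low 7 bits carry data
    static final int MAX_BYTES_32 = 5;             // ceil(32 / 7)
    static final long MAX_UNSIGNED_32 = 0xFFFF_FFFFL;

    /** Decodes one value and rejects anything that does not fit in 32 bits. */
    static long readUnsigned32(byte[] buf) {
        long value = 0;
        for (int i = 0; i < buf.length && i < MAX_BYTES_32; i++) {
            long payload = buf[i] & PAYLOAD_MASK;
            value |= payload << (7 * i);
            if (value > MAX_UNSIGNED_32) {
                throw new ArithmeticException("value needs more than 32 bits");
            }
            if ((buf[i] & FLAG_LAST_BYTE) != 0) {
                return value; // flag found: this was the last byte
            }
        }
        // Either the input ended early or the encoding is longer than 5 bytes.
        throw new IllegalArgumentException("no terminating byte within 5 bytes");
    }
}
```

With this sketch, the bytes `0x2C 0x82` decode to 300, while `0x00 0x00 0x00 0x00 0x90` (which encodes 2^32) is rejected.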
Agreed. So on a 64-bit system, the maximum number of bytes will be 10. On a 32-bit system, the maximum number of bytes will be 5 ([...]). For compatibility, I still chose [...]. You can see the code I am using here: [...]. There are only two methods, compress and decompress.
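The 10-byte and 5-byte limits follow from each encoded byte carrying 7 payload bits, so an n-bit value needs at most ceil(n / 7) bytes. A small sketch of that arithmetic (the class and method names are hypothetical, chosen only for illustration):

```java
/** Hypothetical helper, not part of zchunk: upper bound on the encoded length. */
final class CompIntLimits {
    /** Payload bits carried by each encoded byte (the eighth bit is the flag). */
    static final int PAYLOAD_BITS = 7;

    /** Smallest byte count that can hold valueBits bits: ceil(valueBits / 7). */
    static int maxEncodedLength(int valueBits) {
        return (valueBits + PAYLOAD_BITS - 1) / PAYLOAD_BITS;
    }
}
```

`CompIntLimits.maxEncodedLength(64)` gives 10 and `CompIntLimits.maxEncodedLength(32)` gives 5, matching the numbers above.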
About your comment in zchunk/src/lib/compint.c (lines 61 to 63 in c831e22):
Congratulations, you just reinvented LEB128 (https://en.wikipedia.org/wiki/LEB128), except you reversed the continue-flag!
Thus, you just made us implement our own reversed LEB128-logic 😆!
Actual answer to your comment: Yes, there is.
That wiki page includes pseudo-code that will work just fine, except that you have to reverse the bitmask logic.
By the way, I think it is easier to use bitmasks (FLAG_LAST_BYTE = 0b10000000), because it really shows us that the highest bit is a flag. If you write "128" and don't even assign a constant, everyone reading the code will wonder why there is this magic number.
Java code:
As you can see, I extract the bytes from the file beforehand (it cannot be more than 5 bytes anyway for an integer). I can keep the length of the byte array separately, which is feasible in Java.
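For readers following along, a minimal sketch of what a bitmask-based compress/decompress pair for this "reversed LEB128" could look like. This is not the code originally posted in this issue, nor zchunk's implementation: the class name `CompInt`, the constant `PAYLOAD_MASK` and the 10-byte limit on `long` are assumptions made for illustration, and the `long` is treated as an unsigned 64-bit value.

```java
import java.util.Arrays;

/** Hypothetical sketch of a reversed-flag LEB128 codec. */
final class CompInt {
    static final int FLAG_LAST_BYTE = 0b1000_0000; // set only on the final byte
    static final int PAYLOAD_MASK = 0b0111_1111;   // low 7 bits carry data
    static final int MAX_BYTES = 10;               // ceil(64 / 7)

    /** Encodes value (read as unsigned), least significant 7-bit group first. */
    static byte[] compress(long value) {
        byte[] out = new byte[MAX_BYTES];
        int n = 0;
        do {
            out[n++] = (byte) (value & PAYLOAD_MASK);
            value >>>= 7; // unsigned shift so values with the top bit set terminate
        } while (value != 0);
        out[n - 1] |= FLAG_LAST_BYTE; // mark the last byte instead of the others
        return Arrays.copyOf(out, n);
    }

    /** Decodes one value; fails if no flagged byte appears within 10 bytes. */
    static long decompress(byte[] in) {
        long value = 0;
        for (int i = 0; i < in.length && i < MAX_BYTES; i++) {
            long payload = in[i] & PAYLOAD_MASK;
            if (i == MAX_BYTES - 1 && payload > 1) {
                // The tenth byte may only contribute bit 63.
                throw new ArithmeticException("value needs more than 64 bits");
            }
            value |= payload << (7 * i);
            if ((in[i] & FLAG_LAST_BYTE) != 0) {
                return value;
            }
        }
        throw new IllegalArgumentException("no terminating byte within " + MAX_BYTES + " bytes");
    }
}
```

In this sketch, `CompInt.compress(300L)` yields the two bytes `0x2C 0x82`, and `CompInt.decompress` maps them back to 300. Naming the mask makes the intent of the high bit visible at every use, which is the point made above about avoiding a bare 128.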