-
Notifications
You must be signed in to change notification settings - Fork 418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make sure arrays are bytes aligned in joblib pickles #1254
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1254 +/- ##
==========================================
+ Coverage 93.81% 93.83% +0.01%
==========================================
Files 50 50
Lines 7181 7267 +86
==========================================
+ Hits 6737 6819 +82
- Misses 444 448 +4
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First comment, I haven't actually reviewed the code change in details.
joblib/numpy_pickle.py
Outdated
@@ -95,6 +97,17 @@ def write_array(self, array, pickler): | |||
# pickle protocol. | |||
pickle.dump(array, pickler.file_handle, protocol=2) | |||
else: | |||
try: | |||
current_pos = pickler.file_handle.tell() | |||
alignment = current_pos % 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numpy documentation mentions that some dtypes would rather expect 16 bytes alignment (e.g. float128).
Also, since SIMD-optimized compute kernels would run more efficient (fully vectorized) if the buffers are directly aligned to their vector instructions sizes, maybe we should directly go for 64 bytes alignment (e.g. for AVX 512 which are currently the widest vector instructions).
In the ARM ecosystem there are also 512 bit wide vector instructions, e.g.:
https://www.fujitsu.com/global/products/computing/servers/supercomputer/a64fx/
But from what I read about SVE2 the size can be dynamic by 128 bit (16 bytes) increments.
So I have the feeling that padding by 16 bytes is a necessity to be safe (avoid crashs) but padding by 64 bytes (512 bits) can be a bit helpful for vectorized compute kernels to run more efficiently on such memmaped buffers. Going beyond is probably useless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. However I don't like the implicit padding logic implemented redundantly at write and read time.
Instead I think we should adopt an explicit single byte code to store the effective padding size at write time (without the try
/ except io.UnsupportedOperation
boilerplate for readability's sake):
current_pos = pickler.file_handle.tell()
alignment = (current_pos + 1) % NUMPY_ARRAY_ALIGNMENT_BYTES
padding_size = NUMPY_ARRAY_ALIGNMENT_BYTES - alignment
padding_bytecode = chr(padding_size).encode('ascii')
assert len(padding_bytecode) == 1 # should always hold
pickler.file_handle.write(padding_bytecode) # always written
if padding_size != 0:
padding = b'\x00' * padding_size
pickler.file_handle.write(padding)
then at read time, again without the try
/ except io.UnsupportedOperation
boilerplate:
padding_bytecode = unpickler.file_handle.read(1)
padding_size = ord(padding_bytecode.decode('ascii'))
offset += padding_size
Disclaimer: untested code snippet.
WDYT @lesteve?
EDIT: I think I had made a mistake in a first version of this code.
That does seem cleaner indeed. The only problem I see is that for old pickles (i.e. written without the change in this PR), there is no prefix code (the array bytes starts right away) so you are going to interpret the first array byte as the prefix code. Somehow there needs to be some kind of "sentinel" byte to indicate whether you have a new or old pickle file. |
This is a valid concern indeed. Good catch.
Yes we need to put some joblib-dump format versioning info somewhere at the beginning of the pickle file. But I don't think we can do it without breaking the pickle format. But we are already breaking the pickle format so maybe this is not a problem. Edit: Actually we already do this kind of hack in Each time we change the formatting, we could increase that number and explicitly maintain code-paths for backward compatibility with previous versions. |
…ange of alignment
…handle that does not support tell
I think this is ready enough for a more thorough review, removing the draft status. |
Release 1.2.0 Fix a security issue where eval(pre_dispatch) could potentially run arbitrary code. Now only basic numerics are supported. joblib/joblib#1327 Make sure that joblib works even when multiprocessing is not available, for instance with Pyodide joblib/joblib#1256 Avoid unnecessary warnings when workers and main process delete the temporary memmap folder contents concurrently. joblib/joblib#1263 Fix memory alignment bug for pickles containing numpy arrays. This is especially important when loading the pickle with mmap_mode != None as the resulting numpy.memmap object would not be able to correct the misalignment without performing a memory copy. This bug would cause invalid computation and segmentation faults with native code that would directly access the underlying data buffer of a numpy array, for instance C/C++/Cython code compiled with older GCC versions or some old OpenBLAS written in platform specific assembly. joblib/joblib#1254 Vendor cloudpickle 2.2.0 which adds support for PyPy 3.8+. Vendor loky 3.3.0 which fixes several bugs including: robustly forcibly terminating worker processes in case of a crash (joblib/joblib#1269); avoiding leaking worker processes in case of nested loky parallel calls; reliability spawn the correct number of reusable workers.
fix #563
alternative to #570