-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ms VisualStudio - Assembler speedups on x64 #376
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
GerHobbelt
pushed a commit
to GerHobbelt/zlib
that referenced
this pull request
Nov 20, 2021
This issue I mentioned in madler#370. Optimization code such as crc_folding.c, deflate_quick_sse.c, fill_window_sse.c, and insert_string_sse.c were not being compiled when the compiler was MSVC because the checks for the instrincs were not being done and the HAVE_[TARGET]_INTRIN variables weren't being set. I could have simply set HAVE_[TARGET]_INTRIN variables to ON manually in the case of MSVC, but it is better this way to have one path for all the compilers (that it runs and checks some code for determination). I have just added MSVC code where necessary in the checks. * Rename HAVE_SSE42_INTRIN to HAVE_SSE42CRC_INLINE_ASM. * Added msvc inline asm support to insert_string_sse.c
This commit allows a parameter change even if the input data has not all been compressed and copied to the application output buffer, so long as all of the input data has been compressed to the internal pending output buffer. This also allows an immediate deflateParams change so long as there have been no deflate calls since initialization or reset.
This limits hash table inserts to the available data in the window and to the sliding window size in deflate_stored(). The hash table inserts are deferred until deflateParams() switches to a non-zero compression level.
As it is used in deflateParams().
This is a problem in the odd case that the second argument of LSEEK is a larger type than off_t. Apparently MinGW defines off_t to be 32 bits, but _lseeki64 has a 64-bit second argument. Also undo a previous commit to permit MinGW to use _lseeki64.
Seeing a few percent speedup by using a pointer instead of an assigned structure. This seems to help the compiler to optimize better.
If zlib and/or gzip header processing was requested, but a header was never provided and inflateSync was used successfully, then the inflate state would be inconsistent, trying to compute a check value but with no flags set. This commit sets the inflate mode to raw in this case, since there is no other assumption that can be made if a header was requested but never seen.
This isn't the right type anyway to assure that it contains a pointer. That type would be intptr_t or uintptr_t. However the C99 standard says that those types are optional, so their use would not be portable. This commit simply uses size_t or whatever configure decided to use for size_t. That would be the same length as ptrdiff_t, and so will work just as well. The code checks to see if the length of the type used is the same as the length of a void pointer, so there is already protection against the use of the wrong type. The use of size_t (or ptrdiff_t) will almost always work, as all modern architectures have an array size that is the same as the pointer size. Only old segmented architectures would have to fall back to the slower CRC-32 calculation, where the amount of memory that can be accessed is larger than the maximum array size.
Allegedly the behavior of memcpy() is undefined if the source pointer is NULL, even if the number of bytes to copy is zero.
In addition, there is not sufficient gain from the inflate assembler code to warrant its inclusion.
This bug was reported by Danilo Ramos of Eideticom, Inc. It has lain in wait 13 years before being found! The bug was introduced in zlib 1.2.2.2, with the addition of the Z_FIXED option. That option forces the use of fixed Huffman codes. For rare inputs with a large number of distant matches, the pending buffer into which the compressed data is written can overwrite the distance symbol table which it overlays. That results in corrupted output due to invalid distances, and can result in out-of-bound accesses, crashing the application. The fix here combines the distance buffer and literal/length buffers into a single symbol buffer. Now three bytes of pending buffer space are opened up for each literal or length/distance pair consumed, instead of the previous two bytes. This assures that the pending buffer cannot overwrite the symbol table, since the maximum fixed code compressed length/distance is 31 bits, and since there are four bytes of pending space for every three bytes of symbol space.
inflateSync() is used to skip invalid deflate data, which means that the check value that was being computed is no longer useful. This commit turns off the check value computation, and furthermore allows a successful return if the compressed data terminated in a graceful manner. This commit also fixes a bug in the case that inflateSync() is used before a header is ever processed. In that case, there is no knowledge of a trailer, so the remainder is treated as raw.
Define the macro Z_ARM_CRC32 at compile time to use the ARMv8 (aarch64) crc32x and crc32b instructions. This code does not check for the presence of the crc32 instructions. Those instructions are optional for ARMv8.0, though mandatory for ARMv8.1 and later. The use of the crc32 instructions is about ten times as fast as the software braided calculation of the CRC-32. This can noticeably speed up the decompression of gzip streams.
The ARM crc32 instructions will be used only if an architecture is explicitly specified at compile time that has those instructions. For example, -march=armv8.1-a or -march=armv8-a+crc, or if the machine being compiled on has the instructions, -march=native.
Thank you Adam Richter.
The issue is that unztell64() does not return the correct value if the position in the current file (in the ZIP archive) is beyond 4 GB. The cause is that unzReadCurrentFile() does not account for pfile_in_zip_read_info->stream.total_out at line 1854 of unzip.c wrapping around (it is a 32-bit variable). So, on line 1860 uTotalOutAfter can be *less* than uTotalOutBefore, propagating the wraparound to uOutThis, which in turn is added to pfile_in_zip_read_info->total_out_64. That has the effect of subtracting 4 GB.
Also fix Makefile test target and permit added compile options.
Note intentional switch case fall throughs to avoid gcc warnings.
0xffffffffffffffffULL was causing warnings for C99 usage. The suffix for the constant is not necessary anyway, so this commit removes them.
@CristiFati: Why have you closed your PR? Can you check latest comments of this ticket? |
Open
@Neustradamus : I explained why Id id that:
I briefly went through the issue you pointed out (noticed this was mentioned) but that's it. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
More details on [SO]: CMake zlib build on Windows - (@CristiFati's answer).
Note:
Closing this one as:
No longer applies (actually it no longer applied at creation time)
It's included in Re enable ASM speedups on Win #701
Was created from master branch