-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for issue of 32-bit z_off64_t with MinGW{-w64} while _LARGEFILE64_SOURCE=1 & _LFS64_LARGEFILE=1. #934
Conversation
For ease of testing with a file larger than 4GB I have attached a gzipped file of 5,000,000,000 bytes (4,852,396 bytes compressed). The file consists entirely of bytes with value 0, except the byte at offset 0x100000000 (4GB), in order to help detect that all data is read (instead of internal file pointers of zlib wrapping around at 4GB).
|
Out of curiousity, which specific environment is defining both |
There are a couple of different scenarios where that is the case for Cygwin.
|
Interesting, thanks! I understand this is to protect against a false |
You are welcome. I am not sure what you mean by "false _WIN32 signal". Cygwin implements POSIX and as explained in the second link in my previous post, does not define _WIN32 to avoid certain problems. However, it also allows access to the Win32 API. Especially the headers in the w32api directory contain a lot of conditional code guarded by tests whether _WIN32 is or is not defined. In order to use certain Win32 functionality _WIN32 has to be defined. Originally, including windows.h was taken as an indication that the program intended to use the Win32 API and windef.h would define _WIN32 to make that possible. Apparently the Cygwin developers changed their minds later and concluded that it was better to make the programmer explicitly define it. My point being that the combination of both __CYGWIN__ and _WIN32 defined is a valid state. My PR only concerns the issue of 64-bit variants of zlib functions effectively being 32-bit with MinGW and MinGW-w64 while not introducing new issues for other environments. It does not concern whether zlib works properly with Cygwin with _WIN32 defined or not. With Cygwin I have only tested with _WIN32 undefined and have not encountered any issues. |
By false signal I mean that Either way, since this happens in existing code that will include |
I agree. If there are any issues with that, they can be addressed in a new PR. This PR does not affect any such issue. |
MinGW compiler doesn't define |
All mingw-w64 versions (1.0 and up) support I don't know about old-mingw's |
All mingw compilers define However, old mingw does support |
Thanks @sezero! (also for your mingw-w64 builds) Looking at the the guard: #elif defined(_WIN32) || (defined(__GNUC__) && defined(__int64)) edit: after reading the updated commit message. For old-mingw/mingw-w64, the current patch still requires #elif defined(__MINGW32__)
# define z_off64_t long long
#elif defined(_WIN32)
# define z_off64_t __int64
#elif [...] More clues in curl's |
Applied. Thanks. |
What is applied? Not the patch in this P/R I hope.. |
Doesn't seem to be applied, but hard to say for sure without any references. The last update to Reading existing code, mingw-w64 (default, non- For old-mingw it depends on the |
Applied, but not pushed yet. Yes, the patch here, with that line changed to:
What's wrong with it? |
As I explained above it is broken. |
It has a flaw, but you did not explain. Update: Z_SOLO is a pain. _WIN32 is defined for MinGW{-w64}, so if Z_SOLO is defined, the second part of the condition will not be tested and cause z_off64_t to be defined as __int64 (which is an unknown identifier in that case). Update 2: My updated patch to deal with Z_SOLO is: Update 3: Since this PR has been closed, for the sake of completeness my latest commit. |
The question remains: why use Relying on |
The benefits of my change:
The benefits of your change:
Relying on __int64 is not brittle as I have extensively explained in the description. MinGW, MinGW-w64 and Cygwin all rely on the type in their Win32 API headers. Support for __int64 is also a requirement for compatibility with Visual C++, which is the de facto standard compiler on Windows. zconf.h relies on __int64 being available for a set of compilers that define _WIN32. |
On the other hand Adding 2 more short lines wins in readability (and extendability) IMO. |
Look, I do not care much which implementation @madler will choose in the end, but at least use honest arguments. Suggesting that Microsoft will suddenly decide to remove __int64 is ludicrous. Also, perhaps MinGW, MinGW-w64 and Cygwin would not even follow such a change. It is mere speculation. Regardless, __int64 is used in an enormous codebase that will not go away anytime soon. |
No one suggested that...
Really now?? Geez.. |
Thanks for the contructive feedback. |
Geez. That 'Unsubscribe' button is my friend here. |
This comment was marked as outdated.
This comment was marked as outdated.
@dbjh: Seconding @sezero here with the hope to have this reverted. |
This discussion got off-topic. I identified a longstanding issue in a fundamental feature for which I proposed a change to a single line. I provided the rationale and all relevant background material, including the relevant history of MinGW, the predecessor of MinGW-w64. It took me several tries to also deal with situations that even the author sometimes makes mistakes with, but the final version is decent. I acknowledged that there many ways to solve the issue and I even pointed to your alternative, although it has its own cons. However, the attitude that I was greeted with leaves somewhat of a bad aftertaste, and now that attitude is leading to the point of rejecting a solution. It is unprofessional. This does not help the zlib project. |
Problem
The issue causes the 64-bit functions to effectively use a 32-bit type, with the obvious consequences. For example, gztell64() will not return a correct value for (gzipped) files larger than 4GB if the file pointer has been moved beyond that point.
The problems occurs with MinGW and MinGW-w64 (GCC & Clang).
The 64-bit functions are only available when the defined constants _LARGEFILE64_SOURCE and _LFS64_LARGEFILE are non-zero. In zlib.h it is stated that
and
With MinGW-w64 the problem can be worked around by defining _FILE_OFFSET_BITS as 64 and defining {Z_}HAVE_UNISTD_H, but that does not match with those statements -- only defining _LARGEFILE64_SOURCE and _LFS64_LARGEFILE as non-zero should be enough.
Cause
In case of MinGW, z_off_t is always 32-bit and _FILE_OFFSET_BITS is ignored.
The file zconf.h contains a test,
defined(_WIN32) && !defined(__GNUC__)
to guard a reference to __int64. Perhaps based on the assumption that GNU C compilers do not support that type, while other Windows compilers do. That assumption is at least partially incorrect. To make that piece of preprocessor logic work correctly for MinGW, z_off64_t should be defined as __int64.Solution
As far as I know, all ports of GCC on Windows (i.e. defining _WIN32) are (part of):
Clang is not a port of GCC, but like GCC defines __GNUC__, usually. Except for Cygwin, they all support __int64 on Windows.
I realize there are more details regarding __GNUC__ on Windows. Clang may not always define it (https://blog.conan.io/2022/10/13/Different-flavors-Clang-compiler-Windows.html). If a compiler does not define __GNUC__, it either also supports __int64 or it cannot compile zlib in its current form (if Z_LARGE64=1 and/or _LARGEFILE64_SOURCE=1 and _LFS64_LARGEFILE=1).
Clang as part of Cygwin does not support __int64. Clang as part of MinGW-w64 does.
So, although a positive test for compilers that support __int64 may be better, it is safer to stay as close to
!defined(__GNUC__)
as possible, i.e., just expand the set of compilers that will refer to __int64 with the ones mentioned above. In my pull request I proposedefined(_WIN32) && !defined(__CYGWIN__)
. That solves the problem for MinGW and MinGW-w64 (GCC & Clang) and does not introduce a problem for Cygwin (GCC & Clang).Note that MinGW-w64 (GCC & Clang) does not solely depend on __int64, because there z_off_t can be 64-bit. However, as mentioned above this only happens when _FILE_OFFSET_BITS=64 and {Z_}HAVE_UNISTD_H defined, which should not be a requirement (similar to how it is no requirement with Cygwin or on 64-bit Linux).
The attached code tell.zip can be used to demonstrate the problem and as verification that the problem is not present or has been solved. Example:
tell file_larger_than_4gb
.Update (2024-02-27)
I looked a bit deeper into support for __int64 by GCC and made the following findings:
#define __int64 long long
).The above explains the behavior I described for Clang depending on the environment (MinGW-w64 versus Cygwin).
Availability of __int64 with GCC depends on inclusion of certain types of header file, which can be a problem (compilation error) if Z_SOLO is defined. So, my fix required a test whether __int64 is defined. That is why I changed:
&& !defined(__CYGWIN__)
into:
|| (defined(__GNUC__) && defined(__int64))
There are of course numerous alternative ways to solve this issue, but this way may be the smallest modification.
In the context of this issue zconf.h itself includes the standard C header file unistd.h, so for MinGW and MinGW-w64 availability of __int64 is guaranteed.
If windows.h is included (zconf.h includes windows.h under certain conditions) __int64 is available with Cygwin too. Note that this issue doesn't occur with Cygwin.