Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ms VisualStudio - Assembler speedups on x64 #376

Closed
wants to merge 64 commits into from
Closed
Changes from 1 commit
Commits
Show all changes
64 commits
Select commit Hold shift + click to select a range
8266563
Ms VisualStudio - Assembler speedups on x64
Sep 3, 2018
b23f0fb
Change version number to 1.2.11.1.
madler Jan 16, 2017
6da9ebc
Cygwin does not have _wopen(), so do not create gzopen_w() there.
madler Jan 16, 2017
77d6af6
Permit a deflateParams() parameter change as soon as possible.
madler Jan 16, 2017
5acb571
Limit hash table inserts after switch from stored deflate.
madler Jan 21, 2017
4d5c35b
Fix bug when window full in deflate_stored().
madler Jan 21, 2017
6b76809
Fix CLEAR_HASH macro to be usable as a single statement.
madler Jan 23, 2017
cb0bcfe
Avoid a conversion error in gzseek when off_t type too small.
madler Feb 5, 2017
4cf78d8
Have Makefile return non-zero error code on test failure.
madler Feb 12, 2017
e61cc34
Avoid some conversion warnings in gzread.c and gzwrite.c.
madler Feb 12, 2017
3b15763
Update use of errno for newer Windows CE versions.
madler Feb 12, 2017
de11b3b
Small speedup to inflate [psumbera].
madler Feb 12, 2017
f318b48
Return an error if the gzputs string length can't fit in an int.
madler Feb 12, 2017
8b5818b
Add address checking in clang to -w option of configure.
madler Feb 19, 2017
ba6ec4a
Don't compute check value for raw inflate if asked to validate.
madler Mar 30, 2017
7960bbd
Handle case where inflateSync used when header never processed.
madler Apr 16, 2017
ba91706
Avoid the use of ptrdiff_t.
madler Jun 3, 2017
a9653fe
Avoid an undefined behavior of memcpy() in gzappend().
madler Oct 13, 2017
c53077c
Avoid undefined behaviors of memcpy() in gz*printf().
madler Oct 13, 2017
3f58e6c
Avoid an undefined behavior of memcpy() in _tr_stored_block().
madler Oct 13, 2017
149bbdb
Make the names in functions declarations identical to definitions.
madler Oct 13, 2017
f3f795e
Remove old assembler code in which bugs have manifested.
madler Oct 13, 2017
5113477
Fix deflateEnd() to not report an error at start of raw deflate.
madler Oct 13, 2017
ca7d7d4
Add legal disclaimer to README.
madler Oct 13, 2017
b599703
Emphasize the need to continue decompressing gzip members.
madler Jan 9, 2018
4c07900
Correct the initialization requirements for deflateInit2().
madler Jan 31, 2018
ddca34e
Fix a bug that can crash deflate on some input when using Z_FIXED.
madler Apr 18, 2018
fe019c7
Assure that the number of bits for deflatePrime() is valid.
madler Apr 18, 2018
8f1269b
Use a structure to make globals in enough.c evident.
madler Aug 1, 2018
dfc85b6
Use a macro for the printf format of big_t in enough.c.
madler Aug 1, 2018
01a3b96
Clean up code style in enough.c, update version.
madler Aug 1, 2018
87345a4
Use inline function instead of macro for index in enough.c.
madler Aug 2, 2018
094ad2b
Clarify that prefix codes are counted in enough.c.
madler Aug 4, 2018
3f305a7
Show all the codes for the maximum tables size in enough.c.
madler Aug 4, 2018
1cc711e
Add gznorm.c example, which normalizes gzip files.
madler Oct 6, 2018
2f6044b
Fix the zran.c example to work on a multiple-member gzip file.
madler Oct 8, 2018
4562a25
Add tables for crc32_combine(), to speed it up by a factor of 200.
madler Nov 3, 2018
5702568
Add crc32_combine_gen() and crc32_combine_op() for fast combines.
madler Nov 4, 2018
a24cf67
Speed up software CRC-32 computation by a factor of 1.5 to 3.
madler Dec 11, 2018
b5a2a75
Use atomic test and set, if available, for dynamic CRC tables.
madler Dec 26, 2018
ff0f3d1
Don't bother computing check value after successful inflateSync().
madler Jan 3, 2019
e403bf1
Correct comment in crc32.c.
madler Feb 4, 2019
1da5c4c
Add use of the ARMv8 crc32 instructions when requested.
madler Feb 18, 2019
1c9e904
Use ARM crc32 instructions if the ARM architecture has them.
madler Feb 18, 2019
5e81e4c
Explicitly note that the 32-bit check values are 32 bits.
madler Apr 5, 2019
d793747
Avoid adding empty gzip member after gzflush with Z_FINISH.
madler Apr 14, 2019
9a9c47b
Fix memory leak on error in gzlog.c.
madler May 26, 2019
815fbe2
Fix error in comment on the polynomial representation of a byte.
madler Jul 9, 2019
77fbba8
Clarify gz* function interfaces, referring to parameter names.
madler Aug 31, 2020
d2fb6a8
Change macro name in inflate.c to avoid collision in VxWorks.
madler Sep 17, 2020
36e9f70
Correct typo in blast.c.
madler Jan 18, 2021
8129aa8
Improve portability of contrib/minizip.
madler Feb 10, 2021
5667ca1
Fix indentation in minizip's zip.c.
madler Jul 8, 2021
c7e2c7c
Replace black/white with allow/block. (theresa-m)
madler Jan 1, 2022
31b7702
minizip warning fix if MAXU32 already defined. (gvollant)
madler Jan 1, 2022
660a427
Fix unztell64() in minizip to work past 4GB. (Daniël Hörchner)
madler Jan 1, 2022
aa811fd
Clean up minizip to reduce warnings for testing.
madler Jan 1, 2022
aa1e101
Add fallthrough comments for gcc.
madler Mar 27, 2022
03decff
Eliminate use of ULL constants.
madler Mar 27, 2022
c990af7
Separate out address sanitizing from warnings in configure.
madler Mar 27, 2022
f191714
Remove destructive aspects of make distclean.
madler Mar 27, 2022
38c5f3b
Check for cc masquerading as gcc or clang in configure.
madler Mar 27, 2022
46da6a6
Fix crc32.c to compile local functions only if used.
madler Mar 27, 2022
8a346fc
zlib 1.2.12
madler Mar 27, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Use a structure to make globals in enough.c evident.
  • Loading branch information
madler authored and CristiFati committed Sep 2, 2022
commit 8f1269bd8ef858a3320d45b83be4ef1e3a91d68d
175 changes: 89 additions & 86 deletions examples/enough.c
Original file line number Diff line number Diff line change
Expand Up @@ -167,32 +167,34 @@ struct tab { /* type for been here check */
*/

/* Globals to avoid propagating constants or constant pointers recursively */
local int max; /* maximum allowed bit length for the codes */
local int root; /* size of base code table in bits */
local int large; /* largest code table so far */
local size_t size; /* number of elements in num and done */
local int *code; /* number of symbols assigned to each bit length */
local big_t *num; /* saved results array for code counting */
local struct tab *done; /* states already evaluated array */
struct {
int max; /* maximum allowed bit length for the codes */
int root; /* size of base code table in bits */
int large; /* largest code table so far */
size_t size; /* number of elements in num and done */
int *code; /* number of symbols assigned to each bit length */
big_t *num; /* saved results array for code counting */
struct tab *done; /* states already evaluated array */
} g;

/* Index function for num[] and done[] */
#define INDEX(i,j,k) (((size_t)((i-1)>>1)*((i-2)>>1)+(j>>1)-1)*(max-1)+k-1)
#define INDEX(i,j,k) (((size_t)((i-1)>>1)*((i-2)>>1)+(j>>1)-1)*(g.max-1)+k-1)

/* Free allocated space. Uses globals code, num, and done. */
local void cleanup(void)
{
size_t n;

if (done != NULL) {
for (n = 0; n < size; n++)
if (done[n].len)
free(done[n].vec);
free(done);
if (g.done != NULL) {
for (n = 0; n < g.size; n++)
if (g.done[n].len)
free(g.done[n].vec);
free(g.done);
}
if (num != NULL)
free(num);
if (code != NULL)
free(code);
if (g.num != NULL)
free(g.num);
if (g.code != NULL)
free(g.code);
}

/* Return the number of possible Huffman codes using bit patterns of lengths
Expand All @@ -214,11 +216,11 @@ local big_t count(int syms, int len, int left)
return 1;

/* note and verify the expected state */
assert(syms > left && left > 0 && len < max);
assert(syms > left && left > 0 && len < g.max);

/* see if we've done this one already */
index = INDEX(syms, left, len);
got = num[index];
got = g.num[index];
if (got)
return got; /* we have -- return the saved result */

Expand All @@ -231,8 +233,8 @@ local big_t count(int syms, int len, int left)
/* we can use at most this many bit patterns, lest there not be enough
available for the remaining symbols at the maximum length (if there were
no limit to the code length, this would become: most = left - 1) */
most = (((code_t)left << (max - len)) - syms) /
(((code_t)1 << (max - len)) - 1);
most = (((code_t)left << (g.max - len)) - syms) /
(((code_t)1 << (g.max - len)) - 1);

/* count all possible codes from this juncture and add them up */
sum = 0;
Expand All @@ -247,7 +249,7 @@ local big_t count(int syms, int len, int left)
assert(sum != 0);

/* save the result and return it */
num[index] = sum;
g.num[index] = sum;
return sum;
}

Expand All @@ -265,14 +267,14 @@ local int beenhere(int syms, int len, int left, int mem, int rem)

/* point to vector for (syms,left,len), bit in vector for (mem,rem) */
index = INDEX(syms, left, len);
mem -= 1 << root;
mem -= 1 << g.root;
offset = (mem >> 3) + rem;
offset = ((offset * (offset + 1)) >> 1) + rem;
bit = 1 << (mem & 7);

/* see if we've been here */
length = done[index].len;
if (offset < length && (done[index].vec[offset] & bit) != 0)
length = g.done[index].len;
if (offset < length && (g.done[index].vec[offset] & bit) != 0)
return 1; /* done this! */

/* we haven't been here before -- set the bit to show we have now */
Expand All @@ -284,14 +286,15 @@ local int beenhere(int syms, int len, int left, int mem, int rem)
do {
length <<= 1;
} while (length <= offset);
vector = realloc(done[index].vec, length);
vector = realloc(g.done[index].vec, length);
if (vector != NULL)
memset(vector + done[index].len, 0, length - done[index].len);
memset(vector + g.done[index].len, 0,
length - g.done[index].len);
}

/* otherwise we need to make a new vector and zero it out */
else {
length = 1 << (len - root);
length = 1 << (len - g.root);
while (length <= offset)
length <<= 1;
vector = calloc(length, sizeof(char));
Expand All @@ -305,12 +308,12 @@ local int beenhere(int syms, int len, int left, int mem, int rem)
}

/* install the new vector */
done[index].len = length;
done[index].vec = vector;
g.done[index].len = length;
g.done[index].vec = vector;
}

/* set the bit */
done[index].vec[offset] |= bit;
g.done[index].vec[offset] |= bit;
return 0;
}

Expand All @@ -328,29 +331,29 @@ local void examine(int syms, int len, int left, int mem, int rem)
/* see if we have a complete code */
if (syms == left) {
/* set the last code entry */
code[len] = left;
g.code[len] = left;

/* complete computation of memory used by this code */
while (rem < left) {
left -= rem;
rem = 1 << (len - root);
rem = 1 << (len - g.root);
mem += rem;
}
assert(rem == left);

/* if this is a new maximum, show the entries used and the sub-code */
if (mem > large) {
large = mem;
if (mem > g.large) {
g.large = mem;
printf("max %d: ", mem);
for (use = root + 1; use <= max; use++)
if (code[use])
printf("%d[%d] ", code[use], use);
for (use = g.root + 1; use <= g.max; use++)
if (g.code[use])
printf("%d[%d] ", g.code[use], use);
putchar('\n');
fflush(stdout);
}

/* remove entries as we drop back down in the recursion */
code[len] = 0;
g.code[len] = 0;
return;
}

Expand All @@ -367,32 +370,32 @@ local void examine(int syms, int len, int left, int mem, int rem)
/* we can use at most this many bit patterns, lest there not be enough
available for the remaining symbols at the maximum length (if there were
no limit to the code length, this would become: most = left - 1) */
most = (((code_t)left << (max - len)) - syms) /
(((code_t)1 << (max - len)) - 1);
most = (((code_t)left << (g.max - len)) - syms) /
(((code_t)1 << (g.max - len)) - 1);

/* occupy least table spaces, creating new sub-tables as needed */
use = least;
while (rem < use) {
use -= rem;
rem = 1 << (len - root);
rem = 1 << (len - g.root);
mem += rem;
}
rem -= use;

/* examine codes from here, updating table space as we go */
for (use = least; use <= most; use++) {
code[len] = use;
g.code[len] = use;
examine(syms - use, len + 1, (left - use) << 1,
mem + (rem ? 1 << (len - root) : 0), rem << 1);
mem + (rem ? 1 << (len - g.root) : 0), rem << 1);
if (rem == 0) {
rem = 1 << (len - root);
rem = 1 << (len - g.root);
mem += rem;
}
rem--;
}

/* remove entries as we drop back down in the recursion */
code[len] = 0;
g.code[len] = 0;
}

/* Look at all sub-codes starting with root + 1 bits. Look at only the valid
Expand All @@ -407,30 +410,30 @@ local void enough(int syms)
size_t index; /* index of this case in *num */

/* clear code */
for (n = 0; n <= max; n++)
code[n] = 0;
for (n = 0; n <= g.max; n++)
g.code[n] = 0;

/* look at all (root + 1) bit and longer codes */
large = 1 << root; /* base table */
if (root < max) /* otherwise, there's only a base table */
g.large = 1 << g.root; /* base table */
if (g.root < g.max) /* otherwise, there's only a base table */
for (n = 3; n <= syms; n++)
for (left = 2; left < n; left += 2)
{
/* look at all reachable (root + 1) bit nodes, and the
resulting codes (complete at root + 2 or more) */
index = INDEX(n, left, root + 1);
if (root + 1 < max && num[index]) /* reachable node */
examine(n, root + 1, left, 1 << root, 0);
index = INDEX(n, left, g.root + 1);
if (g.root + 1 < g.max && g.num[index]) /* reachable node */
examine(n, g.root + 1, left, 1 << g.root, 0);

/* also look at root bit codes with completions at root + 1
bits (not saved in num, since complete), just in case */
if (num[index - 1] && n <= left << 1)
examine((n - left) << 1, root + 1, (n - left) << 1,
1 << root, 0);
if (g.num[index - 1] && n <= left << 1)
examine((n - left) << 1, g.root + 1, (n - left) << 1,
1 << g.root, 0);
}

/* done */
printf("done: maximum of %d table entries\n", large);
printf("done: maximum of %d table entries\n", g.large);
}

/*
Expand Down Expand Up @@ -464,66 +467,66 @@ int main(int argc, char **argv)
code_t word; /* for counting bits in code_t */

/* set up globals for cleanup() */
code = NULL;
num = NULL;
done = NULL;
g.code = NULL;
g.num = NULL;
g.done = NULL;

/* get arguments -- default to the deflate literal/length code */
syms = 286;
root = 9;
max = 15;
g.root = 9;
g.max = 15;
if (argc > 1) {
syms = atoi(argv[1]);
if (argc > 2) {
root = atoi(argv[2]);
g.root = atoi(argv[2]);
if (argc > 3)
max = atoi(argv[3]);
g.max = atoi(argv[3]);
}
}
if (argc > 4 || syms < 2 || root < 1 || max < 1) {
if (argc > 4 || syms < 2 || g.root < 1 || g.max < 1) {
fputs("invalid arguments, need: [sym >= 2 [root >= 1 [max >= 1]]]\n",
stderr);
return 1;
}

/* if not restricting the code length, the longest is syms - 1 */
if (max > syms - 1)
max = syms - 1;
if (g.max > syms - 1)
g.max = syms - 1;

/* determine the number of bits in a code_t */
for (n = 0, word = 1; word; n++, word <<= 1)
;

/* make sure that the calculation of most will not overflow */
if (max > n || (code_t)(syms - 2) >= (((code_t)0 - 1) >> (max - 1))) {
if (g.max > n || (code_t)(syms - 2) >= (((code_t)0 - 1) >> (g.max - 1))) {
fputs("abort: code length too long for internal types\n", stderr);
return 1;
}

/* reject impossible code requests */
if ((code_t)(syms - 1) > ((code_t)1 << max) - 1) {
if ((code_t)(syms - 1) > ((code_t)1 << g.max) - 1) {
fprintf(stderr, "%d symbols cannot be coded in %d bits\n",
syms, max);
syms, g.max);
return 1;
}

/* allocate code vector */
code = calloc(max + 1, sizeof(int));
if (code == NULL) {
g.code = calloc(g.max + 1, sizeof(int));
if (g.code == NULL) {
fputs("abort: unable to allocate enough memory\n", stderr);
return 1;
}

/* determine size of saved results array, checking for overflows,
allocate and clear the array (set all to zero with calloc()) */
if (syms == 2) /* iff max == 1 */
num = NULL; /* won't be saving any results */
g.num = NULL; /* won't be saving any results */
else {
size = syms >> 1;
if (size > ((size_t)0 - 1) / (n = (syms - 1) >> 1) ||
(size *= n, size > ((size_t)0 - 1) / (n = max - 1)) ||
(size *= n, size > ((size_t)0 - 1) / sizeof(big_t)) ||
(num = calloc(size, sizeof(big_t))) == NULL) {
g.size = syms >> 1;
if (g.size > ((size_t)0 - 1) / (n = (syms - 1) >> 1) ||
(g.size *= n, g.size > ((size_t)0 - 1) / (n = g.max - 1)) ||
(g.size *= n, g.size > ((size_t)0 - 1) / sizeof(big_t)) ||
(g.num = calloc(g.size, sizeof(big_t))) == NULL) {
fputs("abort: unable to allocate enough memory\n", stderr);
cleanup();
return 1;
Expand All @@ -543,25 +546,25 @@ int main(int argc, char **argv)
printf("%llu %d-codes\n", got, n);
}
printf("%llu total codes for 2 to %d symbols", sum, syms);
if (max < syms - 1)
printf(" (%d-bit length limit)\n", max);
if (g.max < syms - 1)
printf(" (%d-bit length limit)\n", g.max);
else
puts(" (no length limit)");

/* allocate and clear done array for beenhere() */
if (syms == 2)
done = NULL;
else if (size > ((size_t)0 - 1) / sizeof(struct tab) ||
(done = calloc(size, sizeof(struct tab))) == NULL) {
g.done = NULL;
else if (g.size > ((size_t)0 - 1) / sizeof(struct tab) ||
(g.done = calloc(g.size, sizeof(struct tab))) == NULL) {
fputs("abort: unable to allocate enough memory\n", stderr);
cleanup();
return 1;
}

/* find and show maximum inflate table usage */
if (root > max) /* reduce root to max length */
root = max;
if ((code_t)syms < ((code_t)1 << (root + 1)))
if (g.root > g.max) /* reduce root to max length */
g.root = g.max;
if ((code_t)syms < ((code_t)1 << (g.root + 1)))
enough(syms);
else
puts("cannot handle minimum code lengths > root");
Expand Down