Simplify chunking in the copy ladder here
As it turns out, peeling off the remainder with so many branches
inflated the code size enough that this function wouldn't inline
without some fairly aggressive optimization flags. Looping only over
vector-sized chunks here keeps the loop body small, and ending with the
byte-by-byte copy idiom gives the compiler some flexibility, since it is
likely to recognize that idiom and optimize it.
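The shape of the simplified ladder can be sketched in isolation as follows. This is a minimal illustration, not the zlib-ng code itself: the helper name `copy_ladder_sketch` is hypothetical, the caller is assumed to guarantee that the two regions are at least `tocopy` bytes apart, and the 8-byte step is an assumption about the lines elided between the two hunks of the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of zlib-ng. Copies `tocopy` bytes from
 * `from` to `out` with the simplified ladder. The caller must ensure the
 * regions are at least `tocopy` bytes apart, so that every fixed-size
 * memcpy below works on non-overlapping memory. */
static inline uint8_t *copy_ladder_sketch(uint8_t *out, const uint8_t *from, size_t tocopy) {
    assert(out != NULL && from != NULL);

    /* Vector-sized chunks: the only fixed-size step that loops. */
    while (tocopy >= 16) {
        memcpy(out, from, 16);
        out += 16;
        from += 16;
        tocopy -= 16;
    }

    /* At most 15 bytes remain, so one 8-byte step is enough
     * (assumed from the elided part of the diff). */
    if (tocopy >= 8) {
        memcpy(out, from, 8);
        out += 8;
        from += 8;
        tocopy -= 8;
    }

    /* At most 7 bytes remain, so one 4-byte step is enough. */
    if (tocopy >= 4) {
        memcpy(out, from, 4);
        out += 4;
        from += 4;
        tocopy -= 4;
    }

    /* Byte-by-byte tail of at most 3 bytes; this replaces the separate
     * 2-byte and 1-byte branches. */
    while (tocopy--) {
        *out++ = *from++;
    }

    return out;
}
```

Compared with the previous version, the 32-byte loop and the 2-byte branch are gone, which leaves one loop, two branches, and a short tail loop in the body.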
KungFuJesus authored and Dead2 committed Sep 26, 2024
1 parent 8a1205f commit b80eb4c
Showing 1 changed file with 2 additions and 21 deletions.
23 changes: 2 additions & 21 deletions inflate_p.h
@@ -174,25 +174,13 @@ static inline uint8_t* chunkcopy_safe(uint8_t *out, uint8_t *from, uint64_t len,
      * behind or lookahead distance. */
     uint64_t non_olap_size = llabs(from - out); // llabs vs labs for compatibility with windows
 
-    memcpy(out, from, (size_t)non_olap_size);
-    out += non_olap_size;
-    from += non_olap_size;
-    len -= non_olap_size;
-
     /* So this doesn't give use a worst case scenario of function calls in a loop,
      * we want to instead break this down into copy blocks of fixed lengths */
     while (len) {
         tocopy = MIN(non_olap_size, len);
         len -= tocopy;
 
-        while (tocopy >= 32) {
-            memcpy(out, from, 32);
-            out += 32;
-            from += 32;
-            tocopy -= 32;
-        }
-
-        if (tocopy >= 16) {
+        while (tocopy >= 16) {
             memcpy(out, from, 16);
             out += 16;
             from += 16;
@@ -213,14 +201,7 @@ static inline uint8_t* chunkcopy_safe(uint8_t *out, uint8_t *from, uint64_t len,
             tocopy -= 4;
         }
 
-        if (tocopy >= 2) {
-            memcpy(out, from, 2);
-            out += 2;
-            from += 2;
-            tocopy -= 2;
-        }
-
-        if (tocopy) {
+        while (tocopy--) {
             *out++ = *from++;
         }
     }
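The outer `while (len)` loop caps every pass at `non_olap_size`, the distance between source and destination, so each pass copies between regions that do not overlap. A minimal sketch of that idea follows, under stated assumptions: the helper name `overlap_copy_sketch` is hypothetical, only the case where the destination lies after the source is handled, and a plain variable-length `memcpy` stands in for the fixed-size ladder that the real function uses to avoid function calls in a loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of zlib-ng. Emulates a byte-by-byte
 * forward copy of `len` bytes where the source starts `dist` bytes
 * before `out` (from == out - dist), as in an LZ77 back-reference. */
static inline uint8_t *overlap_copy_sketch(uint8_t *out, size_t dist, size_t len) {
    assert(dist > 0);
    const uint8_t *from = out - dist;

    while (len) {
        /* Never copy more than `dist` bytes at once: source and
         * destination are exactly `dist` bytes apart, so a segment of
         * this size cannot overlap itself. */
        size_t tocopy = dist < len ? dist : len;

        /* Stand-in for the fixed-size copy ladder (assumption). */
        memcpy(out, from, tocopy);

        out += tocopy;
        from += tocopy;
        len -= tocopy;
    }

    return out;
}
```

For example, with a buffer that starts with `abc`, calling `overlap_copy_sketch(buf + 3, 3, 9)` repeats the three-byte pattern and leaves `abcabcabcabc` in the first twelve bytes, which is the result a byte-by-byte copy would produce.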
