Ushering out strlcpy()
Posted Aug 31, 2022 3:51 UTC (Wed) by wtarreau (subscriber, #51152)
In reply to: Ushering out strlcpy() by tialaramex
Parent article: Ushering out strlcpy()
Optimized grep implementations, such as those relying on the Boyer-Moore algorithm, are extremely effective at looking for patterns in large blocks (a complete file), because the initialization cost is very quickly amortized (often after just a few lines).
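To make the initialization cost concrete, here is a minimal sketch of the Horspool simplification of Boyer-Moore (not what any particular grep actually ships). The names `struct bmh`, `bmh_init()` and `bmh_find()` are made up for illustration. The point is that `bmh_init()` has to fill a 256-entry table before the first byte of input is examined, which is negligible on a whole file but not on a 100-byte block.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical searcher state: the pattern plus a per-byte skip table. */
struct bmh {
    const unsigned char *pat;
    size_t len;
    size_t shift[UCHAR_MAX + 1];
};

/* One-time setup: this is the cost that has to be amortized. */
static void bmh_init(struct bmh *s, const void *pat, size_t len)
{
    assert(len > 0);
    s->pat = pat;
    s->len = len;
    /* Bytes absent from the pattern allow skipping a full pattern length. */
    for (size_t i = 0; i <= UCHAR_MAX; i++)
        s->shift[i] = len;
    /* Bytes present (except the last position) allow a shorter skip. */
    for (size_t i = 0; i + 1 < len; i++)
        s->shift[s->pat[i]] = len - 1 - i;
}

/* Returns a pointer to the first match in buf[0..n), or NULL. */
static const void *bmh_find(const struct bmh *s, const void *buf, size_t n)
{
    const unsigned char *h = buf;
    size_t pos = 0;

    while (pos + s->len <= n) {
        if (memcmp(h + pos, s->pat, s->len) == 0)
            return h + pos;
        /* Skip according to the byte aligned with the pattern's last byte. */
        pos += s->shift[h[pos + s->len - 1]];
    }
    return NULL;
}
```

On a long input most steps skip several bytes at once, so the table pays for itself quickly; on a short one the setup loop can cost more than a naive scan.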
HTTP headers are painful because they are rarely large enough to amortize many optimizations and must be processed at very high frequency, so they often suffer from the cost of even a single function call or a likely()/unlikely() construct.
I wrote a log parser quite a long time ago, which I still use, that processes between 2 and 4 GB of logs per second, looking for certain fields or producing statistics on output. It shared similar principles, and all the time was spent in fgets() looking for the LF! I reimplemented this in asm for i386 back then, then for x86_64, and later added an option to use memchr() instead once glibc improved and could sometimes be faster, compensating for the extra cost of the function call; the differences can be significant. I then adopted similar mechanisms for HTTP headers and gained ~10% in overall performance, which probably indicates +50-100% on the parsing alone, so it does count for quite a lot.
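As an illustration of the memchr() variant only (this is not the parser's actual code, and `count_lines()` is a made-up name), a minimal sketch of finding the LFs in a buffer that has already been read into memory could look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count LF-terminated lines in buf[0..len), letting memchr() find each LF. */
static size_t count_lines(const char *buf, size_t len)
{
    assert(buf != NULL);

    const char *p = buf;
    const char *end = buf + len;
    size_t lines = 0;

    while (p < end) {
        /* One function call per line: cheap on long lines, costly on short ones. */
        const char *lf = memchr(p, '\n', (size_t)(end - p));
        if (!lf)
            break;      /* trailing data without an LF is not a complete line */
        lines++;
        p = lf + 1;     /* a real parser would process [p, lf) here */
    }
    return lines;
}
```

Whether this beats a hand-written inline loop depends on the line length and on the libc in use, which is why having both behind an option makes sense.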
For the HTTP header example above, one way to be faster is to take into account the abysmally low probability of meeting the pattern. On average-sized headers it could be faster to run the fast grep twice over the block if you know it fits in the L1 cache, searching for "\n " and then "\n\t" in the whole block at once. Since most headers will be at least ~100 bytes long, this could amortize the cost of initializing the lookup function.
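A rough sketch of the two-pass idea, assuming the pattern of interest is a folded continuation line (LF followed by a space or a tab). Both `find_pair()` and `has_folded_header()` are hypothetical names, and `find_pair()` is only a portable stand-in for whatever optimized searcher would really be used:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a fast searcher: first occurrence of bytes (a, b) in
 * buf[0..len), or NULL if the pair is absent. */
static const char *find_pair(const char *buf, size_t len, char a, char b)
{
    const char *p = buf;
    const char *end = buf + len;

    while (end - p >= 2) {
        /* The last byte cannot start a pair, so leave it out of the scan. */
        const char *hit = memchr(p, a, (size_t)(end - p) - 1);
        if (!hit)
            return NULL;
        if (hit[1] == b)
            return hit;
        p = hit + 1;
    }
    return NULL;
}

/* Two passes over the whole header block instead of a per-line check.
 * In the common case both passes find nothing and no per-line work is done. */
static bool has_folded_header(const char *hdrs, size_t len)
{
    assert(hdrs != NULL);
    return find_pair(hdrs, len, '\n', ' ') != NULL ||
           find_pair(hdrs, len, '\n', '\t') != NULL;
}
```

The second pass only pays off if the block is still in L1 when it runs, so this is exactly the kind of change that has to be benchmarked on real traffic before being kept.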
But I definitely agree with you that in any case this must be measured and tested against various pathological patterns (that's what I do all day long: throwing away lots of apparently smart code changes that don't pass the performance test on real data).