At Cybozu, some of our products are developed in C++. As a countermeasure against unknown vulnerabilities such as buffer overruns, we build important components with AddressSanitizer enabled, even for the binaries that run in production.
When we updated the compiler used for one of these products from gcc 5.3.0 to gcc 7.5.0, performance degraded. The cause lies outside the product code, so tracking down the root cause looked difficult. Let's chase the regression with perf and bpftrace.
There are plenty of good articles online about AddressSanitizer, bpftrace, and perf, so this article does not explain how to use them.
To reproduce the regression on gcc 7.5.0, I prepared the following code.
#include <string.h>
#include <iostream>
#include <string>
#include <vector>

int main() {
    const int L = 1000000;
    const int N = 1000;
    const int R = 100;

    std::vector<std::string> bufs(N, std::string(L, 'A'));

    int hash = 0;
    for (int r = 0; r < R; r++) {
        for (int i = 0; i < N; i++) {
            int sum = strlen(bufs[i].c_str() + r);
            const char* found = strchr(bufs[i].c_str(), 'A');
            hash += (sum + *found) % 10;
        }
    }
    std::cout << hash << std::endl;
}
First, compile with AddressSanitizer enabled (-fsanitize=address -static-libasan) and measure the execution time.
$ gcc -v
# => gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
$ g++ -DNDEBUG -std=c++11 -Wall -Wextra -fsanitize=address -O3 -g -static-libasan samplecode.cpp
$ time ./a.out
450000

real    0m36.898s
user    0m36.426s
sys     0m0.380s
For comparison, compile with AddressSanitizer disabled and measure the execution time.
$ g++ -DNDEBUG -std=c++11 -Wall -Wextra -O3 -g samplecode.cpp
$ time ./a.out
450000

real    0m5.058s
user    0m4.805s
sys     0m0.253s
The AddressSanitizer write-up on its performance impact says:
The average slowdown introduced by full instrumentation (red) is 1.73. When only writes are instrumented (blue), the average slowdown is 1.26
Given that, it seems safe to conclude that what we are seeing is a slowdown AddressSanitizer does not intend.
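To make the gap concrete, the two `real` times measured above can be plugged into a simple slowdown ratio. This is only a back-of-the-envelope calculation on a single run of each binary, not a benchmark:

```latex
\[
\text{slowdown}
  = \frac{t_{\text{ASan}}}{t_{\text{plain}}}
  = \frac{36.898\ \text{s}}{5.058\ \text{s}}
  \approx 7.3
\]
```

That is more than four times the average slowdown of 1.73 quoted above for full instrumentation.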
Finding the hot spot of the regression with perf
Let's profile the AddressSanitizer-enabled binary with the perf command.
$ sudo perf record ./a.out --stdio
$ sudo perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# Overhead  Command  Shared Object       Symbol
# ........  .......  ..................  ..................................
#
    83.86%  a.out    a.out               [.] __sanitizer::internal_strlen
    12.70%  a.out    libc-2.27.so        [.] __strlen_avx2
     1.36%  a.out    a.out               [.] __sanitizer::mem_is_zero
The perf report shows that __sanitizer::internal_strlen accounts for most of the regression.
Searching the gcc repository shows that __sanitizer::internal_strlen is part of the AddressSanitizer code.
Why __sanitizer::internal_strlen is slow is covered briefly at the end of this article; we will not go any deeper than that here.
Narrowing down the hot code with bpftrace
bpftrace can trace calls inside a binary, so we will use that.
Let's list the uprobes bpftrace can attach to and narrow them down to the strlen-related ones.
$ sudo bpftrace -l 'uprobe:/home/yokotaso/a.out' | grep strlen
uprobe:/home/yokotaso/a.out:_ZN11__sanitizer15internal_strlenEPKc
uprobe:/home/yokotaso/a.out:__asan_internal_strlen
uprobe:/home/yokotaso/a.out:__interceptor_strlen
uprobe:/home/yokotaso/a.out:__interceptor_strlen.part.24
uprobe:/home/yokotaso/a.out:strlen
Checking with nm and objdump shows that _ZN11__sanitizer15internal_strlenEPKc is the mangled name of __sanitizer::internal_strlen, the function behind the regression.
$ nm a.out | grep _ZN11__sanitizer15internal_strlenEPKc
00000000000d87f0 t _ZN11__sanitizer15internal_strlenEPKc
$ objdump -CSlw -M intel a.out | c++filt | grep 00000000000d87f0 -A10
00000000000d87f0 <__sanitizer::internal_strlen(char const*)>:
__sanitizer::internal_strlen(char const*)():
   d87f0:  31 c0                        xor    eax,eax
   d87f2:  80 3f 00                     cmp    BYTE PTR [rdi],0x0
   d87f5:  74 19                        je     d8810 <__sanitizer::internal_strlen(char const*)+0x20>
   d87f7:  66 0f 1f 84 00 00 00 00 00   nop    WORD PTR [rax+rax*1+0x0]
   d8800:  48 83 c0 01                  add    rax,0x1
   d8804:  80 3c 07 00                  cmp    BYTE PTR [rdi+rax*1],0x0
   d8808:  75 f6                        jne    d8800 <__sanitizer::internal_strlen(char const*)+0x10>
   d880a:  f3 c3                        repz ret
   d880c:  0f 1f 40 00                  nop    DWORD PTR [rax+0x0]
Now that we know what _ZN11__sanitizer15internal_strlenEPKc is, let's go back to the bpftrace probe list.
__interceptor_strlen looks like it installs some kind of hook, so let's look into it.
The __interceptor macro definition shows that preprocessing is hooked in before the body of the called function runs.
The implementation of the strlen interceptor shows the same thing: extra work is hooked in around the call to strlen.
INTERCEPTOR(SIZE_T, strlen, const char *s) {
  // Sometimes strlen is called prior to InitializeCommonInterceptors,
  // in which case the REAL(strlen) typically used in
  // COMMON_INTERCEPTOR_ENTER will fail.  We use internal_strlen here
  // to handle that.
  if (COMMON_INTERCEPTOR_NOTHING_IS_INITIALIZED)
    return internal_strlen(s);
  void *ctx;
  COMMON_INTERCEPTOR_ENTER(ctx, strlen, s);
  SIZE_T result = REAL(strlen)(s);
  if (common_flags()->intercept_strlen)
    COMMON_INTERCEPTOR_READ_RANGE(ctx, s, result + 1);
  return result;
}
Let's track down the cases where an __interceptor_* function ends up calling _ZN11__sanitizer15internal_strlenEPKc.
bpftrace can count how many times specific symbols in a binary are called. That should give us a clue about the caller.
$ cat strlencount.bc
uprobe:/home/yokotaso/a.out:_ZN11__sanitizer15internal_strlenEPKc,
uprobe:/home/yokotaso/a.out:__interceptor_*
{
  @[probe] = count();
}
$ sudo bpftrace strlencount.bc
Attaching 385 probes...
^C
... Omitted ...
@[uprobe:/home/yokotaso/a.out:__interceptor_memcpy]: 1000
@[uprobe:/home/yokotaso/a.out:__interceptor_strchr]: 100000
@[uprobe:/home/yokotaso/a.out:__interceptor_index]: 100000
@[uprobe:/home/yokotaso/a.out:__interceptor_strlen]: 100000
@[uprobe:/home/yokotaso/a.out:_ZN11__sanitizer15internal_strlenEPKc]: 100026
At the perf stage strlen looked like the culprit, but both __interceptor_strlen and __interceptor_strchr are called 100,000 times here, so strchr is now a suspect as well.
strlen falsely accused, and the real culprit: strchr
Let's modify the reproduction code so that it calls only strlen, then compile and run it.
$ sudo bpftrace strlencount.bc
Attaching 385 probes...
^C
... Omitted ...
@[uprobe:/home/yokotaso/a.out:__interceptor_malloc]: 11
@[uprobe:/home/yokotaso/a.out:_ZN11__sanitizer15internal_strlenEPKc]: 26
@[uprobe:/home/yokotaso/a.out:__interceptor_memcpy]: 1000
@[uprobe:/home/yokotaso/a.out:__interceptor_strlen]: 100000
Oops! strlen was not the culprit: internal_strlen is called only 26 times.
Next, let's modify the reproduction code so that it calls only strchr, then compile and run it.
$ sudo bpftrace strlencount.bc
Attaching 385 probes...
^C
... Omitted ...
@[uprobe:/home/yokotaso/a.out:__interceptor_memcpy]: 1000
@[uprobe:/home/yokotaso/a.out:__interceptor_strlen]: 100000
@[uprobe:/home/yokotaso/a.out:__interceptor_index]: 100000
@[uprobe:/home/yokotaso/a.out:__interceptor_strchr]: 100000
@[uprobe:/home/yokotaso/a.out:_ZN11__sanitizer15internal_strlenEPKc]: 100026
The real culprit is... strchr! It was you! The evidence is in!
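The article does not show the modified reproduction code, so the following is only a sketch of what the strlen-only and strchr-only variants might look like. The `Mode` enum, the `run_repro` name, and the size parameters are hypothetical and were added for illustration; the loop body follows the original reproduction code.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical switch for illustration; not taken from the article.
enum class Mode { Both, StrlenOnly, StrchrOnly };

// Same loop shape as the reproduction code, with the sizes passed in so a
// quick run does not need the full 1000 x 1,000,000-byte buffers.
int run_repro(Mode mode, int length, int buffers, int rounds) {
    assert(length > rounds);  // keeps c_str() + r inside the string
    std::vector<std::string> bufs(buffers, std::string(length, 'A'));

    int hash = 0;
    for (int r = 0; r < rounds; r++) {
        for (int i = 0; i < buffers; i++) {
            const char* s = bufs[i].c_str();
            int sum = 0;
            int first = 0;
            if (mode != Mode::StrchrOnly) {
                sum = static_cast<int>(std::strlen(s + r));
            }
            if (mode != Mode::StrlenOnly) {
                first = *std::strchr(s, 'A');  // always matches at index 0
            }
            hash += (sum + first) % 10;
        }
    }
    return hash;
}
```

Calling `run_repro(Mode::StrchrOnly, 1000000, 1000, 100)` from a `main` would correspond to the sizes used in the article. Note that the strchr call always finds 'A' at the very first byte, so any time spent beyond that first byte comes from the instrumentation rather than from the search itself.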
Looking at the implementation, the strchr interceptor unconditionally calls internal_strlen, the function that is hurting performance.
Looking at AddressSanitizer's strlen interceptor, on the other hand, internal_strlen is called only under a specific condition.
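The difference between the two interceptors can be modeled in a few lines. This is a simplified sketch, not the libsanitizer source: `bytewise_strlen`, `modeled_strchr_wrapper`, and the `g_scanned_bytes` counter are hypothetical names introduced only to make the cost visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Bytes visited by the byte-wise scan; exists only to expose the cost.
static std::size_t g_scanned_bytes = 0;

// Byte-at-a-time length scan, the same shape as internal_strlen.
std::size_t bytewise_strlen(const char* s) {
    std::size_t i = 0;
    while (s[i]) i++;
    g_scanned_bytes += i + 1;  // i characters plus the terminator
    return i;
}

// Hypothetical model of a wrapper that measures the whole string on every
// call, no matter where the match is found.
const char* modeled_strchr_wrapper(const char* s, int c) {
    assert(s != nullptr);
    const char* result = std::strchr(s, c);  // real search, stops at the first match
    std::size_t len = bytewise_strlen(s);    // unconditional full scan
    (void)len;  // a checker could use len to validate the accessed range
    return result;
}
```

With the reproduction code's 1,000,000-byte strings, a wrapper of this shape walks the full string on each of the 100,000 strchr calls even though the match is at index 0, which is consistent with internal_strlen dominating the perf profile.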
In reality, the investigation was nowhere near this clear-headed. At the perf stage I fell into the cognitive-bias swamp of "strlen is to blame", so I ended up slogging through gcc's commit history to find a lead. bpftrace was what confirmed that strchr was at fault. Next time a similar problem comes up, bpftrace and perf should make it much easier to find that lead.
The problem appears to be fixed in gcc-9 and later, so if you are running into it, try updating gcc.
$ gcc -v
gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)
$ g++ -DNDEBUG -std=c++11 -Wall -Wextra -fsanitize=address -O3 -g -static-libasan samplecode.cpp
$ time ./a.out
450000

real    0m5.915s
user    0m5.524s
sys     0m0.380s
Bonus: why is internal_strlen slow?
Here is the implementation of internal_strlen, the function that caused the regression. At first glance there is nothing in it that should be slow.
uptr internal_strlen(const char *s) {
  uptr i = 0;
  while (s[i]) i++;
  return i;
}
Reading glibc's strlen, it does not read one byte at a time: it reads 4 or 8 bytes at once, which appears to be what makes it faster. So it is less that internal_strlen is slow and more that glibc's strlen has been optimized.
size_t
STRLEN (const char *str)
{
  const char *char_ptr;
  const unsigned long int *longword_ptr;
  unsigned long int longword, himagic, lomagic;

  // ... Omitted ...

  /* All these elucidatory comments refer to 4-byte longwords,
     but the theory applies equally well to 8-byte longwords.  */

  longword_ptr = (unsigned long int *) char_ptr;

  /* Bits 31, 24, 16, and 8 of this number are zero.  Call these bits
     the "holes."  Note that there is a hole just to the left of
     each byte, with an extra at the end:

     bits:  01111110 11111110 11111110 11111111
     bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD

     The 1-bits make sure that carries propagate to the next 0-bit.
     The 0-bits provide holes for carries to fall into.  */
  himagic = 0x80808080L;
  lomagic = 0x01010101L;
  if (sizeof (longword) > 4)
    {
      /* 64-bit version of the magic.  */
      /* Do the shift in two steps to avoid a warning if long has 32 bits.  */
      himagic = ((himagic << 16) << 16) | himagic;
      lomagic = ((lomagic << 16) << 16) | lomagic;
    }

  // ... Omitted ...

  /* Instead of the traditional loop which tests each character,
     we will test a longword at a time.  The tricky part is testing
     if *any of the four* bytes in the longword in question are zero.  */
  for (;;)
    {
      longword = *longword_ptr++;

      if (((longword - lomagic) & ~longword & himagic) != 0)
        {
          /* Which of the bytes was the zero?  If none of them were, it was
             a misfire; continue the search.  */
          const char *cp = (const char *) (longword_ptr - 1);

          if (cp[0] == 0)
            return cp - str;
          if (cp[1] == 0)
            return cp - str + 1;
          if (cp[2] == 0)
            return cp - str + 2;
          if (cp[3] == 0)
            return cp - str + 3;
          if (sizeof (longword) > 4)
            {
              if (cp[4] == 0)
                return cp - str + 4;
              if (cp[5] == 0)
                return cp - str + 5;
              if (cp[6] == 0)
                return cp - str + 6;
              if (cp[7] == 0)
                return cp - str + 7;
            }
        }
    }
}
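The zero-byte test at the heart of that loop can be tried in isolation. The sketch below fixes the word size at 8 bytes and uses the same `(word - lomagic) & ~word & himagic` expression; `has_zero_byte` and `words_until_terminator` are hypothetical helper names, and the buffer is assumed to be a multiple of 8 bytes long so that every load stays in bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// True when at least one of the eight bytes in `word` is zero.
bool has_zero_byte(std::uint64_t word) {
    const std::uint64_t lomagic = 0x0101010101010101ULL;
    const std::uint64_t himagic = 0x8080808080808080ULL;
    return ((word - lomagic) & ~word & himagic) != 0;
}

// Counts how many 8-byte loads are needed before a terminator is seen.
// `capacity` is the readable size of `buf` and must be a multiple of 8.
std::size_t words_until_terminator(const char* buf, std::size_t capacity) {
    assert(capacity % 8 == 0);
    for (std::size_t off = 0; off < capacity; off += 8) {
        std::uint64_t word;
        std::memcpy(&word, buf + off, sizeof word);  // avoids unaligned-access issues
        if (has_zero_byte(word)) return off / 8 + 1;
    }
    return capacity / 8;  // no terminator inside the buffer
}
```

For a terminator at index 20, the byte-wise loop performs 21 comparisons while this version needs three 8-byte loads, which gives a rough feel for where the speedup comes from.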
I also received some additional information as internal feedback on this article (quoted here with the polite phrasing trimmed):
The C implementation of strlen is the classic one, but in practice versions written in assembly for AVX2 are used, such as https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/multiarch/strchr-avx2.S;h=413942b96a835c4a989fb9ea8777d3966b6bb973;hb=0679442defedf7e52a94264975880ab8674736b2
For reference, an article about a hand-written string search library: https://blog.cybozu.io/entry/2016/08/25/080000
I had no idea. This goes deep...!
Summary
- In gcc-7's AddressSanitizer, strchr hurts performance. If that is a problem for you, update to gcc-9.
- With perf and bpftrace, investigations that used to be hard can sometimes be carried out even by ordinary engineers.
- glibc's strlen is fast, and according to a super hacker there is even an assembly version.
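That concludes this report on getting to the bottom of a mysterious slowdown with bpftrace! bpftrace looks like it will come in handy in many tricky situations, and this experience made me want to learn more about it.
Written by: @yokotaso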