OProfile usage notes
When you want to find out where a program's bottleneck is, you have to measure. You could wrap a particular piece of code in rdtsc instructions, read the counter before and after, and work out the elapsed time, but honestly that gets tedious. Use a profiler instead.
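As for what a profiler actually is, the Wikipedia page on performance analysis covers it in some detail.
So I tried OProfile, a profiler that runs on Linux, and I'm writing up what I learned for my future self and for anyone who wants to try OProfile but has no idea where to start.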
OProfile features
OProfile reportedly has the following features:
- No special preparation of the program being measured is required
- Low-level information can be measured
- gprof-style call graphs can be displayed
- Overhead is very small
Having used it, the one I appreciated most is "low-level information can be measured". It depends on the CPU type*1, but beyond processing time you can see at a glance where in the program things like L1 cache misses or DRAM accesses occur. I find that genuinely impressive.
Installation
The supported CPUs are, for the most part, listed in the "Event type reference" at the end of
http://oprofile.sourceforge.net/docs/
Judging from the source code, a few CPUs that are not listed there seem to be supported as well. And if yours is missing, you could always write the support yourself, right?
The CPU of the machine I'm installing on this time:
$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 43
model name : AMD Athlon(tm)64 X2 Dual Core Processor 4600+
...
It's an AMD Athlon64, which fortunately is supported, so I'll go ahead and install OProfile. The OS is Ubuntu 8.10. On this system, even something as hard-core as OProfile
$ sudo apt-get install oprofile oprofile-gui
installs this casually, with a single line.
# If you also want to profile the kernel, you have to rebuild the kernel to produce a vmlinux, which takes a fair amount of extra work.
Usage
OProfile is basically driven through the opcontrol command, which sends instructions to the oprofile daemon, as in opcontrol --start. For now, knowing the following options is enough to do something resembling profiling.
- --start-daemon
  - Start the oprofile daemon (profiling does not begin yet)
- --start
  - Start profiling (launching the daemon first if it is not already running)
- --dump
  - Write out the profile results collected so far
- --stop
  - Stop profiling and write out the results
- --shutdown
  - Stop the oprofile daemon
- --reset
  - Reset the profile results
With the version of opcontrol I have, root privileges are required. There seem to be versions (or settings?) that work without root, but I don't know the details.
If you just want to get it running, first run
$ sudo opcontrol --no-vmlinux
to tell it not to profile the kernel (the setting is saved in /etc/oprofile/daemonrc). Then start profiling with opcontrol --start and end it with opcontrol --stop; the programs that ran in between are recorded.
As a trial, let's profile the following program.
#include <stdio.h>
#include <stdlib.h>

#define ROWS 1000
#define COLS 1000
#define table_ref(table,row,col) ((table)[(row)*COLS+(col)])

typedef int* Table;

void touch_row_col(Table from, Table to) {
  int i, j;
  for(i=0;i<ROWS;++i) {
    for(j=0;j<COLS;++j) {
      table_ref(from, i, j) = table_ref(to, i, j);
    }
  }
}

void touch_col_row(Table from, Table to) {
  int i, j;
  for(i=0;i<COLS;++i) {
    for(j=0;j<ROWS;++j) {
      table_ref(from, j, i) = table_ref(to, j, i);
    }
  }
}

int main() {
  Table from, to;
  from = malloc(sizeof(int)*ROWS*COLS);
  to = malloc(sizeof(int)*ROWS*COLS);
  touch_row_col(from, to);
  touch_col_row(from, to);
  return 0;
}
Compile with gcc -g (the -g option lets opannotate show the results alongside the source), then run the program between --start and --stop.
$ gcc -g array.c
$ sudo opcontrol --reset
$ sudo opcontrol --start && ./a.out ; sudo opcontrol --stop
Using default event: CPU_CLK_UNHALTED:100000:0:1:1
Using 2.6+ OProfile kernel interface.
Using log file /var/lib/oprofile/samples/oprofiled.log
Daemon started.
Profiler running.
Stopping profiling.
Look at the profile results with opreport, opannotate, and so on.
$ opreport
CPU: AMD64 processors, speed 1000 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
CPU_CLK_UNHALT...|
  samples|      %|
------------------
     2711 42.3461 no-vmlinux
     2618 40.8935 a.out
      351  5.4827 ld-2.8.90.so
      343  5.3577 libc-2.8.90.so
      139  2.1712 dash
       53  0.8279 libglib-2.0.so.0.1800.2
       25  0.3905 oprofiled
...
With no arguments, opreport shows information for every process that was running during profiling. To see the profile of specific programs only, name them as arguments.
$ opreport a.out /usr/local/bin/zsh /bin/grep
CPU: AMD64 processors, speed 1000 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
CPU_CLK_UNHALT...|
  samples|      %|
------------------
     2618 99.3548 a.out
       15  0.5693 grep
        2  0.0759 zsh
The -l option adds per-symbol information, and -c adds the call graph. You can see that touch_col_row is far slower than touch_row_col.
$ opreport -l a.out
CPU: AMD64 processors, speed 1000 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
2025     77.3491  touch_col_row
593      22.6509  touch_row_col
$ opreport -c a.out
CPU: AMD64 processors, speed 1000 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
-------------------------------------------------------------------------------
2025     77.3491  touch_col_row
  2025     100.000  touch_col_row [self]
-------------------------------------------------------------------------------
593      22.6509  touch_row_col
  593      100.000  touch_row_col [self]
-------------------------------------------------------------------------------
With opannotate, -s shows the results alongside the source code and -a shows them alongside the assembly code.
$ opannotate -s a.out
/*
 * Command line: opannotate -s a.out
 *
 * Interpretation of command line:
 * Output annotated source file with samples
 * Output all files
 *
 * CPU: AMD64 processors, speed 1000 MHz (estimated)
 * Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
 */
/*
 * Total samples for file : "/home/komuro/lang/c/etc/array.c"
 *
 *   2618 100.000
 */
               :#include <stdio.h>
               :#include <stdlib.h>
               :#define ROWS 1000
               :#define COLS 1000
               :#define table_ref(table,row,col) ((table)[(row)*COLS+(col)])
               :typedef int* Table;
               :void touch_row_col(Table from, Table to) { /* touch_row_col total:    593 22.6509 */
               :  int i, j;
     3  0.1146 :  for(i=0;i<ROWS;++i) {
   235  8.9763 :    for(j=0;j<COLS;++j) {
   355 13.5600 :      table_ref(from, i, j) = table_ref(to, i, j);
               :    }
               :  }
               :}
               :void touch_col_row(Table from, Table to) { /* touch_col_row total:   2025 77.3491 */
               :  int i, j;
     7  0.2674 :  for(i=0;i<ROWS;++i) {
   471 17.9908 :    for(j=0;j<COLS;++j) {
  1547 59.0909 :      table_ref(from, j, i) = table_ref(to, j, i);
               :    }
               :  }
               :}
               :int main() {
               :  Table from, to;
               :  from = malloc(sizeof(int)*ROWS*COLS);
               :  to = malloc(sizeof(int)*ROWS*COLS);
               :  touch_row_col(from, to);
               :  touch_col_row(from, to);
               :  return 0;
               :}
$ opannotate -a a.out
/*
 * Command line: opannotate -a a.out
 *
 * Interpretation of command line:
 * Output annotated assembly listing with samples
 *
 * CPU: AMD64 processors, speed 1000 MHz (estimated)
 * Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
 */
               :
               :/home/komuro/lang/c/etc/a.out:     file format elf32-i386
               :
               :
               :Disassembly of section .text:
               :
08048422 <touch_col_row>: /* touch_col_row total:   2025 77.3491 */
               : 8048422:  push   %ebp
               : 8048423:  mov    %esp,%ebp
               : 8048425:  sub    $0x10,%esp
               : 8048428:  movl   $0x0,-0x4(%ebp)
               : 804842f:  jmp    8048475 <touch_col_row+0x53>
               : 8048431:  movl   $0x0,-0x8(%ebp)
               : 8048438:  jmp    8048468 <touch_col_row+0x46>
    80  3.0558 : 804843a:  mov    -0x8(%ebp),%eax
   465 17.7617 : 804843d:  imul   $0x3e8,%eax,%eax
   116  4.4309 : 8048443:  add    -0x4(%ebp),%eax
               : 8048446:  shl    $0x2,%eax
               : 8048449:  mov    %eax,%edx
   454 17.3415 : 804844b:  add    0x8(%ebp),%edx
               : 804844e:  mov    -0x8(%ebp),%eax
               : 8048451:  imul   $0x3e8,%eax,%eax
   239  9.1291 : 8048457:  add    -0x4(%ebp),%eax
               : 804845a:  shl    $0x2,%eax
               : 804845d:  add    0xc(%ebp),%eax
   193  7.3720 : 8048460:  mov    (%eax),%eax
               : 8048462:  mov    %eax,(%edx)
               : 8048464:  addl   $0x1,-0x8(%ebp)
   471 17.9908 : 8048468:  cmpl   $0x3e7,-0x8(%ebp)
               : 804846f:  jle    804843a <touch_col_row+0x18>
               : 8048471:  addl   $0x1,-0x4(%ebp)
     7  0.2674 : 8048475:  cmpl   $0x3e7,-0x4(%ebp)
               : 804847c:  jle    8048431 <touch_col_row+0xf>
               : 804847e:  leave
               : 804847f:  ret
...
$ opannotate -sa a.out
/*
 * Command line: opannotate -sa a.out
 *
 * Interpretation of command line:
 * Output annotated assembly listing with samples
 *
 * CPU: AMD64 processors, speed 1000 MHz (estimated)
 * Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
 */
               :
               :/home/komuro/lang/c/etc/a.out:     file format elf32-i386
               :
               :
               :Disassembly of section .text:
               :
08048422 <touch_col_row>: /* touch_col_row total:   2025 77.3491 */
               :    for(j=0;j<COLS;++j) {
               :      table_ref(from, i, j) = table_ref(to, i, j);
               :    }
               :  }
               :}
               :void touch_col_row(Table from, Table to) {
               : 8048422:  push   %ebp
               : 8048423:  mov    %esp,%ebp
               : 8048425:  sub    $0x10,%esp
               :  int i, j;
               :  for(i=0;i<ROWS;++i) {
               : 8048428:  movl   $0x0,-0x4(%ebp)
               : 804842f:  jmp    8048475 <touch_col_row+0x53>
               :    for(j=0;j<COLS;++j) {
               : 8048431:  movl   $0x0,-0x8(%ebp)
               : 8048438:  jmp    8048468 <touch_col_row+0x46>
               :      table_ref(from, j, i) = table_ref(to, j, i);
    80  3.0558 : 804843a:  mov    -0x8(%ebp),%eax
   465 17.7617 : 804843d:  imul   $0x3e8,%eax,%eax
   116  4.4309 : 8048443:  add    -0x4(%ebp),%eax
               : 8048446:  shl    $0x2,%eax
               : 8048449:  mov    %eax,%edx
   454 17.3415 : 804844b:  add    0x8(%ebp),%edx
               : 804844e:  mov    -0x8(%ebp),%eax
               : 8048451:  imul   $0x3e8,%eax,%eax
   239  9.1291 : 8048457:  add    -0x4(%ebp),%eax
               : 804845a:  shl    $0x2,%eax
               : 804845d:  add    0xc(%ebp),%eax
   193  7.3720 : 8048460:  mov    (%eax),%eax
               : 8048462:  mov    %eax,(%edx)
               :    }
               :}
...
With opcontrol --event= you can specify which events to capture. The available event types are those listed in the documentation, by the ophelp command, by opcontrol --list-events, and so on.
My guess is that touch_col_row is so slow because its accesses jump around in memory and cause cache misses all over the place, so let's look at how cache misses are occurring. On AMD64 the event for that would be something like DATA_CACHE_MISSES.
$ sudo opcontrol --shutdown
Stopping profiling.
Killing daemon.
$ sudo opcontrol --reset
$ sudo opcontrol --event=DATA_CACHE_MISSES:500
$ sudo opcontrol --start && ./a.out ; sudo opcontrol --stop
Using 2.6+ OProfile kernel interface.
Using log file /var/lib/oprofile/samples/oprofiled.log
Daemon started.
Profiler running.
Stopping profiling.
$ opreport -l a.out
CPU: AMD64 processors, speed 1000 MHz (estimated)
Counted DATA_CACHE_MISSES events (Data cache misses) with a unit mask of 0x00 (No unit mask) count 500
samples  %        symbol name
3965     95.0839  touch_col_row
205       4.9161  touch_row_col
This shows that cache misses occur frequently in touch_col_row.
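I had heard about this, and I had seen for myself that writing code with the cache in mind really does make programs faster, but the cache itself always stayed hidden and hard to grasp. Being able to feel it this directly is a joy! That is the big favor OProfile did for me. Thank you, thank you.
It is apparently also possible to operate OProfile from a GUI, but I haven't tried that at all, so I can't say anything about it.
*1: If your CPU is not supported by OProfile, you can't use it at all.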