Asynchronous I/O API via System Calls
This API is the set of system calls for using a feature that was added to the kernel in version 2.6.
To support asynchronous I/O, the kernel added two entry points, aio_read() and aio_write(), to the f_op of the VFS-layer file object. (Put simply, I think of the file object as the object that holds the operations for an actual filesystem or file. It carries f_op, a table of functions for file operations, and it is something like a superclass in object-oriented terms.) Personally, I got confused here once, because these names collide with the POSIX API. Maybe it is partly because things get translated into Japanese, but around the Linux kernel there are several things that go by the same name and turn out to be different. Well, setting that aside.
With this API, the I/O processing is executed by a kernel thread. The kernel thread meant here is what is also called a kernel daemon, and it is not the same thing as the thread created by POSIX asynchronous I/O. The difference is that a thread created by the POSIX API runs in user space, whereas this kernel thread, as the name suggests, runs in kernel space. (Other Unix systems apparently also call threads that run in user space kernel threads, but on Linux, perhaps because these kernel daemons exist (other Unix systems do not seem to have them), I get the impression that "kernel thread" usually refers to them. After all, the function used to start such a thread is itself named kernel_thread().)
It is the one you see listed like this when you run ps -ef:
[aio/0]
Asynchronous I/O via system calls provides the following system calls.
io_setup(), io_submit(), io_getevents(), io_cancel(), io_destroy()
Briefly, issuing I/O requests with these functions goes as follows:
1. Call io_setup() to prepare for asynchronous I/O.
2. Prepare iocb structures, the structures that describe I/O requests.
3. Build an array of iocb pointers, one for each request to issue, pointing at the prepared iocbs.
4. Pass that array to io_submit() to issue the requests.
5. Call io_getevents() to collect the completion status of the outstanding asynchronous I/O.
6. Call io_destroy() to tear down what io_setup() prepared.
The asynchronous I/O context is created by io_setup(); io_submit() then takes that context, and this is where the I/O requests get started.
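The steps above can be sketched in C roughly as follows. This is only a minimal sketch: it calls the system calls through syscall(2) instead of going through libaio, the raw_io_* wrappers and aio_read_once() are names made up for illustration, and O_DIRECT is left out to keep it portable, so the kernel may well complete the read synchronously inside io_submit().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Thin wrappers (hypothetical names): glibc does not export these calls. */
static long raw_io_setup(unsigned nr, aio_context_t *ctx) {
    return syscall(SYS_io_setup, nr, ctx);
}
static long raw_io_submit(aio_context_t ctx, long nr, struct iocb **list) {
    return syscall(SYS_io_submit, ctx, nr, list);
}
static long raw_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *events, struct timespec *tmo) {
    return syscall(SYS_io_getevents, ctx, min_nr, nr, events, tmo);
}
static long raw_io_destroy(aio_context_t ctx) {
    return syscall(SYS_io_destroy, ctx);
}

/* Hypothetical helper: read up to len bytes from the start of path with a
 * single AIO request. Returns bytes read, or a negative errno value. */
long aio_read_once(const char *path, void *buf, size_t len)
{
    aio_context_t ctx = 0;           /* must be zero before io_setup() */
    struct iocb cb;
    struct iocb *list[1] = { &cb };  /* array of iocb pointers */
    struct io_event ev;
    long ret;

    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;

    if (raw_io_setup(1, &ctx) < 0) { /* 1. prepare the context */
        ret = -errno;
        close(fd);
        return ret;
    }

    memset(&cb, 0, sizeof(cb));      /* 2. describe the request */
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_fildes = (uint32_t)fd;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;

    if (raw_io_submit(ctx, 1, list) != 1)                    /* 3-4. issue */
        ret = -errno;
    else if (raw_io_getevents(ctx, 1, 1, &ev, NULL) != 1)    /* 5. collect */
        ret = -errno;
    else
        ret = (long)ev.res;          /* bytes read, or negative errno */

    raw_io_destroy(ctx);             /* 6. tear down */
    close(fd);
    return ret;
}
```

A caller would use it as `long n = aio_read_once("/etc/hostname", buf, sizeof(buf));`. A real program would normally keep one context alive and submit many requests against it instead of setting up and destroying a context per read.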
To use these system calls, it seems you need Kernel Asynchronous I/O (AIO) Support for Linux (references: moratorium | "How to use libaio (the Linux asynchronous I/O library)"; lab.klab.org - MediaWiki - in-house study group). Its libaio.h declares the system call functions and defines wrapper functions such as io_prep_pread for setting up an iocb structure easily.
One thing that bothers me a little: 詳解Linuxカーネル (Understanding the Linux Kernel), Linuxカーネル解読室, and the libaio page read as if this only works for direct transfers, whereas the pages I listed as references and the libaio man pages read as if it also works on files opened the ordinary way. Which is right?
Well, leaving both aside, here I will follow the O_DIRECT path. Whether it really does not happen in the other case is for another occasion (or for someone else...).
This time, I will check that an asynchronous I/O request returns without waiting for the I/O to complete.
io_submit() takes an array of pointers to iocb, each representing one I/O request. As with the POSIX version of asynchronous I/O, an iocb carries the file descriptor (aio_fildes), the buffer (aio_buf), the buffer size (aio_nbytes), and the offset (aio_offset), plus a field for the kind of I/O request (aio_lio_opcode).
    /* from kernel 2.6.23 include/linux/aio_abi.h */
    struct iocb {
            /* these are internal to the kernel/libc. */
            __u64   aio_data;       /* data to be returned in event's data */
            __u32   PADDED(aio_key, aio_reserved1);
                                    /* the kernel sets aio_key to the req # */

            /* common fields */
            __u16   aio_lio_opcode; /* see IOCB_CMD_ above */
            __s16   aio_reqprio;
            __u32   aio_fildes;

            __u64   aio_buf;
            __u64   aio_nbytes;
            __s64   aio_offset;

            /* extra parameters */
            __u64   aio_reserved2;  /* TODO: use this for a (struct sigevent *) */

            /* flags for the "struct iocb" */
            __u32   aio_flags;

            /*
             * if the IOCB_FLAG_RESFD flag of "aio_flags" is set, this is an
             * eventfd to signal AIO readiness to
             */
            __u32   aio_resfd;
    }; /* 64 bytes */
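To make the role of each field concrete, here is a small sketch that fills in a struct iocb by hand from user space. fill_iocb() is a hypothetical helper written for this illustration, not a kernel or libaio function, and it assumes the struct iocb from the current linux/aio_abi.h header.

```c
#include <assert.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: describe one request ("opcode on fd, nbytes at
 * offset, using buf") in an iocb ready to be passed to io_submit(). */
void fill_iocb(struct iocb *cb, uint16_t opcode, int fd,
               void *buf, uint64_t nbytes, int64_t offset)
{
    /* The layout is kernel ABI; the header comment above says 64 bytes. */
    assert(sizeof(struct iocb) == 64);

    memset(cb, 0, sizeof(*cb));              /* clear reserved fields and flags */
    cb->aio_lio_opcode = opcode;             /* e.g. IOCB_CMD_PREAD or IOCB_CMD_PWRITE */
    cb->aio_fildes = (uint32_t)fd;           /* file descriptor */
    cb->aio_buf = (uint64_t)(uintptr_t)buf;  /* user buffer address, widened to 64 bits */
    cb->aio_nbytes = nbytes;                 /* buffer size */
    cb->aio_offset = offset;                 /* file offset */
}
```

For example, `fill_iocb(&cb, IOCB_CMD_PWRITE, fd, data, 4096, 0);` describes a 4096-byte write at offset 0. As far as I know, libaio's io_prep_pread() and io_prep_pwrite() do a similar job on libaio's own definition of struct iocb.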
Based on this aio_lio_opcode, sys_io_submit(), which is the actual body of io_submit(), sets the handler for the request in a function pointer called ki_retry in a structure called kiocb. For a write this ends up at the filesystem's aio_write, and for a read at its aio_read.
The filesystem, in turn, does not implement these itself; in the end __generic_file_aio_[read/write] gets called.
As an aside, the filesystem's function table also has read/write function pointers, but those too end up calling the same __generic_file_aio_[read/write].
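The opcode-based dispatch described above can be pictured with a small user-space model. Everything prefixed with model_ is invented for this illustration and is not kernel code; only the IOCB_CMD_* constants come from linux/aio_abi.h. The point is just the shape: a table of function pointers on the file side, and a request that picks its handler from the opcode.

```c
#include <assert.h>
#include <linux/aio_abi.h>
#include <stddef.h>

struct model_kiocb;
typedef long (*model_rw_fn)(struct model_kiocb *req);

/* Stand-in for the f_op table of a file object. */
struct model_file_ops {
    model_rw_fn aio_read;
    model_rw_fn aio_write;
};

/* Stand-in for kiocb: the in-kernel form of one request. */
struct model_kiocb {
    const struct model_file_ops *f_op;
    model_rw_fn ki_retry;   /* handler chosen from the opcode */
    long nbytes;
    char last_op;           /* 'R' or 'W', recorded by the handlers */
};

/* Pretend generic handlers that a filesystem reuses instead of its own. */
static long model_generic_read(struct model_kiocb *req)
{
    req->last_op = 'R';
    return req->nbytes;
}
static long model_generic_write(struct model_kiocb *req)
{
    req->last_op = 'W';
    return req->nbytes;
}

const struct model_file_ops model_fs_ops = {
    .aio_read = model_generic_read,
    .aio_write = model_generic_write,
};

/* Choose the handler for a request. Returns 0 on success, -1 when the
 * opcode is not handled by this model or the table entry is missing. */
int model_setup_iocb(struct model_kiocb *req, int opcode)
{
    assert(req != NULL && req->f_op != NULL);
    switch (opcode) {
    case IOCB_CMD_PREAD:
        req->ki_retry = req->f_op->aio_read;
        break;
    case IOCB_CMD_PWRITE:
        req->ki_retry = req->f_op->aio_write;
        break;
    default:
        req->ki_retry = NULL;
        return -1;
    }
    return req->ki_retry ? 0 : -1;
}
```

After `model_setup_iocb(&req, IOCB_CMD_PWRITE)`, calling `req.ki_retry(&req)` runs the write handler without the caller ever naming it, which is the same indirection the kernel gets from f_op.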
Once in __generic_file_aio_read/__generic_file_aio_write, an O_DIRECT direct transfer ends up in the common generic_file_direct_IO (on the write side, by way of generic_file_direct_write).
Inside __blockdev_direct_IO, which is called from generic_file_direct_IO, the dio structure is set up. This is the structure for direct transfers and represents the processing state of a direct transfer request. It contains a flag named is_async that says whether the I/O is asynchronous, and the flag is set here.
    /* from kernel 2.6.23 fs/direct-io.c */
    /*
     * For file extending writes updating i_size before data
     * writeouts complete can expose uninitialized blocks. So
     * even for AIO, we need to wait for i/o to complete before
     * returning in this case.
     */
    dio->is_async = !is_sync_kiocb(iocb) && !((rw & WRITE) &&
            (end > i_size_read(inode)));
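The condition in that excerpt is compact, so here it is spelled out as a plain user-space function. model_dio_is_async() and its parameters are hypothetical stand-ins for is_sync_kiocb(iocb), (rw & WRITE), end, and i_size_read(inode); this is a model of the expression, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the is_async decision: the transfer is treated as asynchronous
 * only if the request itself is asynchronous AND it is not a write that
 * ends beyond the current file size. */
bool model_dio_is_async(bool sync_kiocb, bool is_write,
                        int64_t end, int64_t i_size)
{
    assert(end >= 0 && i_size >= 0);

    /* A file-extending write must be waited for, even under AIO. */
    bool extends_file = is_write && end > i_size;

    return !sync_kiocb && !extends_file;
}
```

So an AIO write that stays inside the existing file is asynchronous, while the same write running past the end of the file falls back to waiting, for the reason given in the kernel comment.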
With this flag in hand, the direct_io_worker function then carries out the I/O transfer. Normally, after issuing the transfer requests it waits for them to complete before returning, but for an asynchronous request it does not wait.
    /* from kernel 2.6.23 fs/direct-io.c */
    /*
     * The only time we want to leave bios in flight is when a successful
     * partial aio read or full aio write have been setup.  In that case
     * bio completion will call aio_complete.  The only time it's safe to
     * call aio_complete is when we return -EIOCBQUEUED, so we key on that.
     * This had *better* be the only place that raises -EIOCBQUEUED.
     */
    BUG_ON(ret == -EIOCBQUEUED);
    if (dio->is_async && ret == 0 && dio->result &&
        ((rw & READ) || (dio->result == dio->size)))
            ret = -EIOCBQUEUED;

    if (ret != -EIOCBQUEUED)
            dio_await_completion(dio);
When the if condition that includes dio->is_async holds, ret is set to -EIOCBQUEUED, so the next if does not call dio_await_completion, the function that waits for completion, and the call returns as is.
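The same decision can be modelled in user space to see which combinations skip the wait. The model_ names are invented for this sketch, and MODEL_EIOCBQUEUED copies the kernel-internal value 529 from include/linux/errno.h, which is an assumption you should check against your own tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Kernel-internal errno, not visible to user space (assumed value). */
#define MODEL_EIOCBQUEUED 529

/* Just the dio fields that the check looks at. */
struct model_dio {
    bool is_async;
    long result;   /* bytes set up for transfer so far */
    long size;     /* bytes requested */
};

/* Returns true when the caller would block in dio_await_completion().
 * *ret is updated the way the kernel excerpt updates ret. */
bool model_must_wait(const struct model_dio *dio, bool is_read, int *ret)
{
    assert(*ret != -MODEL_EIOCBQUEUED);  /* mirrors the BUG_ON */

    if (dio->is_async && *ret == 0 && dio->result &&
        (is_read || dio->result == dio->size))
        *ret = -MODEL_EIOCBQUEUED;       /* leave the I/O in flight */

    return *ret != -MODEL_EIOCBQUEUED;
}
```

With this model, an asynchronous read that set up at least some I/O returns false (no wait), and so does an asynchronous write that set up the full size, while a partial asynchronous write or any synchronous request returns true.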
So we now know that an asynchronous O_DIRECT request does not wait for completion.
How, then, can the process learn that the I/O request has completed?
That is why next time I will introduce how to find out that an I/O request has completed.