After the publication of "Linuxカーネル2.6解読室" (hereafter, the old edition), many features were added to Linux, and it came to be used in a wide range of settings, starting with the enterprise domain. Along the way the code has grown larger and more complex, and for many engineers it has become an undecipherable black box. The "新Linuxカーネル解読室" project takes a scalpel to the Linux kernel, a masterpiece by top engineers around the world, pries open that black box, and at times simply follows its curiosity in decoding the world of the kernel.
This article explains the Ethernet driver's part in the receive path of the networking subsystem, based on the kernel v6.8 code.
- Introduction
- 1. Overview
- 2. Processing in the interrupt handler
- 3. Receive processing by the polling handler (NAPI)
- 4. Creating the sk_buff structure
- Next time
Authors: 須田 哲志, 稲葉 貴昭
Introduction
Up to the previous installment, we covered the socket interface. Starting with this one, we explain how a packet received by a device is delivered to a socket, that is, packet receive processing. This installment and the next look at packet receive processing in the Ethernet driver. (This was originally planned as a single article, but it grew far too long, so we split it into two.)
Assumptions
Packet receive processing involves many combinations of network device drivers, protocols, and kernel parameters, and each combination takes a different code path, so investigating all of them is not realistic. The receive processing explained in this and the following articles therefore assumes the following.
- The main goal is to follow the kernel's flow when an application receives a UDP/IPv4 packet with recvfrom(2)
  - Abnormal and error paths are out of scope
- The machine has exactly one Intel NIC (e1000e driver)
  Note: the interrupt mode is assumed to be MSI (Message Signaled Interrupts)
- To stay focused on following packet receive processing, the following features will be covered in separate articles
  - Offload features such as GRO/LRO
  - Multi-queue features such as RSS/RPS/RFS
  - VLAN
  - Checksums
- Features whose level of use is unclear, or that have become rarely used in recent years (IP fragmentation, for example), are out of scope
Important features such as IPv6, TCP, and routing are planned for future articles.
1. Overview
This article gives an overview of the path from the device (NIC) receiving a packet to that packet being handed to the IP layer. Let's start with a rough look at the overall flow when a packet is received. Figure 1 shows how a packet received by the NIC is delivered to the application.
The numbered steps in Figure 1 are as follows.
1. A packet arrives at the NIC
2. The NIC writes the received packet into the ring buffer
3. The NIC raises an interrupt, and processing moves to the hard interrupt context
4. The interrupt handler requests that polling be scheduled and kicks the soft interrupt
5. The polling handler*1 polls and receives the packets stored in the ring buffer one after another
6. The received packets are delivered to the upper layer
7. The received data is delivered to the socket interface
8. When the application calls the recvfrom(2) system call, the received data is passed to the application
Figure 1 shows the rough flow, but some parts actually run asynchronously, so steps ① through ⑧ in the figure are not necessarily executed strictly in order. Let's therefore look at the flow of Figure 1 again, this time as a sequence.
Steps ① through ⑦ in Figure 2 correspond to the numbers in Figure 1. (Step ⑧ is omitted because it is a system call invoked from user space.) The broad flow is that processing moves from the "hard interrupt context" to the "soft interrupt context". Also, as Figure 2 shows, packets keep arriving at the device (NIC) while the Linux kernel is doing receive processing, and their data is written into the ring buffer as they arrive. The work of actually taking this received data out of the ring buffer and delivering it to the socket interface is done by code running in the "soft interrupt context" (steps ⑤ through ⑦ in Figures 1 and 2). For that reason, the important job of the interrupt handler (in the "hard interrupt context") is to detect the interrupt request (IRQ) from the device (NIC) and kick the soft interrupt.
Why, then, is packet receive processing split between the "hard interrupt context" and the "soft interrupt context"? Polling-based packet reception uses a mechanism called NAPI (New API), and the background to NAPI's introduction is exactly the reason. NAPI was explained in a past post on our technical blog, so we quote from it here.
NAPI stands for New API and is the mechanism used for packet receive processing.
Despite the "New", it first appeared in v2.5 (and was later backported to v2.4), so it is not new at all. Before NAPI, an interrupt was raised every time a packet was received, and receive processing was done in that interrupt handling. That approach has the advantage of processing packets quickly when the network load is light, but when the network load rises, the CPU becomes heavily loaded with interrupt handling and the system's responsiveness suffers.
NAPI separates the notification of packet arrival (the interrupt) from the packet receive processing:
* When a packet arrives, a software interrupt is raised, and receive processing is done as software interrupt processing.
* If a packet is received while receive processing is in progress, no notification is made (no software interrupt is raised).
* Receive processing polls the NIC's queue (so no notification is needed).
By being a hybrid of interrupt-based notification and polling in this way, it solves the problem described above.
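The hybrid behaviour quoted above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `rx_model`, `packet_arrives()`, and `poll_all()` are hypothetical names invented for illustration, not kernel code, and the model ignores budgets and concurrency entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one receive queue; not a kernel structure. */
struct rx_model {
    size_t pending;        /* packets sitting in the ring buffer */
    size_t delivered;      /* packets handed to the upper layer */
    size_t irq_count;      /* interrupts actually taken */
    bool   irq_enabled;    /* may the NIC interrupt the CPU? */
    bool   poll_scheduled; /* has receive processing been requested? */
};

static void rx_model_init(struct rx_model *m)
{
    assert(m != NULL);
    m->pending = 0;
    m->delivered = 0;
    m->irq_count = 0;
    m->irq_enabled = true;
    m->poll_scheduled = false;
}

/* A packet arrives. Only the first arrival is notified by an interrupt;
 * later arrivals are picked up by the poll that is already requested. */
static void packet_arrives(struct rx_model *m)
{
    m->pending++;
    if (!m->irq_enabled)
        return;
    m->irq_count++;
    m->irq_enabled = false;   /* stop further notifications */
    m->poll_scheduled = true; /* defer the real work to the poll */
}

/* Receive processing: drain the queue by polling, then allow
 * notifications again. Returns the number of packets processed. */
static size_t poll_all(struct rx_model *m)
{
    size_t done = 0;

    if (!m->poll_scheduled)
        return 0;
    while (m->pending > 0) {
        m->pending--;
        m->delivered++;
        done++;
    }
    m->poll_scheduled = false;
    m->irq_enabled = true;
    return done;
}
```

In this model a burst of packets that arrives before the poll runs costs a single interrupt, which is the point of separating notification from receive processing.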
From the next chapter on, we follow the processing in the interrupt handler and in the polling handler in turn.
2. Processing in the interrupt handler
Let's begin with the interrupt handler. As we saw in the previous chapter, the interrupt handler is the code that runs in the "hard interrupt context" when an interrupt request (IRQ) is received from the NIC.
The work of actually receiving packets is done in the "soft interrupt context", entered through the polling handler; the interrupt handler can be said to do the groundwork for it. Three pieces of that groundwork are important.
- Disabling NIC interrupts
  Because packets will be received by polling, interrupts for subsequent packet arrivals are disabled.
- Scheduling NAPI polling
  The handler requests that polling be scheduled, so that the device (NIC) is polled when processing moves to the "soft interrupt context" and receive processing runs.
- Raising NET_RX_SOFTIRQ
  The handler kicks the soft interrupt, requesting that packet receive processing run once processing has moved to the "soft interrupt context".
Let's look at the actual source code to see how these three steps are implemented.
In our example (the e1000e driver), the following e1000_intr_msi() runs as the interrupt handler.*2
(/drivers/net/ethernet/intel/e1000e/netdev.c)
    static irqreturn_t e1000_intr_msi(int __always_unused irq, void *data)
    {
        ...
        u32 icr = er32(ICR); // Author's comment: disables interrupts
        ...
        if (napi_schedule_prep(&adapter->napi)) {
            ...
            __napi_schedule(&adapter->napi); // Author's comment: schedules polling and raises NET_RX_SOFTIRQ
        }
        ...
    }
First, disabling NIC interrupts.
At first glance the e1000e driver does not seem to disable interrupts explicitly, but executing er32(ICR) disables them automatically.
The NIC's ICR (Interrupt Cause Read) register indicates the cause of an interrupt, and er32(ICR) reads the value of that register.*3
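A register read with a side effect can be modeled in a few lines. The following is a deliberately simplified sketch, not a description of the actual hardware: `nic_regs`, `read_icr()`, and `irq_asserted()` are hypothetical names, and the rule "a read clears the causes and, when auto-masking is enabled, masks the causes listed in the auto-mask register" is an assumption made for illustration. The exact conditions are defined in the controller manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register file; only the field names echo real registers. */
struct nic_regs {
    uint32_t icr;  /* pending interrupt causes */
    uint32_t ims;  /* interrupt mask: a set bit means the cause is enabled */
    uint32_t iam;  /* causes to mask automatically on an ICR read */
    bool     iame; /* auto-mask enable */
};

/* True if any pending cause is currently allowed to interrupt the CPU. */
static bool irq_asserted(const struct nic_regs *r)
{
    return (r->icr & r->ims) != 0;
}

/* Read the cause register. In this model the read clears the pending
 * causes and, if auto-masking is enabled, disables the causes in iam. */
static uint32_t read_icr(struct nic_regs *r)
{
    uint32_t causes;

    assert(r != NULL);
    causes = r->icr;
    r->icr = 0;
    if (r->iame && causes != 0)
        r->ims &= ~r->iam;
    return causes;
}
```

With auto-masking enabled, a later packet that sets a cause bit no longer asserts an interrupt in this model until the mask is set again, which matches the "read disables interrupts" behaviour described in the text.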
(Although it is omitted from the source code above, e1000_intr_msi() also uses the ICR register for things such as checking the link state.)
By the NIC's specification, reading the ICR register at this point automatically puts the NIC into the interrupt-disabled state.*4
Next, let's look at scheduling the polling and raising NET_RX_SOFTIRQ.
Both are done in ____napi_schedule(), which is called from __napi_schedule().
(The call chain starting at e1000_intr_msi() is as follows.)
    e1000_intr_msi()
    └── __napi_schedule()
        └── ____napi_schedule()
Let's look at ____napi_schedule().
(/net/core/dev.c)
    static inline void ____napi_schedule(struct softnet_data *sd,
                                         struct napi_struct *napi)
    {
        ...
        list_add_tail(&napi->poll_list, &sd->poll_list); // Author's comment: requests scheduling of NAPI polling
        ...
        if (!sd->in_net_rx_action)
            __raise_softirq_irqoff(NET_RX_SOFTIRQ); // Author's comment: raises NET_RX_SOFTIRQ
    }
At the end of the source code above, __raise_softirq_irqoff(NET_RX_SOFTIRQ) raises NET_RX_SOFTIRQ.
Because NET_RX_SOFTIRQ has been raised, the NET_RX_SOFTIRQ soft interrupt (net_rx_action()) will run when soft interrupts are later kicked.
As for scheduling the polling: as mentioned earlier, the Linux kernel polls for received packets using the mechanism called NAPI (New API).
list_add_tail(&napi->poll_list, &sd->poll_list) links the driver's napi structure (struct napi_struct) onto the poll_list of the softnet_data structure.
As a result, NAPI performs the polling when processing moves to the NET_RX_SOFTIRQ soft interrupt context.
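The two effects of ____napi_schedule(), queueing the NAPI instance and marking the soft interrupt as pending, can be sketched in user space as follows. This is a minimal illustration under stated assumptions: `list_node`, `napi_sketch`, `softnet_sketch`, `schedule_sketch()`, and `next_to_poll()` are hypothetical stand-ins, the list is written in the style of the kernel's list_head but is not the kernel implementation, and the pending flag stands in for the real soft interrupt machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, in the style of list_head. */
struct list_node {
    struct list_node *prev, *next;
};

static void list_add_tail_sketch(struct list_node *node, struct list_node *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Hypothetical stand-ins for napi_struct and softnet_data. */
struct napi_sketch {
    struct list_node poll_list;
    int id;
};

struct softnet_sketch {
    struct list_node poll_list; /* NAPI instances waiting to be polled */
    bool in_rx_action;          /* is the receive softirq already running? */
    bool rx_softirq_pending;    /* models a raised NET_RX_SOFTIRQ */
};

static void softnet_init(struct softnet_sketch *sd)
{
    assert(sd != NULL);
    sd->poll_list.prev = sd->poll_list.next = &sd->poll_list;
    sd->in_rx_action = false;
    sd->rx_softirq_pending = false;
}

/* Queue the instance for polling and request the receive softirq. */
static void schedule_sketch(struct softnet_sketch *sd, struct napi_sketch *napi)
{
    list_add_tail_sketch(&napi->poll_list, &sd->poll_list);
    if (!sd->in_rx_action)
        sd->rx_softirq_pending = true;
}

/* What the softirq side would do first: take the oldest queued instance.
 * Returns NULL when nothing is queued. */
static struct napi_sketch *next_to_poll(struct softnet_sketch *sd)
{
    struct list_node *n = sd->poll_list.next;

    if (n == &sd->poll_list)
        return NULL;
    n->prev->next = n->next;
    n->next->prev = n->prev;
    return (struct napi_sketch *)((char *)n - offsetof(struct napi_sketch, poll_list));
}
```

Adding at the tail and taking from the head means the instances are polled in the order in which they were scheduled.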
"Switching from the hard interrupt context to the soft interrupt context" is also explained in a past blog post. If you are interested or want more detail, please refer to it.
So far we have looked at the interrupt handler, which runs in the "hard interrupt context". As you have seen, the interrupt handler only does the groundwork for the "soft interrupt context" and does almost nothing related to receiving the packet itself. In the next chapter we finally look at the polling handler, the entry point of receive processing.
3. Receive processing by the polling handler (NAPI)
3.1 Background: the sk_buff structure
Before explaining receive processing, we introduce an important data structure, the sk_buff structure. It appears not only in this article but also in future networking articles.
When Linux sends or receives a packet, the packet's data is of course stored in some memory area. The sk_buff structure is the packet's metadata, and it points to the memory area that holds the packet's data.
Whenever code operates on or refers to a packet, it does so through this sk_buff structure.
The sk_buff structure has a very large number of members, and we cannot explain them all here. We will touch on its members and helper functions whenever an explanation becomes necessary.
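To make the idea of "metadata that points at the packet data" concrete, here is a heavily reduced sketch. It is not the kernel's struct sk_buff: `pkt_meta` and its functions are hypothetical, and only the notion of a data pointer plus a length is taken from the text. `pkt_put()` and `pkt_pull()` are loosely modeled on the kernel helpers skb_put() and skb_pull(), with different signatures and error handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, reduced packet descriptor. */
struct pkt_meta {
    unsigned char *head; /* start of the allocated memory area */
    unsigned char *data; /* start of the valid packet data */
    size_t len;          /* number of valid bytes from data */
    size_t size;         /* capacity of the memory area */
};

/* Allocate the data area and point the metadata at it. 0 on success. */
static int pkt_alloc(struct pkt_meta *p, size_t size)
{
    assert(p != NULL);
    p->head = malloc(size);
    if (p->head == NULL)
        return -1;
    p->data = p->head;
    p->len = 0;
    p->size = size;
    return 0;
}

/* Extend the valid data by n bytes at the tail, e.g. after a device has
 * written n bytes there. Returns the start of the new bytes, or NULL. */
static unsigned char *pkt_put(struct pkt_meta *p, size_t n)
{
    size_t used = (size_t)(p->data - p->head) + p->len;
    unsigned char *tail = p->data + p->len;

    if (n > p->size - used)
        return NULL;
    p->len += n;
    return tail;
}

/* Drop n bytes from the front, e.g. when a layer has consumed its
 * header. Only the metadata changes; no packet bytes are copied. */
static unsigned char *pkt_pull(struct pkt_meta *p, size_t n)
{
    if (n > p->len)
        return NULL;
    p->data += n;
    p->len -= n;
    return p->data;
}

static void pkt_free(struct pkt_meta *p)
{
    free(p->head);
    p->head = p->data = NULL;
    p->len = p->size = 0;
}
```

The design point this illustrates is that each layer can move past its header by adjusting a pointer in the metadata rather than copying the payload.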
3.2 Overview
This chapter looks at packet receive processing by polling. Here is the polling part extracted from the sequence diagram in Figure 2.
The part enclosed in "loop" is the packet receive processing by polling.
It shows that several packets are read from the ring buffer one after another, and the data (the received packets) is then handed to the upper network protocol stack in one go (that is, ip_list_rcv() is called).
It also shows that the next read does not start until the upper network protocol stack has processed the data. In other words, reading data from the ring buffer and processing in the network protocol stack are synchronous.
Let's look at the actual source code. As also noted in Figure 6, this processing is done in the e1000_clean_rx_irq() function.
(/drivers/net/ethernet/intel/e1000e/netdev.c)
    static bool e1000_clean_rx_irq(struct e1000_ring *rx_ring, int *work_done,
                                   int work_to_do)
    {
        ...
        while (staterr & E1000_RXD_STAT_DD) {
            struct sk_buff *skb;
            ...
            skb = buffer_info->skb; // Author's comment: takes the skb out of the ring buffer's buffer (reads the data)
            ...
            e1000_receive_skb(adapter, netdev, skb, staterr,
                              rx_desc->wb.upper.vlan); // Author's comment: builds the skb list, or runs the upper network protocol stack
    next_desc:
            ...
            buffer_info = next_buffer; // Author's comment: points at the next buffer
            ...
        }
        ...
    }
The while loop takes sk_buff structures out of the ring buffer's buffers one after another and, via the e1000_receive_skb() function, eventually calls into the upper network protocol stack.
At first glance it looks as if the upper network protocol stack is called for every packet read, but in practice several packets (sk_buff structures) are gathered into a list, and the list is sent to the upper network protocol stack once it exceeds a certain length.
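The batching described above can be sketched as follows. This is an illustration only: `BATCH_MAX`, `rx_batch`, `batch_add()`, and `batch_flush()` are hypothetical, the threshold of 8 is an arbitrary value chosen for the example rather than the kernel's, and the counters stand in for the real call into the protocol stack.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 8 /* hypothetical threshold, for illustration only */

/* Hypothetical batch of received packets, identified by integer ids. */
struct rx_batch {
    int    ids[BATCH_MAX];
    size_t count;     /* packets currently held in the batch */
    size_t flushes;   /* number of calls into the upper layer */
    size_t delivered; /* packets handed to the upper layer so far */
};

/* Hand the whole batch to the upper layer with a single call. */
static void batch_flush(struct rx_batch *b)
{
    if (b->count == 0)
        return;
    b->delivered += b->count; /* stands in for the upper layer's work */
    b->flushes++;
    b->count = 0;
}

/* Add one packet and flush once the batch reaches the threshold. */
static void batch_add(struct rx_batch *b, int pkt_id)
{
    assert(b != NULL && b->count < BATCH_MAX);
    b->ids[b->count++] = pkt_id;
    if (b->count >= BATCH_MAX)
        batch_flush(b);
}
```

A caller would still invoke `batch_flush()` once at the end of a poll pass so that a partially filled batch is not left waiting.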
Next time, we dig into the processing from the e1000_receive_skb() function onward.
To close this section, let's touch on re-enabling interrupts.
The interrupt handler disabled interrupts from the NIC so that packets could be received by polling.
Interrupts from the NIC therefore have to be enabled again once packet reception by polling has finished.
This is done by e1000e_poll() (the polling handler), which is the caller of e1000_clean_rx_irq().
    net_rx_action()                  // entry point of the NET_RX_SOFTIRQ soft interrupt
    └── napi_poll()
        └── e1000e_poll()            // polling handler: calls the polling routine and does the completion work (re-enables interrupts)
            └── e1000_clean_rx_irq() // polling routine (the function explained in this section)
(/drivers/net/ethernet/intel/e1000e/netdev.c)
    static int e1000e_poll(struct napi_struct *napi, int budget)
    {
        ...
        adapter->clean_rx(adapter->rx_ring, &work_done, budget); // Author's comment: receive processing by polling (calls e1000_clean_rx_irq)
        ...
        if (likely(napi_complete_done(napi, work_done))) { // Author's comment: polling completion
            ...
            if (!test_bit(__E1000_DOWN, &adapter->state)) {
                ...
                else
                    e1000_irq_enable(adapter); // Author's comment: re-enables interrupts
            }
        }
        ...
    }
As the code shows, e1000e_poll() calls e1000_clean_rx_irq(), which we have been discussing, and then, as the polling completion step, enables interrupts again with e1000_irq_enable().
From then on, the NIC raises an interrupt again when it next receives a packet.
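The overall shape of such a polling handler can be sketched as below. This is a model under stated assumptions, not the driver's code: `adapter_model`, `clean_rx_model()`, and `poll_model()` are hypothetical, and the rule "polling is complete only when less than the full budget was used" reflects the general NAPI convention rather than anything shown in the excerpt above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical adapter state. */
struct adapter_model {
    size_t ring_pending; /* packets waiting in the ring buffer */
    bool   irq_enabled;  /* may the NIC raise receive interrupts? */
    bool   down;         /* is the interface being shut down? */
    bool   scheduled;    /* is this instance still queued for polling? */
};

/* Polling routine: process at most budget packets. */
static size_t clean_rx_model(struct adapter_model *a, size_t budget)
{
    size_t done = 0;

    while (done < budget && a->ring_pending > 0) {
        a->ring_pending--;
        done++;
    }
    return done;
}

/* Polling handler: poll, and if the ring was drained within the budget,
 * complete the poll and re-enable interrupts. */
static size_t poll_model(struct adapter_model *a, size_t budget)
{
    size_t work_done;

    assert(a != NULL && budget > 0);
    work_done = clean_rx_model(a, budget);
    if (work_done == budget)
        return budget;         /* more may remain: stay scheduled, interrupts stay off */
    a->scheduled = false;      /* polling is complete */
    if (!a->down)
        a->irq_enabled = true; /* the next packet interrupts again */
    return work_done;
}
```

Keeping interrupts off while the budget is exhausted is what lets the system stay in pure polling mode under sustained load.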
4. Creating the sk_buff structure
As we saw in the previous chapter, the sk_buff structure is a very important data structure that is paired with each packet whenever the Linux kernel handles one. To close this article, we look at how this sk_buff structure is created.
4.1 Structure of a ring buffer entry
In receive processing, the sk_buff structure first appeared where the received data is read from a buffer of the ring buffer. So how is the ring buffer structured? Each buffer in the ring buffer is an e1000_buffer structure, laid out as shown in Figure 7.
The e1000_buffer structure holds a member named dma alongside the sk_buff structure.
This is the physical address of the location where the received packet is written.
The sk_buff structure also keeps the write location of the received data in a pointer member named data, but that is a virtual address.
The NIC writes received packets to memory by DMA, and the only addresses the NIC can interpret are physical addresses.
For that reason, the e1000_buffer structure has the dma member, which holds the physical address used to tell the NIC where to write the received packet.
(Strictly speaking, what the NIC can interpret is a bus address, but to keep things simple we treat "bus address = physical address" here.)
What, then, does a buffer look like immediately after NAPI polling has run receive processing on it? Let's look at the source code.
(/drivers/net/ethernet/intel/e1000e/netdev.c)
    static bool e1000_clean_rx_irq(struct e1000_ring *rx_ring, int *work_done,
                                   int work_to_do)
    {
        ...
        while (staterr & E1000_RXD_STAT_DD) {
            struct sk_buff *skb;
            ...
            skb = buffer_info->skb; // Author's comment: takes the skb out of the ring buffer (reads the data)
            buffer_info->skb = NULL; // Author's comment: detaches the skb from the ring buffer
            ...
            buffer_info->dma = 0; // Author's comment: clears the write location (physical address)
            ...
            e1000_receive_skb(adapter, netdev, skb, staterr,
                              rx_desc->wb.upper.vlan); // Author's comment: builds the skb list, or runs the upper network stack
    next_desc:
            ...
            buffer_info = next_buffer; // Author's comment: points at the next buffer
            ...
        }
        ...
    }
After the sk_buff structure is taken out of the buffer, the buffer's skb (the pointer to the sk_buff structure) is set to NULL, which detaches the sk_buff structure from the buffer.
The buffer's dma is also set to 0, which clears the write location for received packets.
Figure 8 illustrates this state.
Consequently, once this receive processing is done, an sk_buff structure has to be created (that is, memory has to be allocated) to bring the buffer back to the state in Figure 7.
This is done by adapter->alloc_rx_buf(rx_ring, cleaned_count, GFP_ATOMIC);, which is called after the following while loop exits.
(/drivers/net/ethernet/intel/e1000e/netdev.c)
    static bool e1000_clean_rx_irq(struct e1000_ring *rx_ring, int *work_done,
                                   int work_to_do)
    {
        ...
        while (staterr & E1000_RXD_STAT_DD) { // Author's comment: receive processing by polling
            ...
        }
        ...
        if (cleaned_count)
            adapter->alloc_rx_buf(rx_ring, cleaned_count, GFP_ATOMIC); // Author's comment: allocates more receive buffers (creates skbs)
        ...
    }
adapter->alloc_rx_buf is a function pointer, and e1000_alloc_rx_buffers is registered in it.*5
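The pattern of registering an allocation routine in a function pointer at setup time and calling it from the receive path can be sketched as below. Every name here (`ring_model`, `adapter_ops`, `alloc_standard()`, `configure_rx_model()`, `after_clean_model()`) is hypothetical and exists only to show the indirection; the counter stands in for the real buffer allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical receive ring: only counts how many buffers were refilled. */
struct ring_model {
    size_t refilled;
};

/* Hypothetical adapter holding the registered allocation routine. */
struct adapter_ops {
    void (*alloc_rx_buf)(struct ring_model *ring, size_t cleaned_count);
};

/* One possible allocation routine. */
static void alloc_standard(struct ring_model *ring, size_t cleaned_count)
{
    ring->refilled += cleaned_count;
}

/* Setup time: decide which routine the receive path will use. */
static void configure_rx_model(struct adapter_ops *a)
{
    assert(a != NULL);
    a->alloc_rx_buf = alloc_standard;
}

/* Receive path: refill through the pointer, without knowing which
 * routine was registered, and only if something was consumed. */
static void after_clean_model(struct adapter_ops *a, struct ring_model *ring,
                              size_t cleaned_count)
{
    if (cleaned_count)
        a->alloc_rx_buf(ring, cleaned_count);
}
```

The benefit of the indirection is that the receive loop stays the same even if a different allocation routine is registered for a different buffer layout.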
The next section looks at what e1000_alloc_rx_buffers() does.
4.2 Creating the sk_buff structure
e1000_alloc_rx_buffers() creates an sk_buff structure (that is, allocates memory) and puts the buffer back into a state in which the NIC can write to it again.
(In short, it takes the buffer from the state in Figure 8 to the state in Figure 7.)
Let's start with the source code.
(/drivers/net/ethernet/intel/e1000e/netdev.c)
    static void e1000_alloc_rx_buffers(struct e1000_ring *rx_ring,
                                       int cleaned_count, gfp_t gfp)
    {
        ...
        struct sk_buff *skb;
        ...
        while (cleaned_count--) {
            ...
            skb = __netdev_alloc_skb_ip_align(netdev, bufsz, gfp); // Author's comment: allocates the skb's memory, including data
            ...
            buffer_info->skb = skb; // Author's comment: registers the skb in the buffer
    map_skb:
            buffer_info->dma = dma_map_single(&pdev->dev, skb->data,
                                              adapter->rx_buffer_len,
                                              DMA_FROM_DEVICE); // Author's comment: obtains the bus address
            ...
            rx_desc->read.buffer_addr = cpu_to_le64(buffer_info->dma); // Author's comment: registers the write location (bus address) with the NIC
            ...
        }
        ...
    }
The __netdev_alloc_skb_ip_align() function allocates the memory for the whole sk_buff structure, including the data area into which the received packet is written.
The allocated sk_buff structure (skb) is then set in the buffer (buffer_info->skb).
The dma_map_single() function obtains, from skb->data, the physical address (bus address) that the NIC can interpret.*6
Finally, setting the obtained physical address (bus address) in the NIC puts the buffer into a state in which the NIC can write received data to it.
Figure 10 illustrates this flow.
This chapter explained how additional receive buffers are allocated (sk_buff structures are created) after receive processing by polling. Receive buffers are also allocated by e1000_alloc_rx_buffers() before receive processing starts, namely in the initialization that runs right after the device is brought up.
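The detach-and-refill cycle of sections 4.1 and 4.2 can be condensed into a user-space sketch. The assumptions are significant: `rx_ring_model`, `buf_slot`, `detach_slot()`, `refill_ring()`, and `map_for_device()` are hypothetical, `malloc()` stands in for sk_buff allocation, and the address "mapping" is an identity function, which a real DMA mapping is not guaranteed to be.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define RING_SIZE 4    /* hypothetical ring length */
#define BUF_LEN   2048 /* hypothetical buffer size in bytes */

/* Models one ring entry: the CPU-side pointer plus the device-side address. */
struct buf_slot {
    void     *skb; /* stands in for the sk_buff pointer */
    uintptr_t dma; /* stands in for the bus address */
};

/* Models a receive descriptor that the device reads. */
struct rx_desc_model {
    uint64_t buffer_addr;
};

struct rx_ring_model {
    struct buf_slot      slots[RING_SIZE];
    struct rx_desc_model descs[RING_SIZE];
    size_t               next_to_use;
};

/* Toy stand-in for a DMA mapping call: identity mapping (an assumption). */
static uintptr_t map_for_device(void *cpu_addr)
{
    return (uintptr_t)cpu_addr;
}

/* Receive side: take the buffer out of the slot and clear the slot.
 * The caller now owns the returned buffer. */
static void *detach_slot(struct rx_ring_model *r, size_t i)
{
    void *skb;

    assert(r != NULL && i < RING_SIZE);
    skb = r->slots[i].skb;
    r->slots[i].skb = NULL;
    r->slots[i].dma = 0;
    return skb;
}

/* Refill side: for each consumed slot, allocate a buffer if needed,
 * obtain its device address, and publish it in the descriptor.
 * Returns how many slots were made ready for the device. */
static size_t refill_ring(struct rx_ring_model *r, size_t cleaned_count)
{
    size_t filled = 0;

    while (cleaned_count--) {
        size_t i = r->next_to_use;
        struct buf_slot *s = &r->slots[i];

        if (s->skb == NULL) {
            s->skb = malloc(BUF_LEN);
            if (s->skb == NULL)
                break; /* out of memory: retry on a later pass */
        }
        s->dma = map_for_device(s->skb);
        r->descs[i].buffer_addr = (uint64_t)s->dma;
        r->next_to_use = (i + 1) % RING_SIZE;
        filled++;
    }
    return filled;
}
```

Calling `refill_ring()` once for the whole ring models the initial allocation when the device is brought up, and calling it with the number of consumed slots models the refill after a poll pass.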
Next time
This installment, the "Ethernet driver overview", gave a bird's-eye view from the NIC receiving a packet to the packet being handed to the IP layer. Next time, we dig into the polling handler explained in chapter 3, "Receive processing by the polling handler (NAPI)". The polling handler is the entry point of packet receive processing, and it contains optimizations that keep the number of function calls needed to deliver packets to the upper network stack as small as possible. It also embeds hooks such as the one for entering Generic XDP processing. The next installment will go into considerably finer detail than this one, and we hope you look forward to it.
*1: In e1000e, the driver covered here, the polling handler is e1000e_poll().
*2: The e1000e driver provides e1000_intr(), e1000_intr_msi(), and e1000_intr_msix() as interrupt handlers. As stated in the assumptions, this article assumes the interrupt mode is MSI, so it covers e1000_intr_msi(), the handler for MSI.
*3: For the specifications of the registers, see https://www.intel.com/content/dam/www/public/us/en/documents/manuals/pcie-gbe-controllers-open-source-manual.pdf
*4: For interrupts to be disabled on read, the IAM (Interrupt Acknowledge Auto Mask) register and the IAME (Interrupt Acknowledge Auto-Mask Enable) field of the CTRL_EXT register (Extended Device Control Register) must be set beforehand. These are set in e1000_configure_rx().
*5: e1000_alloc_rx_buffers is registered in adapter->alloc_rx_buf inside the initialization function that runs when the device is brought up (e1000e_open->e1000_configure_rx).
*6: This is DMA territory, so we omit the details, but the address the NIC can interpret is, strictly speaking, called a bus address, and it can differ from the physical address. The physical address where the received packet is written can also be computed from skb->data, but dma_map_single() has to be called because the bus address may differ from the physical address and because DMA-specific work (such as allocating a bounce buffer) may need to be done.