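Since the publication of "Linuxカーネル2.6解読室" (hereafter, the previous edition), many features have been added to Linux, and it is now used in a wide variety of settings, including the enterprise domain. Along the way the code has grown larger and more complex, and for many engineers it has become an indecipherable black box. The "新Linuxカーネル解読室" (New Linux Kernel Reading Room) project takes a scalpel to the Linux kernel, a masterpiece by top engineers from around the world, pries open the black box, and explores the kernel's world, at times simply following our curiosity.
This article explains packet receive processing in the networking subsystem, based on the kernel v6.8 code.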
- Introduction
- 1. Recap of the previous article
- 2. Delivery to the upper layer: processing from the e1000_receive_skb() function onward
- 3. Polling termination conditions
- Next time
Authors: 須田 哲志, 稲葉 貴昭
Introduction
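In the previous article, we followed the path from the device (NIC) receiving a packet to the packet being handed to the IP layer.
This time, we dig deeper into receive processing by the polling handler (NAPI), which we covered in Chapter 3 of the previous article.
Compared with the previous article, this one goes into considerably finer detail, but we worked hard on it through lower back pain, tendonitis, and a sprain, so we would be very happy if you read it. (tears)
(Not that our working conditions are that harsh. We just got hurt bouldering and the like...)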
1. Recap of the previous article
Let us start with a brief recap of the previous article. There, we followed the path from the device (NIC) receiving a packet to the packet being handed to the IP layer (steps ① through ⑥ in Figure 1). (See the previous article for the prerequisites.)
When the interrupt handler detects an interrupt request (IRQ) from the NIC, it sets the stage for packet receive processing and kicks a soft interrupt. Then, in the "softirq context", packet receive processing starts with the polling handler as its entry point.
In the loop part of Figure 2, the kernel polls and repeatedly delivers batches of packets to the upper network protocol stack (hereafter, the upper layer).
This is implemented by the e1000_clean_rx_irq() function.
(/drivers/net/ethernet/intel/e1000e/netdev.c)
    static bool e1000_clean_rx_irq(struct e1000_ring *rx_ring, int *work_done,
                                   int work_to_do)
    {
        ...
        while (staterr & E1000_RXD_STAT_DD) {
            struct sk_buff *skb;
            ...
            skb = buffer_info->skb; // Author's comment: take the skb out of the ring buffer's buffer (read the data)
            ...
            e1000_receive_skb(adapter, netdev, skb, staterr,
                              rx_desc->wb.upper.vlan); // Author's comment: build the skb list or run upper-layer processing
    next_desc:
            ...
            buffer_info = next_buffer; // Author's comment: point to the next buffer
            ...
        }
        ...
    }
In the source code above, the e1000_receive_skb() function is where multiple packets are batched together and delivered to the upper layer.
In this article, we dig into this e1000_receive_skb() function.
2. Delivery to the upper layer: processing from the e1000_receive_skb() function onward
2.1 Background: EtherType and the packet_type structure
Before getting into e1000_receive_skb(), let us briefly explain EtherType and the packet_type structure.
From the Ethernet driver's point of view, packets with different upper-layer protocols, such as IP and ARP, arrive one after another.
How, then, does the kernel handle a protocol that differs from packet to packet?
More specifically, how does it identify the upper-layer protocol and the handler (function) that corresponds to it?
First, the upper-layer protocol can be identified from the packet's Ethernet header.
The Ethernet header has a field called "EtherType" that indicates the upper-layer protocol.
The result of parsing this "EtherType" is recorded in the .protocol member of the sk_buff structure during polling.
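To make the EtherType field concrete, here is a minimal user-space sketch (not kernel code) of pulling the EtherType out of a raw frame. It assumes an untagged Ethernet II frame. The names demo_ethertype and DEMO_ETH_HLEN are hypothetical; in the kernel this job is done by eth_type_trans(), which covers more cases than this. Also note that the kernel keeps .protocol in network byte order, whereas this sketch returns host byte order for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_HLEN 14   /* dst MAC (6) + src MAC (6) + EtherType (2) */

/* Return the EtherType of an untagged Ethernet II frame in host byte order,
 * or 0 if the buffer is too short to hold an Ethernet header. */
uint16_t demo_ethertype(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);
    if (len < DEMO_ETH_HLEN)
        return 0;
    /* The field is big-endian on the wire and sits in bytes 12 and 13. */
    return (uint16_t)((frame[12] << 8) | frame[13]);
}
```

With this sketch, a frame whose bytes 12 and 13 are 0x08 0x00 yields 0x0800 (IPv4), and 0x08 0x06 yields 0x0806 (ARP).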
All that is needed then is information that ties this "EtherType" (.protocol of the sk_buff structure) to a handler (function), and the packet can be delivered to the upper layer.
The packet_type structure holds this information.
The Linux kernel creates an object of type packet_type for each "EtherType" and manages these objects in hash tables and similar structures.
As shown in Figure 4, the handler (.func or .list_func) can be identified by comparing .protocol of the sk_buff structure with .type of the packet_type structure.
In the example in Figure 4, we can see that the packet (sk_buff structure) should be passed to the ip_rcv function (or the ip_list_rcv function).
In other words, to pass packets (sk_buff structures) with different EtherTypes to the upper layer through the appropriate handler (function) for each, it is enough to refer to the packet_type structure that matches the packet (protocol).
It also follows that "when two packets refer to the same packet_type structure, those two packets have the same EtherType". (That is, they have the same upper-layer protocol.)
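The lookup described above can be sketched in user space as follows. This is only an illustration under simplifying assumptions: struct pkt, struct ptype_entry, the demo_* handlers, and the flat table are all hypothetical stand-ins, whereas the kernel keeps packet_type objects in hash tables and lists. The point it shows is that two packets with the same EtherType resolve to the same entry, so comparing entry pointers is enough to tell whether the upper-layer protocol is the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for sk_buff and packet_type. */
struct pkt {
    uint16_t protocol;   /* EtherType recorded while polling (host order here) */
};

typedef int (*rx_handler)(struct pkt *p);

struct ptype_entry {
    uint16_t type;       /* EtherType this entry handles */
    rx_handler func;     /* upper-layer handler for that EtherType */
};

static int demo_ip_rcv(struct pkt *p)  { (void)p; return 4; }
static int demo_arp_rcv(struct pkt *p) { (void)p; return 1; }

/* A flat table for illustration only. */
static const struct ptype_entry demo_ptypes[] = {
    { 0x0800, demo_ip_rcv  },  /* IPv4 */
    { 0x0806, demo_arp_rcv },  /* ARP  */
};

/* Return the entry whose .type matches the packet's .protocol, or NULL. */
const struct ptype_entry *demo_find_ptype(const struct pkt *p)
{
    size_t n = sizeof(demo_ptypes) / sizeof(demo_ptypes[0]);

    assert(p != NULL);
    for (size_t i = 0; i < n; i++) {
        if (demo_ptypes[i].type == p->protocol)
            return &demo_ptypes[i];
    }
    return NULL;
}
```

A caller would look up the entry for a packet and, if one is found, invoke its func member with that packet.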
2.2 Overview
Now let us look at the part that delivers packets to the upper layer.
The upper-layer handler in Figure 4 is called by the __netif_receive_skb_list_ptype() function, which lies further down the call chain from e1000_receive_skb().
The flow from the e1000_receive_skb() function to the call of the handler registered in the packet_type structure of Figure 4 is as follows. *1
    e1000_receive_skb()
    ├── eth_type_trans() // Parses the Ethernet header of the received packet, then obtains and records the EtherType. (See Figure 3.)
    └── napi_gro_receive()
        ├── dev_gro_receive() // Merges packets that can be combined into one large packet. (To be covered in a later article.)
        └── napi_skb_finish()
            └── gro_normal_one() // Links the received packet (= sk_buff structure) onto a list. Returns if the list is shorter than the specified length.
                └── gro_normal_list()
                    └── netif_receive_skb_list_internal()
                        └── __netif_receive_skb_list()
                            └── __netif_receive_skb_list_core() // Calls the receive handlers of the various hook functions. Explained in Section 2.3.
                                └── __netif_receive_skb_list_ptype() // Calls the handler registered in the packet_type structure. (See Figure 4.)
                                    └── ip_list_rcv() // Upper-layer processing
The call chain is very deep, but along the way the following measures are taken to improve processing performance.
(1) dev_gro_receive()
Merges packets that can be combined into one large packet.
However, in the case of UDP, which this article assumes, this is OFF by default.
(2) gro_normal_one()
Builds the received packet sequence (= a list of sk_buff structures).
This reduces the overhead of the subsequent function calls.
(3) __netif_receive_skb_list_core()
Groups packets that go to the same upper layer into sublists, as groundwork for making time-consuming upper-layer processing (to be covered in the next article) more efficient.
(1) is a feature related to GRO (Generic Receive Offload) and will be covered in a later article.
(3), __netif_receive_skb_list_core(), has a somewhat complicated implementation, so it is explained in the next section.
To close this section, we briefly explain (2), the gro_normal_one() function.
The __netif_receive_skb_list_core() function described later processes a received packet sequence (= a list of sk_buff structures), and it is this gro_normal_one() function that puts the received packets into a list.
The gro_normal_one() function gathers multiple received packets (= sk_buff structures) into a list in order to reduce processing load.
Here, the next step (gro_normal_list()) is called once the list reaches a certain length (8 packets). *2 *3
Batching multiple received packets this way yields benefits such as fewer calls to __netif_receive_skb_list_core() and more efficient downstream processing, which later articles will cover.
In the next section, we follow the behavior of (3), the __netif_receive_skb_list_core() function.
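The batching idea behind gro_normal_one() can be sketched in user space as follows. This is a simplified illustration, not the kernel implementation: struct demo_napi, demo_receive_one(), demo_flush(), and the array-based queue are hypothetical, and DEMO_BATCH is simply set to 8 to mirror the default described above. The explicit demo_flush() call at the end of a poll plays the role of the separate path for leftover packets.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BATCH 8          /* mirrors the default batch length of 8 packets */
#define DEMO_MAX_PENDING 64

struct demo_napi {
    int pending[DEMO_MAX_PENDING]; /* packet ids waiting to be passed up */
    int rx_count;                  /* number of ids currently pending */
    int flushes;                   /* how many times a batch was handed up */
    int delivered;                 /* total packets handed up so far */
};

/* Hand the whole pending batch to the next stage (here it is only counted). */
void demo_flush(struct demo_napi *n)
{
    if (n->rx_count == 0)
        return;
    n->delivered += n->rx_count;
    n->flushes++;
    n->rx_count = 0;
}

/* Queue one packet and flush once the batch length is reached. */
void demo_receive_one(struct demo_napi *n, int pkt_id)
{
    assert(n->rx_count < DEMO_MAX_PENDING);
    n->pending[n->rx_count++] = pkt_id;
    if (n->rx_count >= DEMO_BATCH)
        demo_flush(n);
}
```

Feeding 20 packets through demo_receive_one() triggers two flushes of 8 packets each and leaves 4 packets pending until demo_flush() is called explicitly.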
2.3 Processing in the __netif_receive_skb_list_core() function
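As mentioned earlier, the __netif_receive_skb_list_core() function takes the received packets that gro_normal_one() put into a list, splits them into sublists of consecutive packets with the same EtherType (IPv4, ARP, and so on), and prepares them to be passed to the upper layer.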
Implementation approach
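Roughly speaking, for each packet in the received packet sequence (= the list of sk_buff structures), the implementation does the following:
- If the upper-layer protocol is the same as that of the previous packet, move the packet onto the sublist.
- If the upper-layer protocol differs from that of the previous packet, finalize the sublist and pass it to the handler of the upper-layer protocol.
(After that, reset the sublist and then link the packet currently being examined onto it.)
The check "is the upper-layer protocol the same as the previous packet's?" uses the "reference (pointer) to the packet_type structure". (This relies on the fact that "the referenced packet_type structure is the same" = "the upper-layer protocol is the same" (see Figure 4).)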
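The approach above can be sketched in user space as a run-splitting loop over an array. This is a simplified illustration under stated assumptions, not the kernel code: struct demo_ptype, struct demo_pkt, demo_split_and_dispatch(), and demo_record() are hypothetical names, an array replaces the linked list, and the device check and the skipping of packets without a handler are left out.

```c
#include <assert.h>
#include <stddef.h>

struct demo_ptype { const char *name; };          /* stand-in for packet_type */
struct demo_pkt { const struct demo_ptype *pt; }; /* stand-in for sk_buff */

/* Called once per run of consecutive packets that share one handler. */
typedef void (*demo_dispatch)(const struct demo_ptype *pt,
                              const struct demo_pkt *first, size_t count,
                              void *ctx);

struct demo_stats { size_t calls; size_t packets; size_t longest; };

/* Example callback: records how many sublists and packets were dispatched. */
void demo_record(const struct demo_ptype *pt, const struct demo_pkt *first,
                 size_t count, void *ctx)
{
    struct demo_stats *s = ctx;

    (void)pt;
    (void)first;
    s->calls++;
    s->packets += count;
    if (count > s->longest)
        s->longest = count;
}

/* Walk pkts[0..n) and dispatch each maximal run with the same pt pointer.
 * Returns the number of runs (sublists) dispatched. */
size_t demo_split_and_dispatch(const struct demo_pkt *pkts, size_t n,
                               demo_dispatch cb, void *ctx)
{
    size_t runs = 0, start = 0;

    assert(cb != NULL);
    for (size_t i = 1; i <= n; i++) {
        /* A run ends at the end of input or when the pointer changes. */
        if (i == n || pkts[i].pt != pkts[start].pt) {
            cb(pkts[start].pt, &pkts[start], i - start, ctx);
            runs++;
            start = i;
        }
    }
    return runs;
}
```

For a sequence IPv4, IPv4, ARP, IPv4, IPv4, IPv4, this dispatches three sublists of lengths 2, 1, and 3. Note that the two IPv4 runs are dispatched separately because only consecutive packets are grouped.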
Source code walkthrough
Now let us look at the actual source code.
(/net/core/dev.c)
    static void __netif_receive_skb_list_core(struct list_head *head, bool pfmemalloc)
    {
        ...
        INIT_LIST_HEAD(&sublist);
        list_for_each_entry_safe(skb, next, head, list) { // Author's comment: walk the skb list
            struct net_device *orig_dev = skb->dev;
            struct packet_type *pt_prev = NULL;

            skb_list_del_init(skb);
            __netif_receive_skb_core(&skb, pfmemalloc, &pt_prev); // Author's comment: obtain the packet type (pt_prev)
            if (!pt_prev)
                continue;
            if (pt_curr != pt_prev || od_curr != orig_dev) { // Author's comment: the EtherType has changed (pt_curr != pt_prev)
                /* dispatch old sublist */
                __netif_receive_skb_list_ptype(&sublist, pt_curr, od_curr); // Author's comment: pass sublist to the upper layer
                /* start new sublist */
                INIT_LIST_HEAD(&sublist); // Author's comment: reset sublist
                pt_curr = pt_prev;
                od_curr = orig_dev;
            }
            list_add_tail(&skb->list, &sublist); // Author's comment: link onto sublist (= the EtherType is the same)
        }

        /* dispatch final sublist */
        __netif_receive_skb_list_ptype(&sublist, pt_curr, od_curr);
    }
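The four key variables here are the following.
- skb: The packet currently being examined.
- sublist: The packet sequence to be passed to the upper-layer handler (what we have been calling the sublist). As mentioned earlier, all packets linked here have the same EtherType value (= upper-layer protocol).
- pt_curr: A pointer to the packet_type structure corresponding to the packets in the sublist (= sublist). The .func or .list_func of this pt_curr is the upper-layer handler.
- pt_prev: A pointer to the packet_type structure corresponding to the packet currently being examined. Comparing this pt_prev with pt_curr determines whether the packet has the same EtherType as the previous packet. (Strictly speaking, the comparison is against the packets already linked onto sublist, not against the previous packet.)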
The code walks the list of received packets (head) and, as long as the EtherType stays the same, moves the packets onto sublist one at a time.
When a packet whose EtherType differs from that of sublist appears (pt_curr != pt_prev), sublist is finalized *4 and the packets are delivered to the upper layer through the __netif_receive_skb_list_ptype() function.
Here, pt_prev is obtained through the __netif_receive_skb_core() function. (See Figure 8.)
So is __netif_receive_skb_core() a function for obtaining a packet's packet_type structure? Not quite.
The __netif_receive_skb_core() function gathers the hook points that must be run before a packet (sk_buff structure) is delivered to the upper layer.
The next section briefly introduces this __netif_receive_skb_core() function.
2.4 Processing in __netif_receive_skb_core(): hooks and delivery to RAW sockets
As mentioned above, the __netif_receive_skb_core() function gathers the hook points that must be run before a packet (sk_buff structure) is delivered to the upper layer.
Specifically, the receive processing of the following features is called from this function.
- Generic XDP
- TC ingress
- netfilter ingress
- Macsec
- MACVLAN
- IPVLAN
- MacVTap
- Teaming
- Bridge
- Bonding
Delivery to RAW sockets that handle every received packet, such as socket(AF_PACKET, SOCK_RAW, ETH_P_ALL), also takes place here.
3. Polling termination conditions
3.1 Termination conditions of the e1000_clean_rx_irq() function
Finally, we briefly explain the conditions under which NAPI polling ends.
We have followed the functions quite deep, so here is a quick recap of the polling part: as explained earlier, NAPI polling takes place inside the e1000_clean_rx_irq() function.
The loop part corresponds to packet receive processing by NAPI polling.
In the e1000_clean_rx_irq() function, this is the following while loop.
(/drivers/net/ethernet/intel/e1000e/netdev.c)
    static bool e1000_clean_rx_irq(struct e1000_ring *rx_ring, int *work_done,
                                   int work_to_do)
    {
        ...
        while (staterr & E1000_RXD_STAT_DD) {
            struct sk_buff *skb;

            if (*work_done >= work_to_do)
                break;
            (*work_done)++;
            ... // Author's comment: packet receive processing
        }
        ...
    }
In other words, the exit conditions of this while loop are exactly the termination conditions of NAPI polling.
From the source code, we can see that the while loop exits when either of the following two conditions is met.
- staterr & E1000_RXD_STAT_DD becomes 0
- *work_done >= work_to_do holds
Condition 1: staterr & E1000_RXD_STAT_DD becomes 0
staterr holds status bits reported by the NIC (in this driver it is read from the receive descriptor), and E1000_RXD_STAT_DD is a flag meaning that the received data has been completely written to the buffer.
The loop's continuation condition, staterr & E1000_RXD_STAT_DD, therefore means that received data is ready to be taken out of the ring buffer.
That is, when there is no received data in the ring buffer, staterr & E1000_RXD_STAT_DD becomes 0 and polling ends.
Condition 2: *work_done >= work_to_do holds
As the increment in the source code ((*work_done)++;) shows, work_done is the number of packets that polling has processed (= delivered to the upper layer). *5
That is, with *work_done >= work_to_do, once the number of packets received by polling reaches the limit (work_to_do), the while loop breaks and polling ends.
work_to_do, the limit, is passed in as a function argument, and finding its value requires going back through several callers. The conclusion is that it is set to the fixed value 64 (NAPI_POLL_WEIGHT). (It takes the value of .weight in the napi_struct structure.)
Thus, the polling process can receive at most 64 packets at a time.
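The two exit conditions can be tried out with a small user-space sketch. It is only an illustration, not the driver code: struct demo_desc, DEMO_STAT_DD, and demo_clean_rx() are hypothetical names, and the ring is a plain array whose "done" bit is set by the test instead of by a NIC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_STAT_DD 0x1u   /* hypothetical "descriptor done" bit */

struct demo_desc { unsigned int staterr; };

/* Process ready descriptors starting at *next, up to work_to_do packets.
 * Returns true if at least one descriptor was processed. */
bool demo_clean_rx(struct demo_desc *ring, size_t ring_size, size_t *next,
                   int *work_done, int work_to_do)
{
    bool cleaned = false;

    assert(ring_size > 0 && *next < ring_size);
    while (ring[*next].staterr & DEMO_STAT_DD) {  /* condition 1: data ready */
        if (*work_done >= work_to_do)              /* condition 2: limit hit */
            break;
        (*work_done)++;

        ring[*next].staterr = 0;                   /* give the slot back */
        *next = (*next + 1) % ring_size;
        cleaned = true;
    }
    return cleaned;
}
```

With 100 ready descriptors and a limit of 64, the first call stops at 64 packets because of condition 2, and a second call processes the remaining 36 and stops because of condition 1.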
3.2 Rescheduling of polling
The previous section explained that "the polling process can receive at most 64 packets at a time". What happens, then, when the NIC has received 64 or more packets?
As described there, even in that case the e1000_clean_rx_irq() function itself exits the while loop and returns.
However, if we trace back through the callers as shown below, we find that the __napi_poll() / napi_poll() functions determine that "unprocessed received packets remain" and reschedule NAPI polling.
    net_rx_action() // 4. Reschedules NAPI polling and raises NET_RX_SOFTIRQ
    └── napi_poll() // 3. Requests rescheduling of NAPI polling
        └── __napi_poll() // 2. Determines that unprocessed received packets remain
            └── n->poll: e1000e_poll()
                └── adapter->clean_rx: e1000_clean_rx_irq() // 1. Receives 64 packets
In this way, even when the NIC has received 64 or more packets, receive processing by polling runs again. Then, if the NIC keeps receiving packets, does receive processing by polling repeat forever? Of course not. Once a certain condition is met, the remaining polling-based receive processing is handed over to ksoftirqd and the CPU is released, so that packet receive processing does not monopolize the CPU. ksoftirqd resumes receive processing when the scheduler gives it the CPU.
To close this chapter, let us take a slightly broader look at polling, including its rescheduling.
This series on packet receive processing assumes that only one NIC is installed, but just for now, consider a machine with multiple NICs.
Then, as Figure 10 *6 shows, receive processing by polling is implemented as three major loop structures.
(The conditions attached to each loop* in Figure 10 are loop continuation conditions. Each loop* has several conditions, all of which are ANDed.)
The processing explained in this article mainly corresponds to loop3 in Figure 10.
In terms of sequence, the loops are nested in the order loop1 → loop2 → loop3, and the constraints on packet receive processing become stricter toward the inner loops. *7
This can be read as placing limits on processing time and on the number of received packets at each layer, "softirq context" (loop1), "NET_RX_SOFTIRQ" (loop2), and "Ethernet driver" (loop3), so that no particular kind of processing takes up too much time. *8
When the continuation condition of loop1 no longer holds, processing is handed over to ksoftirqd. *9
Note that the continuation condition of loop1 is a constraint on softirq processing in general, so it does not apply only to "NET_RX_SOFTIRQ".
In short, even if the NIC keeps receiving packets, the continuation condition of loop1 eventually stops holding, and the remaining processing is handed over to ksoftirqd.
Finally, let us compare the flow of polling rescheduling against Figure 10.
We can see that the polling rescheduling that occurs for each NIC (loop3) is not executed immediately. It is executed only after processing has returned to loop1.
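The interplay between the per-poll limit (loop3) and the overall packet budget (loop2) can be sketched in user space as follows. This is a rough model under loudly stated assumptions, not the kernel's net_rx_action(): demo_poll() and demo_rx_action() are hypothetical, each NIC is reduced to a backlog counter, the time limit and loop1 are left out, and the budget is whatever value the caller passes in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_WEIGHT 64   /* per-poll limit, mirroring the value 64 above */

/* Poll one NIC: consume up to `weight` packets from its backlog. */
int demo_poll(int *backlog, int weight)
{
    int done = *backlog < weight ? *backlog : weight;

    *backlog -= done;
    return done;
}

/* One softirq-style pass over all NICs in round-robin order.
 * Stops when no NIC has packets left or the budget is used up.
 * Returns true if packets remain, i.e. another pass must be scheduled. */
bool demo_rx_action(int *backlogs, size_t nics, int budget)
{
    bool progress = true;

    assert(budget > 0);
    while (progress && budget > 0) {
        progress = false;
        for (size_t i = 0; i < nics && budget > 0; i++) {
            int done = demo_poll(&backlogs[i], DEMO_WEIGHT);

            budget -= done;
            if (done > 0)
                progress = true;
        }
    }
    for (size_t i = 0; i < nics; i++) {
        if (backlogs[i] > 0)
            return true;
    }
    return false;
}
```

With two NICs holding 200 packets each and a budget of 300, the first pass runs five polls of 64 packets, uses up the budget, and reports that work remains. A second pass then drains the rest.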
Next time
Over the previous article and this one, we have looked at packet receive processing in the Ethernet driver. Starting next time, we will follow the flow from the IP layer to the socket interface.
*1: The word "gro" appears in several places. It stands for "Generic Receive Offload" and will be covered in a later article. Note, however, that it is basically not used for UDP unless the user explicitly enables it. (We mention it because it is an important feature at the Ethernet layer.)
*2: The length of this list is determined by a variable in the code called gro_normal_batch. Its initial value in the code is 8, so the default is 8 packets. This gro_normal_batch is in fact a kernel parameter and can be checked or changed through "/proc/sys/net/core/gro_normal_batch".
*3: If the number of packets received by polling is less than gro_normal_batch, they are delivered to the upper layer via a different path from the one described in this article. Specifically, control returns from gro_normal_one all the way up to e1000e_poll, and the packets are delivered to the upper layer by napi_complete_done, which is called after e1000_clean_rx_irq (= adapter->clean_rx).
*4: In reality, "whether the receiving device is the same" is also part of the criterion, but to keep things simple we assume here that all packets are received from the same device.
*5: The work_done pointer is passed as a function argument, and the value it points to is cleared to 0 in the caller, the e1000e_poll() function.
*6: Figure 10 depicts a special situation with not only multiple NICs but also a single CPU core. We assumed an extreme situation for the sake of explanation. Normally the CPU that receives the interrupt is likely to differ from NIC to NIC, so a situation like Figure 10, in which one core looks after multiple NICs, is probably rare. Reproducing the situation in Figure 10 requires a setup such as installing more NICs than there are CPU cores or pinning the interrupts to a specific CPU.
*7: loop1 ensures that softirq processing other than packet reception also gets its fair share, and loop2 ensures that each of the multiple NICs is processed fairly.
*8: As also noted in Figure 10, the constraints at the "NET_RX_SOFTIRQ" (loop2) level are kernel parameters. Their values can be checked or changed through /proc/sys/net/core/netdev_budget and /proc/sys/net/core/netdev_budget_usecs, respectively.
*9: Figure 10 lists two conditions as the continuation conditions of loop1, but strictly speaking there is one more. Apart from the time and count limits, if there is a process that needs to be scheduled, polling is not rescheduled and processing is handed over to ksoftirqd.
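The kernel parameters mentioned in footnotes *2 and *8 are exposed as text files, so their current values can be read with ordinary file I/O. The following is a minimal sketch assuming each file holds a single decimal integer. demo_read_int() is a hypothetical helper, not part of any kernel or library API.

```c
#include <assert.h>
#include <stdio.h>

/* Read one decimal integer from a text file such as
 * /proc/sys/net/core/netdev_budget.
 * Returns 0 on success and -1 if the file cannot be opened or parsed. */
int demo_read_int(const char *path, long *out)
{
    FILE *f;
    int ok;

    assert(path != NULL && out != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

On a Linux machine, passing "/proc/sys/net/core/gro_normal_batch" as the path would be one way to check the batch length in effect. On other systems the file does not exist and the helper simply returns -1.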