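Since the publication of 「Linuxカーネル2.6解読室」 ("Linux Kernel 2.6 Reading Room"; hereafter "the old edition"), Linux has gained many features and is now used in a wide range of settings, starting with the enterprise domain.
Along the way the code has grown larger and more complex, and for many engineers it has become an indecipherable black box.
The 「新Linuxカーネル解読室」 ("New Linux Kernel Reading Room") project takes a scalpel to the Linux kernel, a masterpiece by top engineers around the world, pries open that black box, and at times simply follows its curiosity in reading through the world of the kernel.
Author: 高倉遼
※ A list of the articles in the 「新Linuxカーネル解読室」 series is available on the series index page.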
1. Introduction
This article introduces two of Linux's mutual exclusion mechanisms, the reader-writer spinlock (RW lock) and RCU, while checking how they are actually used in Linux. Both are used for data structures that are read far more often than they are updated; as a concrete example, this article looks at mutual exclusion in netfilter.
1.1. A quick review of RW locks and RCU
This article was originally planned as the Reading Room's piece on mutual exclusion, but while I was investigating netfilter as the concrete example, it ended up being mainly an explanation of netfilter's own processing. Mutual exclusion itself will be covered in a separate article, so here I only briefly review the mechanisms this article touches on. The sequence counter (seq counter) is also introduced briefly, because the netfilter locking described in this article uses it.
Ordinary spinlock
Implemented with a lock variable; it prevents multiple CPUs from operating on one piece of data at the same time. The lock variable does not distinguish reads from updates and is used only to decide whether the lock can be acquired. As a result, reads versus reads, updates versus updates, and reads versus updates are all executed exclusively.

RW lock
A spinlock that allows reads to run concurrently. Updates versus updates and reads versus updates are executed exclusively, as with an ordinary spinlock.
When several readers hold the lock at once, the lock variable serves as a counter of the current readers. For exclusion between reads and updates, an update does not proceed until the lock variable shows that no CPU is reading.

seq counter
As with the RW lock, reads run concurrently with each other, while reads versus updates and updates versus updates are executed exclusively.
It differs from the RW lock in how exclusion between reads and updates is achieved. With a seq counter an update is never held up by readers; instead, readers wait for the update to finish, based on the counter. The counter is incremented by 1 at the start and again at the end of each update, so a reader has to wait (retry) when the counter is odd at the start of its read, or when the counter values at the start and at the end of the read differ.

RCU
In addition to reads versus reads, reads and updates also run concurrently. Updates versus updates must still be executed exclusively.
RCU does not use a lock for this. Instead, the updater swaps the pre-update data for the updated data. If an update happens while the data is being read, the read continues on the pre-update data, which is freed only after the reads on all CPUs have finished.
The table below lists, for each mechanism, which combinations may run concurrently (○) and which may not (×).

 | Ordinary spinlock | Reader-writer spinlock | seq counter | RCU |
---|---|---|---|---|
Read and read | × | ○ | ○ | ○ |
Read and update | × | × | × | ○ |
Update and update | × | × | × | × |
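
The seq counter rules described above can be sketched in userspace C with C11 atomics. This is only an illustration under stated assumptions, not the kernel's seqcount implementation: `struct seq_stats` and the `seq_*` helpers are hypothetical names, and the sketch assumes a single writer at a time (several writers would have to be serialized by some other means).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical data protected by a sequence counter (not a kernel type). */
struct seq_stats {
    atomic_uint seq;    /* even: stable, odd: an update is in progress */
    atomic_uint pkts;
    atomic_uint bytes;
};

/* Reader: remember the counter value seen at the start of the read. */
static unsigned seq_read_begin(struct seq_stats *s)
{
    return atomic_load_explicit(&s->seq, memory_order_acquire);
}

/* Reader: true if the read overlapped an update and must be repeated. */
static bool seq_read_retry(struct seq_stats *s, unsigned start)
{
    atomic_thread_fence(memory_order_acquire);
    return (start & 1u) ||
           atomic_load_explicit(&s->seq, memory_order_relaxed) != start;
}

/* Writer (assumed to be the only one): counter +1 before, +1 after. */
static void seq_update(struct seq_stats *s, unsigned pkts, unsigned bytes)
{
    unsigned v = atomic_load_explicit(&s->seq, memory_order_relaxed);

    atomic_store_explicit(&s->seq, v + 1, memory_order_relaxed);  /* odd */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->pkts, pkts, memory_order_relaxed);
    atomic_store_explicit(&s->bytes, bytes, memory_order_relaxed);
    atomic_store_explicit(&s->seq, v + 2, memory_order_release);  /* even */
}

/* Reader loop: never holds up the writer, just retries until consistent. */
static void seq_read(struct seq_stats *s, unsigned *pkts, unsigned *bytes)
{
    unsigned start;

    do {
        start = seq_read_begin(s);
        *pkts = atomic_load_explicit(&s->pkts, memory_order_relaxed);
        *bytes = atomic_load_explicit(&s->bytes, memory_order_relaxed);
    } while (seq_read_retry(s, start));
}
```

A reader would call `seq_read()` and a writer `seq_update()`. The point to notice is that the writer path contains no loop: only the reader ever repeats work.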
2. About netfilter
netfilter is the mechanism built into the Linux network stack to carry out the processing specified through tools such as iptables and nftables. With iptables or nftables, you register a rule that describes which packets to match and what to do with them. Rules are added and updated by the user, and they are read whenever packets are sent or received. Chapter 3 uses the reads and updates of these rules as the example for introducing RW locks and RCU.
2.1. netfilter data structures and implementation
Before turning to mutual exclusion on rules, let us check how netfilter registers a rule and how packets are then filtered against the registered rules as they are sent and received, using the following rule as an example.
$ iptables -t filter -A INPUT -s 192.168.x.x -j DROP
With iptables and nftables you add a rule by specifying, among other things, the point at which packets are filtered (the chain) and an IP address. netfilter manages the added rules by registering them in a table. A table is represented by `struct xt_table`, which holds per-table information such as the table name and the protocol family it applies to.
(include/linux/netfilter/x_tables.h)
    /* Furniture shopping... */
    struct xt_table {
        struct list_head list;

        /* What hooks you will enter on */
        unsigned int valid_hooks;

        /* Man behind the curtain... */
        struct xt_table_info *private;

        /* hook ops that register the table with the netfilter core */
        struct nf_hook_ops *ops;

        /* Set this to THIS_MODULE if you are a module, otherwise NULL */
        struct module *me;

        u_int8_t af;        /* address/protocol family */
        int priority;       /* hook order */

        /* A unique name... */
        const char name[XT_TABLE_MAXNAMELEN];
    };
The filter table, in which the example rule is registered, is defined as shown below. The rules registered in a table are actually held in `struct xt_table_info`, which `struct xt_table` points to, so reading or updating rules means operating on `struct xt_table_info *private`. Consequently, the mutual exclusion for reading and updating rules is performed per table.
(net/ipv4/netfilter/iptable_filter.c)
    static const struct xt_table packet_filter = {
        .name        = "filter",
        .valid_hooks = FILTER_VALID_HOOKS,
        .me          = THIS_MODULE,
        .af          = NFPROTO_IPV4,
        .priority    = NF_IP_PRI_FILTER,
    };
The locks introduced in this article are meant for data structures that are mostly read. Running the example command updates the table, whereas reads of the table occur in processing such as the filtering done as packets are sent and received. These locks are used to protect tables because reads, which happen for every packet sent or received, are expected to far outnumber updates made by users.
2.2. Reading rules
When registering a rule with iptables or nftables, you specify a point on the path along which packets are sent or received. netfilter provides these points in advance, and a registered rule is applied in the function that corresponds to the specified point. For the example rule this is `ip_local_deliver()`, which sits partway along the receive path of the IP layer.
In netfilter, the protocol and the point at which the packet was picked up are passed as arguments to the `NF_HOOK` macro, and the actual packet processing, such as filtering, is carried out by the functions tied to them. `ip_local_deliver()` passes IPv4 (`NFPROTO_IPV4`), the protocol of the packet being processed, and `NF_INET_LOCAL_IN`, which indicates that the packet was picked up on the receive path, to `NF_HOOK`; the actual packet processing is then done by `ipt_do_table()`, which is tied to them. `ipt_do_table()` is where rules are actually read.
(net/ipv4/ip_input.c)

    /*
     * Deliver IP Packets to the higher protocol layers.
     */
    int ip_local_deliver(struct sk_buff *skb)
    {
        /*
         * Reassemble IP fragments.
         */
        struct net *net = dev_net(skb->dev);

        if (ip_is_fragment(ip_hdr(skb))) {
            if (ip_defrag(net, skb, IP_DEFRAG_LOCAL_DELIVER))
                return 0;
        }

        return NF_HOOK(NFPROTO_IPV4, NF_INET_LOCAL_IN,
                       net, NULL, skb, skb->dev, NULL,
                       ip_local_deliver_finish);
    }
    EXPORT_SYMBOL(ip_local_deliver);
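
The dispatch idea behind `NF_HOOK` can be sketched as follows: look up the function registered for a hook point, run it, and continue to the "finish" function only if the packet is accepted. This is a deliberately simplified model and not the kernel's definition. All names here (`run_hook`, `hook_table`, `filter_hook`, `deliver_finish`, the `verdict` values) are hypothetical, only one function per hook point is supported, and the protocol family argument is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's definitions. */
enum verdict { V_DROP, V_ACCEPT };
enum hook_point { HOOK_LOCAL_IN, HOOK_LOCAL_OUT, HOOK_MAX };

struct packet {
    unsigned src;       /* source address */
    int delivered;      /* set once the packet reaches the upper layer */
};

typedef enum verdict (*hook_fn)(struct packet *pkt);
typedef int (*ok_fn)(struct packet *pkt);

/* One registered function per hook point (an assumption of this sketch). */
static hook_fn hook_table[HOOK_MAX];

/* Run the hook for this point; continue to okfn only if it accepts. */
static int run_hook(enum hook_point hook, struct packet *pkt, ok_fn okfn)
{
    hook_fn fn = hook_table[hook];

    if (fn != NULL && fn(pkt) == V_DROP)
        return -1;      /* packet dropped, okfn is never called */
    return okfn(pkt);
}

#define BLOCKED_SRC 0xC0A86909u    /* 192.168.105.9, example value */

/* Example hook: drop packets from one source address. */
static enum verdict filter_hook(struct packet *pkt)
{
    return pkt->src == BLOCKED_SRC ? V_DROP : V_ACCEPT;
}

/* Example "finish" function: hand the packet to the upper layer. */
static int deliver_finish(struct packet *pkt)
{
    pkt->delivered = 1;
    return 0;
}
```

After `hook_table[HOOK_LOCAL_IN] = filter_hook;`, a call such as `run_hook(HOOK_LOCAL_IN, &pkt, deliver_finish)` plays the role that the `NF_HOOK(...)` call plays in `ip_local_deliver()`.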
`ipt_do_table()` filters the packet based on the rules registered in the table it is given. The example rule is registered in the filter table. Filtering works by checking whether the table contains a rule that matches the packet being processed; if it does, the packet is handled according to that rule. The read section in `ipt_do_table()` therefore runs from L.260, where the read of the table begins, to L.354, where the necessary checks of the rules registered in the table end.
(net/ipv4/netfilter/ip_tables.c)
    221 /* Returns one of the generic firewall policies, like NF_ACCEPT. */
    222 unsigned int
    223 ipt_do_table(void *priv,
    224              struct sk_buff *skb,
    225              const struct nf_hook_state *state)
    226 {
    ...
    237     const struct xt_table_info *private;
    ...
    258     local_bh_disable();
    259     addend = xt_write_recseq_begin();
    260     private = READ_ONCE(table->private); /* Address dependency. */
    261     cpu        = smp_processor_id();
    262     table_base = private->entries;
    263     jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
    ...
    275     e = get_entry(table_base, private->hook_entry[hook]);
    276
    277     do {
                // filtering
    ...
    354     } while (!acpar.hotdrop);
    355
    356     xt_write_recseq_end(addend);
    357     local_bh_enable();
    358
    359     if (acpar.hotdrop)
    360         return NF_DROP;
    361     else return verdict;
    362 }
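Note that reading rules also involves mutual exclusion on the statistics (pkts, bytes) kept for each rule. These statistics are updated every time a rule is read while packets are sent or received.
This exclusion is done with a seq counter, and the mutual exclusion described in the next section relies on the fact that the read section for rules is also the update section for the statistics. That is why the read section for rules is opened and closed with `xt_write_recseq_begin()` / `xt_write_recseq_end()`, functions intended for update sections.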
    $ iptables -v -L INPUT
    Chain INPUT (policy ACCEPT 178 packets, 12312 bytes)
     pkts bytes target     prot opt in     out     source               destination
        0     0 DROP       all  --  any    any     192.168.105.9        anywhere
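
Why a read of the rules is at the same time an update of the statistics can be seen in a toy version of the rule walk. The sketch below is not the kernel's rule format or matching logic: `struct rule`, `do_table()` and the `verdict` values are hypothetical, matching is reduced to comparing a source address, and the first matching rule decides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the kernel's definitions. */
enum verdict { V_DROP, V_ACCEPT };

struct rule {
    uint32_t src;           /* source address to match (host byte order) */
    enum verdict target;    /* what to do with a matching packet */
    uint64_t pkts;          /* per-rule statistics, as shown by iptables -v */
    uint64_t bytes;
};

/*
 * Walk the rules in order. The first rule that matches decides the
 * verdict, and matching it also updates that rule's counters. If no
 * rule matches, the chain policy applies.
 */
static enum verdict do_table(struct rule *rules, size_t n,
                             uint32_t src, uint32_t len,
                             enum verdict policy)
{
    for (size_t i = 0; i < n; i++) {
        if (rules[i].src == src) {
            rules[i].pkts++;            /* the "read" path writes here */
            rules[i].bytes += len;
            return rules[i].target;
        }
    }
    return policy;
}
```

Because every matching packet writes `pkts` and `bytes`, something has to keep a concurrent reader of those counters from seeing a half-updated pair, which is the role the seq counter plays in the text above.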
2.3. Adding and updating rules
Like reading rules, updates such as adding a rule are operations on a table. Mutual exclusion is therefore needed between such updates and the rule reads, in `ipt_do_table()` and similar functions, that occur as packets are sent and received. For the example rule, the two sides are the reads of the filter table triggered by incoming packets and the update that adds a rule to the filter table.
In netfilter, tables are updated in `xt_replace_table()`, which is common to all protocols. `xt_replace_table()` takes a pointer to a table that already contains the rule being added (`struct xt_table_info *newinfo`) and replaces the table being updated (`table->private`) with it. The update section thus runs from L.1401 to L.1421, from fetching the old table until the replacement with the new table is complete. The old table is returned from `xt_replace_table()` and freed by the caller.
In the current implementation, packet transmission and reception are not blocked during an update, except on the core performing the update (L.1400), so reads of the table being updated keep occurring while the update is in progress. The wait performed right after the update section (L.1430-L.1440) ensures that, if some CPU is still reading the pre-update table after the swap, that table is not freed while it is being read. The wait uses a netfilter-specific mechanism built on the seq counter: the per-CPU counter (`u32 seq`) is odd while that CPU is inside a read section (`xt_write_recseq_begin()` to `xt_write_recseq_end()`) and even otherwise, so the updater waits until reads have finished on all CPUs. In the code, a CPU whose counter is even (least significant bit 0) is skipped, and for a CPU whose counter is odd the updater waits until the counter value changes.
(net/netfilter/x_tables.c)

    1383 struct xt_table_info *
    1384 xt_replace_table(struct xt_table *table,
    1385                  unsigned int num_counters,
    1386                  struct xt_table_info *newinfo,
    1387                  int *error)
    1388 {
    1389     struct xt_table_info *private;
    ...
    1399     /* Do the substitution. */
    1400     local_bh_disable();
    1401     private = table->private;
    ...
    1412     newinfo->initial_entries = private->initial_entries;
    1413     /*
    1414      * Ensure contents of newinfo are visible before assigning to
    1415      * private.
    1416      */
    1417     smp_wmb();
    1418     table->private = newinfo;
    1419
    1420     /* make sure all cpus see new ->private value */
    1421     smp_mb();
    1422
    1423     /*
    1424      * Even though table entries have now been swapped, other CPU's
    1425      * may still be using the old entries...
    1426      */
    1427     local_bh_enable();
    1428
    1429     /* ... so wait for even xt_recseq on all cpus */
    1430     for_each_possible_cpu(cpu) {
    1431         seqcount_t *s = &per_cpu(xt_recseq, cpu);
    1432         u32 seq = raw_read_seqcount(s);
    1433
    1434         if (seq & 1) {
    1435             do {
    1436                 cond_resched();
    1437                 cpu_relax();
    1438             } while (seq == raw_read_seqcount(s));
    1439         }
    1440     }
    1441
    1442     audit_log_nfcfg(table->name, table->af, private->number,
    1443                     !private->number ? AUDIT_XT_OP_REGISTER :
    1444                                        AUDIT_XT_OP_REPLACE,
    1445                     GFP_KERNEL);
    1446     return private;
    1447 }
    1448 EXPORT_SYMBOL_GPL(xt_replace_table);
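
The parity-based wait can be modelled in userspace with one counter per simulated CPU. This is a sketch under assumptions rather than the kernel code: `NCPU`, `recseq_begin()`, `recseq_end()`, `cpu_in_section()` and `wait_for_readers()` are hypothetical names, the counters are plain C11 atomics instead of per-CPU `seqcount_t` variables, and the handling of the `addend` value (1 for an outermost section, 0 for a nested one) reflects my understanding of how the kernel helpers behave.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NCPU 4  /* number of simulated CPUs (assumption of this sketch) */

/* One counter per simulated CPU: odd while that CPU is in a read section. */
static atomic_uint recseq[NCPU];

/*
 * Enter a read section on `cpu`. Returns the amount that was added:
 * 1 if this call made the counter odd, 0 if the CPU was already inside
 * a section (nested entry), so that the matching end undoes exactly it.
 */
static unsigned recseq_begin(int cpu)
{
    unsigned addend = (atomic_load(&recseq[cpu]) + 1u) & 1u;

    atomic_fetch_add(&recseq[cpu], addend);
    return addend;
}

/* Leave a read section: the outermost end makes the counter even again. */
static void recseq_end(int cpu, unsigned addend)
{
    atomic_fetch_add(&recseq[cpu], addend);
}

static bool cpu_in_section(int cpu)
{
    return (atomic_load(&recseq[cpu]) & 1u) != 0;
}

/*
 * Updater: after swapping in the new table, wait for every CPU that was
 * inside a read section at that moment to leave it. A CPU with an even
 * counter is skipped; for an odd counter we wait until the value changes.
 */
static void wait_for_readers(void)
{
    for (int cpu = 0; cpu < NCPU; cpu++) {
        unsigned seq = atomic_load(&recseq[cpu]);

        if (seq & 1u) {
            while (atomic_load(&recseq[cpu]) == seq)
                ;   /* the kernel yields here (cond_resched/cpu_relax) */
        }
    }
}
```

Waiting for the value to change, rather than for it to become even, is enough: once the counter has moved on, any section the CPU is in now started after the swap and can only see the new table.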
The figure below shows which table each CPU references before and after the update performed on CPU2. CPU1 starts reading the table before the update begins, and CPU3 starts reading it after the update begins.
3. Comparing RW locks and RCU
This chapter looks at example implementations that protect the netfilter read and update sections from chapter 2 with an RW lock and with RCU, and at the characteristics of each. netfilter has used both in the past. How each came to be adopted makes for interesting reading, so if you are curious, have a look at the actual commit 784544739a25 ("netfilter: iptables: lock free counters").
3.1. Protecting with an RW lock
As seen in chapter 2, rules are handled per table both when they are read in `ipt_do_table()` and when they are updated in `xt_replace_table()`. To protect the table in these read and update sections, we first give each table an RW lock variable (`rwlock_t`). The RW lock variable is initialized with `rwlock_init()`.
`xt_register_table()`, like `xt_replace_table()`, is a function that netfilter provides in common for all protocols; it initializes a table.
    diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
    index 5897f3dba..e93cbd2f4 100644
    --- a/include/linux/netfilter/x_tables.h
    +++ b/include/linux/netfilter/x_tables.h
    @@ -226,6 +226,8 @@ struct xt_table {
         /* What hooks you will enter on */
         unsigned int valid_hooks;
     
    +    rwlock_t lock;
    +
         /* Man behind the curtain... */
         struct xt_table_info *private;
     
    diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
    index 21624d683..65f8ebe33 100644
    --- a/net/netfilter/x_tables.c
    +++ b/net/netfilter/x_tables.c
    @@ -1476,6 +1459,8 @@ struct xt_table *xt_register_table(struct net *net,
         /* Simplifies replace_table code. */
         table->private = bootstrap;
     
    +    rwlock_init(&table->lock);
    +
         if (!xt_replace_table(table, 0, newinfo, &ret))
             goto unlock;
3.1.1. Protecting the contended sections
Like the current seq-counter-based scheme, an RW lock lets multiple CPUs execute the read section at the same time. For processing that runs frequently and concurrently on many CPUs, such as sending and receiving packets, using an RW lock removes the overhead that an ordinary spinlock would incur by making readers wait for one another.
When a read (`read_lock_bh()` to `read_unlock_bh()`) and an update (`write_lock_bh()` to `write_unlock_bh()`) contend, they are executed exclusively. Any read that starts after an update therefore always sees the latest data, and no wait like the one in the current seq-counter implementation is needed. However, packets can arrive at any moment and do not need to have the very latest rules applied to them at all times, so the stalling of reads by updates that an RW lock causes is unnecessary overhead.
    diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
    index 7da1df499..fe9594933 100644
    --- a/net/ipv4/netfilter/ip_tables.c
    +++ b/net/ipv4/netfilter/ip_tables.c
    @@ -255,9 +255,8 @@ ipt_do_table(void *priv,
         acpar.state   = state;
     
         WARN_ON(!(table->valid_hooks & (1 << hook)));
    -    local_bh_disable();
    -    addend = xt_write_recseq_begin();
    -    private = READ_ONCE(table->private); /* Address dependency. */
    +    read_lock_bh(&table->lock);
    +    private = table->private;
         cpu        = smp_processor_id();
         table_base = private->entries;
         jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
    @@ -353,8 +352,7 @@ ipt_do_table(void *priv,
             }
         } while (!acpar.hotdrop);
     
    -    xt_write_recseq_end(addend);
    -    local_bh_enable();
    +    read_unlock_bh(&table->lock);
     
         if (acpar.hotdrop)
             return NF_DROP;
    diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
    index 21624d683..65f8ebe33 100644
    --- a/net/netfilter/x_tables.c
    +++ b/net/netfilter/x_tables.c
    @@ -1397,14 +1397,14 @@ xt_replace_table(struct xt_table *table,
         }
     
         /* Do the substitution. */
    -    local_bh_disable();
    +    write_lock_bh(&table->lock);
         private = table->private;
     
         /* Check inside lock: is the old number correct? */
         if (num_counters != private->number) {
             pr_debug("num_counters != table->private->number (%u/%u)\n",
                      num_counters, private->number);
    -        local_bh_enable();
    +        write_unlock_bh(&table->lock);
             *error = -EAGAIN;
             return NULL;
         }
    @@ -1420,24 +1420,7 @@ xt_replace_table(struct xt_table *table,
         /* make sure all cpus see new ->private value */
         smp_mb();
     
    -    /*
    -     * Even though table entries have now been swapped, other CPU's
    -     * may still be using the old entries...
    -     */
    -    local_bh_enable();
    -
    -    /* ... so wait for even xt_recseq on all cpus */
    -    for_each_possible_cpu(cpu) {
    -        seqcount_t *s = &per_cpu(xt_recseq, cpu);
    -        u32 seq = raw_read_seqcount(s);
    -
    -        if (seq & 1) {
    -            do {
    -                cond_resched();
    -                cpu_relax();
    -            } while (seq == raw_read_seqcount(s));
    -        }
    -    }
    +    write_unlock_bh(&table->lock);
     
         audit_log_nfcfg(table->name, table->af, private->number,
                 !private->number ? AUDIT_XT_OP_REGISTER :
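
The exclusion rules of an RW lock (readers share, a writer excludes everyone) can be illustrated with a lock word that counts readers, as described in section 1.1. The sketch below is a userspace toy built on C11 atomics and is not how the kernel's `rwlock_t` is implemented. The `struct rw_sketch` type and the `sk_*` functions are hypothetical, and only the non-blocking "try" variants are shown so that the rules can be checked from a single thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Lock word: 0 = free, n > 0 = n readers inside, -1 = one writer inside. */
struct rw_sketch {
    atomic_int state;
};

/* A reader may enter as long as no writer holds the lock. */
static bool sk_read_trylock(struct rw_sketch *l)
{
    int s = atomic_load(&l->state);

    while (s >= 0) {
        /* On failure, s is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&l->state, &s, s + 1))
            return true;
    }
    return false;
}

static void sk_read_unlock(struct rw_sketch *l)
{
    atomic_fetch_sub(&l->state, 1);
}

/* A writer may enter only when there are no readers and no writer. */
static bool sk_write_trylock(struct rw_sketch *l)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&l->state, &expected, -1);
}

static void sk_write_unlock(struct rw_sketch *l)
{
    atomic_store(&l->state, 0);
}
```

A blocking lock would spin on these try operations. Note that every reader writes to the same lock word, which is the shared-cache-line cost that section 3.3 comes back to.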
3.2. Protecting with RCU
With RCU, as with the netfilter-specific seq counter from chapter 2, exclusion between reads and updates is achieved by waiting. Unlike with an RW lock, reads are therefore not stalled even while an update is in progress.
As described above, in the current seq-counter-based implementation, a read that occurs because a packet is sent or received during a rule update is not blocked and proceeds on the pre-update rules, so the updater waits until reads of the pre-update rules have finished on all CPUs. RCU achieves the same kind of exclusion with `rcu_read_lock()` / `rcu_read_unlock()`, which mark the start and end of a read section, together with an update-side API*1. Several update-side APIs are provided for different purposes.
Function | Description |
---|---|
synchronize_rcu() | Waits for all RCU read sections that started before this call to finish. |
synchronize_rcu_expedited() | Same as above, but uses IPIs to make CPUs report promptly that their read sections have ended, so the wait completes sooner than with synchronize_rcu(). |
call_rcu() | Invokes the given callback after the wait for RCU read sections has completed. |
kfree_rcu() | Frees the given object after the wait for RCU read sections has completed. |
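The implementation below uses `synchronize_rcu()` to obtain a wait equivalent to the one currently done with the seq counter. Note that it serializes updaters with `mutex_lock(&table->lock)`, which presumes a `struct mutex lock` member in `struct xt_table`; the diff itself does not show that member being added.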
    diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
    index 7da1df499..b0394a6cf 100644
    --- a/net/ipv4/netfilter/ip_tables.c
    +++ b/net/ipv4/netfilter/ip_tables.c
    @@ -255,9 +255,8 @@ ipt_do_table(void *priv,
         acpar.state   = state;
     
         WARN_ON(!(table->valid_hooks & (1 << hook)));
    -    local_bh_disable();
    -    addend = xt_write_recseq_begin();
    -    private = READ_ONCE(table->private); /* Address dependency. */
    +    rcu_read_lock_bh();
    +    private = rcu_dereference(table->private);
         cpu        = smp_processor_id();
         table_base = private->entries;
         jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
    @@ -353,8 +352,7 @@ ipt_do_table(void *priv,
             }
         } while (!acpar.hotdrop);
     
    -    xt_write_recseq_end(addend);
    -    local_bh_enable();
    +    rcu_read_unlock_bh();
     
         if (acpar.hotdrop)
             return NF_DROP;
    diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
    index 21624d683..ea3baf5c6 100644
    --- a/net/netfilter/x_tables.c
    +++ b/net/netfilter/x_tables.c
    @@ -1397,47 +1397,23 @@ xt_replace_table(struct xt_table *table,
         }
     
         /* Do the substitution. */
    -    local_bh_disable();
    +    mutex_lock(&table->lock);
         private = table->private;
     
         /* Check inside lock: is the old number correct? */
         if (num_counters != private->number) {
             pr_debug("num_counters != table->private->number (%u/%u)\n",
                      num_counters, private->number);
    -        local_bh_enable();
    +        mutex_unlock(&table->lock);
             *error = -EAGAIN;
             return NULL;
         }
     
         newinfo->initial_entries = private->initial_entries;
    -    /*
    -     * Ensure contents of newinfo are visible before assigning to
    -     * private.
    -     */
    -    smp_wmb();
    -    table->private = newinfo;
    -
    -    /* make sure all cpus see new ->private value */
    -    smp_mb();
    -
    -    /*
    -     * Even though table entries have now been swapped, other CPU's
    -     * may still be using the old entries...
    -     */
    -    local_bh_enable();
    +    rcu_assign_pointer(table->private, newinfo);
    +    mutex_unlock(&table->lock);
     
    -    /* ... so wait for even xt_recseq on all cpus */
    -    for_each_possible_cpu(cpu) {
    -        seqcount_t *s = &per_cpu(xt_recseq, cpu);
    -        u32 seq = raw_read_seqcount(s);
    -
    -        if (seq & 1) {
    -            do {
    -                cond_resched();
    -                cpu_relax();
    -            } while (seq == raw_read_seqcount(s));
    -        }
    -    }
    +    synchronize_rcu();
     
         audit_log_nfcfg(table->name, table->af, private->number,
                 !private->number ? AUDIT_XT_OP_REGISTER :
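
The pointer swap at the heart of this diff can be mimicked in userspace with C11 atomics: a release store publishes the new table, and an acquire load on the reader side picks up whichever version is current. This is only an analogy for `rcu_assign_pointer()` and `rcu_dereference()`, not an RCU implementation. `struct table_info`, `struct table`, `table_deref()` and `table_publish()` are hypothetical names, updaters are assumed to be serialized by the caller, and the step that waits for readers before the old version is freed (`synchronize_rcu()` in the diff) is not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for xt_table_info and xt_table. */
struct table_info {
    int number;
};

struct table {
    _Atomic(struct table_info *) private;
};

/* Reader side: load the current version with acquire ordering. */
static struct table_info *table_deref(struct table *t)
{
    return atomic_load_explicit(&t->private, memory_order_acquire);
}

/*
 * Updater side: publish newinfo with release ordering, so that a reader
 * that sees the new pointer also sees the contents written before it.
 * Returns the old version. The caller must not free it until every
 * reader that might still hold it has finished (not modelled here).
 */
static struct table_info *table_publish(struct table *t,
                                        struct table_info *newinfo)
{
    struct table_info *old =
        atomic_load_explicit(&t->private, memory_order_relaxed);

    atomic_store_explicit(&t->private, newinfo, memory_order_release);
    return old;
}
```

A reader that obtained its pointer before `table_publish()` keeps working on the old version, while a reader that starts afterwards gets the new one. Neither of them blocks the other or the updater.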
The current netfilter deliberately does not use RCU and implements its own wait instead. In fact, before netfilter arrived at the current seq-counter-based implementation, an RCU-based implementation was merged twice. Both times the RCU-based implementation was found to greatly increase the time taken by updates*2 and was subsequently reverted. The cause was the overhead of the RCU wait, and reducing this update overhead is one reason why netfilter today implements its own wait with a seq counter.
3.3. Overhead of updates (a digression)
Finally, I would like to touch briefly on how large the update overhead mentioned above becomes when RCU is used. The discussion around commit 942e4a2bd680 ("netfilter: revised locking for x_tables") reports that adding 200 rules took 0.2 seconds with the RW lock (v2.6.29) but 6 seconds after the RCU implementation went in (v2.6.30-rc1)*3.
Adding 200 records in iptables took 6.0sec in 2.6.30-rc1 compared to 0.2sec in 2.6.29.
The table below shows the time (ms) taken when the update path of the current netfilter (v6.11 kernel) is protected with RCU (`synchronize_rcu()`). The result with `synchronize_rcu_expedited()` is included for reference.
Current implementation | synchronize_rcu() | synchronize_rcu_expedited() |
---|---|---|
1109 | 5073 | 1104 |
The values are the time taken by the following commands, which add 200 rules*4. A Raspberry Pi 4 (4 cores) was used for the measurement.
    for i in {1..200}
    do
        iptables -A INPUT -s 192.168.105.$i -j DROP
    done
Comparing RCU with the current implementation shows that updates take considerably longer with RCU. The delay comes from the wait, as can also be seen from the fact that `synchronize_rcu_expedited()` gives a result on par with the current implementation.
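
As a rough per-rule reading of the measurements in the table above, the totals can be divided by the 200 rules. This assumes that each `iptables -A` invocation performs one table replacement and therefore one wait, which is my assumption; the measurement itself only reports totals.

```latex
\begin{align*}
\text{current implementation:} \quad & 1109 / 200 \approx 5.5\ \text{ms per rule} \\
\texttt{synchronize\_rcu()}: \quad & 5073 / 200 \approx 25.4\ \text{ms per rule} \\
\text{difference:} \quad & (5073 - 1109) / 200 \approx 19.8\ \text{ms per rule}
\end{align*}
```

Under that assumption, roughly 20 ms of each rule addition is spent in the extra waiting that `synchronize_rcu()` introduces on this machine.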
This update delay under RCU is what prompted netfilter to implement its own mutual exclusion, but the original purpose of adopting RCU was to improve read performance. As the current implementation shows, the sequence counter is kept per CPU. The aim is to avoid the cache invalidation that every read would cause with an RW lock, which uses a single global lock. The initial attempt was to overcome this cache problem on the read side by using RCU, but as the results above show, RCU had the problem that updates take a long time.
The current netfilter-specific implementation based on the seq counter is the result of taking both read and update performance into account. Since the RCU-based implementation*5, netfilter's own mutual exclusion has received many improvements and fixes up to the present. When you look into netfilter's locking, I encourage you to pay attention to the fixes made in the past and the background behind them as well.
*1: I plan to explain how RCU itself implements this waiting in a separate article.
*2:https://lore.kernel.org/all/[email protected]/
*3:https://lore.kernel.org/all/[email protected]/
*4:https://lore.kernel.org/all/[email protected]/
*5:https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=942e4a2bd680c606af0211e64eb216be2e19bf61