KSM Internals Analysis
Overview
KSM (Kernel SamePage Merging) is a Linux kernel feature that reduces memory usage by scanning the memory regions of user processes and merging pages with identical contents into a single page. Merged pages are put into CoW (Copy on Write) state, and when a write to such a page occurs it is split back into separate pages.
KSM was merged in version 2.6.32. It is mainly expected to reduce memory usage when running VMs, where pages with identical contents are relatively likely to be duplicated.
This article analyzes the internal structure and behavior of KSM.
Summary for busy readers
- The user interface consists of madvise (its new options MADV_MERGEABLE and MADV_UNMERGEABLE) and the sysfs files under /sys/kernel/mm/ksm/
- User processes use madvise to specify which pages may be merged
- Administrators change KSM's operating parameters via sysfs
- KSM's job basically ends at merging pages; CoW behavior reuses the mechanism already built into Linux as-is
- Unmerging, too, merely simulates a write to the page in question
- Only anonymous memory is scanned (in most cases this should be the heap)
- The file cache is not a target
- In the current implementation, merged pages can no longer be swapped out (a limitation)
- This is planned to be resolved
- Pages are scanned by a kernel thread called ksmd
- At a configured interval it scans a configured number of pages, merging any pages with identical contents that it finds
- Page contents are compared (identified) with memcmp (i.e., by reading through both pages in full)
- Hash values are not used for this
- Hash values are used only to decide whether a page's contents have changed
- Searching for, inserting, and deleting pages with identical contents is sped up with red-black trees
- Presumably a workaround to avoid VMware's patents
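The division of labor above, where memcmp decides identity and a hash only flags change, can be sketched in userspace C. This is a minimal sketch, not KSM's code: PAGE_SZ, page_checksum(), pages_identical() and page_unchanged() are hypothetical names, and FNV-1a is swapped in for the kernel's own hash function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 4096 /* assumed page size for this sketch */

/* Hypothetical stand-in for the kernel checksum: FNV-1a over the whole page. */
static uint32_t page_checksum(const unsigned char *page)
{
	uint32_t h = 2166136261u;
	for (size_t i = 0; i < PAGE_SZ; i++) {
		h ^= page[i];
		h *= 16777619u;
	}
	return h;
}

/* Identity is decided by a full memcmp(), never by the hash alone. */
static bool pages_identical(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, PAGE_SZ) == 0;
}

/*
 * The hash only answers "did this page change since the last scan?".
 * Stores the new checksum in *oldchecksum and reports whether it matched.
 */
static bool page_unchanged(const unsigned char *page, uint32_t *oldchecksum)
{
	uint32_t sum = page_checksum(page);
	bool unchanged = (sum == *oldchecksum);
	*oldchecksum = sum;
	return unchanged;
}
```

A matching hash never proves that two pages are equal, so only memcmp() can justify a merge. The hash is cheap to keep for each page and is enough to show that a page is still being written to.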
How to use KSM
Since this article focuses on how KSM works, it does not cover how to set KSM up. For that kind of information, please refer to the following pages.
注æ
- æ¬è¨äºã¯mmotmããªã¼ã«ãã¼ã¸ãããé ã®KSMã解æãããã®ãªã®ã§ãLinusããªã¼ã«ãã¼ã¸ããã¦ããã®å¤æ´ã«ã¯è¿½å¾ã§ãã¦ããªãç®æãããã¾ããåä½ã®å¤§çã¯å¤ãã£ã¦ããªãããã§ãããã注æãã ãã
- ã¾ãOOMã«é¢ä¿ããä¾å¤å¦çã«ã¤ãã¦ã¯è§¦ãã¦ãã¾ãããã¨ã¯ããåä½ã®å¤§çã«å½±é¿ã¯ãªãã¨æãã¾ã
- å 容ãéµåã¿ããªãã§ãã ãããç§ã®Linuxã¡ã¢ãªç®¡çåãã®ç解ãæ·±ããªãã®ã§ãééã£ã¦ããé¨åãããããããã¾ãã
- èªã¿ã«ããã§ã
- å³ããªã
- æ¸ããªãã£ãã¡ã¢ã¨å¤§å·®ãªã
- æéãããã°ç´ãã¾ãããã¿ã¾ããããã
解æï¼ãã¼ã¿æ§é
åè
- mmã®ã³ã¼ããå¤æ´ããªãæ¹é
- ãã¨ãã¨kernel moduleã¨ãã¦å®è£ ãããã¨ãæå³ãã¦ãã
- ç¾æç¹ã§ã¯ã«ã¼ãã«çµã¿è¾¼ã¿æ©è½
- PageKSM
- PageAnonã®ä»²é
- PageAnonãã¤anon_vma(ç¹å®ã®ä»®æ³ã¢ãã¬ã¹ç©ºéã®ã¢ãã¬ã¹é åãªã¹ã) == NULL
- VM_MERGEABLE
- vm_area_structã®ãã©ã°
- MMF_VM_MERGEABLE
- mm_structã®ãã©ã°
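The PageKsm test can be modeled as a check on page->mapping. This is a minimal sketch. It assumes the 2.6.32-era encoding: an anonymous page has the low bit (PAGE_MAPPING_ANON) set on its anon_vma pointer, and a KSM page has that bit set with a NULL anon_vma. fake_page, fake_PageAnon() and fake_PageKsm() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low bit of page->mapping marks an anonymous page (PAGE_MAPPING_ANON). */
#define MAPPING_ANON 0x1UL

/* Hypothetical, stripped-down struct page: only the mapping word matters. */
struct fake_page {
	uintptr_t mapping; /* anon_vma pointer | MAPPING_ANON, or an address_space pointer */
};

static bool fake_PageAnon(const struct fake_page *p)
{
	return (p->mapping & MAPPING_ANON) != 0;
}

/* KSM page: the anonymous bit is set but no anon_vma sits behind it. */
static bool fake_PageKsm(const struct fake_page *p)
{
	return p->mapping == MAPPING_ANON;
}
```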
Stable & unstable trees
- Data structures that store the scanned pages
- Red-black trees
- Pages that could be merged go into the stable tree; pages that could not be merged (but whose contents did not change) go into the unstable tree
- Pages whose contents changed go into neither tree
- Elements of the stable tree (node == rmap_item) form lists
- A merged page is referenced by multiple rmap_items
- Elements of the unstable tree are also rmap_items
- However, they do not form lists
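The trees are keyed by page contents rather than by address. The following is a minimal sketch of the search-or-insert step on such a tree. It assumes a plain unbalanced binary search tree in place of the kernel's red-black tree (no rebalancing) and tiny 64-byte pages. struct node, search_insert() and free_tree() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 64 /* tiny "pages" keep the sketch readable */

/* Hypothetical tree node: the key is the page *content*, not its address. */
struct node {
	const unsigned char *page;
	struct node *left, *right;
};

/*
 * Look for a node whose page has identical content. If found, return it and
 * insert nothing; otherwise link a new node for `page` and return NULL
 * (in the spirit of unstable_tree_search_insert()).
 */
static struct node *search_insert(struct node **root, const unsigned char *page)
{
	struct node **link = root;

	while (*link) {
		int cmp = memcmp(page, (*link)->page, PAGE_SZ);
		if (cmp == 0)
			return *link;
		link = cmp < 0 ? &(*link)->left : &(*link)->right;
	}
	struct node *n = calloc(1, sizeof(*n));
	if (n) { /* on allocation failure the page is simply not tracked */
		n->page = page;
		*link = n;
	}
	return NULL;
}

static void free_tree(struct node *n)
{
	if (!n)
		return;
	free_tree(n->left);
	free_tree(n->right);
	free(n);
}
```

Ordering by memcmp() lets one comparison both test for equality and choose which way to descend. This is why memcmp works as the tree's comparison function and no hash is needed.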
mm_slot
- One is prepared per mm_struct (≈ process)?
- ksm keeps a list of mm_slots and obtains pages from there by following the vm_area_structs and page tables
- @link: hash list?
- @mm_list: for ksm_mm_head
- @rmap_list: this mm_slot's list of rmap_items
- @mm: the mm_struct in question
- Note that madvise sets the MERGEABLE flag on the mm_struct and on the vm_area_structs containing the specified region
- Since this works per vm_area_struct, specifying a region smaller than that may cause some wasted scanning? The address is used to cut down on the waste
rmap_item
- reverse mapping item for virtual address
- @link: for the mm_slot
- Unless explicitly removed, an rmap_item basically stays linked to the mm_slot it was created for
- The rmap_item remains even after the page it pointed to has been merged
- Exception: if an rmap_item pointing to an unexpected address is found during scanning, it is removed and freed
- @address
- The low bits are flags
- SEQNR_MASK:
- NODE_FLAG
- STABLE_FLAG
- Each one points to one page
- It does not point to the page directly
- To get the page, the page tables are walked every time using @mm + @address
- The data it holds depends on where it is linked
- stable tree: @prev, @next, @node: the next rmap
- What is this used for?
- unstable tree: @oldchecksum, @node
- @node appears in both
- @oldchecksum: used only to decide whether the contents have changed
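The flag bits can share the @address word because a page-aligned address has its low bits clear. This is a minimal sketch. It assumes 4 KiB pages and the flag values SEQNR_MASK = 0x0ff, NODE_FLAG = 0x100 and STABLE_FLAG = 0x200, as I understand the ksm.c of that era, so treat the values as assumptions. pack_address() and the accessor helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag layout assumed from the 2.6.32-era ksm.c; values are assumptions. */
#define SEQNR_MASK   0x0ffUL   /* low bits of the unstable-tree sequence number */
#define NODE_FLAG    0x100UL   /* item is a node of the stable or unstable tree */
#define STABLE_FLAG  0x200UL   /* item belongs to the stable tree */
#define PAGE_MASK_4K (~0xfffUL) /* assumes 4 KiB pages: 12 free low bits */

/* Page-aligned addresses leave the low 12 bits free for metadata. */
static unsigned long pack_address(unsigned long addr, unsigned long seqnr,
				  bool node, bool stable)
{
	return (addr & PAGE_MASK_4K) | (seqnr & SEQNR_MASK) |
	       (node ? NODE_FLAG : 0) | (stable ? STABLE_FLAG : 0);
}

static unsigned long item_addr(unsigned long packed)  { return packed & PAGE_MASK_4K; }
static unsigned long item_seqnr(unsigned long packed) { return packed & SEQNR_MASK; }
static bool item_in_stable(unsigned long packed)      { return (packed & STABLE_FLAG) != 0; }
```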
Global variables
- ksm_mm_head
- The list of mm_slots
- ksm_scan
- cursor for scanning
- The page-scanning cursor
- Points to the page currently being scanned
- @mm_slot + @address + @rmap_item
- How is seqnr used?
- root_stable_tree
- root_unstable_tree
- rmap_item_cache?
- mm_slot_cache?
- mm_slots_hash
- A hash table whose elements are lists of mm_slots
- Used only when deleting an mm_slot
- Normal mm_slot access walks the list from mm_slot_head
- The hash size is 1,024
- I suspect it won't scale
- Well, if it is only used for deletion, it doesn't need to scale
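Looking up an mm_slot from its mm_struct through mm_slots_hash might look like the following. This is a minimal sketch with hypothetical types. The bucket function (the pointer shifted right and reduced modulo 1,024) is an assumption and not necessarily the kernel's formula.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MM_SLOTS_HASH_HEADS 1024 /* bucket count mentioned in the text */

struct mm { int placeholder; }; /* hypothetical stand-in for struct mm_struct */

/* Hypothetical mm_slot holding only the fields this sketch needs. */
struct mm_slot {
	struct mm_slot *hash_next; /* chain within one bucket (the @link field) */
	struct mm *mm;
};

static struct mm_slot *mm_slots_hash[MM_SLOTS_HASH_HEADS];

/* Simple pointer hash; an assumption, not necessarily the kernel's formula. */
static size_t bucket_of(const struct mm *mm)
{
	return ((uintptr_t)mm >> 4) % MM_SLOTS_HASH_HEADS;
}

static void insert_mm_slot(struct mm_slot *slot)
{
	size_t b = bucket_of(slot->mm);
	slot->hash_next = mm_slots_hash[b];
	mm_slots_hash[b] = slot;
}

/* Walk only one bucket's chain instead of the whole mm_slot list. */
static struct mm_slot *get_mm_slot(const struct mm *mm)
{
	for (struct mm_slot *s = mm_slots_hash[bucket_of(mm)]; s; s = s->hash_next)
		if (s->mm == mm)
			return s;
	return NULL;
}
```

With a fixed 1,024 buckets, chains grow linearly once there are many more mergeable processes than buckets, which is the scaling concern raised above.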
Statistics
- ksm_pages_shared
- Number of nodes in the stable tree (!= number of rmap_items)
- ksm_pages_sharing
- Number of non-node items in the stable tree
- ksm_pages_unshared
- Number of nodes in the unstable tree (= number of rmap_items)
- ksm_rmap_items
- Total number of currently allocated rmap_items (not a cumulative count)
- ksm_pages_volatile
- ksm_rmap_items - ksm_pages_shared - ksm_pages_sharing - ksm_pages_unshared
- It is not itself counted up or down; it is derived from the counters above
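Because ksm_pages_volatile is derived rather than counted, it reduces to a single subtraction. This is a minimal sketch. struct ksm_stats and pages_volatile() are hypothetical names for the counters listed above.

```c
#include <assert.h>

/* Hypothetical bundle of the ksm_* counters described above. */
struct ksm_stats {
	unsigned long rmap_items;     /* currently allocated rmap_items */
	unsigned long pages_shared;   /* stable tree nodes */
	unsigned long pages_sharing;  /* stable tree list items (non-nodes) */
	unsigned long pages_unshared; /* unstable tree nodes */
};

/* Volatile = rmap_items that sit in neither tree; never stored separately. */
static unsigned long pages_volatile(const struct ksm_stats *s)
{
	return s->rmap_items - s->pages_shared - s->pages_sharing - s->pages_unshared;
}
```

For example, 100 rmap_items with 10 shared, 30 sharing and 40 unshared leave 20 volatile items.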
Configuration
- ksm_max_kernel_pages (default = 2000)
- Maximum number of pages allowed to be counted in ksm_pages_shared
- ksm_thread_pages_to_scan (default = 200)
- ksm_thread_sleep_millisecs (default = 20)
- ksm_run (default = 1)
- KSM_RUN_STOP=0
- KSM_RUN_MERGE=1
- KSM_RUN_UNMERGE=2 (forced unmerge)
解æï¼åä½(API)
madvise
- madvise(MADV_MERGEABLE) -> ... -> madvise_behavior() -> ksm_madvise() -> __ksm_enter()
- madvise(MADV_UNMERGEABLE) -> ... -> ksm_madvise()
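From the user side, opting a region into KSM is a single madvise() call on anonymous memory. This is a minimal userspace sketch. map_mergeable() is a hypothetical helper, while mmap(), madvise(), MADV_MERGEABLE and MADV_UNMERGEABLE are the real interfaces. madvise() fails with EINVAL when the kernel was built without CONFIG_KSM.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `len` bytes of private anonymous memory and mark
 * them mergeable. Returns 0 on success or an errno value on failure
 * (EINVAL if the kernel lacks KSM), unmapping the region on failure.
 */
static int map_mergeable(size_t len, void **out)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return errno;
	if (madvise(p, len, MADV_MERGEABLE) != 0) {
		int err = errno;
		munmap(p, len);
		return err;
	}
	*out = p;
	return 0;
}
```

Undoing it is the mirror call, madvise(p, len, MADV_UNMERGEABLE), which goes through ksm_madvise() as listed above.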
sysfs
- Each file simply updates the corresponding global variable
- run = 1: wake_up_interruptible()
- run = 2: unmerge_and_remove_all_rmap_items()
- Walks from mm_slot_head, unmerging every KsmPage and freeing the rmap_items one after another
- The mm_slots themselves are kept
- Memory usage will likely increase sharply, so it may be better not to do this
__ksm_exit() (ksm_exit())
static inline void ksm_exit(struct mm_struct *mm,
			    struct mmu_gather **tlbp, unsigned long end)
{
	if (test_bit(MMF_VM_MERGEABLE, &mm->flags))
		__ksm_exit(mm, tlbp, end);
}
- Arguments
- struct mm_struct *mm
- struct mmu_gather **tlbp
- unsigned long end
- Behavior
- It is a little tricky because it has to avoid OOM deadlocks
- Get the mm_slot from mm
- If the mm_slot has no rmap_items at all, free_mm_slot()
- Clear the mm_struct's MERGEABLE flag
- Otherwise, put this mm_slot at the cursor position and return without freeing it
- When an mm_slot is freed
- unmerge_and_remove_all_rmap_items()
- echo 2 > /sys/kernel/mm/ksm/run
- scan_get_next_rmap_item()
- While scanning the mm_slot list
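The exit-time decision can be reduced to a small function. This is a hypothetical model, not the kernel's code. It ignores locking and the mmap_sem synchronization, and the extra condition that the slot must not be under the scan cursor is an assumption based on reading the code.

```c
#include <assert.h>
#include <stdbool.h>

enum exit_action {
	EXIT_NOT_REGISTERED, /* mm has no mm_slot: nothing to do */
	EXIT_FREE_NOW,       /* no rmap_items: free_mm_slot() right away */
	EXIT_LEAVE_TO_KSMD,  /* keep the slot; ksmd frees it while scanning */
};

/* Hypothetical model of the branch taken in __ksm_exit(). */
static enum exit_action ksm_exit_action(bool has_slot, bool has_rmap_items,
					bool slot_is_cursor)
{
	if (!has_slot)
		return EXIT_NOT_REGISTERED;
	if (!has_rmap_items && !slot_is_cursor) /* cursor check: an assumption */
		return EXIT_FREE_NOW;
	return EXIT_LEAVE_TO_KSMD;
}
```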
解æï¼åä½(ksmd)
main loop
while (scan_pages--) {
	page, rmap_item = scan_get_next_rmap_item();
	if (PageKSM(page) && in_stable_tree(rmap_item))
		continue;
	cmp_and_merge_page(page, rmap_item);
}
scan_get_next_rmap_item()
- If there is no mm_slot, return (the thread goes to sleep)
- If the cursor is at mm_slot_head (== the whole mm_slot list has been scanned)
- Initialize so that scanning starts over from the beginning
- Scan the vm_area_struct the cursor points to (mm + address)
- find_vma(mm, ksm_scan.address)
- Take pages out one at a time, starting from the page at address
- Pages are taken out roughly as in (*)
- The cursor lets the loop be exited at any point and lets scanning later resume from the middle of the loop
- If a PageAnon page is found, return that page together with its rmap_item
- The rmap_item is obtained with get_next_rmap_item()
- The normal path ends here
- If the mm_slot has no VM_MERGEABLE vm_area_struct
- Remove the mm_slot from the mm_slot list
- Scan the next mm_slot
- If there is no mm_slot, return
- When no more VM_MERGEABLE areas remain in this vm_area_struct list
- Scan the next mm_slot
- ksm_scan.seqnr++
(*)
foreach (mm_slot in mm_slots where mm_slot->mm_struct is MERGEABLE) {
foreach (vm_area_struct in vm_area_structs of mm_slot->mm_struct where vm_area_struct is MERGEABLE) {
foreach (page in pages of vm_area_struct) { // follow PTs
do_something(page);
}
}
}
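The (*) loop, combined with a cursor that lets ksmd stop after any page and resume later, can be sketched as follows. This is a minimal model that uses flat arrays in place of the mm_slot list, the vm_area_structs and the page tables. struct vma, struct slot, struct cursor and scan_next() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened model: slots -> vmas -> pages. */
struct vma  { bool mergeable; size_t npages; };
struct slot { const struct vma *vmas; size_t nvmas; };

/* Cursor in the spirit of ksm_scan: where the previous call stopped. */
struct cursor {
	size_t slot, vma, page;
	unsigned long seqnr; /* number of completed full passes */
};

/*
 * Return the next page in a mergeable vma, resuming from *c.
 * After a full pass, wrap to the start, bump seqnr and return false.
 */
static bool scan_next(const struct slot *slots, size_t nslots, struct cursor *c,
		      size_t *slot, size_t *vma, size_t *page)
{
	for (; c->slot < nslots; c->slot++, c->vma = 0, c->page = 0) {
		const struct slot *s = &slots[c->slot];
		for (; c->vma < s->nvmas; c->vma++, c->page = 0) {
			const struct vma *v = &s->vmas[c->vma];
			if (!v->mergeable || c->page >= v->npages)
				continue; /* skip or finish this vma */
			*slot = c->slot;
			*vma = c->vma;
			*page = c->page++; /* advance so the next call resumes after it */
			return true;
		}
	}
	c->slot = c->vma = c->page = 0;
	c->seqnr++;
	return false;
}
```

Each call yields one page position. When a full pass over all slots completes, the cursor wraps to the beginning and seqnr is incremented, matching ksm_scan.seqnr++ above.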
get_next_rmap_item()
- Arguments
- mm_slot
- cur
- addr
- Behavior
- Walk the mm_slot's rmap_item list from position cur to get the rmap_item corresponding to addr
- If rmap_item.address == addr, return it
- At that point, if it is linked into the unstable tree, unlink it
- If there is none, alloc_rmap_item()
- Exception: if rmap_item->address <= addr, free_rmap_item()
- In what situations does this happen?
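The list walk in get_next_rmap_item() can be sketched with a singly linked list kept sorted by address. This is a minimal sketch. struct item, get_next_item() and the freed counter are hypothetical, and unlinking from the unstable tree is omitted. One possible answer to the question above, offered only as an assumption: an item whose address is below addr was passed over during this scan, presumably because that address is no longer mapped or mergeable, so the item is stale.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical rmap_item in a singly linked list sorted by address. */
struct item {
	unsigned long address;
	struct item *next;
};

static int freed; /* counts stale items dropped, for illustration */

/*
 * Walk from *cur: drop stale items whose address is below addr, return the
 * item for addr if present, otherwise allocate one and link it at *cur.
 * (The kernel uses a doubly linked list_head instead.)
 */
static struct item *get_next_item(struct item **cur, unsigned long addr)
{
	while (*cur) {
		struct item *it = *cur;
		if (it->address == addr)
			return it;
		if (it->address > addr)
			break;
		*cur = it->next; /* address < addr: stale, unlink and free */
		free(it);
		freed++;
	}
	struct item *n = malloc(sizeof(*n));
	if (!n)
		return NULL;
	n->address = addr;
	n->next = *cur;
	*cur = n;
	return n;
}
```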
cmp_and_merge_page()
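- If the rmap_item is in the stable tree, remove it from the tree
- The stable && !KsmPage case == the CoW has been broken
- Search the stable tree (stable_tree_search)
- If the same page was found
- pages_sharing++
- Link it into the stable tree
- If a page with the same contents (but a different page) was found
- try_to_merge_with_ksm_page()
- Merge with the page in the stable tree
- On success, stable_tree_append()
- otherwise
- fall through
- If no matching page was found in the stable tree (or the merge failed)
- Compute a checksum of the page contents
- The checksum is computed with jhash2() over PAGE_SIZE / 4 words; jhash2() counts its length in 32-bit words, so this covers the whole page rather than a quarter of it
- If the checksum has changed (== the page contents changed), return
- That is, the page does not even enter the unstable tree
- Search and insert into the unstable tree (unstable_tree_search_insert)
- If a page with the same contents exists, take out its rmap_item; nothing is inserted
- Otherwise, insert this rmap_item and return
- Call the rmap_item that was taken out tree_rmap_item
- Try to merge the two pages
- On success
- ksm_pages_unshared--
- Insert tree_rmap_item into the stable tree (stable_tree_insert)
- If that fails, break the CoW of both pages
- On success?
- ksm_pages_shared++ inside stable_tree_insert()
- stable_tree_append() links rmap_item to tree_rmap_item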
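The whole cmp_and_merge_page() flow can be exercised on a toy model. This is a minimal sketch. It assumes linear arrays in place of the two red-black trees, tiny 32-byte pages, and FNV-1a in place of jhash2(). It also skips the initial removal of the item from the trees and never writes to pages. struct page_item, cmp_and_merge() and the outcome names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ   32 /* tiny "pages" for the sketch */
#define MAX_ITEMS 16 /* capacity of each set in this toy model */

enum outcome { SKIPPED_VOLATILE, MERGED_STABLE, MERGED_PAIR, INSERTED_UNSTABLE };

/* Hypothetical stand-in for an rmap_item plus the page it maps. */
struct page_item {
	unsigned char *data;
	uint32_t oldchecksum;
};

/* Plain arrays replace the stable and unstable red-black trees. */
static struct page_item *stable[MAX_ITEMS], *unstable[MAX_ITEMS];
static size_t nstable, nunstable;

/* FNV-1a over the whole page; the kernel uses jhash2() instead. */
static uint32_t checksum(const unsigned char *d)
{
	uint32_t h = 2166136261u;
	for (size_t i = 0; i < PAGE_SZ; i++)
		h = (h ^ d[i]) * 16777619u;
	return h;
}

/* Linear search by content with memcmp(); returns an index or -1. */
static long find(struct page_item **set, size_t n, const unsigned char *d)
{
	for (size_t i = 0; i < n; i++)
		if (memcmp(set[i]->data, d, PAGE_SZ) == 0)
			return (long)i;
	return -1;
}

static enum outcome cmp_and_merge(struct page_item *it)
{
	/* 1. Same contents already in the stable set: share that copy. */
	long i = find(stable, nstable, it->data);
	if (i >= 0) {
		it->data = stable[i]->data;
		return MERGED_STABLE;
	}
	/* 2. Contents changed since the last scan: stay out of both sets. */
	uint32_t sum = checksum(it->data);
	if (sum != it->oldchecksum) {
		it->oldchecksum = sum;
		return SKIPPED_VOLATILE;
	}
	/* 3. No match in the unstable set: insert and stop. */
	i = find(unstable, nunstable, it->data);
	if (i < 0) {
		assert(nunstable < MAX_ITEMS);
		unstable[nunstable++] = it;
		return INSERTED_UNSTABLE;
	}
	/* 4. Match: move tree_item to the stable set and share its page. */
	struct page_item *tree_item = unstable[i];
	unstable[i] = unstable[--nunstable];
	assert(nstable < MAX_ITEMS);
	stable[nstable++] = tree_item;
	it->data = tree_item->data;
	return MERGED_PAIR;
}
```

In this model a newly seen page is reported as volatile once, because its stored checksum starts at 0. A page therefore has to look unchanged across two scans before it can enter the unstable set.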
stable_tree_append()
- Arguments
- rmap_item: the rmap_item to be appended
- tree_rmap_item: the rmap_item in the tree; NODE_FLAG should be set on it
- Behavior
- Link rmap_item immediately after tree_rmap_item
- Set STABLE_FLAG on rmap_item
- ksm_pages_sharing++
- Notes
- An appended rmap_item does '''not''' become part of the rbtree
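The list manipulation in stable_tree_append() is an ordinary doubly linked list insertion. This is a minimal sketch. It assumes a reduced struct rmap_item and a STABLE_FLAG value of 0x200, which is taken from the ksm.c of that era and should be treated as an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define STABLE_FLAG 0x200UL /* assumed value */

/* Hypothetical rmap_item reduced to the stable-list links and flag word. */
struct rmap_item {
	struct rmap_item *prev, *next;
	unsigned long address; /* low bits hold flags */
};

static unsigned long ksm_pages_sharing;

/* Link rmap_item right after the tree node; it does not join the rbtree. */
static void stable_tree_append(struct rmap_item *rmap_item,
			       struct rmap_item *tree_rmap_item)
{
	rmap_item->next = tree_rmap_item->next;
	rmap_item->prev = tree_rmap_item;
	if (tree_rmap_item->next)
		tree_rmap_item->next->prev = rmap_item;
	tree_rmap_item->next = rmap_item;
	rmap_item->address |= STABLE_FLAG;
	ksm_pages_sharing++;
}
```

Because each new item goes directly behind the tree node, the most recently appended item comes first in the list.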
try_to_merge_two_pages()
- Arguments
- Page 1: page1, mm1, address1
- Page 2: page2, mm2, address2
- Behavior
- if (ksm_max_kernel_pages <= ksm_pages_shared) return
- kpage = alloc_page(GFP_HIGHUSER)
- Get page1's vm_area_struct => vma
- copy_user_highpage(kpage, page1, addr1, vma)
- try_to_merge_one_page(vma, page1, kpage)
- write_protect_page()
- Check that the page contents are identical, then replace_page()
- Replace the pte reached from mm1, address1 with one pointing to kpage
- Call mmu_notifier->change_pte() on that pte
- try_to_merge_with_ksm_page(mm2, addr2, page2, kpage)
- On failure, break_cow(mm1, addr1)
get_ksm_page()
- Called from stable_tree_search() and stable_tree_insert()
- Gets the page from an rmap_item in the tree
- This walks the page tables
- Returns NULL in the following cases
- The page the rmap_item points to is not PageKsm()
- That is, the page has been unmerged and is now a plain PageAnon() page
- The vm_area_struct the rmap_item belongs to is not MERGEABLE