Asynchronous page fault analysis
Introduction
I looked into Asynchronous page fault (hereafter APF), which was merged in Linux 2.6.38.
(Update: corrected a mistake in how the guest decides whether it can switch processes.)
What is Asynchronous page fault?
When a page fault requires disk access (I/O), a modern OS assigns the CPU to another process so that the CPU stays useful while the I/O is pending. A KVM guest does the same: after the guest kernel issues an I/O request (which ultimately reaches the host), it dispatches another process. This works because the guest kernel itself issues the I/O.
In a virtualized environment, however, real (non-virtual) I/O can happen without the guest kernel knowing about it. One such case is guest memory that has been swapped out on the host. When the guest touches a page the host has swapped out, the host performs I/O transparently to the guest. The CPU is then taken away from that guest (VCPU) and given to another VCPU or to a host user process. Yet the guest may have other processes waiting for a CPU, so from the guest's point of view it is unfairly denied CPU time (assuming, of course, that the VCPU still has time slice left).
APF, as implemented in KVM, removes this unfairness. When host I/O that the guest does not know about becomes necessary, KVM returns control to the guest, giving the guest kernel a chance to dispatch another process. If it can, the guest keeps using the CPU, which should raise its throughput*1.
For more detail, see Gleb Natapov's presentation at KVM Forum 2010; it is very easy to follow.
Prerequisites
- CPUID
- MSR
  - Registers provided for CPU-specific features
  - In a virtualized environment the hypervisor can configure them flexibly, and can also make writes trigger a VM exit
  - APF uses them to pass information from the guest to the host (described later)
Overview of operation
- Initialization
  - Host and guest each check whether the other supports the feature
- Page fault
  - When the guest touches a page the host has swapped out, a VM exit occurs
  - The host hands page preparation (swap-in) to a workqueue and returns control to the guest
  - The guest puts the faulting process into a wait state and dispatches another process
- I/O issue and I/O completion
  - The workqueue simply accesses the page, which triggers the swap-in
  - When the page is ready, this is reported to the KVM core (the work is linked onto a completion queue)
  - The KVM core learns that the page is ready by checking periodically in its main loop
- Notifying the guest, and a second page fault
  - The host tells the guest that the page is ready by raising another page fault
  - The guest wakes the waiting process
That is the rough flow, but in practice the behavior depends on the state in which the page fault occurred. The sections below go into the details, including those cases.
Initialization
APF is a paravirtualization feature: the host initializes it only after confirming that the guest supports APF.
The flow is as follows.
- Host
  - qemu sets CPUID in KVM (ioctl), including the bit that says APF is available
  - KVM stores that CPUID in the virtual machine's VCPU state
- Guest
  - At boot the kernel checks CPUID and, if APF is available, acknowledges it to the host by writing the corresponding MSR (wrmsr)
    - The value written contains the address of a per-CPU variable allocated by the guest kernel (a per cpu var, hereafter the APF reason variable)
- Host
  - The wrmsr causes a VM exit, from which the host learns that the guest supports APF
  - The host records the address of the APF reason variable (used later, at page fault time)
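The guest-side check in the list above comes down to testing one bit of a CPUID result. Below is a minimal C sketch of that decision. It takes the CPUID EAX value as a parameter instead of executing the instruction. It assumes KVM exposes its feature bits in EAX of leaf 0x40000001 and that KVM_FEATURE_ASYNC_PF is bit 4. That is my reading of the 2.6.38-era kvm_para.h, so treat both values as assumptions. has_feature() and guest_should_enable_apf() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: KVM feature bits live in EAX of CPUID leaf 0x40000001,
 * and KVM_FEATURE_ASYNC_PF is bit 4 (2.6.38-era kvm_para.h). */
#define KVM_FEATURE_ASYNC_PF 4

/* Same idea as kvm_para_has_feature(): test one bit of the feature word
 * the host exposed through CPUID. 'bit' must be below 32. */
bool has_feature(uint32_t features_eax, unsigned int bit)
{
    return (features_eax >> bit) & 1u;
}

/* Guest-side decision: acknowledge APF (by writing the MSR) only if the
 * host advertised it and the guest has not switched it off. 'kvmapf'
 * models the kernel's kvmapf switch as a plain flag. */
bool guest_should_enable_apf(uint32_t features_eax, bool kvmapf)
{
    return has_feature(features_eax, KVM_FEATURE_ASYNC_PF) && kvmapf;
}
```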
以éã¯éè¦ãªé¨åã ãã³ã¼ããè¦ãªãã説æãã¦ããã¾ãããã¹ãã§CPUIDãè¨å®ããã¨ããã¯çç¥ãã¦ãã²ã¹ãããã¼ãããé¨åããã§ãã
ã¾ãã²ã¹ãã«ã¼ãã«ã¯ãã¹ãã®APFãæå¹ã§ãããã¨ããããã¨ããã¼ã¸ãã©ã¼ã«ãå²ãè¾¼ã¿ãã³ãã©ãã(do_)page_fault()ãã(do_)async_page_fault()ã¸å¤æ´ãã¾ãã
```c
static void __init kvm_apf_trap_init(void)
{
    set_intr_gate(14, &async_page_fault);
}
```
At page fault time, (do_)async_page_fault() checks whether the fault is an APF. If it is, it calls the handler for that APF reason (described later). Otherwise it calls do_page_fault()*2.
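The dispatch on the APF reason can be sketched in userspace C as follows. The handler reads the reason and then clears it, so that a later ordinary page fault on the same CPU is not mistaken for an APF. This minimal sketch assumes reason codes 1 and 2 for KVM_PV_REASON_PAGE_NOT_PRESENT and KVM_PV_REASON_PAGE_READY, which is my reading of kvm_para.h. struct apf_area, read_and_reset_reason() and classify_fault() are hypothetical stand-ins for the per-CPU apf_reason variable, kvm_read_and_reset_pf_reason() and do_async_page_fault().

```c
#include <stdint.h>

/* Assumed reason codes (2.6.38-era kvm_para.h). */
#define KVM_PV_REASON_PAGE_NOT_PRESENT 1
#define KVM_PV_REASON_PAGE_READY       2

/* Simplified stand-in for the per-CPU apf_reason area shared with the host. */
struct apf_area {
    uint32_t reason;   /* written by the host before it injects the #PF */
    uint32_t enabled;  /* set by the guest after enabling APF via the MSR */
};

/* Read the reason and clear it so the slot is fresh for the next fault. */
uint32_t read_and_reset_reason(struct apf_area *a)
{
    uint32_t reason = 0;

    if (a->enabled) {
        reason = a->reason;
        a->reason = 0;
    }
    return reason;
}

enum pf_path { PATH_NORMAL_FAULT, PATH_TASK_WAIT, PATH_TASK_WAKE };

/* Which branch an async_page_fault-style handler would take. */
enum pf_path classify_fault(struct apf_area *a)
{
    switch (read_and_reset_reason(a)) {
    case KVM_PV_REASON_PAGE_NOT_PRESENT:
        return PATH_TASK_WAIT;    /* park the faulting task */
    case KVM_PV_REASON_PAGE_READY:
        return PATH_TASK_WAKE;    /* wake the task waiting on this token */
    default:
        return PATH_NORMAL_FAULT; /* ordinary guest page fault */
    }
}
```

Calling classify_fault() twice on the same area shows why the reset matters. The second call falls through to the ordinary page fault path.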
Next, during per-CPU initialization, the KVM guest initialization function kvm_guest_cpu_init() is called.
```c
void __cpuinit kvm_guest_cpu_init(void)
{
    if (!kvm_para_available())
        return;

    if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) {
        u64 pa = __pa(&__get_cpu_var(apf_reason));

#ifdef CONFIG_PREEMPT
        pa |= KVM_ASYNC_PF_SEND_ALWAYS;
#endif
        wrmsrl(MSR_KVM_ASYNC_PF_EN, pa | KVM_ASYNC_PF_ENABLED);
        __get_cpu_var(apf_reason).enabled = 1;
        printk(KERN_INFO"KVM setup async PF for cpu %d\n",
               smp_processor_id());
    }
}
```
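Once kvm_para_has_feature() reports that APF is available, the function takes the guest-physical address of the APF reason variable (__pa(&__get_cpu_var(apf_reason))). It ORs flags into that address and writes the result to the MSR*3. The guest page fault handler described above uses this apf_reason to decide which branch to take.
As for the flags, KVM_ASYNC_PF_SEND_ALWAYS is set when CONFIG_PREEMPT is enabled. With CONFIG_PREEMPT, the kernel can switch processes even while kernel code is running. Without it, kernel code cannot be preempted, so the host offering a chance to switch processes through APF while the guest is in kernel mode would only cause trouble. If KVM_ASYNC_PF_SEND_ALWAYS is off and the guest's CPL is 0 when the page fault occurs, the host does not use APF to return control to the guest (described later).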
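The value written to the MSR can be reconstructed as follows. This is a minimal C sketch, not the kernel code. It assumes KVM_ASYNC_PF_ENABLED is bit 0 and KVM_ASYNC_PF_SEND_ALWAYS is bit 1, which is how I read the 2.6.38-era kvm_para.h. encode_apf_msr() and its zero-on-error convention are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag bits (2.6.38-era kvm_para.h). */
#define KVM_ASYNC_PF_ENABLED     (1u << 0)
#define KVM_ASYNC_PF_SEND_ALWAYS (1u << 1)

/* Builds the value written to MSR_KVM_ASYNC_PF_EN.
 * 'pa' is the guest-physical address of apf_reason; because the variable
 * is 64-byte aligned, bits 0..5 of 'pa' are zero and can carry flags.
 * 'preemptible' stands in for CONFIG_PREEMPT.
 * Returns 0 for a misaligned 'pa' (an error convention of this sketch). */
uint64_t encode_apf_msr(uint64_t pa, bool preemptible)
{
    if (pa & 0x3f)
        return 0;

    uint64_t val = pa | KVM_ASYNC_PF_ENABLED;
    if (preemptible)
        val |= KVM_ASYNC_PF_SEND_ALWAYS;
    return val;
}
```

For example, suppose apf_reason sits at guest-physical address 0x1000. A CONFIG_PREEMPT kernel would then write 0x1003, and a non-preemptible kernel would write 0x1001.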
When wrmsrl() executes the MSR write instruction, a VM exit occurs and control returns to the host. The host eventually calls kvm_pv_enable_async_pf().
```c
static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
{
    gpa_t gpa = data & ~0x3f;

    /* Bits 2:5 are reserved, Should be zero */
    if (data & 0x3c)
        return 1;

    vcpu->arch.apf.msr_val = data;

    if (!(data & KVM_ASYNC_PF_ENABLED)) {
        kvm_clear_async_pf_completion_queue(vcpu);
        kvm_async_pf_hash_reset(vcpu);
        return 0;
    }

    if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa))
        return 1;

    vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
    kvm_async_pf_wakeup_all(vcpu);
    return 0;
}
```
This function registers the address of the APF reason variable passed in. So that the variable can be written quickly, it also sets up a cache with kvm_gfn_to_hva_cache_init(). The cache holds the result of translating the guest-physical address to a host virtual address (an address in the qemu process), and kvm_write_guest_cached() uses it later.
Also, when the KVM_ASYNC_PF_SEND_ALWAYS flag is off, send_user_only is set to true, so APF interrupts are sent only while the guest is running in user mode.
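On the host side, the same value is taken apart again. The sketch below mirrors the checks in kvm_pv_enable_async_pf() and assumes the same flag values as above. struct apf_msr_decoded and decode_apf_msr() are hypothetical. Unlike the kernel, the sketch decodes every field even when APF is being disabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag bits, same as on the guest side. */
#define KVM_ASYNC_PF_ENABLED     (1u << 0)
#define KVM_ASYNC_PF_SEND_ALWAYS (1u << 1)

/* Hypothetical decoded form of the MSR value. */
struct apf_msr_decoded {
    uint64_t gpa;        /* guest-physical address of apf_reason */
    bool enabled;        /* KVM_ASYNC_PF_ENABLED was set */
    bool send_user_only; /* KVM_ASYNC_PF_SEND_ALWAYS was clear */
};

/* Returns -1 if reserved bits 2..5 are set (the kernel function returns 1
 * in that case), 0 on success. */
int decode_apf_msr(uint64_t data, struct apf_msr_decoded *out)
{
    if (data & 0x3c)
        return -1;

    out->gpa = data & ~(uint64_t)0x3f;
    out->enabled = data & KVM_ASYNC_PF_ENABLED;
    out->send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
    return 0;
}
```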
Page fault
When the guest touches a page the host has swapped out, a VM exit occurs and control passes to the host. Let's start with what happens there.
Host
- The KVM kernel module's page fault handler (tdp_page_fault()) checks whether I/O will be needed
  - It uses __get_user_pages_fast(), a function that returns an error when I/O would be needed
- If I/O is needed, it registers a work item that performs the I/O on a workqueue
  - The workqueue used is the system-global per-CPU one (threads named [events/?])
- It arranges for a page fault interrupt to be raised in the guest at VM entry
  - It writes into the APF reason variable that the page has been swapped out on the host
- It performs VM entry
  - In some cases it does not (described later)
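The first steps of the list come down to one rule: map the page now if no I/O is needed, otherwise queue the swap-in and let the guest run. This is a minimal userspace model under stated assumptions:
- an array of booleans stands in for host residency
- try_fast_lookup() stands in for __get_user_pages_fast()
- the work queue is a fixed-size array

All of these names are hypothetical. When the queue is full, this sketch just reports it. As far as I can tell from the source, try_async_pf() instead falls back to an ordinary blocking lookup.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

/* Hypothetical host view: is each guest page resident in host RAM? */
struct host_mem {
    bool resident[NPAGES];
};

/* Hypothetical list of guest frames whose swap-in went to a worker. */
struct async_queue {
    size_t gfns[NPAGES];
    size_t len;
};

enum fault_result { FAULT_MAPPED, FAULT_ASYNC_QUEUED, FAULT_QUEUE_FULL };

/* Stand-in for __get_user_pages_fast(): succeeds only if no I/O is needed. */
static bool try_fast_lookup(const struct host_mem *m, size_t gfn)
{
    return m->resident[gfn];
}

/* Map at once if possible; otherwise queue the swap-in instead of
 * blocking the VCPU thread. 'gfn' must be below NPAGES. */
enum fault_result handle_guest_fault(const struct host_mem *m,
                                     struct async_queue *q, size_t gfn)
{
    if (try_fast_lookup(m, gfn))
        return FAULT_MAPPED;
    if (q->len == NPAGES)
        return FAULT_QUEUE_FULL;  /* caller would fall back to a blocking fault */
    q->gfns[q->len++] = gfn;
    return FAULT_ASYNC_QUEUED;
}
```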
The page fault handler calls try_async_pf() → gfn_to_pfn_async() …→ __get_user_pages_fast() to check, without doing I/O, whether the page is present. If it is not, kvm_arch_setup_async_pf() → kvm_setup_async_pf() sets up the workqueue work (what it does is described later). Then kvm_arch_async_page_not_present() tries to deliver an APF interrupt to the guest.
```c
void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
                                     struct kvm_async_pf *work)
{
    struct x86_exception fault;

    trace_kvm_async_pf_not_present(work->arch.token, work->gva);
    kvm_add_async_pf_gfn(vcpu, work->arch.gfn);

    if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED) ||
        (vcpu->arch.apf.send_user_only &&
         kvm_x86_ops->get_cpl(vcpu) == 0))
        kvm_make_request(KVM_REQ_APF_HALT, vcpu);
    else if (!apf_put_user(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT)) {
        fault.vector = PF_VECTOR;
        fault.error_code_valid = true;
        fault.error_code = 0;
        fault.nested_page_fault = false;
        fault.address = work->arch.token;
        kvm_inject_page_fault(vcpu, &fault);
    }
}
```
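First, the function checks whether an APF interrupt can be delivered to the guest. No interrupt is raised if the guest has not enabled APF, or if send_user_only is true and the fault occurred while the guest was in kernel mode. In those cases the VCPU is halted instead.
A halted VCPU still handles pending signals and the like, but it does not perform a VM entry until the wake-up condition is met. For APF, it stays halted until the I/O running in the workqueue completes.
If an APF interrupt can be delivered, apf_put_user() writes the fault reason (KVM_PV_REASON_PAGE_NOT_PRESENT) into the APF reason variable. KVM then injects a page fault into the guest (kvm_inject_page_fault()). The token identifying this fault is passed as the fault address, which is why the guest handler reads it back with read_cr2(). Note that the APF reason variable lives in the qemu process's memory (inside the guest's memory region), so it is written with copy_to_user().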
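The choice between halting the VCPU and injecting a page fault depends on only three inputs, so it can be isolated into a small C function. This is a sketch of the condition in kvm_arch_async_page_not_present(). The enum and function names are hypothetical. The sketch ignores the case where apf_put_user() fails, in which the kernel neither halts nor injects.

```c
#include <stdbool.h>

enum not_present_action {
    APF_HALT_VCPU,  /* like kvm_make_request(KVM_REQ_APF_HALT, vcpu) */
    APF_INJECT_PF   /* write PAGE_NOT_PRESENT, then inject #PF carrying the token */
};

/* Same condition as in kvm_arch_async_page_not_present(): halt if the
 * guest has not enabled APF, or if it asked for user-mode-only delivery
 * and the fault happened at CPL 0 (guest kernel mode). */
enum not_present_action decide_not_present(bool apf_enabled,
                                           bool send_user_only, int cpl)
{
    if (!apf_enabled || (send_user_only && cpl == 0))
        return APF_HALT_VCPU;
    return APF_INJECT_PF;
}
```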
Guest
- The interrupt handler (do_async_page_fault()) is called
- It puts the current process on a waitqueue and, if possible, switches processes (schedule())
  - If it cannot switch processes, it halts the CPU
First, the page fault handler.
```c
dotraplinkage void __kprobes
do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
{
    switch (kvm_read_and_reset_pf_reason()) {
    default:
        do_page_fault(regs, error_code);
        break;
    case KVM_PV_REASON_PAGE_NOT_PRESENT:
        /* page is swapped out by the host. */
        kvm_async_pf_task_wait((u32)read_cr2());
        break;
    case KVM_PV_REASON_PAGE_READY:
        kvm_async_pf_task_wake((u32)read_cr2());
        break;
    }
}
```
kvm_read_and_reset_pf_reason() reads the APF reason variable set by the host, and the handler branches on it. Here KVM_PV_REASON_PAGE_NOT_PRESENT should be set, so kvm_async_pf_task_wait() is called.
```c
void kvm_async_pf_task_wait(u32 token)
{
    u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
    struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
    struct kvm_task_sleep_node n, *e;
    DEFINE_WAIT(wait);
    int cpu, idle;

    cpu = get_cpu();
    idle = idle_cpu(cpu);
    put_cpu();

    spin_lock(&b->lock);
    e = _find_apf_task(b, token);
    if (e) {
        /* dummy entry exist -> wake up was delivered ahead of PF */
        hlist_del(&e->link);
        kfree(e);
        spin_unlock(&b->lock);
        return;
    }

    n.token = token;
    n.cpu = smp_processor_id();
    n.mm = current->active_mm;
    n.halted = idle || preempt_count() > 1;
    atomic_inc(&n.mm->mm_count);
    init_waitqueue_head(&n.wq);
    hlist_add_head(&n.link, &b->list);
    spin_unlock(&b->lock);

    for (;;) {
        if (!n.halted)
            prepare_to_wait(&n.wq, &wait, TASK_UNINTERRUPTIBLE);
        if (hlist_unhashed(&n.link))
            break;

        if (!n.halted) {
            local_irq_enable();
            schedule();    /* <-- process switch */
            local_irq_disable();
        } else {
            /*
             * We cannot reschedule. So halt.
             */
            native_safe_halt();
            local_irq_disable();
        }
    }
    if (!n.halted)
        finish_wait(&n.wq, &wait);

    return;
}
```
The function is long, but the line marked "process switch" in a comment is the normal path. When a page fault occurs in user mode, the process is usually put into a wait state and another process is dispatched right there.
The code also has a path that calls native_safe_halt() instead of schedule(), which halts the CPU. When does that happen?
[Correction] native_safe_halt() is called when n.halted is true. The first case is when the current process is the idle process. That means no other process needs the CPU, so halting is reasonable.
The second case is the check for whether the CPU may be preempted, the preempt_count() > 1 part. Normally the check would be preempt_count() > 0. Here spin_lock() has already been called just above, so the threshold is > 1 to account for it. The condition is true when running in interrupt context or while holding a lock. For why being preempted while holding a lock is a problem, see preempt-locking.txt. [/Correction]
When the VCPU halts, the host can detect it, so the host CPU should be dispatched to another virtual machine or host process.
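The n.halted decision can be written out as a function of its two inputs. This minimal sketch assumes a CONFIG_PREEMPT kernel, where the spin_lock() on the hash bucket raises preempt_count() by exactly one. must_halt_instead_of_schedule() and BUCKET_LOCK_COUNT are hypothetical names.

```c
#include <stdbool.h>

/* Assumes CONFIG_PREEMPT: the spin_lock() on the hash bucket, still held
 * when n.halted is computed, contributes exactly 1 to preempt_count(). */
#define BUCKET_LOCK_COUNT 1

/* Sketch of "n.halted = idle || preempt_count() > 1;".
 * Anything above the bucket lock means interrupt context or another held
 * lock, where calling schedule() is not allowed, so the CPU must halt. */
bool must_halt_instead_of_schedule(bool current_is_idle, int preempt_count)
{
    return current_is_idle || preempt_count > BUCKET_LOCK_COUNT;
}
```

For example, a task that faults while already holding one other spinlock sees a count of 2. It therefore halts instead of calling schedule().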
I/O issue and I/O completion
Now let's look at how the workqueue that actually issues the I/O is set up, and how completion is handled.
- async_pf_execute() is set as the workqueue function
- A kernel thread ([events/?]) calls async_pf_execute()
- An object for the APF I/O is allocated
- get_user_pages() is called, which results in I/O (swap-in)
- When the I/O completes, the object above is linked onto the APF I/O completion list
The workqueue runs in process context and can sleep. In other words, the situation is almost the same as an ordinary user process taking a page fault.
The object is added to the APF I/O completion list as shown below. The VCPU loop (KVM's main loop), described next, checks this list.
```c
list_add_tail(&apf->link, &vcpu->async_pf.done);
```
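The completion list is a plain producer/consumer FIFO: the workqueue appends finished items and the VCPU loop takes them off in order. The following minimal C sketch models it with a singly linked list. struct apf_work and the done_list_* functions are hypothetical stand-ins for struct kvm_async_pf and the list_head operations. The kernel protects this list with a spinlock because producer and consumer run on different threads. This single-threaded sketch omits the lock.

```c
#include <stddef.h>

/* Hypothetical work item, standing in for struct kvm_async_pf. */
struct apf_work {
    unsigned long gfn;
    struct apf_work *next;
};

/* Singly linked FIFO standing in for vcpu->async_pf.done. */
struct done_list {
    struct apf_work *head;
    struct apf_work **tail;   /* points at the last 'next' field */
};

void done_list_init(struct done_list *l)
{
    l->head = NULL;
    l->tail = &l->head;
}

/* Worker side: what list_add_tail() does once the swap-in finishes. */
void done_list_add_tail(struct done_list *l, struct apf_work *w)
{
    w->next = NULL;
    *l->tail = w;
    l->tail = &w->next;
}

/* VCPU-loop side: take the oldest completed item, or NULL if none.
 * The kernel would then map the page and notify the guest (PAGE_READY). */
struct apf_work *done_list_pop(struct done_list *l)
{
    struct apf_work *w = l->head;

    if (w) {
        l->head = w->next;
        if (!l->head)
            l->tail = &l->head;
    }
    return w;
}
```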
Notifying the guest, and a second page fault
Once the page has been swapped in, the host notifies the guest, and the guest wakes the process.
- Host
  - The VCPU loop periodically checks for I/O completion
  - If the I/O has completed, it calls the KVM page fault handler again to set up the SPTE
  - It sets the flag in the APF reason variable and raises an interrupt in the guest
- Guest
  - The interrupt handler is called
  - It wakes the process (which may also be kernel code running in process context, or a kernel thread)
The completion check is done by kvm_check_async_pf_completion() inside the __vcpu_run() loop. If the I/O has completed, it calls kvm_arch_async_page_ready() → tdp_page_fault() to set up the SPTE. (For SPTE setup, see d:id:kvm:20110514.)
kvm_check_async_pf_completion() then calls kvm_arch_async_page_present(). That function sets the APF reason variable to KVM_PV_REASON_PAGE_READY and arranges for an interrupt to be raised in the guest. The code is omitted because it is roughly the same as for KVM_PV_REASON_PAGE_NOT_PRESENT. At this point the VCPU's halted state is also cleared, and VM entry resumes.
On receiving the interrupt, the guest calls do_async_page_fault() → kvm_async_pf_task_wake() → apf_task_wake_one().
```c
static void apf_task_wake_one(struct kvm_task_sleep_node *n)
{
    hlist_del_init(&n->link);
    if (!n->mm)
        return;
    mmdrop(n->mm);
    if (n->halted)
        smp_send_reschedule(n->cpu);
    else if (waitqueue_active(&n->wq))
        wake_up(&n->wq);
}
```
If the target task halted its CPU, this interrupt handler must be running on a different (V)CPU, so it wakes that CPU with an IPI (smp_send_reschedule()). Otherwise it wakes the task through its waitqueue.
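The PAGE_READY interrupt can be handled before the matching PAGE_NOT_PRESENT fault has put its task to sleep (see the "dummy entry" comment in kvm_async_pf_task_wait() above). For that reason, the wait and wake sides meet in a table keyed by token. Below is a minimal single-threaded C sketch of that handshake, plus the halted/IPI choice in apf_task_wake_one(). It assumes, from my reading of the 2.6.38 source, that kvm_async_pf_task_wake() inserts a dummy entry when it finds no waiter. The flat table, its fixed size, and all function names are hypothetical simplifications of the hashed buckets.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLEEPERS 16

/* Hypothetical flat table standing in for the async_pf_sleepers[] buckets. */
struct sleeper {
    uint32_t token;
    bool used;
    bool dummy;   /* wake-up arrived before the fault ("dummy entry") */
};

struct sleep_table {
    struct sleeper s[MAX_SLEEPERS];
};

static struct sleeper *find_sleeper(struct sleep_table *t, uint32_t token)
{
    for (int i = 0; i < MAX_SLEEPERS; i++)
        if (t->s[i].used && t->s[i].token == token)
            return &t->s[i];
    return NULL;
}

static struct sleeper *free_slot(struct sleep_table *t)
{
    for (int i = 0; i < MAX_SLEEPERS; i++)
        if (!t->s[i].used)
            return &t->s[i];
    return NULL;
}

enum wait_result { WAIT_MUST_SLEEP, WAIT_ALREADY_READY, WAIT_TABLE_FULL };

/* PAGE_NOT_PRESENT side, after kvm_async_pf_task_wait(). */
enum wait_result task_wait(struct sleep_table *t, uint32_t token)
{
    struct sleeper *e = find_sleeper(t, token);

    if (e && e->dummy) {          /* wake-up was delivered ahead of the fault */
        e->used = false;
        return WAIT_ALREADY_READY;
    }
    e = free_slot(t);
    if (!e)
        return WAIT_TABLE_FULL;   /* sketch-only limit; the kernel's node is on the stack */
    *e = (struct sleeper){ .token = token, .used = true, .dummy = false };
    return WAIT_MUST_SLEEP;
}

enum wake_result { WAKE_WOKE_WAITER, WAKE_LEFT_DUMMY, WAKE_TABLE_FULL };

/* PAGE_READY side, after kvm_async_pf_task_wake(). */
enum wake_result task_wake(struct sleep_table *t, uint32_t token)
{
    struct sleeper *e = find_sleeper(t, token);

    if (e) {
        e->used = false;          /* like hlist_del_init(): waiter sees it is unhashed */
        return WAKE_WOKE_WAITER;
    }
    e = free_slot(t);
    if (!e)
        return WAKE_TABLE_FULL;
    *e = (struct sleeper){ .token = token, .used = true, .dummy = true };
    return WAKE_LEFT_DUMMY;
}

/* True while a real (non-dummy) waiter for 'token' is still queued. */
bool still_waiting(struct sleep_table *t, uint32_t token)
{
    struct sleeper *e = find_sleeper(t, token);
    return e && !e->dummy;
}

enum wake_method { WAKE_BY_IPI, WAKE_BY_WAITQUEUE };

/* Mirrors apf_task_wake_one(): a halted waiter sits in native_safe_halt()
 * on another CPU, so only an IPI can rouse it. */
enum wake_method choose_wake_method(bool halted)
{
    return halted ? WAKE_BY_IPI : WAKE_BY_WAITQUEUE;
}
```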
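Conclusion
I skipped over a lot, but we have now walked through how APF works.
It was more complex than I expected and hard to understand without knowledge of both the kernel and the hardware. Implementing asynchronous processing really is hard.
As an appendix, below are links to the sites and commits I consulted while reading the code, and to the definitions of the functions involved. (Putting the links in the body made it hard to edit, so I moved them to an appendix.) I hope they help anyone who wants to read the APF code.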
References
Related commits (main ones)
Kernel
- KVM: Halt vcpu if page it tries to access is swapped out
- KVM: Retry fault before vmentry
- KVM: Add PV MSR to enable asynchronous page faults delivery
- KVM: Handle async PF in a guest
- KVM: Inject asynchronous page fault into a PV guest if page is swapped out
- KVM paravirt: Handle async PF in non preemptable context
- KVM: Let host know whether the guest can handle async PF in non-userspace context
- KVM: Send async PF when guest is not in userspace too
- KVM: expose async pf through our standard mechanism
Related source code (main items)
Links to individual functions, structures, etc.
- tdp_page_fault()
- try_async_pf()
- hva_to_pfn()
- __get_user_pages_fast()
- kvm_arch_async_page_not_present()
- kvm_arch_async_page_present()
- KVM_REQ_APF_HALT
- struct kvm_async_pf
- struct kvm_arch_async_pf
- kvm_make_request()
- struct kvm_vcpu_arch.apf
- kvm_setup_async_pf()
- kvm_arch_setup_async_pf()
- kvm_arch_async_page_ready()
- kvm_check_async_pf_completion()
- __vcpu_run()
- kvm_inject_page_fault()
- async_pf_execute()
- schedule_work()
- get_user_pages()
- kvm_apf_trap_init()
- do_async_page_fault()
- kvm_guest_cpu_init()
- struct kvm_vcpu_pv_apf_data
- apf_reason
- apf_put_user()
- kvm_gfn_to_hva_cache_init()
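*1: Note: from the host's point of view, throughput may not change if there are other dispatchable processes.
*2: In other words, ordinary page faults incur slightly more overhead.
*3: apf_reason is 64-byte aligned, so the low 6 bits of its address are always zero and can be used for flags.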