KVM MMU and EPT Memory Management
When reposting, please credit [reposted from the xelatex KVM blog] and include a link to this article. Thanks.
[Note] Versions used in this article:
Linux 3.11, https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.11.tar.gz
qemu-kvm, git clone http://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git,
git checkout 4d9367b76f71c6d938cf8201392abe4bfb1136cb
First, a few abbreviations used throughout:
- GVA - Guest Virtual Address, a virtual address inside the guest
- GPA - Guest Physical Address, a physical address as seen by the guest
- GFN - Guest Frame Number, a guest page-frame number
- HVA - Host Virtual Address, a host virtual address, i.e. an address in the QEMU process
- HPA - Host Physical Address, a host physical address
- HFN - Host Frame Number, a host page-frame number
As I mentioned earlier, KVM provides mechanism, not policy. To manage memory regions, KVM uses the kvm_memory_slot structure as the counterpart of QEMU's AddressSpace. QEMU registers the guest's linear (guest-physical) address ranges with KVM as a set of memory slots, such as BIOS, MMIO, GPU, and RAM.
The memory slot data structure in KVM:
<include/linux/kvm_host.h>
struct kvm_memory_slot {
	gfn_t base_gfn;                    // first guest frame number covered by this slot
	unsigned long npages;              // number of pages in this slot
	unsigned long *dirty_bitmap;       // bitmap of dirty pages
	struct kvm_arch_memory_slot arch;  // architecture-specific part
	unsigned long userspace_addr;      // corresponding HVA
	u32 flags;                         // slot flags
	short id;                          // slot id
};
<arch/x86/include/asm/kvm_host.h>
struct kvm_arch_memory_slot {
	unsigned long *rmap[KVM_NR_PAGE_SIZES]; // reverse map (rmap) structures
	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1]; // large-page info (2MB, 1GB pages)
};
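These few fields are already enough to translate a GFN into an HVA; the arithmetic is essentially what __gfn_to_hva_memslot in include/linux/kvm_host.h does:
static inline unsigned long
__gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
{
	/* offset of the frame within the slot, scaled to bytes, plus the HVA base */
	return slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE;
}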
The slot-related fields in KVM's per-VM data structure; note that KVM creates and maintains one struct kvm for every virtual machine.
<include/linux/kvm_host.h>
struct kvm {
	spinlock_t mmu_lock;           // the big MMU lock
	struct mutex slots_lock;       // protects memory-slot operations
	struct mm_struct *mm;          /* userspace mm (the QEMU process) tied to this vm */
	struct kvm_memslots *memslots; // all memory slots of this VM
	...
};
struct kvm_memslots {
u64 generation;
struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM];
/* The mapping table from slot id to the index in memslots[]. */
short id_to_index[KVM_MEM_SLOTS_NUM];
};
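id_to_index maps a slot id to its position in memslots[]; it is what id_to_memslot (used during slot registration below) relies on. For reference, the 3.11 helper in include/linux/kvm_host.h looks roughly like this:
static inline struct kvm_memory_slot *
id_to_memslot(struct kvm_memslots *slots, int id)
{
	int index = slots->id_to_index[id];
	struct kvm_memory_slot *slot;

	slot = &slots->memslots[index];

	WARN_ON(slot->id != id);
	return slot;
}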
kvm->memslots is allocated when the virtual machine is created; see:
<virt/kvm/kvm_main.c>
static struct kvm *kvm_create_vm(unsigned long type)
{
...
kvm->memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
if (!kvm->memslots)
goto out_err_nosrcu;
kvm_init_memslots_id(kvm);
...
}
Memory slots are registered through the KVM_SET_USER_MEMORY_REGION case of the kvm_vm_ioctl function, which ultimately calls __kvm_set_memory_region to build the KVM-side slot matching QEMU's region (a userspace-side sketch follows the list below). __kvm_set_memory_region mainly does the following:
- Validates the arguments
- Calls id_to_memslot to obtain the corresponding memslot pointer in kvm->memslots
- Fills in the memslot's base_gfn, npages, and other fields
- Handles overlaps with existing memslots
- Calls install_new_memslots to install the new slot
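From the QEMU side this is just an ioctl on the VM fd. A minimal, hypothetical userspace sketch (not taken from QEMU; error handling elided) that registers one slot:
#include <fcntl.h>
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

int main(void)
{
	int kvm_fd = open("/dev/kvm", O_RDWR);
	int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);

	/* Back 1MB of guest-physical memory with an anonymous HVA range. */
	void *hva = mmap(NULL, 0x100000, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	struct kvm_userspace_memory_region region;
	memset(&region, 0, sizeof(region));
	region.slot = 0;                    /* becomes the memslot id */
	region.guest_phys_addr = 0;         /* base_gfn = 0 >> PAGE_SHIFT */
	region.memory_size = 0x100000;      /* npages = size >> PAGE_SHIFT */
	region.userspace_addr = (unsigned long)hva; /* userspace_addr (HVA) */

	/* Lands in __kvm_set_memory_region inside KVM. */
	ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
	return 0;
}
VCPU creation goes through a similar ioctl (KVM_CREATE_VCPU), which lands in kvm_vm_ioctl_create_vcpu:
<virt/kvm/kvm_main.c>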
static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
{
...
vcpu = kvm_arch_vcpu_create(kvm, id);
...
r = kvm_arch_vcpu_setup(vcpu);
...
}
This function first calls kvm_arch_vcpu_create to create the vcpu and then kvm_arch_vcpu_setup to initialize it. On x86, kvm_arch_vcpu_create ultimately calls vmx_create_vcpu to do the actual VCPU creation. The MMU is created along vmx_create_vcpu => kvm_vcpu_init => kvm_arch_vcpu_init => kvm_mmu_create, shown below:
<arch/x86/kvm/mmu.c>
int kvm_mmu_create(struct kvm_vcpu *vcpu)
{
ASSERT(vcpu);
vcpu->arch.walk_mmu = &vcpu->arch.mmu;
vcpu->arch.mmu.root_hpa = INVALID_PAGE;
vcpu->arch.mmu.translate_gpa = translate_gpa;
vcpu->arch.nested_mmu.translate_gpa = translate_nested_gpa;
return alloc_mmu_pages(vcpu);
}
This function sets arch.walk_mmu to the address of arch.mmu. The KVM MMU code frequently uses arch.walk_mmu and arch.mmu interchangeably, and this is where they are made the same thing. Now let's look at the MMU-related fields in vcpu->arch:
<arch/x86/include/asm/kvm_host.h>
struct kvm_vcpu_arch {
...
/*
* Paging state of the vcpu
*
* If the vcpu runs in guest mode with two level paging this still saves
* the paging mode of the l1 guest. This context is always used to
* handle faults.
*/
struct kvm_mmu mmu;
/*
* Paging state of an L2 guest (used for nested npt)
*
* This context will save all necessary information to walk page tables
* of an L2 guest. This context is only initialized for page table
* walking and not for faulting since we never handle l2 page faults on
* the host.
*/
struct kvm_mmu nested_mmu;
/*
* Pointer to the mmu context currently used for
* gva_to_gpa translations.
*/
struct kvm_mmu *walk_mmu;
struct kvm_mmu_memory_cache mmu_pte_list_desc_cache;
struct kvm_mmu_memory_cache mmu_page_cache;
struct kvm_mmu_memory_cache mmu_page_header_cache;
...
};
The comments are clear enough, so I'll only elaborate on the three caches:
- mmu_pte_list_desc_cache: allocates struct pte_list_desc objects, which are used mainly by the reverse map; see rmap_add — each rmapp points to a pte_list. The reverse map is described in detail later.
- mmu_page_cache: allocates shadow page-table pages, i.e. the pages that hold the SPT paging structures, corresponding to kvm_mmu_page.spt
- mmu_page_header_cache: allocates struct kvm_mmu_page structures; objects from this cache may come through the kmem_cache mechanism
static struct kmem_cache *pte_list_desc_cache;
static struct kmem_cache *mmu_page_header_cache;
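For reference, the refill pattern described next looks roughly like this (a simplified sketch modeled on mmu_topup_memory_cache in arch/x86/kvm/mmu.c; the real function may differ in details):
static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
				  struct kmem_cache *base_cache, int min)
{
	void *obj;

	/* Refill the per-vcpu cache from the backing kmem_cache until full. */
	if (cache->nobjs >= min)
		return 0;
	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
		obj = kmem_cache_zalloc(base_cache, GFP_KERNEL);
		if (!obj)
			return -ENOMEM;
		cache->objects[cache->nobjs++] = obj;
	}
	return 0;
}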
These back mmu_pte_list_desc_cache and mmu_page_header_cache respectively: when either of those two caches runs out of cached objects, it is refilled from the corresponding kmem_cache; see mmu_topup_memory_cache. When mmu_page_cache runs low, mmu_topup_memory_cache_page is called instead, which obtains pages directly via __get_free_page. These caches are topped up in advance to speed up allocations at runtime. The top-up function is mmu_topup_memory_caches, and it is called from the mmu page-fault handlers (e.g. tdp_page_fault), the MMU load function (kvm_mmu_load), and the SPT pte-write function (kvm_mmu_pte_write). If you don't care about efficiency you can ignore these caches.

2.2 Initializing the KVM MMU

The KVM MMU is initialized along the call chain kvm_vm_ioctl_create_vcpu => kvm_arch_vcpu_setup => kvm_mmu_setup => init_kvm_mmu. Depending on the type of MMU being created, init_kvm_mmu takes one of three paths: init_kvm_nested_mmu, init_kvm_tdp_mmu, or init_kvm_softmmu. init_kvm_nested_mmu is used for nested virtualization, init_kvm_tdp_mmu for EPT-capable virtualization (TDP stands for Two-Dimensional Paging, i.e. EPT), and init_kvm_softmmu for the software shadow page table (SPT). Here we only look at init_kvm_tdp_mmu.

The only thing init_kvm_tdp_mmu does is initialize the arch.mmu mentioned in 2.1 (through arch->walk_mmu), with variations depending on the host. Now let's look at struct kvm_mmu; init_kvm_tdp_mmu initializes almost every field in it (see also the official documentation [2][3]: https://www.kernel.org/doc/Documentation/virtual/kvm/mmu.txt, or Documentation/virtual/kvm/mmu.txt in the kernel tree).
<arch/x86/include/asm/kvm_host.h>
/*
* x86 supports 3 paging modes (4-level 64-bit, 3-level 64-bit, and 2-level
* 32-bit). The kvm_mmu structure abstracts the details of the current mmu
* mode.
*/
struct kvm_mmu {
void (*new_cr3)(struct kvm_vcpu *vcpu);
void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root);
unsigned long (*get_cr3)(struct kvm_vcpu *vcpu);
u64 (*get_pdptr)(struct kvm_vcpu *vcpu, int index);
int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err,
bool prefault);
void (*inject_page_fault)(struct kvm_vcpu *vcpu,
struct x86_exception *fault);
void (*free)(struct kvm_vcpu *vcpu);
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
struct x86_exception *exception);
gpa_t (*translate_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access);
int (*sync_page)(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp);
void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva);
void (*update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
u64 *spte, const void *pte);
hpa_t root_hpa;
int root_level;
int shadow_root_level;
union kvm_mmu_page_role base_role;
bool direct_map;
/*
* Bitmap; bit set = permission fault
* Byte index: page fault error code [4:1]
* Bit index: pte permissions in ACC_* format
*/
u8 permissions[16];
u64 *pae_root;
u64 *lm_root;
u64 rsvd_bits_mask[2][4];
/*
* Bitmap: bit set = last pte in walk
* index[0:1]: level (zero-based)
* index[2]: pte.ps
*/
u8 last_pte_bitmap;
bool nx;
u64 pdptrs[4]; /* pae */
};
/*
* kvm_mmu_page_role, below, is defined as:
*
* bits 0:3 - total guest paging levels (2-4, or zero for real mode)
* bits 4:7 - page table level for this shadow (1-4)
* bits 8:9 - page table quadrant for 2-level guests
* bit 16 - direct mapping of virtual to physical mapping at gfn
* used for real mode and two-dimensional paging
* bits 17:19 - common access permissions for all ptes in this shadow page
*/
union kvm_mmu_page_role {
unsigned word;
struct {
unsigned level:4;
unsigned cr4_pae:1;
unsigned quadrant:2;
unsigned pad_for_nice_hex_output:6;
unsigned direct:1;
unsigned access:3;
unsigned invalid:1;
unsigned nxe:1;
unsigned cr0_wp:1;
unsigned smep_andnot_wp:1;
};
};
The fields of this structure are as follows:
- new_cr3 through update_pte are function pointers for the MMU operations; initialization fills them in, and their purpose is evident from their names, so I won't go through them one by one
- root_hpa: the root of the paging structure, e.g. what the EPTP points to when EPT is used
- root_level: the level of the root of the guest's paging structure (e.g. a 64-bit OS with paging enabled can use a 4-level structure)
- shadow_root_level: the level of the root of the SPT/EPT paging structure (e.g. a 64-bit host can use 4-level EPT)
- base_role: the basic page role used when creating MMU pages (the descriptions below are quoted from [3]):
  - role.level: The level in the shadow paging hierarchy that this shadow page belongs to. 1=4k sptes, 2=2M sptes, 3=1G sptes, etc.
  - role.direct: If set, leaf sptes reachable from this page are for a linear range. Examples include real mode translation, large guest pages backed by small host pages, and gpa->hpa translations when NPT or EPT is active. The linear range starts at (gfn << PAGE_SHIFT) and its size is determined by role.level (2MB for first level, 1GB for second level, 0.5TB for third level, 256TB for fourth level). If clear, this page corresponds to a guest page table denoted by the gfn field.
  - role.quadrant: When role.cr4_pae=0, the guest uses 32-bit gptes while the host uses 64-bit sptes. That means a guest page table contains more ptes than the host, so multiple shadow pages are needed to shadow one guest page. For first-level shadow pages, role.quadrant can be 0 or 1 and denotes the first or second 512-gpte block in the guest page table. For second-level page tables, each 32-bit gpte is converted to two 64-bit sptes (since each first-level guest page is shadowed by two first-level shadow pages) so role.quadrant takes values in the range 0..3. Each quadrant maps 1GB virtual address space.
  - role.access: Inherited guest access permissions in the form uwx. Note execute permission is positive, not negative.
  - role.invalid: The page is invalid and should not be used. It is a root page that is currently pinned (by a cpu hardware register pointing to it); once it is unpinned it will be destroyed.
  - role.cr4_pae: Contains the value of cr4.pae for which the page is valid (e.g. whether 32-bit or 64-bit gptes are in use).
  - role.nxe: Contains the value of efer.nxe for which the page is valid.
  - role.cr0_wp: Contains the value of cr0.wp for which the page is valid.
  - role.smep_andnot_wp: Contains the value of cr4.smep && !cr0.wp for which the page is valid (pages for which this is true are different from other pages; see the treatment of cr0.wp=0 below).
- direct_map: whether this MMU keeps the stored paging structures consistent with the ones the VCPU uses. If true, the VCPU's TLB is flushed whenever the MMU contents change; otherwise synchronization has to be done manually.
- permissions: the permissions corresponding to each page-fault error code during page-fault handling, expressed with the ACC_* macros
- last_pte_bitmap: a bitmap used to decide whether a pte is the last one in a walk, indexed by level and pte.ps (see the comment in the structure above)
- nx: corresponds to the guest's efer.nx; see the Intel manuals
- pdptrs: the guest's PDPTEs (PAE paging), corresponding to GUEST_PDPTR0 through GUEST_PDPTR3 in the VMCS; see ept_save_pdptrs and ept_load_pdptrs.
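Each shadow (or EPT) page-table page is described by a struct kvm_mmu_page:
<arch/x86/include/asm/kvm_host.h>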
struct kvm_mmu_page {
struct list_head link;
struct hlist_node hash_link;
/*
* The following two entries are used to key the shadow page in the
* hash table.
*/
gfn_t gfn;
union kvm_mmu_page_role role;
u64 *spt;
/* hold the gfn of each spte inside spt */
gfn_t *gfns;
bool unsync;
int root_count; /* Currently serving as active root */
unsigned int unsync_children;
unsigned long parent_ptes; /* Reverse mapping for parent_pte */
/* The page is obsolete if mmu_valid_gen != kvm->arch.mmu_valid_gen. */
unsigned long mmu_valid_gen;
DECLARE_BITMAP(unsync_child_bitmap, 512);
#ifdef CONFIG_X86_32
/*
* Used out of the mmu-lock to avoid reading spte values while an
* update is in progress; see the comments in __get_spte_lockless().
*/
int clear_spte_count;
#endif
/* Number of writes since the last time traversal visited this page. */
int write_flooding_count;
};
Each field is explained below:
- link: links this structure into kvm->arch.active_mmu_pages or invalid_list, marking the page's current state
- hash_link: links this structure into the kvm->arch.mmu_page_hash hash table for fast lookup; the hash key is derived from the gfn and role fields below
- gfn: in direct mapping, the base gfn of the linear range; in indirect mapping, the gfn of the guest page table whose translations this page shadows. Indirect mapping is uncommon
- role: the page's "role"; see the description of union kvm_mmu_page_role above
- spt: the address of the corresponding SPT/EPT page; the page->private field of that page's struct page points back to this struct kvm_mmu_page. Its entries may point to lower-level shadow pages or to actual data pages.
- parent_ptes: the reverse mapping to the parent sptes (one level up) that point to this page
- unsync: meaningful only for leaf pages of the paging structure; it indicates whether the translations in this page are still consistent with the guest's. When out of sync, the guest may have modified ptes in this page without the corresponding sptes (or TLB) being updated, so KVM must resynchronize the page before relying on it.
- root_count: how many vcpus are currently using this page as a root page
- unsync_children: how many children below this page are in the unsync state
- mmu_valid_gen: the page's generation number. KVM maintains a global generation number (kvm->arch.mmu_valid_gen); if this field does not match the global one, the page is treated as an invalid page. This makes it cheap to blow away the whole KVM MMU paging structure: instead of tearing down every MMU page and its mappings immediately, KVM just increments kvm->arch.mmu_valid_gen by 1, which instantly invalidates all existing MMU pages, and defers the actual teardown (e.g. to when memory gets tight). When mmu_valid_gen reaches its maximum, kvm_mmu_invalidate_zap_all_pages can be called to zap all MMU pages explicitly.
- unsync_child_bitmap: a bitmap recording which children are unsync
- clear_spte_count: meaningful on 32-bit hosts only; see the comment on __get_spte_lockless
- write_flooding_count: under write protection, every guest write to a protected page causes KVM to perform an emulation. For leaf pages (pages that point to actual data pages), the unsync mechanism keeps frequent writes from causing a flood of emulations, but that does not work for non-leaf pages (paging-structure nodes). Each emulated write to a non-leaf page bumps this counter; if writes become too frequent, KVM unmaps the page to avoid excessive write emulation.
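The MMU root is (re)loaded just before entering the guest, in vcpu_enter_guest:
<arch/x86/kvm/x86.c>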
static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
{
...
r = kvm_mmu_reload(vcpu);
...
}
<arch/x86/kvm/mmu.h>
static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
{
if (likely(vcpu->arch.mmu.root_hpa != INVALID_PAGE))
return 0;
return kvm_mmu_load(vcpu);
}
<arch/x86/kvm/mmu.c>
int kvm_mmu_load(struct kvm_vcpu *vcpu)
{
int r;
r = mmu_topup_memory_caches(vcpu);
if (r)
goto out;
r = mmu_alloc_roots(vcpu);
kvm_mmu_sync_roots(vcpu);
if (r)
goto out;
/* set_cr3() should ensure TLB has been flushed */
vcpu->arch.mmu.set_cr3(vcpu, vcpu->arch.mmu.root_hpa);
out:
return r;
}
As you can see, kvm_mmu_load calls mmu_alloc_roots to set up the root page of the paging structure, and then calls arch.mmu.set_cr3 (in practice vmx_set_cr3) to set the guest's CR3 register.

5. EPT Page-Fault Handling

There are two EPT-related VM exits:
- EPT Misconfiguration: an EPT pte is configured incorrectly; see Intel Manual 3C, 28.2.3.1 EPT Misconfigurations
- EPT Violation: when a guest memory access reaches the EPT-translated range and no EPT misconfiguration occurs, an EPT violation may be raised; see Intel Manual 3C, 28.2.3.2 EPT Violations
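On an EPT violation, the fault handler (tdp_page_fault, via __direct_map) fills in the missing EPT entries; the shadow pages it needs come from kvm_mmu_get_page, which does the following: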
- Computes the role
- Looks the page up by gfn and role in the kvm->arch.mmu_page_hash hash table; if a matching page was created earlier, it is returned
- Otherwise calls kvm_mmu_alloc_page to create a new struct kvm_mmu_page
- Calls hlist_add_head to add the page to the kvm->arch.mmu_page_hash hash table
- Calls init_shadow_page_table to initialize the corresponding spt
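Why maintain a reverse map at all? Consider how the host reclaims a page that is mapped into a guest: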
- At page-reclaim time, the host knows the host virtual address (HVA) of the page
- From the HVA the GFN can be computed: gfn = ((hva - base_hva) >> PAGE_SHIFT) + base_gfn
- Through the reverse map, the GFN then leads to the shadow page-table entries (sptes) that must be torn down
The main reverse-map operations in arch/x86/kvm/mmu.c are:
- Get the rmap for a gfn: static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int level)
- Add a gfn->spte reverse mapping: static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn); what gets added is a struct pte_list_desc
- Remove a reverse mapping: static void rmap_remove(struct kvm *kvm, u64 *spte)
- Walk the rmap chain: first call rmap_get_first() to obtain a valid rmap_iterator, then call static u64 *rmap_get_next(struct rmap_iterator *iter) for each subsequent element
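Putting the iterator API together, a walk over all sptes that map one gfn looks roughly like this (a sketch modeled on loops such as rmap_write_protect in arch/x86/kvm/mmu.c; the function name walk_gfn_rmap is made up for illustration):
static void walk_gfn_rmap(struct kvm *kvm, gfn_t gfn, int level)
{
	unsigned long *rmapp = gfn_to_rmap(kvm, gfn, level);
	struct rmap_iterator iter;
	u64 *sptep;

	/* Visit every shadow/EPT pte that currently maps this gfn. */
	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
	     sptep = rmap_get_next(&iter)) {
		/* *sptep is one spte mapping gfn; inspect or modify it here. */
	}
}
Each rmap entry chains struct pte_list_desc descriptors:
<arch/x86/kvm/mmu.c>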
struct pte_list_desc {
u64 *sptes[PTE_LIST_EXT];
struct pte_list_desc *more;
};
The array sized by the PTE_LIST_EXT macro keeps the descriptor aligned to a cache line. In pte_list_add, desc->sptes[0] = (u64 *)*pte_list and desc->sptes[1] = spte. Reverse mappings are added in mmu_set_spte.

References:
[2] https://www.kernel.org/doc/Documentation/virtual/kvm/mmu.txt
[3] Documentation/virtual/kvm/mmu.txt in the kernel tree