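Intel x86 VT virtualization (3): a walkthrough of x64 multi-core code
For Windows kernel and VT testing, the usual setup is to install VMware or VirtualBox on the physical machine, install Windows inside a virtual machine, and then attach windbg on the physical machine to the virtual machine in order to debug the Windows kernel running in it. For VT testing, VT also has to be enabled for the virtual machine itself, which means nested virtualization. The overall architecture looks like this: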
L0 = Code that runs on a physical host. Runs a hypervisor;
L1 = L0’s hypervisor guest. Runs the hypervisor we want to debug;
L2 = L1’s hypervisor guest;
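As the diagram above shows, windbg can debug both the guest OS code and the host OS code.
The previous article used 周壑's VT framework. Its strengths are concise code and clear framework logic, which make it a good starting point for beginners. Its weaknesses are that it is 32-bit only, cannot run on 64-bit, and supports only a single core. This time I recommend another framework, on GitHub at https://github.com/zhuhuibeishadiao , which contains two projects, miniVT64 and PFHook. I suggest starting with miniVT64 for the same reasons: simple logic, little code, easy to get into.
1. The very first run ended in a blue screen. windbg showed the error type as the familiar C0000005, access violation, meaning the memory could not be accessed;
The faulting instruction: invvpid
To fully understand the cause and fix the bug, here is a brief introduction to what the invvpid instruction does. The key points are:
(1) Intel's VPID (Virtual-Processor Identifier) is a 16-bit field. Each TLB entry is associated with a VPID, which uniquely identifies a VCPU;
(2) During virtual-to-physical address translation, a TLB entry can be used only if its VPID equals the VPID of the VCPU of the virtual machine currently running;
(3) Because the VPID tells which VCPU a TLB entry belongs to, entries already in the TLB can be kept across virtual machine switches, which avoids needless TLB flushes;
(4) The second argument passed to invvpid is called the descriptor. It is 128 bits in total: bits 0-15 hold the VPID, bits 16-63 are reserved and must be zero, and bits 64-127 hold a linear address. What the instruction manages are the translations cached in the TLB, which reduce the number of memory reads the CPU needs for address translation and so speed up execution;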
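Points (1) to (3) can be pictured with a toy model. The sketch below is not how the hardware is built; the slot count and all names (`Tlb`, `tlb_insert`, `tlb_lookup`) are made up for illustration. It only shows that a lookup hits when both the page number and the VPID tag match, so entries from different VCPUs can sit side by side without a flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a VPID-tagged TLB; names and sizes are illustrative only. */
#define TLB_SLOTS 8

typedef struct {
    bool     valid;
    uint16_t vpid;       /* tag: which virtual processor owns this entry */
    uint64_t virt_page;  /* virtual page number */
    uint64_t phys_page;  /* physical page number */
} TlbEntry;

typedef struct {
    TlbEntry slot[TLB_SLOTS];
} Tlb;

/* Cache a translation tagged with the given VPID; false if no slot is free. */
bool tlb_insert(Tlb *tlb, uint16_t vpid, uint64_t virt_page, uint64_t phys_page)
{
    assert(tlb != NULL);
    for (size_t i = 0; i < TLB_SLOTS; i++) {
        if (!tlb->slot[i].valid) {
            tlb->slot[i] = (TlbEntry){ true, vpid, virt_page, phys_page };
            return true;
        }
    }
    return false;
}

/* A hit requires the page number AND the VPID tag to match. */
bool tlb_lookup(const Tlb *tlb, uint16_t current_vpid,
                uint64_t virt_page, uint64_t *phys_page)
{
    assert(tlb != NULL && phys_page != NULL);
    for (size_t i = 0; i < TLB_SLOTS; i++) {
        const TlbEntry *e = &tlb->slot[i];
        if (e->valid && e->vpid == current_vpid && e->virt_page == virt_page) {
            *phys_page = e->phys_page;
            return true;
        }
    }
    return false;
}
```

Two VCPUs can cache the same virtual page with different physical pages, and switching between them only changes `current_vpid`; nothing has to be thrown away.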
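Back to the bug itself: the function takes two arguments, in rcx and rdx. The context at the time of the fault showed rcx=2, which means invalidating the cached translations for all VPIDs (except 0000H). Judging by the access violation, the problem should be in the second argument, the descriptor, since that is where memory is accessed;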
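To make the two arguments concrete, here is a minimal sketch of the descriptor layout and the four invalidation types as I read them from the instruction reference. The type and function names are my own, not taken from miniVT64, and the validity check covers only the reserved-bits and VPID-zero rules (the canonical-address check for type 0 is left out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* INVVPID invalidation types (the register operand, rcx in this post). */
typedef enum {
    INVVPID_INDIVIDUAL_ADDRESS         = 0, /* one linear address, one VPID */
    INVVPID_SINGLE_CONTEXT             = 1, /* all mappings of one VPID */
    INVVPID_ALL_CONTEXTS               = 2, /* all VPIDs except 0000H */
    INVVPID_SINGLE_CONTEXT_KEEP_GLOBAL = 3  /* like 1, global translations may stay */
} InvvpidType;

/* 128-bit descriptor that INVVPID reads from memory (rdx points at it). */
typedef struct {
    uint16_t vpid;            /* bits 0-15 */
    uint16_t reserved[3];     /* bits 16-63, must be zero */
    uint64_t linear_address;  /* bits 64-127 */
} InvvpidDescriptor;

static_assert(sizeof(InvvpidDescriptor) == 16, "descriptor must be 128 bits");

/* Partial check: reserved bits zero, and VPID non-zero for types 0, 1 and 3. */
bool invvpid_descriptor_is_valid(InvvpidType type, const InvvpidDescriptor *d)
{
    assert(d != NULL);
    if (d->reserved[0] || d->reserved[1] || d->reserved[2])
        return false;
    if (type != INVVPID_ALL_CONTEXTS && d->vpid == 0)
        return false;
    return true;
}
```

With rcx=2 the VPID field does not select anything, but the CPU still has to read all 16 bytes of the descriptor from memory, which is why the memory operand matters even for the all-contexts case.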
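Back in windbg, dumping the memory at the address in rax with dq showed nothing wrong. That was odd: the contents at that address were readable, yet windbg reported an access violation. What was going on? Reading further in the instruction reference at https://www.felixcloutier.com/x86/invvpid turned up an important detail: a page fault while accessing the memory operand raises an exception, which would explain why the instruction failed.
By the time invvpid executes, VMX is already on and, as I understand it, we are already in the host OS. But the host OS has only just started and has no code of its own yet: the VMCS is not set up, and the segment registers, control registers and GDT/IDT have not been configured. It is starting from nothing, so if a page fault occurs at this point there is no way to bring the missing page back and the machine simply crashes. The second argument of invvpid therefore has to be in non-paged memory, so that it can never be swapped out to disk;
The improved code allocates 128 bits = 16 bytes of non-paged memory and passes it in as the descriptor:
Even after entering the host, allocating memory and converting addresses to physical ones still rely on guest OS APIs (more bluntly: the page tables maintained by the guest OS are still needed to translate virtual addresses to physical addresses). At this point the host is just an empty shell;
2. Just as I was happily single-stepping, another problem turned up: the faulting instruction was xsaves [rcx];
The call stack at the time of the fault:
The faulting code in the driver:
This time the exception was raised in SwapContext, so it probably happened during a thread switch. Same approach as before, first look up what the instruction does: https://www.felixcloutier.com/x86/xsaves
“Performs a full or partial save of processor state components to the XSAVE area located at the memory address specified by the destination operand”: in other words, it saves various pieces of processor state to the memory area named by the instruction. Here that memory is at [rcx], so the first thing to check is whether reading or writing it fails. Judging by the output, this memory region is fine;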
kd> dq ffffd40acb595cc0
ffffd40a`cb595cc0  00000000`00000000 00000000`00000000
ffffd40a`cb595cd0  00000000`00000000 00000000`00001f80
ffffd40a`cb595ce0  00000000`00000000 00000000`00000000
ffffd40a`cb595cf0  00000000`00000000 00000000`00000000
ffffd40a`cb595d00  00000000`00000000 00000000`00000000
ffffd40a`cb595d10  00000000`00000000 00000000`00000000
ffffd40a`cb595d20  00000000`00000000 00000000`00000000
ffffd40a`cb595d30  00000000`00000000 00000000`00000000
kd> r cr3
cr3=00000000001aa000
kd> !vtop 00000000001aa000 fffff800bf80734c
Amd64VtoP: Virt fffff800bf80734c, pagedir 00000000001aa000
Amd64VtoP: PML4E 00000000001aaf80
Amd64VtoP: PDPE 0000000001109010
Amd64VtoP: PDE 000000000110afe0
Amd64VtoP: PTE 0000000001095038
Amd64VtoP: Mapped phys 000000000220734c
Virtual address fffff800bf80734c translates to physical address 220734c.
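The !vtop output above can be checked by hand. The sketch below assumes ordinary 4-level paging with 4 KB pages (which is what the four table levels in the output suggest) and splits a virtual address into its table indices; the struct and function names are mine.

```c
#include <assert.h>
#include <stdint.h>

/* Table indices and page offset of a 48-bit virtual address
   under 4-level paging with 4 KB pages (an assumption of this sketch). */
typedef struct {
    unsigned pml4_index;   /* bits 47:39 */
    unsigned pdpt_index;   /* bits 38:30 */
    unsigned pd_index;     /* bits 29:21 */
    unsigned pt_index;     /* bits 20:12 */
    unsigned page_offset;  /* bits 11:0  */
} PageWalkIndices;

PageWalkIndices split_virtual_address(uint64_t va)
{
    PageWalkIndices r;
    r.pml4_index  = (unsigned)((va >> 39) & 0x1FF);
    r.pdpt_index  = (unsigned)((va >> 30) & 0x1FF);
    r.pd_index    = (unsigned)((va >> 21) & 0x1FF);
    r.pt_index    = (unsigned)((va >> 12) & 0x1FF);
    r.page_offset = (unsigned)(va & 0xFFF);
    return r;
}

/* Physical address of entry `index` in a table at `table_base`;
   every entry is 8 bytes wide. */
uint64_t entry_address(uint64_t table_base, unsigned index)
{
    return table_base + (uint64_t)index * 8;
}
```

For fffff800bf80734c the PML4 index is 0x1f0, so with cr3=1aa000 the entry sits at 1aa000 + 0x1f0*8 = 1aaf80, matching the PML4E line; the low 12 bits, 0x34c, reappear at the end of the mapped physical address 220734c.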
According to the log, the exception occurred only after entering the guest OS. Given that, xsaves quite possibly caused a VM exit that the host OS did not handle properly;
kd> g
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting control for msr 0x481
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (low): 0x00000016
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (high): 0x0000003f
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting control for msr 0x483
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (low): 0x00036dff
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (high): 0x003fffff
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting control for msr 0x484
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (low): 0x000011ff
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (high): 0x0000f3ff
FGP [VT] : [#0][IRQL=0x2](Virtualize): CPU: 0xFFFF8C02E51DFF60
FGP [VT] : [#0][IRQL=0x2](Virtualize): rsp: 0xffffd40acb595ac8
FGP [VT] : [#0][IRQL=0x2](ResumeGuest): Resuming guest...
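Reading further in the Intel manual, “Table 24-7. Definitions of Secondary Processor-Based VM-Execution Controls” contains the key information:
If bit 20 is set to 0, any execution of xsaves (or xrstors) causes a #UD (invalid-opcode exception);
Back in the setupvmcs function, setting this bit to 1 in the vmwrite is all it takes: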
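The "Adjusting controls (low)/(high)" lines in the log hint at how such a bit is normally applied. The sketch below is the usual pattern, not the framework's actual AdjustControls: the low 32 bits of a VMX capability MSR are the control bits that must be 1, the high 32 bits are the ones allowed to be 1. For the secondary controls the capability MSR is IA32_VMX_PROCBASED_CTLS2 (0x48B); the function and macro names here are my own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 20 of the secondary processor-based controls: enable XSAVES/XRSTORS. */
#define SECONDARY_ENABLE_XSAVES_XRSTORS (1u << 20)

/* Fold a requested control value into what the CPU permits.
   capability_msr: low 32 bits = bits that must be 1,
                   high 32 bits = bits that may be 1. */
uint32_t adjust_controls(uint32_t requested, uint64_t capability_msr)
{
    uint32_t must_be_one = (uint32_t)capability_msr;
    uint32_t may_be_one  = (uint32_t)(capability_msr >> 32);
    return (requested | must_be_one) & may_be_one;
}

/* True if the CPU allows this control bit to be set at all. */
bool control_is_supported(uint32_t control_bit, uint64_t capability_msr)
{
    return ((uint32_t)(capability_msr >> 32) & control_bit) == control_bit;
}
```

Checking `control_is_supported` first is worthwhile: if the CPU (or the outer hypervisor, when nested) does not offer bit 20, `adjust_controls` silently drops it and xsaves in the guest keeps raising #UD.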
3. Continuing to run hit yet another bug. The log:
kd> g
FGP [VT] : [#0][IRQL=0x0](DriverEntry): Dirver is StartFGP [VT] : [#0][IRQL=0x0](DriverEntry): Dirver is Start
FGP [VT] : [#0][IRQL=0x0](VtStart): virtualizing 1 processors ...
FGP [VT] : [#0][IRQL=0x0](VtStart): Allocated g_cpus array @ 0xffff8c02e3f85370, size=0x8
FGP [VT] : [#0][IRQL=0x2](SetupVMX): VMXON region size: 0x0
FGP [VT] : [#0][IRQL=0x2](SetupVMX): VMX revision ID: 0x1
FGP [VT] : [#0][IRQL=0x2](SetupVMX): VMON memory virtual address ffffa20189ea0000
FGP [VT] : [#0][IRQL=0x2](SetupVMX): VMON physical address 7c0e8000
FGP [VT] : [#0][IRQL=0x2](SetupVMX): VMCS memory virtual address ffffa20189ea6000
FGP [VT] : [#0][IRQL=0x2](SetupVMX): VMCS physical address 7c085000
FGP [VT] : [#0][IRQL=0x2](SetupVMCS): GuestRsp=FFFFD40ACA21BB28
FGP [VT] : [#0][IRQL=0x2](SetupVMCS): VMCS PHYSICAL_ADDRESS 7c085000
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting control for msr 0x481
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (low): 0x00000016
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (high): 0x0000003f
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting control for msr 0x483
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (low): 0x00036dff
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (high): 0x003fffff
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting control for msr 0x484
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (low): 0x000011ff
FGP [VT] : [#0][IRQL=0x2](AdjustControls): Adjusting controls (high): 0x0000f3ff
FGP [VT] : [#0][IRQL=0x2](Virtualize): CPU: 0xFFFF8C02E2CE7F60
FGP [VT] : [#0][IRQL=0x2](Virtualize): rsp: 0xffffd40aca21bac8
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 16
FGP [VT] : [#0][IRQL=0x2](HandleRdtsc): vmx: HandleRdtsc(): rax = 0x0, rdx = 0x80000003
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 28
FGP [VT] : [#0][IRQL=0x2](HandleCrAccess): HandleCrAccess: pExitQualification->ControlRegister = 3
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 10
FGP [VT] : [#0][IRQL=0x2](HandleCpuid): vmx: HandleCpuid(): guest_rip = 0xfffff800bf62a4b4
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 10
FGP [VT] : [#0][IRQL=0x2](HandleCpuid): vmx: HandleCpuid(): guest_rip = 0xfffff800bf63bb39
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 10
FGP [VT] : [#0][IRQL=0x2](HandleCpuid): vmx: HandleCpuid(): guest_rip = 0xfffff800bf63bada
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 31
FGP [VT] : [#0][IRQL=0x2](HandleMsrRead): vmx: HandleMsrRead(): msr = 0x1b, Msr.LowPart = 0xfee00d00, Msr.HighPart = 0x0
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 31
FGP [VT] : [#0][IRQL=0x2](HandleMsrRead): vmx: HandleMsrRead(): msr = 0x1b, Msr.LowPart = 0xfee00d00, Msr.HighPart = 0x0
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 31
FGP [VT] : [#0][IRQL=0x2](HandleMsrRead): vmx: HandleMsrRead(): msr = 0x1b, Msr.LowPart = 0xfee00d00, Msr.HighPart = 0x0
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 31
FGP [VT] : [#0][IRQL=0x2](HandleMsrRead): vmx: HandleMsrRead(): msr = 0x40000105, Msr.LowPart = 0x0, Msr.HighPart = 0x80000000
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 32
FGP [VT] : [#0][IRQL=0x2](HandleMsrWrite): vmx: HandleMsrWrite(): msr = 0x40000100, rax = 0x7f, rdx = 0x0
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 32
FGP [VT] : [#0][IRQL=0x2](HandleMsrWrite): vmx: HandleMsrWrite(): msr = 0x40000101, rax = 0x8, rdx = 0x0
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 32
FGP [VT] : [#0][IRQL=0x2](HandleMsrWrite): vmx: HandleMsrWrite(): msr = 0x40000102, rax = 0xc184de70, rdx = 0xfffff800
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 32
FGP [VT] : [#0][IRQL=0x2](HandleMsrWrite): vmx: HandleMsrWrite(): msr = 0x40000103, rax = 0x10001f, rdx = 0x0
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 32
FGP [VT] : [#0][IRQL=0x2](HandleMsrWrite): vmx: HandleMsrWrite(): msr = 0x40000104, rax = 0xbfe2bc98, rdx = 0xfffff800
FGP [VT] : [#0][IRQL=0x2](HandleVmExit): Exit code: 32
FGP [VT] : [#0][IRQL=0x2](HandleMsrWrite): vmx: HandleMsrWrite(): msr = 0x40000105, rax = 0x0, rdx = 0x80000000
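The "Exit code" numbers in this log are VMX basic exit reasons. A small lookup like the one below makes such a log easier to read; it covers only the five reasons that actually appear here, and the function names are my own rather than anything from miniVT64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Name of a VMX basic exit reason; only the ones seen in the log are listed. */
const char *vmexit_reason_name(uint32_t exit_reason)
{
    switch (exit_reason & 0xFFFF) {   /* the basic reason is bits 15:0 */
    case 10: return "CPUID";
    case 16: return "RDTSC";
    case 28: return "Control-register access";
    case 31: return "RDMSR";
    case 32: return "WRMSR";
    default: return "not covered by this sketch";
    }
}

/* True for the two MSR-access exits, the ones that dominate the log tail. */
bool vmexit_is_msr_access(uint32_t exit_reason)
{
    uint32_t basic = exit_reason & 0xFFFF;
    return basic == 31 || basic == 32;
}
```

Read this way, the log is one RDTSC, one CR3 access and three CPUID exits, followed by nothing but MSR reads and writes up to the point where the machine hangs.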
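This time the virtual machine froze and did not respond to mouse clicks. windbg showed it as running but could not break in, so it seemed frozen as well. The last log line shows the guest OS writing to MSR 0x40000105, and a round of googling turned up part of the reason (https://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/hyperv.txt;hb=master):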
“write to HV_X64_MSR_CRASH_CTL causes guest to shutdown. This effectively blocks crash dump generation by Windows”
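The writes just before it are worth a look too. In the Hyper-V MSR numbering, 0x40000100 to 0x40000104 are the crash parameter registers P0 to P4 and 0x40000105 is the crash control register, whose bit 63 is the notify bit. The sketch below collects such writes from a WRMSR handler; the struct and function names are mine. My reading, which is an assumption and not something I have verified against this framework, is that Windows puts the bugcheck code and its parameters in P0 to P4, in which case 0x7f with first parameter 0x8 would be UNEXPECTED_KERNEL_MODE_TRAP caused by a double fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HV_X64_MSR_CRASH_P0        0x40000100u
#define HV_X64_MSR_CRASH_P4        0x40000104u
#define HV_X64_MSR_CRASH_CTL       0x40000105u
#define HV_CRASH_CTL_CRASH_NOTIFY  (1ULL << 63)

typedef struct {
    uint64_t param[5];   /* values written to P0..P4 */
    bool     notified;   /* notify bit seen in a CRASH_CTL write */
} CrashReport;

/* Feed one intercepted WRMSR (value is EDX:EAX).
   Returns true if the MSR was one of the crash registers. */
bool crash_report_on_wrmsr(CrashReport *report, uint32_t msr,
                           uint32_t eax, uint32_t edx)
{
    assert(report != NULL);
    uint64_t value = ((uint64_t)edx << 32) | eax;

    if (msr >= HV_X64_MSR_CRASH_P0 && msr <= HV_X64_MSR_CRASH_P4) {
        report->param[msr - HV_X64_MSR_CRASH_P0] = value;
        return true;
    }
    if (msr == HV_X64_MSR_CRASH_CTL) {
        report->notified = (value & HV_CRASH_CTL_CRASH_NOTIFY) != 0;
        return true;
    }
    return false;
}
```

If that reading is right, the guest had already hit a fatal error before the final write, and the write to 0x40000105 is the symptom rather than the cause.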
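Writing to MSR 0x40000105 shuts the guest down, so the next step was to find out what made the guest OS write to it. After stepping through the code line by line and comparing it with other VT frameworks, I finally found the pitfalls in this miniVT code:
- Interrupts are not disabled after entering the VMM. If an interrupt arrives at this point, then because the VMCS already has HOST_IDTR_BASE set (still using the guest OS's interrupt vector table), execution jumps to the interrupt routine, which unbalances the stack and scrambles the register values saved on it;
- The guest OS register context is saved on the stack, and rsp is not saved correctly;
Switching to the way PFHook does it made things work;
4. (1) Key multi-core code: iterate over every core and set up the memory each one needs separately. Different cores must never share the same block of memory for their data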
NTSTATUS StartVirtualTechnology()
{
    Asm_int3();
    KeInitializeMutex(&g_GlobalMutex, 0); // initialize the mutex
    KeWaitForMutexObject(&g_GlobalMutex, Executive, KernelMode, FALSE, NULL);
    g_Pml4 = EptInitialization();
    for (int i = 0; i < KeNumberProcessors; i++)
    {
        // Pin the current thread to CPU i. Cast before shifting: a plain (1 << i) is a 32-bit int shift.
        KeSetSystemAffinityThread((KAFFINITY)1 << i);
        // Set up VT. Each core allocates its own VMXON and VMCS regions;
        // cores must never share this memory, or the machine blue-screens.
        SetupVT();
        KeRevertToUserAffinityThread(); // restore the thread's original affinity
    }
    KeReleaseMutex(&g_GlobalMutex, FALSE);
    KdPrint(("VT Engine has been loaded!\n"));
    return STATUS_SUCCESS;
}
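One detail of the affinity call deserves a standalone illustration. On x64 the affinity mask is 64 bits wide, so the bit for a CPU should be built from a 64-bit 1, which means casting before the shift. The sketch below uses a plain `uint64_t` as a stand-in for KAFFINITY, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for KAFFINITY on x64, where it is a 64-bit mask. */
typedef uint64_t AffinityMask;

/* Mask with only the bit for `cpu_index` set.
   The cast comes first so that the shift is done in 64 bits;
   shifting a plain int 1 by 31 or more positions is not valid C. */
AffinityMask affinity_mask_for_cpu(unsigned cpu_index)
{
    assert(cpu_index < 64);
    return (AffinityMask)1 << cpu_index;
}
```

On a test VM with a handful of cores either form happens to work, which is exactly why the int-sized shift tends to go unnoticed until the driver runs on a machine with more than 31 logical processors.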
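(2) A point to note when setting up the VMCS: after vmlaunch, execution continues in the guest OS. Since the goal here is debugging, there is no extra code to run, and execution returns directly to the push EntryRflags below and continues from there;
The general-purpose registers are not saved on the stack here. They go into space set aside in the data segment, which keeps guestRSP from being modified or corrupted;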
Asm_RunToVMCS Proc
mov rax,[rsp]
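mov GuestReturn,rax ;take the return address so that after vmlaunch the guest continues with the driver-loading code
call SetupVMCS ;this function fills in the VMCS and then executes vmlaunch directly; execution then resumes at push EntryRflags in Asm_SetupVMCS (now running as the guest OS)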
ret
Asm_RunToVMCS Endp
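Asm_SetupVMCS Proc ;called first from SetupVT
cli ;disable interrupts so that no interrupt can trigger a call and corrupt the stack
mov GuestRSP,rsp ;after vmlaunch, rsp starts from this value
mov EntryRAX,rax ;the function that sets up the VMCS changes register values, so save them first. The stack will change, so they are kept in the data segment rather than on the stack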
mov EntryRCX,rcx
mov EntryRDX,rdx
mov EntryRBX,rbx
mov EntryRSP,rsp
mov EntryEBP,rbp
mov EntryESI,rsi
mov EntryRDI,rdi
mov EntryR8,r8
mov EntryR9,r9
mov EntryR10,r10
mov EntryR11,r11
mov EntryR12,r12
mov EntryR13,r13
mov EntryR14,r14
mov EntryR15,r15
pushfq
pop EntryRflags
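call Asm_RunToVMCS ;a detour through the procedure above, whose purpose is to capture the address of the next line; after vmlaunch the guest resumes there
push EntryRflags ;see above: this line's address is assigned to GuestReturn, and after vmlaunch the guest resumes here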
popfq
mov rax,EntryRAX ;restore the register values
mov rcx,EntryRCX
mov rdx,EntryRDX
mov rbx,EntryRBX
mov rsp,EntryRSP
mov rbp,EntryEBP
mov rsi,EntryESI
mov rdi,EntryRDI
mov r8,EntryR8
mov r9,EntryR9
mov r10,EntryR10
mov r11,EntryR11
mov r12,EntryR12
mov r13,EntryR13
mov r14,EntryR14
mov r15,EntryR15
mov rsp,GuestRSP
sti
ret
Asm_SetupVMCS Endp
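If the control flow of these two procedures is hard to follow, a user-mode analogy may help. The sketch below is only an analogy and involves no VMX at all: setjmp plays the role of saving GuestReturn, GuestRSP and the Entry* slots, and longjmp plays the role of vmlaunch landing back at the saved point. All names in it are made up.

```c
#include <assert.h>
#include <setjmp.h>

/* Saved resume point: the analogue of GuestReturn, GuestRSP and the Entry* slots. */
static jmp_buf resume_point;
static int launched;

/* Stand-in for SetupVMCS: it never returns normally. Like vmlaunch with the
   guest RIP/RSP pointing at the saved location, it jumps back there. */
static void fake_setup_and_launch(void)
{
    launched = 1;
    longjmp(resume_point, 1);
}

/* Stand-in for Asm_SetupVMCS: save the context, take the detour, and
   continue from the saved point once the "launch" has happened.
   Returns 1 if execution resumed at the saved point after the launch. */
int run_once(void)
{
    launched = 0;
    if (setjmp(resume_point) == 0) {
        fake_setup_and_launch();   /* does not come back through a normal return */
        return 0;                  /* not reached */
    }
    /* Analogue of the code from push EntryRflags onwards. */
    return launched;
}
```

The caller of `run_once` cannot tell that anything unusual happened in between, which is the same effect the driver wants: the code that loaded the driver keeps running, only now as the guest.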
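(3) The complete code is at https://github.com/zhuhuibeishadiao . I suggest starting with miniVT64: debug it for a while, and once the code structure and flow are familiar, move on to debugging PFHook
Lessons learned:
1. When you first start debugging, configure the virtual machine with a single processor and a single core. Otherwise several cores run different code at the same time, and while debugging execution appears to jump around instead of proceeding in order.
2. Do not print too much with DbgPrint. Printing on every MSR read or write, for example, floods the log and makes the virtual machine look frozen (windbg can still break in, which shows it has not actually crashed)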
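One simple way to keep the log quiet on hot paths such as the MSR handlers is to print only every N-th message. This is a minimal sketch with made-up names, not something taken from either framework; in a multi-core driver each core would want its own counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Lets one message through out of every `every` calls. */
typedef struct {
    uint32_t every;   /* must be non-zero */
    uint64_t seen;    /* number of calls so far */
} LogThrottle;

/* True on the 1st, (every+1)-th, (2*every+1)-th ... call. */
bool log_throttle_should_print(LogThrottle *t)
{
    assert(t != NULL && t->every != 0);
    return (t->seen++ % t->every) == 0;
}
```

Wrapping the DbgPrint in an MSR handler with `if (log_throttle_should_print(&t))` and `every` set to 1000 still shows that the handler is being hit, without drowning the debugger connection.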
References: 1. https://github.com/zhuhuibeishadiao miniVT and PFHook code
2. https://github.com/calware/HV-Playground a collection of VT frameworks
3. https://www.felixcloutier.com/x86/invvpid introduction to the invvpid instruction
4. https://msrc-blog.microsoft.com/2018/12/10/first-steps-in-hyper-v-research/ First Steps in Hyper-V Research