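Notes on a Linux boot failure
Background:
After a kernel upgrade on 2.6.32 systems, several machines failed to boot. Every machine that failed used an SSD as its system disk. Once the BIOS handed off to the boot loader, the screen went black and nothing was printed.
At first we suspected a corrupted MBR, so we mounted the boot disk on another server. The MBR turned out to be identical to the backup taken before the upgrade, and also identical to the MBR of machines that booted normally after the upgrade.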
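We could not get any further from there, so we asked 蒙恩, a colleague on the specialist OS team, to investigate. The conclusion is recorded below:
The machines use grub as the boot loader, and the sector location recorded in the MBR does not lead to the stage2 file.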
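Process:
1. Took the boot.bak and mbr.bak from the site and built a test environment. The kernel would not boot there either. The virtual machine's BIOS prints milestone messages, which confirmed that the BIOS had already loaded the MBR.
2. Confirmed that the MBR was bad: the starting sector number of the stage2 file written into the MBR was wrong.
3. Traced the upgrade and confirmed that it did not touch the MBR or any of the key boot files (stage2 and so on).
grub-install failed because of the way the device map file was written on site. To reproduce, create a device.map like the one below and then run "grub-install /dev/sda" (sda is the system disk):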
[[email protected] /]# cat /boot/grub/device.map
(hd0) /dev/disk/by-id/ata-INTEL_SSDSC2BB240G4_BTWL4020041Z240NGN
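To check a fleet for machines with the same kind of device.map, something like the following could be used. This is only a sketch: the function names are made up for illustration, it does not reproduce how grub-install itself parses the file, and it simply flags entries that point at a persistent-name alias under /dev/disk/ instead of a plain device node.

```python
import os
import re

# One device.map entry looks like: "(hd0) /dev/sda"
ENTRY = re.compile(r"^\s*\((\w+)\)\s+(\S+)\s*$")


def parse_device_map(text):
    """Return a list of (grub_device, os_path) pairs from device.map text."""
    entries = []
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        m = ENTRY.match(line)
        if m is None:
            raise ValueError(f"unrecognized device.map line: {line!r}")
        entries.append((m.group(1), m.group(2)))
    return entries


def find_indirect_entries(entries):
    """Return the entries whose OS path is an alias under /dev/disk/."""
    return [(g, p) for g, p in entries if p.startswith("/dev/disk/")]


def resolve(path):
    """Follow symlinks so a by-id alias becomes the real device node."""
    return os.path.realpath(path)


if __name__ == "__main__":
    with open("/boot/grub/device.map") as f:
        for grub_dev, os_path in find_indirect_entries(parse_device_map(f.read())):
            print(grub_dev, os_path, "->", resolve(os_path))
```

On an affected machine this should list the (hd0) entry together with the device node the by-id link resolves to.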
How it works:
=====
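Boot flow: MBR (/boot/grub/stage1) -> /boot/grub/stage2 -> vmlinux. The MBR loads stage2, and stage2 loads vmlinux.
The MBR, /boot/grub/stage1 and /boot/grub/stage2 relate to each other as follows:
The stage1 binary cannot understand a filesystem, so it can only read data through BIOS interrupts.
The stage1 binary is written into the MBR. A few stage1 variables sit at offsets within the binary that are strictly controlled at build time. The most important of these is the starting sector number of stage2 on the boot partition. The MBR is therefore the stage1 file + a few variables patched by the installer + the partition table.
stage2 has built-in support for the ext family of filesystems, so it can read the boot partition's filesystem directly to load vmlinux, grub.conf and so on.
Basis for the conclusions above: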
Stage 1 and Stage 2 have embedded variables whose locations are
well-defined, so that the installation can patch the binary file
directly without recompilation of the stages.
In Stage 1, these are defined:
`0x3E'
The version number (not GRUB's, but the installation mechanism's).
`0x40'
The boot drive. If it is 0xFF, use a drive passed by BIOS.
`0x41'
The flag for if forcing LBA.
`0x42'
The starting address of Stage 2.
`0x44'
The first sector of Stage 2.
`0x48'
The starting segment of Stage 2.
`0x1FE'
The signature (`0xAA55').
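The offsets above are enough to read these variables straight out of a saved MBR such as mbr.bak. The sketch below is an illustration, not GRUB code: the field widths are inferred from the gaps between the documented offsets, little-endian byte order is assumed because this is x86 boot code, and the function and key names are made up.

```python
import struct
import sys

SECTOR_SIZE = 512


def parse_stage1(mbr):
    """Decode the documented Stage 1 variables from the first 512 bytes."""
    if len(mbr) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes")
    # Widths are assumptions derived from the distance to the next offset.
    version = struct.unpack_from("<BB", mbr, 0x3E)       # 2 bytes
    boot_drive, force_lba = struct.unpack_from("<BB", mbr, 0x40)
    (stage2_address,) = struct.unpack_from("<H", mbr, 0x42)
    (stage2_sector,) = struct.unpack_from("<I", mbr, 0x44)
    (stage2_segment,) = struct.unpack_from("<H", mbr, 0x48)
    (signature,) = struct.unpack_from("<H", mbr, 0x1FE)
    return {
        "version": version,
        "boot_drive": boot_drive,
        "force_lba": force_lba,
        "stage2_address": stage2_address,
        "stage2_sector": stage2_sector,
        "stage2_segment": stage2_segment,
        "signature": signature,
    }


if __name__ == "__main__":
    # Usage: python parse_stage1.py mbr.bak
    with open(sys.argv[1], "rb") as f:
        fields = parse_stage1(f.read(SECTOR_SIZE))
    for name, value in fields.items():
        print(name, value)
```

The value of interest for this incident is stage2_sector, which can then be compared between a failing machine and a healthy one.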
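We traced whether the upgrade patch ever invoked grub or opened the stage files. The results are below. Nothing invoked the grub command (grub-install itself calls grub to do the installation):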
[[email protected] home]# ./test.stap |grep -E 'stage|grub'
open===/boot/grub/grub.conf
open===/boot/grub/sedgzxf68
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting10.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting11.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting08.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting08.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting01.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting11.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting10.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting04.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting09.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting01.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting03.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting11.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting08.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting07.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting07.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting03.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting06.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting05.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting02.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting07.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting02.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting01.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting09.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting06.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting09.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting05.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting05.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting03.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting10.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting06.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting04.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting04.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting02.png
execve===>/sbin/grubby
open===/etc/grub.conf
open===../boot/grub/grub.conf-
execve===>/sbin/grubby
open===/etc/grub.conf
execve===>/sbin/grubby
open===/etc/grub.conf
open===/etc/sysconfig/grub
execve===>/sbin/grubby
open===/etc/grub.conf
open===../boot/grub/grub.conf-
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting10.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting11.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting08.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting08.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting01.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting11.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting10.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting04.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting09.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting01.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting03.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting11.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting08.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting07.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting07.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting03.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting06.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting05.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting02.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting07.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting02.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting01.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting09.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting06.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting09.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting05.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting05.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage01-connecting03.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting10.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting06.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage02-connecting04.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting04.png
open===/usr/share/icons/hicolor/22x22/apps/nm-stage03-connecting02.png
open===/boot/grub/grub.conf
open===/boot/grub/grub.conf
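Most of the grep output above is noise: the pattern 'stage|grub' also matches the nm-stage icon files and grubby. A narrower filter over the same trace format might look like the sketch below. The "open===" and "execve===>" line shapes are taken from the output above; the tool and file name lists are assumptions about what counts as suspicious, and the function name is made up.

```python
import os
import re

# Trace lines look like "open===/path" or "execve===>/path".
LINE = re.compile(r"^(open|execve)===>?(.+)$")

# Assumed lists: programs that install grub, and the boot-stage files.
GRUB_TOOLS = {"grub", "grub-install"}
STAGE_FILE = re.compile(r"^(stage1|stage2|.*_stage1_5)$")


def suspicious(lines):
    """Return (call, path) pairs that touch grub itself or a stage file."""
    hits = []
    for line in lines:
        m = LINE.match(line.strip())
        if m is None:
            continue
        call, path = m.groups()
        name = os.path.basename(path)
        if call == "execve" and name in GRUB_TOOLS:
            hits.append((call, path))
        elif call == "open" and STAGE_FILE.match(name):
            hits.append((call, path))
    return hits


if __name__ == "__main__":
    import sys
    # Usage: ./test.stap | python filter_trace.py
    for call, path in suspicious(sys.stdin):
        print(call, path)
```

Run against the trace above, this filter reports nothing, which agrees with the conclusion that the upgrade never ran grub or opened stage1/stage2.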
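We then reviewed the grub-install script and found that its parsing of the device-map file is too simplistic and does not handle our kind of device map. Even before the upgrade, the stage2 sector recorded in our MBR was already wrong,
but the old stage2 file that had previously been stored at that sector was still there, so nothing broke. After the upgrade, new files had to be written to the boot partition, possibly because of the backup step, and that sector was allocated to something else.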
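Whether a given sector still holds the start of stage2 can be checked directly against a disk image. The sketch below assumes 512-byte sectors and that the sector number is an absolute LBA on the image; both are assumptions, as are the function name and arguments.

```python
SECTOR_SIZE = 512


def sector_matches_stage2(disk_path, stage2_path, sector):
    """True if the given disk sector equals the first sector of stage2."""
    with open(stage2_path, "rb") as f:
        expected = f.read(SECTOR_SIZE)
    with open(disk_path, "rb") as f:
        f.seek(sector * SECTOR_SIZE)
        actual = f.read(SECTOR_SIZE)
    # A short read on either side means there is nothing valid to compare.
    return len(expected) == SECTOR_SIZE and actual == expected
```

A match against an old copy of stage2 would explain why the machine kept booting before the upgrade; a mismatch after the upgrade would be consistent with the sector having been reused.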
References:
https://www.gnu.org/software/grub/manual/legacy