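HP-UX 11.31 Dual Root Disk Failure: A Case Analysis
One Superdome SX2000 server (HP-UX 11.23) with an external MSA60 root disk enclosure holding two root disks. The disk device files are: c2t3d0, c3t3d0 (PV link); c2t4d0, c3t4d0 (PV link).
Fault description:
One of the root disks, c2t3d0, reported a media error in the event log. After the disk was replaced following the normal procedure, the lvlnboot information could not be updated (with incorrect lvlnboot information, the machine may fail to boot after a reboot or a crash).
Analysis:
Before the root disk was replaced, the output of vgdisplay -v vg00 was as follows: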
hostname#[/]vgdisplay -v vg00
--- Volume groups ---
VG Name /dev/vg00
VG Write Access read/write
VG Status available
Max LV 255
Cur LV 10
Open LV 10
Max PV 16
Cur PV 2
Act PV 2
Max PE per PV 4356
VGDA 4
PE Size (Mbytes) 32
Total PE 8712
Alloc PE 4087
Free PE 4625
Total PVG 0
Total Spare PVs 0
Total Spare PVs in use 0
(LV details in the middle omitted)
--- Physical volumes ---
PV Name /dev/dsk/c2t3d0s2
PV Name /dev/dsk/c3t3d0s2 Alternate Link
PV Status available
Total PE 4356
Free PE 4356
Autoswitch On
Proactive Polling On
PV Name /dev/dsk/c3t4d0s2
PV Name /dev/dsk/c2t4d0s2 Alternate Link
PV Status available
Total PE 4356
Free PE 269
Autoswitch On
Proactive Polling On
The output above shows that vg00 contains two PVs: c2t3d0s2 (with alternate link c3t3d0s2) and c3t4d0s2 (with alternate link c2t4d0s2). Because c2t3d0 has a media error, it has to be replaced.
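The PV-to-alternate-link pairing read off the vgdisplay output above can also be extracted mechanically. The following is a minimal Python sketch, not part of the original case; the function name pv_links is hypothetical, and it assumes the "PV Name" lines look exactly like the ones pasted above.

```python
import re


def pv_links(vgdisplay_output):
    """Map each primary PV path to the list of its alternate links.

    Assumes the `vgdisplay -v` layout shown in this article: a primary
    "PV Name" line, optionally followed by "PV Name ... Alternate Link" lines.
    """
    links = {}
    primary = None
    for line in vgdisplay_output.splitlines():
        m = re.match(r"\s*PV Name\s+(\S+)(\s+Alternate Link)?\s*$", line)
        if not m:
            continue  # not a PV Name line (status, PE counts, ...)
        if m.group(2):
            # Alternate link: attach it to the most recent primary path.
            if primary is not None:
                links[primary].append(m.group(1))
        else:
            primary = m.group(1)
            links[primary] = []
    return links


SAMPLE = """\
PV Name /dev/dsk/c2t3d0s2
PV Name /dev/dsk/c3t3d0s2 Alternate Link
PV Status available
PV Name /dev/dsk/c3t4d0s2
PV Name /dev/dsk/c2t4d0s2 Alternate Link
PV Status available
"""

for pv, alts in pv_links(SAMPLE).items():
    print(pv, "alternate links:", alts)
```

Run against the output above, this reports two physical disks, each reachable over two paths, which matches the reading in the text.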
Before the root disk was replaced, the lvlnboot output was as follows. Can you spot anything abnormal in it?
hostname#[/]lvlnboot -v
Current path "/dev/dsk/c3t3d0s2" is an alternate link, skip.
Current path "/dev/dsk/c2t4d0s2" is an alternate link, skip.
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
/dev/dsk/c2t3d0s2 (0/0/13/0/0/0/0.0.0.3.0) -- Boot Disk
/dev/dsk/c3t3d0s2 (0/0/2/0/0/0/0.0.0.3.0)
/dev/dsk/c3t4d0s2 (0/0/2/0/0/0/0.0.0.4.0) // this entry should also be marked "Boot Disk"
/dev/dsk/c2t4d0s2 (0/0/13/0/0/0/0.0.0.4.0)
Boot: lvol1 on: /dev/dsk/c2t3d0s2
/dev/dsk/c3t3d0s2
/dev/dsk/c3t4d0s2
/dev/dsk/c2t4d0s2
Root: lvol3 on: /dev/dsk/c2t3d0s2
/dev/dsk/c3t3d0s2
/dev/dsk/c3t4d0s2
/dev/dsk/c2t4d0s2
Swap: lvol2 on: /dev/dsk/c2t3d0s2
/dev/dsk/c3t3d0s2
/dev/dsk/c3t4d0s2
/dev/dsk/c2t4d0s2
Dump: lvol2 on: /dev/dsk/c2t3d0s2, 0
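Next came the normal root disk replacement steps: populating the EFI partition, mirroring the LVs, and so on. The device files of the new root disk are c2t6d0, c3t6d0 (PV link). After the replacement, the PV information of vg00 was as follows: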
--- Physical volumes ---
PV Name /dev/dsk/c3t4d0s2
PV Name /dev/dsk/c2t4d0s2 Alternate Link
PV Status available
Total PE 4356
Free PE 269
Autoswitch On
Proactive Polling On
PV Name /dev/dsk/c3t6d0s2
PV Name /dev/dsk/c2t6d0s2 Alternate Link
PV Status available
Total PE 4356
Free PE 4356
Autoswitch On
Proactive Polling On
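At this point the disk replacement itself was finished and it was time to run lvlnboot -R. Before running lvlnboot -R, the lvlnboot output was as follows (it already shows that something is wrong with the structure of disk c3t4d0):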
hostname#[/]lvlnboot -v
Current path "/dev/dsk/c2t4d0s2" is an alternate link, skip.
Current path "/dev/dsk/c2t6d0s2" is an alternate link, skip.
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
/dev/dsk/c3t4d0s2 (0/0/2/0/0/0/0.0.0.4.0)
/dev/dsk/c2t4d0s2 (0/0/13/0/0/0/0.0.0.4.0)
/dev/dsk/c3t6d0s2 (0/0/2/0/0/0/0.0.0.6.0) -- Boot Disk
/dev/dsk/c2t6d0s2 (0/0/13/0/0/0/0.0.0.6.0)
No Boot Logical Volume configured
No Root Logical Volume configured
No Swap Logical Volume configured
No Dump Logical Volume configured
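The output above shows that the operating system has no valid lvlnboot information at this point: the swap, dump, root, and boot LVs are all undefined. If the machine crashed or rebooted in this state, it would not be able to boot, and without a backup the operating system might have to be reinstalled!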
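The check made by eye above (which boot-related roles have no logical volume configured) can be scripted against saved lvlnboot -v output. This is a minimal sketch added for illustration; the function name unconfigured_roles is hypothetical, and it only matches the exact message wording seen in this article.

```python
import re

# Order in which the roles are reported back.
ROLES = ("Boot", "Root", "Swap", "Dump")


def unconfigured_roles(lvlnboot_output):
    """Return the roles that `lvlnboot -v` reports as not configured."""
    found = re.findall(
        r"No (Boot|Root|Swap|Dump) Logical Volume configured",
        lvlnboot_output,
    )
    return [role for role in ROLES if role in found]


SAMPLE = """\
/dev/dsk/c3t6d0s2 (0/0/2/0/0/0/0.0.0.6.0) -- Boot Disk
No Boot Logical Volume configured
No Root Logical Volume configured
No Swap Logical Volume configured
No Dump Logical Volume configured
"""

missing = unconfigured_roles(SAMPLE)
if missing:
    print("WARNING: not configured:", ", ".join(missing))
```

An empty result is what a healthy root volume group should produce; any non-empty result is a reason not to reboot until the definitions are repaired.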
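Running lvlnboot -R still did not update the lvlnboot information. Setting the root, dump, and swap volumes individually (lvlnboot -r, lvlnboot -d, lvlnboot -s) also failed with errors. The output was as follows: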
hostname#[/]lvlnboot -r /dev/vg00/lvol3
lvlnboot: Physical Volume "/dev/dsk/c3t4d0s2" on which Logical
Volume "/dev/vg00/lvol3" resides is not a Boot Physical Volume.
hostname#[/]lvlnboot -d /dev/vg00/lvol2
lvlnboot: A Root Logical Volume must be assigned before
a Dump or Swap Logical Volume can be assigned.
hostname#[/]lvlnboot -s /dev/vg00/lvol2
lvlnboot: A Root Logical Volume must be assigned before
a Dump or Swap Logical Volume can be assigned.
hostname#[/]lvlnboot -R
Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
hostname#[/]lvlnboot -v
Current path "/dev/dsk/c2t4d0s2" is an alternate link, skip.
Current path "/dev/dsk/c2t6d0s2" is an alternate link, skip.
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
/dev/dsk/c3t4d0s2 (0/0/2/0/0/0/0.0.0.4.0)
/dev/dsk/c2t4d0s2 (0/0/13/0/0/0/0.0.0.4.0)
/dev/dsk/c3t6d0s2 (0/0/2/0/0/0/0.0.0.6.0) -- Boot Disk
/dev/dsk/c2t6d0s2 (0/0/13/0/0/0/0.0.0.6.0)
Root LV not yet configured !! Mirror information will not be displayed
Boot: lvol1 on: /dev/dsk/c3t6d0s2
No Root Logical Volume configured
No Swap Logical Volume configured
No Dump Logical Volume configured
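Many attempts failed to fix the problem. This machine is a customer production system running a critical business workload, so downtime was absolutely not an option. Eventually I took a careful look at one sentence in the error output above: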
Physical Volume "/dev/dsk/c3t4d0s2" on which Logical
Volume "/dev/vg00/lvol3" resides is not a Boot Physical Volume.
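Roughly, this says that the partition c3t4d0s2, on which lvol3 resides, is not a bootable PV, that is, not a valid Boot Disk. Why does the system not treat it as a valid Boot Disk? In fact this was visible from the very beginning: even before the repair, the lvlnboot output carried only one Boot Disk flag (see the annotated line in the first lvlnboot output).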
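After a detailed inspection and follow-up on the support case, every other cause (LV mirroring, the EFI partition, and so on) was ruled out. The root cause turned out to be that c3t4d0 had been initialized without the -B option when it was originally added to vg00: pvcreate /dev/rdsk/c3t4d0s2 had been run, where the correct command would have been pvcreate -B /dev/rdsk/c3t4d0s2. Without -B the disk has no BDRA area, so the operating system does not regard it as a Boot Disk. That is why the lvlnboot information could never be synchronized.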
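The symptom described above, a primary path in the root volume group without the "-- Boot Disk" flag, can be detected from saved lvlnboot -v output before any disk is touched. The following Python sketch is an illustration only; the name non_boot_primary_paths is hypothetical, and the parsing assumes the output layout shown in this article.

```python
import re

# One physical-volume line of `lvlnboot -v`, for example:
#   /dev/dsk/c2t3d0s2 (0/0/13/0/0/0/0.0.0.3.0) -- Boot Disk
PV_LINE = re.compile(r"^\s*(/dev/dsk/\S+)\s+\(([^)]+)\)\s*(-- Boot Disk)?")
ALT_LINE = re.compile(r'^Current path "([^"]+)" is an alternate link')


def non_boot_primary_paths(lvlnboot_output):
    """Return primary PV paths that lack the "-- Boot Disk" flag."""
    alternates = set()
    missing = []
    for line in lvlnboot_output.splitlines():
        alt = ALT_LINE.match(line)
        if alt:
            # Alternate links never carry the flag, so remember and skip them.
            alternates.add(alt.group(1))
            continue
        pv = PV_LINE.match(line)
        if pv and pv.group(1) not in alternates and pv.group(3) is None:
            missing.append(pv.group(1))
    return missing


SAMPLE = """\
Current path "/dev/dsk/c3t3d0s2" is an alternate link, skip.
Current path "/dev/dsk/c2t4d0s2" is an alternate link, skip.
Physical Volumes belonging in Root Volume Group:
/dev/dsk/c2t3d0s2 (0/0/13/0/0/0/0.0.0.3.0) -- Boot Disk
/dev/dsk/c3t3d0s2 (0/0/2/0/0/0/0.0.0.3.0)
/dev/dsk/c3t4d0s2 (0/0/2/0/0/0/0.0.0.4.0)
/dev/dsk/c2t4d0s2 (0/0/13/0/0/0/0.0.0.4.0)
Boot: lvol1 on: /dev/dsk/c2t3d0s2
/dev/dsk/c3t3d0s2
"""

print(non_boot_primary_paths(SAMPLE))
```

On the pre-replacement output from this case, the only path reported is /dev/dsk/c3t4d0s2, the disk that was later found to have been created without -B.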
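Solution:
Remove c3t4d0s2 and its PV link c2t4d0s2 from vg00 (the mirror copies of all LVs must first be removed from this disk), run pvcreate -B on it again, add it back to vg00, and re-mirror all LVs in vg00. After that, the problem was solved.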
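The sequence described above can be written down as a dry-run plan before anything is executed on a production host. The sketch below only builds command strings and runs nothing. The exact command lines are an assumption reconstructed from the steps in the text (the article does not list the commands that were actually typed), and the helper name rebuild_boot_pv_plan is hypothetical. It also assumes every LV already has a valid copy on the other root disk.

```python
def rebuild_boot_pv_plan(vg, lvols, primary, alternate):
    """Build (but do not run) the commands to re-create a PV as bootable.

    vg        -- volume group path, e.g. "/dev/vg00"
    lvols     -- LV names mirrored onto the disk, in lvol order
    primary   -- primary block device path of the disk
    alternate -- alternate-link block device path of the same disk
    """
    raw = primary.replace("/dev/dsk/", "/dev/rdsk/")
    plan = []
    # 1. Drop the mirror copy that lives on the affected disk.
    for lv in lvols:
        plan.append(f"lvreduce -m 0 {vg}/{lv} {primary}")
    # 2. Remove both paths of the disk from the volume group.
    plan.append(f"vgreduce {vg} {alternate}")
    plan.append(f"vgreduce {vg} {primary}")
    # 3. Re-create the PV as bootable (-B reserves the BDRA).
    plan.append(f"pvcreate -B {raw}")
    # 4. Add the disk back with both paths, then re-mirror every LV.
    plan.append(f"vgextend {vg} {primary} {alternate}")
    for lv in lvols:
        plan.append(f"lvextend -m 1 {vg}/{lv} {primary}")
    # 5. Refresh the boot definitions.
    plan.append("lvlnboot -R")
    return plan


# Only lvol1..lvol3 are named in the article; the real vg00 has 10 LVs.
for cmd in rebuild_boot_pv_plan(
    "/dev/vg00",
    ["lvol1", "lvol2", "lvol3"],
    "/dev/dsk/c3t4d0s2",
    "/dev/dsk/c2t4d0s2",
):
    print(cmd)
```

Printing the plan first makes it easy to review the order, in particular that no mirror copy is removed from a disk that still holds the only copy of an LV.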
After the fix, the lvlnboot output was back to normal:
hostname#[/]lvlnboot -v
Current path "/dev/dsk/c2t6d0s2" is an alternate link, skip.
Current path "/dev/dsk/c2t4d0s2" is an alternate link, skip.
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
/dev/dsk/c3t6d0s2 (0/0/2/0/0/0/0.0.0.6.0) -- Boot Disk
/dev/dsk/c2t6d0s2 (0/0/13/0/0/0/0.0.0.6.0)
/dev/dsk/c3t4d0s2 (0/0/2/0/0/0/0.0.0.4.0) -- Boot Disk
/dev/dsk/c2t4d0s2 (0/0/13/0/0/0/0.0.0.4.0)
Boot: lvol1 on: /dev/dsk/c3t6d0s2
/dev/dsk/c2t6d0s2
/dev/dsk/c3t4d0s2
/dev/dsk/c2t4d0s2
Root: lvol3 on: /dev/dsk/c3t6d0s2
/dev/dsk/c2t6d0s2
/dev/dsk/c3t4d0s2
/dev/dsk/c2t4d0s2
Swap: lvol2 on: /dev/dsk/c3t6d0s2
/dev/dsk/c2t6d0s2
/dev/dsk/c3t4d0s2
/dev/dsk/c2t4d0s2
Dump: lvol2 on: /dev/dsk/c3t6d0s2, 0
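As the Boot Disk flags in the output above show, the system now marks both disks as Boot Disk, and the lvlnboot information is back to normal as well.
This concludes the fault handling.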