RAC: a cluster outage caused by a dropped disk
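The application team reported that the databases on both hosts had gone down. Storage is plain data files on a file system, not ASM.
I logged in and checked the cluster status first.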
$ crsctl stat res -t -init -- most -init resources are down: crsd, cssd (stuck in STARTING), ctssd, evmd and the HAIP are OFFLINE
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        OFFLINE OFFLINE                               Instance Shutdown
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  ONLINE       cxcsdb01
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE                               STARTING
ora.cssdmonitor
      1        ONLINE  ONLINE       cxcsdb01
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       cxcsdb01
ora.gpnpd
      1        ONLINE  ONLINE       cxcsdb01
ora.mdnsd
      1        ONLINE  ONLINE       cxcsdb01
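Instead of scanning the table by eye, the resources that should be running but are not (TARGET ONLINE, STATE OFFLINE) can be filtered out mechanically. This is a minimal sketch, assuming the two-lines-per-resource layout shown above (resource name, then one line per instance); the function name `list_down_resources` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `crsctl stat res -t -init` output on stdin and
# print every resource whose TARGET is ONLINE but whose STATE is OFFLINE.
# Assumes each resource name (ora.*) is on its own line, followed by
# instance lines of the form: <num> <TARGET> <STATE> [SERVER] [DETAILS].
list_down_resources() {
  awk '
    /^ora\./ { name = $1; next }    # remember the current resource name
    $1 ~ /^[0-9]+$/ && $2 == "ONLINE" && $3 == "OFFLINE" { print name }
  '
}

# Usage on a live node (as the grid owner):
#   crsctl stat res -t -init | list_down_resources
```

Working on awk fields rather than fixed columns means the filter also copes with output whose spacing has been collapsed, as often happens when it is pasted into a ticket.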
$ ps -ef | grep crs -- crsd.bin is not running: the only match is the grep itself
grid 33095 30418 0 10:26 pts/2 00:00:00 grep --color=auto crs
$ ps -ef | grep css
root 30844 1 0 10:24 ? 00:00:00 /opt/oracle/11.2.0.4/grid/bin/cssdmonitor
root 30856 1 0 10:24 ? 00:00:00 /opt/oracle/11.2.0.4/grid/bin/cssdagent
grid 30868 1 0 10:24 ? 00:00:00 /opt/oracle/11.2.0.4/grid/bin/ocssd.bin
grid 33129 30418 0 10:26 pts/2 00:00:00 grep --color=auto css
$ ps -ef | grep ohasd
root 1513 1 0 Oct17 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run >/dev/null 2>&1 Type=simple
root 4266 1 0 10:04 ? 00:00:07 /opt/oracle/11.2.0.4/grid/bin/ohasd.bin reboot
grid 33254 30418 0 10:26 pts/2 00:00:00 grep --color=auto ohasd
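A plain `ps -ef | grep crs` always matches the grep process too, which makes the result easy to misread. The sketch below is a stricter check, assuming the Grid Infrastructure binaries show up in `ps -ef` as full paths such as `.../grid/bin/ocssd.bin`, as they do above; the function name `missing_gi_daemons` and the daemon list in the usage line are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ps -ef` output on stdin and print each daemon
# name (given as arguments) that does NOT appear as a running binary.
missing_gi_daemons() {
  local ps_out d
  ps_out=$(cat)
  for d in "$@"; do
    # Compare the last path component of every command/argument field
    # (fields 8 onward in `ps -ef`) with the daemon name exactly, so the
    # "grep --color=auto crs" line never counts as a match.
    if ! printf '%s\n' "$ps_out" | awk -v d="$d" '
        { for (i = 8; i <= NF; i++) { n = split($i, p, "/"); if (p[n] == d) found = 1 } }
        END { exit !found }'; then
      echo "$d"
    fi
  done
}

# Usage on a live node:
#   ps -ef | missing_gi_daemons ohasd.bin ocssd.bin crsd.bin evmd.bin
```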
Next, I checked the CSS log, ocssd.log, in the cssd log directory.
$ tail -f ocssd.log -- voting file discovery finds 0 disks: the voting disks have dropped off
...
2018-10-18 10:21:56.163: [ CSSD][2202380032]clssnmReadDiscoveryProfile: voting file discovery string(/crsdata/votedisk/votedata01/votedata01,/crsdata/votedisk/votedata02/votedata02,/crsdata/votedisk/votedata03/votedata03)
2018-10-18 10:21:56.163: [ CSSD][2202380032]clssnmvDDiscThread: using discovery string /crsdata/votedisk/votedata01/votedata01,/crsdata/votedisk/votedata02/votedata02,/crsdata/votedisk/votedata03/votedata03 for initial discovery
2018-10-18 10:21:56.163: [ SKGFD][2202380032]Discovery with str:/crsdata/votedisk/votedata01/votedata01,/crsdata/votedisk/votedata02/votedata02,/crsdata/votedisk/votedata03/votedata03:
2018-10-18 10:21:56.163: [ SKGFD][2202380032]UFS discovery with :/crsdata/votedisk/votedata01/votedata01:
2018-10-18 10:21:56.163: [ SKGFD][2202380032]Execute glob on the string /crsdata/votedisk/votedata01/votedata01
2018-10-18 10:21:56.164: [ SKGFD][2202380032]OSS discovery with :/crsdata/votedisk/votedata01/votedata01:
2018-10-18 10:21:56.164: [ SKGFD][2202380032]Discovery advancing to nxt string :/crsdata/votedisk/votedata02/votedata02:
2018-10-18 10:21:56.164: [ SKGFD][2202380032]UFS discovery with :/crsdata/votedisk/votedata02/votedata02:
2018-10-18 10:21:56.164: [ SKGFD][2202380032]Execute glob on the string /crsdata/votedisk/votedata02/votedata02
2018-10-18 10:21:56.164: [ SKGFD][2202380032]OSS discovery with :/crsdata/votedisk/votedata02/votedata02:
2018-10-18 10:21:56.164: [ SKGFD][2202380032]Discovery advancing to nxt string :/crsdata/votedisk/votedata03/votedata03:
2018-10-18 10:21:56.164: [ SKGFD][2202380032]UFS discovery with :/crsdata/votedisk/votedata03/votedata03:
2018-10-18 10:21:56.164: [ SKGFD][2202380032]Execute glob on the string /crsdata/votedisk/votedata03/votedata03
2018-10-18 10:21:56.164: [ SKGFD][2202380032]OSS discovery with :/crsdata/votedisk/votedata03/votedata03:
2018-10-18 10:21:56.164: [ CSSD][2202380032]clssnmvDiskVerify: Successful discovery of 0 disks
2018-10-18 10:21:56.164: [ CSSD][2202380032]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2018-10-18 10:21:56.164: [ CSSD][2202380032]clssnmvFindInitialConfigs: No voting files found
2018-10-18 10:21:56.164: [ CSSD][2202380032](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
2018-10-18 10:21:56.478: [ CSSD][2204923648]clssgmExecuteClientRequest(): type(37) size(80) only connect and exit messages are allowed before lease acquisition proc(0x7f6278060880) client((nil))
...
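Because the log prints the exact discovery string, the quickest confirmation is to extract it and test each voting file path on the OS. This is a minimal sketch, assuming the `voting file discovery string(...)` line format shown above; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recent voting file discovery string
# (comma-separated paths) found in ocssd.log content read on stdin.
discovery_string_from_log() {
  sed -n 's/.*voting file discovery string(\([^)]*\)).*/\1/p' | tail -n 1
}

# Hypothetical helper: report whether each comma-separated path exists.
check_voting_files() {
  local paths f
  IFS=',' read -r -a paths <<< "$1"
  for f in "${paths[@]}"; do
    if [ -e "$f" ]; then
      echo "OK      $f"
    else
      echo "MISSING $f"
    fi
  done
}

# Usage from the cssd log directory:
#   check_voting_files "$(discovery_string_from_log < ocssd.log)"
#   grep -c 'No voting files found' ocssd.log
```

In this incident all three paths sit under /crsdata, so every one of them would be reported as MISSING, matching "Successful discovery of 0 disks" in the log.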
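The application team confirmed that the /crsdata file system really was gone from the hosts. All three voting files live under /crsdata, so CSSD discovered 0 voting disks, ocssd.bin stayed in STARTING, and crsd never came up. Once the team mounted the disk again, the clusterware started on its own.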
$ df -h -- /crsdata (volume crsdg/crsvol) is mounted again
Filesystem                  Size  Used Avail Use% Mounted on
/dev/sda5                   474G   35G  439G   8% /
devtmpfs                    126G     0  126G   0% /dev
tmpfs                       126G     0  126G   0% /dev/shm
tmpfs                       126G   27M  126G   1% /run
tmpfs                       126G     0  126G   0% /sys/fs/cgroup
/dev/sda3                    20G   54M   20G   1% /home
/dev/sda1                   497M  166M  332M  34% /boot
tmpfs                       4.0K     0  4.0K   0% /dev/vx
tmpfs                        26G     0   26G   0% /run/user/50008
tmpfs                        26G     0   26G   0% /run/user/50007
tmpfs                        26G     0   26G   0% /run/user/1000
/dev/vx/dsk/crsdg/crsvol     14G  106M   14G   1% /crsdata
/dev/vx/dsk/archdg/archvol  199G  2.7G  195G   2% /archive
/dev/vx/dsk/oradg/oravol01 1000G  554G  443G  56% /oradata01
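Since the clusterware cannot start without its voting files, it can help to confirm that /crsdata is really a mount point (and not just an empty directory on the root file system) before restarting the stack, for example with `crsctl start crs`. This is a minimal sketch, assuming the Linux /proc/mounts format; `is_mounted` is a hypothetical name, and util-linux `mountpoint -q /crsdata` does the same job where it is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given directory appears as a mount
# point in a mounts table (default /proc/mounts: device dir fstype ...).
# Note: /proc/mounts escapes spaces as \040; such paths are not handled.
is_mounted() {
  local dir=$1 table=${2:-/proc/mounts}
  awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$table"
}

# Usage as a pre-check on each node:
#   if ! is_mounted /crsdata; then
#     echo "/crsdata is not mounted: voting files unavailable" >&2
#     exit 1
#   fi
```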
$ crsctl stat res -t -init -- every resource with TARGET ONLINE is now ONLINE
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        OFFLINE OFFLINE                               Instance Shutdown
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       cxcsdb01
ora.crf
      1        ONLINE  ONLINE       cxcsdb01
ora.crsd
      1        ONLINE  ONLINE       cxcsdb01
ora.cssd
      1        ONLINE  ONLINE       cxcsdb01
ora.cssdmonitor
      1        ONLINE  ONLINE       cxcsdb01
ora.ctssd
      1        ONLINE  ONLINE       cxcsdb01                 OBSERVER
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  ONLINE       cxcsdb01
ora.gipcd
      1        ONLINE  ONLINE       cxcsdb01
ora.gpnpd
      1        ONLINE  ONLINE       cxcsdb01
ora.mdnsd
      1        ONLINE  ONLINE       cxcsdb01