CRS can start only one ASM instance

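Today I got a call from a friend: one of his customers kept running into trouble while installing RAC. The cluster software was already installed, but the database could not be installed. The network there was poor, so remote access was not an option, and I had to gather information and diagnose the problem over QQ.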

To start, I asked them to run crs_stat -t to see the state of each resource:

[[email protected] bin]$ ./crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    rac01       
ora....01.lsnr application    ONLINE    ONLINE    rac01       
ora.rac01.gsd  application    ONLINE    ONLINE    rac01       
ora.rac01.ons  application    ONLINE    ONLINE    rac01       
ora.rac01.vip  application    ONLINE    ONLINE    rac01       
ora....SM2.asm application    ONLINE    OFFLINE               
ora....02.lsnr application    ONLINE    ONLINE    rac02       
ora.rac02.gsd  application    ONLINE    ONLINE    rac02       
ora.rac02.ons  application    ONLINE    ONLINE    rac02       
ora.rac02.vip  application    ONLINE    ONLINE    rac02

ASM was not up on rac02, and ps -ef showed no ASM processes there either:

[[email protected] etc]# ps -ef|grep asm
oracle   21691 15798  0 11:11 pts/1    00:00:00 more /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_pmon_12899.trc
root     21892 19317  0 11:11 pts/3    00:00:00 grep asm
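
The comparison of Target and State done by eye in the crs_stat -t table above can be sketched in Python. This is only an illustrative sketch, not something used in the original session: the function name find_down_resources and the embedded sample (copied from the first crs_stat -t output) are assumptions, and the resource names stay truncated exactly as crs_stat -t prints them.

```python
# Sample copied from the first crs_stat -t output in this post.
SAMPLE = """\
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    rac01
ora....01.lsnr application    ONLINE    ONLINE    rac01
ora.rac01.gsd  application    ONLINE    ONLINE    rac01
ora.rac01.ons  application    ONLINE    ONLINE    rac01
ora.rac01.vip  application    ONLINE    ONLINE    rac01
ora....SM2.asm application    ONLINE    OFFLINE
ora....02.lsnr application    ONLINE    ONLINE    rac02
ora.rac02.gsd  application    ONLINE    ONLINE    rac02
ora.rac02.ons  application    ONLINE    ONLINE    rac02
ora.rac02.vip  application    ONLINE    ONLINE    rac02
"""


def find_down_resources(crs_stat_output):
    """Return names of resources whose Target is ONLINE but whose State is not."""
    down = []
    for line in crs_stat_output.splitlines():
        fields = line.split()
        # Data rows start with "ora."; the header and the dashed rule do not.
        if len(fields) < 4 or not fields[0].startswith("ora."):
            continue
        name, _type, target, state = fields[:4]
        if target == "ONLINE" and state != "ONLINE":
            down.append(name)
    return down


print(find_down_resources(SAMPLE))
```

On a live node the same function could be fed the captured output of ./crs_stat -t instead of the embedded sample.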

crs_start often misbehaves, especially in virtual machines, and a restart usually clears the problem. So I tried restarting CRS with crs_stop -all followed by crs_start -all.

The start reported errors:

[[email protected] bin]$ ./crs_start -all
Attempting to start `ora.rac01.vip` on member `rac01`
Attempting to start `ora.rac02.vip` on member `rac02`
Attempting to start `ora.rac02.ASM2.asm` on member `rac02`
Attempting to start `ora.rac01.ASM1.asm` on member `rac01`
Start of `ora.rac01.vip` on member `rac01` succeeded.
Start of `ora.rac02.vip` on member `rac02` succeeded.
Attempting to start `ora.rac01.LISTENER_RAC01.lsnr` on member `rac01`
Attempting to start `ora.rac02.LISTENER_RAC02.lsnr` on member `rac02`
Start of `ora.rac01.LISTENER_RAC01.lsnr` on member `rac01` succeeded.
Start of `ora.rac02.LISTENER_RAC02.lsnr` on member `rac02` succeeded.
 
 
        Start of `ora.rac01.ASM1.asm` on member `rac01` failed.
rac02 : CRS-1019: Resource ora.rac01.ASM1.asm (application) cannot run on rac02
 
 
Start of `ora.rac02.ASM2.asm` on member `rac02` succeeded.
Attempting to start `ora.rac01.gsd` on member `rac01`
Attempting to start `ora.rac01.ons` on member `rac01`
CRS-1002: Resource 'ora.rac02.ons' is already running on member 'rac02'
 
Attempting to start `ora.rac02.gsd` on member `rac02`
Start of `ora.rac01.gsd` on member `rac01` succeeded.
Start of `ora.rac02.gsd` on member `rac02` succeeded.
Start of `ora.rac01.ons` on member `rac01` succeeded.
CRS-0215: Could not start resource 'ora.rac01.ASM1.asm'.
 
CRS-0223: Resource 'ora.rac02.ons' has placement error.
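
When the crs_start output is long, the failed starts and error codes can be pulled out mechanically. The following is a hedged sketch: the function name parse_start_output and the regular expressions are my own assumptions based only on the message shapes visible in the output above, not a documented format.

```python
import re

# Patterns inferred from the messages shown above (an assumption, not a spec).
FAILED_RE = re.compile(r"Start of `([^`]+)` on member `([^`]+)` failed")
CODE_RE = re.compile(r"\b((?:CRS|PRKS)-\d+)\b")

# Excerpt copied from the crs_start -all output above.
SAMPLE = """\
Start of `ora.rac02.vip` on member `rac02` succeeded.
        Start of `ora.rac01.ASM1.asm` on member `rac01` failed.
rac02 : CRS-1019: Resource ora.rac01.ASM1.asm (application) cannot run on rac02
Start of `ora.rac02.ASM2.asm` on member `rac02` succeeded.
CRS-1002: Resource 'ora.rac02.ons' is already running on member 'rac02'
CRS-0215: Could not start resource 'ora.rac01.ASM1.asm'.
CRS-0223: Resource 'ora.rac02.ons' has placement error.
"""


def parse_start_output(text):
    """Return ([(resource, member), ...] that failed, [error codes in first-seen order])."""
    failed = FAILED_RE.findall(text)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    codes = list(dict.fromkeys(CODE_RE.findall(text)))
    return failed, codes


failed, codes = parse_start_output(SAMPLE)
print("failed starts:", failed)
print("error codes:", codes)
```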

The key line in the errors above is: rac02 : CRS-1019: Resource ora.rac01.ASM1.asm (application) cannot run on rac02. Checking crs_stat -t again showed:

[[email protected] bin]$ ./crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora....SM1.asm application    ONLINE    OFFLINE               
ora....01.lsnr application    ONLINE    ONLINE    rac01       
ora.rac01.gsd  application    ONLINE    ONLINE    rac01       
ora.rac01.ons  application    ONLINE    ONLINE    rac01       
ora.rac01.vip  application    ONLINE    ONLINE    rac01       
ora....SM2.asm application    ONLINE    ONLINE    rac02       
ora....02.lsnr application    ONLINE    ONLINE    rac02       
ora.rac02.gsd  application    ONLINE    ONLINE    rac02       
ora.rac02.ons  application    ONLINE    ONLINE    rac02       
ora.rac02.vip  application    ONLINE    ONLINE    rac02

The problem seemed to be that the ASM instance could start on only one node at a time, so the next step was to look at the ASM log.

Looking for the ASM bdump directory, I found:
[[email protected] admin]$ cd +ASM
[[email protected] +ASM]$ ls
hdump  pfile
[[email protected] +ASM]$ ll
total 8
drwxr-x--- 2 oracle oinstall 4096 Oct  9 10:30 hdump
drwxr-x--- 2 oracle oinstall 4096 Oct  9 10:30 pfile
[[email protected] +ASM]$

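No ASM bdump log?! That suggested CRS had not yet reached the step of bringing up the ASM instance (although in 11g the ASM alert log and trace files are written under the diag directory, as the trace path in the ps output above shows, so the empty admin directory is not conclusive on its own). So I traced further up and looked at the CRS log: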

Oracle Database 11g CRS Release 11.1.0.6.0 - Production Copyright 1996, 2007 Oracle. All rights reserved.
2010-10-09 10:03:29.059: [ default][4277080288] CRS Daemon Starting
2010-10-09 10:03:29.060: [ CRSMAIN][4277080288] Checking the OCR device
2010-10-09 10:03:29.079: [ CRSMAIN][4277080288] Connecting to the CSS Daemon
2010-10-09 10:03:29.080: [ COMMCRS][1107462464]clsc_connect: (0x1c7d86e0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac01_))
2010-10-09 10:03:29.081: [ CSSCLNT][4277080288]clsssInitNative: failed to connect to (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac01_)), rc 9
2010-10-09 10:03:29.081: [  CRSRTI][4277080288] CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2010-10-09 10:03:30.082: [ COMMCRS][1107462464]clsc_connect: (0x1c7d86c0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac01_))
2010-10-09 10:03:30.083: [ CSSCLNT][4277080288]clsssInitNative: failed to connect to (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac01_)), rc 9
2010-10-09 10:03:30.083: [  CRSRTI][4277080288] CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2010-10-09 10:03:31.084: [ COMMCRS][1107462464]clsc_connect: (0x1c7d86c0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac01_))
2010-10-09 10:03:31.084: [ CSSCLNT][4277080288]clsssInitNative: failed to connect to (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac01_)), rc 9
2010-10-09 10:03:31.084: [  CRSRTI][4277080288] CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2010-10-09 10:06:52.786: [ CRSMAIN][4277080288] CRSD running as the Privileged user
2010-10-09 10:06:52.823: [  CLSVER][4277080288] Active Version from OCR:10.1.0.2.0
2010-10-09 10:06:52.823: [  CLSVER][4277080288] Active Version is less than Software Version
2010-10-09 10:06:52.823: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:06:53.824: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:06:54.825: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:06:55.826: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:06:56.827: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:06:57.828: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:06:58.829: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:06:59.831: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:00.831: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:01.832: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:02.833: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:03.834: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:04.835: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:05.836: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:06.837: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:07.838: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:08.839: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:09.840: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:10.841: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:11.842: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:12.843: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:13.844: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:14.845: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:15.846: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:16.847: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:17.848: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:18.849: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:19.850: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:20.851: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:21.852: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:22.853: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:23.854: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:24.855: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:07:40.874: [  CLSVER][4277080288] Registered in CSS group crs_version
2010-10-09 10:07:40.874: [ CRSMAIN][4277080288] Initializing OCR
2010-10-09 10:07:40.875: [  CLSVER][1117952320] Monitoring the crs_version group for AV change notification
2010-10-09 10:07:40.875: [  CLSVER][1117952320] Doing grpstat on crs_version group
2010-10-09 10:07:40.875: [  CLSVER][1117952320] Returned from grpstat with event 1
2010-10-09 10:07:40.875: [  CLSVER][1117952320] Doing grpstat on crs_version group
2010-10-09 10:07:40.911: [  OCRRAW][4277080288]proprioo: for disk 0 (/dev/sdd1), id match (1), my id set (1084942139,1028247821) total id sets (1), 1st set (1084942139,1028247821), 2nd set (0,0) my votes (2), total votes (2)
2010-10-09 10:07:41.074: [    CRSD][4277080288] ENV Logging level for Module: allcomp  0
2010-10-09 10:07:41.084: [    CRSD][4277080288] ENV Logging level for Module: default  0
2010-10-09 10:07:41.093: [    CRSD][4277080288] ENV Logging level for Module: OCRRAW  0
2010-10-09 10:07:41.102: [    CRSD][4277080288] ENV Logging level for Module: OCROSD  0
2010-10-09 10:07:41.111: [    CRSD][4277080288] ENV Logging level for Module: OCRCAC  0
2010-10-09 10:07:41.121: [    CRSD][4277080288] ENV Logging level for Module: COMMCRS  0
2010-10-09 10:07:41.130: [    CRSD][4277080288] ENV Logging level for Module: COMMNS  0

Judging from the log, this looked like a CSS error. CSS, Cluster Synchronization Services, is described in the documentation as follows: "Manages the cluster configuration by controlling which nodes are members of the cluster and by notifying members when a node joins or leaves the cluster. If you are using third-party clusterware, then the css process interfaces with your clusterware to manage node membership information." In short, it is responsible for membership control and communication between the nodes.
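
Because the CRS log repeats the same message once per second, it helps to collapse the repeats before reading it. Below is a minimal sketch; the function name summarize and the line pattern are assumptions based only on the log lines shown above. It groups lines by module and message and records the count and the first and last timestamps.

```python
import re

# Pattern inferred from the crsd log lines shown above (an assumption).
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): \[\s*(\w+)\]\[\d+\]\s*(.*)$"
)

# Excerpt copied from the CRS log above.
SAMPLE_LOG = """\
2010-10-09 10:03:29.081: [  CRSRTI][4277080288] CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2010-10-09 10:03:30.083: [  CRSRTI][4277080288] CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2010-10-09 10:06:52.786: [ CRSMAIN][4277080288] CRSD running as the Privileged user
2010-10-09 10:06:52.823: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:06:53.824: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
2010-10-09 10:06:54.825: [ CSSCLNT][4277080288]clssgsGroupJoin: CSS has not reached fatal mode.Registration is not yet safe. Retrying
"""


def summarize(log_text):
    """Map (module, message) -> {"count", "first", "last"}, in first-seen order."""
    summary = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip banner lines and anything that does not fit the pattern
        timestamp, module, message = match.groups()
        entry = summary.setdefault(
            (module, message), {"count": 0, "first": timestamp, "last": timestamp}
        )
        entry["count"] += 1
        entry["last"] = timestamp
    return summary


for (module, message), info in summarize(SAMPLE_LOG).items():
    print(f"{info['count']:>3}x {info['first']} .. {info['last']} [{module}] {message}")
```

Messages that embed a changing value, such as the pointer in the clsc_connect lines, will not collapse into one group with this simple key.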

I tried pinging every node: ping rac01, ping rac02, ping rac01-priv and ping rac02-priv all worked. I then verified SSH user equivalence: ssh rac01 date, ssh rac02 date, ssh rac01-priv date and ssh rac02-priv date all worked as well.
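
The ping and ssh checks above are easy to repeat as a loop. This is a sketch under assumptions: the helper names build_checks and run_checks are hypothetical, ping -c 1 assumes a Linux ping, and BatchMode=yes is added so ssh fails instead of prompting for a password. The block only prints the commands; run_checks would actually execute them when called on a cluster node.

```python
import subprocess

# Host names taken from the checks described above.
HOSTS = ["rac01", "rac02", "rac01-priv", "rac02-priv"]


def build_checks(hosts):
    """Return one ping command and one ssh command per host."""
    checks = []
    for host in hosts:
        checks.append(["ping", "-c", "1", host])
        checks.append(["ssh", "-o", "BatchMode=yes", host, "date"])
    return checks


def run_checks(checks, runner=subprocess.run):
    """Run each command and return the ones that failed (non-zero exit or OS error)."""
    failed = []
    for cmd in checks:
        try:
            result = runner(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            ok = result.returncode == 0
        except OSError:
            ok = False  # e.g. the ping or ssh binary is missing
        if not ok:
            failed.append(cmd)
    return failed


# Dry run: show what would be executed without touching the network.
for cmd in build_checks(HOSTS):
    print(" ".join(cmd))
```

Calling run_checks(build_checks(HOSTS)) on each node would return an empty list when every host answers, which matches the result reported above.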

Trying once more to start ASM on rac01 with srvctl produced an important error message:

[[email protected] dbs]$ srvctl start asm -n rac01
PRKS-1009 : Failed to start ASM instance "+ASM1" on node "rac01", [PRKS-1009 : Failed to start ASM instance "+ASM1" on node "rac01", [rac01:ora.rac01.ASM1.asm:
rac01:ora.rac01.ASM1.asm:SQL*Plus: Release 11.1.0.6.0 - Production on Sat Oct 9 11:58:36 2010
rac01:ora.rac01.ASM1.asm:
rac01:ora.rac01.ASM1.asm:Copyright (c) 1982, 2007, Oracle.  All rights reserved.
rac01:ora.rac01.ASM1.asm:
rac01:ora.rac01.ASM1.asm:Enter user-name: Connected to an idle instance.
rac01:ora.rac01.ASM1.asm:
rac01:ora.rac01.ASM1.asm:SQL> ORA-03113: end-of-file on communication channel
rac01:ora.rac01.ASM1.asm:SQL> Disconnected
rac01:ora.rac01.ASM1.asm:
CRS-0215: Could not start resource 'ora.rac01.ASM1.asm'.]]
[PRKS-1009 : Failed to start ASM instance "+ASM1" on node "rac01", [rac01:ora.rac01.ASM1.asm:
rac01:ora.rac01.ASM1.asm:SQL*Plus: Release 11.1.0.6.0 - Production on Sat Oct 9 11:58:36 2010
rac01:ora.rac01.ASM1.asm:
rac01:ora.rac01.ASM1.asm:Copyright (c) 1982, 2007, Oracle.  All rights reserved.
rac01:ora.rac01.ASM1.asm:
rac01:ora.rac01.ASM1.asm:Enter user-name: Connected to an idle instance.
rac01:ora.rac01.ASM1.asm:
rac01:ora.rac01.ASM1.asm:SQL> ORA-03113: end-of-file on communication channel
rac01:ora.rac01.ASM1.asm:SQL> Disconnected
rac01:ora.rac01.ASM1.asm:
CRS-0215: Could not start resource 'ora.rac01.ASM1.asm'.]]

Based on PRKS-1009 and CRS-0215, I was fairly sure this was a network interface configuration problem. Checking with oifcfg:

[[email protected] bin]$ ./oifcfg getif
eth0  10.0.253.0  global  public
eth1  192.168.253.0  global  cluster_interconnect
eth2  192.168.130.0  global  cluster_interconnect
eth3  192.168.131.0  global  cluster_interconnect
[[email protected] bin]$

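I asked, and the 130 and 131 subnets connect to the storage; they have nothing to do with the private interconnect traffic between the RAC nodes. rac0x-priv is on the 253 subnet, so eth2 and eth3 should not be configured as cluster interconnects. The hosts file shows the addressing: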

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1        localhost.localdomain localhost
#Public IP
10.0.253.151     rac01
10.0.253.152     rac02
#Private IP
192.168.253.151    rac01-priv
192.168.253.152    rac02-priv
#Virtual IP
10.0.253.156    rac01-vip
10.0.253.157    rac02-vip
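
The comparison between the oifcfg getif output and the private addresses in the hosts file can be sketched as follows. This is an illustrative sketch with assumptions: the function name stray_interconnects is hypothetical, and because oifcfg getif prints only the subnet address, a /24 prefix is assumed for every subnet.

```python
import ipaddress

# Copied from the oifcfg getif output above.
GETIF_OUTPUT = """\
eth0  10.0.253.0  global  public
eth1  192.168.253.0  global  cluster_interconnect
eth2  192.168.130.0  global  cluster_interconnect
eth3  192.168.131.0  global  cluster_interconnect
"""

# Private addresses of rac01-priv and rac02-priv from the hosts file above.
PRIVATE_IPS = ["192.168.253.151", "192.168.253.152"]


def stray_interconnects(getif_output, private_ips, prefix_len=24):
    """Return (interface, subnet) pairs tagged cluster_interconnect that hold no private IP.

    prefix_len is an assumption: oifcfg getif does not print the netmask.
    """
    private = [ipaddress.ip_address(ip) for ip in private_ips]
    stray = []
    for line in getif_output.splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[3] != "cluster_interconnect":
            continue
        interface, subnet = fields[0], fields[1]
        network = ipaddress.ip_network(f"{subnet}/{prefix_len}")
        if not any(ip in network for ip in private):
            stray.append((interface, subnet))
    return stray


print(stray_interconnects(GETIF_OUTPUT, PRIVATE_IPS))
```

With the values from this post, the two storage interfaces are the ones reported, which are the same two entries identified by hand above.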

Remove them with oifcfg delif:

[[email protected] bin]$ ./oifcfg delif -global eth2/192.168.130.0
[[email protected] bin]$ ./oifcfg delif -global eth3/192.168.131.0
[[email protected] bin]$ ./oifcfg getif
eth0  10.0.253.0  global  public
eth1  192.168.253.0  global  cluster_interconnect

Restart CRS again:

-- Run crs_stop in one window:
[[email protected] bin]$ ./crs_stop -all
 
-- Watch in another window:
[[email protected] bin]$ ./crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    rac01       
ora....01.lsnr application    OFFLINE   OFFLINE               
ora.rac01.gsd  application    OFFLINE   OFFLINE               
ora.rac01.ons  application    ONLINE    ONLINE    rac01       
ora.rac01.vip  application    OFFLINE   OFFLINE               
ora....SM2.asm application    OFFLINE   OFFLINE               
ora....02.lsnr application    OFFLINE   OFFLINE               
ora.rac02.gsd  application    OFFLINE   OFFLINE               
ora.rac02.ons  application    OFFLINE   OFFLINE               
ora.rac02.vip  application    OFFLINE   OFFLINE

Two resources on rac01, ASM and one of the nodeapps (ONS), had still not stopped, so I stopped them with srvctl:

srvctl stop asm -n rac01
srvctl stop nodeapps -n rac01
[[email protected] bin]$ ./crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora....SM1.asm application    OFFLINE   OFFLINE               
ora....01.lsnr application    OFFLINE   OFFLINE               
ora.rac01.gsd  application    OFFLINE   OFFLINE               
ora.rac01.ons  application    OFFLINE   OFFLINE               
ora.rac01.vip  application    OFFLINE   OFFLINE               
ora....SM2.asm application    OFFLINE   OFFLINE               
ora....02.lsnr application    OFFLINE   OFFLINE               
ora.rac02.gsd  application    OFFLINE   OFFLINE               
ora.rac02.ons  application    OFFLINE   OFFLINE               
ora.rac02.vip  application    OFFLINE   OFFLINE

Start again:

[[email protected] bin]$ ./crs_start -all
Attempting to start `ora.rac01.vip` on member `rac01`
Attempting to start `ora.rac02.vip` on member `rac02`
Attempting to start `ora.rac02.ASM2.asm` on member `rac02`
Attempting to start `ora.rac01.ASM1.asm` on member `rac01`
Start of `ora.rac01.vip` on member `rac01` succeeded.
Start of `ora.rac02.vip` on member `rac02` succeeded.
Attempting to start `ora.rac01.LISTENER_RAC01.lsnr` on member `rac01`
Attempting to start `ora.rac02.LISTENER_RAC02.lsnr` on member `rac02`
Start of `ora.rac02.ASM2.asm` on member `rac02` succeeded.
Start of `ora.rac01.ASM1.asm` on member `rac01` succeeded.
Start of `ora.rac01.LISTENER_RAC01.lsnr` on member `rac01` succeeded.
Start of `ora.rac02.LISTENER_RAC02.lsnr` on member `rac02` succeeded.
CRS-1002: Resource 'ora.rac01.ons' is already running on member 'rac01'
 
CRS-1002: Resource 'ora.rac02.ons' is already running on member 'rac02'
 
Attempting to start `ora.rac01.gsd` on member `rac01`
Attempting to start `ora.rac02.gsd` on member `rac02`
Start of `ora.rac01.gsd` on member `rac01` succeeded.
Start of `ora.rac02.gsd` on member `rac02` succeeded.
CRS-0223: Resource 'ora.rac01.ons' has placement error.
 
CRS-0223: Resource 'ora.rac02.ons' has placement error.
 
 
[[email protected] bin]$ ./crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    rac01       
ora....01.lsnr application    ONLINE    ONLINE    rac01       
ora.rac01.gsd  application    ONLINE    ONLINE    rac01       
ora.rac01.ons  application    ONLINE    ONLINE    rac01       
ora.rac01.vip  application    ONLINE    ONLINE    rac01       
ora....SM2.asm application    ONLINE    ONLINE    rac02       
ora....02.lsnr application    ONLINE    ONLINE    rac02       
ora.rac02.gsd  application    ONLINE    ONLINE    rac02       
ora.rac02.ons  application    ONLINE    ONLINE    rac02       
ora.rac02.vip  application    ONLINE    ONLINE    rac02

Done. The RAC database installation can now continue!