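Analysis of a Series of Failures Caused by Oracle Restart on a Single-Instance ASM Setup (Final Version)
Today I restarted a single-instance ASM environment that I had installed earlier, and startup suddenly failed: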
SQL> startup
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA/asmsingle/spfileasmsingle.ora'
ORA-17503: ksfdopn:2 Failed to open file +DATA/asmsingle/spfileasmsingle.ora
ORA-29701: unable to connect to Cluster Synchronization Service
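I assumed a listener problem was preventing access to the ASM disk group, so I ran netca to delete the listener and the net service name, planning to recreate them.
After deleting the listener, I tried to create a new one and netca reported an error (screenshot not preserved):
Oracle Restart must be restarted before configuring with netca, otherwise the listener cannot be registered with Oracle Restart.
A quick Google search showed that Oracle Restart is an 11gR2 component for managing single-instance components, and that it is installed automatically with Grid Infrastructure.
But in my environment even the crsctl command does not work, so the GI configuration seems to be broken.
When installing GI, after root.sh finishes you normally have to run roothas.pl, for example: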
Performing root user operation for Oracle 11g
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/app/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
To configure Grid Infrastructure for a Stand-Alone
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/roothas.pl
To configure Grid Infrastructure for a Cluster execute the following command:
/u01/app/11.2.0/grid/crs/config/config.sh
This command launches the Grid Infrastructure Configuration Wizard. The wizard also supports silent operation, and the parameters can be passed through the response file that is available in the installation media.
[[email protected] ~]# /u01/app/11.2.0/grid/crs/install/roothas.pl
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Improper Oracle Grid Infrastructure configuration found on this host
Deconfigure the existing cluster configuration before starting
to configure a new Grid Infrastructure
run '/u01/app/11.2.0/grid/crs/install/roothas.pl -deconfig'
to configure existing failed configuration and then rerun root.sh
Following the hint from the failed roothas.pl run, I first used the -deconfig option to remove the existing GI configuration:
[[email protected] ~]# /u01/app/11.2.0/grid/crs/install/roothas.pl -deconfig
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
Oracle Restart stack is not active on this node
Restart the SIHA stack (use /u01/app/11.2.0/grid/bin/crsctl start has) and retry
Failed to write the checkpoint:'' with status:FAIL.Error code is 256
Failed to verify HA resources
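Note the message above: the Oracle Restart stack is not active on this node, so the existing GI configuration cannot be removed. Is there really no way to remove it?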
[[email protected] ~]# /u01/app/11.2.0/grid/crs/install/roothas.pl -deconfig -force -verbose
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Delete failed, or completed with errors.
CLSU-00100: Operating System function: opendir failed with error data: 2
CLSU-00101: Operating System error message: No such file or directory
CLSU-00103: error location: scrsearch1
CLSU-00104: additional error information: cant open scr home dir scls_scr_getval
CRS-4544: Unable to connect to OHAS
CRS-4000: Command Stop failed, or completed with errors.
Successfully deconfigured Oracle Restart stack
This time the output is different: the Oracle Restart stack was deconfigured successfully. Next, rerun roothas.pl to configure and start Oracle Restart:
[[email protected] ~]# /u01/app/11.2.0/grid/crs/install/roothas.pl
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
LOCAL ADD MODE
Creating OCR keys for user 'grid', privgrp 'oinstall'..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Node dg1 successfully pinned.
Adding Clusterware entries to inittab
dg1 2013/08/25 23:30:16 /u01/app/11.2.0/grid/cdata/dg1/backup_20130825_233016.olr
Successfully configured Oracle Grid Infrastructure for a Standalone Server
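The sequence that worked here can be summarized as a small dry-run helper. This is only a sketch under the assumptions of this environment: the function name restart_reconfig_plan is made up for illustration, the default GRID_HOME is the path used in this post, and the function only prints the commands (to be run as root) rather than executing them.

```shell
#!/bin/sh
# Print (do not run) the reconfiguration sequence used in this post.
# $1 = Grid home; defaults to the path of this environment (an assumption).
restart_reconfig_plan() {
    grid_home="${1:-/u01/app/11.2.0/grid}"
    roothas="$grid_home/crs/install/roothas.pl"
    # Step 1: remove the broken configuration even though OHAS is down.
    echo "$roothas -deconfig -force -verbose"
    # Step 2: configure Oracle Restart again.
    echo "$roothas"
    # Step 3: confirm that High Availability Services is online.
    echo "$grid_home/bin/crsctl check has"
}
```

Calling restart_reconfig_plan with no argument prints the three commands for /u01/app/11.2.0/grid, which makes it easy to review them before running anything as root.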
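The Stand-Alone configuration was recreated successfully and crsctl now works. Running the command suggested by root.sh again now fails, reporting that CRS is already configured: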
[[email protected] ~]# /u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/roothas.pl
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
CRS is already configured on this node for crshome=/u01/app/11.2.0/grid
Cannot configure two CRS instances on the same cluster.
Please deconfigure before proceeding with the configuration of new home.
[[email protected] ~]$ crsctl check has
CRS-4638: Oracle High Availability Services is online
[[email protected] ~]$ crsctl status res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ons
OFFLINE OFFLINE dg1
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
1 OFFLINE OFFLINE
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE ONLINE dg1
The Oracle Restart problem is solved for now, but another issue showed up right after:
[[email protected] ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 152
Available space (kbytes) : 261968
ID : 1711295372
Device/File Name : /u01/app/11.2.0/grid/cdata/localhost/local.ocr
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check bypassed due to non-privileged user
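I expected the OCR to live under +DATA/asmsingle, so why is it the local path /u01/...? (In hindsight this is probably normal: as far as I know, Oracle Restart keeps its registry in a local file and does not use voting disks.) The grid user also cannot run the following command: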
[[email protected]dg1 ~]$ crsctl query css votedisk
Parse error:
'css' is an invalid argument
Then it hit me: why is the host name dg1 here? I had configured the host name as asm-single in /etc/hosts. Let's check:
[[email protected] ~]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.1.99 asm-single
Change it with the hostname command:
[[email protected] ~]# hostname asm-single
[[email protected] ~]#su - grid
[[email protected] ~]# exit
[[email protected] ~]#
After the change, switching users makes the prompt show the correct host name. Check the GI resources again:
[[email protected] ~]$ crsctl status res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ons
OFFLINE OFFLINE dg1
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
1 OFFLINE OFFLINE
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE ONLINE dg1
The resources still use dg1 as the host name, so it looks like roothas.pl has to be rerun to reconfigure:
[[email protected] ~]# /u01/app/11.2.0/grid/crs/install/roothas.pl -deconfig -force -verbose
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
Failed to write the checkpoint:'' with status:FAIL.Error code is 256
Can't open /etc/oracle/scls_scr/asm-single/grid/ohasdstr for write: No such file or directory at /u01/app/11.2.0/grid/crs/install/s_crsconfig_lib.pm line 1332.
The command fails because the asm-single path cannot be found. A quick look confirms there is no asm-single directory:
[[email protected] ~]# cd /etc/oracle/scls_scr
[[email protected] scls_scr]# ll
total 4
drwxr-x--- 4 root oinstall 4096 Aug 25 23:29 dg1
[[email protected] scls_scr]#
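The mismatch above (current host name versus the per-host directory that roothas.pl created) can be checked before running -deconfig. A minimal sketch, assuming the directory layout seen in this post; the function name scls_host_configured is hypothetical and not part of any Oracle tool.

```shell
# Return 0 if a per-host directory exists under the scls_scr base directory.
# $1 = host name to check
# $2 = base directory (defaults to /etc/oracle/scls_scr, as seen in this post)
scls_host_configured() {
    host="$1"
    base="${2:-/etc/oracle/scls_scr}"
    # An empty host name is never considered configured.
    [ -n "$host" ] && [ -d "$base/$host" ]
}
```

For example, scls_host_configured "$(hostname)" || echo "host name does not match the GI configuration" would have flagged the asm-single versus dg1 problem right away.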
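Since GI has already settled on dg1 as the host, I will go along with it for now: change the host name back to dg1 by editing /etc/hosts directly.
Then use NETCA to reconfigure the listener and the net service name, and add the listener and the database instance to the Oracle Restart configuration: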
[[email protected] ~]$ srvctl add database -d asmsingle -o /u01/app/oracle/product/11.2.0/dbhome_1
PRCD-1025 : Failed to create database asmsingle
PRKH-1014 : Current user "grid" is not the oracle owner user "oracle" of oracle home "/u01/app/oracle/product/11.2.0/dbhome_1"
The grid user is not the owner of this ORACLE_HOME, so try again as the oracle user:
[[email protected] ~]$ srvctl add database -d asmsingle -o /u01/app/oracle/product/11.2.0/dbhome_1
No error this time. Now check the Oracle Restart resources as the grid user:
[[email protected] ~]$ crsctl status res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
ONLINE ONLINE dg1
ora.ons
OFFLINE OFFLINE dg1
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asmsingle.db
1 OFFLINE OFFLINE
ora.cssd
1 OFFLINE OFFLINE
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE ONLINE dg1
The database has been added and the listener was registered automatically, but ASM and the disk groups are still missing. Try starting CRS:
[[email protected] ~]$ crsctl start crs
CRS-4013: This command is not supported in a single-node configuration.
CRS-4000: Command Start failed, or completed with errors.
This is a RAC command; on single-instance ASM it cannot be used to start all resources. Check the database configuration:
[[email protected] ~]$ srvctl config database -d asmsingle
Database unique name: asmsingle
Database name:
Oracle home: /u01/app/oracle/product/11.2.0/dbhome_1
Oracle user: oracle
Spfile:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Database instance: asmsingle
Disk Groups:
Services:
[[email protected] ~]$
Everything related to ASM is empty here. Start the ASM instance first, then configure the disk groups:
[[email protected] ~]$ export ORACLE_SID=+ASM
[[email protected] ~]$ sqlplus '/as sysdba'
SQL*Plus: Release 11.2.0.3.0 Production on Mon Aug 26 01:27:04 2013
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORA-01078: failure in processing system parameters
ORA-29701: unable to connect to Cluster Synchronization Service
Startup fails because it cannot connect to CSS. Check the CSS process:
SQL> !
[[email protected] ~]$ crsctl check css
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
[[email protected] ~]$ ps -ef|grep cssd
grid 13704 13680 0 01:28 pts/5 00:00:00 grep cssd
[[email protected] ~]$ ps -ef|grep has
root 11620 1 0 Aug25 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
grid 11652 1 0 Aug25 ? 00:00:21 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
grid 13706 13680 0 01:28 pts/5 00:00:00 grep has
[[email protected] ~]$ ps -ef|grep d.bin
grid 11652 1 0 Aug25 ? 00:00:21 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
grid 11834 1 0 Aug25 ? 00:00:06 /u01/app/11.2.0/grid/bin/oraagent.bin
grid 11850 1 0 Aug25 ? 00:00:00 /u01/app/11.2.0/grid/bin/evmd.bin
grid 11887 11850 0 Aug25 ? 00:00:00 /u01/app/11.2.0/grid/bin/evmlogger.bin -o /u01/app/11.2.0/grid/evm/log/evmlogger.info -l /u01/app/11.2.0/grid/evm/log/evmlogger.log
grid 13156 1 0 00:51 ? 00:00:00 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER -inherit
grid 13714 13680 0 01:28 pts/5 00:00:00 grep d.bin
[[email protected] ~]$ crsctl check has
CRS-4638: Oracle High Availability Services is online
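Note that ps -ef | grep cssd always matches the grep process itself, which makes the output easy to misread. A small sketch that avoids this by looking only at the executable column of ps -ef (the 8th field on this Linux system); daemon_in_ps is a hypothetical helper name, and the column position is an assumption about the ps -ef layout shown above.

```shell
# Read `ps -ef` output on stdin and succeed if the executable column
# (8th field) contains the given name. Lines such as "grep cssd" do not
# match, because their executable is "grep".
daemon_in_ps() {
    awk -v name="$1" '
        index($8, name) > 0 { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage would be along the lines of ps -ef | daemon_in_ps ocssd.bin || echo "cssd is not running".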
The HAS process started normally, but there is no cssd daemon process.
[[email protected] ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....ER.lsnr ora....er.type ONLINE ONLINE dg1
ora....ngle.db ora....se.type OFFLINE OFFLINE
ora.cssd ora.cssd.type OFFLINE OFFLINE
ora.diskmon ora....on.type OFFLINE OFFLINE
ora.evmd ora.evm.type ONLINE ONLINE dg1
ora.ons ora.ons.type OFFLINE OFFLINE
cssd and diskmon are both OFFLINE. Both services are managed by HAS. Check the attributes of each resource:
[[email protected] ~]$ crs_stat -p ora.cssd
NAME=ora.cssd
TYPE=ora.cssd.type
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AUTO_START=never
CHECK_INTERVAL=30
DESCRIPTION="Resource type for CSSD"
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
PLACEMENT=balanced
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=600
START_TIMEOUT=600
STOP_TIMEOUT=900
UPTIME_THRESHOLD=1m
[[email protected] ~]$ crs_stat -p ora.diskmon
NAME=ora.diskmon
TYPE=ora.diskmon.type
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AUTO_START=never
CHECK_INTERVAL=3
DESCRIPTION="Resource type for Diskmon"
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
PLACEMENT=balanced
RESTART_ATTEMPTS=10
SCRIPT_TIMEOUT=60
START_TIMEOUT=600
STOP_TIMEOUT=60
UPTIME_THRESHOLD=5s
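The attribute that matters in the two profiles above is AUTO_START. To pull a single attribute out of such KEY=VALUE output, something like the following sketch could be used; resource_attr is a hypothetical helper that only assumes the KEY=VALUE format printed by crs_stat -p above.

```shell
# Read KEY=VALUE lines (as printed by `crs_stat -p <resource>`) on stdin
# and print the value of the requested attribute, or nothing if absent.
resource_attr() {
    awk -F= -v key="$1" '
        # Print everything after "KEY=" so values containing "=" survive.
        $1 == key { print substr($0, length(key) + 2); exit }
    '
}
```

For example, crs_stat -p ora.cssd | resource_attr AUTO_START should print never for the profile shown above.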
With AUTO_START=never they do not start automatically when HAS starts, so start them manually:
[[email protected] ~]$ crsctl start res ora.cssd
CRS-2672: Attempting to start 'ora.cssd' on 'dg1'
CRS-2672: Attempting to start 'ora.diskmon' on 'dg1'
CRS-2676: Start of 'ora.diskmon' on 'dg1' succeeded
CRS-2676: Start of 'ora.cssd' on 'dg1' succeeded
Starting cssd also started diskmon. The two services are tied together: whichever is started first, the other follows. Check the HAS resources:
[[email protected] ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....ER.lsnr ora....er.type ONLINE ONLINE dg1
ora....ngle.db ora....se.type OFFLINE OFFLINE
ora.cssd ora.cssd.type ONLINE ONLINE dg1
ora.diskmon ora....on.type OFFLINE OFFLINE
ora.evmd ora.evm.type ONLINE ONLINE dg1
ora.ons ora.ons.type OFFLINE OFFLINE
But the actual state differs a little: although the output shows diskmon as succeeded, it is still OFFLINE here, and starting diskmon on its own makes no difference:
[[email protected] ~]$ crsctl start res ora.diskmon
CRS-2672: Attempting to start 'ora.diskmon' on 'dg1'
CRS-2676: Start of 'ora.diskmon' on 'dg1' succeeded
[[email protected] ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....ER.lsnr ora....er.type ONLINE ONLINE dg1
ora....ngle.db ora....se.type OFFLINE OFFLINE
ora.cssd ora.cssd.type ONLINE ONLINE dg1
ora.diskmon ora....on.type OFFLINE OFFLINE
ora.evmd ora.evm.type ONLINE ONLINE dg1
ora.ons ora.ons.type OFFLINE OFFLINE
Leave diskmon aside for now and start the ASM instance as grid:
[[email protected] ~]$ sqlplus '/as sysdba'
SQL*Plus: Release 11.2.0.3.0 Production on Mon Aug 26 01:46:13 2013
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORA-00099: warning: no parameter file specified for ASM instance
ORA-01031: insufficient privileges
SQL> exit
Disconnected
Note: you must connect to the ASM instance as sysasm. For ASM administration it has more privileges than sysdba.
[[email protected] ~]$ export ORACLE_SID=+ASM
[[email protected] ~]$ sqlplus '/as sysasm'
SQL*Plus: Release 11.2.0.3.0 Production on Mon Aug 26 01:46:57 2013
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORA-00099: warning: no parameter file specified for ASM instance
ASM instance started
Total System Global Area 283930624 bytes
Fixed Size 2227664 bytes
Variable Size 256537136 bytes
ASM Cache 25165824 bytes
ORA-15110: no diskgroups mounted
First check which disk groups are not mounted. Nothing is listed:
SQL> select name,state,type from v$asm_diskgroup;
no rows selected
I built this environment myself, so I know there are two disk groups, DATA and FRA. Mount them directly:
SQL> alter diskgroup DATA mount;
alter diskgroup DATA mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"
SQL> alter diskgroup FRA mount;
alter diskgroup FRA mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "FRA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "FRA"
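DATA and FRA are the two disk groups created earlier for the single instance, both with EXTERNAL redundancy. DATA has 4 disks and FRA has 3, each 3 GB.
The mount now fails with an insufficient number of disks, which should not happen with EXTERNAL redundancy. Check the asm_diskstring parameter: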
SQL> show parameter string
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring string
asm_diskstring is empty. No wonder the disk count is insufficient: ASM cannot find a single disk.
SQL> alter system set asm_diskstring='/dev/asm*';
System altered.
SQL> show parameter string
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring string /dev/asm*
SQL> alter system set asm_diskstring='/dev/asm*' scope=both;
alter system set asm_diskstring='/dev/asm*' scope=both
*
ERROR at line 1:
ORA-32001: write to SPFILE requested but no SPFILE is in use
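I wanted to persist the change to the spfile, but the instance was not started from an spfile, hence the error. Ignore that for now and try mounting the disk groups again: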
SQL> alter diskgroup DATA mount;
Diskgroup altered.
SQL> alter diskgroup FRA mount;
Diskgroup altered.
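Before setting asm_diskstring it can help to check from the shell how many device paths a pattern such as /dev/asm* actually matches. The sketch below is only a rough approximation: count_disk_candidates is a hypothetical helper, it assumes a POSIX sh or bash and a pattern without spaces, and it does not check ownership, permissions, or ASM disk headers the way ASM discovery does.

```shell
# Count the existing paths matching a discovery pattern such as '/dev/asm*'.
# Pass the pattern quoted; it is expanded inside the function.
count_disk_candidates() {
    count=0
    # $1 is intentionally unquoted so the shell expands the glob.
    for path in $1; do
        # An unmatched glob stays literal, so confirm the path exists.
        if [ -e "$path" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

In this environment count_disk_candidates '/dev/asm*' would be expected to report the 7 disks behind DATA and FRA, assuming no other device names match the pattern.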
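Another command could be used here instead:
SQL> alter system set asm_diskgroups=data,fra; --(whether disk groups added this way are mounted right away still needs testing)
Check the database configuration again: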
[[email protected] ~]$ srvctl config database -t asmsingle
PRKO-2002 : Invalid command line option: -t
[[email protected] ~]$ srvctl config database -d asmsingle
Database unique name: asmsingle
Database name:
Oracle home: /u01/app/oracle/product/11.2.0/dbhome_1
Oracle user: oracle
Spfile:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Database instance: asmsingle
Disk Groups:
Services:
The disk groups still do not appear in the database configuration, although the asm_diskgroups parameter already contains both:
SQL> show parameter asm_diskgroup
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups string DATA, FRA
SQL> select name,state,type from v$asm_diskgroup;
NAME STATE TYPE
------------------------------ ----------- ------
DATA MOUNTED EXTERN
FRA MOUNTED EXTERN
SQL> !
[[email protected] ~]$ asmcmd
ASMCMD> lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 4096 1048576 12288 10350 0 10350 0 N DATA/
MOUNTED EXTERN N 512 4096 1048576 9216 8984 0 8984 0 N FRA/
The disk groups are mounted normally. Since the instance was started without an spfile, I want to save the settings to an spfile:
SQL> show parameter spfile
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
spfile string
SQL> create spfile from pfile
2 ;
create spfile from pfile
*
ERROR at line 1:
ORA-29786: SIHA attribute GET failed with error [Attribute 'SPFILE' sts[200]
lsts[0]]
Probably because the ASM resource is not registered with HAS:
[[email protected] ~]$ srvctl config asm
PRCR-1001 : Resource ora.asm does not exist
[[email protected] ~]$ srvctl add asm
[[email protected] ~]$ srvctl config asm
ASM home: /u01/app/11.2.0/grid
ASM listener: LISTENER
Spfile:
ASM diskgroup discovery string: ++no-value-at-resource-creation--never-updated-through-ASM++
I am not sure why there is no value here; asm_diskstring should already be recognized.
The ASM alert log is at /u01/app/grid/diag/asm/+asm/+ASM/trace/alert_+ASM.log
It contains the following errors:
Mon Aug 26 02:25:32 2013
NOTE: failed to discover disks from gpnp profile asm diskstring
Errors in file /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_rbal_14046.trc:
ORA-29786: SIHA attribute GET failed with error [Attribute 'ASM_DISKSTRING' sts[200] lsts[0]]
[[email protected] ~]$ srvctl status asm
ASM is not running.
[[email protected] ~]$ srvctl start asm
[[email protected] ~]$ srvctl status asm
ASM is running on dg1
[[email protected] ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....ER.lsnr ora....er.type ONLINE ONLINE dg1
ora.asm ora.asm.type ONLINE ONLINE dg1
ora....ngle.db ora....se.type OFFLINE OFFLINE
ora.cssd ora.cssd.type ONLINE ONLINE dg1
ora.diskmon ora....on.type OFFLINE OFFLINE
ora.evmd ora.evm.type ONLINE ONLINE dg1
ora.ons ora.ons.type OFFLINE OFFLINE
SQL> create spfile from pfile;
create spfile from pfile
*
ERROR at line 1:
ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file
'/u01/app/11.2.0/grid/dbs/init+ASM.ora'
Indeed there is no init+ASM.ora pfile at that path, so creating the spfile from a pfile will not work.
SQL> create spfile from memory;
File created.
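The spfile can be created from memory instead. Note that this command creates the spfile in $GRID_HOME/dbs, with the full name "spfile+ASM.ora".
For a single instance that is not a big problem. In a RAC environment you must specify a path inside ASM for the spfile, for example "+DATA/asmsingle/spfile+ASM.ora".
So I will create the spfile again later to change the default location. This involves updating the GPNP profile; the rule is that the spfile location is updated to the most recently saved one.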
SQL> shutdown immediate
ASM diskgroups dismounted
ASM instance shutdown
SQL> startup
ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
ASM instance started
Total System Global Area 283930624 bytes
Fixed Size 2227664 bytes
Variable Size 256537136 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
SQL> show parameter spfile
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
spfile string /u01/app/11.2.0/grid/dbs/spfil
e+ASM.ora
Let's look at where the files are stored in ASM:
ASMCMD> ls
DATA/
FRA/
ASMCMD> cd data
ASMCMD> ls
ASM/
ASMSINGLE/
ASMCMD> cd asmsingle
ASMCMD> ls
CONTROLFILE/
DATAFILE/
ONLINELOG/
PARAMETERFILE/
TEMPFILE/
spfileasmsingle.ora
ASMCMD> cd parameterfile
ASMCMD> ls
spfile.266.824164223
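Note: in a RAC environment, the create spfile statement above should be changed to: SQL> create spfile='+DATA/asmsingle/' from memory;
Otherwise the other RAC nodes cannot access the spfile, which breaks the RAC environment. The file spfile.266.824164223 is generated automatically by the system; I have not yet looked into its exact role.
Note that the spfile RAC uses here is spfileasmsingle.ora above. Do not mix these up.
After all of the above, start the database: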
[[email protected] ~]$ sqlplus '/as sysdba'
SQL*Plus: Release 11.2.0.3.0 Production on Mon Aug 26 03:26:14 2013
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 313159680 bytes
Fixed Size 2227944 bytes
Variable Size 117440792 bytes
Database Buffers 188743680 bytes
Redo Buffers 4747264 bytes
Database mounted.
Database opened.
[[email protected] ~]$ srvctl config database -d asmsingle
Database unique name: asmsingle
Database name:
Oracle home: /u01/app/oracle/product/11.2.0/dbhome_1
Oracle user: oracle
Spfile:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Database instance: asmsingle
Disk Groups: DATA,FRA
Services:
[[email protected] ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.DATA.dg ora....up.type ONLINE ONLINE dg1
ora.FRA.dg ora....up.type ONLINE ONLINE dg1
ora....ER.lsnr ora....er.type ONLINE ONLINE dg1
ora.asm ora.asm.type ONLINE ONLINE dg1
ora....ngle.db ora....se.type ONLINE ONLINE dg1
ora.cssd ora.cssd.type ONLINE ONLINE dg1
ora.diskmon ora....on.type OFFLINE OFFLINE
ora.evmd ora.evm.type ONLINE ONLINE dg1
ora.ons ora.ons.type OFFLINE OFFLINE
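OK, problem solved.
PS: two open questions about this test
1. Create a new spfile on ASM and check whether the instance uses it by default, i.e. whether the GPNP profile takes effect (resolved)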
ASMCMD> spget
/u01/app/11.2.0/grid/dbs/spfile+ASM.ora
ASMCMD> pwd
+data/asmsingle --Note: this is the database instance's directory. My mistake; the correct path is +data/asm
ASMCMD> ls -l
Type Redund Striped Time Sys Name
Y CONTROLFILE/
Y DATAFILE/
Y ONLINELOG/
Y PARAMETERFILE/
Y TEMPFILE/
N spfileasmsingle.ora => +DATA/ASMSINGLE/PARAMETERFILE/spfile.266.824164223
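spfileasmsingle.ora -> spfile.266.824164223 (OMF format) is an alias mapping; both refer to the database instance's spfile.
ASMCMD> spset +DATA/asmsingle/spfileasmsingle.ora --I mistook the database instance's spfile for the ASM instance's spfile, so this spset points to the wrong path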
ASMCMD> spget
+DATA/asmsingle/spfileasmsingle.ora
ASMCMD> exit
[[email protected] ~]$ srvctl config asm -a
ASM home: /u01/app/11.2.0/grid
ASM listener: LISTENER
Spfile: +DATA/asmsingle/spfileasmsingle.ora
ASM diskgroup discovery string: ++no-value-at-resource-creation--never-updated-through-ASM++
ASM is enabled.
(Note: the spfile above actually belongs to the database instance, not the ASM instance, which is why the ASM diskgroup discovery string shows no value.)
SQL> show parameter diskgroup
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups string DATA, FRA
SQL> show parameter diskstring
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring string /dev/asm* --a disk discovery string is in fact set
A new spfile path set with spset takes effect immediately, provided the spfile already exists. Otherwise, first create an spfile for the ASM instance as follows:
SQL> create spfile='+DATA/asm/asmparameterfile/asmspfile.ora' from pfile='$ORACLE_HOME/dbs/init+ASM.ora';
SQL> create spfile='+DATA/asm/asmparameterfile/asmspfile.ora' from pfile; --the pfile path can be omitted; by default init+ASM.ora is looked up under the dbs directory
File created.
SQL> !
[[email protected] ~]$ asmcmd
ASMCMD> cd data/asm
ASMCMD> ls
ASMPARAMETERFILE/
ASMCMD> cd asmparameterfile
ASMCMD> ls
REGISTRY.253.824517131
asmspfile.ora
ASMCMD> spget
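+DATA/asm/asmparameterfile/asmspfile.ora
As shown, creating the spfile updates the GPNP profile directly; the ASM instance's spfile is now the newly specified +DATA path.
REGISTRY.253.824517131 is the system-generated, OMF-format spfile of the ASM instance; asmspfile.ora below it was created by the command above.
If the ASM instance is now started with a plain startup (no parameter file specified), it uses the spfile on the +DATA disk group rather than the earlier one on local disk.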
SQL> startup force
ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
ASM instance started
Total System Global Area 283930624 bytes
Fixed Size 2227664 bytes
Variable Size 256537136 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
SQL> show parameter spfile
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
spfile string +DATA/asm/asmparameterfile/asm
spfile.ora
Note: the instance now uses the path of the newly created spfile, which confirms the earlier theory.
ASMCMD> cd +data/asm/asmparameterfile
ASMCMD> pwd
+data/asm/asmparameterfile
ASMCMD> ls -l
Type Redund Striped Time Sys Name
ASMPARAMETERFILE UNPROT COARSE AUG 27 00:00:00 Y REGISTRY.253.824517131
N asmspfile.ora => +DATA/ASM/ASMPARAMETERFILE/REGISTRY.253.824517131
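When an spfile is created on an ASM disk group, the GPNP profile is updated and takes effect immediately, and the new spfile is also mapped to the system-generated one (asmspfile.ora -> REGISTRY.253.824517131), much like the database instance's spfile.
Finally, verify once more:
ASMCMD> exit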
[[email protected] ~]$ srvctl config asm -a
ASM home: /u01/app/11.2.0/grid
ASM listener: LISTENER
Spfile: +DATA/asm/asmparameterfile/asmspfile.ora
ASM diskgroup discovery string: /dev/asm*
ASM is enabled.
The ASM diskgroup discovery string is now recognized correctly, because the correct ASM instance spfile is configured.
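The difference between the two srvctl config asm outputs in this post can be detected mechanically. A minimal sketch, assuming the output format and the placeholder text exactly as they appear above; asm_diskstring_state is a hypothetical helper name.

```shell
# Read `srvctl config asm` output on stdin and report the state of the
# discovery string: "set: <value>", "unset", or "missing" if the line is absent.
asm_diskstring_state() {
    awk -F': ' '
        /^ASM diskgroup discovery string:/ {
            seen = 1
            # The "++no-value..." placeholder is the text shown in this post.
            if ($2 == "" || index($2, "++no-value-at-resource-creation") == 1)
                print "unset"
            else
                print "set: " $2
        }
        END { if (!seen) print "missing" }
    '
}
```

Piping srvctl config asm -a into asm_diskstring_state would have reported unset while the ASM resource pointed at the database spfile, and set: /dev/asm* after the fix.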
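2. Why is ora.diskmon still OFFLINE when everything else is healthy? (resolved)
The ora.diskmon resource exists specifically for Exadata environments. In non-Exadata 11.2.0.3 environments it is disabled by default, which I only learned from a forum post.
It has a dependency relationship with CSS: when CSS starts, diskmon also reports succeeded, but by default it does not actually start. This puzzled me for quite a while.
Postscript: after creating the spfile and starting the ASM instance, another error appeared: ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
That is, some parameters specified for ASM are obsolete or deprecated.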
SQL> set line 200 pages 999
SQL> col name for a30
SQL> col value for a50
SQL> select t.NAME,t.value from v$parameter t where t.ISDEPRECATED='TRUE';
NAME VALUE
------------------------------ --------------------------------------------------
lock_name_space
remote_os_authent FALSE
background_dump_dest /u01/app/grid/diag/asm/+asm/+ASM/trace
user_dump_dest /u01/app/grid/diag/asm/+asm/+ASM/trace
sql_trace FALSE
There are 5 deprecated parameters, and the ASM alert log records 2 of them:
Deprecated system parameters with specified values:
background_dump_dest
user_dump_dest
End of deprecated system parameter listing
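Both point to background_dump_dest and user_dump_dest, so commenting out these two parameters should fix the problem.
First edit the pfile and comment out the lines for these two parameters, then recreate the spfile from the pfile and start with the spfile:
SQL> startup force pfile='/u01/app/11.2.0/grid/dbs/init+ASM.ora' --start with the local pfile in which the 2 parameters are commented out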
ASM instance started
Total System Global Area 283930624 bytes
Fixed Size 2227664 bytes
Variable Size 256537136 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
SQL> create spfile='+DATA/asm/asmparameterfile/asmspfile.ora' from pfile; --recreate the spfile from the edited pfile
File created.
SQL> startup force
--after the restart, ORA-32004 is no longer reported
ASM instance started
Total System Global Area 283930624 bytes
Fixed Size 2227664 bytes
Variable Size 256537136 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
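The pfile edit described above (commenting out the two deprecated parameters) can also be scripted. The following is a sketch only: comment_out_params is a hypothetical helper, and it assumes pfile lines of the form name=value, optionally prefixed with "*." or an instance name such as "+ASM.". It prints the edited copy to stdout and leaves the original file untouched.

```shell
# Print a copy of a pfile with the named parameters commented out.
# $1 = pfile path; remaining arguments = parameter names.
comment_out_params() {
    pfile="$1"
    shift
    awk -v names="$*" '
        BEGIN {
            n = split(names, list, " ")
            for (i = 1; i <= n; i++) drop[tolower(list[i])] = 1
        }
        /^[ \t]*#/ { print; next }      # leave existing comments alone
        {
            key = $0
            sub(/=.*/, "", key)         # keep the text before "="
            gsub(/[ \t]/, "", key)      # strip whitespace
            sub(/^.*\./, "", key)       # strip "*." or "+ASM." prefix
            k = tolower(key)
            if (k in drop) print "#" $0
            else print $0
        }
    ' "$pfile"
}
```

For example, comment_out_params init+ASM.ora background_dump_dest user_dump_dest > init+ASM.ora.new writes the edited copy, which can be reviewed before replacing the original.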
With that, all the single-instance ASM failures encountered along the way are resolved. The end!
-------------------------------------------------------------------------------------------------------
Original content. Please include a link to the source when reposting. Thanks!