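RAC 10g Cluster Startup Scripts
11gR2 GI introduces an important new component, OHAS (Oracle High Availability Service), along with other related components and resources. The figure below shows the startup relationships among the GI components in 11gR2.
OHAS
OHAS is an important component newly introduced in 11gR2. With it, many aspects of Oracle's cluster management software changed, mainly in how the cluster starts up and how resources are managed.
Cluster startup in 10g
In 10g the cluster management software is CRS. From the startup perspective, a 10g cluster is started by the three entries h1, h2, and h3 in /etc/inittab, which appear at the end of the listing below. The database version is Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production.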
# cat /etc/inittab
ap::sysinit:/sbin/autopush -f /etc/iu.ap
sp::sysinit:/sbin/soconfig -f /etc/sock2path
smf::sysinit:/lib/svc/bin/svc.startd >/dev/msglog 2<>/dev/msglog </dev/console
p3:s1234:powerfail:/usr/sbin/shutdown -y -i5 -g0 >/dev/msglog 2<>/dev/msglog
h1:3:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:3:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:3:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
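Although the three scripts above are invoked at the same time, the daemons depend on one another. ocssd.bin must start and be confirmed healthy first; only then can crsd.bin start and be confirmed healthy; evmd.bin is started last.
init.cssd: starts the ocssd.bin daemon and the other CSS-level daemons, which forms the cluster.
init.crsd: starts the crsd.bin daemon and calls the racg module to start the corresponding resources, which brings up the cluster's application resources.
init.evmd: starts the evmd.bin daemon, which publishes events across the cluster nodes.
Next, let's look at each script. Only excerpts are listed, chosen to show the main functionality.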
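To make the inittab entries easier to read, here is a minimal sketch that splits one entry into its colon-separated fields (id, runlevels, action, process). The helper name describe_inittab_entry is hypothetical and is not part of Oracle's scripts; the sample entry is the h2 line from the listing above.

```shell
#!/bin/sh
# describe_inittab_entry (hypothetical helper): print the fields of one
# inittab entry of the form id:runlevels:action:process.
describe_inittab_entry() {
    printf '%s\n' "$1" | awk -F: '{
        # $4 is the command line; its first word is the script,
        # its second word is the argument passed to that script.
        split($4, cmd, " ")
        printf "id=%s runlevels=%s action=%s script=%s arg=%s\n", $1, $2, $3, cmd[1], cmd[2]
    }'
}

describe_inittab_entry 'h2:3:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null'
# prints: id=h2 runlevels=3 action=respawn script=/etc/init.d/init.cssd arg=fatal
```

The respawn action means init restarts the script whenever it exits, which is why each script stays in the foreground for the lifetime of its daemon.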
(1) init.cssd script
...............................................................................................................
ORA_CRS_HOME=/opt/oracle/product/CRS
ORACLE_USER=oracle
ORACLE_HOME=$ORA_CRS_HOME
export ORACLE_HOME
export ORA_CRS_HOME
export ORACLE_USER
# Set DISABLE_OPROCD to false. Platforms that do not ship an oprocd
# binary should override this below.
DISABLE_OPROCD=false
# Default OPROCD timeout values defined here, so that it can be
# over-ridden as needed by a platform.
# default Timout of 1000 ms and a margin of 500ms
OPROCD_DEFAULT_TIMEOUT=1000
OPROCD_DEFAULT_MARGIN=500
# default Timeout for other actions
OPROCD_CHECK_TIMEOUT=2000
OPROCD_STOP_TIMEOUT=2000
OPROCD_DEFAULT_HISTORGRAM=
# Incase /bin/hostname is not present in a particular platform, we
# may have to do something different.
HOSTN=/bin/hostname
EXPRN=/usr/bin/expr
CUT=/usr/bin/cut
AWK='/bin/awk'
ECHO='echo'
TR=/bin/tr
#solaris on amd and SPARC has issue with /bin/tr
[ 'SunOS' = `/bin/uname` ] && TR=/usr/xpg4/bin/tr
#on Linux tr is at /usr/bin/tr
[ 'Linux' = `/bin/uname` ] && TR=/usr/bin/tr
#If the hostname is an IP address, let hostname
#remain as IP address
HOST=`$HOSTN`
len1=`$EXPRN "$HOST" : '.*'`
len2=`$EXPRN match $HOST '[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*'`
# Strip off domain name in case /bin/hostname returns
# FQDN hostname
if [ $len1 != $len2 ]; then
HOST=`$ECHO $HOST | $CUT -d'.' -f1 `
fi
HOST=`$ECHO $HOST | $TR '[:upper:]' '[:lower:]'`
# Default Location for commands on most platforms
PS='/bin/ps'
# ps -e is expected to search for all processes on the box and provide
# terse binary name output so that column count does not truncate binary
# names and confuse grep.
PSE='/bin/ps -e'
PSEF='/bin/ps -ef'
HEAD='/bin/head'
GREP='/bin/grep'
KILL='/bin/kill'
KILLTERM='/bin/kill -TERM'
KILLDIE='/bin/kill -9'
KILLCHECK="/bin/kill -0 $$"
SLEEP='/bin/sleep'
NULL='/dev/null'
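...............................................................................................................
As shown, the script first defines the environment variables the cluster uses and the operating system commands it needs.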
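The host name handling in the excerpt above (keep an IP address as it is, otherwise strip the domain, then lowercase) can be sketched as a standalone function. This is an illustrative rewrite under the assumption that plain expr, cut, and tr are on the PATH; normalize_host is a hypothetical name, not something the Oracle script defines.

```shell
#!/bin/sh
# normalize_host (hypothetical helper): mirror the excerpt's logic.
normalize_host() {
    host=$1
    # Total length of the name. expr exits non-zero when the result is 0,
    # so "|| :" keeps the assignment safe under "set -e".
    len_total=$(expr "$host" : '.*' || :)
    # Length of the leading part that looks like a dotted IP address.
    len_ip=$(expr "$host" : '[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*' || :)
    # If the whole name is not an IP address, drop everything after the first dot.
    if [ "$len_total" != "$len_ip" ]; then
        host=$(printf '%s\n' "$host" | cut -d. -f1)
    fi
    printf '%s\n' "$host" | tr '[:upper:]' '[:lower:]'
}

normalize_host RAC1.Example.COM   # prints: rac1
normalize_host 192.168.1.10       # prints: 192.168.1.10
```

Comparing the two lengths is what lets the script tell an IP address apart from a fully qualified name without needing a separate pattern-matching tool.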
...............................................................................................................
PLATFORM=`$UNAME`
MAXFILE=65536
case $PLATFORM in
Linux)
LD_LIBRARY_PATH=$ORA_CRS_HOME/lib
export LD_LIBRARY_PATH
FAST_REBOOT="/sbin/reboot -n -f & $SLEEP 1 ; $ECHO b > /proc/sysrq-trigger"
HEAD='/usr/bin/head'
...............................................................................................................
HP-UX) MACH_HARDWARE=`/bin/uname -m`
...............................................................................................................
LD_LIBRARY_PATH=$ORA_CRS_HOME/lib:$NMAPIDIR_64:/usr/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH
# Presence of this file indicates that vendor clusterware is installed
SKGXNLIB=${NMAPIDIR_64}/libnmapi2.${SO_EXT}
if [ -f $SKGXNLIB ]; then
USING_VC=1
fi
...............................................................................................................
SunOS) MACH_HARDWARE=`/bin/uname -i`
ARCH=`/usr/bin/isainfo -b`
CLUSTERDIR=/opt/ORCLcluster
LD_LIBRARY_PATH=$ORA_CRS_HOME/lib:$CLUSTERDIR/lib:/usr/lib:/usr/ucblib:$LD_LIBRARY_PATH
LD_LIBRARY_PATH_64=$ORA_CRS_HOME/lib:$CLUSTERDIR/lib:/usr/lib:/usr/ucblib:$LD_LIBRARY_PATH_64
if [ "${MACH_HARDWARE}${ARCH}" = "i86pc64" ]; then
LD_LIBRARY_PATH=$ORA_CRS_HOME/lib:$CLUSTERDIR/lib:/usr/lib/amd64:/usr/ucblib/amd64:$LD_LIBRARY_PATH
LD_LIBRARY_PATH_64=$ORA_CRS_HOME/lib:$CLUSTERDIR/lib:/usr/lib/amd64:/usr/ucblib/amd64:$LD_LIBRARY_PATH_64
...............................................................................................................
As shown, the script sets the appropriate environment variables for each operating system.
...............................................................................................................
'stop')
$LOGMSG "Oracle CSSD being stopped"
# disable CSS startup until the next boot
$ID/init.cssd norun
# shutdown the OPROCD process if it is running
if [ ! -f $NOOPROCD ]; then
$OPROCD stop -t $OPROCD_STOP_TIMEOUT 2>$NULL
fi
# No steps are necessary for shutting down clsomon. It will go down
# automatically when CSS is shutdown.
# Shut down oclsvmon if it is up.
if [ ! -f $NOCLSVMON ]; then
$EVAL $FINDCLSVMON | $AWK '{ print $2 ; }' | $XARGS $KILLTERM > $NULL 2>&1
fi
# Invalidate init.cssd fatal pidfiles.
$ECHO "stopped" > $CSSFBOOT
$TOUCH $NOOPROCD
$TOUCH $NOCLSVMON
$TOUCH $NOCLSOMON
# Now tell it to shut down.
if [ -x "$CRSCTL" ]; then
$CRSCTL stop crs
fi
$ECHO "Shutdown has begun. The daemons should exit soon."
;;
'run')
# Foreground run, for single instance or single-node installs only.
# If this is used in a cluster install, RDBMS datafile corruption is
# likely.
# Run the startcheck to see whether we should continue
$ID/init.cssd startcheck
while [ "$?" != "0" ]; do
$SLEEP $RUNRECHECKTIME
$ID/init.cssd startcheck
done
cd $ORA_CRS_HOME/log/$HOST/cssd
# If there is an old corefile by such a collision prone name, then
# rename it to something safe.
if [ -f ./core ]; then
$MVF ./core "$UNIQUECORE"
fi
# Arguments. By default none.
OCSSD_ARGS=
$ORA_CRS_HOME/bin/ocssd $OCSSD_ARGS
;;
'fatal')
# This action is invoked to start the CSS daemon in cluster mode,
# and one or more of its accompanying daemons oprocd or clsvmon or clsomon
# This respawn wrapper is done in lieu of adding new entries to inittab.
# Check to see if we are supposed to run this boot.
$ID/init.cssd startcheck
while [ "$?" != "0" ]; do
$SLEEP $RUNRECHECKTIME
$ID/init.cssd startcheck
done
# See discussion in LocalFence
$EVAL $CLEANREBOOTLOCK
..........................................................................................................
$ECHO "See documentation at the top of $0 about supported commands."
exit 1;
;;
..........................................................................................................
init.cssd decides which operation to perform based on its argument. When the argument is fatal, it starts the cssd daemon and the other related daemons normally.
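Both the run and fatal branches begin with the same pattern: call startcheck, and keep retrying after a sleep until it succeeds. A minimal sketch of that retry loop follows; wait_until_ready is a hypothetical name, and the check command passed to it stands in for "$ID/init.cssd startcheck".

```shell
#!/bin/sh
# wait_until_ready (hypothetical helper): rerun a check command until it
# succeeds, sleeping the given number of seconds between attempts.
wait_until_ready() {
    recheck_seconds=$1
    shift
    until "$@"; do
        sleep "$recheck_seconds"
    done
}

# Example: the check succeeds immediately because / is always a directory.
wait_until_ready 1 test -d /
echo "start check passed, the daemon could be launched now"
```

Because the loop never gives up, the wrapper simply blocks under init until the prerequisites are met, which fits the respawn entries in /etc/inittab.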
(2) init.crsd script
ORA_CRS_HOME=/opt/oracle/product/CRS
ORACLE_HOME=$ORA_CRS_HOME
export ORA_CRS_HOME
export ORACLE_HOME
ORACLE_USER=oracle
UMASK=/bin/umask
SED=/bin/sed
CAT=/bin/cat
LOGMSG="/bin/logger -puser.err"
ECHO=/bin/echo
.............................................................
Defines the environment variables and operating system commands that crsd needs.
---------------------------------------------------------------------------------------------------------------------------
case $PLATFORM in
Linux)
SCRDIR=/etc/oracle/scls_scr/$HOST
ID=/etc/init.d
LOGGER="/usr/bin/logger"
if [ ! -f "$LOGGER" ]; then
LOGGER="/bin/logger"
fi
LOGMSG="$LOGGER -puser.err"
if [ ! -f "$UMASK" ]; then
UMASK=umask
......................................................................................................................................................
OSF1)
ID=/sbin/init.d
# No restriction in opening files on TRU64. Refer b7623099.
MAXFILE=unlimited
;;
*) /bin/echo "ERROR: Unknown Operating System"
exit -1
;;
esac
....................................................................................
Sets different environment variables for different platforms.
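The per-platform case block can be reduced to a small sketch that maps a platform name to the init script directory. Only values visible in the excerpts are used (/etc/init.d on Linux, /sbin/init.d on HP-UX and OSF1); init_dir_for_platform is a hypothetical name, and the sketch returns 1 for an unknown platform rather than calling exit as the original does.

```shell
#!/bin/sh
# init_dir_for_platform (hypothetical helper): print the init script
# directory for a platform name as reported by uname.
init_dir_for_platform() {
    case $1 in
        Linux)        echo /etc/init.d ;;
        HP-UX | OSF1) echo /sbin/init.d ;;
        *)
            echo "ERROR: Unknown Operating System" >&2
            return 1
            ;;
    esac
}

init_dir_for_platform Linux   # prints: /etc/init.d
```

Keeping the fallback branch last means any platform the script does not know about fails loudly instead of running with unset variables.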
......................................................................................................................................................
case $1 in
'home')
$ECHO $ORA_CRS_HOME
exit 0;
;;
'stop')
[ -r $PIDFILE ] && crspid=`$CAT $PIDFILE`
$LOGMSG "Oracle CRSD $crspid set to stop"
# Indicate that the next time we start up, it may be an initial startup.
$ECHO "stopped" > $CRSDBOOT
$LOGMSG "Oracle CRSD $crspid shutdown completed"
;;
'run') # foreground run out of init
.....................................................................................................................................................
$ECHO "Manual invocation of $0 is not supported."
;;
esac
....................................................................
The operation is chosen according to the argument. An argument of run means the crsd.bin daemon is started.
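The stop branch reads the daemon's PID from a pidfile only when that file is readable. The following sketch shows that idiom together with a kill -0 liveness probe of the kind the cssd excerpt defines as KILLCHECK. The helper names and the temporary pidfile are assumptions made for illustration.

```shell
#!/bin/sh
# read_pid (hypothetical helper): print the PID stored in a pidfile,
# or nothing when the file is missing or unreadable.
read_pid() {
    if [ -r "$1" ]; then
        cat "$1"
    fi
}

# pid_alive (hypothetical helper): succeed when the PID is non-empty and
# refers to a process we are allowed to signal. kill -0 sends no signal.
pid_alive() {
    [ -n "$1" ] && kill -0 "$1" 2>/dev/null
}

# Demonstration with a temporary pidfile that holds this shell's own PID.
pidfile=$(mktemp)
echo $$ > "$pidfile"
if pid_alive "$(read_pid "$pidfile")"; then
    echo "process $(read_pid "$pidfile") is running"
fi
rm -f "$pidfile"
```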
(3) init.evmd script
ORA_CRS_HOME=/opt/oracle/product/CRS
ORACLE_USER=oracle
ORACLE_HOME=$ORA_CRS_HOME
export ORACLE_HOME
export ORA_CRS_HOME
CAT=/bin/cat
RMF="/bin/rm -f"
LOGMSG="/bin/logger -puser.err"
ECHO=/bin/echo
KILL=/bin/kill
..............................................................................
Defines the environment variables and operating system commands that evmd needs.
case $PLATFORM in
Linux)
ID=/etc/init.d
LOGGER="/usr/bin/logger"
if [ ! -f "$LOGGER" ];then
LOGGER="/bin/logger"
fi
LOGMSG="$LOGGER -puser.err"
SU="/bin/su -l"
;;
HP-UX)
ID=/sbin/init.d
;;
.....................................................................................................................................................
;;
esac
.......................................................................
Sets different environment variables for different platforms.
....................................................................................................................................................
case $1 in
'home')
$ECHO $ORA_CRS_HOME
exit 0;
;;
'user')
$ECHO $ORACLE_USER
exit 0;
;;
'stop')
$LOGMSG "Oracle EVMD set to stop"
;;
'run') # foreground run out of init
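The operation is chosen according to the argument. An argument of run means the evmd.bin daemon is started.
(4) Summary
After reading init.cssd, init.crsd, and init.evmd, we can see that the three scripts share the same basic structure: they first define variables and operating system commands, then set environment variables for the specific OS platform, and finally choose an operation based on the argument passed in. This approach also creates problems for the cluster management software. If a script's content or permissions are modified for any reason, the cluster may fail to start, and the failure is hard to diagnose. Keeping all the logic in scripts also raises security concerns. For these reasons, the cluster startup method changed starting with version 11.2.0.2.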
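One way to notice the kind of script tampering described above is to record a fingerprint of each init script and compare it later. The sketch below is only an illustration and not something Oracle ships: script_fingerprint is a hypothetical helper that combines the permission string from ls with a cksum of the contents, and the demonstration uses a temporary file rather than a real init script.

```shell
#!/bin/sh
# script_fingerprint (hypothetical helper): print "<permissions> <checksum>"
# for a file so that content or permission changes can be detected.
script_fingerprint() {
    mode=$(ls -ld "$1" | awk '{print $1}')
    sum=$(cksum < "$1" | awk '{print $1}')
    printf '%s %s\n' "$mode" "$sum"
}

# Demonstration on a temporary file standing in for an init script.
demo=$(mktemp)
printf 'echo started\n' > "$demo"
chmod 755 "$demo"
baseline=$(script_fingerprint "$demo")

chmod 644 "$demo"   # simulate an accidental permission change
if [ "$(script_fingerprint "$demo")" != "$baseline" ]; then
    echo "fingerprint changed: check the script before the next restart"
fi
rm -f "$demo"
```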