Oracle歸檔日誌所在目錄時間不對&&Oracle叢集日誌時間顯示錯誤
Oracle歸檔日誌所在目錄時間不對&&Oracle叢集日誌時間顯示錯誤
前言
這個問題在18年的時候遇到了,基本不注意並且叢集或者資料庫執行正常是很難注意到的。
忘記當時怎麼發現的了,最近拿出來重現一下問題。
具體描述一下問題:
比如當前時間是2020年07月03日0點,資料庫對redo log發起歸檔。
在OMF下,歸檔會按照日期進行分類歸檔。
但是發起歸檔之後,新的歸檔檔案會放在的20200702的目錄中。
另外,db和asm和叢集的警告日誌的時間也和作業系統的date時間不一致。
現象
版本:11.2.0.4.0
在我的測試環境中,關閉所有節點的叢集之後,更改時間:
[root@rac1 ~]# date -s "2020-07-03 00:00:00" Fri Jul 3 00:00:00 CST 2020
然後,啟動所有節點的叢集,可以依次觀察
叢集的警告日誌,grid使用者下,$ORACLE_HOME/log/rac1/alertrac1.log
2020-07-02 09:00:26.667: [ohasd(24600)]CRS-2112:The OLR service started on node rac1. 2020-07-02 09:00:26.675: [ohasd(24600)]CRS-1301:Oracle High Availability Service started on node rac1. 2020-07-02 09:00:26.676: [ohasd(24600)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred 2020-07-02 09:00:30.080: [/u01/app/11.2.0/grid/bin/orarootagent.bin(24643)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running). 2020-07-02 09:00:34.563: [gpnpd(24737)]CRS-2328:GPNPD started on node rac1. 2020-07-02 09:00:36.992: [cssd(24806)]CRS-1713:CSSD daemon is started in clustered mode 2020-07-02 09:00:38.798: [ohasd(24600)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE 2020-07-02 09:01:01.319: [cssd(24806)]CRS-1707:Lease acquisition for node rac1 number 1 completed 2020-07-02 09:01:02.612: [cssd(24806)]CRS-1605:CSSD voting file is online: /dev/mapper/asm-ocr02; details in /u01/app/11.2.0/grid/log/rac1/cssd/ocssd.log. 2020-07-02 09:01:02.620: [cssd(24806)]CRS-1605:CSSD voting file is online: /dev/mapper/asm-ocr03; details in /u01/app/11.2.0/grid/log/rac1/cssd/ocssd.log. 2020-07-02 09:01:02.623: [cssd(24806)]CRS-1605:CSSD voting file is online: /dev/mapper/asm-ocr01; details in /u01/app/11.2.0/grid/log/rac1/cssd/ocssd.log. 2020-07-02 09:01:15.917: [cssd(24806)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 . 2020-07-02 09:01:18.061: [ctssd(25418)]CRS-2403:The Cluster Time Synchronization Service on host rac1 is in observer mode. 2020-07-02 09:01:18.421: [ctssd(25418)]CRS-2407:The new Cluster Time Synchronization Service reference node is host rac1. 2020-07-02 09:01:18.422: [ctssd(25418)]CRS-2401:The Cluster Time Synchronization Service started on host rac1. 2020-07-02 09:01:20.023: [ohasd(24600)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE [client(25475)]CRS-10001:02-Jul-20 09:01 ACFS-9391: Checking for existing ADVM/ACFS installation. [client(25480)]CRS-10001:02-Jul-20 09:01 ACFS-9392: Validating ADVM/ACFS installation files for operating system. [client(25482)]CRS-10001:02-Jul-20 09:01 ACFS-9393: Verifying ASM Administrator setup. [client(25485)]CRS-10001:02-Jul-20 09:01 ACFS-9308: Loading installed ADVM/ACFS drivers. [client(25488)]CRS-10001:02-Jul-20 09:01 ACFS-9154: Loading 'oracleoks.ko' driver. [client(25521)]CRS-10001:02-Jul-20 09:01 ACFS-9154: Loading 'oracleadvm.ko' driver. [client(25548)]CRS-10001:02-Jul-20 09:01 ACFS-9154: Loading 'oracleacfs.ko' driver.
........
ASM例項警告日誌,
Thu Jul 02 09:01:26 2020 NOTE: No asm libraries found in the system NOTE: No asm libraries found in the system MEMORY_TARGET defaulting to 1128267776. * instance_number obtained from CSS = 1, checking for the existence of node 0... * node 0 does not exist. instance_number = 1 Starting ORACLE instance (normal) WARNING: You are trying to use the MEMORY_TARGET feature. This feature requires the /dev/shm file system to be mounted for at least 1140850688 bytes. /dev/shm is either not mounted or is mounted with available space less than this size. Please fix this so that MEMORY_TARGET can work as expected. Current available is 976789504 and used is 73728 bytes. Ensure that the mount point is /dev/shm for this directory. LICENSE_MAX_SESSION = 0 LICENSE_SESSIONS_WARNING = 0 Initial number of CPU is 4 Number of processor cores in the system is 4 Number of processor sockets in the system is 2 Private Interface 'eth1:1' configured from GPnP for use as a private interconnect. [name='eth1:1', type=1, ip=169.254.146.181, mac=00-0c-29-60-50-8b, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62] Public Interface 'eth0' configured from GPnP for use as a public interface. [name='eth0', type=1, ip=192.168.180.100, mac=00-0c-29-60-50-81, net=192.168.180.0/24, mask=255.255.255.0, use=public/1] Picked latch-free SCN scheme 3 Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/11.2.0/grid/dbs/arch Autotune of undo retention is turned on. LICENSE_MAX_USERS = 0 SYS auditing is disabled NOTE: Volume support enabled Starting up: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Real Application Clusters and Automatic Storage Management options. ORACLE_HOME = /u01/app/11.2.0/grid System name: Linux Node name: rac1.example.com Release: 2.6.32-573.el6.x86_64 Version: #1 SMP Wed Jul 1 18:23:37 EDT 2015 Machine: x86_64 VM name: VMWare Version: 6 Using parameter settings in server-side spfile +OCR/rac-cluster/asmparameterfile/registry.253.1034884609
DB的警告日誌,
Thu Jul 02 09:02:47 2020 Adjusting the default value of parameter parallel_max_servers from 160 to 120 due to the value of parameter processes (150) Starting ORACLE instance (normal) LICENSE_MAX_SESSION = 0 LICENSE_SESSIONS_WARNING = 0 Initial number of CPU is 4 Number of processor cores in the system is 4 Number of processor sockets in the system is 2 Private Interface 'eth1:1' configured from GPnP for use as a private interconnect. [name='eth1:1', type=1, ip=169.254.146.181, mac=00-0c-29-60-50-8b, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62] Public Interface 'eth0' configured from GPnP for use as a public interface. [name='eth0', type=1, ip=192.168.180.100, mac=00-0c-29-60-50-81, net=192.168.180.0/24, mask=255.255.255.0, use=public/1] Public Interface 'eth0:2' configured from GPnP for use as a public interface. [name='eth0:2', type=1, ip=192.168.180.111, mac=00-0c-29-60-50-81, net=192.168.180.0/24, mask=255.255.255.0, use=public/1] Picked latch-free SCN scheme 3 Autotune of undo retention is turned on. LICENSE_MAX_USERS = 0 SYS auditing is disabled Starting up: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, Real Application Clusters, OLAP, Data Mining and Real Application Testing options. ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1 System name: Linux Node name: rac1.example.com Release: 2.6.32-573.el6.x86_64
可以看到,日誌開頭全都是2020年07月02日的9點初。
我們再看看歸檔的情況,
在此之前,先刪除之前的所有歸檔便於觀察。
[oracle@rac2 ~]$ rman target / Recovery Manager: Release 11.2.0.4.0 - Production on Fri Jul 3 00:20:01 2020 Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved. connected to target database: ZKM (DBID=4210149549) RMAN> delete force noprompt archivelog all; using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=200 instance=zkm2 device type=DISK List of Archived Log Copies for database with db_unique_name ZKM ===================================================================== Key Thrd Seq S Low Time ------- ---- ------- - --------- 91 1 56 A 02-JUL-20 Name: +ARCH/zkm/archivelog/2020_07_03/thread_1_seq_56.303.1044695873 92 2 21 A 02-JUL-20 Name: +ARCH/zkm/archivelog/2020_07_02/thread_2_seq_21.299.1044695875 deleted archived log archived log file name=+ARCH/zkm/archivelog/2020_07_03/thread_1_seq_56.303.1044695873 RECID=91 STAMP=1044749873 deleted archived log archived log file name=+ARCH/zkm/archivelog/2020_07_02/thread_2_seq_21.299.1044695875 RECID=92 STAMP=1044695874 Deleted 2 objects
其中一個節點發出命令歸檔redo log,
SYS@zkm1> alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss'; Session altered. SYS@zkm1> SYS@zkm1> select sysdate from dual; SYSDATE ------------------------ 2020-07-03 00:20:27 SYS@zkm1> alter system archive log current; System altered.
觀察警告日誌,顯示的時間是“Thu Jul 02 09:20:32 2020”。
Thu Jul 02 09:20:32 2020 Thread 1 advanced to log sequence 58 (LGWR switch) Current log# 2 seq# 58 mem# 0: +DATA/zkm/onlinelog/group_2.258.1034889845 Current log# 2 seq# 58 mem# 1: +ARCH/zkm/onlinelog/group_2.258.1034889845 Thu Jul 02 09:20:32 2020 Archived Log entry 93 added for thread 1 sequence 57 ID 0xfaf1eead dest 1:
進去ASM磁碟檢視檔案情況,
ASMCMD> ls -l Type Redund Striped Time Sys Name Y 2020_07_02/ ASMCMD> ls -l * Type Redund Striped Time Sys Name ARCHIVELOG UNPROT COARSE JUL 02 09:00:00 Y thread_1_seq_57.299.1044696033 ARCHIVELOG UNPROT COARSE JUL 02 09:00:00 Y thread_2_seq_22.303.1044696033 ASMCMD> pwd +arch/zkm/ARCHIVELOG
至此,問題完全重現成功。
原因
不知道各位發現沒有,實際上最開始啟動叢集(crsctl start crs)那一瞬間(大概是2020-07-03 00:00:00多,剛改完時間不到1分鐘),
叢集的警告日誌最開始出現的時間為“2020-07-02 09:00:26.667:”,基本相差15小時。
因此很容易也可以猜出,是由於時區的原因。
系統的時區為:
[grid@rac1 ~]$ cat /etc/sysconfig/clock # The time zone of the system is defined by the contents of /etc/localtime. # This file is only for evaluation by system-config-date, do not rely on its # contents elsewhere. ZONE="Asia/Shanghai"
那麼,叢集應該也有自己的配置檔案對應的時區,也就是叢集不依賴OS的時區(-_-||)。
mos可以找到這麼一篇文章:How To Change Timezone for Grid Infrastructure (Doc ID 1209444.1)
因此可以發現Grid Infrastructure的時區配置檔案位置在$GRID_HOME/crs/install/s_crsconfig_<nodename>_env.txt。
所以,現象的模擬其實,就是我把該檔案的時區改成如下了:
[grid@rac1 ~]$ cd $ORACLE_HOME/crs/install [grid@rac1 install]$ ls cmdllroot.sh crsconfig_lib.pm crsdelete.pm inittab installRemove.excl oracle-ohasd.conf paramfile.crs rootcrs.pl s_crsconfig_defs s_crsconfig_rac1_env.txt.bak crsconfig_addparams.sbs crsconfig_params crspatch.pm install.excl onsconfig oracle-ohasd.service ParentDirPerm_rac1.txt roothas.pl s_crsconfig_lib.pm tfa_setup.sh crsconfig_fileperms.excl crsconfig_params.sbs hasdconfig.pl install.incl oraacfs.pm oracss.pm preupdate.sh rootofs.sh s_crsconfig_rac1_env.txt [grid@rac1 install]$ cat s_crsconfig_rac1_env.txt ### This file can be used to modify the NLS_LANG environment variable, which determines the charset to be used for messages. ### For example, a new charset can be configured by setting NLS_LANG=JAPANESE_JAPAN.UTF8 ### Do not modify this file except to change NLS_LANG, or under the direction of Oracle Support Services TZ=US/Pacific NLS_LANG=AMERICAN_AMERICA.AL32UTF8 TNS_ADMIN= ORACLE_BASE=
網上找找看是不是剛好和上海時間相差15個小時。
https://time.is/PT
得證!!!
&n