
Troubleshooting crs_stat taking 30 seconds for the grid user on Oracle RAC

Symptoms:

Environment: two-node Oracle RAC 11.2.0.4 on physical servers.

Symptom 1

In an Oracle RAC 11.2.0.4 cluster, certain commands run as the grid user take abnormally long to return, close to 30 seconds:

[grid@rac1 ~]$ time crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host        
----------------------------------------------------------------------
ora.DATA.dg    ora....up.type 0/5    0/     ONLINE    ONLINE    rac1        
ora.FRA.dg     ora....up.type 0/5    0/     ONLINE    ONLINE    rac1        
ora....ER.lsnr ora....er.type 0/5    0/     ONLINE    ONLINE    rac1        
ora....N1.lsnr ora....er.type 0/5    0/0    ONLINE    ONLINE    rac1        
ora.OCR.dg     ora....up.type 0/5    0/     ONLINE    ONLINE    rac1        
ora.asm        ora.asm.type   0/5    0/     ONLINE    ONLINE    rac1        
ora.*.db     ora....se.type 0/2    0/1    OFFLINE   OFFLINE               
ora.*.db     ora....se.type 0/2    0/1    ONLINE    ONLINE    rac1        
ora.cvu        ora.cvu.type   0/5    0/0    ONLINE    ONLINE    rac1        
ora.gsd        ora.gsd.type   0/5    0/     OFFLINE   OFFLINE               
ora....network ora....rk.type 0/5    0/     ONLINE    ONLINE    rac1        
ora.oc4j       ora.oc4j.type  0/1    0/2    ONLINE    ONLINE    rac1        
ora.ons        ora.ons.type   0/3    0/     ONLINE    ONLINE    rac1        
ora....SM1.asm application    0/5    0/0    ONLINE    ONLINE    rac1        
ora....C1.lsnr application    0/5    0/0    ONLINE    ONLINE    rac1        
ora.rac1.gsd   application    0/5    0/0    OFFLINE   OFFLINE               
ora.rac1.ons   application    0/3    0/0    ONLINE    ONLINE    rac1        
ora.rac1.vip   ora....t1.type 0/0    0/0    ONLINE    ONLINE    rac1        
ora....SM2.asm application    0/5    0/0    ONLINE    ONLINE    rac2        
ora....C2.lsnr application    0/5    0/0    ONLINE    ONLINE    rac2        
ora.rac2.gsd   application    0/5    0/0    OFFLINE   OFFLINE               
ora.rac2.ons   application    0/3    0/0    ONLINE    ONLINE    rac2        
ora.rac2.vip   ora....t1.type 0/0    0/0    ONLINE    ONLINE    rac2        
ora.scan1.vip  ora....ip.type 0/0    0/0    ONLINE    ONLINE    rac1        
ora.*.db     ora....se.type 0/2    0/1    ONLINE    ONLINE    rac1        

real	0m27.927s
user	0m17.025s
sys	0m10.712s

Symptom 2: Zabbix monitoring occasionally raises agent.ping timeout alerts, while the Zabbix server log shows a flood of errors:

14975:20210730:051954.737 Zabbix agent item "oracle.status_online.process" on host "ORAC-NODE2-" failed: first network error, wait for 15 seconds
14980:20210730:052049.401 resuming Zabbix agent checks on host "ORAC-NODE2": connection restored
14978:20210730:052250.841 Zabbix agent item "oracle.status_offline.process" on host "ORAC-NODE1" failed: first network error, wait for 15 seconds
14979:20210730:052300.585 Zabbix agent item "oracle.status_online.process" on host "ORAC-NODE2" failed: first network error, wait for 15 seconds
14980:20210730:052312.489 resuming Zabbix agent checks on host "ORAC-NODE1": connection restored
14980:20210730:052414.317 resuming Zabbix agent checks on host "ORAC-NODE2": connection restored
14976:20210730:052557.756 Zabbix agent item "oracle.status_online.process" on host "ORAC-NODE2" failed: first network error, wait for 15 seconds
14980:20210730:052647.517 resuming Zabbix agent checks on host "ORAC-NODE2": connection restored
14976:20210730:052750.795 Zabbix agent item "oracle.status_offline.process" on host "ORAC-NODE1" failed: first network error, wait for 15 seconds
14977:20210730:052800.474 Zabbix agent item "oracle.status_online.process" on host "ORAC-NODE2" failed: first network error, wait for 15 seconds
14980:20210730:052812.589 resuming Zabbix agent checks on host "ORAC-NODE1": connection restored
14980:20210730:052911.698 resuming Zabbix agent checks on host "ORAC-NODE2": connection restored
14978:20210730:052918.528 Zabbix agent item "oracle.status_online.process" on host "ORAC-NODE1" failed: first network error, wait for 15 seconds
14979:20210730:052956.146 Zabbix agent item "oracle.status_online.process" on host "ORAC-NODE2" failed: first network error, wait for 15 seconds
14980:20210730:053014.555 resuming Zabbix agent checks on host "ORAC-NODE1": connection restored
14980:20210730:053015.559 resuming Zabbix agent checks on host "ORAC-NODE2": connection restored
14977:20210730:053156.451 Zabbix agent item "oracle.status_offline.process" on host "ORAC-NODE2" failed: first network error, wait for 15 seconds
14980:20210730:053246.504 resuming Zabbix agent checks on host "ORAC-NODE2": connection restored
14978:20210730:053348.958 Zabbix agent item "oracle.status_online.process" on host "ORAC-NODE1" failed: first network error, wait for 15 seconds
14980:20210730:053412.577 resuming Zabbix agent checks on host "RAC-NODE1": connection restored
14975:20210730:053557.316 Zabbix agent item "oracle.status_online.process" on host "ORAC-NODE2-" failed: first network error, wait for 15 seconds
14980:20210730:053649.374 resuming Zabbix agent checks on host "ORAC-NODE2": connection restored
14962:20210730:054521.587 item "ORAC-NODE1:oracle.status_offline.process" became not supported: Timeout while executing a shell script.
14975:20210730:055156.431 Zabbix agent item "oracle.status_offline.process" on host "ORAC-NODE2" failed: first network error, wait for 15 seconds
14980:20210730:055248.282 resuming Zabbix agent checks on host "ORAC-NODE2": connection restored
14963:20210730:055347.878 item "ORAC-NODE1:oracle.status_offline.process" became supported
14975:20210730:055358.595 Zabbix agent item "oracle.status_offline.process" on host "ORAC-NODE2" failed: first network error, wait for 15 seconds
14980:20210730:055446.042 resuming Zabbix agent checks on host "ORAC-NODE2": connection restored
14965:20210730:055518.943 item "ORAC-NODE1:oracle.status_online.process" became not supported: Timeout while executing a shell script.
14964:20210730:055519.942 item "ORAC-NODE1:oracle.status_offline.process" became not supported: Timeout while executing a shell script.

 

Sometimes the monitoring graphs also break off.
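The "Timeout while executing a shell script" messages suggest these agent items shell out to OS commands. If those commands are clusterware utilities that have started taking close to 30 seconds, they will exceed the agent timeout (capped at 30 seconds in Zabbix), which would tie symptom 2 back to symptom 1. A hypothetical sketch of what such item definitions could look like; the keys are taken from the log above, but the commands are assumptions, not the actual configuration:

# /etc/zabbix/zabbix_agentd.conf (hypothetical item definitions)
# each poll runs a clusterware client command under the zabbix user
UserParameter=oracle.status_online.process,crs_stat -t | grep -c ONLINE
UserParameter=oracle.status_offline.process,crs_stat -t | grep -c OFFLINE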

Analysis:

1 Check the Oracle alert logs

[grid@rac2 trace]$ pwd

/u01/app/grid/diag/asm/+asm/+ASM2/trace

[grid@rac2 trace]$ tail -n 100 alert_+ASM2.log

[oracle@rac2 trace]$ pwd

/u01/app/oracle/diag/rdbms/*/*2/trace

[oracle@rac2 trace]$ tail -n 100 alert_*.log

No relevant errors were found in either log.
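To back up the visual check, the recent log tail can be grepped for ORA- errors (a sketch; substitute the real instance paths for the masked ones):

[grid@rac2 trace]$ tail -n 1000 alert_+ASM2.log | grep "ORA-"
[oracle@rac2 trace]$ tail -n 1000 alert_*.log | grep "ORA-"

No output from either command means no recent ORA- errors.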

2 Check server performance metrics

[root@rac2 ~]# iostat -x 1
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.76    0.00    0.53    1.22    0.00   92.49

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    19.00    0.00    6.00     0.00   200.00    33.33     0.00    0.00    0.00    0.00   0.00   0.00
up-0              0.00     0.00    2.00    1.00     2.00     1.00     1.00     0.00    0.00    0.00    0.00   0.00   0.00
up-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
up-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
up-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
up-4            237.00     0.00  562.00    1.00 203376.00     5.00   361.25     0.31    0.55    0.55    1.00   0.31  17.70
up-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
up-6              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
up-7              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
up-8              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    2.00    1.00     2.00     1.00     1.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00    2.00    1.00     2.00     1.00     1.00     0.00    0.33    0.50    0.00   0.33   0.10
sdd               0.00     0.00    2.00    1.00     2.00     1.00     1.00     0.00    0.33    0.50    0.00   0.33   0.10
sde               0.00     0.00 1035.00    3.00 385232.00    39.00   371.17     0.55    0.58    0.57    1.67   0.31  32.40
sdf               0.00     0.00 1082.00    3.00 396736.00    39.00   365.69     0.59    0.60    0.60    1.00   0.32  34.30
sdg               0.00     0.00 1049.00    1.00 384224.00    32.00   365.96     0.55    0.56    0.56    1.00   0.32  33.30
sdh               0.00     0.00    0.00    3.00     0.00    39.00    13.00     0.00    2.00    0.00    2.00   0.33   0.10
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
up-10             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
up-12           233.00     0.00  490.00    3.00 184336.00    39.00   373.99     0.27    0.56    0.55    1.67   0.32  16.00
up-14           237.00     0.00  563.00    0.00 203744.00     0.00   361.89     0.31    0.55    0.55    0.00   0.33  18.70
up-16             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
up-19             0.00     0.00    2.00    1.00     2.00     1.00     1.00     0.00    0.33    0.50    0.00   0.33   0.10
up-21           244.00     0.00  545.00    0.00 200896.00     0.00   368.62     0.33    0.60    0.60    0.00   0.34  18.40
up-23           225.00     0.00  486.00    1.00 180480.00    32.00   370.66     0.29    0.59    0.59    1.00   0.35  17.20
up-25             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
up-27             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
up-29             0.00     0.00    2.00    1.00     2.00     1.00     1.00     0.00    0.33    0.50    0.00   0.33   0.10
up-31           242.00     0.00  520.00    2.00 193360.00    34.00   370.49     0.34    0.65    0.65    1.00   0.37  19.30
up-33             0.00     0.00    0.00    3.00     0.00    39.00    13.00     0.01    2.00    0.00    2.00   2.00   0.60
up-35             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00   25.00     0.00   200.00     8.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
[root@rac2 ~]# pidstat -d 1
Linux 2.6.32-642.el6.x86_64 (rac2)     07/30/2021     _x86_64_    (32 CPU)

08:45:30 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
08:45:31 PM      7829 137292.31      0.00      0.00  oracle
08:45:31 PM     11890      2.88      0.00      0.00  ocssd.bin
08:45:31 PM     12525     30.77      0.00      0.00  oracle
08:45:31 PM     16723     46.15      0.00      0.00  oracle

08:45:31 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
08:45:32 PM      7829 122400.00      0.00      0.00  oracle
08:45:32 PM     11890    836.00      4.00      0.00  ocssd.bin
08:45:32 PM     12089      0.00     32.00      0.00  ologgerd
08:45:32 PM     12385      0.00      4.00      0.00  orarootagent.bi
08:45:32 PM     12525     48.00      0.00      0.00  oracle
08:45:32 PM     16723     32.00      0.00      0.00  oracle
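In the pidstat output, one Oracle server process (PID 7829) is reading well over 100 MB/s. A sketch for mapping that OS process back to its database session via the gv$process/gv$session join, assuming a local sysdba login (the PID value comes from the output above):

[oracle@rac2 ~]$ sqlplus -s / as sysdba <<'EOF'
-- join the OS process (spid) to its session to see what is driving the reads
SELECT s.inst_id, s.sid, s.serial#, s.username, s.program, s.sql_id
  FROM gv$session s, gv$process p
 WHERE s.paddr = p.addr
   AND p.spid  = '7829';
EOF

Note that the time output in symptom 1 (user + sys roughly equals real) shows the slow crs_stat is CPU-bound rather than waiting on disk, so this I/O is likely unrelated to the problem at hand.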

3 Generate and review an Oracle AWR report

[oracle@rac1 admin]$ ll /u01/app/oracle/product/11.2.0/db_1/rdbms/admin/awrgrpti.sql

-rw-r--r-- 1 oracle oinstall 6444 Jul 24 2011 /u01/app/oracle/product/11.2.0/db_1/rdbms/admin/awrgrpti.sql
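awrgrpti.sql produces a cluster-wide AWR report for a chosen set of instances; it runs interactively and prompts for the report format, the instance list, and the begin/end snapshot IDs:

[oracle@rac1 admin]$ sqlplus / as sysdba @?/rdbms/admin/awrgrpti.sql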

4 CRS-4700: The Cluster Time Synchronization Service is in Observer mode.

Action:

[grid@rac1 ~]$ cat /opt/synctime.sh 
#!/bin/bash
# one-shot sync: step the system clock from the (masked) NTP server,
# then write it back to the hardware clock
ntpdate ***
hwclock -w
[grid@rac1 ~]$ cluvfy comp clocksync -verbose

Verifying Clock Synchronization across the cluster nodes 

Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed

Checking if CTSS Resource is running on all nodes...
Check: CTSS Resource running on all nodes
  Node Name                             Status                  
  ------------------------------------  ------------------------
  rac1                                  passed                  
Result: CTSS resource check passed


Querying CTSS for time offset on all nodes...
Result: Query of CTSS for time offset passed

Check CTSS state started...
Check: CTSS state
  Node Name                             State                   
  ------------------------------------  ------------------------
  rac1                                  Observer                
CTSS is in Observer state. Switching over to clock synchronization checks using NTP


Starting Clock synchronization checks using Network Time Protocol(NTP)...

NTP Configuration file check started...
The NTP configuration file "/etc/ntp.conf" is available on all nodes
NTP Configuration file check passed

Checking daemon liveness...

Check: Liveness for "ntpd"
  Node Name                             Running?                
  ------------------------------------  ------------------------
  rac1                                  no                      
Result: Liveness check failed for "ntpd"
PRVF-5494 : The NTP Daemon or Service was not alive on all nodes
PRVF-5415 : Check to see if NTP daemon or service is running failed
Result: Clock synchronization check using Network Time Protocol(NTP) failed


PRVF-9652 : Cluster Time Synchronization Services check failed

Verification of Clock Synchronization across the cluster nodes was unsuccessful on all the specified nodes. 
[grid@rac1 ~]$ srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): rac2,rac1
[grid@rac1 ~]$ ssh rac2 date;date
Fri Jul 30 21:56:39 * 2021
Fri Jul 30 21:56:39 * 2021
[grid@rac1 ~]$  crsctl check ctss
CRS-4700: The Cluster Time Synchronization Service is in Observer mode.
[grid@rac2 ~]$ crsctl check ctss
CRS-4700: The Cluster Time Synchronization Service is in Observer mode.
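CTSS stays in Observer mode as long as it detects an NTP configuration on the nodes. Since ntpd is not actually running here, removing /etc/ntp.conf lets CTSS switch to Active mode and take over cluster time synchronization: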
[root@rac1 ~]# mv /etc/ntp.conf /etc/ntp.conf.bak
[grid@rac1 ~]$ crsctl check ctss
CRS-4701: The Cluster Time Synchronization Service is in Active mode.
CRS-4702: Offset (in msec): 0
## On node 2:
[root@rac2 ~]# mv /etc/ntp.conf /etc/ntp.conf.bk
[grid@rac2 ~]$ crsctl check ctss
CRS-4701: The Cluster Time Synchronization Service is in Active mode.
CRS-4702: Offset (in msec): 0
[grid@rac2 ~]$ cluvfy comp clocksync -verbose

Verifying Clock Synchronization across the cluster nodes 

Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed

Checking if CTSS Resource is running on all nodes...
Check: CTSS Resource running on all nodes
  Node Name                             Status                  
  ------------------------------------  ------------------------
  rac2                                  passed                  
Result: CTSS resource check passed


Querying CTSS for time offset on all nodes...
Result: Query of CTSS for time offset passed

Check CTSS state started...
Check: CTSS state
  Node Name                             State                   
  ------------------------------------  ------------------------
  rac2                                  Active                  
CTSS is in Active state. Proceeding with check of clock time offsets on all nodes...
Reference Time Offset Limit: 1000.0 msecs
Check: Reference Time Offset
  Node Name     Time Offset               Status                  
  ------------  ------------------------  ------------------------
  rac2          0.0                       passed                  

Time offset is within the specified limits on the following set of nodes: "[rac2]"
Result: Check of clock time offsets passed

Oracle Cluster Time Synchronization Services check passed

Verification of Clock Synchronization across the cluster nodes was successful. 

[grid@rac1 ~]$ time srvctl status asm -a    ## execution time is still unchanged
ASM is running on rac2,rac1
ASM is enabled.

real	0m32.048s
user	0m20.051s
sys	0m11.758s

5 Check the listener log

[oracle@rac2 ~]$ tail -n 100 /u01/app/grid/diag/tnslsnr/rac2/listener/alert/log.xml
 host_addr='***'>
 <txt>31-JUL-2021 10:16:55 * (CONNECT_DATA=(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=root))(service_name=***)) * (ADDRESS=(PROTOCOL=tcp)(HOST=***)(PORT=49407)) * establish * *** * 0
 </txt>
</msg>
<msg time='2021-07-31T10:17:15.788+08:00' org_id='oracle' comp_id='tnslsnr'
 type='UNKNOWN' level='16' host_id='rac2'
 host_addr='***'>
 <txt>31-JUL-2021 10:17:15 * service_update * **** * 0
 </txt>

6 Trace the command execution

[grid@rac1 ~]$ strace crs_stat -t -v

getcwd("/home/grid", 4096)              = 11
chdir("/u01/app/11.2.0/grid/log/rac1/client") = 0
getcwd("/u01/app/11.2.0/grid/log/rac1/client", 4096) = 37
chdir("/home/grid")                     = 0
stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105036.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105036.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
access("/u01/app/11.2.0/grid/log/rac1/client/clsc105036.log", F_OK) = 0
statfs("/u01/app/11.2.0/grid/log/rac1/client/clsc105036.log", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=12868767, f_bfree=5387069, f_bavail=4731709, f_files=3276800, f_ffree=2483847, f_fsid={-1532779627, -1637007972}, f_namelen=255, f_frsize=4096}) = 0
open("/u01/app/11.2.0/grid/log/rac1/client/clsc105036.log", O_RDONLY) = 3
close(3)                                = 0
getcwd("/home/grid", 4096)              = 11
chdir("/u01/app/11.2.0/grid/log/rac1/client") = 0
getcwd("/u01/app/11.2.0/grid/log/rac1/client", 4096) = 37
chdir("/home/grid")                     = 0
stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105037.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105037.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
access("/u01/app/11.2.0/grid/log/rac1/client/clsc105037.log", F_OK) = 0
statfs("/u01/app/11.2.0/grid/log/rac1/client/clsc105037.log", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=12868767, f_bfree=5387069, f_bavail=4731709, f_files=3276800, f_ffree=2483847, f_fsid={-1532779627, -1637007972}, f_namelen=255, f_frsize=4096}) = 0
open("/u01/app/11.2.0/grid/log/rac1/client/clsc105037.log", O_RDONLY) = 3
close(3)                                = 0
getcwd("/home/grid", 4096)              = 11
chdir("/u01/app/11.2.0/grid/log/rac1/client") = 0
getcwd("/u01/app/11.2.0/grid/log/rac1/client", 4096) = 37
chdir("/home/grid")                     = 0
stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105038.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105038.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
access("/u01/app/11.2.0/grid/log/rac1/client/clsc105038.log", F_OK) = 0
statfs("/u01/app/11.2.0/grid/log/rac1/client/clsc105038.log", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=12868767, f_bfree=5387069, f_bavail=4731709, f_files=3276800, f_ffree=2483847, f_fsid={-1532779627, -1637007972}, f_namelen=255, f_frsize=4096}) = 0
open("/u01/app/11.2.0/grid/log/rac1/client/clsc105038.log", O_RDONLY) = 3
close(3)                                = 0
getcwd("/home/grid", 4096)              = 11
chdir("/u01/app/11.2.0/grid/log/rac1/client") = 0
getcwd("/u01/app/11.2.0/grid/log/rac1/client", 4096) = 37
chdir("/home/grid")                     = 0
stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105039.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105039.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
access("/u01/app/11.2.0/grid/log/rac1/client/clsc105039.log", F_OK) = 0
statfs("/u01/app/11.2.0/grid/log/rac1/client/clsc105039.log", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=12868767, f_bfree=5387069, f_bavail=4731709, f_files=3276800, f_ffree=2483847, f_fsid={-1532779627, -1637007972}, f_namelen=255, f_frsize=4096}) = 0
open("/u01/app/11.2.0/grid/log/rac1/client/clsc105039.log", O_RDONLY) = 3
close(3)                                = 0
getcwd("/home/grid", 4096)              = 11
chdir("/u01/app/11.2.0/grid/log/rac1/client") = 0
getcwd("/u01/app/11.2.0/grid/log/rac1/client", 4096) = 37
chdir("/home/grid")                     = 0
stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105040.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105040.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
access("/u01/app/11.2.0/grid/log/rac1/client/clsc105040.log", F_OK) = 0
statfs("/u01/app/11.2.0/grid/log/rac1/client/clsc105040.log", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=12868767, f_bfree=5387069, f_bavail=4731709, f_files=3276800, f_ffree=2483847, f_fsid={-1532779627, -1637007972}, f_namelen=255, f_frsize=4096}) = 0
open("/u01/app/11.2.0/grid/log/rac1/client/clsc105040.log", O_RDONLY) = 3
close(3)                                = 0
getcwd("/home/grid", 4096)              = 11
chdir("/u01/app/11.2.0/grid/log/rac1/client") = 0
getcwd("/u01/app/11.2.0/grid/log/rac1/client", 4096) = 37
chdir("/home/grid")                     = 0
stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105041.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
stat("/u01/app/11.2.0/grid/log/rac1/client/clsc105041.log", {st_mode=S_IFREG|0644, st_size=262, ...}) = 0
access("/u01/app/11.2.0/grid/log/rac1/client/clsc105041.log", F_OK) = 0
statfs("/u01/app/11.2.0/grid/log/rac1/client/clsc105041.log", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=12868767, f_bfree=5387069, f_bavail=4731709, f_files=3276800, f_ffree=2483847, f_fsid={-1532779627, -1637007972}, f_namelen=255, f_frsize=4096}) = 0
open("/u01/app/11.2.0/grid/log/rac1/client/clsc105041.log", O_RDONLY) = 3
close(3)                                = 0
getcwd("/home/grid", 4096)              = 11
^Cchdir("/u01/app
[grid@rac1 ~]$ tail -n 100 /u01/app/11.2.0/grid/log/rac1/client/clsc105036.log
Oracle Database 11g Clusterware Release 11.2.0.4.0 - Production Copyright 1996, 2011 Oracle. All rights reserved.
2021-04-28 08:01:57.908: [ CRSCOMM][3588262272] NAME: `UI_DATA` length=7
2021-04-28 08:01:57.908: [ CRSCOMM][3588262272] Successfully read response

The problem is evident from the trace above: for every existing clscNNNN.log in the client log directory, crs_stat runs a chdir/stat/access/statfs/open cycle, apparently scanning for the next free client log file name.

Resolution:

The command makes an enormous number of calls such as access("/u01/app/11.2.0/grid/log/rac1/client/clsc105041.log", F_OK) = 0, one round per existing file.
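To quantify how much of the runtime those calls account for, a syscall summary can be captured with strace's counting mode (a sketch):

[grid@rac1 ~]$ strace -c -o /tmp/crs_stat_counts.txt crs_stat -t -v

The directory itself tells the story: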

[grid@rac1 client]$ ll |wc -l
576583
[grid@rac1 client]$ du -sh
2.3G    .
[grid@rac1 client]$ ll /u01/app/11.2.0/grid/log/rac1/client/clsc105037.log
-rw-r--r-- 1 zabbix zabbix 262 Apr 28 08:02 /u01/app/11.2.0/grid/log/rac1/client/clsc105037.log
[grid@rac1 client]$ ll clsc*.log|wc -l
576561
[root@rac1 client]# find -type f -mtime -1|wc -l
2328
[root@rac1 client]# ll clsc575437.log
-rw-r--r-- 1 zabbix zabbix 262 Aug  1 10:16 clsc575437.log
[root@rac1 ~]# df -i
Filesystem             Inodes  IUsed    IFree IUse% Mounted on
/dev/mapper/vgnode110102723-lv_root
                      3276800 793009  2483791   25% /
tmpfs                 1000000   1024   998976    1% /dev/shm
/dev/sda1              128016     43   127973    1% /boot
/dev/mapper/vg_node110102723-lv_home
                     13926400     95 13926305    1% /home
[root@rac1 client]# find -amin -20
./clsc576616.log
./clsc576613.log
./clsc576615.log
./clsc576610.log
./clsc576614.log
./clsc576609.log
./clsc576611.log
./clsc576612.log
[root@rac1 client]# ll -h clsc576612.log
-rw-r--r-- 1 zabbix zabbix 262 Aug  1 22:31 clsc576612.log
[root@rac1 client]# ll clsc5766*.log |wc -l
34

The directory holds over 570,000 clsc*.log files (about 2.3 GB), all owned by user and group zabbix. This points to a Zabbix monitoring item writing them, and the creation rate matches the monitoring interval: roughly one file per minute.
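To confirm which monitoring item is responsible, the agent configuration can be searched for UserParameter entries that invoke clusterware utilities (a sketch; configuration paths vary by installation):

[root@rac1 ~]# grep -R "UserParameter" /etc/zabbix/zabbix_agentd.conf /etc/zabbix/zabbix_agentd.d/ 2>/dev/null | grep -Ei "crs|srvctl|asm|ora"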

On another, healthy RAC cluster, this directory does not accumulate files like this.

[root@rac1 client]# pwd
/u01/app/11.2.0/grid/log/rac1/client
[root@rac1 client]# rm -f clsc5*.log
[root@rac1 client]# ll |wc -l
[grid@rac1 ~]$ time crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host        
----------------------------------------------------------------------
ora.DATA.dg    ora....up.type 0/5    0/     ONLINE    ONLINE    rac1        
ora.FRA.dg     ora....up.type 0/5    0/     ONLINE    ONLINE    rac1        
ora....ER.lsnr ora....er.type 0/5    0/     ONLINE    ONLINE    rac1        
ora....N1.lsnr ora....er.type 0/5    0/0    ONLINE    ONLINE    rac1        
ora.OCR.dg     ora....up.type 0/5    0/     ONLINE    ONLINE    rac1        
ora.asm        ora.asm.type   0/5    0/     ONLINE    ONLINE    rac1        
ora.***.db     ora....se.type 0/2    0/1    OFFLINE   OFFLINE               
ora.***.db     ora....se.type 0/2    0/1    ONLINE    ONLINE    rac1        
ora.cvu        ora.cvu.type   0/5    0/0    ONLINE    ONLINE    rac1        
ora.gsd        ora.gsd.type   0/5    0/     OFFLINE   OFFLINE               
ora....network ora....rk.type 0/5    0/     ONLINE    ONLINE    rac1        
ora.oc4j       ora.oc4j.type  0/1    0/2    ONLINE    ONLINE    rac1        
ora.ons        ora.ons.type   0/3    0/     ONLINE    ONLINE    rac1        
ora....SM1.asm application    0/5    0/0    ONLINE    ONLINE    rac1        
ora....C1.lsnr application    0/5    0/0    ONLINE    ONLINE    rac1        
ora.rac1.gsd   application    0/5    0/0    OFFLINE   OFFLINE               
ora.rac1.ons   application    0/3    0/0    ONLINE    ONLINE    rac1        
ora.rac1.vip   ora....t1.type 0/0    0/0    ONLINE    ONLINE    rac1        
ora....SM2.asm application    0/5    0/0    ONLINE    ONLINE    rac2        
ora....C2.lsnr application    0/5    0/0    ONLINE    ONLINE    rac2        
ora.rac2.gsd   application    0/5    0/0    OFFLINE   OFFLINE               
ora.rac2.ons   application    0/3    0/0    ONLINE    ONLINE    rac2        
ora.rac2.vip   ora....t1.type 0/0    0/0    ONLINE    ONLINE    rac2        
ora.scan1.vip  ora....ip.type 0/0    0/0    ONLINE    ONLINE    rac1        
ora.***.db     ora....se.type 0/2    0/1    ONLINE    ONLINE    rac1        

real    0m0.049s
user    0m0.014s
sys    0m0.008s

The problem is resolved for now.

Deleted the offending files: clsc*.log.

Re-running the command now takes only about 0.05 seconds (real 0m0.049s). A full timed syscall trace of the fast run was also captured for the record:

[grid@rac1 ~]$ strace -tt -T -v -o /tmp/strace_crs_20210801.log crs_stat -t -v

After re-enabling Zabbix monitoring, files are still being created at roughly one per minute, so the root cause has not been fixed, only the symptom. If no root-cause fix turns up, a scheduled task that finds and deletes this type of file will have to do for now; a sketch follows below.
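A minimal sketch of such a cleanup task, assuming files older than one day are safe to delete; the script path and retention window are arbitrary choices, not taken from the original environment:

#!/bin/bash
# /opt/clean_clsc_logs.sh (hypothetical name): delete zabbix-generated
# clusterware client logs older than one day, keeping the latest day
# of files in case they are needed for diagnosis
find /u01/app/11.2.0/grid/log/rac1/client -name 'clsc*.log' -type f -mtime +1 -delete

Scheduled via cron, for example daily at 02:00:

0 2 * * * /opt/clean_clsc_logs.sh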

Open questions:

1 Why does this particular RAC cluster's Zabbix Oracle monitoring generate so many of these files? The monitored items include 1521, ora_pmon, asm.process, session_counts, and so on.

2 A review of the Zabbix configuration found no special settings for this Oracle environment, so what makes it different?

3 Are there any unusual Oracle environment variables or parameter settings?