1. 程式人生 > 實用技巧 >OEM報錯"Failed to connect to ASM instance. The connection is closed: The connection is closed"處理

OEM報錯"Failed to connect to ASM instance. The connection is closed: The connection is closed"處理

OEM報錯"Failed to connect to ASM instance. The connection is closed: The connection is closed"處理

前言

秉著出現的報錯就追根問底的原則,這次剛部署不久的OEM 13C有出現如下報警:

Host=xxxxx1 
Target type=Automatic Storage Management 
Target name=+ASM1_xxxxx1 
Categories=Availability 
Message=Failed to connect to ASM instance. The connection is closed: The connection is closed 
Severity
=Fatal Event reported time=Aug 9, 2020 10:08:18 AM CST Operating System=Linux Platform=x86_64 Associated Incident Id=88 Associated Incident Status=New Associated Incident Owner= Associated Incident Acknowledged By Owner=No Associated Incident Priority=None Associated Incident Escalation Level=0 Event Type
=Target Availability Event name=Status Availability status=Down Root Cause Analysis Status=Neither Cause Nor Symptom Causal analysis result=Neither a cause nor a symptom Rule Name=Incident management rule set for all targets,Incident creation rule for a Target Down availability status Rule Owner
=System Generated Update Details: Failed to connect to ASM instance. The connection is closed: The connection is closed Incident created by rule (Name = Incident management rule set for all targets, Incident creation rule for a Target Down availability status [System generated rule]).

照例問度娘是沒問出啥來......

MOS上搜的話就有結果了:

EM 13c: Enterprise Manager 13.2 Cloud Control ASM Incident Reported with Message=Failed To Connect To ASM Instance. The Connection Is Closed: The Connection Is Closed (Doc ID 2251591.1)

文件中提到,這個一個BUG。

驗證

文件中提到,在gcagent.log日誌會有如下報錯(示例):

[65336:GC.Executor.126 (osm_instance:+ASM__host.company.com:ofs_performance_metrics) (osm_instance:+ASM__host.company.com:ofs_performance_metrics:Instance_Volume_Performance)] ERROR - The connection is closed: The connection is closed
java.sql.SQLException: The connection is closed: The connection is closed
at oracle.ucp.util.UCPErrorHandler.newSQLException(UCPErrorHandler.java:464)
at oracle.ucp.util.UCPErrorHandler.newSQLException(UCPErrorHandler.java:448)
at oracle.ucp.jdbc.proxy.JDBCConnectionProxyFactory.invoke(JDBCConnectionProxyFactory.java:307)
at oracle.ucp.jdbc.proxy.ConnectionProxyFactory.invoke(ConnectionProxyFactory.java:50)
at com.sun.proxy.$Proxy27.prepareCall(Unknown Source)

該日誌位於客戶端如下位置:

[oracle@xxxxx1 log]$ ll $AGENT_HOME/sysman/log/gcagent.log
-rw-r----- 1 oracle oinstall 960998 Aug 10 14:20 /u01/app/oem13c/agent/agent_inst/sysman/log/gcagent.log

檢視日誌可以發現,確實存在相似的日誌資訊:

2020-08-09 10:08:18,645 [99899:GC.Executor.23807 (osm_instance:+ASM1_xxxxx1:Response) (osm_instance:+ASM1_xxxxx1:Response:Response)] ERROR - The connection is closed: The connection is closed
java.sql.SQLException: The connection is closed: The connection is closed
        at oracle.ucp.util.UCPErrorHandler.newSQLException(UCPErrorHandler.java:464)
        at oracle.ucp.util.UCPErrorHandler.newSQLException(UCPErrorHandler.java:448)
        at oracle.ucp.jdbc.proxy.JDBCConnectionProxyFactory.invoke(JDBCConnectionProxyFactory.java:307)
        at oracle.ucp.jdbc.proxy.ConnectionProxyFactory.invoke(ConnectionProxyFactory.java:50)
        at com.sun.proxy.$Proxy31.prepareCall(Unknown Source)

文件中還提到,

此外,如果在EM代理程序上進行了執行緒轉儲,則會觀察到大量的"Timer-"執行緒(它們隨著時間的推移而增加,並且從未關閉/結束)。例:

jstack <Agent PID>|grep "Timer-"|wc -l
983

注意:根據經驗,"Timer-"執行緒的數量應隨時間保持恆定,少於50,但這是一個近似值,因為它取決於目標數量,監視設定,執行的作業以及許多其他因素。關鍵因素是隨著時間(天)的增加,此類執行緒的數量將保持恆定。

問題節點再次驗證如下:

[oracle@xxxxx1 ~]# ps -ef | grep java
...省略部分內容...
oracle    7687  7601  3 Aug05 ?        03:34:38 /u01/app/oem13c/agent/agent_13.3.0.0.0/oracle_common/jdk/bin/java -Xmx128M -XX:MaxPermSize=128M -server -Djava.security.egd=file:///dev/./urandom -Dsun.lang.ClassLoader.allowArraySyntax=true -XX:-UseLargePages -XX:+UseLinuxPosixThreadCPUClocks -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+UseCompressedOops -Dwatchdog.pid=7601 -cp /u01/app/oem13c/agent/agent_13.3.0.0.0/jdbc/lib/ojdbc7.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/ucp/lib/ucp.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/oracle_common/modules/jsch-0.1.53.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/oracle_common/modules/com.oracle.http_client.http_client_12.1.3.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/oracle_common/modules/oracle.xdk_12.1.3/xmlparserv2.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/oracle_common/modules/oracle.dms_12.1.3/dms.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/oracle_common/modules/oracle.odl_12.1.3/ojdl.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/oracle_common/modules/oracle.odl_12.1.3/ojdl2.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/lib/optic.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/sysman/jlib/log4j-core.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/jlib/gcagent_core.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/sysman/jlib/emagentSDK-intg.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/sysman/jlib/emagentSDK.jar oracle.sysman.gcagent.tmmain.TMMain   
[oracle@xxxxx1 ~]$ /u01/app/oem13c/agent/agent_13.3.0.0.0/oracle_common/jdk/bin/jstack 7687 | grep "Timer-" | wc -l
83

另外一個沒報警的節點情況:

[oracle@xxxxx2 ~]$ ps -ef | grep 13.3
oracle   31845 31753  0 Aug05 ?        00:26:26 /u01/app/oem13c/agent/agent_13.3.0.0.0/oracle_common/jdk/bin/java -Xmx128M -XX:MaxPermSize=128M -server -Djava.security.egd=file:///dev/./urandom -Dsun.lang.ClassLoader.allowArraySyntax=true -XX:-UseLargePages -XX:+UseLinuxPosixThreadCPUClocks -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+UseCompressedOops -Dwatchdog.pid=31753 -cp /u01/app/oem13c/agent/agent_13.3.0.0.0/jdbc/lib/ojdbc7.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/ucp/lib/ucp.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/oracle_common/modules/jsch-0.1.53.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/oracle_common/modules/com.oracle.http_client.http_client_12.1.3.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/oracle_common/modules/oracle.xdk_12.1.3/xmlparserv2.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/oracle_common/modules/oracle.dms_12.1.3/dms.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/oracle_common/modules/oracle.odl_12.1.3/ojdl.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/oracle_common/modules/oracle.odl_12.1.3/ojdl2.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/lib/optic.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/sysman/jlib/log4j-core.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/jlib/gcagent_core.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/sysman/jlib/emagentSDK-intg.jar:/u01/app/oem13c/agent/agent_13.3.0.0.0/sysman/jlib/emagentSDK.jar oracle.sysman.gcagent.tmmain.TMMain
[oracle@xxxxx2 ~]$ /u01/app/oem13c/agent/agent_13.3.0.0.0/oracle_common/jdk/bin/jstack 31845 | grep "Timer-" | wc -l
7

文件中給出經驗值為<50,我在問題節點可以看出"Timer-"執行緒的數量為83,作為一個參考值,說明該節點很有可能出現了BUG。