Troubleshooting query v$asm_disk v$asm_diskgroup hang
今天遇到了同樣的問題
接大神的總結加深下記憶原文:https://www.anbob.com/archives/5713.html
An Oracle 11.2.0.4.7 4-nodes RAC on linux env, the fourth instance has this problem just, When the monitoring tool queries the usage of ASM DISKGroup, the query of V$asm_disk and v$asm_diskgroup has not been completed. from ASM INSTANCE found that waiting for ‘enq: DD – contention’ and final blocker session is waiting for ‘GPnP Get Item’ event. but query v$asm_diskgroup_stat work fine.
USERNAME SID EVENT MACHINE MODULE STATUS LAST_CALL_ET SQL_ID WAI_SECINW ROW_WAIT_OBJ# SQLTEXT BS CH# OSUSER HEX ----------- ---------- -------------------- ---------- -------------------- -------- ------------ --------------- ---------- ------------- ------------------------------ ---------- ---- ---------- --------- SYS 1715 On CPU / runqueue anbob sqlplus ACTIVE 2228 38z4yk2z6z3tm 4:2228 37 select * from v$asm_diskgroup : 0 oracle 4000000 SYS 3032 On CPU / runqueue anbob sqlplus ACTIVE 2962 d7r15ajvb30sp 3:2963 5161 select count(*) from X$KFGRP : 0 oracle 4000001 WEEJAR 387 On CPU / runqueue anbob SQL*Plus ACTIVE 2987 40sc4zz430aag 3:2987 122518 select 'datapoint:' || name fr : 0 itmuser 4036f77 SQL> oradebug setorapid 89 Oracle pid: 89, Unix process pid: 14711, image: oracle@kdhd4 (TNS V1-V3) SQL> oradebug short_stack; ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-read()+14<-ntpfprd()+117<-nsbasic_brc()+376<-nsbrecv()+69<-nioqrc()+495<-ttcdrv()+1469<-nioqwa()+61<-upirtrc()+1378<-kpurcsc()+98<-kpuexec()+10790<-OCIStmtExecute()+39<-kfgGrpTableCbInternal()+3255<-kfgGrpTableCb()+56<-qerfxFetch()+3141<-opifch2()+2766<-kpoal8()+2833<-opiodr()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+230 SQL> oradebug short_stack; ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-read()+14<-ntpfprd()+117<-nsbasic_brc()+376<-nsbrecv()+69<-nioqrc()+495<-ttcdrv()+1469<-nioqwa()+61<-upirtrc()+1378<-kpurcsc()+98<-kpuexec()+10790<-OCIStmtExecute()+39<-kfgGrpTableCbInternal()+3255<-kfgGrpTableCb()+56<-qerfxFetch()+3141<-opifch2()+2766<-kpoal8()+2833<-opiodr()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+230 Parsed short stack: ksedsts()+465 kernel service error debug dump the call stack short stack ksdxfstk()+32 kernel service debug internal errors dump abridged os stack ksdxcb()+1927 kernel service debug internal errors ksdx callback for sosd layer signal handler sspuser()+112 operating system dependent system handle SIGUSR2 for Oracle ntpfprd()+117 network transport (drivers) ipc ? read nsbasic_brc()+376 network session/transparent network service block(?) basic block receive nsbrecv()+69 network session/transparent network service block(?) receive nioqrc()+495 network interface operating system dependent networking receive ttcdrv()+1469 two task common called when message received nioqwa()+61 network interface operating system dependent networking wait upirtrc()+1378 user program interface router with callback kpurcsc()+98 kernel programmatic interface user remote call with service context remote call with service context, callbacks kpuexec()+10790 kernel programmatic interface user execute OCIStmtExecute()+39 statement execute kfgGrpTableCbInternal()+3255 kernel automatic storage management diskgroups [partial hit for: kfg ] kfgGrpTableCb()+56 kernel automatic storage management diskgroups [partial hit for: kfg ] qerfxFetch()+3141 query execute rowsource fixed table fetch opifch2()+2766 oracle program interface oracle side of the fetch interface main routine kpoal8()+2833 kernel programmatic interface oracle V8 bundled execution opiodr()+917 oracle program interface oracle code request driver, route the current request ttcpip()+2183 two task common pipe read/write opitsk()+1710 oracle program interface two task function dispatcher opiino()+969 oracle program interface initialize opi opiodr()+917 oracle program interface oracle code request driver, route the current request opidrv()+570 oracle program interface route current request driver, entry side into two task interface sou2o()+103 main oracle executable entry point opimai_real()+133 oracle program interface main real oracle start point
# ASM Instance
USERNAME SID EVENT MACHINE MODULE STATUS LAST_CALL_ET SQL_ID WAI_SECINW ROW_WAIT_OBJ# SQLTEXT B CH# OSUSER HEX ----------- ---------- -------------------- ---------- -------------------- -------- ------------ --------------- ---------- ------------- ------------------------------ ---------- ---- ---------- --------- SYS 752 wait for unread mess anbob oracle ACTIVE 1 0:0 -1 :0 grid SYS 877 wait for unread mess anbob oracle ACTIVE 2 0:0 -1 :0 oracle SYS 1008 enq: DD - contention anbob oracle ACTIVE 585919 b8a2pdbq3p2mj 0:247938 -1 select grpnum_kfdsk, number_kf 4:2251 0 oracle 4000064 SYS 1258 enq: DD - contention anbob oracle ACTIVE 626533 gztn2jj9jtjnu 0:626532 -1 select name_kfgrp, number_kfgr 4:2251 0 oracle 400572f SYS 883 enq: DD - contention anbob oracle ACTIVE 627267 88jqaz641rxf6 0:419928 -1 select name_kfgrp, number_kfgr 4:2251 0 oracle 403280f SYS 758 enq: DD - contention anbob oracle ACTIVE 627292 88jqaz641rxf6 0:627292 -1 select name_kfgrp, number_kfgr 4:2251 0 itmuser 403280e SYS 633 enq: DD - contention anbob oracle ACTIVE 627490 88jqaz641rxf6 0:349040 -1 select name_kfgrp, number_kfgr 4:2251 0 oracle 403280d SYS 508 enq: DD - contention anbob oracle ACTIVE 627510 88jqaz641rxf6 0:627510 -1 select name_kfgrp, number_kfgr 4:2251 0 oracle 403280c SYS 383 enq: DD - contention anbob oracle ACTIVE 627541 88jqaz641rxf6 0:550178 -1 select name_kfgrp, number_kfgr 4:2251 0 oracle 403280b SQL> @sed dd Show wait event descriptions matching %dd%.. EVENT# EVENT_NAME WAIT_CLASS PARAMETER1 PARAMETER2 PARAMETER3 ENQUEUE_NAME REQ_REASON REQ_DESCRIPTION ------ ------------------------ -------------------- ------------------------- ------------------------- ------------------------- ------------------------------ -------------------------------- ------------------------------------------------- 1265 enq: DD - contention Other name|mode disk group type ASM Local Disk Group contention Synchronizes local accesses to ASM disk groups ... SQL> select program,command,status,last_call_et,wait_class,event,blocking_session,state from v$session where sid=2251; PROGRAM COMMAND STATUS LAST_CALL_ET WAIT_CLASS EVENT BLOCKING_SESSION STATE ----------------------- ---------- -------- ------------ --------------- -------------------- ---------------- ------------------- oracle@anbob (RBAL) 0 ACTIVE 107650374 Other GPnP Get Item WAITING grid@anbob:/home/grid>asmcmd ASMCMD> lsdg State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name MOUNTED EXTERN N 512 4096 1048576 10240 9751 0 9751 0 Y CRSDG/ MOUNTED EXTERN N 512 4096 1048576 100663296 2375 0 2375 0 N DATADG/ MOUNTED EXTERN N 512 4096 1048576 31457280 228642 0 228642 0 N DATADG2/ MOUNTED EXTERN N 512 4096 1048576 25165824 16809264 0 16809264 0 N DATADG3/ ASMCMD> lsop Group_Name Dsk_Num State Power EST_WORK EST_RATE EST_TIME ASMCMD> SQL> select pid,spid from v$process where addr='00000000A1B3C2A0'; PID SPID ---------- ------------------------ 18 28092 SQL> oradebug setorapid 18 Oracle pid: 18, Unix process pid: 28092, image: oracle@anbob (RBAL) SQL> oradebug setorapid 18 Oracle pid: 18, Unix process pid: 28092, image: oracle@anbob (RBAL) SQL> oradebug short_stack; ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-__poll()+47<-sgipcwWaitHelper()+5002<-sgipcwWait()+718<-gipcWaitOsd()+387<-gipcInternalWait()+14323<-gipcWaitF()+3225<-gipcInternalRecvSync()+8749<-gipcRecvSyncF()+3334<-clsgpnpm_gipcGets()+297<-clsgpnpm_receiveMsg()+414<-clsgpnpm_exchange()+1059<-clsgpnp_profileCallUrlInt()+3568<-clsgpnp_getProfileEx()+325<-clsgpnp_dbmsGetItem()+263<-kggpnpAttrGet()+926<-kfdParseProfileString()+924<-kfdDiscoverShallow()+2023<-kfgbDriver()+1768<-ksbabs()+771<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+230 SQL> oradebug short_stack; ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-__poll()+47<-sgipcwWaitHelper()+5002<-sgipcwWait()+718<-gipcWaitOsd()+387<-gipcInternalWait()+14323<-gipcWaitF()+3225<-gipcInternalRecvSync()+8749<-gipcRecvSyncF()+3334<-clsgpnpm_gipcGets()+297<-clsgpnpm_receiveMsg()+414<-clsgpnpm_exchange()+1059<-clsgpnp_profileCallUrlInt()+3568<-clsgpnp_getProfileEx()+325<-clsgpnp_dbmsGetItem()+263<-kggpnpAttrGet()+926<-kfdParseProfileString()+924<-kfdDiscoverShallow()+2023<-kfgbDriver()+1768<-ksbabs()+771<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+230 Parsed short stack: ksedsts()+465 kernel service error debug dump the call stack short stack ksdxfstk()+32 kernel service debug internal errors dump abridged os stack ksdxcb()+1927 kernel service debug internal errors ksdx callback for sosd layer signal handler sspuser()+112 operating system dependent system handle SIGUSR2 for Oracle __sighandler() (?) [partial hit for: ] __poll()+47 (?) [partial hit for: ] sgipcwWaitHelper()+5002 operating system dependent grid inter process calls [partial hit for: sgipc ] sgipcwWait()+718 operating system dependent grid inter process calls [partial hit for: sgipc ] gipcWaitOsd()+387 grid inter process calls [partial hit for: gipc ] gipcInternalWait()+14323 grid inter process calls [partial hit for: gipc ] gipcWaitF()+3225 grid inter process calls [partial hit for: gipc ] gipcInternalRecvSync()+8749 grid inter process calls [partial hit for: gipc ] gipcRecvSyncF()+3334 grid inter process calls [partial hit for: gipc ] clsgpnpm_gipcGets()+297 clusterware integration grid plug and play [partial hit for: clsgpnp ] clsgpnpm_receiveMsg()+414 clusterware integration grid plug and play [partial hit for: clsgpnp ] clsgpnpm_exchange()+1059 clusterware integration grid plug and play [partial hit for: clsgpnp ] clsgpnp_profileCallUrlInt()+3568 clusterware integration grid plug and play [partial hit for: clsgpnp ] clsgpnp_getProfileEx()+325 clusterware integration grid plug and play [partial hit for: clsgpnp ] clsgpnp_dbmsGetItem()+263 clusterware integration grid plug and play [partial hit for: clsgpnp ] kggpnpAttrGet()+926 kernel generic generic routines [partial hit for: kgg ] kfdParseProfileString()+924 kernel automatic storage management disk [partial hit for: kfd ] kfdDiscoverShallow()+2023 kernel automatic storage management disk [partial hit for: kfd ] kfgbDriver()+1768 kernel automatic storage management diskgroups background [partial hit for: kfgb ] ksbabs()+771 kernel service background processes action based server ksbrdp()+1045 kernel service background processes run a detached background process opirip()+623 oracle program interface run independent process. invoked from special function sou2o direct call in opimai of detached process invocation. initializes fixed-PGA in RAM and SQL layers. SQL> oradebug setmypid Statement processed. SQL> oradebug unlimit Statement processed. SQL> oradebug -g all hanganalyze 3 Hang Analysis in /oracle/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_diag_32198.trc =============================================================================== Chains most likely to have caused the hang: [a] Chain 1 Signature: 'GPnP Get Item'<='rdbms ipc reply'<='enq: DD - contention' Chain 1 Signature Hash: 0xec6b634f [b] Chain 2 Signature: 'GPnP Get Item'<='rdbms ipc reply'<='enq: DD - contention' Chain 2 Signature Hash: 0xec6b634f [c] Chain 3 Signature: 'GPnP Get Item'<='rdbms ipc reply'<='enq: DD - contention' Chain 3 Signature Hash: 0xec6b634f =============================================================================== SQL> show parameter string NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ asm_diskstring string /dev/*-disk* grid@anbob:/home/grid>kfod asm_diskstring='/dev/*-disk*' disks=all -------------------------------------------------------------------------------- Disk Size Path User Group ================================================================================ 1: 1048576 Mb /dev/asm-disk001 grid asmadmin 2: 1048576 Mb /dev/asm-disk002 grid asmadmin 3: 1048576 Mb /dev/asm-disk003 grid asmadmin 4: 1048576 Mb /dev/asm-disk004 grid asmadmin 5: 1048576 Mb /dev/asm-disk005 grid asmadmin 6: 1048576 Mb /dev/asm-disk006 grid asmadmin ...
Cause
This is caused by non-published bug:12398300 which is a duplicate of bug:12356910 (also non-published).
Solution
Killing the ora.gpnpd on the node when ASM is blocked in the gpnp wait, permits to avoid to stop the ASM instance.
grid@anbob:/home/grid>ps -ef|grep gpnp grid 16952 1 0 2016 ? 09:44:39 /oracle/app/11.2.0.4/grid/bin/gpnpd.bin grid 32187 31703 0 18:04 pts/1 00:00:00 grep gpnp grid@anbob:/home/grid>kill -HUP 16952 grid@anbob:/home/grid>ps -ef|grep gpnp grid 6147 1 2 18:04 ? 00:00:00 /oracle/app/11.2.0.4/grid/bin/gpnpd.bin grid 11573 31703 0 18:05 pts/1 00:00:00 grep gpnp
Note:
kill -HUP The HUP signal is sent to a process when its controlling terminal is closed. It was originally designed to notify the process of a serial line drop (HUP stands for “Hang Up”). In modern systems, this signal usually means that the controlling pseudo or virtual terminal has been closed.
It is recommended to query usage of ASM DISK[GROUP] using theV$ASM_DISK[GROUP]_STATview, the VIEW data is in memory without scanning disk, and has the same column as v$asm_diskgroup.