1. 程式人生 > 實用技巧 >Troubleshooting query v$asm_disk v$asm_diskgroup hang

Troubleshooting query v$asm_disk v$asm_diskgroup hang

今天遇到了同樣的問題

接大神的總結加深下記憶原文:https://www.anbob.com/archives/5713.html

An Oracle 11.2.0.4.7 4-nodes RAC on linux env, the fourth instance has this problem just, When the monitoring tool queries the usage of ASM DISKGroup, the query of V$asm_disk and v$asm_diskgroup has not been completed. from ASM INSTANCE found that waiting for ‘enq: DD – contention’ and final blocker session is waiting for ‘GPnP Get Item’ event. but query v$asm_diskgroup_stat work fine.

# DB Instance
USERNAME           SID EVENT                MACHINE    MODULE               STATUS   LAST_CALL_ET SQL_ID          WAI_SECINW ROW_WAIT_OBJ# SQLTEXT                        BS          CH# OSUSER     HEX
----------- ---------- -------------------- ---------- -------------------- -------- ------------ --------------- ---------- ------------- ------------------------------ ---------- ---- ---------- ---------
SYS               1715 On CPU / runqueue    anbob      sqlplus              ACTIVE           2228 38z4yk2z6z3tm   4:2228                37 select * from v$asm_diskgroup  :             0 oracle       4000000
SYS               3032 On CPU / runqueue    anbob      sqlplus              ACTIVE           2962 d7r15ajvb30sp   3:2963              5161 select count(*) from  X$KFGRP  :             0 oracle       4000001
WEEJAR           387 On CPU / runqueue    anbob      SQL*Plus             ACTIVE           2987 40sc4zz430aag   3:2987            122518 select 'datapoint:' || name fr :             0 itmuser      4036f77

SQL> oradebug setorapid 89
Oracle pid: 89, Unix process pid: 14711, image: oracle@kdhd4 (TNS V1-V3)
SQL> oradebug short_stack;
ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-read()+14<-ntpfprd()+117<-nsbasic_brc()+376<-nsbrecv()+69<-nioqrc()+495<-ttcdrv()+1469<-nioqwa()+61<-upirtrc()+1378<-kpurcsc()+98<-kpuexec()+10790<-OCIStmtExecute()+39<-kfgGrpTableCbInternal()+3255<-kfgGrpTableCb()+56<-qerfxFetch()+3141<-opifch2()+2766<-kpoal8()+2833<-opiodr()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+230
SQL> oradebug short_stack;
ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-read()+14<-ntpfprd()+117<-nsbasic_brc()+376<-nsbrecv()+69<-nioqrc()+495<-ttcdrv()+1469<-nioqwa()+61<-upirtrc()+1378<-kpurcsc()+98<-kpuexec()+10790<-OCIStmtExecute()+39<-kfgGrpTableCbInternal()+3255<-kfgGrpTableCb()+56<-qerfxFetch()+3141<-opifch2()+2766<-kpoal8()+2833<-opiodr()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+230
Parsed short stack:
ksedsts()+465                    kernel service  error debug dump the call stack short stack 
ksdxfstk()+32                    kernel service  debug internal errors dump abridged os stack 
ksdxcb()+1927                    kernel service  debug internal errors ksdx callback for sosd layer signal handler 
sspuser()+112                    operating system dependent system handle SIGUSR2 for Oracle 
ntpfprd()+117                    network transport (drivers) ipc ? read 
nsbasic_brc()+376                network session/transparent network service block(?) basic block receive 
nsbrecv()+69                     network session/transparent network service block(?) receive 
nioqrc()+495                     network interface operating system dependent networking receive 
ttcdrv()+1469                    two task common called when message received 
nioqwa()+61                      network interface operating system dependent networking wait 
upirtrc()+1378                   user program interface router with callback 
kpurcsc()+98                     kernel programmatic interface user remote call with service context remote call with service context, callbacks 
kpuexec()+10790                  kernel programmatic interface user execute 
OCIStmtExecute()+39              statement execute 
kfgGrpTableCbInternal()+3255     kernel automatic storage management diskgroups  [partial hit for: kfg ] 
kfgGrpTableCb()+56               kernel automatic storage management diskgroups  [partial hit for: kfg ] 
qerfxFetch()+3141                query execute rowsource fixed table fetch 
opifch2()+2766                   oracle program interface oracle side of the fetch interface main routine 
kpoal8()+2833                    kernel programmatic interface oracle V8 bundled execution 
opiodr()+917                     oracle program interface oracle code request driver, route the current request 
ttcpip()+2183                    two task common pipe read/write 
opitsk()+1710                    oracle program interface two task function dispatcher 
opiino()+969                     oracle program interface initialize opi 
opiodr()+917                     oracle program interface oracle code request driver, route the current request 
opidrv()+570                     oracle program interface route current request driver, entry side into two task interface 
sou2o()+103                      main oracle executable entry point 
opimai_real()+133                oracle program interface main real oracle start point 

# ASM Instance

USERNAME           SID EVENT                MACHINE    MODULE               STATUS   LAST_CALL_ET SQL_ID          WAI_SECINW ROW_WAIT_OBJ# SQLTEXT                        B      CH# OSUSER     HEX
----------- ---------- -------------------- ---------- -------------------- -------- ------------ --------------- ---------- ------------- ------------------------------ ---------- ---- ---------- ---------
SYS                752 wait for unread mess anbob      oracle               ACTIVE              1                 0:0                   -1                                :0 grid
SYS                877 wait for unread mess anbob      oracle               ACTIVE              2                 0:0                   -1                                :0 oracle
SYS               1008 enq: DD - contention anbob      oracle               ACTIVE         585919 b8a2pdbq3p2mj   0:247938              -1 select grpnum_kfdsk, number_kf 4:2251    0 oracle       4000064
SYS               1258 enq: DD - contention anbob      oracle               ACTIVE         626533 gztn2jj9jtjnu   0:626532              -1 select name_kfgrp, number_kfgr 4:2251    0 oracle       400572f
SYS                883 enq: DD - contention anbob      oracle               ACTIVE         627267 88jqaz641rxf6   0:419928              -1 select name_kfgrp, number_kfgr 4:2251    0 oracle       403280f
SYS                758 enq: DD - contention anbob      oracle               ACTIVE         627292 88jqaz641rxf6   0:627292              -1 select name_kfgrp, number_kfgr 4:2251    0 itmuser      403280e
SYS                633 enq: DD - contention anbob      oracle               ACTIVE         627490 88jqaz641rxf6   0:349040              -1 select name_kfgrp, number_kfgr 4:2251    0 oracle       403280d
SYS                508 enq: DD - contention anbob      oracle               ACTIVE         627510 88jqaz641rxf6   0:627510              -1 select name_kfgrp, number_kfgr 4:2251    0 oracle       403280c
SYS                383 enq: DD - contention anbob      oracle               ACTIVE         627541 88jqaz641rxf6   0:550178              -1 select name_kfgrp, number_kfgr 4:2251    0 oracle       403280b

SQL> @sed dd
Show wait event descriptions matching %dd%..

EVENT# EVENT_NAME               WAIT_CLASS           PARAMETER1                PARAMETER2                PARAMETER3                ENQUEUE_NAME                   REQ_REASON                       REQ_DESCRIPTION
------ ------------------------ -------------------- ------------------------- ------------------------- ------------------------- ------------------------------ -------------------------------- -------------------------------------------------
 1265 enq: DD - contention     Other                name|mode                 disk group                type                      ASM Local Disk Group           contention                       Synchronizes local accesses to ASM disk groups
...


SQL> select program,command,status,last_call_et,wait_class,event,blocking_session,state from v$session where sid=2251;

PROGRAM                    COMMAND STATUS   LAST_CALL_ET WAIT_CLASS      EVENT                BLOCKING_SESSION STATE
----------------------- ---------- -------- ------------ --------------- -------------------- ---------------- -------------------
oracle@anbob (RBAL)              0 ACTIVE      107650374 Other           GPnP Get Item                         WAITING

grid@anbob:/home/grid>asmcmd
ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU   Total_MB   Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576      10240      9751                0            9751              0             Y  CRSDG/
MOUNTED  EXTERN  N         512   4096  1048576  100663296      2375                0            2375              0             N  DATADG/
MOUNTED  EXTERN  N         512   4096  1048576   31457280    228642                0          228642              0             N  DATADG2/
MOUNTED  EXTERN  N         512   4096  1048576   25165824  16809264                0        16809264              0             N  DATADG3/
ASMCMD> lsop
Group_Name  Dsk_Num  State  Power  EST_WORK  EST_RATE  EST_TIME  
ASMCMD> 


SQL> select pid,spid from v$process where addr='00000000A1B3C2A0';

       PID SPID
---------- ------------------------
        18 28092

SQL> oradebug setorapid 18
Oracle pid: 18, Unix process pid: 28092, image: oracle@anbob (RBAL)
SQL> oradebug setorapid 18
Oracle pid: 18, Unix process pid: 28092, image: oracle@anbob (RBAL)
SQL> oradebug short_stack;
ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-__poll()+47<-sgipcwWaitHelper()+5002<-sgipcwWait()+718<-gipcWaitOsd()+387<-gipcInternalWait()+14323<-gipcWaitF()+3225<-gipcInternalRecvSync()+8749<-gipcRecvSyncF()+3334<-clsgpnpm_gipcGets()+297<-clsgpnpm_receiveMsg()+414<-clsgpnpm_exchange()+1059<-clsgpnp_profileCallUrlInt()+3568<-clsgpnp_getProfileEx()+325<-clsgpnp_dbmsGetItem()+263<-kggpnpAttrGet()+926<-kfdParseProfileString()+924<-kfdDiscoverShallow()+2023<-kfgbDriver()+1768<-ksbabs()+771<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+230
SQL> oradebug short_stack;
ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-__poll()+47<-sgipcwWaitHelper()+5002<-sgipcwWait()+718<-gipcWaitOsd()+387<-gipcInternalWait()+14323<-gipcWaitF()+3225<-gipcInternalRecvSync()+8749<-gipcRecvSyncF()+3334<-clsgpnpm_gipcGets()+297<-clsgpnpm_receiveMsg()+414<-clsgpnpm_exchange()+1059<-clsgpnp_profileCallUrlInt()+3568<-clsgpnp_getProfileEx()+325<-clsgpnp_dbmsGetItem()+263<-kggpnpAttrGet()+926<-kfdParseProfileString()+924<-kfdDiscoverShallow()+2023<-kfgbDriver()+1768<-ksbabs()+771<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+230


Parsed short stack:
ksedsts()+465                        kernel service  error debug dump the call stack short stack 
ksdxfstk()+32                        kernel service  debug internal errors dump abridged os stack 
ksdxcb()+1927                        kernel service  debug internal errors ksdx callback for sosd layer signal handler 
sspuser()+112                        operating system dependent system handle SIGUSR2 for Oracle 
__sighandler()                        (?)  [partial hit for:  ] 
__poll()+47                           (?)  [partial hit for:  ] 
sgipcwWaitHelper()+5002              operating system dependent grid inter process calls  [partial hit for: sgipc ] 
sgipcwWait()+718                     operating system dependent grid inter process calls  [partial hit for: sgipc ] 
gipcWaitOsd()+387                    grid inter process calls  [partial hit for: gipc ] 
gipcInternalWait()+14323             grid inter process calls  [partial hit for: gipc ] 
gipcWaitF()+3225                     grid inter process calls  [partial hit for: gipc ] 
gipcInternalRecvSync()+8749          grid inter process calls  [partial hit for: gipc ] 
gipcRecvSyncF()+3334                 grid inter process calls  [partial hit for: gipc ] 
clsgpnpm_gipcGets()+297              clusterware integration grid plug and play  [partial hit for: clsgpnp ] 
clsgpnpm_receiveMsg()+414            clusterware integration grid plug and play  [partial hit for: clsgpnp ] 
clsgpnpm_exchange()+1059             clusterware integration grid plug and play  [partial hit for: clsgpnp ] 
clsgpnp_profileCallUrlInt()+3568     clusterware integration grid plug and play  [partial hit for: clsgpnp ] 
clsgpnp_getProfileEx()+325           clusterware integration grid plug and play  [partial hit for: clsgpnp ] 
clsgpnp_dbmsGetItem()+263            clusterware integration grid plug and play  [partial hit for: clsgpnp ] 
kggpnpAttrGet()+926                  kernel generic generic routines  [partial hit for: kgg ] 
kfdParseProfileString()+924          kernel automatic storage management disk  [partial hit for: kfd ] 
kfdDiscoverShallow()+2023            kernel automatic storage management disk  [partial hit for: kfd ] 
kfgbDriver()+1768                    kernel automatic storage management diskgroups background  [partial hit for: kfgb ] 
ksbabs()+771                         kernel service  background processes action based server 
ksbrdp()+1045                        kernel service  background processes run a detached background process 
opirip()+623                         oracle program interface run independent process. invoked from special function sou2o direct call in opimai of detached process invocation. initializes fixed-PGA in RAM and SQL layers. 

SQL> oradebug setmypid
Statement processed.
SQL> oradebug unlimit
Statement processed.
SQL> oradebug -g all hanganalyze 3
Hang Analysis in /oracle/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_diag_32198.trc


===============================================================================

Chains most likely to have caused the hang:
 [a] Chain 1 Signature: 'GPnP Get Item'<='rdbms ipc reply'<='enq: DD - contention'
     Chain 1 Signature Hash: 0xec6b634f
 [b] Chain 2 Signature: 'GPnP Get Item'<='rdbms ipc reply'<='enq: DD - contention'
     Chain 2 Signature Hash: 0xec6b634f
 [c] Chain 3 Signature: 'GPnP Get Item'<='rdbms ipc reply'<='enq: DD - contention' 
Chain 3 Signature Hash: 0xec6b634f
=============================================================================== 

SQL> show parameter string

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string      /dev/*-disk*

grid@anbob:/home/grid>kfod asm_diskstring='/dev/*-disk*' disks=all
--------------------------------------------------------------------------------
 Disk          Size Path                                     User     Group   
================================================================================
   1:    1048576 Mb /dev/asm-disk001                         grid     asmadmin
   2:    1048576 Mb /dev/asm-disk002                         grid     asmadmin
   3:    1048576 Mb /dev/asm-disk003                         grid     asmadmin
   4:    1048576 Mb /dev/asm-disk004                         grid     asmadmin
   5:    1048576 Mb /dev/asm-disk005                         grid     asmadmin
   6:    1048576 Mb /dev/asm-disk006                         grid     asmadmin
...

Cause
This is caused by non-published bug:12398300 which is a duplicate of bug:12356910 (also non-published).

Solution
Killing the ora.gpnpd on the node when ASM is blocked in the gpnp wait, permits to avoid to stop the ASM instance.

grid@anbob:/home/grid>ps -ef|grep gpnp
grid     16952     1  0  2016 ?        09:44:39 /oracle/app/11.2.0.4/grid/bin/gpnpd.bin
grid     32187 31703  0 18:04 pts/1    00:00:00 grep gpnp

grid@anbob:/home/grid>kill -HUP 16952

grid@anbob:/home/grid>ps -ef|grep gpnp
grid      6147     1  2 18:04 ?        00:00:00 /oracle/app/11.2.0.4/grid/bin/gpnpd.bin
grid     11573 31703  0 18:05 pts/1    00:00:00 grep gpnp

Note:
kill -HUP The HUP signal is sent to a process when its controlling terminal is closed. It was originally designed to notify the process of a serial line drop (HUP stands for “Hang Up”). In modern systems, this signal usually means that the controlling pseudo or virtual terminal has been closed.

It is recommended to query usage of ASM DISK[GROUP] using theV$ASM_DISK[GROUP]_STATview, the VIEW data is in memory without scanning disk, and has the same column as v$asm_diskgroup.