'multipathd' and disk checker recognize failed disks later than Oracle ASM

Environment

  • Red Hat Enterprise Linux 5, 6
  • Oracle ASM

Issue

  • We have noticed that the multipathd daemon and the SCSI path checker, which check LUNs on an EMC Symmetrix storage box, recognize failed or unresponsive LUNs later than the Oracle RAC ASM volume manager does.

    Oracle RAC ASM seems to deactivate disks that are unresponsive for 15 seconds, whereas multipathd or the SCSI tur checker seems to recognize an unresponsive disk or SCSI path only after 60 seconds. As a result, Oracle ASM deactivates disks even though they still appear fine from the OS. This results in the following error messages in the database logs:

    WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
    WARNING: Waited 15 secs for write IO to PST disk 1 in group 8.
    WARNING: Waited 15 secs for write IO to PST disk 1 in group 8.
    Fri Oct 17 21:40:56 2014
    NOTE: process _b000_+asm1 (30427) initiating offline of disk 0.3915928412 (DATA1_0000) with mask 0x7e in group 3
    NOTE: checking PST: grp = 3
    Fri Oct 17 21:40:56 2014
    NOTE: process _b001_+asm1 (30429) initiating offline of disk 1.3915928451 (DATA2_0001) with mask 0x7e in group 8
    NOTE: checking PST: grp = 8
    
  • Is it possible to change the checking parameters for unresponsive LUNs and paths, so that unresponsive disks are recognized earlier than the Oracle RAC ASM volume manager recognizes them?

Resolution

Implement the following tuning options to reduce the time that dm-multipath and the SCSI path checker need to detect failed paths and initiate recovery:

A. Reduce the polling_interval and checker_timeout for dm-multipath

Device mapper multipath uses the following two options to query the status of sub paths to the SAN devices:

  • polling_interval: This option defines how often a path's state is checked, in seconds. For paths that are usable, the time between checks will gradually increase to (4 * polling_interval). The default value of this option is 5.

    The main functions of polling_interval are to check whether failed paths can be restored, to preemptively fail valid paths that are not currently receiving IO, and to react to configuration changes on the devices. Setting polling_interval to less than 4 seconds is generally not necessary for rapid failover, and it significantly increases system overhead and CPU load. Refer to the article "High CPU usage of multipathd with low polling_interval" for details. It is therefore suggested not to decrease polling_interval below 4.

  • checker_timeout: This option is available in Red Hat Enterprise Linux 5.5 and later. It sets the timeout, in seconds, for path checkers that issue SCSI commands with an explicit timeout. The default value is taken from /sys/block/sdx/device/timeout (60 seconds).
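
The two options can be combined into a rough upper bound on how long the path checker alone may take to notice a dead path. The sketch below is a simplified model, not multipathd's actual algorithm: it assumes the worst case is the longest gap between checks of a healthy path (4 * polling_interval) plus one checker command that runs into checker_timeout. The function name checker_detection_bound is hypothetical.

```shell
#!/bin/sh
# Rough upper bound, in seconds, on how long the path checker alone may need
# to notice a dead path. Assumed model (not taken from multipathd itself):
#   longest gap between checks of a healthy path = 4 * polling_interval
#   plus one checker command that runs into checker_timeout.
checker_detection_bound() {
    polling_interval=$1
    checker_timeout=$2
    echo $(( 4 * polling_interval + checker_timeout ))
}

checker_detection_bound 5 60    # defaults quoted above: prints 80
checker_detection_bound 4 10    # reduced values: prints 26
```

With the defaults (5 and 60) the bound is 80 seconds, well beyond the 15 seconds after which ASM deactivates a disk; with polling_interval 4 and checker_timeout 10 it drops to 26 seconds.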

    The following steps decrease these values so that dm-multipath detects IO failures and initiates recovery sooner.

  1. Set a lower SCSI timeout for the underlying sdXX devices:

    $ echo "20" > /sys/block/sdXX/device/timeout
    $ cat /sys/block/sdXX/device/timeout
    
  2. Set the following options in the defaults section of the /etc/multipath.conf file:

    polling_interval          4          #### dm-multipath checks the status of sub paths every 4 seconds
    checker_timeout          10          #### Path checker commands time out after 10 seconds
    
  3. Reload the multipath configuration with the following command:

    $ /etc/init.d/multipathd reload
    

    NOTE: Before applying any of these settings on a production system, try the changes in a test environment and confirm that no issues are observed.
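
Step 1 writes the timeout for a single sdXX device, while a multipath map usually has several sub paths. A minimal sketch of applying the same value to a list of devices follows. The function name set_scsi_timeout and the SYSFS_BLOCK override are hypothetical and exist only for illustration; the device names in the example are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: write one SCSI command timeout to several path devices.
# SYSFS_BLOCK is an illustrative override so the loop can be tried against a
# scratch directory first; it defaults to the real /sys/block.
set_scsi_timeout() {
    timeout=$1
    shift
    for dev in "$@"; do
        attr="${SYSFS_BLOCK:-/sys/block}/$dev/device/timeout"
        if [ -w "$attr" ]; then
            echo "$timeout" > "$attr"
            # Read the value back so the result is visible per device.
            echo "$dev: $(cat "$attr")"
        else
            echo "$dev: $attr is not writable, skipped" >&2
        fi
    done
}

# Example (run as root; device names are placeholders):
#   set_scsi_timeout 20 sdb sdc sdd sde
```

Values written through sysfs this way do not survive a reboot, so they need to be reapplied at boot by whatever mechanism the site already uses.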

B. Reduce the time required to complete the SCSI error handling process using the steps in the following article

  • RHEL 5 kernel-2.6.18-371.6.1.el5 and later, and RHEL 6 kernel-2.6.32-358.32.3.el6 and later, provide a couple of options to reduce the time spent at the SCSI layer on error handling when IO requests issued to a sub path fail. It is recommended to also apply the tuning options described in the following article: Limiting path failover time for SCSI devices

C. Increase the value of the _asm_hbeatiowait parameter in the Oracle configuration

  • The Oracle configuration has an _asm_hbeatiowait parameter, which is set to 15 seconds by default. If IO does not complete within this timeout, the following messages are logged:

    WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
    
  • The value of the _asm_hbeatiowait option was retrieved with the following command:

    SQL> select name,value,describe from v$asm_hidden_paras;  
    NAME                                    VALUE    DESCRIBE  
    --------------------------------------- -------- ----------------------------------------------------------------------  
    _asm_acd_chunks                         1        initial ACD chunks created  
    [...]
    _asm_global_dump_level                  267      System state dump level for ASM asserts  
    _asm_hbeatiowait                        15       number of secs to wait for PST Async Hbeat IO return       <<----------
    _asm_hbeatwaitquantum                   2        quantum used to compute time-to-wait for a PST Hbeat check  
    
  • It is also suggested to increase the _asm_hbeatiowait parameter so that Oracle ASM waits longer before logging the above messages. As this option is specific to the Oracle configuration, check with Oracle support for details on how to modify its value.
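
Before discussing a new value with Oracle support, it can help to know how often the heartbeat wait limit is actually reached and on which disks. A minimal sketch follows, assuming the ASM alert log contains the warnings in exactly the format shown above. The function name count_pst_waits is hypothetical, and the location of the alert log is site-specific.

```shell
#!/bin/sh
# Hypothetical helper: count "Waited N secs" PST warnings per group and disk.
# Reads the log files given as arguments, or stdin when none are given.
count_pst_waits() {
    awk '/WARNING: Waited [0-9]+ secs for write IO to PST disk/ {
             # Line shape: ... PST disk <disk> in group <group>.
             for (i = 1; i <= NF; i++) {
                 if ($i == "disk")  disk = $(i + 1)
                 if ($i == "group") { group = $(i + 1); sub(/\.$/, "", group) }
             }
             count[group " " disk]++
         }
         END {
             for (key in count) {
                 split(key, part, " ")
                 printf "group %s disk %s: %d warnings\n", part[1], part[2], count[key]
             }
         }' "$@" | sort
}

# Example (the path is a placeholder):
#   count_pst_waits /path/to/asm/alert.log
```

Applied to the four warnings in the Issue section, this reports two warnings for disk 0 in group 3 and two for disk 1 in group 8.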