'multipathd' and disk checker recognize failed disks later than Oracle ASM
Environment
- Red Hat Enterprise Linux 5, 6
- Oracle ASM
Issue
-
We have observed that the multipathd daemon and the SCSI path checker, which check LUNs on an EMC Symmetrix storage box, recognize failed or unresponsive LUNs later than the Oracle RAC ASM volume manager. Oracle RAC ASM seems to deactivate disks that are unresponsive for 15 seconds, but the multipathd tur checker seems to recognize an unresponsive disk or SCSI path only after 60 seconds. As a result, Oracle ASM deactivates disks even though they still appear fine from the OS. This results in the following error messages in the database logs:
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
    WARNING: Waited 15 secs for write IO to PST disk 1 in group 8.
    WARNING: Waited 15 secs for write IO to PST disk 1 in group 8.
    Fri Oct 17 21:40:56 2014
    NOTE: process _b000_+asm1 (30427) initiating offline of disk 0.3915928412 (DATA1_0000) with mask 0x7e in group 3
    NOTE: checking PST: grp = 3
    Fri Oct 17 21:40:56 2014
    NOTE: process _b001_+asm1 (30429) initiating offline of disk 1.3915928451 (DATA2_0001) with mask 0x7e in group 8
    NOTE: checking PST: grp = 8
-
Is there any way to change the checking parameters for unresponsive LUNs and paths, so that the OS recognizes unresponsive disks earlier than the Oracle RAC ASM volume manager?
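When triaging an incident like the one described above, it can help to see at a glance which disks and disk groups ASM complained about. The following is a minimal sketch and not part of the original procedure: the function name summarize_pst_waits is hypothetical, and it assumes the alert log contains WARNING lines in exactly the format quoted above, one message per line.

```shell
#!/bin/sh
# Hypothetical helper: count ASM "Waited N secs for write IO to PST disk"
# warnings per disk group and disk. Reads an ASM alert log on stdin.
# Assumes the message format quoted in this article, one message per line.
summarize_pst_waits() {
    # Reduce each matching line to "group G disk D", then count duplicates.
    sed -n 's/.*Waited [0-9][0-9]* secs for write IO to PST disk \([0-9][0-9]*\) in group \([0-9][0-9]*\).*/group \2 disk \1/p' |
        LC_ALL=C sort |
        uniq -c |
        awk '{ print $2, $3, $4, $5 ": " $1 }'
}

# Example (the log path is a placeholder):
#   summarize_pst_waits < /path/to/asm_alert.log
```

For the four warnings quoted above, this would print "group 3 disk 0: 2" and "group 8 disk 1: 2".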
Resolution
Try the following tuning options to reduce the time that dm-multipath and the SCSI path checker need to detect failed paths and to initiate a recovery action:
A. Reduce the polling_interval and checker_timeout for dm-multipath
Device mapper multipath uses the following two options when querying the status of the sub paths to SAN devices:
-
polling_interval: This option defines how often, in seconds, a path's state is checked. For paths that are usable, the time between checks gradually increases to (4 * polling_interval). The default value of this option is 5.
The main functions of polling_interval are to check whether failed paths can be restored, to preemptively fail valid paths that are not currently receiving I/O, and to react to configuration changes on the devices. Setting polling_interval to a value below 4 seconds is generally not necessary for rapid failover, and it significantly increases system overhead and CPU load. Refer to the article "High CPU usage of multipathd with low polling_interval" for details. It is therefore suggested to lower polling_interval no further than 4.
-
checker_timeout: This option is available in Red Hat Enterprise Linux 5.5 and later. It sets the timeout, in seconds, for path checkers that issue SCSI commands with an explicit timeout. The default value is taken from /sys/block/sdx/device/timeout (60 sec).
The following steps can be used to lower the values of the above options, which reduces the time dm-multipath needs to detect an I/O failure and to initiate a recovery action.
-
Set a lower SCSI timeout for the underlying sdXX devices:
    $ echo "20" > /sys/block/sdXX/device/timeout
    $ cat /sys/block/sdXX/device/timeout
-
Set the following options in the defaults section of the /etc/multipath.conf file:
    polling_interval 4     #### With this value, dm-multipath checks the status of sub paths every 4 seconds.
    checker_timeout 10     #### Sets 10s as the timeout value for the path checker.
-
Reload the multipath configuration using the following command:
    $ /etc/init.d/multipathd reload
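The sysfs write in the first step applies to one sdXX device at a time, and a value written to sysfs does not persist across a reboot, so in practice it has to be repeated for every path device. A minimal sketch of such a loop follows. The function name set_scsi_timeout and the SYSFS_ROOT variable are hypothetical (the latter exists only so the loop can be rehearsed against a scratch directory), and sdb and sdc are placeholder device names; substitute the sub paths reported by "multipath -ll".

```shell
#!/bin/sh
# Hypothetical helper: write the same SCSI command timeout to several sd devices.
# Usage: set_scsi_timeout SECONDS DEVICE...
# SYSFS_ROOT is an assumption of this sketch; it defaults to /sys and can be
# pointed at a scratch directory for a dry run.
set_scsi_timeout() {
    if [ "$#" -lt 2 ]; then
        echo "usage: set_scsi_timeout SECONDS DEVICE..." >&2
        return 1
    fi
    seconds="$1"
    shift
    for dev in "$@"; do
        file="${SYSFS_ROOT:-/sys}/block/$dev/device/timeout"
        if [ -w "$file" ]; then
            # Write the new timeout, then read it back to confirm.
            echo "$seconds" > "$file"
            echo "$dev: timeout is now $(cat "$file")"
        else
            echo "$dev: cannot write $file, skipped" >&2
        fi
    done
}

# Example (run as root; sdb and sdc are placeholders):
#   set_scsi_timeout 20 sdb sdc
```

Reading the value back after each write mirrors the echo/cat pair in the first step, so a device that silently rejects the value is easy to spot.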
NOTE: Before applying any of these settings on a production system, make sure the changes have been tried in a test environment and that no issues were observed during the tests.
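To judge whether the values above are low enough, a rough upper bound on checker-based detection time can be worked out. This is only a back-of-the-envelope sketch under stated assumptions: a path fails right after a successful check, the check interval has already grown to its maximum of 4 * polling_interval, and the checker command then runs into checker_timeout. It ignores paths that are failed earlier because the SCSI layer returned an I/O error. The function name checker_detection_bound is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rough worst-case time (seconds) for the path checker
# alone to notice a dead path.
# Assumption: bound = longest gap between checks + checker timeout
#                   = 4 * polling_interval + checker_timeout
checker_detection_bound() {
    polling_interval="$1"
    checker_timeout="$2"
    echo $(( 4 * polling_interval + checker_timeout ))
}

checker_detection_bound 5 60   # defaults quoted in this article: prints 80
checker_detection_bound 4 10   # values suggested above: prints 26
```

Under these assumptions the suggested values still leave a bound of about 26 seconds, which is above the 15 seconds that ASM waits. That may be one reason the options in sections B and C are suggested as well.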
B. Reduce the time required to complete the SCSI error handling process using the steps in the following article
- RHEL 5 kernel-2.6.18-371.6.1.el5 and RHEL 6 kernel-2.6.32-358.32.3.el6 and later provide a couple of options that reduce the time spent in the SCSI layer on error handling when I/O requests issued to a sub path fail. It is recommended to also apply the tuning options described in the following article, which covers them in detail: Limiting path failover time for SCSI devices
C. Increase the value of the _asm_hbeatiowait parameter in the Oracle configuration
-
There is an _asm_hbeatiowait parameter in the Oracle configuration, which is set to 15 seconds by default. The timeout set with the _asm_hbeatiowait option causes the following messages to be logged if the I/O is not completed within 15 seconds:
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
    WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
-
The value of the above _asm_hbeatiowait option was retrieved with the following command:
    SQL> select name,value,describe from v$asm_hidden_paras;

    NAME                                    VALUE    DESCRIBE
    --------------------------------------- -------- ----------------------------------------------------------------------
    _asm_acd_chunks                         1        initial ACD chunks created
    [...]
    _asm_global_dump_level                  267      System state dump level for ASM asserts
    _asm_hbeatiowait                        15       number of secs to wait for PST Async Hbeat IO return  <<----------
    _asm_hbeatwaitquantum                   2        quantum used to compute time-to-wait for a PST Hbeat check
-
It is also suggested to increase the _asm_hbeatiowait parameter in the Oracle configuration so that ASM waits longer before logging the above messages. As this option is specific to the Oracle configuration, please check with Oracle support for details on how to modify its value.