First, this article, although a little step-by-step and so perhaps simplistic at times, is excellent. This article by Dell is also worth reading, as it covers a number of terms and concepts that may not be familiar to non-storage administrators.
On a couple of earlier posts about dm_multipath (1, 2), ‘paul’ commented: “I see some errors in your configuration. The problem is that you are using readsector0 for path checking instead of RDAC and a wrong hwhandler.” He said that following the examples here worked in his situation, but didn’t elaborate on what exactly his situation was. That article/benchmark says:
After trying the array successfully with Fedora Core 5, CentOS5 (which is RHEL 5 64bit) and exploring all the above issues, in the end I settled on SuSE SLES-10-SP1 x86_64 (Suse 10 service pack 1 for 64bit) and used it as-is, there was no need to install anything other than the Java “SMdevices/SMmonitor/SMagent” stuff on the resource CD.
It’s worth noting that those are all RPM-based distributions. That’s no surprise, since Dell appears to support them in some way, although as usual, YMMV with any enterprise support. ‘paul’ didn’t say why configuring dm_multipath this way is a configuration error, so I set out to read more. It’s important to distinguish between the MD3000 in that article and the MD3000i, which is what I have.
The MD3000 is traditional Direct-Attached-Storage (DAS) and uses SAS 8470 cables to connect to SAS HBAs in the host. In High-Availability (HA) mode, you put two HBAs in each of two hosts and connect each host’s two HBAs to the two different controllers in the MD3000.
The MD3000i is an iSCSI Storage-Area-Network (SAN) and uses regular gigabit Ethernet to connect to up to 16 hosts. It’s recommended to use two separate switches and two network cards per host, creating multiple physical paths to each controller on the MD3000i.
For a while my brain had trouble separating DRAC (Dell Remote Access Controller), which is Dell’s IPMI-like management kit, from RDAC (Redundant Disk Array Controller). The benchmark article mentions that the MD3000i is an awful lot like an IBM DS4100. Dell likes rebranding gear, so maybe the MD3000i is just an IBM N3700 or something (I don’t have enough interest to poke through the data sheets). I mention it because RDAC is a technology in a lot of IBM’s products, so you can sometimes find more information by searching for ‘IBM RDAC’ than for Dell.
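On the host side, the two-switch layout amounts to discovering the array through a portal on each network and then logging in to every discovered node record, so that open-iscsi holds one session per physical path. A minimal sketch of that, assuming open-iscsi’s iscsiadm; the two portal addresses are hypothetical placeholders, and by default the script only prints the commands (set DRY_RUN=0 to actually run them).

```shell
#!/bin/bash
# Sketch: discover and log in to the MD3000i through both host NICs.
# PORTALS holds HYPOTHETICAL addresses, one per switch/subnet; replace them
# with the iSCSI host-port IPs of your own array.
PORTALS="192.168.130.101 192.168.131.101"

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

login_all_portals() {
    local portal
    for portal in $PORTALS; do
        # SendTargets discovery records the target's portals in the node database
        run iscsiadm -m discovery -t sendtargets -p "$portal"
    done
    # Log in to every discovered node record; each record becomes its own session
    run iscsiadm -m node -l
}

login_all_portals
```

Each session exposes its own sdX device for the same virtual disk, which is what dm_multipath then groups into a single map.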
When I boot up, I only have two paths to a virtual disk:
# multipath -d -ll
sdb: checker msg is “readsector0 checker reports path is down”
sdc: checker msg is “readsector0 checker reports path is down”
36001c23000d59fc600000284478bcdcadm-0 DELL,MD3000i
[size=558G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
\_ 2:0:0:0 sdd 8:48 [active][ready]
\_ round-robin 0 [prio=1][enabled]
\_ 3:0:0:0 sde 8:64 [active][ready]
Both of those paths go through the active controller. If I switch the preferred path in MDSM (Modular Disk Storage Manager), the disk fails:
# ls
ls: reading directory .: Input/output error
# multipath -d -ll
sdb: checker msg is “readsector0 checker reports path is down”
sdc: checker msg is “readsector0 checker reports path is down”
sdd: checker msg is “readsector0 checker reports path is down”
sde: checker msg is “readsector0 checker reports path is down”
36001c23000d59fc600000284478bcdcadm-0 DELL,MD3000i
[size=558G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][enabled]
\_ 2:0:0:0 sdd 8:48 [failed][faulty]
\_ round-robin 0 [prio=0][enabled]
\_ 3:0:0:0 sde 8:64 [failed][faulty]
Running multipath once picks up the other paths:
# multipath
error calling out /sbin/scsi_id -g -u -s /block/sda
sdd: checker msg is “readsector0 checker reports path is down”
sde: checker msg is “readsector0 checker reports path is down”
sdd: checker msg is “readsector0 checker reports path is down”
sde: checker msg is “readsector0 checker reports path is down”
reload: 36001c23000d59fc600000284478bcdca DELL,MD3000i
[size=558G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][undef]
\_ 1:0:0:0 sdb 8:16 [undef][ready]
\_ round-robin 0 [prio=1][undef]
\_ 4:0:0:0 sdc 8:32 [undef][ready]
\_ round-robin 0 [prio=0][undef]
\_ 2:0:0:0 sdd 8:48 [failed][faulty]
\_ round-robin 0 [prio=0][undef]
\_ 3:0:0:0 sde 8:64 [failed][faulty]
# multipath -d -ll
sdd: checker msg is “readsector0 checker reports path is down”
sde: checker msg is “readsector0 checker reports path is down”
36001c23000d59fc600000284478bcdcadm-0 DELL,MD3000i
[size=558G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][enabled]
\_ 1:0:0:0 sdb 8:16 [active][ready]
\_ round-robin 0 [prio=1][enabled]
\_ 4:0:0:0 sdc 8:32 [active][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 2:0:0:0 sdd 8:48 [active][faulty]
\_ round-robin 0 [prio=0][enabled]
\_ 3:0:0:0 sde 8:64 [active][faulty]
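Reading these bracketed states by eye gets tedious. On each path line the first bracket is device-mapper’s view of the path and the second is the path checker’s verdict, so sdd at [active][faulty] is still in the map even though readsector0 says it is down. A small sketch that reduces multipath -ll output to one line per path; it assumes the 0.4.x output format shown here, and path_states is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: summarise path states from `multipath -ll` output (0.4.x format).
# A path line looks like:   \_ 2:0:0:0 sdd 8:48 [active][faulty]
# Field 3 is the device, field 4 is major:minor, field 5 is
# "[device-mapper state][checker state]".
path_states() {
    awk '$1 == "\\_" && $3 ~ /^sd/ {
        state = $5
        gsub(/\[|\]/, " ", state)        # "[active][faulty]" -> " active  faulty "
        split(state, s, " ")
        printf "%s %s dm=%s checker=%s\n", $3, $4, s[1], s[2]
    }'
}

# Usage: multipath -ll | path_states | grep 'checker=faulty'
```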
If I now remount the filesystem and change the preferred path back, things work okay. You can see device-mapper failing the paths in the dmesg output:
end_request: I/O error, dev sdb, sector 794703
device-mapper: multipath: Failing path 8:16.
end_request: I/O error, dev sdb, sector 71
end_request: I/O error, dev sdb, sector 8279
end_request: I/O error, dev sdb, sector 12375
end_request: I/O error, dev sdb, sector 794711
end_request: I/O error, dev sdc, sector 794703
device-mapper: multipath: Failing path 8:32.
end_request: I/O error, dev sdc, sector 794711
end_request: I/O error, dev sdc, sector 71
end_request: I/O error, dev sdc, sector 8279
end_request: I/O error, dev sdc, sector 12375
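device-mapper names failed paths by major:minor number, such as 8:16, rather than by device name. A sketch that pulls those numbers out of kernel messages and looks them up in /proc/partitions, whose columns are major, minor, #blocks and name; failing_paths is a hypothetical helper, and its file argument exists only so it can be tried against a copy of that file.

```shell
#!/bin/bash
# Sketch: translate "device-mapper: multipath: Failing path 8:16." into "8:16 sdb".
# Kernel messages are read on stdin; the partition table defaults to /proc/partitions.
failing_paths() {
    local parts=${1:-/proc/partitions} major minor name
    sed -n 's/.*Failing path \([0-9]*:[0-9]*\)\..*/\1/p' |
    while IFS=: read -r major minor; do
        # /proc/partitions columns: major minor #blocks name
        name=$(awk -v ma="$major" -v mi="$minor" \
            '$1 == ma && $2 == mi { print $4 }' "$parts")
        echo "$major:$minor ${name:-unknown}"
    done
}

# Usage: dmesg | failing_paths
```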
But after touching some files and switching again, things went downhill:
device-mapper: multipath: Failing path 8:48.
end_request: I/O error, dev sde, sector 12735
device-mapper: multipath: Failing path 8:64.
Buffer I/O error on device dm-1, logical block 1586
lost page write due to I/O error on dm-1
Aborting journal on device dm-1.
Buffer I/O error on device dm-1, logical block 1027
lost page write due to I/O error on dm-1
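Once ext3 aborts its journal like this, it stops accepting writes and the mount should show up with the ro option in /proc/mounts, which is easy to miss if nothing is watching for it. A sketch that lists device-mapper filesystems currently mounted read-only; treating /dev/mapper/* and /dev/dm-* as the devices of interest is an assumption, and ro_dm_mounts is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: list device-mapper filesystems that are mounted read-only.
# /proc/mounts fields: device mountpoint fstype options dump pass
ro_dm_mounts() {
    local mounts=${1:-/proc/mounts}
    awk '($1 ~ "^/dev/mapper/" || $1 ~ "^/dev/dm-") && $4 ~ /^ro(,|$)/ {
        print $1, $2
    }' "$mounts"
}

# Usage: ro_dm_mounts   # prints "device mountpoint" for each read-only dm mount
```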
And I ended up with a read-only filesystem. A dry run of multipath shows that all the paths have failed; more specifically, the standby paths did not become active:
# multipath -d -ll
sdd: checker msg is “readsector0 checker reports path is down”
sde: checker msg is “readsector0 checker reports path is down”
36001c23000d59fc600000284478bcdcadm-0 DELL,MD3000i
[size=558G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][enabled]
\_ 1:0:0:0 sdb 8:16 [failed][ready]
\_ round-robin 0 [prio=1][enabled]
\_ 4:0:0:0 sdc 8:32 [failed][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 2:0:0:0 sdd 8:48 [failed][faulty]
\_ round-robin 0 [prio=0][enabled]
\_ 3:0:0:0 sde 8:64 [failed][faulty]
After some futzing around the standby paths did become active, but that’s obviously an unacceptable failure for a redundant design. I noticed that lenny, which has kernel 2.6.24 instead of 2.6.18, includes the RDAC pieces:
linux-image-2.6.24-1-686: /lib/modules/2.6.24-1-686/kernel/drivers/md/dm-rdac.ko
multipath-tools: /sbin/mpath_prio_rdac
# multipath
/proc/misc: No entry for device-mapper found
Is device-mapper driver missing from kernel?
Failure to communicate with kernel device-mapper driver.
/proc/misc: No entry for device-mapper found
Is device-mapper driver missing from kernel?
Failure to communicate with kernel device-mapper driver.
Incompatible libdevmapper 1.02.25 (2008-04-10)(compat) and kernel driver
# modprobe dm_mod
# multipath
DM multipath kernel driver not loaded
# modprobe dm-multipath
# multipath
error calling out /lib/udev/scsi_id -g -u -s /block/sda
create: 36001e4f0003968c60000000000000000 DELL ,Universal Xpor
[size=20M][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][undef]
\_ 2:0:0:31 sdc 8:32 [undef][ready]
\_ round-robin 0 [prio=1][undef]
\_ 3:0:0:31 sde 8:64 [undef][ready]
create: 36001c23000d59fc60000000000000000 DELL ,Universal Xpor
[size=20M][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][undef]
\_ 1:0:0:31 sdb 8:16 [undef][ready]
\_ round-robin 0 [prio=1][undef]
\_ 4:0:0:31 sdd 8:48 [undef][ready]
# multipath -d -ll
36001c23000d59fc60000000000000000dm-1 DELL ,Universal Xpor
[size=20M][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
\_ 1:0:0:31 sdb 8:16 [active][ready]
\_ round-robin 0 [prio=1][enabled]
\_ 4:0:0:31 sdd 8:48 [active][ready]
36001e4f0003968c60000000000000000dm-0 DELL ,Universal Xpor
[size=20M][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
\_ 2:0:0:31 sdc 8:32 [active][ready]
\_ round-robin 0 [prio=1][enabled]
\_ 3:0:0:31 sde 8:64 [active][ready]
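The module errors in that session come down to two pieces: dm_mod, whose device-mapper entry in /proc/misc is what multipath looks for, and the dm-multipath target. A sketch that reports which of the two still needs a modprobe; dm_missing is a hypothetical helper, and it assumes both are built as modules, as they are here (a built-in dm-multipath would not appear in /proc/modules).

```shell
#!/bin/bash
# Sketch: report which device-mapper modules must be loaded before running multipath.
# Arguments default to the real /proc files so the check can be tried on copies.
dm_missing() {
    local misc=${1:-/proc/misc} modules=${2:-/proc/modules}
    # dm_mod registers a "device-mapper" misc device once it is loaded
    grep -q 'device-mapper' "$misc" || echo dm_mod
    # /proc/modules spells module names with underscores
    grep -q '^dm_multipath ' "$modules" || echo dm-multipath
}

# Usage: for m in $(dm_missing); do modprobe "$m"; done; multipath
```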
The giveaway in the multipath output is ‘size=20M’, which shows that we’re only seeing the access partition. I had logged in before adding the host-to-virtual-disk mapping, so I ran ‘iscsiadm -m session -R’ to rescan the disks and then ‘multipath -F’ to flush the mappings to the access partition. Still no disks:
sd 1:0:0:31: [sdb] Unit Not Ready
sd 1:0:0:31: [sdb] Sense Key : Illegal Request [current]
sd 1:0:0:31: [sdb] Add. Sense: Logical unit not supported
sd 1:0:0:31: [sdb] READ CAPACITY failed
sd 1:0:0:31: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
sd 1:0:0:31: [sdb] Sense Key : Illegal Request [current]
sd 1:0:0:31: [sdb] Add. Sense: Logical unit not supported
sd 1:0:0:31: [sdb] Write Protect is off
sd 1:0:0:31: [sdb] Mode Sense: 0b 00 10 08
sd 1:0:0:31: [sdb] Got wrong page
sd 1:0:0:31: [sdb] Assuming drive cache: write through
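The 20M maps are the array’s access LUN (product string ‘Universal Xpor’, LUN 31 in the H:C:T:L column), not a virtual disk. A sketch that detects when multipath -ll shows only such maps, which here meant the iSCSI sessions predated the host-to-virtual-disk mapping; keying on the product string is an assumption drawn from the output above, and only_access_luns is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: succeed (exit 0) when every DELL map in `multipath -ll` output is the
# 20M access LUN ("Universal Xpor") and no real virtual disk is visible.
only_access_luns() {
    # Map header lines start with the hex WWID and carry the vendor/product.
    awk '/^[0-9a-f]+/ && /DELL/ {
        maps++
        if ($0 ~ /Universal Xpor/) access++
    }
    END { exit !(maps > 0 && maps == access) }'
}

# Usage:
#   if multipath -ll | only_access_luns; then
#       iscsiadm -m node -u; iscsiadm -m node -l   # log in again to pick up new LUNs
#   fi
```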
I logged out and back in (iscsiadm -m node -u ; iscsiadm -m node -l) and the disks showed up:
# multipath
error calling out /lib/udev/scsi_id -g -u -s /block/sda
sdc: checker msg is “directio checker reports path is down”
sdd: checker msg is “directio checker reports path is down”
reload: 36001c23000d59fc600000284478bcdca DELL ,MD3000i
[size=558G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][undef]
\_ 5:0:0:0 sdc 8:32 [undef][faulty]
\_ round-robin 0 [prio=1][undef]
\_ 6:0:0:0 sdb 8:16 [active][ready]
\_ round-robin 0 [prio=0][undef]
\_ 8:0:0:0 sdd 8:48 [undef][faulty]
\_ round-robin 0 [prio=1][undef]
\_ 7:0:0:0 sde 8:64 [active][ready]
Swapping the preferred path around basically required running multipath by hand each time so that it would detect that the paths had changed. Keeping track of that is the job of multipathd, so I checked and saw that installing multipath-tools hadn’t started it. I started it (/etc/init.d/multipath-tools start), after which I had no I/O problems touching and rm’ing files on the filesystem while swapping the preferred path back and forth in MDSM.
I created /etc/multipath.conf, based on the one here:
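Since nothing had started multipathd, it is worth checking for it explicitly. A sketch that looks for the daemon by scanning /proc/&lt;pid&gt;/stat, whose second field is the command name in parentheses, and starts it through the Debian init script if it is missing; is_running and ensure_multipathd are hypothetical helpers, and DRY_RUN (default 1) only prints the start command.

```shell
#!/bin/bash
# Sketch: make sure multipathd is running; start it via the init script if not.
is_running() {
    local stat pid comm rest
    for stat in /proc/[0-9]*/stat; do
        # Field 2 of /proc/<pid>/stat is "(name)"; the process may exit mid-scan.
        { read -r pid comm rest < "$stat"; } 2>/dev/null || continue
        [ "$comm" = "($1)" ] && return 0
    done
    return 1
}

ensure_multipathd() {
    if is_running multipathd; then
        echo "multipathd is running"
    elif [ "${DRY_RUN:-1}" = 1 ]; then
        echo "/etc/init.d/multipath-tools start"
    else
        /etc/init.d/multipath-tools start
    fi
}

ensure_multipathd
```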
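devices {
        device {
                vendor                  DELL
                product                 MD3000i
                # Use the RDAC hardware handler and path checker for this array
                hardware_handler        "1 rdac"
                path_checker            rdac
                # Group paths by the priority mpath_prio_rdac assigns them
                path_grouping_policy    group_by_prio
                prio_callout            "/sbin/mpath_prio_rdac /dev/%n"
                # Switch back to the highest-priority group as soon as it recovers
                failback                immediate
                getuid_callout          "/lib/udev/scsi_id -g -u -s /block/%n"
        }
}
multipaths {
        # Per-LUN settings are keyed by WWID (as shown by multipath -ll)
        multipath {
                wwid                    36001c23000d59fc600000284478bcdca
        }
}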
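A misspelled section keyword in multipath.conf is easy to overlook, since multipath will not treat it as the section you meant. A rough sketch that flags any ‘name {’ line whose name is not one of the usual section keywords; check_sections and the KNOWN list are assumptions for illustration, not a full parser of the file format.

```shell
#!/bin/bash
# Sketch: flag unknown section names in a multipath.conf.
# KNOWN lists the usual section/subsection keywords; extend it if your
# multipath-tools version supports others.
KNOWN="defaults blacklist blacklist_exceptions devices device multipaths multipath"

check_sections() {
    local conf=${1:-/etc/multipath.conf}
    # Print "line: name" for every "name {" whose name is not in KNOWN;
    # exit non-zero if anything was flagged.
    awk -v known="$KNOWN" '
        BEGIN { n = split(known, k, " "); for (i = 1; i <= n; i++) ok[k[i]] = 1 }
        /^[ \t]*#/ { next }
        $2 == "{" && !($1 in ok) { print NR ": " $1; bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$conf"
}

# Usage: check_sections /etc/multipath.conf || echo "fix the sections above"
```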
And then set up multipath again:
# /etc/init.d/multipath-tools restart
Stopping multipath daemon: multipathd.
Starting multipath daemon: multipathd.
# multipath -F
libdevmapper: libdm-common.c(374): Removed /dev/mapper/36001c23000d59fc600000284478bcdca-part1
libdevmapper: libdm-common.c(374): Removed /dev/mapper/36001c23000d59fc600000284478bcdca
# multipath -ll
36001c23000d59fc600000284478bcdcadm-0 DELL ,MD3000i
[size=558G][features=0][hwhandler=1 rdac]
\_ round-robin 0 [prio=6][active]
\_ 5:0:0:0 sdc 8:32 [active][ready]
\_ 8:0:0:0 sdd 8:48 [active][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 6:0:0:0 sdb 8:16 [active][ghost]
\_ 7:0:0:0 sde 8:64 [active][ghost]
Flipping the preferred path with this configuration, I saw far fewer I/O errors in the dmesg output. I’m still not sure exactly what the RDAC path checker does, but it appears to work more cleanly.
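To confirm that the new configuration is actually in effect, two things in the multipath -ll output are worth checking: the map should report hwhandler=1 rdac, and the paths to the non-preferred controller should show up as ghost rather than faulty. A sketch of that check over the 0.4.x output format shown above; treating ghost as the expected standby state is based on this output rather than on documentation, and rdac_ok is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: check that multipath uses the RDAC handler and that standby paths are
# reported as [ghost] rather than [faulty]. Reads `multipath -ll` on stdin,
# prints a summary line, and exits non-zero if the check fails.
rdac_ok() {
    awk '
        /hwhandler=1 rdac/       { rdac = 1 }
        $NF ~ /\]\[ready\]$/     { ready++ }
        $NF ~ /\]\[ghost\]$/     { ghost++ }
        $NF ~ /\]\[faulty\]$/    { faulty++ }
        END {
            printf "rdac=%d ready=%d ghost=%d faulty=%d\n", rdac, ready, ghost, faulty
            exit !(rdac && ready > 0 && faulty == 0)
        }'
}

# Usage: multipath -ll | rdac_ok
```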