Monthly Archives: June 2008

Secure Connection Failed

Secure Connection Failed
Hostname uses an invalid security certificate.
The certificate is only valid for *example.org
(Error code: ssl_error_bad_cert_domain)

Firefox 3 produces this error a lot for me, mostly because I'm browsing local SSL sites by their short hostname rather than their FQDN, and the cert only has the FQDN in it. The solution is going to be setting up the hostname in Apache as a separate site (ServerName) rather than a ServerAlias, and having a rewrite rule to send it to the full site. Of course I'll need a bunch of code to autogenerate and sign certificates, which sounds like a terrible bore.
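
A sketch of that plan, assuming a host with short name 'files' and FQDN 'files.example.org' (all names and paths here are made up), where the short-name vhost carries its own generated cert and exists only to redirect:

```apache
# hypothetical vhost for the bare hostname; its only job is to bounce
# browsers over to the FQDN that the real certificate is valid for
<VirtualHost *:443>
    ServerName files
    SSLEngine on
    # an autogenerated, self-signed cert covering just the short name
    SSLCertificateFile    /etc/apache2/ssl/files.pem
    SSLCertificateKeyFile /etc/apache2/ssl/files.key
    RewriteEngine on
    RewriteRule ^/(.*)$ https://files.example.org/$1 [R=301,L]
</VirtualHost>
```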

In the interim, the FF3 error is really tedious: it takes a few clicks to get through it, rather than formerly just being able to acknowledge the warning and continue on. Setting 'browser.xul.error_pages.expert_bad_cert' to true in 'about:config' helps a lot, as you don't get the popup anymore and you just have to click 'Add Exception' then 'Confirm Security Exception'.
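
The same pref can also be carried in a profile's user.js so it survives profile rebuilds (the pref name is the real one above; the path is whatever your profile directory is):

```js
// <profile dir>/user.js
user_pref("browser.xul.error_pages.expert_bad_cert", true);
```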

preseeding with dbconfig-common

I've been playing around with a puppet recipe for ocsng, trying to get a preseed working that would create the database rather than having debconf pop up with questions. This task had a number of difficulties.

1) Running the install by hand with apt-get never asked whether I wanted to use another host for my MySQL database. This is probably a bug, although I never tried changing my debconf priority level, because I eventually figured out from some source to use 'method select tcp/ip'.

2) I started running into errors with debconf-set-selections failing with 'error: Cannot find a question for …'. Interestingly, the only other place I saw this error was someone trying to do something similar with puppet and glpi in Puppet ticket #1213. I eventually found the series of events that causes this, and how to work around them, thanks to fjp and cjwatson (these two guys always seem to save my ass) giving me the right places to look. More about that in Debian bug #487300.

3) There isn't an etch package for ocsng. Not a big deal; the lenny/testing packages don't really have any new dependencies.

I thought there was something else, but maybe it was that easy. Here’s my seed file as an example:
ocsinventory-server ocsinventory-server/dbconfig-install boolean true
ocsinventory-server ocsinventory-server/mysql/admin-pass password supersecret
ocsinventory-server ocsinventory-server/mysql/method select tcp/ip
ocsinventory-server ocsinventory-server/mysql/app-pass password kindasecret
ocsinventory-server ocsinventory-server/remote/host select mysql01.example.org
ocsinventory-server ocsinventory-server/remote/newhost string mysql01.example.org
ocsinventory-server ocsinventory-server/database-type select mysql
ocsinventory-server ocsinventory-server/db/dbname string ocsweb
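
Loading a seed like that is just a matter of feeding it to debconf-set-selections before the install. A minimal sketch (the file path and the trimmed-down set of answers here are examples):

```shell
# write a trimmed-down version of the preseed answers (example values only)
cat > /tmp/ocsng.seed <<'EOF'
ocsinventory-server ocsinventory-server/dbconfig-install boolean true
ocsinventory-server ocsinventory-server/mysql/method select tcp/ip
EOF

# load the answers into the debconf database if the tool is present
if command -v debconf-set-selections >/dev/null; then
    debconf-set-selections /tmp/ocsng.seed
fi
# ...then the install proceeds without prompting:
# DEBIAN_FRONTEND=noninteractive apt-get install -y ocsinventory-server
```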

open source friendly

I was reading an article at The Register today about more Yahoo executives quitting, specifically Joshua Schachter, the founder of delicious. The article includes a note about Jeremy Zawodny, who left Yahoo recently to sign on at craigslist:

He said in a blog post yesterday: “Over the course of about three seconds, something clicked in my little brain, and I realised that Craigslist is a pretty unique combination of things: a small company with a solid financial base, a great service that I use myself, a focused group of people who really care about doing things well, and an open source-friendly environment.”

I've added the emphasis there. I was talking to a friend last night about how a lot of IT people, especially ones I know, are much more connected to the technology they use and the communities around it than to the actual product of their companies. I tend to keep that to myself, as I feel most of the non-technical people I work with are apt to mistake it for not caring about the company. Well, perhaps, but we care immensely about what we do, and you hired us to do what we do. How many 20% projects have turned into products? Executives of the world, are you listening?

debian, dell md3000i, dm_multipath and path checking

First, this article, albeit a little step-by-step and thus simplistic at times, is really excellent. This article by Dell is worth reading as well, as it uses a number of terms and concepts that may not be familiar to non-storage administrators.

On a couple of earlier posts about dm_multipath (1, 2), 'paul' had commented, "I see some errors in your configuration. The problem is that you are using readsector0 for path checking instead of RDAC and a wrong hwhandler." He said that following the examples here worked in his situation, but didn't elaborate on what his situation was, exactly. That article/benchmark says:

After trying the array successfully with Fedora Core 5, CentOS5 (which is RHEL 5 64bit) and exploring all the above issues, in the end I settled on SuSE SLES-10-SP1 x86_64 (Suse 10 service pack 1 for 64bit) and used it as-is, there was no need to install anything other than the Java “SMdevices/SMmonitor/SMagent” stuff on the resource CD.

It's worth noting that those are all RPM-based distributions. No surprise, since Dell appears to support them in some way, although as usual, YMMV with any enterprise support. 'paul' didn't say why configuring dm_multipath my way is a configuration error, so I set out to read more. It's important to make the distinction between the MD3000 in that article and the MD3000i, which I have.

The MD3000 is traditional Direct-Attached Storage (DAS) and uses SAS (SFF-8470) cables to connect to SAS HBAs in the host. In Highly-Available (HA) mode, you put two HBAs in each of two hosts and connect one HBA in each host to one of the two controllers in the MD3000.

The MD3000i is an iSCSI Storage-Area-Network (SAN) device and uses regular gigabit ethernet to connect up to 16 hosts. It's recommended to use two separate switches and two network cards per host, creating multiple physical paths to each controller on the MD3000i.

For a while my brain had trouble separating DRAC (Dell Remote Access Controller), which is IPMI-like Dell kit, from RDAC (Redundant Disk Array Controller). The benchmark article mentions that the MD3000i is an awful lot like an IBM DS4100. Dell likes rebranding gear, so maybe the MD3000i is just an IBM N3700 or something (I don't have enough interest to poke through the data sheets). I mention it because RDAC is a technology in a lot of IBM products, so you can sometimes find more information searching for 'IBM RDAC' than for Dell.

When I boot up, I only have two paths to a virtual disk:

# multipath -d -ll
sdb: checker msg is "readsector0 checker reports path is down"
sdc: checker msg is "readsector0 checker reports path is down"
36001c23000d59fc600000284478bcdca dm-0 DELL,MD3000i
[size=558G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ 2:0:0:0 sdd 8:48 [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 3:0:0:0 sde 8:64 [active][ready]

Both of those paths go to the active controller. If I switch the preferred path in MDSM, the disk fails:

# ls
ls: reading directory .: Input/output error
# multipath -d -ll
sdb: checker msg is "readsector0 checker reports path is down"
sdc: checker msg is "readsector0 checker reports path is down"
sdd: checker msg is "readsector0 checker reports path is down"
sde: checker msg is "readsector0 checker reports path is down"
36001c23000d59fc600000284478bcdca dm-0 DELL,MD3000i
[size=558G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][enabled]
 \_ 2:0:0:0 sdd 8:48 [failed][faulty]
\_ round-robin 0 [prio=0][enabled]
 \_ 3:0:0:0 sde 8:64 [failed][faulty]

Running multipath once picks up the other paths:

# multipath
error calling out /sbin/scsi_id -g -u -s /block/sda
sdd: checker msg is "readsector0 checker reports path is down"
sde: checker msg is "readsector0 checker reports path is down"
sdd: checker msg is "readsector0 checker reports path is down"
sde: checker msg is "readsector0 checker reports path is down"
reload: 36001c23000d59fc600000284478bcdca DELL,MD3000i
[size=558G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][undef]
 \_ 1:0:0:0 sdb 8:16 [undef][ready]
\_ round-robin 0 [prio=1][undef]
 \_ 4:0:0:0 sdc 8:32 [undef][ready]
\_ round-robin 0 [prio=0][undef]
 \_ 2:0:0:0 sdd 8:48 [failed][faulty]
\_ round-robin 0 [prio=0][undef]
 \_ 3:0:0:0 sde 8:64 [failed][faulty]

# multipath -d -ll
sdd: checker msg is "readsector0 checker reports path is down"
sde: checker msg is "readsector0 checker reports path is down"
36001c23000d59fc600000284478bcdca dm-0 DELL,MD3000i
[size=558G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][enabled]
 \_ 1:0:0:0 sdb 8:16 [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 4:0:0:0 sdc 8:32 [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 2:0:0:0 sdd 8:48 [active][faulty]
\_ round-robin 0 [prio=0][enabled]
 \_ 3:0:0:0 sde 8:64 [active][faulty]

If I now remount the filesystem and change the preferred path back, things work okay. You can see device-mapper failing the paths in the dmesg output:

end_request: I/O error, dev sdb, sector 794703
device-mapper: multipath: Failing path 8:16.
end_request: I/O error, dev sdb, sector 71
end_request: I/O error, dev sdb, sector 8279
end_request: I/O error, dev sdb, sector 12375
end_request: I/O error, dev sdb, sector 794711
end_request: I/O error, dev sdc, sector 794703
device-mapper: multipath: Failing path 8:32.
end_request: I/O error, dev sdc, sector 794711
end_request: I/O error, dev sdc, sector 71
end_request: I/O error, dev sdc, sector 8279
end_request: I/O error, dev sdc, sector 12375

But after touching some files and switching again, things went downhill:

device-mapper: multipath: Failing path 8:48.
end_request: I/O error, dev sde, sector 12735
device-mapper: multipath: Failing path 8:64.
Buffer I/O error on device dm-1, logical block 1586
lost page write due to I/O error on dm-1
Aborting journal on device dm-1.
Buffer I/O error on device dm-1, logical block 1027
lost page write due to I/O error on dm-1

And I ended up with a read-only filesystem. Running multipath dry shows that all the paths have failed; more specifically, the standby paths did not become active:

# multipath -d -ll
sdd: checker msg is "readsector0 checker reports path is down"
sde: checker msg is "readsector0 checker reports path is down"
36001c23000d59fc600000284478bcdca dm-0 DELL,MD3000i
[size=558G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][enabled]
 \_ 1:0:0:0 sdb 8:16 [failed][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 4:0:0:0 sdc 8:32 [failed][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 2:0:0:0 sdd 8:48 [failed][faulty]
\_ round-robin 0 [prio=0][enabled]
 \_ 3:0:0:0 sde 8:64 [failed][faulty]

Futzing around a bit, they eventually would come active, but that's obviously an unacceptable failure for the design. I noticed that lenny, which has kernel 2.6.24 instead of 2.6.18, has the rdac modules:

linux-image-2.6.24-1-686: /lib/modules/2.6.24-1-686/kernel/drivers/md/dm-rdac.ko
multipath-tools: /sbin/mpath_prio_rdac

# multipath
/proc/misc: No entry for device-mapper found
Is device-mapper driver missing from kernel?
Failure to communicate with kernel device-mapper driver.
/proc/misc: No entry for device-mapper found
Is device-mapper driver missing from kernel?
Failure to communicate with kernel device-mapper driver.
Incompatible libdevmapper 1.02.25 (2008-04-10)(compat) and kernel driver

# modprobe dm_mod
# multipath
DM multipath kernel driver not loaded

# modprobe dm-multipath
# multipath
error calling out /lib/udev/scsi_id -g -u -s /block/sda
create: 36001e4f0003968c60000000000000000 DELL ,Universal Xpor
[size=20M][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][undef]
 \_ 2:0:0:31 sdc 8:32 [undef][ready]
\_ round-robin 0 [prio=1][undef]
 \_ 3:0:0:31 sde 8:64 [undef][ready]
create: 36001c23000d59fc60000000000000000 DELL ,Universal Xpor
[size=20M][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][undef]
 \_ 1:0:0:31 sdb 8:16 [undef][ready]
\_ round-robin 0 [prio=1][undef]
 \_ 4:0:0:31 sdd 8:48 [undef][ready]
# multipath -d -ll
36001c23000d59fc60000000000000000 dm-1 DELL ,Universal Xpor
[size=20M][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ 1:0:0:31 sdb 8:16 [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 4:0:0:31 sdd 8:48 [active][ready]
36001e4f0003968c60000000000000000 dm-0 DELL ,Universal Xpor
[size=20M][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ 2:0:0:31 sdc 8:32 [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 3:0:0:31 sde 8:64 [active][ready]

The kicker here is seeing 'size=20M', which gives away that we're only seeing the access partition. I had logged in before adding the host-to-virtual-disk mapping, so I ran 'iscsiadm -m session -R' to rescan the disks and then 'multipath -F' to flush the mapping to the access partition. Still not getting the disks:

sd 1:0:0:31: [sdb] Unit Not Ready
sd 1:0:0:31: [sdb] Sense Key : Illegal Request [current]
sd 1:0:0:31: [sdb] Add. Sense: Logical unit not supported
sd 1:0:0:31: [sdb] READ CAPACITY failed
sd 1:0:0:31: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
sd 1:0:0:31: [sdb] Sense Key : Illegal Request [current]
sd 1:0:0:31: [sdb] Add. Sense: Logical unit not supported
sd 1:0:0:31: [sdb] Write Protect is off
sd 1:0:0:31: [sdb] Mode Sense: 0b 00 10 08
sd 1:0:0:31: [sdb] Got wrong page
sd 1:0:0:31: [sdb] Assuming drive cache: write through

I logged out and back in (iscsiadm -m node -u ; iscsiadm -m node -l) and the disks showed up:

# multipath
error calling out /lib/udev/scsi_id -g -u -s /block/sda
sdc: checker msg is "directio checker reports path is down"
sdd: checker msg is "directio checker reports path is down"
reload: 36001c23000d59fc600000284478bcdca DELL ,MD3000i
[size=558G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][undef]
 \_ 5:0:0:0 sdc 8:32 [undef][faulty]
\_ round-robin 0 [prio=1][undef]
 \_ 6:0:0:0 sdb 8:16 [active][ready]
\_ round-robin 0 [prio=0][undef]
 \_ 8:0:0:0 sdd 8:48 [undef][faulty]
\_ round-robin 0 [prio=1][undef]
 \_ 7:0:0:0 sde 8:64 [active][ready]

Swapping the preferred path around basically required running multipath each time so it would detect that the paths had changed. Running multipath is really the job of multipathd, and checking, I saw it hadn't been started when I installed multipath-tools, so I started it (/etc/init.d/multipath-tools start). After that, I had no I/O problems touching and rm'ing files on the filesystem while swapping the preferred path back and forth in MDSM.

I created /etc/multipath.conf, based on the one here:

devices {
        device {
                vendor                  DELL
                product                 MD3000i
                hardware_handler        "1 rdac"
                path_checker            rdac
                path_grouping_policy    group_by_prio
                prio_callout            "/sbin/mpath_prio_rdac /dev/%n"
                failback                immediate
                getuid_callout          "/lib/udev/scsi_id -g -u -s /block/%n"
        }
}
multipaths {
        multipath {
                device {
                        vendor DELL
                        product MD3000i
                }
        }
}

And then set multipath up again:

# /etc/init.d/multipath-tools restart
Stopping multipath daemon: multipathd.
Starting multipath daemon: multipathd.
# multipath -F
libdevmapper: libdm-common.c(374): Removed /dev/mapper/36001c23000d59fc600000284478bcdca-part1
libdevmapper: libdm-common.c(374): Removed /dev/mapper/36001c23000d59fc600000284478bcdca

# multipath -ll
36001c23000d59fc600000284478bcdca dm-0 DELL    ,MD3000i
[size=558G][features=0][hwhandler=1 rdac]
\_ round-robin 0 [prio=6][active]
 \_ 5:0:0:0 sdc 8:32  [active][ready]
 \_ 8:0:0:0 sdd 8:48  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:0:0 sdb 8:16  [active][ghost]
 \_ 7:0:0:0 sde 8:64  [active][ghost]

Flipping the preferred path this way, I saw a lot fewer I/O errors in the dmesg output. I'm still not sure exactly what the RDAC path checker does, but it appears to work more cleanly.

Linux certifications and releases

Every once in a while I head over to CertCities to see if they've finally gotten around to another "Hottest Certs for xxxx" (we haven't seen one since 12/2005) or certification salary survey, which have fallen behind the times as well. I collect certifications now and then. Initially I picked up a bunch of Microsoft certifications to get a foothold in the Seattle market after moving here. Now they're not so important, because I work for startups where Microsoft comprehension is essential but the challenges lie elsewhere, mostly in Open Source. A while back I went and got an LPIC-1 and LPIC-2, feeling like I should have a Linux certification but not having the time or money for the RHCE lab, or any respect for the CompTIA Linux+. Today I got an email from LPI about a survey they're conducting on where LPI should go from here, which sent me back to CertCities, where I found a number of recent articles about Linux by Emmett Dulaney that made me send him a couple of emails.

One, "Pondering Ubuntu 8.04", subtitled "Did the few minor tweaks included in the latest version of Ubuntu actually warrant a new release? Emmett's not so sure.", is about how the lack of new features in hardy doesn't justify the release. It misses every point of the release cycle, and even comments on how everyone hated Microsoft for making regular releases. Well, maybe because we had to pay for them each time?

To the folks who think upgrading from Server 2000 to Server 2003 is good simply because it's new, you present Ubuntu as 7.10 and 8.04. When interacting with colleagues we usually refer to releases by short name, such as 'gutsy' or 'hardy', which allows interjecting Debian releases like 'etch' and 'lenny' without having to specify the distribution explicitly.

Of course, a suitable enough reason for 8.04 is the release cycle itself. Debian has an amazing framework, but releases are slow. Debian etch was initially released in April 2007, and we're hoping that lenny will be out this year, but we'll see. Just yesterday I had to backport packages from lenny to etch, because each release gets security updates, not version updates; you either wait for the next release for the version updates or go through the trouble of doing the backport yourself.

One might question why backport a deb package when you can simply install the new software yourself, and the answer is configuration management. Whenever I inherit a network full of Linux systems, I have to try to figure out what software was installed where. In many instances the same software has been installed as different versions by different people over the years, and it's difficult to tell which is actually in use. Packaging solves this because (slotting aside) there's one version installed, and you can use the packaging tools to tell exactly what files belong to that package and where they are.
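
On a Debian-family box, for example, the mapping in both directions is one command each (coreutils here just serves as a package everyone has installed):

```shell
# which files does a package own?
dpkg -L coreutils | grep 'bin/ls$'
# and the reverse: which package owns this file?
dpkg -S "$(command -v ls)"
```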

While this may not seem of immediate benefit to a single user, it is, because it's essential to troubleshooting user problems for those who provide support, which in the case of Ubuntu is mostly done for free.

While Hardy may not have any visually apparent and stunning changes, I assure you there are lots of updates behind the desktop that are well worth appreciating.

The other was "Linux Certs and the Cutting Edge", subtitled "Some certifications seem stuck in the Dark Ages. Plus, Book of the Week toes the command line." The article goes on to talk about how "df, du, kill, ls, mv, rm, tar, umask, vi and so on" are on all the tests, and offers that this is because of the "commonality between the distributions", not because these are all essential utilities. Anyway, here's my email:

CompTIA is always a terrible example of certifications because it's so entry-level. I can't complain a whole lot, because it's respected, and aside from questions I consider obscure to my job roles (like fixing laser printers), it's pretty easy to pass the tests.

“df, du, kill, ls, mv, rm, tar, umask, vi and so on”

These are all -essential-. I would never hire someone who failed to explain exactly what each of these tools does. I feel the LPI certifications may go a little overboard in expecting you to know what certain flags do for each command, when you can always look them up in the man page. But knowing the difference between tar -z and tar -j is always a good thing.
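
To pick on that particular pair, a quick illustration (throwaway paths):

```shell
# -z filters the archive through gzip, -j through bzip2
echo hello > /tmp/demo.txt
tar -czf /tmp/demo.tar.gz -C /tmp demo.txt
tar -tzf /tmp/demo.tar.gz          # lists demo.txt
# same archive with bzip2 (needs the bzip2 binary installed)
if command -v bzip2 >/dev/null; then
    tar -cjf /tmp/demo.tar.bz2 -C /tmp demo.txt
    tar -tjf /tmp/demo.tar.bz2     # lists demo.txt
fi
```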

As an example, though: we have a fairly complex configuration employing Debian linux both as host and as a guest on vmware-server, with configuration management by puppet backed by git, and capistrano for system administration. While experience with those specific tools is good, consider the following piece of a puppet recipe I wrote:

# set linux clock algorithm
# non-rescue (single) kernel lines in the grub config that don't have a clock algorithm set get set to pit
# best to run this regularly (this will run every time) so that new kernel installs get this added
# there is the edge case that a kernel is upgraded and we don't wait for puppet to run before the reboot
exec { "set-vmware-clock":
  command => "/bin/sed -ie '/clock\|single/! s/^kernel.*/& clocksource=pit/' /boot/grub/menu.lst",
  onlyif  => "/bin/grep '^kernel' /boot/grub/menu.lst | /bin/grep -v 'single' | /bin/grep -v 'clock'",
}
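
To unpack the sed and the grep chain, here's the same expression run against a made-up menu.lst fragment:

```shell
# two kernel lines: a normal one and a rescue (single) one
cat > /tmp/menu.lst <<'EOF'
kernel /vmlinuz-2.6.18-6-686 root=/dev/sda1 ro
kernel /vmlinuz-2.6.18-6-686 root=/dev/sda1 ro single
EOF
# lines matching 'clock' or 'single' are skipped by the ! negation; every
# other kernel line gets ' clocksource=pit' appended (& is the matched text)
sed -i -e '/clock\|single/! s/^kernel.*/& clocksource=pit/' /tmp/menu.lst
cat /tmp/menu.lst
# the onlyif grep chain exits non-zero once no unpatched kernel lines
# remain, which is what stops puppet from re-running the exec
grep '^kernel' /tmp/menu.lst | grep -v single | grep -v clock || echo 'nothing left to do'
```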

If someone can't look at that and tell me what it does, they're not getting hired here. They don't need to know that much about puppet; that they can figure out, and even just by looking at the recipe you get a good idea of what the puppet portion of the configuration is for. But if you're not familiar with the standard tools, you're not going to get much done, regardless of how much you may know about something like puppet. If you look at that and know that 'grep' returns matching lines of text, but don't know that '-v' inverts the match to exclude them, you're going to miss the point of that recipe.

The key isn't that these tools are distribution-neutral, giving you a lot of common ground. The key is that these tools are extremely powerful, provided you know how to use them. The more you familiarize yourself with them, the more you can chain them together to build more powerful solutions.