Category Archives: Uncategorized

wireshark and tcpdump monitoring of vlans on debian

I was troubleshooting some vlan issues today and was annoyed that when dumping eth0 (a trunk) I was not seeing the vlan headers for each packet. This makes it pretty difficult to tell which vlan the traffic is destined for, because I have to dump the vlan interface and the physical interface at the same time and use deduction.

I found this in the man page for vconfig which explains that Broadcom devices will strip vlan information:

VLAN will use Broadcom’s NICE interface when the network device supports it. This is necessary, since usually the hardware of these devices already removes the vlan tag from the ethernet packet. The set_flag option on vlan-devices created on such a physical network device will be ignored. Dumping the network-device will show only untagged (non-vlan) traffic, and dumping the vlan-devices will only show traffic intended for that vlan, without the tags.
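In practice the workaround looks something like this: dump the trunk and the vlan interface side by side and compare. The interface names here are just examples; the vlan device may be called eth0.100 or vlan100 depending on the name type you set with vconfig.

tcpdump -n -i eth0 -w trunk.pcap &       # on Broadcom hardware this only shows untagged traffic
tcpdump -n -i eth0.100 -w vlan100.pcap   # traffic for vlan 100, with the tags already stripped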

creating different configurations on each node using capistrano

This is probably in the top ten lamest hacks of my life. But I blame capistrano for not being what I want it to be.

This replaces BINDIPADDRESS in the configuration file with the IP address of eth0 on each host.

def get_ip_address (ethernet_device = "eth0")
  ip_addresses = Hash.new
  run("/sbin/ifconfig #{ethernet_device} | grep inet | grep -v inet6") do |ch, stream, data|
    case data
    when /\s*inet addr:((\d+\.){3}\d+)\s+.*/
      ip_addresses["#{ch[:host]}"] = $1
    end
  end
  return ip_addresses
end
task :put_auto_install_xml, :roles => :app do
    ip_addresses = get_ip_address()
    auto_install =
      IO.read(
        File.join(File.dirname(__FILE__), "files", "#{application}", "#{application}_auto_install_#{env}.xml")
      )
    put(
      auto_install,
      "/tmp/#{application}_auto_install-#{env}-#{build}.xml",
      :mode => 0644
    )
    run("/bin/bash") do |ch, stream, data|
      ch.send_data "/bin/sed -i -e 's/BINDIPADDRESS/#{ip_addresses[ch[:host]]}/' /tmp/#{application}_auto_install-#{env}-#{build}.xmln"
      ch.send_data "exitn"
    end
  end

booting debian-installer with a static ip address

1) When preseeding a debian install, you need the network up to be able to get to the preseed file.
2) The kernel used by debian-installer defaults to only allowing 256 characters on the append/cmdline.

Phil Hands did some work for etch / debian-installer to help out with this: the auto-install udeb. The biggest challenge is getting all the network information onto the command line, but we can do some tricks to save space.

1) Use preseed aliases where possible (Debian Manual, Appendix B 2.4)
2) Use ‘auto=true’ on the append line. This will use ai-choosers to delay locale/keyboard so you can specify these in your preseed file instead. It also lets you set ‘url=hostname.domain.org’ rather than a full path to your preseed file, but you’ll need symlinks so that ‘http://hostname/d-i/etch/preseed.cfg’ points to your config file. This is a little obtuse in the directions because most documentation talks about ‘auto’, but this is actually a label in the default debian pxelinux/syslinux.cfg files. Look in debian-installer/i386/pxelinux.cfg from the netboot.tar.gz for an example.
3) Set up symlinks for your initrd so you can use something like ‘initrd=d/e/3/i’, which is short for ‘initrd=debian/etch/i386/initrd.gz’
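The symlinks themselves are trivial; here is a rough sketch, assuming a tftp root of /srv/tftp (adjust the paths for your own layout):

cd /srv/tftp
mkdir -p d/e/3                                      # short directory tree
ln -s ../../../debian/etch/i386/initrd.gz d/e/3/i   # so 'initrd=d/e/3/i' resolves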

My append line ends up looking something like this:

append initrd=d/e/3/i auto=true url=debian.example.org hostname=myhost \
domain=example.org interface=eth0 netcfg/get_ipaddress=10.0.0.100 \
netcfg/get_netmask=255.255.255.0 netcfg/get_gateway=10.0.0.1 \
netcfg/get_nameservers=10.0.0.250 netcfg/disable_dhcp=true

It’s unfortunate that you’re required to add ‘netcfg/disable_dhcp=true’ to get netcfg to acknowledge that you’ve given it a static configuration. That’s a lot of wasted cmdline, which may get you in trouble if you’re using high octets in IP addresses or a long hostname. That’s exactly 256 characters right there.

If I keep having problems with this, I’ll probably start using DHCP reservations instead of static addressing. You can even push the url out via dhcp. I’ve also gone back and shortened my url hostname to ‘d.example.org’ to gain a few more characters. I probably could just recompile the kernel to support more than 256 chars on the command line as well.

scripting vmware serial number configuration

I deploy vmware-server with puppet and wanted to automate the installation of a serial number. I thought I was going to use ‘vmware-cmd’ for this, but I got the truth out of vmware-cmd.pl: you use ‘vmware-vmx’. I really need to stop using vmware; most of its features seem to be Windows-admin-friendly GUI interfaces all over the place, and while it has APIs, they’re poorly documented because they just aren’t that widely used. I store the key manually in iclassify for now, which produces the vmwarekey variable because it’s an attribute.

if $vmwarekey {
  exec { "vmware-server-license":
    command => "/usr/lib/vmware-server/bin/vmware-vmx --new-sn ${vmwarekey}",
    onlyif  => "/usr/bin/test ! -f /etc/vmware/license.vs.1.0-00",
    require => Package["vmware-server"],
  }
}

listing multiple hostnames on an ssh_config Host line

I wanted to use ssh authentication agent forwarding on a few hosts at work because I only keep my private keys on my workstation at work. You can enable this on a case-by-case basis with the ‘-A’ option, but I don’t like remembering such things, so instead you can add ‘ForwardAgent yes’ to your ~/.ssh/config file. However, it seemed unwise to enable this for all hosts, as it’s got a slight security issue or two. You can limit options to a certain host by preceding them with a ‘Host server_hostname’ line, but ‘server_hostname’ must match whatever you write on the command line. You can use wildcards to match a domain like ‘*.example.org’, but I really only use hostnames, not fqdns, when I use ssh, so I wanted to list multiple hostnames. You can do this like so:

Host *
ForwardX11 yes
ForwardX11Trusted no

Host ops ops.example.org secretbox otherbox
ForwardAgent yes

request-tracker and winmail.dat or MS-TNEF

Outlook likes putting attachments and other crap in a stupid attachment called ‘winmail.dat’. Request-tracker does not like this and neither do I. On debian, install the ‘ytnef’ package. ‘/usr/bin/ytnef-filter’ appears to be the ‘ytnefprocess.pl’ script mentioned in the README file. Make sure you’ve got the Perl MIME requirements; I didn’t have to, but you may need to install the ‘libmime-perl’ package. Then modify your RT email configuration to look like this:

helpdesk: "|/usr/bin/ytnef-filter|/usr/bin/rt-mailgate \
--queue helpdesk --action correspond \
 --url http://rt.example.com"
helpdesk-comment: "|/usr/bin/ytnef-filter|/usr/bin/rt-mailgate \
--queue helpdesk --action comment \
 --url http://rt.example.com"

That’s two lines altogether; the backslashes are just line continuations (WordPress sucks with this stuff). ytnef-filter will extract the attachments it can, the useful ones like actual file attachments, and leave the rest alone, like the proprietary MS crap.
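If you want to sanity-check the filter outside of RT first, it behaves as an ordinary stdin/stdout mail filter, so something like this should do (the saved message file is just an example):

/usr/bin/ytnef-filter < tnef-message.eml | less   # winmail.dat parts should come out as normal MIME attachments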

running rserve on debian etch

I had not heard of R until very recently, when I was asked to set up the ‘server’ interface to R on a box. Apparently a bunch of math dudes thought they were too smart for bc and wrote R… yeah, I dunno. Lots of parts of R seem like a reinvention of the wheel. R has its own packaging system, like CPAN, called CRAN. It’s a little frustrating to have cran, cpan, gem, and all these different methods to install libraries. I’m glad debian is still packaging them up as debs.

I had to rebuild the R packages from sid on etch as the binary packages were built against a newer version of libc. I did this by adding a deb-src line for sid to sources.list:

deb-src http://ftp.us.debian.org/debian sid main

And then apt-getting the source for each package and building it (dpkg-buildpackage -rfakeroot). This was a lot of work because of the long list of build dependencies, which I had to backport, build and install (including tcl/tk). When the dependencies for ‘r-cran-rserve’ were finally met I was good to go.
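The per-package loop looked roughly like this (the source package name and the unpacked directory are examples; adjust for each dependency):

apt-get source r-cran-rserve           # fetch the sid source package
cd rserve-*                            # cd into whatever directory it unpacked
dpkg-buildpackage -rfakeroot -us -uc   # build unsigned binary packages against etch
dpkg -i ../r-cran-rserve_*.deb         # install the result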

Rserve has some cheesy imitation of tcp-wrappers built in. It’ll bind to 0.0.0.0 by default, but will drop connections from any other host. Without even looking at the code I’m sure it’s crap, but you need to put “remote enable” in /etc/Rserv.conf to disable this foolishness. None of the packages create this file, but it’s just a flat config file. I won’t bother talking about the authentication infrastructure, except that it’s useless, and you should probably wrap this in ssl and certificates or whatnot if you care.
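Since nothing creates the file for you, a minimal version is a single line (add setuid/setgid or port settings as you like):

echo 'remote enable' > /etc/Rserv.conf   # allow connections from other hosts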

You can start Rserve with ‘R CMD Rserve’, but that’s a little lame. You can also run ‘/usr/lib/R/bin/Rserve’ directly, which will probably need R_HOME defined (see next), or ‘/usr/lib/R/bin/Rcmd Rserve’, which doesn’t.

Fatal error: R home directory is not defined
This isn’t a well-written error message. You need to export R_HOME='/usr/lib/R'; it’s not an actual home directory (and not to be confused with the working directory either). I wanted to use runit, but without recompiling, Rserve lacks an option to not fork.

setgid(1): failed.
Apparently R has a system function that lets you run shell commands. You can use setuid/setgid in the config file, but setgid won’t work; I don’t know why.

monitoring normally open circuits with an em01b

The newer EEsensors em01b has a contact module for monitoring normally closed circuits. My HVAC has a common alarm circuit that can close if any of the alarms (such as the high water switch or low pressure switch) go off, but it’s normally open.

The logic behind normally closed circuits is that you can detect a wiring fault. I’m not sure why eesensors went with NC instead of NO. They’re generally pretty helpful at answering questions via email though.

The em01b returns ‘N’ in the contact section when the circuit is open and ‘W’ when it’s closed. With a normally closed circuit you sit in ‘W’, and if it trips to ‘N’ you send a reset command (/index.html?eL) after you fix the problem to get it back to ‘W’. So if you’re monitoring an attached circuit that is normally open, you need to send a reset every time you query, keeping in mind that the output returned by the reset command is not a valid reading.
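So the poll ends up as a read followed by a reset, roughly like this (the sensor address is hypothetical, and I’m assuming the plain status page is what you already query):

curl -s http://10.0.0.50/index.html                   # read the contact state ('N' open, 'W' closed)
curl -s 'http://10.0.0.50/index.html?eL' >/dev/null   # reset it; this response is not a valid reading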

I modified the C version of their nagios script to support this with a ‘D’ option instead of a ‘C’ option. You can find the source in git at check_em01.c.

Secure Connection Failed

Secure Connection Failed
Hostname uses an invalid security certificate.
The certificate is only valid for *example.org
(Error code: ssl_error_bad_cert_domain)

Firefox 3 produces this error a lot for me, mostly because I’m reaching local ssl sites by their hostname rather than their fqdn, and the cert only has the fqdn in it. The solution is going to be setting up the hostname in apache as a separate site (ServerName) rather than a ServerAlias, with a rewrite rule to send it to the full site. Of course I think I’ll need a bunch of code to autogenerate and sign certificates, which sounds like a terrible bore.

In the interim, the FF3 error is really tedious: it takes a few clicks to get through rather than, as before, just being able to acknowledge the warning and continue on. Setting ‘browser.xul.error_pages.expert_bad_cert’ to true in ‘about:config’ helps a lot, as you don’t get the popup anymore and you just have to click ‘Add Exception’ then ‘Confirm Security Exception’.

preseeding with dbconfig-common

I’ve been playing around with a puppet recipe for ocsng and trying to get a preseed working that would create the database rather than debconf popping up with questions. This task had a number of difficulties.

1) Running the install by hand with apt-get never asked if I wanted to use another host for my mysql database. This is probably a bug, although I never tried changing my debconf priority level, because I eventually figured out from reading some source to use ‘method select tcp/ip’.

2) I started running into debconf-set-selections erroring with ‘error: Cannot find a question for …’. Interestingly, the only other place I saw this error was someone trying to do something similar with puppet and glpi in Puppet ticket #1213. I eventually found the series of events that causes this and how to work around it, thanks to fjp and cjwatson (these two guys always seem to save my ass) giving me the right places to look. More about that in Debian bug #487300.

3) There isn’t an etch package for ocsng. Not a big deal, the lenny/testing packages don’t really have any new dependencies.

I thought there was something else, but maybe it was that easy. Here’s my seed file as an example:
ocsinventory-server ocsinventory-server/dbconfig-install boolean true
ocsinventory-server ocsinventory-server/mysql/admin-pass password supersecret
ocsinventory-server ocsinventory-server/mysql/method select tcp/ip
ocsinventory-server ocsinventory-server/mysql/app-pass password kindasecret
ocsinventory-server ocsinventory-server/remote/host select mysql01.example.org
ocsinventory-server ocsinventory-server/remote/newhost string mysql01.example.org
ocsinventory-server ocsinventory-server/database-type select mysql
ocsinventory-server ocsinventory-server/db/dbname string ocsweb
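For reference, the manual equivalent of what the puppet recipe ends up doing is roughly this (the seed file name is just an example):

debconf-set-selections /root/ocsinventory-server.seed   # preload the answers above
apt-get install -y ocsinventory-server                  # installs without dbconfig prompting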

open source friendly

I was reading an article at The Register today about more yahoo executives quitting, specifically Joshua Schachter, the founder of delicious. The article includes a note about Jeremy Zawodny, who left Yahoo recently to sign on at craigslist:

He said in a blog post yesterday: “Over the course of about three seconds, something clicked in my little brain, and I realised that Craigslist is a pretty unique combination of things: a small company with a solid financial base, a great service that I use myself, a focused group of people who really care about doing things well, and an open source-friendly environment.”

I’ve added the emphasis there. I was talking to a friend last night about how a lot of IT people, especially ones I know, are much more connected to the technology they use and the communities around them than to the actual product of their companies. I tend to keep that to myself, as I feel most of the non-technical people I work with are apt to confuse that with not caring about the company. Well, perhaps, but we care immensely about what we do, and you hired us to do what we do. How many 20% projects have turned into products? Executives of the world, are you listening?

debian, dell md3000i, dm_multipath and path checking

First, this article, albeit a little step-by-step and thus simple(?) at times, is really excellent. This article by Dell is worth reading as well, as it uses a number of terms/concepts that may not be familiar to non-storage administrators.

On a couple of earlier posts about dm_multipath (1, 2), ‘paul’ had commented: “I see some errors in your configuration. The problem is that you are using readsector0 for path checking instead of RDAC and a wrong hwhandler.” He said that following the examples here worked in his situation, but didn’t elaborate on what his situation was exactly. That article/benchmark says:

After trying the array successfully with Fedora Core 5, CentOS5 (which is RHEL 5 64bit) and exploring all the above issues, in the end I settled on SuSE SLES-10-SP1 x86_64 (Suse 10 service pack 1 for 64bit) and used it as-is, there was no need to install anything other than the Java “SMdevices/SMmonitor/SMagent” stuff on the resource CD.

It’s worth noting that those are all RPM-based distributions. No surprise, since Dell appears to support them in some way, although as usual YMMV with any enterprise support. ‘paul’ didn’t say why configuring dm_multipath this way is a configuration error, so I set out to read more. It’s important to make the distinction between the MD3000 in that article and the MD3000i, which is what I have.

The MD3000 is traditional Direct-Attached-Storage (DAS) and uses SAS 8470 cables to connect to SAS HBAs in the host. In Highly-Available (HA) mode, you put two HBAs in each of two hosts and connect one HBA in each host to one of the two controllers in the MD3000.

The MD3000i is an iSCSI Storage-Area-Network (SAN) and uses regular gigabit ethernet to interconnect to up to 16 hosts. It’s recommended to use two separate switches and two network cards per host, creating multiple physical paths to each controller on the MD3000i.
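The host side of that is plain open-iscsi; it looks roughly like this, where the portal addresses are just examples for the two controller ports:

iscsiadm -m discovery -t sendtargets -p 10.10.1.10   # discover targets via the first switch/port
iscsiadm -m discovery -t sendtargets -p 10.10.2.10   # and via the second
iscsiadm -m node -l                                  # log in to everything discovered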

My brain had trouble for a while separating DRAC (Dell Remote Access Controller), which is Dell’s IPMI-like kit, from RDAC (Redundant Disk Array Controller). The benchmark article mentions that the MD3000i is an awful lot like an IBM DS4100. Dell likes rebranding gear, so maybe the MD3000i is just an IBM N3700 or something (I don’t have enough interest to poke through the data sheets). I mention it because RDAC is a technology in a lot of IBM products, so you can sometimes find more information searching for ‘IBM RDAC’ than for Dell.

When I boot up, I only have two paths to a virtual disk:

# multipath -d -ll
sdb: checker msg is "readsector0 checker reports path is down"
sdc: checker msg is "readsector0 checker reports path is down"
36001c23000d59fc600000284478bcdca dm-0 DELL,MD3000i
[size=558G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
\_ 2:0:0:0 sdd 8:48 [active][ready]
\_ round-robin 0 [prio=1][enabled]
\_ 3:0:0:0 sde 8:64 [active][ready]

Which is across the active controller. If I switch the preferred path in MDSM the disk fails:

# ls
ls: reading directory .: Input/output error
# multipath -d -ll
sdb: checker msg is "readsector0 checker reports path is down"
sdc: checker msg is "readsector0 checker reports path is down"
sdd: checker msg is "readsector0 checker reports path is down"
sde: checker msg is "readsector0 checker reports path is down"
36001c23000d59fc600000284478bcdca dm-0 DELL,MD3000i
[size=558G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][enabled]
\_ 2:0:0:0 sdd 8:48 [failed][faulty]
\_ round-robin 0 [prio=0][enabled]
\_ 3:0:0:0 sde 8:64 [failed][faulty]

Running multipath once picks up the other paths:

# multipath
error calling out /sbin/scsi_id -g -u -s /block/sda
sdd: checker msg is "readsector0 checker reports path is down"
sde: checker msg is "readsector0 checker reports path is down"
sdd: checker msg is "readsector0 checker reports path is down"
sde: checker msg is "readsector0 checker reports path is down"
reload: 36001c23000d59fc600000284478bcdca DELL,MD3000i
[size=558G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][undef]
\_ 1:0:0:0 sdb 8:16 [undef][ready]
\_ round-robin 0 [prio=1][undef]
\_ 4:0:0:0 sdc 8:32 [undef][ready]
\_ round-robin 0 [prio=0][undef]
\_ 2:0:0:0 sdd 8:48 [failed][faulty]
\_ round-robin 0 [prio=0][undef]
\_ 3:0:0:0 sde 8:64 [failed][faulty]

# multipath -d -ll
sdd: checker msg is "readsector0 checker reports path is down"
sde: checker msg is "readsector0 checker reports path is down"
36001c23000d59fc600000284478bcdca dm-0 DELL,MD3000i
[size=558G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][enabled]
\_ 1:0:0:0 sdb 8:16 [active][ready]
\_ round-robin 0 [prio=1][enabled]
\_ 4:0:0:0 sdc 8:32 [active][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 2:0:0:0 sdd 8:48 [active][faulty]
\_ round-robin 0 [prio=0][enabled]
\_ 3:0:0:0 sde 8:64 [active][faulty]

If I now remount the filesystem and change the preferred path back, things work okay. You can see device-mapper failing the paths in the dmesg output:

end_request: I/O error, dev sdb, sector 794703
device-mapper: multipath: Failing path 8:16.
end_request: I/O error, dev sdb, sector 71
end_request: I/O error, dev sdb, sector 8279
end_request: I/O error, dev sdb, sector 12375
end_request: I/O error, dev sdb, sector 794711
end_request: I/O error, dev sdc, sector 794703
device-mapper: multipath: Failing path 8:32.
end_request: I/O error, dev sdc, sector 794711
end_request: I/O error, dev sdc, sector 71
end_request: I/O error, dev sdc, sector 8279
end_request: I/O error, dev sdc, sector 12375

But touching some files and switching again, things went downhill:

device-mapper: multipath: Failing path 8:48.
end_request: I/O error, dev sde, sector 12735
device-mapper: multipath: Failing path 8:64.
Buffer I/O error on device dm-1, logical block 1586
lost page write due to I/O error on dm-1
Aborting journal on device dm-1.
Buffer I/O error on device dm-1, logical block 1027
lost page write due to I/O error on dm-1

And I ended up with a read-only filesystem. Running multipath dry shows that all the paths have failed; more specifically, the standby paths did not become active:

# multipath -d -ll
sdd: checker msg is "readsector0 checker reports path is down"
sde: checker msg is "readsector0 checker reports path is down"
36001c23000d59fc600000284478bcdca dm-0 DELL,MD3000i
[size=558G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][enabled]
\_ 1:0:0:0 sdb 8:16 [failed][ready]
\_ round-robin 0 [prio=1][enabled]
\_ 4:0:0:0 sdc 8:32 [failed][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 2:0:0:0 sdd 8:48 [failed][faulty]
\_ round-robin 0 [prio=0][enabled]
\_ 3:0:0:0 sde 8:64 [failed][faulty]

Futzing around a bit they would come back, but that’s obviously an unacceptable failure for the design. I noticed that lenny, which has 2.6.24 instead of 2.6.18, has the rdac modules:

linux-image-2.6.24-1-686: /lib/modules/2.6.24-1-686/kernel/drivers/md/dm-rdac.ko
multipath-tools: /sbin/mpath_prio_rdac

# multipath
/proc/misc: No entry for device-mapper found
Is device-mapper driver missing from kernel?
Failure to communicate with kernel device-mapper driver.
/proc/misc: No entry for device-mapper found
Is device-mapper driver missing from kernel?
Failure to communicate with kernel device-mapper driver.
Incompatible libdevmapper 1.02.25 (2008-04-10)(compat) and kernel driver

# modprobe dm_mod
# multipath
DM multipath kernel driver not loaded

# modprobe dm-multipath
# multipath
error calling out /lib/udev/scsi_id -g -u -s /block/sda
create: 36001e4f0003968c60000000000000000 DELL ,Universal Xpor
[size=20M][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][undef]
\_ 2:0:0:31 sdc 8:32 [undef][ready]
\_ round-robin 0 [prio=1][undef]
\_ 3:0:0:31 sde 8:64 [undef][ready]
create: 36001c23000d59fc60000000000000000 DELL ,Universal Xpor
[size=20M][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][undef]
\_ 1:0:0:31 sdb 8:16 [undef][ready]
\_ round-robin 0 [prio=1][undef]
\_ 4:0:0:31 sdd 8:48 [undef][ready]
# multipath -d -ll
36001c23000d59fc60000000000000000 dm-1 DELL ,Universal Xpor
[size=20M][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
\_ 1:0:0:31 sdb 8:16 [active][ready]
\_ round-robin 0 [prio=1][enabled]
\_ 4:0:0:31 sdd 8:48 [active][ready]
36001e4f0003968c60000000000000000 dm-0 DELL ,Universal Xpor
[size=20M][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
\_ 2:0:0:31 sdc 8:32 [active][ready]
\_ round-robin 0 [prio=1][enabled]
\_ 3:0:0:31 sde 8:64 [active][ready]

The kicker here is seeing ‘size=20M’, which gives away that we’re only seeing the access partition. I had logged in before adding the host to the virtual disk mapping, so I ran ‘iscsiadm -m session -R’ to rescan the disks and then ‘multipath -F’ to flush the map pointing at the access partition. Still not getting the disks:

sd 1:0:0:31: [sdb] Unit Not Ready
sd 1:0:0:31: [sdb] Sense Key : Illegal Request [current]
sd 1:0:0:31: [sdb] Add. Sense: Logical unit not supported
sd 1:0:0:31: [sdb] READ CAPACITY failed
sd 1:0:0:31: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
sd 1:0:0:31: [sdb] Sense Key : Illegal Request [current]
sd 1:0:0:31: [sdb] Add. Sense: Logical unit not supported
sd 1:0:0:31: [sdb] Write Protect is off
sd 1:0:0:31: [sdb] Mode Sense: 0b 00 10 08
sd 1:0:0:31: [sdb] Got wrong page
sd 1:0:0:31: [sdb] Assuming drive cache: write through

I logged out and back in (iscsiadm -m node -u ; iscsiadm -m node -l) and the disks showed up:

# multipath
error calling out /lib/udev/scsi_id -g -u -s /block/sda
sdc: checker msg is "directio checker reports path is down"
sdd: checker msg is "directio checker reports path is down"
reload: 36001c23000d59fc600000284478bcdca DELL ,MD3000i
[size=558G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][undef]
\_ 5:0:0:0 sdc 8:32 [undef][faulty]
\_ round-robin 0 [prio=1][undef]
\_ 6:0:0:0 sdb 8:16 [active][ready]
\_ round-robin 0 [prio=0][undef]
\_ 8:0:0:0 sdd 8:48 [undef][faulty]
\_ round-robin 0 [prio=1][undef]
\_ 7:0:0:0 sde 8:64 [active][ready]

Swapping the preferred path around basically required running multipath each time so it would detect that the paths had changed. Running multipath automatically is the job of multipathd, and when I checked I saw it hadn’t been started when multipath-tools was installed, so I started it (/etc/init.d/multipath-tools start). After that I had no I/O problems touching and rm’ing files on the filesystem while swapping the preferred path back and forth in MDSM.

I created /etc/multipath.conf, based on the example here:

devices {
        device {
                vendor                  DELL
                product                 MD3000i
                hardware_handler        "1 rdac"
                path_checker            rdac
                path_grouping_policy    group_by_prio
                prio_callout            "/sbin/mpath_prio_rdac /dev/%n"
                failback                immediate
                getuid_callout          "/lib/udev/scsi_id -g -u -s /block/%n"
        }
}
multipaths {
        multipath {
                device {
                        vendor DELL
                        product MD3000i
                }
        }
}
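One note: the ‘1 rdac’ hardware handler comes from the dm-rdac module listed earlier, and mpath_prio_rdac from multipath-tools. If the module doesn’t get loaded automatically, loading it by hand before restarting multipathd should do it:

modprobe dm-rdac   # rdac hardware handler from the lenny kernel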

And then re-set up multipath:

# /etc/init.d/multipath-tools restart
Stopping multipath daemon: multipathd.
Starting multipath daemon: multipathd.
# multipath -F
libdevmapper: libdm-common.c(374): Removed /dev/mapper/36001c23000d59fc600000284478bcdca-part1
libdevmapper: libdm-common.c(374): Removed /dev/mapper/36001c23000d59fc600000284478bcdca

# multipath -ll
36001c23000d59fc600000284478bcdca dm-0 DELL    ,MD3000i
[size=558G][features=0][hwhandler=1 rdac]
\_ round-robin 0 [prio=6][active]
\_ 5:0:0:0 sdc 8:32  [active][ready]
\_ 8:0:0:0 sdd 8:48  [active][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 6:0:0:0 sdb 8:16  [active][ghost]
\_ 7:0:0:0 sde 8:64  [active][ghost]

Flipping the preferred path this way, I saw a lot fewer I/O errors in the dmesg output. I’m still not sure what the RDAC path checker does exactly, but it appears to work more cleanly.

Linux certifications and releases

Every once in a while I head over to CertCities to see if they’ve finally gotten around to another Hottest Certs for xxxx (we haven’t seen one since 12/2005) or one of their certification salary surveys, which have fallen behind the times as well. I collect certifications now and then. Initially I picked up a bunch of Microsoft certifications to get a foothold in the Seattle market after moving here. Now they’re not so important, because I work for startups where Microsoft comprehension is essential but the challenges lie elsewhere, mostly in Open Source. A while back I went and got an LPIC-1 and LPIC-2, feeling like I should have a Linux certification but not having the time or money for the RHCE lab, or any respect for the CompTIA Linux+. I got an email from LPI today about a survey they’re conducting on where LPI should go from here, which made me head back to CertCities, where I found a number of recent articles by Emmett Dulaney about Linux that made me send him a couple of emails.

One, “Pondering Ubuntu 8.04”, subtitled “Did the few minor tweaks included in the latest version of Ubuntu actually warrant a new release? Emmett’s not so sure.”, is about how the lack of new features in hardy doesn’t justify the release. It misses every point of the release cycle, and even comments about how everyone hated Microsoft for making regular releases. Well, because we had to pay for them each time, maybe?

To the folks that think upgrading from Server 2000 to Server 2003 is good because it’s new, you simply present Ubuntu as 7.10 and 8.04. When interacting with colleagues we usually refer to releases by short name such as ‘gutsy’ or ‘hardy’, which allows interjecting debian releases like ‘etch’ and ‘lenny’ without having to specify the distribution explicitly.

Of course, a suitable enough reason for 8.04 is the release cycle itself. Debian has an amazing framework, but releases are slow. Debian etch was initially released in 2007-04 and we’re hoping that lenny will be out this year, but we’ll see. Just yesterday I had to backport packages from lenny to etch, because each release gets security updates, not version updates, so you have to wait for the next release for version updates or go through the trouble of doing the backport yourself.

One might question why you’d backport a deb package when you can simply install the new software; the answer is configuration management. Whenever I inherit a network full of linux systems I have to try to figure out what software was installed where. There are many instances where the same software has been installed as different versions by different people over the years, and it’s difficult to tell which is being used. Packaging solves this because (slotting aside) there’s one version installed, and you can use the packaging software to tell exactly what files belong to that package and where they are.

While this may not seem of immediate benefit to a single user, it is, because it’s essential to troubleshooting user problems for those who provide support (in the case of Ubuntu, mostly for free).

While Hardy may not have any visually apparent, stunning changes, I assure you there are lots of updates behind the desktop that are well worth appreciating.

The other was “Linux Certs and the Cutting Edge”, subtitled “Some certifications seem stuck in the Dark Ages. Plus, Book of the Week toes the command line.” This article goes on to talk about how “df, du, kill, ls, mv, rm, tar, umask, vi and so on” are on all the tests, and offers that it’s because of the “commonality between the distributions”, not because these are all essential utilities. Anyways, here’s my email:

CompTIA is always a terrible example of certifications because it’s so entry level. I can’t complain a whole lot, because it’s respected, and aside from questions that I consider obscure to my job roles (like fixing laser printers), the tests are pretty easy to pass.

“df, du, kill, ls, mv, rm, tar, umask, vi and so on”

These are all -essential-. I would never hire someone who couldn’t explain exactly what each of these tools does. I feel like the LPI certifications may go a little overboard in expecting you to know what certain flags do for each command, when you can always look them up in the man page. But knowing the difference between tar -z and tar -j is always a good thing.
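For what it’s worth, that particular difference is just the compression filter (the file names here are only examples):

tar -czf backup.tar.gz somedir/    # -z filters the archive through gzip
tar -cjf backup.tar.bz2 somedir/   # -j filters it through bzip2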

For example though, we have a fairly complex configuration employing debian linux both hosting vmware-server and as a guest on it, with configuration management by puppet with git, and capistrano for system administration. While someone with experience with these things is good, the following is a piece of a puppet recipe I wrote:

# set linux clock algorithm
# non rescue (single) kernel lines in grub config that don't have a clock algorithm set get set to pit
# best to run this regularly (this will run every time) so that new kernel installs get this added
# there is the edge case that a kernel is upgraded and we don't wait for puppet to run before the reboot
exec { "set-vmware-clock":
  command => "/bin/sed -ie '/clock\|single/! s/^kernel.*/& clocksource=pit/ ' /boot/grub/menu.lst",
  onlyif  => "/bin/grep '^kernel' /boot/grub/menu.lst | /bin/grep -v 'single' | /bin/grep -v 'clock'",
}

If someone can’t look at that and tell me what it does, they’re not getting hired here. They don’t need to know much about puppet; that they can figure out, and even just by looking at the recipe you get a good idea of what the puppet portion of the configuration is for. But if you’re not familiar with the standard tools, you’re not going to get much done, regardless of how much you may know about something like puppet. If you look at that and know that ‘grep’ returns matching lines of text, but don’t know that ‘-v’ makes it exclude those lines instead, you’re going to miss the point of that recipe.

The key isn’t that these tools are distribution neutral, giving you a lot of common ground. The key is that these tools are extremely powerful provided you know how to use them. The more you familiarize yourself with them, the more you can chain them together and make more powerful solutions.