Consumer hardware in enterprise environments

Sunday, 6:55AM: The CEO of my last startup, which laid off the entire development team over a year ago, calls my cellphone to report a total network outage. He is leaving on an international trip and needs to get into his email to coordinate it. Too bad they laid off everyone who could help, with prejudice.

I wrapped up my social plans and went in to the data center last night. The workstation in the lab couldn’t reach the domain controllers on another subnet, so it couldn’t get a DHCP address either. I narrowed it down to the Netgear GSM7324 L3 switch that was being used in the core. A purchase of my predecessor there, I always looked at it funny. It was the only foray by Netgear into enterprise routing that I’ve ever seen, and I didn’t care for it much. Its CLI tried to be a Cisco, but was substantially different, so it only served to confuse people with Cisco experience (the Dell gear is much closer at emulating the UI, for the record).

I dragged out a serial console to reveal:

Timebase: 33.000000 MHz, MEM: 132.000000 MHz

NetGear Boot code......

Flash size = 16MB

Testing 128MB SDRAM..............................Pass

Unknown PCI devices.
PCI devices found -
Motorola MPC8245
Select an option. If no selection in 10 seconds then
operational code will start.

1 - Start operational code.
2 - Start Boot Menu.
Select (1, 2):

Operational Code Date: Thu Aug  3 22:43:40 2006
Uncompressing.....

50%                     100%
||||||||||||||||||||||||||||||||||||||||||||||||||
Attaching interface lo0...done

Adding 36274 symbols for standalone.

Unknown box topology

This is apparently common for people who take the plunge and try to save some money over buying a tried-and-true piece of equipment for their core. Their lineup has a few newer, more expensive models. The GSM7324 still sits at the lowest price point for a Netgear L3 switch, though, luring in those who think the lower price comes with no tradeoff.

So apart came all the trunks and redundant switch links. There was enough redundancy in the physical cabling to the edge switches that I could switch to access links for each subnet. I chained all the switches back together, as they were when I first started, and set up routing on a [Juniper] Netscreen 50 instead, since it was the only alternative. Everything started coming back up as I dug through the network in search of the original static routing entries that I had never found the time to upgrade.

How important is network administration? Too important for system administrators to get away with not knowing it. A colleague was recently complaining that he couldn’t get an interviewee to answer why having two physically separate switches is better than having one. I find all of this unfortunate and trying, given that I have had to re-architect every network I’ve inherited since moving to Seattle. Sometimes I think we should go back to yelling about the 5-4-3 rule on a soapbox so we’ll at least get a sane switch topology. Hopefully by the time they realize why the 5-4-3 rule doesn’t apply anymore, they’ll have picked up why switch topology is too important to be a matter of just plugging switches together like power strips. Because that’s a great pet peeve too.

rubygems super slow

On any Debian Etch or Ubuntu Hardy box, running gem tends to do a source update (‘Bulk updating Gem source index for: http://gems.rubyforge.org/’) and go super slow, especially on low-memory virtual machines. I could put in a purchase order for another 40GB of RAM, oorrrr…. Rumors were that newer versions of rubygems were better, so I went about upgrading Hardy the Debian way. And it’s much better.

Put intrepid’s deb-src lines in your sources.list. Puppet manages mine across my network, so I use sources.list.d instead:

/etc/apt/sources.list.d/intrepid.sources.list:
deb-src http://mirrors.cat.pdx.edu/ubuntu/ intrepid main restricted universe multiverse
deb-src http://mirrors.cat.pdx.edu/ubuntu/ intrepid-updates main restricted universe multiverse
deb-src http://security.ubuntu.com/ubuntu intrepid-security main restricted universe multiverse

Then (basically)

sudo apt-get update
mkdir /tmp/rubydeb
cd /tmp/rubydeb
sudo apt-get build-dep ruby1.8 ruby1.9 rubygems
# no sudo here, so the unpacked sources stay writable by your user
apt-get source ruby1.8 ruby1.9 rubygems
(cd ruby1.8-*/ && dpkg-buildpackage -rfakeroot)
(cd ruby1.9-*/ && dpkg-buildpackage -rfakeroot)
# sudo dpkg -i the resulting appropriate debs (you need ruby1.9 to build rubygems)
(cd rubygems-*/ && dpkg-buildpackage -rfakeroot)
sudo apt-get remove libgems-ruby1.8 rubygems
# sudo dpkg -i the new rubygems deb

Everything seems good, life seems better.

BackupExec 12.0 RALUS on debian/ubuntu

A recent upgrade of BE forced out the old legacy Unix agent, which was a single binary and easy to script, making us use RALUS. Could have been worse. The manual install is a bit sketchy; someone noted that alien could convert the RPMs, so I started there. I found the RPMs in tgz files under ‘pkgs/Linux’ and ‘RALUS64/pkgs/Linux’ for both architectures. I ran alien on an x86 and an x64 box to create a full set. The debs don’t have dependency information, but I just dealt with that in puppet because it was quicker. Ignore the errors about alien not creating postrm/postinst scripts; they’re no big deal.

RALUS wants the user that connects to have beoper as a secondary group. Also you need libstdc++5 installed. Here’s a self-documenting puppet recipe, in case there are more questions:

class ralus {

  realize User["beoper"]
  realize Group["beoper"]

  package { "vrtsvxmsa":
    ensure => installed,
  }

  package { "libstdc++5":
    ensure => installed,
  }

  package { "vrtsralus":
    ensure => installed,
    require => [ Package["vrtsvxmsa"], Package["libstdc++5"] ],
  }

  exec { "ralus-init":
    command => "/bin/cp /opt/VRTSralus/bin/VRTSralus.init /etc/init.d/ralus",
    require => Package["vrtsralus"],
    onlyif => "/usr/bin/test ! -x /etc/init.d/ralus",
  }

  service { "ralus":
    ensure => running,
    enable => true,
    require => [ Exec["ralus-init"], User["beoper"] ],
  }
}

fixing OCS 2007 for LM with lcscmd

Live Meeting cannot connect to the meeting.
Wait a few moments, and then try to join the meeting again.
If you still cannot connect, contact your administrator or technical support.

See Microsoft KB #938288.

Lots of ‘lcscmd’ options (in \Program Files\Common Files\Microsoft Office Communications Server 2007) require a pool name. I eventually found it with ‘lcscmd /forest /action:checkallpoolsstate’, where I saw a line that said ‘Executing “Check Pool BLAH”‘ where BLAH was my pool name. Even easier, in the MMC applet, the first container under ‘Standard Edition Servers’ is the pool, the one that doesn’t specify the FQDN.

Get on your full server, not the edge server, and from the above directory run:

lcscmd /web /action:clearpoolexternalurls /poolname:POOLNAME
lcscmd /web /action:updatepoolurls /externalwebfqdn:conference.example.org /poolname:POOLNAME

Their example just shows ‘contoso.com’ as the FQDN. That’s a little sketchy, since it’s a domain name that might point to a web server. There’s little explanation as to which IP address (host) on the edge it should point to. I put the hostname of my web conference edge server there. After running this, the clients could connect to Live Meeting without a restart of OCS (LCS).

running god inside runit

God ignores the TERM signal. Maybe this is a Ruby thing? I’m not really going to bother looking. But when you run God inside runit, which normally sends a TERM when you tell it to stop a process, this is no good. It also looks like God ignores HUP when run as a daemon:

      def run_daemonized
        # trap and ignore SIGHUP
        Signal.trap('HUP') {}

In that case, the only way I could find to stop it is with an INT signal. But HUP will kill it when it’s run non-daemonized with the “-D” option. I guess mysql and other daemons do this sort of thing too; I got the idea from this thread. For other processes, see this archive of runit scripts for working examples. Anyway, here is my runit run script for god:

#!/bin/sh
exec 2>&1

trap 'kill -HUP %1' 1 2 13 15

/usr/bin/god -D -c /etc/god/master.god --log-level debug --no-syslog & wait
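The run script backgrounds god, traps signals 1, 2, 13 and 15 (HUP, INT, PIPE, TERM), and turns any of them into a HUP for god, which stops it when it runs with -D. Here is a rough Ruby equivalent, sketched only to make that translation explicit. `supervise` is a made-up helper, not part of god or runit, so treat it as an illustration rather than a drop-in replacement for the sh script:

```ruby
# Sketch of the run script's signal handling: run a child in the foreground
# and turn HUP/INT/PIPE/TERM (1 2 13 15 in the sh trap) into a HUP for the
# child. Per the post, HUP stops god when it runs non-daemonized (-D).
def supervise(cmd)
  child = Process.spawn(*cmd)
  %w[HUP INT PIPE TERM].each do |sig|
    Signal.trap(sig) do
      begin
        Process.kill('HUP', child)
      rescue Errno::ESRCH
        # child already exited; nothing left to signal
      end
    end
  end
  _pid, status = Process.wait2(child)
  status
end

# Usage: ruby supervise.rb /usr/bin/god -D -c /etc/god/master.god
if __FILE__ == $0 && !ARGV.empty?
  status = supervise(ARGV)
  warn "child exited: #{status.inspect}"
end
```

When runit runs `sv stop`, the TERM lands on the wrapper, and the child only ever sees the HUP it knows how to handle.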

debianizing ruby gems

The Ruby Oniguruma gem pissed me off the other day. I was trying to install the ultraviolet gem, and a compile failed while it was building dependencies. Ultraviolet depends on oniguruma. gem did its thing and started installing that, but oniguruma has extensions that must be compiled, and compiling them needs some headers from oniguruma itself.

I’m not a Debian developer, but I like debs. First, if I were doing this with debs instead of gems, a binary package compiled for my architecture would have been pulled in automatically and life would have moved on. If I really wanted to build this from source, libonig-dev would have been installed when I ran ‘apt-get build-dep liboniguruma-ruby’.

I’ve built a couple ruby debs in the past by stealing from other debs, but last night in my frustration I went out, read a bunch, chattered some, and started building more. You can find a repo here:

deb http://ubuntu.ninjr.org/ hardy ninjr
deb-src http://ubuntu.ninjr.org/ hardy ninjr

And if you so desire grab the key with:

wget http://ubuntu.ninjr.org/btm@loftninjas.org.gpg.key
sudo apt-key add btm@loftninjas.org.gpg.key
sudo apt-get update

Gunnar Wolf responded to my query about packaging with an awesome tip, libsetup-ruby1.8.

  1. apt-get install libsetup-ruby1.8 ruby-pkg-tools fakeroot dpkg-dev debhelper cdbs ruby1.8-dev
  2. Grab a .tgz or .gem
  3. untar it (or ‘gem unpack foo.gem’).
  4. cp /usr/lib/ruby/1.8/setup.rb package-1.1.1/
  5. mkdir package-1.1.1/debian
  6. cd package-1.1.1
  7. dch --create -v1.1.1-1
  8. fix your email, pick a package name (libpackage-ruby is my choice), put in ‘unstable’
  9. cd debian
  10. put this in ‘rules’:
    #!/usr/bin/make -f
    # copyright 2006 by Esteban Manchado Velázquez
    
    include /usr/share/cdbs/1/rules/simple-patchsys.mk
    include /usr/share/cdbs/1/rules/debhelper.mk
    # Ruby package with setup.rb
    include /usr/share/ruby-pkg-tools/1/class/ruby-setup-rb.mk
  11. Make a ‘control’ file like this:
    Source: libtextpow-ruby
    Section: libs
    Priority: optional
    Maintainer: Bryan McLellan 
    Build-Depends: cdbs, debhelper (>> 5.0.0), ruby-pkg-tools, ruby1.8
    Standards-Version: 3.8.0
    
    Package: libtextpow-ruby
    Architecture: all
    Depends: libtextpow-ruby1.8
    Description: a library to parse and process Textmate bundles.
     .
     This is a dummy package to install the textpow library bindings for
     the default version of Ruby.
    
    Package: libtextpow-ruby1.8
    Architecture: all
    Depends: ruby1.8, libplist-ruby, liboniguruma-ruby
    Description: a library to parse and process Textmate bundles.
    
    Package: libtextpow-ruby-doc
    Section: doc
    Architecture: all
    Description: a library to parse and process Textmate bundles.
     a library to parse and process Textmate bundles
     .
     This is the documentation package, with upstream documentation as well as
     generated rdoc.
    

    In the libpackage-ruby1.8 package stanza, change Architecture to ‘any’ if the package compiles any extensions, so your package output will correctly be architecture-specific. If the Ruby package has no docs, pull that section out.

  12. cd ..
  13. dpkg-buildpackage -rfakeroot
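The ‘all’ vs ‘any’ decision from step 11 can be checked mechanically before writing the control file. A minimal Ruby sketch, assuming native extensions are built the usual mkmf way, i.e. with an `extconf.rb` somewhere in the unpacked tree (common, but not guaranteed for every gem). `debian_architecture` is a hypothetical helper name:

```ruby
require 'find'

# Hypothetical helper: suggest the debian/control Architecture value for an
# unpacked Ruby library. Assumes any native extension ships an extconf.rb
# (mkmf convention, usually under ext/); pure-Ruby trees get 'all'.
def debian_architecture(source_dir)
  Find.find(source_dir) do |path|
    return 'any' if File.basename(path) == 'extconf.rb'
  end
  'all'
end

# Usage: ruby debarch.rb package-1.1.1
if __FILE__ == $0 && ARGV[0]
  puts "Architecture: #{debian_architecture(ARGV[0])}"
end
```

It only looks for the file, so a gem that builds its extension some other way would still need a human to say ‘any’.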

That’s about it. Contributing to debian appears difficult. I think you’ve got to know someone who knows someone to get involved. But at least this way you can start building debs.

GSM / SMS Pager?

The signal-to-noise ratio on my cellphone / PDA / smartphone is too low. I want a pager just for Nagios notifications so I can leave my phone on vibrate. Apparently pagers have disappeared. I toiled on AT&T/Cingular’s site for a while and found a few articles from years past about how pagers were going away.

So the logical conclusion? Grab another SIM card and get another device to use as a pager. I can’t find one though. I can’t believe that.

  • Rugged – Mil-spec whatever. No worries about dropping it.
  • Long battery life – Weeks to months
  • Small Form Factor – Think the old Motorola Advisors, must disappear onto a belt clip.
  • Simple UI – Getting the text messages must be a single button.
  • Sound/Vibrate – Should have a switch to go from vibrate to audible pretty easily.

I don’t really care if it has a QWERTY keyboard and can do two-way communication. I’d be okay with that. Really I just need the above features. It can have more if it so desires.

Any ideas?

munin plugins for jboss monitoring

I grabbed the tomcat plugins from Ticket #74 for munin, specifically tomcat-plugins.tar.2.gz. I then made small changes to the URL and xml lines to work with our jboss install.

-my $URL      = exists $ENV{'url'}      ? $ENV{'url'}      : "http://%s:%s\@127.0.0.1:%d/manager/status?XML=true";
+my $URL      = exists $ENV{'url'}      ? $ENV{'url'}      : "http://%s:%s\@127.0.0.1:%d/status?XML=true";
-if($xml->{'connector'}->{'http'.$PORT}->{'requestInfo'}->{'bytesSent'}) {
-    print "volume.value " . $xml->{'connector'}->{'http'.$PORT}->{'requestInfo'}->{'bytesSent'} . "\n";
+if($xml->{'connector'}->{'http-0.0.0.0-'.$PORT}->{'requestInfo'}->{'bytesSent'}) {
+    print "volume.value " . $xml->{'connector'}->{'http-0.0.0.0-'.$PORT}->{'requestInfo'}->{'bytesSent'} . "\n";

Do this for each xml entry and you’ll be all set:

$ for file in `ls` ; do ./$file ; done
accesses.value 550
free.value 201360024
used.value 317947240
max.value 1037959168
busy.value 4
idle.value 5
volume.value 4574821
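The only functional change in the diff is the connector key. A minimal Ruby sketch of the same lookup with REXML, assuming the status page marks each connector with a `name` attribute and carries `bytesSent` as an attribute of a `requestInfo` child, which is the shape the XML::Simple hash path above implies. `volume_line` is a hypothetical name, and any sample XML you feed it for testing is made up, not output from our server:

```ruby
require 'rexml/document'

# Sketch: emit the munin "volume" line for one connector from the
# status?XML=true page. The connector name is prefix + port:
#   original tomcat plugin key: 'http'.$PORT
#   our jboss install:          'http-0.0.0.0-'.$PORT
def volume_line(xml_text, port, prefix: 'http-0.0.0.0-')
  doc = REXML::Document.new(xml_text)
  info = REXML::XPath.first(doc, "//connector[@name='#{prefix}#{port}']/requestInfo")
  bytes = info && info.attributes['bytesSent']
  bytes ? "volume.value #{bytes}" : nil
end
```

Passing `prefix: 'http'` gives the original plugin’s key, so the same function covers both installs.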

how big is puppet’s envelope?

More and more I run into problems with puppet’s DSL. Today a coworker came to me with problems with a munin plugin definition we have. Normally if you want to add a munin plugin that isn’t in the standard base, you use our munin_plugin_file definition, which calls the remotefile definition that simplifies copying files via puppet, and also calls the munin_plugin definition which essentially makes the symlink to enable the plugin.

Today we wanted to do this with wildcard plugins, but more than one call to munin_plugin_file would fail, because the remotefile would get defined multiple times and puppet can’t handle that.

err: Could not retrieve catalog: Puppet::Parser::AST::Resource failed with error ArgumentError: Duplicate definition: Remotefile[munin-plugin-slapd_] is already defined in file /etc/puppet/site-modules/munin/manifests/definitions/munin_plugin_file.pp at line 10; cannot redefine at /etc/puppet/site-modules/munin/manifests/definitions/munin_plugin_file.pp:10 on node

The solution is to use puppet’s immature conditionals to test whether the resource was already defined and not redefine it.

define munin_plugin_file($plugin_config = "/etc/munin/plugins", $plugin_dir = "/usr/share/munin/plugins", $plugin) {

  # wildcard plugins share one source file, so only declare it once
  if defined(Remotefile["munin-plugin-$plugin"]) {
    debug("munin-plugin-$plugin already defined")
  } else {
    remotefile { "munin-plugin-$plugin":
      path    => "$plugin_dir/$plugin",
      module  => "munin",
      source  => "plugins/$plugin",
      owner   => root,
      group   => root,
      mode    => "755",
      require => Package["munin-node"]
    }
  }

  munin_plugin { $name:
    plugin_config => $plugin_config,
    plugin_dir    => $plugin_dir,
    plugin        => $plugin,
    require       => Remotefile["munin-plugin-$plugin"]
  }
}

Note that the debug line is there because puppet conditionals can’t have empty blocks; see bug #1109 (the tracker is down now, so I’m guessing at that link).
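Under the hood this is a uniqueness rule: a catalog holds only one resource per type and title. The toy Ruby model below is not Puppet’s code (the `Catalog` class and its methods are invented for illustration); it only shows why a second wildcard plugin trips the duplicate check on the shared `remotefile`, and how the `defined()` guard lets each `munin_plugin` instance through while declaring the file once:

```ruby
# Toy model (not Puppet internals): a catalog keyed by [type, title] that,
# like Puppet, refuses to declare the same resource twice.
class Catalog
  DuplicateDefinition = Class.new(StandardError)

  def initialize
    @resources = {}
  end

  def declared?(type, title)
    @resources.key?([type, title])
  end

  def declare(type, title, params = {})
    key = [type, title]
    raise DuplicateDefinition, "#{type}[#{title}] is already defined" if @resources.key?(key)
    @resources[key] = params
  end

  def size
    @resources.size
  end
end

# Mirrors the guarded define: one Remotefile per plugin source,
# one Munin_plugin per instance name.
def munin_plugin_file(catalog, name, plugin)
  remote = "munin-plugin-#{plugin}"
  catalog.declare('Remotefile', remote) unless catalog.declared?('Remotefile', remote)
  catalog.declare('Munin_plugin', name, plugin: plugin)
end
```

One caveat carries over to the real thing: Puppet’s `defined()` only sees resources evaluated so far, so a guard like this depends on evaluation order.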

I’m really wondering because I’ve had these sorts of problems twice now today. Normally it’s every once in a while. In shorter form:

Bryan Mclellan [10:59 AM]:
production-sites includes apache::module::php4, which includes the package, and runs apache_module. i wanted the php4-ldap package, which the php4 class installs. so I added an include for php4 in the production-sites.
but php4 also installs the apache2 php4 module, so there was a naming conflict.
so I removed the package from apache::module::php4 and added an include to php4 there, but it simply wouldn’t do the include. perhaps too many levels deep.

You have to put a lot of thought into your design if it’s going to scale. Especially when you put everything in puppet like we do. Someone told me recently that our puppet code base was much larger than most.

~/puppet$ find site-modules/ -name '*.pp' -exec cat '{}' \; | wc
4166   10820  101647
~/puppet$ find site-modules/ -name '*.erb' -exec cat '{}' \; | wc
3565   12773  112231
$ grep -R class site-modules/ | wc
152     578   12264

modules and site-modules have a lot of overlap. As others are picking up puppet, I wonder how long it takes them until they start running into this. Of course, if you avoid nesting definitions, and keep all of your classes separate, you won’t see this. But you’re doing a lot of work too.

ldap auth for request-tracker3.6 on ubuntu hardy

A while back I posted about ‘ldap auth for request-tracker3.6 on debian etch‘. I recently upgraded the old server from Debian Etch to Ubuntu Hardy; here is an update:

I’ve recently rebuilt request-tracker and there is a newer method for handling LDAP using ExternalAuth. You can find it on CPAN. I created a deb for it:

# get librt-extension-commandbymail for some dependencies
wget http://mjj29.matthew.ath.cx/debian-upload/librt-extension-commandbymail-perl/librt-extension-commandbymail-perl_0.05-1.dsc
wget http://mjj29.matthew.ath.cx/debian-upload/librt-extension-commandbymail-perl/librt-extension-commandbymail-perl_0.05.orig.tar.gz
wget http://mjj29.matthew.ath.cx/debian-upload/librt-extension-commandbymail-perl/librt-extension-commandbymail-perl_0.05-1.diff.gz
dpkg-source -x librt-extension-commandbymail-perl_0.05-1.dsc
wget http://www.cpan.org/authors/id/Z/ZO/ZORDRAK/RT-Authen-ExternalAuth-0.05.tar.gz
tar -xvzf RT-Authen-ExternalAuth-0.05.tar.gz
dh-make-perl RT-Authen-ExternalAuth-0.05
cp librt-extension-commandbymail-perl-0.05/debian/RT.pm RT-Authen-ExternalAuth-0.05/debian/
# add -Idebian to RT-Authen-ExternalAuth-0.05/debian/rules
# $(PERL) -Idebian Makefile.PL INSTALLDIRS=vendor \
cd RT-Authen-ExternalAuth-0.05/
dpkg-buildpackage -rfakeroot

Then take the examples (RT-Authen-ExternalAuth-0.05/etc/RT_SiteConfig.pm) and add them to your RT_SiteConfig.pm, like:

Set($ExternalAuthPriority, [ 'My_LDAP' ]);
Set($ExternalInfoPriority, [ 'My_LDAP' ]);
Set($ExternalServiceUsesSSLorTLS, 0);
Set($AutoCreateNonExternalUsers, 0);
Set($ExternalSettings, { # AN EXAMPLE DB SERVICE
    'My_LDAP' => { ## GENERIC SECTION
        # GREAT BIG SNIP HERE
    }
});

munin-cgi-graph with fastcgi on debian etch

We use munin a lot. Consequently, munin-graph takes more than 5 minutes every time, breaking munin-cron and losing data. Since we graph a lot more data than we normally look at (most of it only matters when we’re planning something or when something breaks), we don’t need new graphs every five minutes. So I switched from munin-graph to munin-cgi-graph, which draws graphs on demand. The basic instructions are in the munin CgiHowto. It’s pretty easy.

But each node page has a lot of graphs, so it’s annoying to wait for them all to get created. FastCGI helps, so I went about setting that up too.

First, the package ‘libapache2-mod-fastcgi’ is in non-free, so you may not find it. I used ‘libapache2-mod-fcgid’ for a bit, but since I was having trouble, I downloaded the fastcgi package from non-free and added it to the local repository. munin-cgi-graph is in /usr/lib/cgi-bin on Debian, so I added this to my apache config:

ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
<Directory /usr/lib/cgi-bin/>
AllowOverride None
SetHandler fastcgi-script
Options ExecCGI -MultiViews +SymLinksIfOwnerMatch
Order allow,deny
Allow from all
</Directory>
<Location /cgi-bin/munin-cgi-graph>
SetHandler  fastcgi-script
</Location>

I had munin-cgi-graph working if I removed the SetHandler line, but when I left it in, I’d get no graphs and a lot of errors like:

FastCGI: incomplete headers (0 bytes) received from server

[error] [client 10.0.0.60] Premature end of script headers: munin-cgi-graph

Warning: Request for graph without specifying domain. Bailing out.

I had to apply a diff to munin-cgi-graph for fastcgi support. For whatever reason I skipped this at first, perhaps assuming it had already gotten into the deb. Download the diff to your home directory, then apply it.

cd /usr/lib/cgi-bin/
cp munin-cgi-graph munin-cgi-graph.in
patch -p0 munin-cgi-graph.in ~/munin-cgi-graph_fastcgi.diff
mv munin-cgi-graph munin-cgi-graph.orig
cp munin-cgi-graph.in munin-cgi-graph

The action specified by the URL is not recognized by the wiki

A while back I set up mediawiki on Debian Etch. Recently I upgraded to MediaWiki 1.11.2-2 (Ubuntu) from 1.7 (Debian) and started having problems logging in. Clicking on “login / create account” would return to an ‘index.php’ wiki page. Going to Special:Userlogin would give me the login page, but after submitting it would say “No such action”, “The action specified by the URL is not recognized by the wiki”.

A few people have seen this in different circumstances. The most related thread I found to my problem mentions wgUsePathInfo.

I kept getting an error about a ‘redirect loop’ though, and finally realized my Apache rewrite configuration was using an absolute file path. I settled on this apache configuration combined with setting wgUsePathInfo to false in my LocalSettings.php:

RewriteEngine On
RewriteCond %{REQUEST_URI} !^/(index.php|skins|images|icons|opensearch_desc.php|api.php|~.*)
RewriteRule ^/(.*)$ /index.php?title=$1 [L]
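To sanity-check which requests that pair of directives touches, here is a small Ruby sketch that applies the same condition and substitution to a request path. It assumes `%{REQUEST_URI}` is only the path (mod_rewrite leaves the query string out of it) and that Ruby’s regex engine treats this pattern the way Apache’s PCRE does, which holds for the constructs used here. `rewrite` is a hypothetical helper:

```ruby
# Sketch of the RewriteCond/RewriteRule pair above, applied to a request path.
# Returns the rewritten target, or nil when the request is passed through.
EXCLUDED = %r{^/(index.php|skins|images|icons|opensearch_desc.php|api.php|~.*)}
RULE     = %r{^/(.*)$}

def rewrite(request_uri)
  return nil if request_uri =~ EXCLUDED # the negated RewriteCond fails
  m = RULE.match(request_uri) or return nil
  "/index.php?title=#{m[1]}"
end
```

Requests for index.php itself fall through untouched, which is what keeps the rule from looping back on its own output.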

Wrangling RT CommandByMail Error Messages

Request-tracker is more useful with the CommandByMail extension because you can interact with tickets in fewer steps. It likes to parse everything in the form ‘word: word’ as a command, though, so a line like ‘http://somewhere’ causes an error email to be generated.

When dealing with people who are emailing RT for support, you’re usually already dealing with a fairly non-tech-savvy crowd, who can be confused by the error message. I modified TakeAction.pm as follows:

diff TakeAction.pm.orig TakeAction.pm
663c663,668
<     my $ErrorsTo = RT::Interface::Email::ParseErrorsToAddressFromHead( $args{'Message'}->head );
---
>     my $ErrorsTo = '';
>     if ( defined( $RT::ErrorEmailAddress ) ) {
>       $ErrorsTo = $RT::ErrorEmailAddress;
>     } else {
>       $ErrorsTo = RT::Interface::Email::ParseErrorsToAddressFromHead( $args{'Message'}->head );
>     }

Then add the line below to your RT_SiteConfig.pm and restart RT:

Set($ErrorEmailAddress, 'noc@example.com');
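Two small Ruby sketches, for illustration only. The first is a hypothetical, simplified ‘word: value’ matcher (not CommandByMail’s actual pattern) that shows how a URL line reads as a command named `http`. The second mirrors the decision the TakeAction.pm patch makes: a configured `$ErrorEmailAddress` wins, otherwise use the address RT parses from the message header (stubbed here as a plain argument):

```ruby
# Hypothetical, simplified stand-in for CommandByMail's "word: value" parsing.
COMMAND_LINE = /\A(\w+):\s*(.*)\z/

def parse_command(line)
  m = COMMAND_LINE.match(line.strip) or return nil
  [m[1], m[2]]
end

# Same choice as the patch: prefer the site-wide address when it is defined.
def errors_to(error_email_address, parsed_from_head)
  error_email_address.nil? ? parsed_from_head : error_email_address
end
```

With the override set, a customer who pastes a link gets no confusing bounce; the parse error goes to the NOC address instead.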