running multiple X servers at once

I’m running a pretty stock ubuntu hardy install on craptastic ATI video hardware and the associated proprietary binary drivers. For some reason, whenever I run an X app over an SSH tunnel, X just crashes. It’s super frustrating to have everything all set up and have it crash, so my workaround: switch to a virtual desktop with nothing on it, run Xephyr full screen, throw fluxbox on it, grab a shell and ssh from there. It works, and I figure if it crashes it shouldn’t take down all of X.

apt-get install xserver-xephyr fluxbox
Xephyr :1 -fullscreen -ac -br
fluxbox -display :1 &

mounting my MD3000i on ubuntu hardy

Not too bad since I spent some time figuring this out on debian etch previously: 1, 2, 3.

Get the software:
apt-get install open-iscsi multipath-tools

First, make the node chat with the MD3000i (change portal IP):
iscsiadm -m discovery --type sendtargets --portal 10.0.9.10 -P 1

Then log in to the MD3000i with MDSM, add host (manual), type in the hostname, and set up some disk mappings.

Then have the node log in to the MD3000i:
iscsiadm -m node -l

Get multipath to sort out all the disks:
multipath -ll

It will print out some business like:
sdb: checker msg is "directio checker reports path is down"
sdf: checker msg is "directio checker reports path is down"
36001e4f0003968c6000012da48aac3d1dm-0 DELL ,MD3000i
[size=278G][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:0:0 sdb 8:16 [failed][faulty]
\_ round-robin 0 [prio=1][active]
\_ 2:0:0:0 sdd 8:48 [active][ready]
\_ round-robin 0 [prio=1][enabled]
\_ 3:0:0:0 sde 8:64 [active][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 4:0:0:0 sdf 8:80 [failed][faulty]
sdc: checker msg is "directio checker reports path is down"
1_dm-1 DELL ,Universal Xpor
[size=20M][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:0:31 sdc 8:32 [failed][faulty]

The third line should map to a device in /dev/mapper/ that you can do whatever you want with.

Using attach-interface with libvirt and qemu/kvm on ubuntu

Googling, asking in IRC channels, no answers to be found. It wasn’t until I finally backported libvirt-bin 0.4.4 to hardy (and /etc/init.d/libvirt-bin restart!) that I figured this out due to an updated error message. You can’t use attach-interface with qemu/kvm:

virsh # attach-interface domain bridge br0.11
libvir: QEMU error : this function is not supported by the hypervisor: only CDROM disk devices can be attached

Which is much more useful than:

virsh # attach-interface domain bridge br0
libvir: QEMU error : unknown device type

In fact, it looks like most of libvirt is xen-centric. Which is pretty annoying. ubuntu-vm-builder generates a MAC address like so:

/usr/share/ubuntu-vm-builder/vms/qemu-common:
MAC="52:54:00$(hexdump -e '/1 ":%02x"' -n 3 /dev/urandom)"

I saw a list post saying libvirt would create a MAC, and tested it. I added an interface by copying the interface block in the xml configuration file for the domain, then removing the mac address entry and updating the source bridge line to the other interface I want to bridge. I connected back to libvirt with virsh, ran ‘define /etc/libvirt/qemu/domain.xml’ and ‘start domain’. Sure enough, I had the second interface. When I halted the domain and dropped to a shell, I could see that libvirt had added a mac address line to that interface. Interestingly, it’s also in the ‘52:54:00’ OUI.
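
If you’d rather pick the MAC yourself than let libvirt do it, here is a minimal sketch of generating one in that same ‘52:54:00’ prefix. It assumes od is available (used here in place of hexdump), and the function name random_kvm_mac is made up for illustration:

```shell
# Print a random MAC address in the 52:54:00 prefix.
# random_kvm_mac is a hypothetical helper name, not part of any package.
random_kvm_mac() {
    # od prints three random bytes as hex words, e.g. " a1 b2 c3";
    # unquoted expansion splits them into $1 $2 $3.
    set -- $(od -An -N3 -tx1 /dev/urandom)
    printf '52:54:00:%s:%s:%s\n' "$1" "$2" "$3"
}
```

Calling random_kvm_mac prints one address per call, which you could drop into a mac address line in the domain xml.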

I’m a little upset about all of this. I had to hack up ubuntu-vm-builder to get bridge support for kvm/qemu. (LP:257090) And now, I can’t script virsh to configure that second interface.

I think I’m back to using a text manipulation tool to modify the xml configuration file by hand, which is just plain stupid. I’m already doing that to create my vmx files for VMware Server, and virt-manager (0.5.3-0ubuntu10) lacks the ability to add interfaces to a running or stopped qemu/kvm guest. Which the vmware GUI already supports.
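
For what it’s worth, a rough sketch of that text manipulation. It assumes the domain xml has a single closing </devices> tag on its own line, and add_bridge_interface is a hypothetical helper of mine, not anything libvirt ships. It leaves out the mac address entry so libvirt generates one, as described above:

```shell
# Read a libvirt domain xml on stdin and write it to stdout with one extra
# bridged interface inserted just before the closing </devices> tag.
# $1 is the bridge to attach to. Hypothetical helper, not part of libvirt.
add_bridge_interface() {
    awk -v bridge="$1" '
        /<\/devices>/ {
            print "    <interface type=\"bridge\">"
            print "      <source bridge=\"" bridge "\"/>"
            print "    </interface>"
        }
        { print }
    '
}

# Example use (paths are placeholders):
#   add_bridge_interface br0.11 < /etc/libvirt/qemu/domain.xml > /tmp/domain.xml
#   virsh define /tmp/domain.xml
```

It’s still editing xml with awk, so it only works as long as the file keeps the layout libvirt writes out.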

I suppose this is what I get for being an early adopter.

adding disks to a SQL 2005 Virtual Server

I already had a separate disk added to a cluster as a resource, but when I tried to change logs under default database locations, the disk wasn’t listed. Setting the path to use the disk anyway and trying to create a database revealed the answer.

You need to add the disk resource as a dependency of the SQL Server resource in Cluster Administrator. Notably, you can’t do this while the cluster is online, which is pretty stupid. Hit take offline, add it, bring it back online, and then you’ll see the additional drive letter in the server properties in SQL Management Studio when trying to change the default paths for databases or logs.

mssql 2005 cluster on md3000i san, disk unknown / unreadable

I built a sql server with three disks on an md3000i, Q: S: L:, for quorum, SQL (data) and logs. First I set up the iSCSI, then set up the cluster. You can’t convert SQL 2005 from standalone to clustered, so you need to install the cluster first, then choose the sub checkbox when installing SQL to say that it will be a cluster.

After it was all online, I spent a lot of time trying to get the second server to see the disks. They were showing up in Computer Management as “unknown / unreadable”. I couldn’t add this node to the cluster because it couldn’t see the quorum drive. I was able to add the second node to the cluster by running Cluster Administrator from the first node, where apparently it couldn’t figure out that the second node couldn’t see the quorum disk on the iSCSI SAN. Eventually I figured out to shut down the first server, and the quorum disk popped up on the second server.

I’m installing SQL on the second node now, which failed with this:

C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Files\SQLSetup0001_DB-04_Core(Local).log

Warning: Disk resource Disk S: is not Online
Please bring it Online if it is needed
Warning: Disk resource Disk L: is not Online
Please bring it Online if it is needed
Error: Action "InvokeSqlSetupDllAction" threw an exception during execution.  Error information reported during run:
Source File Name: datastore\cachedpropertycollection.cpp
Compiler Timestamp: Fri Jul 29 01:13:49 2005
Function Name: CachedPropertyCollection::findProperty
Source Line Number: 130
----------------------------------------------------------
Failed to find property "ComputerList" {"SqlComputers", "", ""} in cache
Source File Name: datastore\clusterinfocollector.cpp
Compiler Timestamp: Fri Sep 16 13:20:12 2005
Function Name: ClusterInfoCollector::collectClusterVSInfo
Source Line Number: 883
----------------------------------------------------------
Failed to detect VS info due to datastore exception.
Source File Name: datastore\clustergroupsproperties.cpp
Compiler Timestamp: Fri Jul 29 01:13:49 2005
Function Name: ClusterGroupScope.SharedDisks
Source Line Number: 56
----------------------------------------------------------
Failed to find a cluster group that owned shared disk: S:

I shut down cluster service on that node to force the other node to take over the cluster, then started cluster services again.

Running the installer again didn’t fail the second time.

RAID10 on MD3000i

UI Fail.

  1. In MDSM, Configure tab, choose Create Virtual Disks
  2. Choose Unconfigured Capacity
  3. Choose RAID1 from the pull down
  4. Select four disks, Add.
    You’ll see them populate in sets of two on the right
  5. Click calculate disk size
  6. Click next
    Continue as normal

Making Outlook 2007 quote responsibly

Somewhere along the line of HTML emails and Outlook RTF/Rich Text trash, reasonable email quoting got lost. I’ve switched outlook to default to text emails (Tools, Options, Mail Format tab), and set up nice quoting (Tools, Options, Email Options, When replying to a message: prefix each line of the original message). But replying to an html message still defaults to HTML, giving you the “blue line of evil”. Microsoft has directions to force everything to text though, as a “security feature”.

mediawiki on debian etch with AD LDAP Authentication

Installing mediawiki on debian etch against Windows Server 2003 R2 Active Directory

A braindump:

  1. Installed the ‘mediawiki’ and ‘php5-ldap’ packages.
  2. Hit ‘http://hostname/mediawiki’ and ran through the configuration
  3. mv /var/lib/mediawiki1.7/config/LocalSettings.php /var/lib/mediawiki1.7/config/
  4. Added the latest LdapAuthentication.php extension to /var/lib/mediawiki1.7/config/extensions
  5. Edited the LocalSettings.php to add:

    # Additions for LDAP Auth
    require_once 'extensions/LdapAuthentication.php';

    $wgAuth = new LdapAuthenticationPlugin();

    $wgLDAPDomainNames = array(
    'AD'
    );

    $wgLDAPServerNames = array(
    'AD' => 'dc01.example.com'
    );

    $wgLDAPSearchStrings = array(
    'AD' => 'uid=USER-NAME,dc=corp,dc=example,dc=com'
    );

    $wgLDAPEncryptionType = array(
    'AD' => 'clear'
    );

    $wgLDAPProxyAgent =  array(
    'AD' => 'CN=LDAP User,CN=Users,DC=corp,DC=example,DC=com'
    );

    $wgLDAPProxyAgentPassword = array(
    'AD' => 'password'
    );

    $wgMinimalPasswordLength = 1;

    # Prevent new user registrations except by sysops
    $wgGroupPermissions['*']['createaccount'] = false;

    Trying to log in just gave me a blank white page and no LDAP traffic on the DC. I had seen talk of configuring ssl via openldap so I wasn’t sure I had all the right packages since I lacked /etc/openldap.

  6. I installed the ‘libldap-2.3-0’ package which notably said ‘Replacing files in old package libldap2 …’ but with no change.
  7. I restarted apache2 and then on login I got an invalid password error and saw LDAP traffic.
  8. I noticed in the traffic that it was logging in as ‘uid=Username,dc=corp,dc=example,dc=com’
  9. In the changelog for 1.7f there is “Added function in for changing usernames to lowercase to fix: Extension Talk:LDAP_Authentication Username_modified_.28capital_letter.29.2C_authentication_fails (only works in versions 1.6+)” (Link was broken, added spaces). So I added ‘$wgLDAPGroupLowerCaseUsername = true;’ to LocalSettings.php. This made no difference.
  10. I tried ‘$wgLDAPLowerCaseUsername = true;’ which made no difference.
  11. I realized that it was trying to bind to ‘uid=user,dc=corp,dc=example,dc=com’ and went back to take out the SearchString entry and replaced it with:

    $wgLDAPSearchAttributes = array(
    'AD' => 'sAMAccountName'
    );

  12. I tried both SAMAccountName and uid, both showing “No Such Objects” errors in the LDAP query (via the wireshark traffic dump).
  13. Parsing through the source for LdapAuthentication.php I found the $wgLDAPDebug variable. It’s compared against a debugVal integer, so I set ‘$wgLDAPDebug = 10;’ and got a bunch of text output printed on the login page when I tried to log in.
  14. Looking at the source between ‘Connected successfully’ and ‘Entering getSearchString’ I should have been seeing ‘Lowercasing the username: $username’, but I wasn’t.
  15. I stared at the LdapAuthentication.php code for a while and figured out I need this instead:

    $wgLDAPLowerCaseUsername = array(
    'AD' => true
    );

  16. Finally thinking about the whole references mess, I looked for BaseDN settings in the code and added this:

    $wgLDAPBaseDNs = array(
    'AD' => 'dc=corp,dc=example,dc=com'
    );

  17. And that worked. So I flipped ‘clear’ to ‘ssl’, and all login attempts started giving me a download for an empty index.php file.
  18. I went to my CA server, grabbed the CA’s cert in DER format
  19. Ran ‘openssl x509 -inform der -in myca.cer -out myca.pem -outform pem’, put this in /etc/ssl/certs
  20. Then I added the following to /etc/ldap/ldap.conf, despite many warnings that openssl/php didn’t read this (I <3 Debian though):

    TLS_REQCERT never
    TLS_CACERT /etc/ssl/certs/myca.pem
    TLS_CACERTDIR /etc/ssl/certs

  21. Then things started basically timing out, with this sort of trash in the apache2 error.log:

    apache2: cyrus.c:468: ldap_int_sasl_open: Assertion `lc->lconn_sasl_ctx == ((void *)0)' failed.
    [Wed Aug 13 12:00:05 2008] [notice] child pid 6139 exit signal Aborted (6)

  22. I remembered that /etc/ssl/certs had lots of certificates in it, so I commented out the last line, and then everything worked!
  23. After you login you’ll get settings populated for your user in the ‘user’ table. You’ll want to grab your UID (likely 1) and run the SQL command:
    INSERT INTO `user_groups` ( `ug_user` , `ug_group` ) VALUES ( '1', 'sysop');
    You may need to add a prefix to user_groups if you set your DB up that way.
  24. I used this to basically get rid of the extra URL stuff.
  25. You’ll have no help pages, use this technique to change all of your help links to go to the mediawiki help links.

building virtual ubuntu intrepid installs with u-v-b

Currently building an intrepid install with ubuntu-vm-builder like ‘sudo ubuntu-vm-builder kvm intrepid’ will produce:

Creating /var/run/network
Reading package lists... Done
Building dependency tree... Done
E: Couldn't find package linux-virtual
Error
umount target

This is because there’s no intrepid meta package for linux-virtual, which is what u-v-b tries to install by default to get a kernel. Instead, build with ‘sudo ubuntu-vm-builder kvm intrepid --kernel-flavour server’.

It looks like the Ubuntu Kernel Team is rebuilding the linux-meta package for intrepid and hasn’t added a new linux-virtual metapackage yet. There was some discussion about this not being done yet in May.

Enterprise networking with kvm and libvirt

My debian/vmware host blades have vlans trunked in, such that host01 would have eth0.2, eth0.3, eth0.4 and so forth. Then in /etc/vmware/locations I set:

answer VNET_0_INTERFACE eth0
answer VNET_2_INTERFACE eth0.2
answer VNET_4_INTERFACE eth0.4

So that when I create a vmware guest I can choose eth0/eth1 to be bridges to the /dev/vmnet2 or so forth, and know that specific interface will be on a particular vlan. It’s pretty simple, and I like it that way.

I’m trying to convert from debian/vmware server to ubuntu/kvm, for a bunch of reasons. Mostly I don’t like the VMware server UI, although I admit I haven’t tried 2.0 yet. I know that VMware Virtual Infrastructure and friends have more advanced UIs, but of course, they cost money. I’m not crazy about the backdoor interfaces that I use to get data between a host and a guest (right now I have a host tell the guest what its hostname is, so the guest can report to iclassify what its host is, so I have this information automatically and don’t have to track it elsewhere). I want hot-add memory and migration as well, without going to ESX and whatnot.

Lately I’ve been playing with kvm, libvirt, and ubuntu-vm-builder. The interesting thing is that it all appears very user oriented. I guess that is to be expected, since most ubuntu users are individuals and not enterprises. I gave a talk last month where someone asked me if my coworkers care about puppet running on my servers, to which I replied, “What coworkers?” So I’m in a great position where I can mess about with useful, new technology without having to put together a slide deck and give a meeting to managers about why it’s a good idea. People that I work with trust that I have good ideas, and that’s enough for all of us.

Because of all this user focus, libvirt networking leaves a little to be desired. It’s defined in /etc/libvirt/qemu/networks/, with the default network in default.xml. There’s an autostart subdirectory where you can link back to other networks, which brings them up on startup. You can see them in virsh with ‘net-list --all’. I shut the default network down with ‘net-destroy default’, and saw the ‘vnet0’ interface disappear from ‘ifconfig -a’. The format of the libvirt network xml file is partially documented.

Your node/guest/domain configuration is in /etc/libvirt/qemu/domain.xml. There’s a section like this:

<interface type='network'>
<mac address='52:54:00:2a:26:25'/>
<source network='default'/>
</interface>

Which creates a NAT’d interface based on the networks/default.xml configuration. I tried changing the interface type to bridge and the source network to eth0, and got: “libvir: QEMU error : Failed to add tap interface ‘vnet%d’ to bridge ‘eth0’ : Operation not supported”.

Changing interface type to bridge and the source to bridge='br0', running ‘define domain.xml’ in virsh and then ‘start domain’, produced the expected “libvir: QEMU error : Failed to add tap interface ‘vnet%d’ to bridge ‘br0’ : No such device”.

Then running ‘brctl addbr br0 ; brctl addif br0 eth0’ to produce the bridge and starting the domain again allows life to continue as expected.

Most of the HOWTOs out there for doing bridging have you create a bridge interface, bridge it to eth0, and move your IP address to it. You can create subinterfaces, like br0.4 to map to eth0.4.

You may see that if you start a second guest with the same configuration, libvirt is going to create a vnetN interface for each additional domain which it adds to the bridge. There are some notes here, but really, install the libvirt-doc package and take a look at ‘/usr/share/doc/libvirt-doc/format.html’ as it’s the best technical reference I’ve found so far.

# The primary network interface
auto eth0
iface eth0 inet manual
	up ifconfig $IFACE up

auto eth0.2
iface eth0.2 inet manual
	pre-up /sbin/vconfig add eth0 2
	up ifconfig $IFACE up

auto br0.2
iface br0.2 inet manual
	bridge_ports eth0.2
	bridge_stp off
	bridge_maxwait 0
	bridge_fd 0
	bridge_hello 0
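
The eth0.2 and br0.2 stanzas above repeat the same pattern for every vlan, so they’re easy to generate. A minimal sketch, assuming eth0 is the trunk interface as in my setup; vlan_bridge_stanza is a made-up helper name:

```shell
# Print /etc/network/interfaces stanzas for one vlan: the tagged
# subinterface eth0.N and a bridge br0.N on top of it.
# vlan_bridge_stanza is a hypothetical helper; $1 is the vlan id.
vlan_bridge_stanza() {
    vlan="$1"
    cat <<EOF
auto eth0.$vlan
iface eth0.$vlan inet manual
	pre-up /sbin/vconfig add eth0 $vlan
	up ifconfig \$IFACE up

auto br0.$vlan
iface br0.$vlan inet manual
	bridge_ports eth0.$vlan
	bridge_stp off
	bridge_maxwait 0
	bridge_fd 0
	bridge_hello 0
EOF
}
```

Something like ‘vlan_bridge_stanza 4’ prints the stanzas for vlan 4, which you can review before appending to /etc/network/interfaces.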

Edit: You can bridge multiple interfaces by duplicating your efforts. libvirt/kvm will add an extra vnet interface for each interface element in the domain xml file, and bridge it to the bridge interface you specify. Be sure to run ‘virsh define /path/to/domain.xml’ to read in the changes.

  <devices>
    <interface type='bridge'>
      <source bridge='br0.4'/>
    </interface>
    <interface type='bridge'>
      <source bridge='br0.11'/>
    </interface>
  </devices>

building capistrano v2.2.0 for debian/ubuntu

After a script ran “gem update --system” and blew up a debian box for the nine millionth time, I’ve been on a kick to get rid of gems altogether. Here are some directions for building a deb for capistrano:

apt-get  install build-essential fakeroot ruby-pkg-tools
svn checkout svn://svn.debian.org/pkg-ruby-extras/packages-wip/capistrano/trunk/
wget http://rubyforge.org/frs/download.php/33072/capistrano-2.2.0.tgz
tar -xvzf capistrano-2.2.0.tgz
mv trunk/* capistrano-2.2.0/
cd capistrano-2.2.0/
dch -v 2.2.0
dpkg-buildpackage -rfakeroot

Or you can just grab my capistrano deb (for etch / hardy, any arch): capistrano_2.2.0_all.deb.

playing with ubuntu, kvm and ubuntu-vm-builder

OSCON 2008 turned me on to some new virtualization tools. Right now I build VMs by using a Capistrano script that connects to a vmware server, builds a vmdk, creates a custom vmx via an erb, and netboots the box for a debian install.

I’m looking at moving to having ubuntu-vm-builder create ubuntu boxes instead. So I built an ubuntu blade first. I don’t know how long it takes to build a vmware box, but it takes a bit for the preseeded debian install to run, maybe 15 minutes. The first box I built with ubuntu-vm-builder made me happy:

apt-get install ubuntu-vm-builder
time sudo ubuntu-vm-builder kvm hardy --mem 2048 --mirror http://ubuntu.widemile.com/ubuntu
[snip]
real    2m21.705s
user    0m33.254s
sys     0m22.941s

I then installed kvm (apt-get install kvm) but got this error when it tried to start the kvm init script:

FATAL: Error inserting kvm_intel (/lib/modules/2.6.20-16-generic/kernel/drivers/kvm/kvm-intel.ko): Operation not supported

LP #104297 made me check the dmesg, and I saw:

  kernel: [ 899.340000] kvm: disabled by bios

I restarted, hit bios, and enabled VT. And then on startup the kvm module was already loaded.

It took me a bit to figure out qemu though. It’s important to read both the kvm and qemu man pages. When I first started it I saw errors about the framebuffer, and realized that since I was ssh’d into a remote machine, there wasn’t one. I found the “-nographic” option, but I later confirmed that the ubuntu install doesn’t configure a serial console by default (too bad). I spent a lot of time messing around with -vnc (an example in the man page would have been awesome). Starting up qemu and switching to the console, I checked out ‘info vnc’ and it kept telling me that it wasn’t loaded. I tried ‘-vnc :0,password’ but when I ran ‘change vnc password’ in the console it would segmentation fault.

Finally I figured out to run kvm/qemu WITH ‘-vnc’ and WITHOUT ‘-nographic’. Also, the standard network format is ‘-vnc :n’ where n is an integer. The port you connect to becomes 5900+n. So if you start qemu/vnc with ‘-vnc :1’ you would connect to port 5901.
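
The display-to-port arithmetic is trivial, but here it is as a tiny shell sketch for scripting; vnc_port is a name I made up:

```shell
# Map a VNC display number (the n in '-vnc :n') to the TCP port to connect to.
# vnc_port is a hypothetical helper name.
vnc_port() {
    echo $((5900 + $1))
}
```

So ‘vnc_port 1’ prints 5901, matching the example above.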

This isn’t any sort of advanced performance benchmark, but running ‘time bzip2 -d linux-source.2.6.24.tar.gz2’ on the ubuntu kvm install took 15.967s (real) and on a debian vmware-server install it took 27.870s.

thoughts on making nagios alarms trigger a flashing red light

We’ve been joking for a while about how we need a big red flashing light to be triggered by nagios alarms. I’m not going to do it right now, but I thought about it.

There’s a debian x10 package for controlling x10 modules. Get a big rotating red light that runs on AC. Plug it into an X10 lamp module. I can’t find a “CP-290” module, which is what the x10 package calls for. You could use this Powerlinc Serial/TW523 controller. It has source available, so you could figure out the protocol if it wasn’t compatible. Then write a script and keep it running (or cron it) to do something like ‘/usr/sbin/nagios2stats -m -d NUMSVCPROB,NUMHSTPROB’ which prints out the number of service and host problems. If either of those is > 0, use the x10 module to turn the light on.
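
A rough sketch of the checking half of that script. It assumes nagios2stats in MRTG mode prints one number per line (which I believe is its default delimiter), and light_on / light_off are placeholders for whatever command ends up driving the x10 module, since I haven’t picked one:

```shell
# Read counts from stdin, one per line, and succeed (exit status 0)
# if any of them is greater than zero. has_problems is a made-up name.
has_problems() {
    while read -r count; do
        # Non-numeric lines make the test fail quietly and are skipped.
        [ "$count" -gt 0 ] 2>/dev/null && return 0
    done
    return 1
}

# Example use from cron (light_on and light_off are hypothetical commands):
#   if /usr/sbin/nagios2stats -m -d NUMSVCPROB,NUMHSTPROB | has_problems; then
#       light_on
#   else
#       light_off
#   fi
```

Keeping the check separate from the x10 command means the light logic can be tested without any hardware plugged in.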

do not use gems, or any other programming language’s package management

I recently had to explain to a math researcher why I wouldn’t deploy R packages, which are basically tarballs, directly to servers. R has a CPAN-like infrastructure too, called CRAN. Today I’m screwing around with gems.

CPAN/CRAN/gems/etc are not debs.

/usr/bin/gem:23: uninitialized constant Gem::GemRunner (NameError)

This comes from running ‘gem update --system’ on a debian or ubuntu machine. Don’t. I’ll say that again, don’t. Rails people appear fond of gems; we need to stop this nuisance in its tracks. First, go read the debian position on gems, and appreciate it. Then, go and clean out ‘/usr/local/lib/site_ruby’. Run ‘gem -v’ and make sure gem is working again.

Fortunately lenny appears to protect you against running this command, but hardy doesn’t.

Maintaining servers is much more than getting software installed. I feel like I’ve ranted about this before, so I won’t again. Please, debianize your ruby gems, don’t use gem to install them. That goes for all your other forms of distributing language specific modules too.