Author Archives: btm

libvirt + kvm TLS authentication on Ubuntu karmic

I have a number of Windows Server 2008 hosts running under KVM in a remote datacenter, and using virt-manager to access libvirt+kvm over SSH for a remote console was disappointingly slow, so I set out to try libvirt+kvm over SSL/TLS for comparison. In the process I had to upgrade virt-manager to 0.8.0 on my workstation to remove a VNC lag issue in the viewer built into virt-manager on karmic. In the end, I'm quite happy with the result.

Creating Certificates

The available documentation for configuring TLS authentication for libvirt is a little daunting. My chosen references were these documents for libvirtd and virt-manager.

First create two certificates, one for your server (SERVER.EXAMPLE.ORG below) and one for your workstation (CLIENT.EXAMPLE.ORG below), setting the fully qualified domain name (FQDN, the hostname including the domain name) as the Common Name, or CN, when prompted. Repeat these commands for each host:
openssl genrsa -des3 -out host.example.org.tmp
openssl rsa -in host.example.org.tmp -out host.example.org.key
rm host.example.org.tmp
openssl req -new -key host.example.org.key -out host.example.org.csr

Providing the CSR to your local CA should present you with a signed certificate (host.example.org.cer). Be sure it is in Base64 format (text) and not DER (binary) if interacting with a Microsoft CA. If you are unfamiliar with this process, you'll want to go and read up a bit first; it's a useful hoop to learn to jump through.
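If you run a simple OpenSSL CA yourself rather than a Microsoft CA, signing the request might look roughly like this (the CA key and certificate filenames here are assumptions):

openssl x509 -req -in host.example.org.csr -CA cacert.pem -CAkey cakey.pem \
  -CAcreateserial -out host.example.org.cer -days 365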

Installing Certificates

# Libvirt Server
mkdir -p /etc/pki/CA
mkdir -p /etc/pki/libvirt/private
mkdir -p /etc/pki/libvirt-vnc
cp CA.EXAMPLE.ORG /etc/pki/CA/cacert.pem
cp CA.EXAMPLE.ORG /etc/pki/libvirt-vnc/ca-cert.pem
cp SERVER.EXAMPLE.ORG.key /etc/pki/libvirt/private/serverkey.pem
cp SERVER.EXAMPLE.ORG.key /etc/pki/libvirt-vnc/server-key.pem
cp SERVER.EXAMPLE.ORG.cer /etc/pki/libvirt/servercert.pem
cp SERVER.EXAMPLE.ORG.cer /etc/pki/libvirt-vnc/server-cert.pem

# Virt-manager client
# notice the lack of dashes in clientcert.pem
mkdir -p /etc/pki/CA
mkdir -p /etc/pki/libvirt/private
mkdir -p /etc/pki/libvirt-vnc
cp CA.EXAMPLE.ORG /etc/pki/CA/cacert.pem
cp CA.EXAMPLE.ORG /etc/pki/libvirt-vnc/ca-cert.pem
cp CLIENT.EXAMPLE.ORG.key /etc/pki/libvirt/private/clientkey.pem
cp CLIENT.EXAMPLE.ORG.key /etc/pki/libvirt-vnc/clientkey.pem
cp CLIENT.EXAMPLE.ORG.cer /etc/pki/libvirt/clientcert.pem
cp CLIENT.EXAMPLE.ORG.cer /etc/pki/libvirt-vnc/clientcert.pem
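On either machine, it's worth a quick sanity check that the certificates verify against the CA certificate you just installed, for example on the server:

openssl verify -CAfile /etc/pki/CA/cacert.pem /etc/pki/libvirt/servercert.pem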

Apparmor Fixes

If libvirtd fails to start (which it likely will without these fixes) you’ll see errors in ‘/var/log/user.log’ such as:

Nov 17 17:08:09 lasvirt01 kernel: [69476.008895] type=1503 audit(1258506489.178:77): operation="open" pid=17104 parent=1 profile="libvirt-600d5dae-6373-107e-5f1b-5010aff3ffed" requested_mask="r::" denied_mask="r::" fsuid=0 ouid=0 name="/etc/pki/libvirt-vnc/ca-cert.pem"

You’ll need to patch up the apparmor definitions a little:

  • Due to Bug #462000, upgrade to libvirt-bin=0.7.0-1ubuntu13.1 from karmic-proposed (unless it has made it to karmic-updates when you read this)
  • Due to Bug #484562, Add “/etc/pki/libvirt-vnc/** r,” to “/etc/apparmor.d/abstractions/libvirt-qemu”
  • run /etc/init.d/apparmor reload

Server Configuration

By default libvirt and VNC don't allow remote connections and rely on unix sockets for connections. You'll need to enable remote TCP connections with TLS:

  • Add “--listen” to “libvirtd_opts” in /etc/default/libvirt-bin
  • Uncomment “vnc_tls = 1” and “vnc_tls_x509_verify = 1” in “/etc/libvirt/qemu.conf”
  • Edit your guests (virsh edit GUEST) and add “listen='0.0.0.0'” to the graphics element (see the example after this list)
  • run /etc/init.d/libvirt-bin restart
  • cold boot any running guests (full stop, not a restart, to pick up guest xml definition changes)
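For reference, the resulting graphics element in the guest definition looks roughly like this (the port and autoport values shown here are just the common defaults):

<graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'/>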

Virt-manager Configuration

Add a new connection, choosing 'QEMU/KVM' as the Hypervisor and 'Remote SSL/TLS with x509 certificate' as the Connection, enter the full hostname of the remote server, and choose Connect. Otherwise operate like you used to.
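If the connection fails, you can rule out virt-manager by testing the TLS transport directly with virsh from your workstation; this assumes the default system URI, and the bare qemu:// scheme uses TLS for remote hosts:

virsh -c qemu://SERVER.EXAMPLE.ORG/system list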

VNC Lag Caveat

I had an issue where the display was lagging every few seconds on the VNC session via virt-manager, but when I connected directly using virt-viewer (virt-viewer --connect qemu://HOST.EXAMPLE.ORG/system GUEST_DOMAIN) there was no lag. Interestingly, when you kept both open at the same time and ran a command like 'ls -lR /' you could easily see the difference. To correct this, I had to backport virt-manager=0.8.0-2 from debian sid, including 'virtinst' and 'libvirt', the latter requiring a change of 'libxen3-dev' to 'libxen-dev' in the Build-Depends. This is a somewhat complicated task for those unfamiliar with debian packaging.

Troubleshooting

I found most of my apparmor related errors by running 'tail -f /var/log/user.log'. A lot of documentation recommends uncommenting 'vnc_listen' in '/etc/libvirt/qemu.conf', but by looking at the '-vnc' options passed to kvm by libvirt (via 'ps ax') I found that the aforementioned listen xml attribute in the guest configuration overrode that setting. I've had libvirtd on the host segfault a couple of times when connecting; perusing the Changelog in the version of libvirt I backported on my desktop turned up some fixes that may be relevant to that.

Opscode Cookbooks community announced

It's no secret that I think configuration management is epic. I spent yesterday hopping around the office in excitement due to Opscode's alpha announcement of their new Cookbooks site. This is exactly the next step needed in the CM community. Shortly thereafter, while talking with another developer about how awesome Cookbooks is, he quipped that it was second in awesome only to chef itself. I argue that chef exists for cookbooks, not the other way around. It was later said that “Chef is the tool, cookbooks are the art”. Cookbooks is all about community.

Chef is a tool, and an open-source one, so it does have a community. A vibrant one at that; a recent ohloh factoid claims “This is one of the largest open-source teams in the world, and is in the top 2% of all project teams on Ohloh.” Cookbooks is the other way around: it is a community first and a tool second. Cookbooks has already been compared to github and launchpad (with PPAs), because like those sites it brings people with a common task together around a tool like git or ubuntu. It has been noted that every configuration management project needs something like Cookbooks, and I agree.

The community that builds around Cookbooks will be a vanguard of agile system administrators looking to reduce their undifferentiated heavy lifting. These include the people who recognized that it is their product that they are selling and no longer their infrastructure, and took up cloud computing as a result. They will soon find that the configuration management community will help them continue to spend less time reinventing the operations wheel, and more time innovating it. Cookbooks will make it easier to share infrastructure-code with like-minded individuals and is the next step beyond github and wikis for this material. Those models still required concerted effort to share back with the community when you finished up your recipe, but the Cookbooks site and its open API stand to change that. We are now poised to leverage the open source model to accelerate the rise of infrastructure as code.

Wireshark + Winpcap beta on Windows Server 2008 R2

Windows Server 2008 R2 currently requires the beta version of winpcap, and you need to run the installer in compatibility mode to install it.

  1. Download wireshark
  2. Download the winpcap 4.1 beta
  3. Right click the winpcap installer, properties, compatibility mode, set to Server 2008
  4. Install winpcap
  5. Install wireshark
  6. Open an administrative command prompt (right click, run as administrator)
  7. “sc start npf”
  8. Run wireshark

PXE booting Ubuntu KVM Guests off WDS

Best I can tell, there is some functionality missing from the traditional etherboot images that you can install on Ubuntu via the 'kvm-pxe' package that prevents a KVM guest from booting off of a Windows Deployment Services (WDS) server. Perhaps UNDI or such; it isn't clear. The guest will accept a DHCP offer but go no further, occasionally looking for more offers. It obviously isn't getting the DHCP options it is looking for, and I'm willing to bet that WDS doesn't recognize it as a PXE client enough to speak up for itself.

Fortunately the etherboot project is alive and well under a massive rewrite coined gPXE, and there is much exciting development going on there, like iSCSI boot support. I tried creating different gPXE ROMs from rom-o-matic for the different NICs that KVM supports and replacing the ROMs placed in /usr/share/kvm by the kvm-pxe package, but didn't get very far. I wasn't sure if I had to match up the PCI IDs that KVM produced with the options on rom-o-matic or not. The e1000 ROM hit the “Too many option ROMs” error, apparently common when your PXE ROM exceeds about 60k. You can toggle different KVM NIC models via libvirt by using 'virsh edit guestName' and then adding a "<model type='e1000'/>" line in the interface section of the guest definition (see the example below); change e1000 to virtio, pcnet, whatever. There's a list on the Ubuntu wiki KVM page that may provide you with further help choosing a functional gPXE ROM.
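For example, a bridged interface using the e1000 model would look something like this (the bridge name is an assumption):

<interface type='bridge'>
  <source bridge='br0'/>
  <model type='e1000'/>
</interface>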

I ended up using the gPXE ISO (gPXE:all-drivers from the first pulldown) and booting off that by placing it in /var/lib/libvirt/images and adding it as an IDE CDROM storage device to the guest. Initially it gave up far too quickly when left to its own devices, but I found that accessing the command prompt and running the 'autoboot' command worked for me.
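A minimal sketch of that CDROM device in the guest definition, assuming the ISO was saved as gpxe.iso:

<disk type='file' device='cdrom'>
  <source file='/var/lib/libvirt/images/gpxe.iso'/>
  <target dev='hdc' bus='ide'/>
  <readonly/>
</disk>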

Increasing a Win7 disk/partition under KVM

kvm-img convert small.img small.raw # this is your old image, converted to raw
kvm-img create large.raw 15G # or whatever size
losetup /dev/loop0 small.raw # attach both raw images to loop devices
losetup /dev/loop1 large.raw
dd if=/dev/loop0 of=/dev/loop1 # copy the old image onto the start of the new, larger one
losetup -d /dev/loop0
losetup -d /dev/loop1
kvm-img convert -O qcow2 large.raw large.qcow2 # convert back to qcow2

Start KVM up again with the new large image. Go to the 'Computer Management' mmc applet under 'Administrative Tools'. Choose 'Disk Management' under 'Storage'. Right click your existing volume and choose Extend. Step through the wizard. I got a message that made it appear it didn't work, but I must not have read the fine print right because it worked well enough. There's always the old small image file if you run into problems. Speaking of which, be careful not to transpose the dd 'if' and 'of' options.

Setting ‘password never expires’ programmatically in AD

Needing to set ‘Password Never Expires’ across an entire OU in Active Directory, I managed to write a powershell script to accomplish as much. Sure is nice having scripting languages on Windows machines beyond BASIC.

# Finds all user objects in the searchroot and forces the password never expires value in user account control to set
# 2009-09-04 -- Bryan McLellan <btm@loftninjas.org>

$Never_Expire=0x10000 # ADS_UF_DONT_EXPIRE_PASSWD bit in userAccountControl

$objou = New-Object System.DirectoryServices.DirectoryEntry("LDAP://ou=test,dc=example,dc=com")
$objSearcher = New-Object System.DirectoryServices.directorySearcher
$objsearcher.searchroot = $objou
$objsearcher.filter = '(&(objectCategory=User)(Objectclass=user)(!isCriticalSystemObject=*))'
$objsearcher.searchscope = "subtree"

$results = $objsearcher.findall()

foreach ($result in $results) { 
  $user = [adsi]$result.path
  $value = $user.useraccountcontrol.item(0)
  $value = $value -bor $Never_Expire
  $user.useraccountcontrol = $value
  $user.name
  $user.setinfo()
}

Using openid-ldap as an OpenID provider

The openid-ldap project provides a simple OpenID to LDAP gateway that supports Active Directory so you can leverage your existing SSO database hosted in LDAP to provide OpenID logins.

All the documentation lives in docs/README. Configuration is mostly a matter of unpacking the source into an apache hosted directory, editing ldap.php as described to contain the correct ldap URIs, and configuring apache. The LDAP configuration is relatively straightforward if you're familiar with setting up LDAP authentication elsewhere. The apache part took some tinkering for my setup.

I ran into three problems. The first was needing to modify the filter to remove '(mail=*)' since these weren't mail enabled accounts. Using ldapsearch (there's an example in the README) with my settings from ldap.php, I could see that no accounts were being returned, which is how I realized the accounts weren't mail enabled.
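For reference, a rough sketch of the sort of ldapsearch query I mean; the bind account, base DN and sAMAccountName here are placeholders, not values from my setup:

ldapsearch -x -H ldap://dc.example.com -D "binduser@example.com" -W \
  -b "dc=example,dc=com" "(&(objectClass=user)(sAMAccountName=someuser))" dn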

The next problem was that my production webservers are behind a load balancer and the configuration wants to use mod_proxy to connect back to itself, which would try to go back out the backside of the load balancer and cause all sorts of confusion. I used an internal hostname to pass the proxied requests directly back to the server. You'll see this in the attached apache configuration below.

The third was also because of the load balancer, which I discovered by turning debug to true in index.php and dumping a log file in /tmp. Part of the authentication request was going to different servers; only having a single server in this particular pool resolved that.

The test page on openid-ldap.org didn’t work for me and failed with “Authentication error; not a valid OpenID”, but logging into livejournal worked okay.

<VirtualHost *:80>
	ServerAdmin webmaster@example.org
	ServerName openid.example.org

  RewriteEngine On

  RewriteRule ^/(.*) https://openid.example.org/$1 [R,L]
</VirtualHost>

<VirtualHost *:80>
  ServerName openid

  RewriteEngine On

  RewriteCond %{HTTPS} !=on

  RewriteRule ^/(.*) https://openid.example.org/$1 [R,L]
</VirtualHost>

<VirtualHost *:443>
	ServerAdmin webmaster@example.org
	ServerName openid.example.org
  ServerAlias openid
	
	DocumentRoot /var/www/example.org/openid
	
	<Directory />
		Options FollowSymLinks
		AllowOverride None
	</Directory>
	<Directory /var/www/example.org/openid>
		Options Indexes FollowSymLinks MultiViews
		AllowOverride None
		Order allow,deny
		allow from all
	</Directory>

	ErrorLog /var/log/apache2/openid.example.org-error.log
	LogLevel warn

	CustomLog /var/log/apache2/openid.example.org-access.log combined

  <Proxy https://openid-internal.example.org/*>
    Order allow,deny
    Allow from all
  </Proxy>

  ServerSignature On
  RewriteEngine On

  RewriteCond %{REQUEST_URI}      !^/(.+)\.php(.*)$
  RewriteCond %{THE_REQUEST}      ^[A-Z]{3,9}\ /([A-Za-z0-9]+)\?(.*)\ HTTP/
  RewriteRule ^/(.*)$         https://openid-internal.example.org/index.php?user=%1&%2 [P]

  RewriteCond %{REQUEST_URI}         !^/(.+)\.php(.*)$
  RewriteRule ^/([A-Za-z0-9]+)$  https://openid-internal.example.org/index.php?user=$1 [P]

</VirtualHost>

Monitoring which mysql databases are being accessed

I’m migrating a number of internal web application databases off of a mysql server and I wanted a way to see which databases are being accessed and by which hosts.

# tshark -R "mysql.opcode == 2" -e ip.src -e mysql.schema -T fields port mysql

When run on the mysql server this produces a tab-separated list of values consisting of the client ip address and the database name whenever a mysql client specifies a database. See the man page for tshark for more information.

Update:

This catches the circumstance where database is set on login as well:
# tshark -R "mysql.schema" -e ip.src -e mysql.schema -T fields port mysql

Making sense of MySQL HA options

I’ve amassed enough mysql databases that it’s time there should be some high availability. Note that this isn’t a single huge database, it’s a pile of wordpress, request-tracker, mediawiki, etc databases. Performance isn’t the goal here, it is automatic failover in case of impending doom.

I happen to have an iSCSI san, but in the efforts of simplicity I’m looking at Heartbeat+DRBD or Heartbeat+Replication.

Most tutorials, and comments from colleagues lean towards using Heartbeat+DRBD. There is good discussion of the two, and more recent followup regarding when to use DRBD. There’s a nice little table at the bottom of this page. If you dig deeper, there are respectable comments about using what’s appropriate to the situation, the exercise of which is left up to the reader.

The problem is that mysql defaults to using MyISAM as the storage engine, which lacks a transactional journal. When your primary host crashes and your secondary host comes up, unless there's a journal to replay you're just assuming everything isn't corrupt, short of some kind of thorough consistency check. Which sounds time consuming. So switch all your tables to a transactional storage engine like InnoDB?

Replication has both a slave IO and a SQL process running, which I believe avoids this: the replication slave isn't going to run an incomplete SQL statement, because if the master dies while sending it to the slave it is dropped. That leaves you possibly behind the master, but consistent.

So I’m going to try to configure heartbeat with mysql running replication between two guests. The best information I’ve found so far is from Charles Bennington. I’ll post a followup when I’m done with that project.
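For the replication half, the my.cnf changes are small; a minimal sketch (the server IDs and binary log path here are assumptions, not from my actual setup):

# master /etc/mysql/my.cnf fragment
server-id = 1
log_bin   = /var/log/mysql/mysql-bin.log

# slave /etc/mysql/my.cnf fragment
server-id = 2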

a couple notes on drbd on ubuntu

Playing with drbd8 on Ubuntu, loosely following these instructions, and I ran into a couple problems.

First, you need to use a kernel that has the drbd module, as there is no drbd8-module-source package; the -server kernel definitely has the drbd module, while -virtual did not. The instructions floating around about building the drbd module are old.

My secondary was also stuck in a connection state of “WFBitMapT”. I noticed the secondary was Ubuntu jaunty while the primary was Ubuntu intrepid. Upgrading the primary to jaunty resolved this.

I saw the error “local disk flush failed with status -95” in the logs and wasn’t entirely sure about it but eventually found an explanation that made some sense and made me not worry about it.

drbd (/etc/init.d/drbd) doesn't start on startup on its own. Most of the debugging information you're looking for is in /proc/drbd or in your syslog output in /var/log. The only trouble is deciphering what is good and what is bad.
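If you want it started at boot, registering the stock init script with update-rc.d should be enough:

sudo update-rc.d drbd defaults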

Infrastructure as a code sample

Upon returning from Open Source Bridge in Portland last week, I collected my thoughts from the convergence of configuration management developers and wrote The Configuration Management Revolution, centered around the idea that something bigger is happening than we're acknowledging.

Today Tim O'Reilly posted a blog entry about the origins of Velocity. He says “I had been thinking in the abstract about the fact that as we move to a software as a service world, one of the big changes was that applications had people “inside” of them, managing them, tuning them, and helping them respond to constantly changing conditions”, which builds on his post three years ago about operations becoming the “elephant in the room”.

That article is worth revisiting. It tails off commenting on the lack of open source deployment tools. That has definitely changed, as we have a number of open source options in the operations tool space now. O'Reilly has published a few books on operations as well, although it hasn't yet taken the step of considering operations a category in its book list.

The web is full of howtos, blog posts and assorted notes on piecing together open source software to build a server. One doesn’t have to be an expert on all of the ingredients, but rather be able to figure out how to assemble them. As time goes on, the problems of the past become easier to solve; former creative solutions become mainstream and the industry leverages those advantages. This frees up mindshare for something new. I’ll emphasize that this doesn’t mean one no longer needs to have some understanding of why the server works, but the time spent engineering that particular solution is reduced because we already have the wheel, so to speak.

Writing configuration management and thus infrastructure howtos may get one started, but it's the old way of thinking. If you can write infrastructure as code, you can share infrastructure as code. It is essential that this is achieved in a format that both promotes sharing and is relatively easy. Take the Munin and Ganglia plugin sites for instance. Munin is relatively easy to get started with and has a simple enough site for exchanging plugins. While I consider Ganglia technically superior, its community is not. I tried submitting to Ganglia's plugin site once and failed. This step has to be more than a site where files are dumped; it needs community support.

I asked Luke about this at OSBridge and he said Reductive Labs plans to have a module sharing website online soon for puppet. For now, you can find a number of puppet modules in the wiki. Opscode is on track, with their chef cookbooks available as a git repository on github, combined with a ticketing system allowing users to fork, modify and contribute changes. There's even a wiki page explaining how to leverage these.

Of course, you’ll always need experienced engineers to design and tune your infrastructure. However, the time and mindshare savings from creating a LAMP stack by setting a tag or role to ‘lamp’ is immense. As Opscode produces more open APIs between parts of their product, my mind imagines the offspring of the Chef UI and virt-manager. How long until the popup touting “New features are available for your web cluster”?

The Configuration Management Revolution

The revolution is coming, and it’s about time I wrote about it.

About a year and a half ago I was settling in to a new system administration job at a startup. I was told a consulting company would be coming in to bootstrap configuration management for us. I had previously glanced at cfengine out of curiosity, but ended up spending only a couple of hours looking at it. In my mind configuration management was analogous to unattended software installation, which I was definitely in support of, but had yet to perceive how it was going to change how I viewed infrastructure.

That consulting company was HJK Solutions. Some of my coworkers had previously established relationships with a couple of the partners of HJK, but I didn't know anything about them myself. I was along for the ride. They gave us a presentation where they showed iClassify and puppet working together to automate infrastructure for other clients, but it wasn't until the next meeting, where we made technical decisions about the implementation, that I really came to appreciate their insight. It is much more interesting why someone makes a choice than the choice itself, and this was the first of many opportunities I've since had to solicit the opinions of Adam Jacob.

A year of using puppet later, not only was I hooked but my excitement about the possibilities of configuration management had grown beyond what the software could do at the time. Both my excitement and frustration were apparent, and got me a sneak peek at Opscode's Chef. The design of Chef embodies “the unix way” of chaining many tools together, insofar as it allows us to take building blocks that are essentially simple on their own but that, almost behind our backs, combine into a system revolutionary enough that we almost fail to recognize its familiar pieces.

Chef is a systems integration framework, built to bring the benefits of configuration management to your entire infrastructure.

This is not an article about Chef, this is about the big picture. However, if you take enough steps back from that statement it becomes apparent that Opscode is building toward that picture. I want to share with you the excitement that short description garners inside of me.

Configuration management alone is the act of programmatically configuring your systems. Often the benefits are conveyed in support of process, but in more agile communities different advantages are touted, such as allowing one to wrangle larger numbers of servers by reducing build times in the name of vertical scalability, building more maintainable infrastructures by leveraging the self-documenting side-effect of configuration languages, and reducing administrator burnout by cutting a swath through the number of repetitive tasks one must perform. These are unarguably significant boons. Nevertheless, one does not have to look hard to find a curmudgeon reluctant to change, claiming they don't want to learn another language, that having systems run themselves will surely cause failure, or perhaps some skynet-esque doomsday scenario. History is not short of examples of luddites holding steadfast against new technology, but it is somewhat paradoxical to see this mentality held in such a technologically oriented field.

The recent Configuration Management Panel at the Open Source Bridge conference in Portland amassed many relevant core developers in one city long enough to provide a good vibe for the direction of the available tools and underscore our common charge. But the focus was more about how we will get more users of configuration management tools than why they are going to have to use them. In retrospect, perhaps I should have asked of the panel their views of how configuration management will reshape systems administration.

Configuration management is about more than automation. Some who have foreseen this have started to convey this by discussing managing infrastructures rather than systems. In analogy, the power loom, Gutenberg press, and intermodal shipping container were not merely time saving tools of automation. These inventions reshaped more than their workforce and industry, but also the global economy.

I'm fully aware of the tone set by such a call of prophecy. How will a tool that helps us configure multiple machines at once make such significant ripples in our day to day lives of the future? It will, because we will be enabled to solve new problems that we did not yet realize existed. Just as other technological advances served as catalysts for globalization and the industrial and scientific revolutions, changing how we build our information infrastructure leaves us poised for an exciting set of challenges that do not yet exist.

LSI mptlinux / mptsas 3.12.29 on ubuntu

I recently upgraded Dell OMSA to 6.0.1 on a number of Ubuntu Intrepid and Jaunty hosts using sara.dl's packages and got a warning that the mptsas driver version 3.04.07 was below the minimum supported version. The version from 'modinfo mptsas' confirmed I was on the right track looking at this driver. A quick look revealed no update in 2.6.29.4 or 2.6.30-rc8, so I went searching for the driver's source.

LSI’s site is terrible. I have Dell 1955 blades, and the Dell SAS5/iR chipsets are really LSI SAS1068s. I searched the drivers page for SAS1068 eventually and found the right download page. I grabbed the 4.18.00 archive file.

After decompressing it I found a dkms folder and an rpm. I eventually gave up looking for anything cleaner and used the rpm to build a dkms deb with the following commands:

sudo apt-get install dkms
sudo rpm -i mptlinux-4.18.00.00-1dkms.noarch.rpm --nodeps
sudo dkms mkdeb -m mptlinux -v 4.18.00.00
scp /var/lib/dkms/mptlinux/4.18.00.00/deb/mptlinux-dkms_4.18.00.00_all.deb OTHERHOST:

Then install that deb on the other host (with the LSI based chipset) and it will install the correct modules via dkms. I rebooted and used modinfo to verify that mptsas was now version 4.18.00, and 'omreport storage controller' now reports 'Ok' instead of 'Degraded' again.
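Installing on the target host is just a dpkg install; dkms then builds the module for that host's kernel as part of the package installation:

sudo dpkg -i mptlinux-dkms_4.18.00.00_all.deb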

Recovering from a Windows Server 2003 mirrored dynamic disk failure

I’m no fan of software raid. Pretty much, ever. At my last job, for whom I still consult, my predecessor was really into technology creep. All of the workstations used that awesome fake raid that is actually implemented in the mass storage driver and is therefore pretty useless and can actually reduce your paths to recovery from disk failure. I’ll leave out the list of arguments against software raid. It just simply isn’t worth it.

I showed up to a call with a server showing a 0x7b error. Of course, Microsoft has this cool feature by default where servers automatically reboot when they blue screen, so nobody knew this was the error until I showed up and tried the "don't automatically restart on BSOD" option under the F8 startup menu. I'm used to this error from moving system images between hardware, especially with virtual machines. As it turns out, the other values inside the parentheses are actually useful. If the second value inside the parentheses is 0x00000010, then you're likely dealing with a disk in a software raid mirror set (dynamic disk) that Windows has marked as failed, and thus won't start from.

The trick, which took me a while to nail down, is getting a boot.ini setup to boot from another disk. Since you can't actually access this partition even in the Recovery Console, you can't edit the boot.ini to tell it to start from the other disk. In the end, I formatted a floppy using simply 'format A:' on an XP desktop (would you believe this entire data center lacks a Windows server with a floppy drive?), then copied ntldr, ntdetect.com and boot.ini from another Server 2003 machine with the same service pack to this floppy. Then I changed the boot.ini to contain:

[boot loader]
timeout=60
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="DISK 0" /noexecute=optout /fastdetect /3GB
multi(0)disk(0)rdisk(1)partition(1)\WINDOWS="DISK 1" /noexecute=optout /fastdetect /3GB
multi(0)disk(0)rdisk(2)partition(1)\WINDOWS="DISK 2" /noexecute=optout /fastdetect /3GB
multi(0)disk(0)rdisk(3)partition(1)\WINDOWS="DISK 3" /noexecute=optout /fastdetect /3GB

If you're not familiar with this file, you may want to read about ARC paths. Remember that ntldr and ntdetect.com are hidden, system and read-only by default, although it's fine to leave those attributes unset on the copies. 'attrib -s -h -r C:\ntldr' will make the file accessible so you can copy it to a floppy. I have to assume that when you format a floppy from an NT based operating system it puts a bit of code in the boot sector to look for these files.

I then booted from the floppy, chose 'DISK 1', and the system started up fine. I then pulled the failed disk (carefully guessing which disk it was by the disk order in disk management and the scsi id jumper settings) and replaced it. In disk management, right click the good disk, "remove mirror" and choose the missing disk. Then right click again, "add mirror" and choose the new disk. Drink coffee.

It’s late and I can’t figure out how to run ‘fixboot’ and ‘fixmbr’ with a disk mirror, so I’m still using the floppy disk to boot and choose either disk to start from.

Migrating Virtual PC Windows servers to KVM

Windows XP / 2003 have always been treacherously unstable when moving them between hardware, so much so that fresh installs are wired into my head as being the only option. I’m tired of having three different virtualization platforms, and I don’t want to rebuild these machines. ‘Virtual Server 2005 R2’ is the first to go.

To deal with getting 0x000000ce / 0xce BSODs with processr.sys, set either or both of these registry values to '4' before migrating the machine, or on the first startup:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Processor\Start
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Intelppm\Start
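Saved as a .reg file, those changes would look roughly like this (my own transcription, so double check it against your registry):

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Processor]
"Start"=dword:00000004

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Intelppm]
"Start"=dword:00000004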

To avoid getting a 0x0000007b / 0x7b BSOD on startup due to the mass storage controller changing, run the registry modifications in MS KB 314082. I saved the text inside the "copy here" block to a .reg file and ran it before converting below, and it was enough.

Then use the free VMWare vCenter Converter to convert the Virtual PC image (it has to be off) to a vmware image. On the third step of the wizard, under options, I set the disk controller from automatic to ide. Copy the resulting vmdk file to your KVM host.

Use qemu-img or kvm-img to convert the disk image from vmdk to qcow2:

kvm-img convert -O qcow2 server-ide.vmdk server-ide.qcow2

You can then use KVM to run this disk image. I use libvirt, so I simply copied another libvirt xml file, removed the MAC addresses from it, removed the uuid, updated the guest name, and pointed it at this disk, specifying 'ide' as the 'bus' in the disk's target element.

It takes the keyboard and mouse a few moments to work on the console the first time; I assume the devices are being detected in the background. I had one 'PCI device' detected without a driver, and I let it stay that way. I set the resolution up by hand. On an early run I had issues with the VGA driver, but I can no longer recreate this.

If you were using a static address, you may need to follow the directions in MS KB 269155 to delete the old network interface that is now hidden or you’ll get an error about the address being in use.

Definitely leave a comment about how this works for you. It’s like playing with fire.

Here’s a libvirt xml for kicks:

<domain type='kvm'>
  <name>server</name>
  <memory>786432</memory>
  <currentMemory>786432</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='i686' machine='pc'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='file' device='disk'>
      <source file='/srv/kvm/server/server-ide.qcow2'/>
      <target dev='hda' bus='ide'/>
    </disk>
    <interface type='bridge'>
      <source bridge='br0'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' listen='127.0.0.1'/>
  </devices>
</domain>