Monthly Archives: June 2009

Infrastructure as a code sample

Upon returning from Open Source Bridge in Portland last week, I collected my thoughts from the convergence of configuration management developers and wrote The Configuration Management Revolution, centered around the idea the something bigger is happening than we’re acknowledging.

Today Tim O’Reilly posted a blog entry about the origins of Velocity. He says “I had been thinking in the abstract about the fact that as we move to a software as a service world, one of the big changes was that applications had people “inside” of them, managing them, tuning them, and helping them respond to constantly changing conditions.” which builds on his post three years ago about operations becoming the “elephant in the room”.

That article is worth revisiting. It tails off commenting on the lack of open source deployment tools. That has definitely changed, as we have a number of open source options in the operations tool space now. O’Reilly has published a few books on operations as well, although hasn’t taken the step of considering it a category in their book list yet.

The web is full of howtos, blog posts and assorted notes on piecing together open source software to build a server. One doesn’t have to be an expert on all of the ingredients, but rather be able to figure out how to assemble them. As time goes on, the problems of the past become easier to solve; former creative solutions become mainstream and the industry leverages those advantages. This frees up mindshare for something new. I’ll emphasize that this doesn’t mean one no longer needs to have some understanding of why the server works, but the time spent engineering that particular solution is reduced because we already have the wheel, so to speak.

Writing configuration management and thus infrastructure howtos may get one started, but it’s the old way of thinking. If you can write infrastructure as code, you can share infrastructure as code. It is essential that this is achieved in a format that both promotes sharing and is relatively easy. Take the Munin and Ganglia plugin sites for instance. Munin is relatively easy to get started with and has a simple enough site for exchanging plugins. While I consider Ganglia technically superior, it’s community is not. I tried submitting to Ganglia’s plugin site once and failed. This step has to be more than a site where files are dumped, it needs community support.

I asked Luke about this at OSBridge and he said Reductive Labs plans to have a module sharing website online soon for puppet. For now, you can find an number of puppet modules in the wiki. Opscode is on track, with their chef cookbooks available as a git repository on github, combined with a ticketing system allowing users to fork, modify and contribute changes. There’s even a wiki page helping to instruct how to leverage these.

Of course, you’ll always need experienced engineers to design and tune your infrastructure. However, the time and mindshare savings from creating a LAMP stack by setting a tag or role to ‘lamp’ is immense. As Opscode produces more open APIs between parts of their product, my mind imagines the offspring of the Chef UI and virt-manager. How long until the popup touting “New features are available for your web cluster”?

The Configuration Management Revolution

The revolution is coming, and it’s about time I wrote about it.

About a year and a half ago I was settling in to a new system administration job at a startup. I was told a consulting company would be coming in to bootstrap configuration management for us. I had previously glanced at cfengine out of curiosity, but ended up spending only a couple of hours looking at it. In my mind configuration management was analogous to unattended software installation, which I was definitely in support of, but had yet to perceive how it was going to change how I viewed infrastructure.

That consulting company was HJK Solutions. Some of my coworkers had previously established relationships with a couple of the partners of HJK, but I didn’t know anything about them myself. I was along for the ride. They gave us a presentation where they showed iClassify and puppet working together to automate infrastructure for other clients, but it wasn’t until the next meeting where we made technical decisions about the implementation that I really came to appreciate their insight. It is much more interesting why someone makes a choice than the choice itself, and this was my first of many since opportunities to incite the opinions of Adam Jacob.

A year of using puppet later, not only was I hooked but my excitement about the possibilities of configuration management had grown beyond what the software could do at the time. Both my excitement and frustration was apparent and got me a sneak peak at Opscode’s Chef. The design of Chef embodies “the unix way” of chaining many tools together insofar that it allows us to take building blocks that are essentially simple on their own but from behind our backs present a system that is revolutionary enough we almost fail to recognize the familiar pieces of it.

Chef is a systems integration framework, built to bring the benefits of configuration management to your entire infrastructure.

This is not an article about Chef, this is about the big picture. However, if you take enough steps back from that statement it becomes apparent that Opscode is building toward that picture. I want to share with you the excitement that short description garners inside of me.

Configuration management alone is the act of programmatically configuring your systems. Often the benefits are conveyed in support of process, but in more agile communities different advantages are touted; such as allowing one to wrangle larger number of servers by reducing build times in the name of vertical scalability, building more maintainable infrastructures by leveraging the self-documenting side-affect of configuration languages, and reducing administrator burnout by cutting a swath in the number of repetitive tasks one must perform. These are unarguably significant boons. Nevertheless, one does not have to look hard to find a curmudgeon reluctant to change, claiming they don’t want to learn another language, that having systems run themselves will surely cause failure, or perhaps some skynet-esque doomsday scenario. History is not short of examples of luddites holding steadfast against new technology, but it is somewhat paradoxical to see this mentality held in such a technologically oriented field.

The recent Configuration Management Panel at the Open Source Bridge conference in Portland amassed many relevant core developers in one city long enough to provide a good vibe for the direction of the available tools and underscore our common charge. But the focus was more about how we will get more users of configuration management tools than why they are going to have to use them. In retrospect, perhaps I should have asked of the panel their views of how configuration management will reshape systems administration.

Configuration management is about more than automation. Some who have foreseen this have started to convey this by discussing managing infrastructures rather than systems. In analogy, the power loom, Gutenberg press, and intermodal shipping container were not merely time saving tools of automation. These inventions reshaped more than their workforce and industry, but also the global economy.

I’m fully aware of the tone set by such a call of prophecy. How will a tool that helps us configure multiple machines at once make such significant ripples in our day to day lives of the future? It will because we will be enabled to solve new problems that we did not yet realize existed. As other technological advances served as a catalyst for globalization, the industrial and scientific revolutions; changing how we build our information infrastructure leaves us poised for an exciting set of challenges that do not yet exist.

LSI mptlinux / mptsas 3.12.29 on ubuntu

I recently upgraded Dell OMSA to 6.0.1 on a number of Ubuntu Intrepid and Jaunty hosts using sara.dl’s packages and got a warning that the mptsas driver version 3.04.07 was below the minimum supported version. The version from ‘modinfo mptsas’ confirmed I was on the right track looking at this driver. A quick look revealed no update in 2.6.29.4 or 2.6.30-rc8, so I went searching for the drivers source.

LSI’s site is terrible. I have Dell 1955 blades, and the Dell SAS5/iR chipsets are really LSI SAS1068s. I searched the drivers page for SAS1068 eventually and found the right download page. I grabbed the 4.18.00 archive file.

After decompressing it I found a dkms folder and rpm. I eventually gave up and used this to build a dkms deb with the following commands:

sudo apt-get install dkms
sudo rpm -i mptlinux-4.18.00.00-1dkms.noarch.rpm --no-deps
sudo dkms mkdeb -m mptlinux -v 4.18.00.00
scp /var/lib/dkms/mptlinux/4.18.00.00/deb/mptlinux-dkms_4.18.00.00_all.deb OTHERHOST:

Then install that deb on the otherhost (with the LSI based chipset) and it will install the correct modules via dkms. I rebooted and used modinfo to verify that mptsas was now version 4.18.00 and ‘omreport storage controller’ now reports ‘Ok’ instead of ‘Degraded’ again.

Recovering from a Windows Server 2003 mirrored dynamic disk failure

I’m no fan of software raid. Pretty much, ever. At my last job, for whom I still consult, my predecessor was really into technology creep. All of the workstations used that awesome fake raid that is actually implemented in the mass storage driver and is therefore pretty useless and can actually reduce your paths to recovery from disk failure. I’ll leave out the list of arguments against software raid. It just simply isn’t worth it.

I showed up to a call with a server with an 0x7b error. Of course, Microsoft has this cool feature by default where servers automatically reboot when they blue screen. So nobody knew this was the error until I showed up and tried the “don’t automatically restart on BSOD” option under the F8 startup menu. I’m used to this error from moving system images between hardware, especially with virtual machines. As it turns out, the other values inside the parenthesis are actually useful. If the second value inside the parenthesis is 0x00000010, then you’re likely dealing with a disk in a software raid mirror set (dynamic disk) that Windows has marked as failed, and thus won’t start from.

The trick, which took me a while to nail down, is getting a boot.ini setup to boot from another disk. Since you can’t actually access this partition even in the Recovery Console, you can’t edit the boot.ini to tell it to start from the other disk. In the end, I formated a floppy using simply ‘format A:’ on an XP desktop (would you believe this entire data center lacks a Windows server with a floppy drive?), then copied ntldr, ntdetect.com and boot.ini from another Server 2003 machine with the same service pack to this floppy. Then I changed the boot.ini to contain:

[boot loader]
timeout=60
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="DISK 0" /noexecute=optout /fastdetect /3GB
multi(0)disk(0)rdisk(1)partition(1)\WINDOWS="DISK 1" /noexecute=optout /fastdetect /3GB
multi(0)disk(0)rdisk(2)partition(1)\WINDOWS="DISK 2" /noexecute=optout /fastdetect /3GB
multi(0)disk(0)rdisk(3)partition(1)\WINDOWS="DISK 3" /noexecute=optout /fastdetect /3GB

If you’re not familiar with this file, you may want to read about ARC paths. Remember that ntldr and ntdetect.com are hidden, system and read-only by default, although it’s fine to leave this options unset. ‘attrib -s -h -r C:\ntldr’ will make the file accessible so you can copy it to a floppy. I have to assume when you format a floppy from an NT based operating system it puts a bit of code in the bootsector to look for these files.

I then booted from the floppy and for me I then chose ‘DISK 1’ and the system started up fine. I went pulled the failed disk (carefully guessed which disk it was by the disk order in disk management and the scsi id jumper settings) and replaced it. In disk management, right click the good disk, “remove mirror” and choose the missing disk. Then right click again, “add mirror” and choose the new disk. Drink coffee.

It’s late and I can’t figure out how to run ‘fixboot’ and ‘fixmbr’ with a disk mirror, so I’m still using the floppy disk to boot and choose either disk to start from.