Setting a permanent Windows Hostname on EC2

In less than obvious fashion, Amazon EC2 Windows Server AMIs reset their hostname on startup due to the Ec2ConfigService. To disable this feature, select ‘EC2ConfigService Settings’ from the start menu, and uncheck the first checkbox under ‘Set Computer Name’

Configuration Management vs Meatcloud: 5 reasons CM wins

First, a bit of social commentary.

Sometimes we refer to the way something ought to be accomplished as the RightWay[tm], sarcastically noting that every best practice contains a certain degree of opinion. As we dedicate more time to doing something our way, we become invested in it being the RightWay, and risk becoming defensive about our choices. Adam Jacob calls this “survivorship-bias,” and I’ve spent some time listening to him think about what he feels the risks are, and considering them myself. When we make significant personal investment in a choice, it becomes a personal challenge to remain impartial about the merits of that choice over time.

I’ve previously said that Configuration Management is the act of programatically configuring your systems. Cloud computing says that building servers is undifferentiated heavy lifting; unless your service is building servers, you should really let someone else do it and focus on the product or service you’re actually trying to sell. Configuration Management is the first step in bringing this same ideology to configuring systems. We are not in the business of selling configured servers any more than we are in the business of providing coffee to our employees, we happen to do both. We build our systems to enable our business to conduct business. In my case, operations is enabling our customers to use the web product that we develop.

We are all members of the Configuration Management community, because we believe it is ultimately a better process for building systems. We naturally have different ideas about how that process should execute, which among other differentiating factors is often that “goals are different but are left unstated” in the community. The level of preference here and resulting fragmentation is not any different than holding an opinion over what open source operating system one should use for a specific task. The merits of our choices are worth discussing, but the implication that tools and libraries should all come to the same conclusions about design is like implying that the world only needs one type of hammer.

So, defining the meatcloud as the established notion that having your internet presence grow forms a direct relationship with hiring more people to rack servers, install software, and update configuration files; I asked around a little, why do we think Configuration Management is better?

  • Abstractation
  • You don’t need to be a mechanic to drive a car, should you need to be a subject matter expert on Apache to run a webserver? Infrastructure as code shows us how and the resulting communities are starting to implement this. When we spend less time getting the parts working, we can spend more time innovating better solutions with the parts.

  • Self-documenting
  • Ever rebuild a server that someone built long ago and is no longer with the organization, and find many small parts necessary to make it work that nobody bothered to write down? Placing those small changes and required files in configuration management ensures that even if the code doesn’t run flawlessly on an upgraded operating system, you know everything that went in to making it work. Since you’re configuring the system through configuration management, a lot less falls through the cracks because documentation is often an afterthought to getting the system working.

  • Consistency and Repeatability
  • What is on that system? Everything you told CM to put there. ‘Golden image‘ disk images often leave you in the aforementioned situation where you don’t know how to recreate them, but often you don’t know what else ended up there. Configuration Management allows you to build many copies of the same system easily, from scratch every time.

  • Agility
  • Did sales tell you they had fifty customers and it turned out to be five hundred? How long will it take you to build the extra servers by hand? How many extra people do you have to put into the meatcloud to get that done in time? Business requirements always change, and slow moving businesses are at a disadvantage to dealing with this. The inability to build and deploy servers fast enough should never be a reason your business fails.

  • Flexibility, or Don’t Repeat Yourself
  • Three applications on one server? Or one application on three servers? Apache doing different jobs on multiple servers? Moving applications between servers and leveraging existing infrastructure code for new projects is easy. We automate tasks that are repeatable, but often scripts are written to accomplish one repeatable task. Here we say, how can we treat configuration as sets of modular tasks that we can mix and match?

Moar unofficial chef 0.7.16.2

Well a couple people used the 0.7.16wt1 release, including Thom May who had issues with that version number. Consequently I took out the ‘wt’ from the version, but this is still an unofficial, unsupported release.

I pulled in the changes on Thom’s internal branch and grabbed some more low hanging fruit. Don’t ask what methodology I used. It’s magic.

Grab the github branch, or the debs in my launchpad ppa. I’ve only tested this on Ubuntu Karmic, as it goes.

** Bug
* [CHEF-454] - Centos4 yum provider failure
* [CHEF-616] - rake install in chef-repo breaks if there is no git origin
* [CHEF-637] - duplicate copies of FileEdit: file_edit.rb & fileedit.rb
* [CHEF-639] - git resource fails on subsequent checkouts of the same repostiry
* [CHEF-642] - Services will always issue a WARN when status is not present
* [CHEF-684] - Should be possible for roles to be created without anything in the run_list
* [CHEF-704] - Ruby block device does not have a default action
* [CHEF-722] - Link provider doesn’t understand paths requiring expansion
* [CHEF-843] - FileEdit permission issues

** Improvement
* [CHEF-598] - Upstart service provider
* [CHEF-617] - Install to chef repository on a remote machine
* [CHEF-709] - Support for backup up files in another directory than the original file
* [CHEF-752] - No way to pass parameters to useradd provider

Got recursion not available and Cisco SSL VPN

I’ve periodically been having DNS lookup issues with internal domains and isolated them to remote SSL VPN clients connecting to a Cisco ASA 5520 using the Anyconnect SSL VPN client. I eventually got frustrated and troubleshooted the issue by using the command line ‘vpn’ client to initiate a connection on a remote Ubuntu Linux machine while here in the office. nslookup would produce the error “Got recursion not available from x.x.x.x, trying next server” and dig would respond with “status: REFUSED” and “;; WARNING: recursion requested but not available”. I noticed traffic was not making it to the Windows Server 2008 DNS server by watching wireshark and enabling DNS debugging.

Having been acquired six months ago our list of internal domains increased quite a bit. I found the ’split-dns’ setting in the default group access policy set to the old list of internal domains and set this to ’split-dns none’. This resolved the issue. Apparently the client was comparing the query to its list of split-dns domains, and the match was failing so it was sending the resolver (operating system) an error message so it would go through the list of DNS servers until it tried the local server. Rather than trying to make a list of all the possible domain names in the company, I’m going to leave this off since the internal DNS servers have recursion enabled and can handle DNS lookups just fine for the remote clients.

Chef 0.7.16wt1 fork and more 0.8 alpha notes

I sent an email out to the chef list today about releasing an unofficial Chef 0.7.16 fork to tide some of us over until 0.8 ships as Opscode does not plan any more 0.7.x releases. I use this internally at $job and pushed it out to github by request. I have no plans to push another release but will if there is demand internally or externally. This includes the security patches in 0.7.16 and a couple bug fixes:

* [CHEF-706] - mount provider fails to mount samba/cifs devices (Device does not exist)
* [CHEF-634] - UI fails silently when unable to save node
* [CHEF-659] - UI expands some escaped characters from JSON, then fails to encode them again

You can grab this tree from github, or use the debs built in my launchpad PPA.

Rather than making additional blog posts as the build for Chef 0.8 alpha has progressed, I’ve been updating this gist each time I test another build cycle. My original instructions are here. There is another blog posts here and some instructions for running under screen as well.

knife, or, my tool is actually a library

The Chef site starts out with, “Chef is a systems integration framework, built to bring the benefits of configuration management to your entire infrastructure.” There is an important point hidden in that description; Chef is not a CM tool. Of course it can be used as one, and many do, but from its beginning it has been leveraged by others such as Engine Yard inside of their own infrastructure. You can safely bet it will be an integral part of the Opscode platform when released as well.

While I was off dealing with my startup’s acquisition this fall, a chef user wrote knife. It was a command line client for interacting with a chef server. An impressively simple prototype, made possible and powered by the chef libraries and API. This has happened before with chef; for instance a while ago in the 0.6 era, an OS X GUI client called Casserole was written by another chef user with an itch to scratch. However, something has happened with knife that is interesting enough I’d like to specifically point out; it got mainlined and heavily expanded.

This happened for a handful of reasons. For one, it was simply a great idea. The kind of user who would be attracted to chef as a tool is very likely to be a power user who would rather not spend their time clicking around a graphical interface. It’s much easier to script a command line tool where needed, passing data in and out for quick hacks to your infrastructure. The folks at Opscode saw this, agreed, and set out to add full functionality to it for the upcoming 0.8 release.

What I think is most important is the planning a full API from the start. From hearing Adam talk about other tools being “first-class citizens” in chef-land, and knowing his experience writing iClassify as an early open source external node tool for puppet, I know this design was intentional. Using iClassify to tell puppet about your nodes was great, but puppet assumed that you only wanted this tool to classify nodes in the way puppet thought about nodes. Consequentially, when you wanted to to use data in iClassify about your nodes to make decisions about your infrastructure on the fly, you were forced to do it in templates. This created the expected repetition of loading the iClassify library and accessing iClassify in many templates, but also required you at times to do some fancy footwork to get data between templates when you really wanted puppet to know about the data itself.

Reductive Labs recently announced a dashboard for puppet. I was hoping this meant those barriers had been removed. It certainly creates really nice graphs from your puppet report data. However from the README it looks like you’re still pushing limited data into puppet using the external node interface. Reductive is going to have to expand this interface greatly if dashboard is to have any meaningful node integration benefits that we didn’t already have two years ago with iClassify.

Just as you can see some concepts from other configuration management tools in chef, you can see parts of iClassify. It was a great start and it was disappointing that the puppet community didn’t engage it further. Perhaps it was simply before its time, but I believe it was that there were too few doors into puppet-land to let you really take advantage of and grow external tools.

I think this was the lesson that Opscode learned, and consequently chef was born with an API. With it we can accomplish nearly anything we dream up. What is most exciting about this is that we can do whatever everyone else dreams up. I can’t wait to see what that is.

Error 80070005 accessing SSIS/DTS on SQL 2008 and Server 2008

Retrieving the COM class factory for component with CLSID {BA785E28-3D7B-47AE-A4F9-4784F61B598A} failed due to the following error: 80070005. (Microsoft.SqlServer.ManagedDTS)

Trying to access SSIS (DTS) on Microsoft SQL 2008 with SSMS (SQL Server Management Studio) on Microsoft Windows Server 2008 gave the above error. Trying to create a maintenance plan provided the same error, since it uses SSIS. There were indications online that I should try running SSMS with elevated permissions using the ‘Run as administrator’ option on the context (right-click) menu, however that provided a “The parameter is incorrect” error on startup. Eventually I discovered that the disk that the SQL tools were installed on did not have default (R+X) permissions to the local users group. Once I added this group, I was able to connect to SSIS and create a maintenance plan without issue.

Scripting the root password on Ubuntu 9.10 (karmic)

On Ubuntu 9.04 (jaunty) I had been generating and setting the root password in a bootstrapping script using:

# Generated MD5 encrypted password
/usr/bin/openssl passwd -1
# Set the password
/bin/echo 'root:ENCRYPTED_PASSWORD' | /usr/sbin/chpasswd -e

With shadow 4.1.4, chpasswd now uses PAM, and has dropped the -e option used above, as well as the -c option that I’d used to generate sha512 encrypted passwords. You’ll want to use mkpasswd from the whois package (yeah, weird) for that now, such as:

mkpasswd -m sha-512 -s

The password can be presented to useradd / usermod in encrypted format, such as:

/usr/sbin/useradd -m -p 'ENCRYPTED_PASSWORD' -G admin -s /bin/bash toor

Installing Chef 0.8 alpha on Ubuntu Karmic

Theres push to get Chef 0.8 out the door because we’re all anxious for its awesome list of features and fixes, so we’re all hunkering down on fixing bugs. Scott Likens has similar notes and theres some more to be found in Dan Deleo’s 08boot bootstrap recipe. This should help get you going.

On a fresh Ubuntu Karmic install (a VM makes this easy of course):
# Add the Canonical Ubuntu ‘multiverse’ repository for Java.
sudo vi /etc/apt/sources.list # add multiverse to your ‘deb’ lines if it is not there
sudo apt-get update

# Start Chef Gem bootstrap with some notes
# note that I don’t like to install rubygems from source and use the packages instead. this adds a step or two.
sudo apt-get install ruby ruby1.8-dev libopenssl-ruby1.8 rdoc ri irb build-essential wget ssl-cert rubygems git-core -y
sudo gem sources -a http://gems.opscode.com
sudo gem sources -a http://gemcutter.org # for nanite
sudo gem install ohai chef json –no-ri –no-rdoc

We now have enough chef to bootstrap ourselves
# Create ~/chef.json:

{
  "bootstrap": {
    "chef": {
      "url_type": "http",
      "init_style": "runit",
      "path": "/srv/chef",
      "serve_path": "/srv/chef",
      "server_fqdn": "localhost"
    }
  },
  "recipes": "bootstrap::server"
}
# End of file

# Create ~/solo.rb:

file_cache_path "/tmp/chef-solo"
cookbook_path "/tmp/chef-solo/cookbooks"
# End of file

mkdir /tmp/chef-solo
cd /tmp/chef-solo
# Get kallistec’s 08boot bootstrap cookbook
git clone git://github.com/danielsdeleo/cookbooks.git
cd cookbooks
git checkout 08boot
# Bootstrap chef
sudo /var/lib/gems/1.8/bin/chef-solo -j ~/chef.json -c ~/solo.rb
# If the bootstrap hangs for more than a minute after “Installing package[couchdb] version 0.10.0-0ubuntu3″ then hit ctrl+c and run again

Now prepare to install the development versions
# install some development tools
sudo apt-get install rake librspec-ruby -y
sudo gem install cucumber merb-core nanite jeweler uuidtools
# install missing dependencies
sudo apt-get install libxml-ruby thin -y
# get chef from the repository
mkdir ~/src
cd ~/src
git clone git://github.com/opscode/chef.git
cd chef
rake install
# remove the old version of chef
sudo gem uninstall chef -v0.7.14
# patch up some runit paths
sudo sed -i s_chef-_/var/lib/gems/1.8/gems/chef-solr-0.8.0/bin/chef-_ /etc/sv/chef-solr*/run
# allow access to futon for development purposes (http://IPADDRESS:5984/_utils)
sudo sed -i ’s/;bind_address = 127.0.0.1/bind_address = 0.0.0.0/’ /etc/couchdb/local.ini
sudo apt-get install psmisc # for killall
sudo /etc/init.d/couchdb stop
sudo killall -15 couchdb # stubborn
sudo killall -15 beam.smp # yup
# shut it all down
sudo /etc/init.d/chef-solr stop
sudo /etc/init.d/chef-solr-indexer stop
sudo /etc/init.d/chef-solr-client stop
sudo /etc/init.d/chef-client stop
sudo /etc/init.d/chef-server stop
sudo killall -15 chef-server

Build some data and start up Chef
# start up the integration environment
cd ~/src/chef
sudo rake dev:features
# this will create a database
# now hit ctrl+c
sudo mv /var/lib/couchdb/0.10.0/chef_integration.couch /var/lib/couchdb/0.10.0/chef.couch
sudo chown couchdb:couchdb /var/lib/couchdb/0.10.0/chef.couch
# start it all up
sudo /etc/init.d/couchdb start
sudo /etc/init.d/rabbitmq-server start
sudo /etc/init.d/chef-solr start
sudo /etc/init.d/chef-solr-indexer start
sudo /etc/init.d/chef-server start

Start the web server
# the web server is now a separate application and uses the API to reach the server
sudo cp /tmp/chef_integration/webui.pem /etc/chef
cd ~/src/chef/chef-server-webui
sudo /var/lib/gems/1.8/bin/slice -p 4002

Using knife
From the user interface you can create a client keypair to use knife from the web interface. I recommend using ‘view source’ to copy the private key, and remember to save it without any leading whitespace and run knife like so:

OPSCODE_USER=’btm’ OPSCODE_KEY=’/home/btm/btm.key’ /var/lib/gems/1.8/bin/knife

If you can’t get it to work, you can always use the webui’s key:

sudo OPSCODE_USER=’chef-webui’ OPSCODE_KEY=’/etc/chef/webui.pem’ /var/lib/gems/1.8/bin/knife

Hopefully that is enough to get you going. Jump on #chef on irc.freenode.net or join the chef list if you have any problems. Tickets/bugs/features are tracked in JIRA, and all sorts of other useful information is in the wiki.

HP SMH On Ubuntu Karmic

I recently had to install HP’s System Management Homepage (SMH) on Ubuntu Karmic (9.10) on hardware I had never touched for Hosted Operations to monitor. The hardware wasn’t my choice, but I’m indifferent to it. The operating system is my choice. Apparently they support Debian Lenny (5.0) and Ubuntu Jaunty (9.04), but ours was too new. However, while I commend them for building debs, they’re a little sketchy and broken. Granted, I wasn’t deploying to a supported release, but nonetheless. Here’s a link to download options for the DL360 G6, that may never work because the HP site isn’t meant to be linked to.

Downloading the provided Ubuntu Jaunty iso and mounting it produced a standard debian repository tree for both lenny and jaunty.
sudo mount -o loop HP_ProLiant_Value_Add_Software-8.25-19-12.iso /mnt

I added these packages to our local respository, but you can copy them to every server and install them by hand using ‘dpkg -i DEB’ instead of ‘apt-get install PACKAGE’. You’ll end up installing all of them really. The HP SMH package is mostly an apache fork and a ton of included/vendored libraries.

You’ll log in to HP SMH on port 2381 over HTTPS. As usual, if you get a data stream, you are likely connecting over HTTP by accident. By default a user must be in the ‘root’ group. You can use ‘vigr’ to add another user to the root group as you usually don’t have a root user on Ubuntu. You can edit ‘/opt/hp/hpsmh/conf/smhpd.xml’ and put another group in the ‘admin-group’ element. I put ‘domain-admins’ there because we use likewise to authenticate against a windows domain. I couldn’t figure out how to have groups added via the web interface to save, but that really was a hoop anyway since I wanted to push the configuration files out via configuration management.

I don’t know if HP SMH reads snmpd.conf to figure out how to connect back to snmp locally, but I had to initially run ‘/sbin/hpsnmpconfig’ to generate a few wizardy lines in /etc/snmp/snmpd.conf’. I later pushed this out via configuration management, but if you check that script it does create an “answer file” which looked like a bunch of variables you could export before you run the script non-interactively.

HP SMH gets its information from the HP SNMP agents, so if you log in and don’t see any data, it can not contact the SNMP source. You should see a page like this. Because so many libraries are shipped in the debs rather than being required, libraries are the most common source of issues. I had to restart ‘hp-snmp-agents’ after installation, getting this error on the initial startup in ‘/var/log/hp-snmp-agents/cma.log’:

libcmacommon.so.1: cannot open shared object file: No such file or directory

Another way to say all of this is via my chef recipe:

#
# Cookbook Name:: hpsmh
# Recipe:: default
#
# Copyright 2009, Webtrends
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Restart hp-snmp-agents later. it is buggy and has issues with its own libraries when started on package installation
service "hp-snmp-agents" do
  action :nothing
end

package "hp-health"
package "hpacucli"
package "cpqacuxe"
package "hp-snmp-agents" do
  notifies :restart, resources(:service => "hp-snmp-agents")
end
package "hp-smh-templates"
package "hpsmh"

service "hpsmhd" do
  action [ :start, :enable ]
end

service "snmpd" do
  action [ :start, :enable ]
end

remote_file "/opt/hp/hpsmh/conf/smhpd.xml" do
  source "smhpd.xml"
  owner "root"
  group "root"
  mode 0644
  notifies :restart, resources(:service => "hpsmhd")
end

remote_file "/etc/snmp/snmpd.conf" do
  source "snmpd.conf"
  owner "root"
  group "root"
  mode 0644
  notifies :restart, resources(:service => "snmpd")
end

Talent is Human

As I look back at growing up in a small town, there was a surprising lack of everyone wanting to move to the city as soon as they could. Perhaps that was because there is not a recognizible city anywhere near coastal eastern Maine. Despite, there still was a lingering belief that people were different elsewhere. Granted, they’re different, but in the same ways.

The majority of those I consider my colleagues have not worked for the same companies that I have. While our projects are of importance to our companies, it is usually our passion and not our employment that drive them. Some days I feel certain this is commonly understood, but it only takes a personal blog policy or a social media marketing drive to remind me that I’m actually isolated on an island of like-minded individuals hiding under the radar like stowaways. You can’t escape culture, but you can find different ones.

In Paul Graham’s recent essay about Apple, he markedly warns of mistreating the developers of your platform, lest they form a distaste for your brand altogether. Before I read the essay I was feeling quite sure that it was commonly understood today that developers are your greatest asset. Perhaps more valuable than even your big idea. Likely due to being mentioned by name in the essay, I was reminded of the great Google interview algorithm; commonly known for streamlining their processes at the cost of the interviewee. This seems to only alienate the prospect, unless they happen to enjoy passing tests over creating value. As the strengths of mass-collaboration become more accepted, it strikes me odd that on the whole we’re still missing that it is made up of individual human talent.

The product of our creativity is no longer hidden behind towering walls of corporations. We are global citizens innovating for the sake of it. You won’t see this on a college transcript, in ones knowledge of inodes, or in a six month product road map of release stability. The pieces are not exactly hidden either. I’m tempted to point out how slowly we’re changing by example with the United States’ difficulty transitioning from educating factory workers to innovators now that globalization has helped much of the the rest of the world catch up as industrial nations. However I can’t help but remember that we’ve gotten this far on our own.

Despite reminding us that we are living in a small town, the murmuring you’ve heard from pundits and rabble-rousers but could not make out sounds perfectly clear here. We are not going to wait for you to get it. The catch is that we no longer need to move to the city, because we’re building it every day. Coming?

libvirt + kvm TLS authentication on Ubuntu karmic

I have a number of Windows Server 2008 hosts running under KVM in a remote datacenter and using virt-manager to access libvirt+kvm over SSH for a remote console was disappointly slow, so I set out to try libvirt+kvm over SSL/TLS for comparison. In the process of which I had to upgrade virt-manager to 0.8.0 on my workstation to remove a VNC lag issue in the viewer built into virt-manager on karmic. In the end, I’m quite happy with the end result.

Creating Certificates

The available documentation for configuring TLS authentication for libvirt is a little daunting. My chosen references were these documents for libvirtd and virt-manager.

First create two certificates, one with the hostname for your server (SERVER.EXAMPLE.ORG below) and one for your workstation (CLIENT.EXAMPLE.ORG below), setting the fully qualified domain name (FQDN, the hostname including the domain name) as the Common Name, or CN, when prompted.
openssl genrsa -des3 -out host.example.org.tmp
openssl rsa -in host.example.org.tmp -out host.example.org.key
rm host.example.org.tmp
openssl req -new -key host.example.org.key -out host.example.org.csr

Providing the CSR to your local CA should present you with a signed certificate (host.example.com.cer). Be sure it is in Base64 format (text) and not DER (binary) if interacting with a Microsoft CA. If you are unfamiliar with this process, you’ll want to go and read up a bit first, it’s a useful hoop to learn to jump through.

Installing Certificates

# Libvirt Server
mkdir -p /etc/pki/CA
mkdir -p /etc/pki/libvirt/private
mkdir -p /etc/pki/libvirt-vnc
cp CA.EXAMPLE.ORG /etc/pki/CA/cacert.pem
cp CA.EXAMPLE.ORG /etc/pki/libvirt-vnc/ca-cert.pem
cp SERVER.EXAMPLE.ORG.key /etc/pki/libvirt/private/serverkey.pem
cp SERVER.EXAMPLE.ORG.key /etc/pki/libvirt-vnc/server-key.pem
cp SERVER.EXAMPLE.ORG.cer /etc/pki/libvirt/servercert.pem
cp SERVER.EXAMPLE.ORG.cer /etc/pki/libvirt-vnc/server-cert.pem

# Virt-manager client
# notice the lack of dashes in clientcert.pem
mkdir -p /etc/pki/CA
mkdir -p /etc/pki/libvirt/private
mkdir -p /etc/pki/libvirt-vnc
cp CA.EXAMPLE.ORG /etc/pki/CA/cacert.pem
cp CA.EXAMPLE.ORG /etc/pki/libvirt-vnc/ca-cert.pem
cp CLIENT.EXAMPLE.ORG.key /etc/pki/libvirt/private/clientkey.pem
cp CLIENT.EXAMPLE.ORG.key /etc/pki/libvirt-vnc/clientkey.pem
cp CLIENT.EXAMPLE.ORG.cer /etc/pki/libvirt/clientcert.pem
cp CLIENT.EXAMPLE.ORG.cer /etc/pki/libvirt-vnc/clientcert.pem

Apparmor Fixes

If libvirtd fails to start (which it likely will without these fixes) you’ll see errors in ‘/var/log/user.log’ such as:

Nov 17 17:08:09 lasvirt01 kernel: [69476.008895] type=1503 audit(1258506489.178:77): operation=”open” pid=17104 parent=1 profile=”libvirt-600d5dae-6373-107e-5f1b-5010aff3ffed” requested_mask=”r::” denied_mask=”r::” fsuid=0 ouid=0 name=”/etc/pki/libvirt-vnc/ca-cert.pem”

You’ll need to patch up the apparmor definitions a little:

  • Due to Bug# 462000, upgrade to libvirt-bin=0.7.0-1ubuntu13.1 from karmic-proposed (unless it has made it to karmic-updates when you read this)
  • Due to Bug #484562, Add “/etc/pki/libvirt-vnc/** r,” to “/etc/apparmor.d/abstractions/libvirt-qemu”
  • run /etc/init.d/apparmor reload

Server Configuration

By default libvirt and VNC don’t allow remote connections and rely on unix sockets for connections. You’ll need to enable

  • Add “–listen” to “libvirtd_opts” in /etc/default/libvirt-bin
  • Uncomment “vnc_tls = 1″ and “vnc_tls_x509_verify = 1″ in “/etc/libvirt/qemu.conf”
  • Edit your guests (virsh edit GUEST) and add “listen=’0.0.0.0′” to the graphics element
  • run /etc/init.d/libvirt-bin restart
  • cold boot any running guests (full stop, not a restart, to pick up guest xml definition changes)

Virt-manager Configuration

Add a new connection, choosing ‘QEMU/KVM’ as the Hypervisor and ‘Remote SSL/TLS with x509 certificate’ as the Connection, with the full hostname of the remote server and choose Connect. Otherwise operate like you used to.

VNC Lag Caveat

I had an issue where the display was lagging every view seconds on the VNC session via virt-manager, but when I connected directly using virt-viewer (virt-viewer –connect qemu://HOST.EXAMPLE.ORG/system GUEST_DOMAIN) there was no lag. Interestingly, when you kept both open at the same and ran a command like ‘ls -lR /’ you could easily see the difference. To correct this, I had to backport virt-manager=0.8.0-2 from debian sid, including ‘virtinst’ and ‘libvirt’, the latter requiring a change of ‘libxen3-dev’ to ‘libxen-dev’ in the Build-Depends. This is a somewhat complicated task for those unfamiliar with debian packaging.

Troubleshooting

I found most of my apparmor related errors by running ‘tail -f /var/log/user.log’. A lot of documentation recommended uncommenting ‘vnc_listen’ in ‘/etc/libvirt/qemu.conf’ but I found that the aforementioned listen xml attribute in the guest configuration overrode that by looking at the ‘-vnc’ options passed to kvm by libvirt by running ‘ps ax’. I’ve had libvirtd on the host segfault a couple times when connecting, perusing the Changelog in the version of libvirt I backported on my desktop noted some fixes that may be relevant to that.

Opscode Cookbooks community announced

It’s no secret that I think configuration management is epic. I spent yesterday hopping around the office in excitement due to Opscode’s alpha announcement of their new Cookbooks site. This is exactly the next step needed in the CM community. Shortly thereafter, while talking with another developer about how awesome Cookbooks is, he quipped about it being second in awesome to chef itself. I argue that chef exists for cookbooks, not the other way around. It was later said Chef is the tool, cookbooks are the art”. Cookbooks is all about community.

Chef is a tool, and an open-source one, so it does have a community. A vibrant one at that; a recent ohloh factoid claims “This is one of the largest open-source teams in the world, and is in the top 2% of all project teams on Ohloh.” Cookbooks is the other way around, it is a community first and at tool second. Cookbooks has already been compared to github and launchpad (with PPAs), because like these sites it brings people with a common task together around a tool like git or ubuntu. It has been noted that every configuration management project needs something like Cookbooks, and I agree.

The community that builds around Cookbooks will be a vanguard of agile system administrators looking to reduce their undifferentiated heavy lifting. These include the people who recognized that it is their product that they are selling and no longer their infrastructure, and took up cloud computing as a result. They will soon find that the configuration management community will help them continue to spend less time reinventing the operations wheel, and more time innovating it. Cookbooks will make it easier to share infrastructure-code with like-minded individuals and is the next step beyond github and wikis for this material. These models still required concerted effort to share back with the community when you finished up your recipe, but the Cookbooks site  and it’s open API stands to change that. We are now poised to leverage the open source model to increase the rise of infrastructure as code.

Wireshark + Winpcap beta on Windows Server 2008 R2

Windows Server 2008 R2 currently requires the beta version of winpcap, and you need to run the installer in compatibility mode to install it.

  1. Download wireshark
  2. Down the winpcap 4.1 beta
  3. Right click the winpcap installer, properties, compatibility mode, set to server 2008
  4. Install wincap
  5. Install wireshark
  6. Open an administrative command prompt (right click, run as administrator)
  7. “sc start npf”
  8. Run wireshark

PXE booting Ubuntu KVM Guests off WDS

Best I can tell, there is some functionality missing from traditional etherboot images that you can install on Ubuntu via the ‘kvm-pxe’ package that prevents a KVM guest from booting off of a Windows Deployment Services (WDS) server. Perhaps UNDI or such, it isn’t clear. The guest will accept an DHCP offer, but go no further. Occasionally looking for more offers, it obviously isn’t getting the traditional DHCP options it is looking for and I’m willing to bet that WDS doesn’t recognize it as a PXE client enough to speak up for itself.

Fortunately the etherboot project is alive and well under a massive rewrite coined gPXE, and there is much exciting development going on there, like iSCSI boot support. I tried creating different gPXE ROMs from rom-o-matic for the different NICs that KVM supports and replacing the roms placed in /usr/share/kvm by the kvm-pxe package, but didn’t get very far. I wasn’t sure if I had to match up the PCI id’s that KVM produced with the options on rom-o-matic or not. The e1000 ROM hit the “Too many option ROMS” error, apparently common when your PXE ROM exceeds about 60k. You can toggle different KVM NIC models via libvirt by using ‘virsh edit guestName’ and then adding a “<model type=’e1000′/>” line in the interface section of the guest definition. change e1000 to virtio, pcnet, whatever. There’s a list on the Ubuntu wiki KVM page that may provide you with further help choosing a functional gPXE ROM.

I ended up using the gPXE ISO (gPXE:all-drivers from the first pulldown) and booting off that by placing it in /var/lib/libvirt/images and adding it as an IDE CDROM storage device to the guest. Initially it gave up far two quickly when left to its own devices but I found that accessing the command prompt and running the ‘autoboot’ commant worked for me.