Knife Reporting: apt + updates

Nathan and I were discussing yesterday the lack of a good way to visualize all of the updates waiting to be installed across a server cluster. I wrote another knife script to do this, and Seth Falcon helped me clean it up.

# Knife exec script to search for and describe systems needing updates
# 2011-01-11 - Bryan McLellan <btm@loftninjas.org>

gem "net-ssh", ">= 2.0.23"
require 'net/ssh/multi'

class AptSsh < Chef::Knife::Ssh
  # Override configure_session so we can specify where to get the query
  def configure_session
    @longest = 0 # Set in Chef::Knife::Ssh.run
    q = Chef::Search::Query.new
    @action_nodes = q.search(:node, ARGV[2])[0]
    fqdns = @action_nodes.map { |item| item.fqdn }
    if fqdns.empty?
      Chef::Log.fatal("No nodes returned from search!")
      exit 10
    end
    session_from_list(fqdns)
  end

  # Run a command over the SSH session, collecting output per host
  def capture_command(command, subsession=nil)
    host_data = Hash.new { |h, k| h[k] = "" }
    subsession ||= session
    command = fixup_sudo(command)
    subsession.open_channel do |ch|
      ch.request_pty
      ch.exec command do |ch, success|
        raise ArgumentError, "Cannot execute #{command}" unless success
        ch.on_data do |ichannel, data|
          host_data[ichannel[:host]] << data
          if data =~ /^knife sudo password: /
            ichannel.send_data("#{get_password}\n")
          end
        end
      end
    end
    session.loop
    return host_data
  end
end

abort("usage: knife exec apt.knife QUERY") unless ARGV[2]
ssh = AptSsh.new
ssh.configure_session

# install apt-show-versions if it isn't installed
install_show_versions = <<EOH
if [ ! -e /usr/bin/apt-show-versions ] ; then
  echo INSTALLING APT-SHOW-VERSIONS ; sudo apt-get install apt-show-versions -y
fi
EOH
ssh.ssh_command(install_show_versions)

apt_data = ssh.capture_command('apt-show-versions -u -b')

apt_data.each do |host, data|
  puts "#{host} - #{data.count("\n")} updates, #{data.scan("-security").length} of which are security updates"
  data.each_line do |line|
    puts "  #{line}"
  end
end

# Prevents knife from trying to execute any command line arguments as additional script files, see CHEF-1973
exit 0

Given a search query, this produces output like:

$ knife exec apt.knife role:dev
webui-dev.example.org - 6 updates, 6 of which are security updates
  libc-bin/lucid-security
  libc-dev-bin/lucid-security
  libc6/lucid-security
  libc6-dev/lucid-security
  libc6-i686/lucid-security
  libc6-xen/lucid-security
monitoring-dev.example.org - 6 updates, 6 of which are security updates
  libc-bin/lucid-security
  libc-dev-bin/lucid-security
  libc6/lucid-security
  libc6-dev/lucid-security
  libc6-i686/lucid-security
  libc6-xen/lucid-security
rabbitmq-dev.example.org - 6 updates, 6 of which are security updates
  libc-bin/lucid-security
  libc-dev-bin/lucid-security
  libc6/lucid-security
  libc6-dev/lucid-security
  libc6-i686/lucid-security
  libc6-xen/lucid-security
couchdb-dev.example.org - 7 updates, 7 of which are security updates
  libc-bin/lucid-security
  libc-dev-bin/lucid-security
  libc6/lucid-security
  libc6-dev/lucid-security
  xulrunner-1.9.2/lucid-security
  xulrunner-1.9.2-dev/lucid-security
  xulrunner-dev/lucid-security

Let’s say that you didn’t want to upgrade the couch box; you could modify the search query to exclude that box and run again to confirm, then reuse that search string to trigger the update.

$ knife exec apt.knife "role:dev NOT hostname:couchdb-dev"
$ knife ssh "role:dev NOT hostname:couchdb-dev" "sudo apt-get upgrade -y"

Reporting using Chef’s Knife

We have a table in our corporate Confluence wiki that looks something like this. It was a product of a few quick notes to allow the team to build out VMs in parallel, distributed across a number of virtual hosts, without relying on luck for proper resource utilization. The number fields are the gigabytes of RAM allocated to each guest. As long as the total didn’t exceed a magic number for the entire host, we could keep building and the team remained unblocked. It got the job done, but it is no way to keep track of guests and resources. First, wikis have a tendency to get out of date and rot. It takes a fair amount of work to know what needs to be updated and to keep it that way on a daily basis. Also, tables in Confluence are not all that great; they are far from Excel. The total row contains no formula to autosum the column, and you find yourself regularly switching between editor modes depending on how you are entering data, such as by hand or using cut and paste.

So, what if your “back of the napkin” calculations could be sourced from real data? This is usually unrealistic because you don’t know what data you need until you need it, so it hasn’t been captured. But we do capture a lot of data about nodes in Chef, so it is sitting there waiting for you to have that bright idea. In this case, I wanted to reconcile the memory usage on the VM hosts. I could ssh to each host, and collect this information from libvirt by hand, and put it in a spreadsheet somewhere or add it up myself for Confluence. But what happens when a teammate builds another server tomorrow? Will they update the documentation? Is that a step we want to keep doing by hand, as we build and destroy VMs on a regular basis? Is it a step we should be doing by hand, these days?

Chef::Log.level = :fatal
printf "%-10s %-12s %-8s %s\n", "host", "guest", "MB RAM", "Run List"
search(:node, 'role:virt').each do |host|
  total_mem = 0
  host[:virtualization][:domains].each do |domain,attribs|
    begin
      guest = nodes.show(domain)
    rescue
      guest = search(:node, "hostname:#{domain}")[0]
    end
    run_list = guest ? guest.run_list : nil
    printf "%-10s %-12s %-8s %s\n", host.name, domain, attribs[:memory] / 1024, run_list
    total_mem += attribs[:memory]
  end
  printf "%-10s %-12s %-8s %s\n", host.name, "TOTAL", total_mem / 1024, ""
end

This example is a knife exec script. If you saved this to a file named virt_ram.knife, then you could run it with knife exec virt_ram.knife. While Chef has full blown APIs you can interface with, using them can raise the cost of a small project higher than it’s worth. With knife exec, small proof-of-concept projects done on the side of your desk are approachable with ease.

Let us take a moment to step through the code.

1 — Set the Chef log level to fatal to suppress warnings generated by line 7 when we look up a non-existent node.
2 — Print out a header describing the columns of data we are going to generate
3 — Search Chef for all of the nodes with the role "virt" and loop through them, naming the node object ‘host’
5 — Each virtual host object contains a hash of domains in host[:virtualization][:domains]. Step through these assigning the key to ‘domain’ and the value (another hash) to ‘attribs’
6-10 — Look to see if we have a node in Chef whose name matches the domain name in libvirt. If not, rescue and trap that failure and try to search for a node with that hostname. Your node names in Chef don’t have to be your hostnames or FQDNs. At Opscode we use short unique identifiers such as EC2 instance IDs, portions of randomly generated GUIDs, and asset tracking numbers.
11 — If we did find a matching node, get its run_list. This really explains what a host does at Opscode, as we tend to only have two or three meta roles applied to a node. Usually one represents the environment it is in, such as “prod” or “dev”, and the other is its role, like “webserver” or “couchdb”
12 — Print out the information we know about this guest
13 — Then add the memory used by that guest to the running total for the host.
15 — Finally, print out the total memory we’ve calculated for that host.
16 — Go back around and do it all again for the next host.

$ knife exec virt_ram.knife
host guest        MB RAM   Run List
vm1  rv-735a342e  2048     role[prod], role[web]
vm1  rv-8ef1f3d1  4096     role[prod], role[database]
vm1  rv-eb574386  512      role[prod], role[dns]
vm1  TOTAL        6656
vm2  rv-91ba412e  2048     role[prod], role[web]
vm2  rv-8e342d11  4096     role[prod], role[database]
vm2  rv-e3829f86  512      role[prod], role[dns]
vm2  TOTAL        6656
vm3  cobbler1     1024
vm3  rv-e3829f86  512      role[prod], role[dns]
vm3  TOTAL        1536

This data is made up, but I’ve shown on vm3 something that I found in my own infrastructure; there were guests left over from testing that weren’t named properly and never made it into the chef server. I wouldn’t know they were there if I hadn’t done an audit of the servers this way. This exemplifies the Chef philosophy that it should help you do what you want, not model what it thinks you should be doing. This isn’t a carefully engineered reporting feature built around a common practice of virtualization management. This is a script I hacked on with Dan‘s endless helpful guidance while I was waiting for an rsync to finish. I know others have written similar scripts to reconcile EC2 instances by comparing Chef and EC2 via Fog.

I love it. Do you have some spare time? What do you need? Chef will get you there.

Knife one-liners

Knife’s exec sub-command makes it easier to interact with a Chef server from the command line. Let’s assume I’ve created a data bag named cluster as follows:

{
  "id": "www1",
  "cats": "lol",
  "hostname": "www1.example.org"
}
{
  "id": "www2",
  "cats": "lol",
  "hostname": "www2.example.org"
}
{
  "id": "www3",
  "cats": "plz",
  "hostname": "www3.example.org"
}
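
If you want to follow along, each item can go in its own JSON file and be loaded with knife’s data bag subcommands; the file names here are my own invention:

$ knife data bag create cluster
$ knife data bag from file cluster www1.json
$ knife data bag from file cluster www2.json
$ knife data bag from file cluster www3.json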

If I wanted to get a list of hostnames for each data bag item where the value of ‘cats’ is ‘lol’, I would run:

$ knife exec -E "search(:cluster, 'cats:lol').each {|host| puts host[:hostname] }"
www2.example.org
www1.example.org

Granted, I could get this data from the search sub-command as well:

$ knife search cluster cats:lol
{
  "start": 0,
  "total": 2,
  "rows": [
    {
      "id": "www2",
      "cats": "lol",
      "hostname": "www2.example.org"
    },
    {
      "id": "www1",
      "cats": "lol",
      "hostname": "www1.example.org"
    }
  ]
}

However, it is hard to manipulate the result of this data. For instance, if I wanted to check the status of ntp on each of these nodes as a “one-off command”, I could run:

$ knife ssh -m \
"`knife exec -E "search(:cluster, 'cats:lol').each {|host| puts host[:hostname] }" | xargs`" \
'/etc/init.d/ntp status'
www1.example.org  * NTP server is running
www2.example.org  * NTP server is running

The quoting can get pretty tricky fast. Instead, if you leave off the -E flag to knife exec, you can pass a script file to knife where you can write clearer scripts, which makes it easier to do more.

# Script contents
$ cat /tmp/knife.exec
targets = Array.new
search(:cluster, 'cats:lol').each do |host| 
  targets << host[:hostname]
end
puts targets.join(' ')

# Execute the script
$ knife exec /tmp/knife.exec 
www2.example.org www1.example.org
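
Since the script prints the hostnames space-separated, its output can feed knife ssh directly, the same pattern as the one-liner above but with saner quoting:

$ knife ssh -m "`knife exec /tmp/knife.exec`" '/etc/init.d/ntp status'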

What if you needed to reconcile your hardware support contracts with the systems currently deployed? It is no problem to get a list of hardware with Chef and knife.

# Script contents
$ cat /tmp/dell.exec
search(:node, 'dmi_system_manufacturer:Dell*').each do |node| 
  puts node[:dmi][:system][:serial_number] + "\t" + node[:fqdn] 
end

# Execute the script
$ knife exec /tmp/dell.exec
XJS1NF1 www1.example.org
XJS1NF2 www2.example.org
XJS1NF3 www3.example.org

These are pretty simple examples, but hopefully you can see how easy it is with Chef to use knife scripts to create reports, collect data, and execute one-off commands.

Wrangling 32bit debs on a 64bit system

Typically, directions for downloading an i386 version of a library for an x86_64 system link to a specific deb package and tell you to download it with wget. A new release of that package often breaks the link, so I wanted to document how to do this using apt. Unfortunately, it looks like apt won’t download a single deb if it can’t resolve dependencies, but aptitude will, so we use them together.

I use a separate sources.list here just to speed up the process, as we need to correct apt when we’re finished.
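
The contents of that file aren’t important as long as it points at a mirror carrying the package; a minimal sketch, assuming an Ubuntu lucid mirror:

deb http://archive.ubuntu.com/ubuntu lucid main universe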

# Download 32bit list files from the mirror specified in /tmp/sources.list
apt-get -o=APT::Architecture="i386" -o=Dir::Etc::sourcelist="/tmp/sources.list" -o=Dir::Etc::sourceparts="/dev/null" update
# Download the single library. Set libstdc++5 to whatever library you want
aptitude -o Apt::Architecture=i386 download libstdc++5
# Return apt's lists to their preconfigured state
apt-get update
# Optionally, install the package
dpkg --force-architecture -i libstdc++5_1%3a3.3.6-20~lucid1_i386.deb

Note that if you install the package, it will overwrite the 64bit version of the library if that is installed. 32bit packages meant for 64bit systems, like the ia32-libs package, install to /lib32 and /usr/lib32 to avoid this. You could also extract the package with ‘dpkg -x libstdc++5_1%3a3.3.6-20~lucid1_i386.deb’ and copy the libraries to where you like, then run ‘ldconfig’. The getlibs tool will try to repack debs more appropriately for you, if you like.
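
If you take the extraction route, the steps look roughly like this; the extraction directory and library paths are illustrative, not prescriptive:

dpkg -x libstdc++5_1%3a3.3.6-20~lucid1_i386.deb /tmp/libstdc5
sudo cp /tmp/libstdc5/usr/lib/libstdc++.so.5* /usr/lib32/
sudo ldconfig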

libvirtError: monitor socket did not show up

Sometimes errors don’t float to the top of stacks well.

Our virtualization stack is pretty automated: we have a custom script that uses vmbuilder to create the guest, register it with libvirt, create first boot scripts that will have it register with a chef server, and start the VM. Today we saw the error “libvirtError: monitor socket did not show up.: Connection refused”, and I commented that my memory contains a lot of libvirt/kvm errors and many resolutions, but the two don’t always stay connected. I checked the libvirt logs in /var/log/libvirt and even ran libvirt with LIBVIRT_DEBUG=1 libvirtd -v. When I tried running kvm by hand using the syntax in the logs, but with the -net options removed from the command line, kvm just spouted “Aborted”. After staring at it for a bit, I noticed that instead of -m 1024, kvm was trying to run with -m 1073741824 (1024^3). This was due to a small conversion bug in our custom script.
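
The arithmetic of the bug is easy to illustrate; this Ruby sketch is hypothetical, not our actual script:

memory_mb = 1024
kvm_arg = memory_mb                 # correct: kvm's -m flag takes megabytes
kvm_arg = memory_mb * 1024 * 1024   # the bug: "converting" to bytes yields 1073741824,
                                    # which kvm then reads as megabytes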

Amazon EC2 Network Subnets

For a project that exists both in Amazon Web Services EC2 US-EAST-1b and another cloud, I wanted to block network traffic between the two to ensure they didn’t affect each other. I started by doing whois lookups via ARIN for all of the IP addresses we are currently assigned in EC2, and I ultimately got the same list that I found registered to the AMAZO-4 contact with ARIN, with the exception of AMAZON-AES, which I presume is for Amazon Enterprise Solutions. I couldn’t tell you offhand if the same IP blocks are used in other AWS zones.

Network CIDR Netmask ARIN Name
72.44.32.0 /19 255.255.224.0 AMAZON-EC2-2
67.202.0.0 /18 255.255.192.0 AMAZON-EC2-3
75.101.128.0 /17 255.255.128.0 AMAZON-EC2-4
174.129.0.0 /16 255.255.0.0 AMAZON-EC2-5
204.236.128.0 /17 255.255.128.0 AMAZON-EC2-6
184.72.0.0 /15 255.254.0.0 AMAZON-EC2-7
50.16.0.0 /14 255.252.0.0 AMAZON-EC2-8

Here are the IOS commands:

name 72.44.32.0 EC2-2 description AMAZON-EC2-2
name 67.202.0.0 EC2-3 description AMAZON-EC2-3
name 75.101.128.0 EC2-4 description AMAZON-EC2-4
name 174.129.0.0 EC2-5 description AMAZON-EC2-5
name 204.236.128.0 EC2-6 description AMAZON-EC2-6
name 184.72.0.0 EC2-7 description AMAZON-EC2-7
name 50.16.0.0 EC2-8 description AMAZON-EC2-8
object-group network ec2-us-east
   network-object 174.129.0.0 255.255.0.0
   network-object 184.72.0.0 255.254.0.0
   network-object 204.236.128.0 255.255.128.0
   network-object 50.16.0.0 255.252.0.0
   network-object 67.202.0.0 255.255.192.0
   network-object 72.44.32.0 255.255.224.0
   network-object 75.101.128.0 255.255.128.0

Script hacks: waiting for the internet

Now and then the VMs (kvm, libvirt + vmbuilder) I was cranking out would start up too fast, and the “first boot” script would run before the host got an IP address and had internet access. Since the first thing I was doing was downloading the Rubygems source using wget (to install chef), and since wget lacks a retry for DNS failure, I hacked up this script to wait for the internet a bit.

#!/bin/bash

# Wait for internet to come up (DHCP)
MAXWAIT=60
WAITTIME=0
host production.cf.rubygems.org > /dev/null

while [ $? -ne 0 ] && [ $WAITTIME -le $MAXWAIT ] ; do
  WAITTIME=$(($WAITTIME + 10))
  sleep 10
  echo -n .
  host production.cf.rubygems.org > /dev/null
done

DNS-SD, a printer, and a little luck

DNS-SD, also known as Apple’s Bonjour, utilizes DNS as a configuration database for automatic service discovery. For the most part, it appears it’s used by devices more than by people. The multicast implementation, or mDNS, is what makes printers automatically show up in OS X when you put them on your network. I recently moved such a printer from a flat network to one where the wired and wireless workstations were on separate subnets. In an attempt to make the printer easy to find, I implemented DNS-SD over unicast so OS X laptops in the office could detect the printer with Bonjour.

First, I set the Domain Name to “office.opscode.com” using DHCP, so I would have a nice sandbox to mess around with DNS without breaking anything. Then I created a few DNS records:

OfficejetPro8500.office.opscode.com A 172.28.0.5
lb._dns-sd._udp.office.opscode.com PTR office.opscode.com.
b._dns-sd._udp.office.opscode.com PTR office.opscode.com.
_printer._tcp.office.opscode.com PTR _OfficejetPro8500._pdl-datastream._tcp.office.opscode.com.
_pdl-datastream._tcp.office.opscode.com PTR _OfficejetPro8500._pdl-datastream._tcp.office.opscode.com.
_OfficejetPro8500._pdl-datastream._tcp.office.opscode.com SRV 0 0 9100 OfficejetPro8500.office.opscode.com.
_OfficejetPro8500._pdl-datastream._tcp.office.opscode.com TXT "txtvers=1" "note=Office Entry" "usb_MFG=HP" "usb_MDL=Officejet Pro 8500 A909g" "ty=HP Officejet Pro 8500"
  1. Specifies the internal IP address of the resource. We use this later in the SRV record.
  2. What domain the client should browse if they haven’t specified one.
  3. What domain a client in this domain should browse.
  4. Define an LPR/LPD printer. LPR is the “Flagship” protocol and “must” be defined (Port 515)
  5. Define a PDL printer, sometimes called raw (Port 9100)
  6. Specify the printer service. The last four fields there are priority, weight, port and host, per RFC 2782.
  7. Provide additional configuration information related to the printer

There isn’t a lot of clear information regarding how you should specify multiple key/value pairs in the TXT field. RFC 1035 specifies that a <character-string> is a single length octet followed by that number of characters; it is treated as binary information and can be up to 256 characters in length (including the length octet). For Microsoft DNS, check out this article. I was using DynInc’s Dynect, and was able to put all of the key/value pairs in double quotes in the single input field. If you are too, use the “Expert Editor,” a menu option under the “Simple Editor”; it is a little easier to specify the multi-part hostnames that way. It sounds like in BIND you put one key/value pair in double quotes per line, with the series wrapped in parentheses.
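
As an untested sketch based on that description, the TXT record above might look like this in a BIND zone file:

_OfficejetPro8500._pdl-datastream._tcp.office.opscode.com. IN TXT ( "txtvers=1"
    "note=Office Entry"
    "usb_MFG=HP"
    "usb_MDL=Officejet Pro 8500 A909g"
    "ty=HP Officejet Pro 8500" )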

Dynect wouldn’t let me specify the SRV record without a preceding underscore, which is a shame, because this is what OS X detects as the device name, and it also lower-cased it, making it a little difficult to read. You should be able to use spaces in these names, but I wasn’t about to try escaping that. The key/value pairs in the TXT resource record are documented in the Apple Bonjour Printing specification.

  • txtvers / Define what version of this format we are using
  • note / User-readable information about the device, OS X displays this as Location
  • usb_MFG / the Manufacturer name that the USB driver would specify. I made educated guesses at these.
  • usb_MDL / the Model that the USB device would specify. Combined with the last field this will choose the driver for the user.
  • ty / a User-readable name for the device. I had hoped this would be used in the Printer Name field in the GUI, but it wasn’t.

virt-manager keymaps on OS X

I’m not crazy about the lack of a definitive package manager for OS X. I tried for about a day to work with open source software on OS X, then I built an Ubuntu VM. I’ve been using ssh with X forwarding when I need a graphical interface; OS X has reasonably good built-in support for X11. However, others have found that the keymap and meta keys are broken. While I got a kick out of “After some time I discovered that the number 8 is interpreted as Return,” I did need to log in to a guest to do some debugging.

The accepted solution to making Ctrl+Alt release keyboard focus correctly in the vncviewer spawned by virt-manager is to create a .Xmodmap file in your home directory with this content:

clear Mod1
keycode 66 = Alt_L
keycode 69 = Alt_R
add Mod1 = Alt_L
add Mod1 = Alt_R

I killed the X server by focusing on it and choosing quit, and it seemed to read the .Xmodmap file okay without my needing to restart the entire system.

The workaround for the broken keymap pointed me in the right direction, but I wasn’t happy with the solution. A little digging around the libvirt domain XML reference pointed out that you can add a keymap as an attribute to the vnc element in the domain XML definition. Use ‘virsh edit’ to edit the domain XML and modify the vnc line to add this attribute so it looks like this:

<graphics type='vnc' port='5900' autoport='yes' listen='127.0.0.1' keymap='en-us'/>

I destroyed the guest and restarted it, and the keyboard now worked without any “8 is now enter” trickery. I’m pretty sure you can choose any keymap from /usr/share/qemu/keymaps. If you use vmbuilder, you will want to add this to /etc/vmbuilder/libvirt/libvirtxml.tmpl as well.

Motorola Backflip charging

Nightmare.

Chargers:

AC1) Motorola DC4050US0301 5.1V DC 850MA
AC2) AT&T 03577 5.0V 1000ma
DC1) AT&T USB VPC03578
DC2) AT&T USB + MiniUSB MV302927

Cables:

M1) Motorola SKN6378A
M2) “Motorola” SKN6238A
M3) Monoprice generic microusb

Dead Phone, AC1, M1 OR M2 OR M3
Green light on Phone, OS starts, displays charging battery

Dead Phone, AC2, M1 OR M2 OR M3
Blue light on AC2, Green light on Phone, OS starts
Green light / OS cycle every 15 seconds

Dead Phone, DC1, M1 OR M2 OR M3
White light on DC1, Green light on Phone, OS starts
Green light / OS cycle every 15 seconds

Dead Phone, DC2, M1 OR M2 OR M3
White light on DC2, Green light alternates on/off on Phone

Phone on, AC1, M1 OR M2 OR M3
Green light on, charge symbol in battery on display

Phone on, AC2, M1
Blue light on AC2 for five seconds

Phone on, AC2, M2 OR M3
Blue light on AC2
Green light on, no charge symbol in battery on display

I have an AT&T AC charger at work that I believe works as well as the stock Motorola. The AT&T AC charger here at home, listed above, is a “five star” model that consumes 0W when not charging; I assume that is what the blue light turning off indicates. Hopefully the combinations that keep the green light on the phone on are charging, just very slowly, and are still somewhat useful. More to come.

Munin Aggregation with Multigraph

Six months ago I made note of the pattern for referring to stacked graph data sources in munin:

load.double.stack one=localhost.localdomain:load.load two=localhost.localdomain:load.load

This syntax evaluates to:
graph.value.stack line=host.domain:plugin.value

I’ve been using multigraph more since then, which is a boon to performance, but it complicates stacked graphs a little. This hurts because it remains very difficult to tell why your graphs are not drawing when you incorrectly reference a data source. To debug, run the following as the munin user (use ‘su -l munin’, ‘sudo -s -u munin’, or ‘chpst -u munin’):
/usr/share/munin/munin-graph --service 'load.double.stack' --debug
Be sure to replace “load.double.stack” with the name of the graph you’re trying to draw.

The munin wiki example for stacked graphs explains data source names as:

snmp_ups_current.inputtotal.sum \
---------------- ---------- ---
        |             |      |
        |             |      `-- The sum mechanism
        |             `--------- One of this virtual plugin's values
        `----------------------- The name of the virtual plugin

ups-5a:snmp_ups_ups-5a_current.inputcurrent \
ups-5b:snmp_ups_ups-5b_current.inputcurrent
------ ----------------------- ------------
   |               |                 |
   |               |                 `------ The "inputcurrent" value from the real plugin
   |               `------------------------ The real plugin's name (symlink)
   `---------------------------------------- The host name from which to seek information

However, with multigraph the name of the plugin’s symlink isn’t necessarily the name of the graph. The trick I found was to connect to the munin node and call the multigraph plugin, looking for the ‘multigraph’ lines.

$ nc localhost 4949
# munin node at server.example.org
cap multigraph # tell munin-node that you are multigraph aware
cap multigraph
fetch diskstats # fetch the diskstats multigraph plugin
multigraph diskstats_latency
sdb_avgwait.value 0
multigraph diskstats_latency.sdb
avgwait.value 0
.

I’ve removed a significant portion of the returned data here. Pay attention to the fact that this plugin returned a “diskstats_latency” graph that contains data for all of the disks, as well as individual graphs for each disk, here “diskstats_latency.sdb”. In this example your stacked graph configuration would be:

disk.double.stack \
  one=localhost.localdomain:diskstats_latency.sdb.avgwait \
  two=localhost.localdomain:diskstats_latency.sdb.avgwait
  -1- ----------2---------- -----------3--------- ---4---

(1) The alias and label for this host or data point
(2) The configured node name of the host
(3) The original graph’s name, either the plugin or multigraph name
(4) The value from the plugin/graph

Notice that while the period is used to separate the value from the rest of the field, there may be periods in the rest of the field. Also keep in mind that in the past I have seen dashes in configured graph names become underscores at the end of the day.

Community Cooking

It’s been a year since the Opscode Cookbook site was launched, and a recent project got me thinking about which of the hopes I wrote about then have come to pass so far. I recently heard that a major Chef user switched to Ubuntu from another Linux distribution because that is what most of the cookbooks that Opscode maintains are written for and tested on. Choice of distribution is typically something that is very dear to administrators, and somewhere in the world there is a flame war on this topic every second. Consequently, this is huge, and I’ve been thinking about it for a while.

It is one thing for a company to choose a distribution based on a software package that is significant to them; in the past I have had to battle against stakeholders that wanted to choose a particular distribution solely on the availability of support. Chef runs on a lot of platforms, but of course some get much more attention than others because that is where the community is. But here we’re seeing a company choose their distribution not because of Chef’s support for it, but for the community support for Cookbooks that run on it. This is clear evidence that what I wrote about is starting to happen.

I’ve been working on a Chef + Cobbler writeup in my spare time. I went out the other day and bought a consumer desktop to use as a libvirt/kvm host for this project. It often tends to be least painful to use cloud resources, but sometimes that which is taken care of for you is too unknown, too much of a “black box,” and you need the deep visibility into your infrastructure that building it yourself provides. There is indication that some have gotten Cobbler running on and deploying Ubuntu, but it doesn’t appear to have taken hold. There’s a launchpad spec claiming there is a package, but I couldn’t find it. Another spec makes it clear that files for debian packaging from upstream are not finished. It is here that I first ran into problems. I couldn’t get the init.d scripts provided in the debian/ directory of the upstream repository to work. They clearly needed some help, and after spending some time on them it became clear that they’re untested templates created by debhelper.

My goal wasn’t to fix these init scripts, I just wanted to get the cobbler server running. Then I remembered that we had a great existing runit cookbook that I was familiar with. The API for the cookbook site has progressed since release. Unfortunately the documentation for the cookbook API is a little behind, but the new functionality has been built into knife, the command line tool that interacts with the Chef server or Opscode platform, as well as multiple cloud providers. From within my personal chef-repo, I ran:

knife cookbook site vendor runit

This downloaded the runit cookbook from the Cookbooks site into a branch in my chef-repo git repository, then merged it into my current branch, allowing me to track changes between my local copy and the upstream copy using git. Then I added a couple templates and a call to the runit_service definition to my cookbook:

templates/default/sv-cobbler-run.erb:

#!/bin/sh
PATH=/usr/local/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin
exec 2>&1
exec /usr/bin/env cobblerd -F

templates/default/sv-cobbler-log-run.erb:

#!/bin/sh
exec svlogd -tt ./main

recipes/default.rb:

# SNIP
runit_service "cobbler"

And then cobblerd was running under runit. There’s beauty in being able to take something somewhat complex like runit, and make it easy. So easy, that I used it rather than fixing up an init script.

Then I found that cobbler wanted to be called through apache as a proxy. No problem though; I vendored the apache2 cookbook as well. I spent a few minutes determining that I needed a couple of Apache modules, as the Cobbler instructions are pretty Redhat-centric and I got the impression that they make assumptions about what that gives you. Then I used the apache2 cookbook to proxy cobbler by adding this to the top of my recipe:

recipes/default.rb:

include_recipe "apache2"
include_recipe "apache2::mod_proxy_http"
include_recipe "apache2::mod_python"

I had some permission problems with mod_proxy, again likely a difference between Redhat and Ubuntu, but it wasn’t out of my way to ship the apache config provided by upstream using Chef with a small modification:

<Proxy http://localhost:25151>
    AddDefaultCharset off
    Order deny,allow
    Allow from all
</Proxy>

I’ll write about the Cobbler cookbook more later. You can, of course, follow it on the Cookbook site in the interim. I want to emphasize how I used a single command to grab existing code and leverage it to make it easier for me to do what I was trying to do: get Cobbler running. The Cookbook site combined with the awesome Chef community made this possible. If you haven’t used the “cookbook site” subcommands in knife yet, take a moment to try them out.

$ knife cookbook site --help
Available cookbook site subcommands: (for details, knife SUB-COMMAND --help)

** COOKBOOK SITE COMMANDS **
knife cookbook site vendor COOKBOOK [VERSION] (options)
knife cookbook site show COOKBOOK [VERSION] (options)
knife cookbook site share COOKBOOK CATEGORY (options)
knife cookbook site search QUERY (options)
knife cookbook site list (options)
knife cookbook site download COOKBOOK [VERSION] (options)
knife cookbook site unshare COOKBOOK

Northeast travels

I’ll be traveling in the northeastern US from 10/28 through 11/13. Current plans put me in DC from 10/28 – 11/1. Then I’ll be in Boston on Tuesday 11/2 for the Boston Devops Meetup. On Wednesday 11/3, I will be in NYC for the NYC Devops Meetup at drop.io. Following that, I will be stationed back in my hometown in Maine until returning to Seattle on 11/13. If you’d like to catch up with me about operations with Chef while I am in the region, or have me stop by somewhere and give a presentation, send me an email and we can make arrangements.

Installing powershell on Windows Server 2008 with Chef

Here is a simple example of automation on Windows using Chef. Powershell is our gateway to nearly all parts of modern Windows servers, so it is one of the first packages we want to install.

This was tested on EC2 (ami-c3e40daa) running Microsoft Windows Server 2008 Datacenter edition. You will need to start with the Chef installation directions for Windows, with either your own Chef server built out or an Opscode Platform account.

Those familiar with Chef will notice this looks exactly like a recipe for a Linux system, provided that we were installing software without the benefit of a package manager. This recipe simply downloads the installer and then executes it with a switch to perform the installation silently. Also note that I pass the “/norestart” switch as well. An MSU is a standalone Microsoft Update installer, rather than a plain executable. This installation will trigger hooks normally associated with updates, which also means that it triggers a restart. I hadn’t thought about this the first time I ran this recipe and thought for a moment that my system had crashed. In actuality it performed flawlessly and the system restarted with powershell installed.

The remote file resource (line nineteen) only downloads the installer if it isn’t present or has changed. If Microsoft changes the placement of this package on their website, you would have to update this resource. Alternately, you could place these important files on an internal web server or file server, or place them directly in the cookbook and transfer them to the server using the cookbook file resource.

The execute resource (line twenty-three) only runs the installer if the remote file resource has chosen to download this file. The “:nothing” action tells the resource to not run, specifically so that it wouldn’t run every time. However, the subscribes attribute hooks into the prior remote file resource so that it will run if that resource performs an action. You can find more information regarding resources in the Chef wiki.

Windows servers no longer need to remain in the land of “as-built” documentation, where a wiki page or word doc specified the individual steps someone ran to build out this particular system, where additional builds took days as we found the time to run all of the software installers by hand. Chef recipes can both build out your systems and simultaneously document the process for you naturally.

More to come, stay tuned.

# Cookbook Name:: powershell
# Recipe:: default
# Author:: Bryan McLellan <btm@loftninjas.org>
#
# Copyright 2010, Opscode, Inc
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

remote_file "c:/windows/temp/Windows6.0-KB968930-x86.msu" do
  source "http://download.microsoft.com/download/F/9/E/F9EF6ACB-2BA8-4845-9C10-85FC4A69B207/Windows6.0-KB968930-x86.msu"
end

execute "c:/windows/temp/Windows6.0-KB968930-x86.msu /quiet /norestart" do
  action :nothing
  subscribes :run, resources(:remote_file => "c:/windows/temp/Windows6.0-KB968930-x86.msu")
end

Silent Ruby install on Windows

I dug up unattended ruby install directions while working on Chef installation directions for Windows. Most of the secrets can be found in the RubyInstaller discussion group, such as here and here.

Grab the RubyInstaller for Windows, then run: rubyinstaller-1.8.7-p302.exe /tasks="assocfiles,modpath" /silent. The tasks option checks (enables) the options to associate .rb files with ruby and to add the ruby binary directory to the path. You probably wouldn’t want these if you were installing multiple versions of ruby.