Microsoft Azure on Ubuntu 12.10

I had to upgrade to 12.10 Quantal Quetzal to get a recent enough set of nodejs packages for this to work.

# First install nodejs and the nodejs package manager
sudo apt-get install nodejs npm
# The Amateur Radio 'node' package claims /usr/sbin/node, so on Debian/Ubuntu the nodejs package installs its binary as /usr/bin/nodejs. Node scripts expect 'node', so add a symlink.
sudo ln -s /usr/bin/nodejs /usr/bin/node
# Now install the azure tools using npm
sudo npm install azure-cli -g

# Get your azure credentials with the link provided
azure account download
# Import these credentials
azure account import foo-credentials.publishsettings
# Get a list of VM images
azure vm image list
# Get the list of hosting locations
azure vm location list
# Create an instance. Change UNIQUE_SERVERNAME and USERNAME to your liking.
azure vm create UNIQUE_SERVERNAME CANONICAL__Canonical-Ubuntu-12.04-amd64-server-20120924-en-us-30GB.vhd USERNAME --location "East US" --ssh
azure vm start UNIQUE_SERVERNAME
ssh USERNAME@UNIQUE_SERVERNAME.cloudapp.net

I found this document most useful once I got the tools working.

Stubbing class constants with rspec and Ruby

I had some Ruby code that used File::SEPARATOR and File::PATH_SEPARATOR to run on both Unix and Windows, so I wanted to stub these values to test for both platforms. There are a couple of examples out there, building on each other: this example adds a feature that saves and restores the former value, and this example builds on that to support class constants. Both expect ActiveSupport, so there's a little working around that added here. I'm ripping this directly from my spec_helper.rb before I throw it away, because it feels over-engineered and complicated.

def with_warnings(flag)
  old_verbose, $VERBOSE = $VERBOSE, flag
  yield
ensure
  $VERBOSE = old_verbose
end

# http://missingbit.blogspot.com/2011/07/stubbing-constants-in-rspec_20.html
def parse_constant(constant)
  source, _, constant_name = constant.to_s.rpartition('::')

  [constantize(source), constant_name]
end

def with_constants(constants, &block)
  saved_constants = {}
  constants.each do |constant, val|
    source_object, const_name = parse_constant(constant)

    saved_constants[constant] = source_object.const_get(const_name)
    with_warnings(nil) { source_object.const_set(const_name, val) }
  end

  begin
    block.call
  ensure
    constants.each do |constant, val|
      source_object, const_name = parse_constant(constant)

      with_warnings(nil) { source_object.const_set(const_name, saved_constants[constant]) }
    end
  end
end
####################

# File activesupport/lib/active_support/inflector/methods.rb, line 209
def constantize(camel_cased_word)
  names = camel_cased_word.split('::')
  names.shift if names.empty? || names.first.empty?

  constant = Object
  names.each do |name|
    constant = constant.const_defined?(name) ? constant.const_get(name) : constant.const_missing(name)
  end
  constant
end

Then you can perform:

  it "does something when running on Windows" do
    with_constants "::File::PATH_SEPARATOR" => ";" do
      # code
    end
  end
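
Since with_constants takes a hash, you can stub both separators in one block. A minimal sketch, assuming code under test that touches both constants:

  it "does something when running on Windows with full paths" do
    with_constants "::File::SEPARATOR" => "\\", "::File::PATH_SEPARATOR" => ";" do
      # code exercising both separators
    end
  end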

Downloading All The Github Repositories

I needed to grab all of the Github repositories belonging to Cookbooks, a Github user maintained by the Chef community for collecting many cookbooks in one place for development. All of these cookbooks should be on the Opscode Community site, which is where you should go if you're browsing for cookbooks to use yourself. But I needed to grep through a large number of cookbooks to develop statistics on Chef cookbook usage patterns, so I needed All The Things.

#!/usr/bin/env ruby
# 2012-01-11 Bryan McLellan <btm@loftninjas.org>
# Fetch the list of repositories from a Github user and 'git clone' them all

require 'rubygems'
require 'json'
require 'net/http'

url = "http://github.com/api/v2/json/repos/show/cookbooks"
dir = "cookbooks"

if File.basename(Dir.getwd) != dir
  if File.exists?(dir)
    puts "Target directory of '#{dir}' already exists."
    exit 1
  end

  Dir.mkdir(dir)
  Dir.chdir(dir)
end

resp = Net::HTTP.get_response(URI.parse(url))
data = resp.body

result = JSON.parse(data)

result['repositories'].each do |repo|
  puts "Fetching #{repo['url']}"
  system "git clone #{repo['url']}"
end

Generating entropy in the cloud

Virtual machines don't produce a lot of entropy on their own. Typically the need for additional entropy triggers talk of hardware-based entropy generators or network-based entropy distribution protocols. Sometimes you just need a little bit of entropy for a moment.

$ sbuild-update --keygen
Generating archive key.

Not enough random bytes available.  Please do some other work to give
the OS a chance to collect more entropy! (Need 279 more bytes)

Disk tends to be one of the few remaining sources of entropy on virtual systems. I usually do something like this:

$ while true ; do cat /proc/sys/kernel/random/entropy_avail  ; \
    sudo find / > /tmp/find.log ; sync ; done

The number printed should go up and down as your application consumes the entropy. Hit CTRL+C when you've got enough. This is probably a bad source of entropy, but the world is inherently dangerous.

Disabling Firefox shortcuts on OS X

I joined a startup, and they gave me a MacBook Pro. It was bound to happen eventually; all the cool kids use MBPs and startups are cool, right?

The great period of adaptation began, as I learned I couldn't have simple technology like sloppy focus. One of the greatest inconveniences is the keyboard. I have a hard time using the keyboard on the laptop because the special keys are in different places than I'm used to. Even with a Unicomp Spacesaver M (for those of us attached to the Model M), some change is apparent, like Apple using "delete" when they mean "backspace" (the Unicomp uses "delete ->" when it means "delete").

Most frustrating of this set of issues is that in Firefox the Home and End keys go to the top and bottom of the page, while you have to use cmd+left and cmd+right to go to the beginning and end of a line in a textbox. However, sometimes these keys trigger page forward and page back, and sometimes they don't (usually in a Flash app, I believe). The solution is to install the keyconfig extension. After you restart Firefox, you will find it in the Tools menu, where you can disable "GoBackKb" and "GoForwardKb". Then these keys work as expected in a text box and you no longer find yourself going back a page unintentionally, possibly losing a textbox full of input along the way.

Recreating the Opscode Chef validation key

Chef uses a special key pair, the "validation client," to create new clients. If you lose this file, or if you end up with an empty CouchDB database that no longer contains this client, you can get a 401 Unauthorized error when trying to use it.

$ sudo chef-client
[Thu, 14 Jul 2011 11:44:44 +0000] INFO: *** Chef 0.10.2 ***
[Thu, 14 Jul 2011 11:44:44 +0000] INFO: Client key /etc/chef/client.pem is not present - registering
[Thu, 14 Jul 2011 11:44:44 +0000] INFO: HTTP Request Returned 401 Unauthorized: Failed to authenticate. Ensure that your client key is valid.
[Thu, 14 Jul 2011 11:44:44 +0000] FATAL: Stacktrace dumped to /var/cache/chef/chef-stacktrace.out
[Thu, 14 Jul 2011 11:44:44 +0000] FATAL: Net::HTTPServerException: 401 "Unauthorized"

Removing your validation key on the server, which is typically stored on the filesystem at /etc/chef/validation.pem, and restarting the chef-server will create a new key pair both on disk and in the database.

$ ls -l /etc/chef/validation.pem
-rw-r--r-- 1 root root 1676 2011-07-14 11:44 /etc/chef/validation.pem
$ sudo rm /etc/chef/validation.pem
$ sudo /etc/init.d/chef-server restart
 * Restarting chef-server
 ~ Killing pid 10783 with INT
 ~ In 12051
   ...done.
$ ls -l /etc/chef/validation.pem
-rw------- 1 chef chef 1679 2011-07-14 11:55 /etc/chef/validation.pem

The same process works with the webui key pair, which knife uses as the default “admin” key to create initial knife clients.

$ ls -l /etc/chef/webui.pem
-rw------- 1 chef chef 1675 2011-07-14 11:31 /etc/chef/webui.pem
$ sudo rm /etc/chef/webui.pem
$ sudo /etc/init.d/chef-server restart
 * Restarting chef-server
 ~ Killing pid 12051 with INT
 ~ In 12091
   ...done.
$ ls -l /etc/chef/webui.pem
-rw------- 1 chef chef 1675 2011-07-14 11:57 /etc/chef/webui.pem
$ sudo /etc/init.d/chef-server-webui restart
 * Restarting chef-server-webui
 ~ Killing pid 10945 with INT
 ~ In 12129
   ...done.

If you've also lost the key for your knife client, you will need to create another client. Use a new client name unless you're sure that the server no longer contains a registration for the previous client. After creating the new client, you can delete the old one from the server using 'knife client delete MY_OLD_CLIENT', replacing MY_OLD_CLIENT with the name of the former client.

$ sudo knife configure --initial
Overwrite /home/ubuntu/.chef/knife.rb? (Y/N) y
Please enter the chef server URL: [http://ip-10-204-150-209.ec2.internal:4000] 
Please enter a clientname for the new client: [ubuntu] new_ubuntu
Please enter the existing admin clientname: [chef-webui] 
Please enter the location of the existing admin client's private key: [/etc/chef/webui.pem] 
Please enter the validation clientname: [chef-validator] 
Please enter the location of the validation key: [/etc/chef/validation.pem] 
Please enter the path to a chef repository (or leave blank): 
Creating initial API user...
Created client[new_ubuntu]
Configuration file written to /home/ubuntu/.chef/knife.rb

Provided with the new validation.pem, your node should be able to register now, as long as there is not still a client by the same name. If there is, you will need to delete that client first. Note that on Opscode Hosted Chef, you currently will need to delete the node as well, because the default permissions only allow the client that created the node to modify it.

$ knife client list
  chef-validator
  chef-webui
  ip-10-204-150-209.ec2.internal
  new_ubuntu
  ubuntu
$ knife client delete ip-10-204-150-209.ec2.internal
Do you really want to delete ip-10-204-150-209.ec2.internal? (Y/N) y
Deleted client[ip-10-204-150-209.ec2.internal]
$ sudo chef-client
[Thu, 14 Jul 2011 12:04:24 +0000] INFO: *** Chef 0.10.2 ***
[Thu, 14 Jul 2011 12:04:26 +0000] INFO: Client key /etc/chef/client.pem is not present - registering
[Thu, 14 Jul 2011 12:04:27 +0000] INFO: Run List is []
[Thu, 14 Jul 2011 12:04:27 +0000] INFO: Run List expands to []
[Thu, 14 Jul 2011 12:04:27 +0000] INFO: Starting Chef Run for ip-10-204-150-209.ec2.internal
[Thu, 14 Jul 2011 12:04:27 +0000] INFO: Loading cookbooks []
[Thu, 14 Jul 2011 12:04:27 +0000] WARN: Node ip-10-204-150-209.ec2.internal has an empty run list.
[Thu, 14 Jul 2011 12:04:28 +0000] INFO: Chef Run complete in 0.623124 seconds
[Thu, 14 Jul 2011 12:04:28 +0000] INFO: Running report handlers
[Thu, 14 Jul 2011 12:04:28 +0000] INFO: Report handlers complete

require-rubygems.overrides and gem2deb 0.2.2

For those working on moving Debian Ruby library packaging to gem2deb: you can exempt specific hits from the slick built-in 'require rubygems' test by adding the path to debian/require-rubygems.overrides.

For instance, to exempt this:

debian/chef/usr/lib/ruby/vendor_ruby/chef/provider/package/rubygems.rb: require 'rubygems'
debian/chef/usr/lib/ruby/vendor_ruby/chef/provider/package/rubygems.rb: require 'rubygems/version'
debian/chef/usr/lib/ruby/vendor_ruby/chef/provider/package/rubygems.rb: require 'rubygems/dependency'
debian/chef/usr/lib/ruby/vendor_ruby/chef/provider/package/rubygems.rb: require 'rubygems/spec_fetcher'
debian/chef/usr/lib/ruby/vendor_ruby/chef/provider/package/rubygems.rb: require 'rubygems/platform'
debian/chef/usr/lib/ruby/vendor_ruby/chef/provider/package/rubygems.rb: require 'rubygems/format'
debian/chef/usr/lib/ruby/vendor_ruby/chef/provider/package/rubygems.rb: require 'rubygems/dependency_installer'
debian/chef/usr/lib/ruby/vendor_ruby/chef/provider/package/rubygems.rb: require 'rubygems/uninstaller'
debian/chef/usr/lib/ruby/vendor_ruby/chef/provider/package/rubygems.rb: require 'rubygems/specification'
debian/chef/usr/lib/ruby/vendor_ruby/chef/providers.rb: require 'chef/provider/package/rubygems'
Found some 'require rubygems' without overrides (see above).
ERROR: Test "require-rubygems" failed. Exiting.
dh_auto_install: dh_ruby --install /«BUILDDIR»/chef-0.10.0/debian/chef returned exit code 1
make: *** [binary] Error 1
dpkg-buildpackage: error: fakeroot debian/rules binary gave error exit status 2

debian/require-rubygems.overrides should contain:

debian/chef/usr/lib/ruby/vendor_ruby/chef/provider/package/rubygems.rb
debian/chef/usr/lib/ruby/vendor_ruby/chef/providers.rb

locale errors on debian

I received the following error while working on a Debian sid box:

$ schroot -l
terminate called after throwing an instance of 'std::runtime_error'
  what():  locale::facet::_S_create_c_locale name not valid
Aborted

With debconf and locales already installed, I ran 'export | grep LANG' to discover that my locale was set to 'en_US.UTF-8'. Then I ran 'dpkg-reconfigure locales', checked that locale in the list, and set it as the default.
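
A minimal transcript of that fix, assuming bash and the en_US.UTF-8 locale:

$ export | grep LANG
declare -x LANG="en_US.UTF-8"
$ sudo dpkg-reconfigure locales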

Creating a Debian sid emi for Eucalyptus

For the most part, this is the same as any post about creating an image for Eucalyptus, but I had a hard time figuring out exactly how to put it all together. You need an up-to-date Debian sid system nearby to take the kernel and ramdisk from. I found having a sid VM easier than figuring out the commands to build a sid initrd on my Ubuntu workstation.

# First, the prerequisites. You need debootstrap and euca2ools installed.
sudo apt-get install debootstrap euca2ools

# Export your eucalyptus variables to use the tools.
source ~/.euca/eucarc

# Create an empty disk image. You can adjust the count to change the root disk size. 1000 is about a GB.
dd if=/dev/zero of=image count=1000 bs=1M

# Put a filesystem on the new disk image
mkfs.ext3 -F image

# Mount the filesystem
mkdir chroot
sudo mount -o loop image chroot

# Install debian sid to the chroot. Notice that the ssh server and curl are included here
sudo debootstrap --include=openssh-server,curl,vim --arch amd64 sid chroot/ http://ftp.debian.org/debian

# chroot into the image
sudo chroot chroot

# Setup basic networking and disk configurations
echo -e 'auto lo\niface lo inet loopback\nauto eth0\niface eth0 inet dhcp' >> /etc/network/interfaces
echo -e '/dev/sda1 / ext3 defaults 0 1\n/dev/sda2 swap swap defaults 0 0' > /etc/fstab

# Set a default root password if you want
# passwd

# Set up the image to automatically install ssh keys
mkdir /root/.ssh
cat <<EOS > /etc/rc.local
echo >> /root/.ssh/authorized_keys
curl -m 10 -s http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key | grep 'ssh-rsa' >> /root/.ssh/authorized_keys
echo "AUTHORIZED_KEYS:"
echo "************************"
cat /root/.ssh/authorized_keys
echo "************************"
exit 0
EOS

# Leave the image
exit

# Unmount the image
sudo umount chroot

# After you've copied the latest /boot/vmlinuz* and /boot/initrd* from your sid system, upload the kernel + ramdisk
euca-bundle-image --image vmlinuz-2.6.38-2-amd64 --kernel true
euca-upload-bundle --bucket sid --manifest vmlinuz-2.6.38-2-amd64.manifest.xml
euca-register sid/vmlinuz-2.6.38-2-amd64.manifest.xml
euca-bundle-image --image initrd.img-2.6.38-2-amd64 --ramdisk true
euca-upload-bundle --bucket sid --manifest initrd.img-2.6.38-2-amd64.manifest.xml 
euca-register sid/initrd.img-2.6.38-2-amd64.manifest.xml

# Prepare the image for upload, using the eki/eri values returned by euca-register above
euca-bundle-image -i image --kernel eki-XXXXXXXX --ramdisk eri-XXXXXXXX

# Rename the manifest to something descriptive and upload it
mv image.manifest.xml `date +%Y%m%d`.sid.manifest.xml
euca-upload-bundle -b sid -m `date +%Y%m%d`.sid.manifest.xml

# Register the image to get an EMI
euca-register sid/`date +%Y%m%d`.sid.manifest.xml

You should be able to use euca-run-instances on the EMI returned by the last command. Remember to pass an ssh key (one that Eucalyptus knows about) using -k. If there are any issues, use euca-get-console-output to monitor the instance startup and tail the eucalyptus/nc.log file on the node controller for errors. Building the initrd this way is a little hackish, because it is actually generated for your sid system, not for the one running in Eucalyptus. Chicken, or the egg?

LVM errors with sbuild

Here is a strange one that I fixed, though I'm not sure why the fix works. Roughly following the SBuildLVM Howto and the Chef sbuild cookbook, I have an sbuild server. It was working fine for me, but another user was seeing this:

schroot -c lucid
E: 05lvm: File descriptor 3 (socket:[460392]) leaked on lvcreate invocation.
E: lucid-40c0e109-2d5d-4103-bf92-a44288595dcc: Chroot setup failed: stage=setup-start

When he ran with verbose mode, this line was particularly interesting:

E: 05lvm:

When I su’d to his user, it worked fine for me without verbose but failed similarly with the verbose flag.

In the course of debugging, I started trying to redirect output and I found that these changes to /etc/schroot/setup.d/05lvm fixed the problem. Unfortunately I’m running behind on work so I can’t track down the root cause right now.

--- 05lvm.orig	2011-03-10 19:28:17.000000000 +0000
+++ 05lvm	2011-03-10 19:37:54.000000000 +0000
@@ -36,10 +36,10 @@
 
 	if [ "$AUTH_VERBOSITY" = "verbose" ]; then
 	    lvcreate $VERBOSE --snapshot --name "$CHROOT_LVM_SNAPSHOT_NAME" \
-		"$CHROOT_DEVICE" $CHROOT_LVM_SNAPSHOT_OPTIONS
+		"$CHROOT_DEVICE" $CHROOT_LVM_SNAPSHOT_OPTIONS 2>&1 
 	else
 	    lvcreate $VERBOSE --snapshot --name "$CHROOT_LVM_SNAPSHOT_NAME" \
-		"$CHROOT_DEVICE" $CHROOT_LVM_SNAPSHOT_OPTIONS > /dev/null
+		"$CHROOT_DEVICE" $CHROOT_LVM_SNAPSHOT_OPTIONS 2>&1 > /dev/null
 	fi
 
     elif [ $1 = "setup-stop" ]; then
@@ -57,9 +57,9 @@
 		--pid=$PID || true
 
 	    if [ "$AUTH_VERBOSITY" = "verbose" ]; then
-		lvremove $VERBOSE -f "$CHROOT_LVM_SNAPSHOT_DEVICE" || true
+		lvremove $VERBOSE -f "$CHROOT_LVM_SNAPSHOT_DEVICE" 2>&1 || true
 	    else
-		lvremove $VERBOSE -f "$CHROOT_LVM_SNAPSHOT_DEVICE" > /dev/null || true
+		lvremove $VERBOSE -f "$CHROOT_LVM_SNAPSHOT_DEVICE" 2>&1 > /dev/null || true
 	    fi
 	else
 	    # The block device no longer exists, or was never created,

munin-cgi-graph with fcgid on ubuntu lucid

Two and a half years have passed since I wrote about running Munin with fastcgi-triggered graphs on Debian etch. Unfortunately, not a lot has changed since then. A revolution in trending would have been nice. When I started here, Munin was triggering graph generation using CGI and was painfully slow to use. I switched over to cron-triggered graph generation and was happy. After a data center migration, drawing the munin graphs for that cluster from cron was taking about 130 seconds. As a precaution I wanted to get this down a bit.

Someone asked me why munin-graph would have caused data loss, since munin-update collects the data, and I couldn't remember. Back then I had problems with both munin-update and munin-graph taking over five minutes in certain circumstances. The latter was primarily due to the slow response time of the SNMP queries I was running against MSSQL servers. That was back in Munin 1.2, and a few things have changed since then; most relevant is that you no longer have to patch Munin for fastcgi support.

This time around I used fcgid instead of fastcgi. There are fewer licensing hurdles for fcgid, which was written to be compatible with fastcgi. Provided you already have munin running, install the prerequisites first.

sudo apt-get install libcgi-fast-perl libdate-manip-perl libapache2-mod-fcgid

The packaging should restart Apache as required to load the new module we just installed, but we need to configure our Munin site a bit to link our CGI script to fcgid. Add this to, or update, the VirtualHost block in your Apache configuration and reload Apache.

  ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/

  <Directory /usr/lib/cgi-bin/>
    AllowOverride None
    Options ExecCGI -MultiViews +SymLinksIfOwnerMatch
    Order allow,deny
    Allow from all
  </Directory>

  <Location /cgi-bin/munin-fastcgi-graph>
    SetHandler  fastcgi-script
  </Location>

Add the following lines to your munin.conf. This causes the munin-graph run from cron to no-op rather than generate any graphs, and munin-html will update the img src links to use the CGI script to generate the graphs rather than linking directly to files. You'll need to wait for the cron job to run once, or run munin-html yourself, to trigger this.

graph_strategy cgi
cgiurl_graph /cgi-bin/munin-fastcgi-graph

Triggering munin-html manually:

sudo -s
sudo -u munin /usr/share/munin/munin-html --debug

Remember that Apache needs to be able to write the graphs out. You will get no graphs, and HTTP 500 errors in your Apache logs, if the munin-cgi-graph script cannot write them. My Munin data directory, /var/www/munin/, is owned by 'munin' while Apache runs as 'www-data'. The following commands fix this, but by default Apache will change the user ownership to 'www-data' when it saves a file, so if you try to switch back to munin-graph via cron, you'll need to fix permissions again.

sudo chgrp -R www-data /var/www/munin
sudo chmod -R g+w /var/www/munin
sudo chgrp www-data /var/log/munin /var/log/munin/munin-graph.log
sudo chmod g+w /var/log/munin /var/log/munin/munin-graph.log

After the switch to fcgid-generated munin graphs, drawing all the graphs for a single node took minutes and was quite painful. I gave the node more CPU resources, but it still took two minutes to draw a page of graphs. I ended up switching back to cron-based graph generation. The additional CPU resources cut about forty seconds off the munin-graph time from cron, which is progress. Having the graphs immediately available when you need them is worth the CPU resources that demand-based graph generation via CGI would save. For the time being I intend to keep giving Munin more CPU until I settle on a better way to do trending.

The power of Chef and Ruby

The argument that Chef is difficult to learn because recipes are written in Ruby is a fallacy.

package "vim"

cookbook_file "/home/btm/.vimrc" do
  source "dot-vimrc"
  owner "btm"
  group "btm"
  mode "0644"
end

With the exception of the do/end block, that doesn't look like a programming language at all, and it is far easier to grok than some configuration file syntaxes I've used. Any tool's configuration syntax has a learning curve, and refusing to learn a new one means you're going to be stuck in the past using old tools. Someone may not want to try out nginx today because they already know how to configure Apache, and I understand that up to a point. The tool you know is sometimes easier to use in less than ideal conditions because you already understand it. I can't spend all of my time learning new tools any more than the next person, but frankly, if you are unwilling to learn something new, you are in the wrong industry. We are moving fast over here.

Even if you don’t know any Ruby, over time you start reusing other people’s code shortcuts because it is easier to write understandable and flexible code.

# Install useful troubleshooting tools that get regular use
%w{htop dstat strace sysstat gdb tmux tshark}.each do |tool_package|
  package tool_package
end

# Install the correct apache package depending on distribution
package "apache2" do
  case node[:platform]
  when "centos","redhat","fedora","suse"
    package_name "httpd"
  when "debian","ubuntu"
    package_name "apache2"
  end
  action :install
end

Because Chef recipes are written in Ruby and they are compiled on the client rather than the server, you can leverage Ruby in very powerful ways. When we want to create databases and grant privileges for a web application, we can use a number of Chef resources, primarily execute, to drive existing tools such as mysqladmin. We can also leverage Ruby to access Ruby libraries. Bare Ruby code in a Chef recipe is executed during compilation, but Ruby code in a ruby_block resource is executed along with the other resources during convergence and can be notified like any other resource. You can get a better idea of when these steps happen from the Anatomy of a Chef Run page on the wiki. Here is some code I used recently that is quite a bit simpler to read and shorter than using resources to perform all of the steps.

    ruby_block "Create database + execute grants" do
      block do
        require 'rubygems'
        Gem.clear_paths
        require 'mysql'

        m = Mysql.new(mysql_host, "root", mysql_root_password)
        if !m.list_dbs.include?(node[:jira][:database_name])
          # Create the database
          Chef::Log.info "Creating mysql database #{node[:jira][:database_name]}"
          m.query("CREATE DATABASE #{node[:jira][:database_name]} CHARACTER SET utf8")

          # Grant and flush permissions
          Chef::Log.info "Granting access to #{node[:jira][:database_name]} for #{node[:jira][:database_user]}"
          m.query("GRANT ALL ON #{node[:jira][:database_name]}.* TO '#{node[:jira][:database_user]}'@'localhost' IDENTIFIED BY '#{node[:jira][:database_password]}'")
          m.reload
        end
      end
    end

Because Chef makes it easy to model data, you don't need to write the above code yourself; you can just use what I wrote and change your variable names. If you use it for more than one web application, you could make it a cookbook definition or LWRP that you extend as you need more features:

initialize_mysql_db "jiradb" do
  database_name node[:jira][:database_name]
  database_user node[:jira][:database_user]
  database_password node[:jira][:database_password]
end
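
A minimal sketch of what that definition might look like, wrapping the ruby_block from earlier; the connection parameters and defaults here are illustrative assumptions, not from my actual cookbook:

define :initialize_mysql_db, :database_name => nil, :database_user => nil, :database_password => nil do
  ruby_block "create database #{params[:database_name]}" do
    block do
      require 'rubygems'
      Gem.clear_paths
      require 'mysql'

      # Assumes a root connection on localhost; adjust host and credentials for your environment.
      m = Mysql.new("localhost", "root", node[:mysql][:server_root_password])
      unless m.list_dbs.include?(params[:database_name])
        m.query("CREATE DATABASE #{params[:database_name]} CHARACTER SET utf8")
        m.query("GRANT ALL ON #{params[:database_name]}.* TO '#{params[:database_user]}'@'localhost' IDENTIFIED BY '#{params[:database_password]}'")
        m.reload
      end
    end
  end
end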

Monitoring Unicorn connections with munin

Unicorn doesn't have any monitoring hooks. Typically folks either put nginx in front and monitor response time, do some backlog magic and track errors, or make guesses based on other available information. I've been using a modified version of the unicorn_status munin plugin for a while. It samples the CPU time of each worker process and considers a worker idle if that time hasn't changed after sleeping for a second. This doesn't pan out under load. Still, here it is.

#!/usr/bin/env ruby
#
# unicorn_status - A munin plugin for Linux to monitor unicorn processes
#
#  Copyright (C) 2010 Shinji Furuya - shinji.furuya@gmail.com
#  Copyright (C) 2010 Opscode, Inc. - Bryan McLellan <btm@loftninjas.org>
#    - Specify pid file via environment variable
#    - Do not assume process names
#  Licensed under the MIT license:
#  http://www.opensource.org/licenses/mit-license.php
#

module Munin
  class UnicornStatus

    def initialize
      @pid_file = ENV['UNICORN_PID']
    end

    def master_pid
      File.read(@pid_file).to_i
    end

    def worker_pids
      result = []
      ps_output = `ps w --ppid #{master_pid}`
      ps_output.each_line do |line|
        chunks = line.strip.split(/\s+/, 5)
        pid = chunks[0]
        result << pid.to_i if pid =~ /\A\d+\z/
      end
      result
    end

    def worker_count
      worker_pids.size
    end

    def idle_worker_count
      result = 0
      before_cpu = {}
      worker_pids.each do |pid|
        before_cpu[pid] = cpu_time(pid)
      end
      sleep 1
      after_cpu = {}
      worker_pids.each do |pid|
        after_cpu[pid] = cpu_time(pid)
      end
      worker_pids.each do |pid|
        result += 1 if after_cpu[pid] - before_cpu[pid] == 0
      end
      result
    end

    def cpu_time(pid)
      usr, sys = `cat /proc/#{pid}/stat | awk '{print $14,$15 }'`.strip.split(/\s+/).collect { |i| i.to_i }
      usr + sys
    end
  end
end

case ARGV[0]
when "autoconf"
  puts "yes"
when "config"
  puts "graph_title Unicorn - Status"
  puts "graph_args -l 0"
  puts "graph_vlabel number of workers"
  puts "graph_category Unicorn"
  puts "total_worker.label total_workers"
  puts "idle_worker.label idle_workers"
else
  m = Munin::UnicornStatus.new
  puts "total_worker.value #{m.worker_count}"
  puts "idle_worker.value #{m.idle_worker_count}"
end

And the configuration file:

$ sudo cat /etc/munin/plugin-conf.d/unicorn
      [unicorn_*]
      user root
      env.UNICORN_PID /etc/sv/opscode-chef/supervise/pid
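
Installing it follows the usual munin plugin pattern; the paths below are illustrative:

sudo cp unicorn_status /usr/share/munin/plugins/unicorn_status
sudo ln -s /usr/share/munin/plugins/unicorn_status /etc/munin/plugins/unicorn_status
sudo killall -HUP munin-node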

I wrote another plugin today that uses raindrops to collect information about the active and queued connections. It is interesting how greatly the number of active connections fluctuates. As a result, active connections don't produce a stable munin graph, but having the queue depth recorded is pretty useful for tracking down latency issues.

#!/usr/bin/env ruby
#  Copyright: 2011 Opscode, Inc.
#  Author: Bryan McLellan <btm@loftninjas.org>
#
#   Licensed under the Apache License, Version 2.0 (the "License");
#   you may not use this file except in compliance with the License.
#   You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#   Unless required by applicable law or agreed to in writing, software
#   distributed under the License is distributed on an "AS IS" BASIS,
#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#   See the License for the specific language governing permissions and
#   limitations under the License.

require 'rubygems'
require 'raindrops'

def collect(port)
  # raindrops requires an array of strings, even if it denies this 
  addr = [ "0.0.0.0:#{port}" ]
  stats = Raindrops::Linux.tcp_listener_stats(addr)

  puts "active.value #{stats[addr[0]].active}"
  puts "queued.value #{stats[addr[0]].queued}"
end

if ARGV[0] == "config"
  puts "graph_title Unicorn - connections"
  puts "graph_args -l 0"
  puts "graph_printf %6.0lf"
  puts "graph_vlabel connections"
  puts "graph_category Unicorn"
  puts "active.label active"
  puts "queued.label queued"
  exit 0
end

if $0 =~ /.*_(\d+)/
  # the munin wildcard format of plugin_value
  port = $1
elsif ARGV.size > 0
  port = ARGV[0]
else
  usage = "Usage: #$0 port or #{$0}_port"
  abort usage
end

collect(port)

Usage is the same as any wildcard munin plugin.

  1. Install the raindrops gem
  2. Drop the above code in “/usr/share/munin/plugins/unicorn_connections_”
  3. Create a link from “/etc/munin/plugins/unicorn_connections_UNICORNPORT” to the above script
  4. killall -HUP munin-node

Graphs should start showing up in five or ten minutes. You can always test like so:

$ nc localhost 4949
# munin node at unicorn.example.org
fetch unicorn_connections_6880
active.value 5
queued.value 0
.
quit

Of course, I use Chef and the munin cookbook's munin_plugin definition, so my application's cookbook has this additional code:

# required for unicorn_connections_ munin plugin
gem_package "raindrops"

munin_plugin "unicorn_connections_" do
  plugin "unicorn_connections_6880"
  create_file true
end

Init replacements change fundamental assumptions

The trend with init replacements

When you write a number of service resource providers for a configuration management system, you get some intimate experience with the quirks of init systems. A slew of new init replacements are working their way into stable releases lately, motivated primarily by decreasing system startup time through starting services in parallel. For instance, Ubuntu has been moving to upstart, the latest release of Debian uses insserv, and OS X uses launchd. There is overlap in design, and certainly parallel service execution isn't the only significant improvement. Since init is a basic building block of our systems, small changes can cause large ripples. In the end we will have some great new functionality, but we're in a rough patch of transition right now and need to ensure the functionality we rely upon doesn't get passed over.

Disabling services with Upstart

If you want a service not to start on system startup, but still want to be able to start it manually, you have to comment out a line in its configuration file. Programmatically editing configuration files from a script or a configuration management system is difficult to do cleanly. In general you want to avoid minor changes to configuration files, because then you have to reconcile the differences when you upgrade the package. There are plans to add support for an override file wherein you can specify that the service is manual, but clearly Ubuntu server users are taking a backseat to desktop users inside Canonical, where Upstart is developed.
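
If that override support lands as proposed, marking a job manual should look something like this (a sketch of the proposal, not behavior you can rely on today):

echo manual | sudo tee /etc/init/mysql.override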

Restarting services with Upstart

This is interesting, because Ubuntu server-related packages are already being migrated to use Upstart, and we start to run into additional quirks, such as Upstart not starting a service when you restart it while it is stopped. We plan to work around this behavior in Chef, but others have clearly taken notice.

$ status mysql
mysql start/running, process 548
$ sudo restart mysql
mysql start/running, process 649
$ sudo stop mysql
mysql stop/waiting
$ sudo restart mysql
restart: Unknown instance: 

Insserv changes how you specify runlevels

On Debian lenny you could specify service runlevels and priorities like so:

$ sudo update-rc.d apache2 start 20 3 4 5 . stop 80 0 1 .
 Adding system startup for /etc/init.d/apache2 ...
   /etc/rc0.d/K80apache2 -> ../init.d/apache2
   /etc/rc1.d/K80apache2 -> ../init.d/apache2
   /etc/rc3.d/S20apache2 -> ../init.d/apache2
   /etc/rc4.d/S20apache2 -> ../init.d/apache2
   /etc/rc5.d/S20apache2 -> ../init.d/apache2

However, on squeeze, update-rc.d is wrapped by insserv, which ignores your request and acts on the LSB headers instead.

$ sudo update-rc.d apache2 start 20 3 4 5 . stop 80 0 1 2 6 .
update-rc.d: using dependency based boot sequencing
update-rc.d: warning: apache2 start runlevel arguments (3 4 5) do not match LSB Default-Start values (2 3 4 5)
update-rc.d: warning: apache2 stop runlevel arguments (0 1 2 6) do not match LSB Default-Stop values (0 1 6)
$ find /etc/rc* -name '*apache*'
/etc/rc0.d/K01apache2
/etc/rc1.d/K01apache2
/etc/rc2.d/S18apache2
/etc/rc3.d/S18apache2
/etc/rc4.d/S18apache2
/etc/rc5.d/S18apache2
/etc/rc6.d/K01apache2

Insserv does have an option to override the LSB headers, but the update-rc.d wrapper doesn't use it, and you have to be very careful, as it fails silently if you use it wrong.

$ sudo insserv -r apache2
$ sudo insserv apache2,start=3,4,5,stop=0,1,2,6
$ find /etc/rc* -name '*apache*'
/etc/rc0.d/K01apache2
/etc/rc1.d/K01apache2
/etc/rc2.d/K01apache2
/etc/rc2.d/S18apache2
/etc/rc3.d/S18apache2
/etc/rc4.d/S18apache2
/etc/rc5.d/S18apache2
/etc/rc6.d/K01apache2

Additional behavior to work around in Chef.

Moving forward

Distributions continue to change the way we interact with init with every release. This is clearly a reason to use a configuration management tool. You know that you want mysql to never start automatically because your cluster resource manager controls it, but how you achieve that has been changing with regularity. You can let your configuration management tool abstract that away. Still, we need to stay involved in the discussions in the open source communities whose software we use and be proactive citizens.