
Knife Reporting: apt + updates

Nathan and I were discussing yesterday the lack of a good way to visualize all of the updates waiting to be installed across a server cluster. I wrote another knife script to do this, and Seth Falcon helped me clean it up.

# Knife exec script to search for and describe systems needing updates
# 2011-01-11 - Bryan McLellan <btm@loftninjas.org>

gem "net-ssh", ">= 2.0.23"
require 'net/ssh/multi'

class AptSsh < Chef::Knife::Ssh
  # Override configure_session so we can specify where to get the query
  def configure_session
    @longest = 0 # Set in Chef::Knife::Ssh.run
    q = Chef::Search::Query.new
    @action_nodes = q.search(:node, ARGV[2])[0]
    fqdns = @action_nodes.map { |item| item.fqdn }
    if fqdns.empty?
      Chef::Log.fatal("No nodes returned from search!")
      exit 10
    end
    session_from_list(fqdns)
  end

  def capture_command(command, subsession=nil)
    host_data = Hash.new { |h, k| h[k] = "" }
    subsession ||= session
    command = fixup_sudo(command)
    subsession.open_channel do |ch|
      ch.request_pty
      ch.exec command do |ch, success|
        raise ArgumentError, "Cannot execute #{command}" unless success
        ch.on_data do |ichannel, data|
          host_data[ichannel[:host]] << data
          if data =~ /^knife sudo password: /
            ichannel.send_data("#{get_password}\n")
          end
        end
      end
    end
    session.loop
    return host_data
  end
end

abort("usage: knife exec apt.knife QUERY") unless ARGV[2]
ssh = AptSsh.new
ssh.configure_session

# install apt-show-versions if it isn't installed
install_show_versions = <<EOH
if [ ! -e /usr/bin/apt-show-versions ] ; then
  echo INSTALLING APT-SHOW-VERSIONS ; sudo apt-get install apt-show-versions -y
fi
EOH
ssh.ssh_command(install_show_versions)

apt_data = ssh.capture_command('apt-show-versions -u -b')

apt_data.each do |host, data|
  puts "#{host} - #{data.count("\n")} updates, #{data.scan("-security").length} of which are security updates"
  data.each_line do |line|
    puts "  #{line}"
  end
end

# Prevents knife from trying to execute any command line arguments as additional script files, see CHEF-1973
exit 0

Given a search query, this provides an output of:

$ knife exec apt.knife role:dev
webui-dev.example.org - 6 updates, 6 of which are security updates
  libc-bin/lucid-security
  libc-dev-bin/lucid-security
  libc6/lucid-security
  libc6-dev/lucid-security
  libc6-i686/lucid-security
  libc6-xen/lucid-security
monitoring-dev.example.org - 6 updates, 6 of which are security updates
  libc-bin/lucid-security
  libc-dev-bin/lucid-security
  libc6/lucid-security
  libc6-dev/lucid-security
  libc6-i686/lucid-security
  libc6-xen/lucid-security
rabbitmq-dev.example.org - 6 updates, 6 of which are security updates
  libc-bin/lucid-security
  libc-dev-bin/lucid-security
  libc6/lucid-security
  libc6-dev/lucid-security
  libc6-i686/lucid-security
  libc6-xen/lucid-security
couchdb-dev.example.org - 7 updates, 7 of which are security updates
  libc-bin/lucid-security
  libc-dev-bin/lucid-security
  libc6/lucid-security
  libc6-dev/lucid-security
  xulrunner-1.9.2/lucid-security
  xulrunner-1.9.2-dev/lucid-security
  xulrunner-dev/lucid-security
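
The summary line for each host is just a pair of counts over the captured text. Here is a minimal sketch of that step in plain Ruby, outside of knife. The helper name summarize_updates and the sample data are made up for illustration (tzdata/lucid-updates is not from the output above), and the strip call assumes the captured text may carry stray carriage returns because the script requests a pty.

```ruby
# Hypothetical helper: summarize captured apt-show-versions output per host.
# host_data maps a hostname to the raw text captured from that host.
def summarize_updates(host_data)
  host_data.map do |host, data|
    # One package per line; strip handles "\r\n" endings and blank lines are dropped.
    packages = data.lines.map(&:strip).reject(&:empty?)
    security = packages.count { |pkg| pkg.include?("-security") }
    { host: host, total: packages.length, security: security, packages: packages }
  end
end

# Made-up sample data standing in for the hash returned by capture_command.
sample = {
  "couchdb-dev.example.org" => "libc-bin/lucid-security\nxulrunner-dev/lucid-security\n",
  "webui-dev.example.org" => "libc6/lucid-security\ntzdata/lucid-updates\n"
}

# List the hosts with the most security updates first.
summarize_updates(sample).sort_by { |row| -row[:security] }.each do |row|
  puts "#{row[:host]} - #{row[:total]} updates, #{row[:security]} of which are security updates"
end
```

Counting parsed lines rather than newline characters means a trailing blank line does not inflate the total.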

Let’s say that you didn’t want to upgrade the couch box. You could modify the search query to exclude that box and run the script again to confirm, then reuse that search string to trigger the update.

$ knife exec apt.knife "role:dev NOT hostname:couchdb-dev"
$ knife ssh "role:dev NOT hostname:couchdb-dev" "sudo apt-get upgrade -y"
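
If you find yourself excluding boxes often, the query string can be built rather than typed. This is only a sketch: query_excluding is a hypothetical helper, and chaining more than one NOT clause is my assumption about the search syntax, since the example above only excludes a single host.

```ruby
# Hypothetical helper: append a NOT clause for each hostname to a base query.
def query_excluding(base, hostnames)
  hostnames.reduce(base) { |query, name| "#{query} NOT hostname:#{name}" }
end

query = query_excluding("role:dev", ["couchdb-dev"])

# The same string feeds both the report and the upgrade, so the two
# commands always target the same set of nodes.
puts %(knife exec apt.knife "#{query}")
puts %(knife ssh "#{query}" "sudo apt-get upgrade -y")
```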

Reporting using Chef’s Knife

We have a table in our corporate Confluence wiki that looks something like this. It was a product of a few quick notes to allow the team to build out VMs in parallel, distributed across a number of virtual hosts, and not rely on luck for proper resource utilization. The number fields are the gigabytes of RAM allocated to the guests. As long as the total didn’t exceed a magic number for the entire host, we could keep building and the team remained unblocked. It got the job done, but it is no way to keep track of guests and resources. First, wikis have a tendency to get out of date and rot. It takes a fair amount of work to know what needs to be updated and keep it that way on a daily basis. Also, tables in Confluence are not all that great. They are far from Excel. The total row contains no formula to autosum the column, and you find yourself regularly switching between editor modes depending on how you are entering data, such as by hand or using cut and paste.

So, what if your “back of the napkin” calculations could be sourced from real data? This is usually unrealistic because you don’t know what data you need until you need it, so it hasn’t been captured. But we do capture a lot of data about nodes in Chef, so it is sitting there waiting for you to have that bright idea. In this case, I wanted to reconcile the memory usage on the VM hosts. I could ssh to each host, and collect this information from libvirt by hand, and put it in a spreadsheet somewhere or add it up myself for Confluence. But what happens when a teammate builds another server tomorrow? Will they update the documentation? Is that a step we want to keep doing by hand, as we build and destroy VMs on a regular basis? Is it a step we should be doing by hand, these days?

Chef::Log.level= :fatal
printf "%-10s %-12s %-8s %s\n", "host", "guest", "MB RAM", "Run List"
search(:node, 'role:virt').each do |host|
  total_mem = 0
  host[:virtualization][:domains].each do |domain,attribs|
    begin
      guest = nodes.show(domain)
    rescue
      guest = search(:node, "hostname:#{domain}")[0]
    end
    run_list = guest.run_list if guest
    printf "%-10s %-12s %-8s %s\n", host.name, domain, attribs[:memory] / 1024, run_list
    total_mem += attribs[:memory]
  end
  printf "%-10s %-12s %-8s %s\n", host.name, "TOTAL", total_mem / 1024, ""
end

This example is a knife exec script. If you saved this to a file named virt_ram.knife, then you could run it with knife exec virt_ram.knife. While Chef has full-blown APIs you can interface with, that can raise the cost of a small project higher than it’s worth. With knife exec, small proof-of-concept projects done on the side of your desk are approachable with ease.

Let us take a moment to step through the code.

1: Set the Chef log level to fatal to suppress warnings generated by line 7 when we look up a non-existent node.
2: Print out a header describing the columns of data we are going to generate.
3: Search Chef for all of the nodes with the role “virt” and loop through them, naming the node object ‘host’.
5: Each virtual host object contains a hash of domains in host[:virtualization][:domains]. Step through these, assigning the key to ‘domain’ and the value (another hash) to ‘attribs’.
6-10: Look to see if we have a node in Chef whose name matches the domain name in libvirt. If not, rescue that failure and try to search for a node with that hostname. Your node names in Chef don’t have to be your hostnames or fqdns. At Opscode we use short unique identifiers such as EC2 instance IDs, portions of randomly generated GUIDs, and asset tracking numbers.
11: If we did find a matching node, get its run_list. This really explains what a host does at Opscode, as we tend to only have two or three meta roles applied to a node. Usually one represents the environment it is in, such as “prod” or “dev”, and the other is its role, like “webserver” or “couchdb”.
12: Print out the information we know about this guest.
13: Then add the memory used by that guest to the running total for the host.
15: Finally, print out the total memory we’ve calculated for that host.
16: Go back around and do it all again for the next host.

$ knife exec virt_ram.knife
host guest        MB RAM   Run List
vm1  rv-735a342e  2048     role[prod], role[web]
vm1  rv-8ef1f3d1  4096     role[prod], role[database]
vm1  rv-eb574386  512      role[prod], role[dns]
vm1  TOTAL        6656
vm2  rv-91ba412e  2048     role[prod], role[web]
vm2  rv-8e342d11  4096     role[prod], role[database]
vm2  rv-e3829f86  512      role[prod], role[dns]
vm2  TOTAL        6656
vm3  cobbler1     1024
vm3  rv-e3829f86  512      role[prod], role[dns]
vm3  TOTAL        1536

This data is made up, but I’ve shown on vm3 something that I found in my own infrastructure; there were guests left over from testing that weren’t named properly and never made it into the chef server. I wouldn’t know they were there if I hadn’t done an audit of the servers this way. This exemplifies the Chef philosophy that it should help you do what you want, not model what it thinks you should be doing. This isn’t a carefully engineered reporting feature built around a common practice of virtualization management. This is a script I hacked on with Dan’s endless helpful guidance while I was waiting for an rsync to finish. I know others have written similar scripts to reconcile EC2 instances by comparing Chef and EC2 via Fog.
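
The audit itself boils down to a set difference: guests libvirt knows about, minus nodes Chef knows about. Here is a rough sketch of that idea in plain Ruby. The hashes are stand-ins I made up to mirror the shape of host[:virtualization][:domains], and unregistered_guests is a hypothetical helper; in a real knife exec script the data would come from searches rather than literals.

```ruby
# Stand-in data: domain name => attributes, grouped by virtual host.
# Memory is in KB, matching the division by 1024 in the script above.
domains_by_host = {
  "vm1" => { "rv-735a342e" => { memory: 2_097_152 } },
  "vm3" => { "cobbler1" => { memory: 1_048_576 }, "rv-e3829f86" => { memory: 524_288 } }
}

# Stand-in for the node names the Chef server knows about.
known_nodes = ["rv-735a342e", "rv-e3829f86"]

# Hypothetical helper: return { host => [guest, ...] } for guests that
# exist on a virtual host but have no matching node name.
def unregistered_guests(domains_by_host, known_nodes)
  domains_by_host.each_with_object({}) do |(host, domains), missing|
    orphans = domains.keys - known_nodes
    missing[host] = orphans unless orphans.empty?
  end
end

unregistered_guests(domains_by_host, known_nodes).each do |host, guests|
  puts "#{host}: #{guests.join(', ')} not found in Chef"
end
```

Note that this sketch only matches on node name. The script above also falls back to a hostname search, which this version leaves out.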

I love it. Do you have some spare time? What do you need? Chef will get you there.

Knife one-liners

Knife’s exec sub-command makes it easier to interact with a Chef server from the command line. Let’s assume I’ve created a data bag named cluster as follows:

{
  "id": "www1",
  "cats": "lol",
  "hostname": "www1.example.org"
}
{
  "id": "www2",
  "cats": "lol",
  "hostname": "www2.example.org"
}
{
  "id": "www3",
  "cats": "plz",
  "hostname": "www3.example.org"
}

If I wanted to get a list of hostnames for each data bag item where the value of ‘cats’ is ‘lol’, I would run:

$ knife exec -E "search(:cluster, 'cats:lol').each {|host| puts host[:hostname] }"
www2.example.org
www1.example.org

Granted, I could get this data from the search sub-command as well:

$ knife search cluster cats:lol
{
  "start": 0,
  "total": 2,
  "rows": [
    {
      "id": "www2",
      "cats": "lol",
      "hostname": "www2.example.org"
    },
    {
      "id": "www1",
      "cats": "lol",
      "hostname": "www1.example.org"
    }
  ]
}

However, it is hard to manipulate the result of this data. For instance, if I wanted to check the status of ntp on each of these nodes as a “one-off command”, I could run:

 
$ knife ssh -m \
"`knife exec -E "search(:cluster, 'cats:lol').each {|host| puts host[:hostname] }" | xargs`" \
'/etc/init.d/ntp status'
www1.example.org  * NTP server is running
www2.example.org  * NTP server is running

The quoting can get pretty tricky fast. Instead, if you leave off the -E flag to knife exec, you can pass a script file to knife where you can write clearer scripts, which makes it easier to do more.

# Script contents
$ cat /tmp/knife.exec
targets = Array.new
search(:cluster, 'cats:lol').each do |host| 
  targets << host[:hostname]
end
puts targets.join(' ')

# Execute the script
$ knife exec /tmp/knife.exec 
www2.example.org www1.example.org
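
Since quoting is the painful part, one option is to let Ruby do it. The sketch below uses the standard library’s Shellwords to assemble a knife ssh -m command line from a list of hosts. The helper name knife_ssh_command is made up, and the host list is hard-coded here where a real script would fill it from a search.

```ruby
require 'shellwords'

# Hypothetical helper: build a `knife ssh -m` command line.
# The hosts are passed to -m as one space-separated argument, so the
# joined list and the remote command each need to survive as a single word.
def knife_ssh_command(hosts, remote_command)
  ["knife", "ssh", "-m", hosts.join(" "), remote_command].shelljoin
end

targets = %w[www1.example.org www2.example.org]
puts knife_ssh_command(targets, "/etc/init.d/ntp status")
```

The printed line can be pasted into a shell as-is, because shelljoin escapes the spaces inside each argument.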

What if you needed to reconcile your hardware support contracts with the systems currently deployed? It is no problem to get a list of hardware with chef and knife.

# Script contents
$ cat /tmp/dell.exec
search(:node, 'dmi_system_manufacturer:Dell*').each do |node| 
  puts node[:dmi][:system][:serial_number] + "\t" + node[:fqdn] 
end

# Execute the script
$ knife exec /tmp/dell.exec
XJS1NF1 www1.example.org
XJS1NF2 www2.example.org
XJS1NF3 www3.example.org
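
One thing to watch for in a report like this is a node with no serial number recorded, where the string concatenation would fail on nil. Here is a hedged sketch of a more forgiving formatter. It runs against plain hashes that I made up as stand-ins for node objects, and hardware_report and the UNKNOWN placeholder are my own inventions, not anything Chef provides.

```ruby
# Hypothetical helper: format "serial<TAB>fqdn" lines, tolerating nodes
# that have no DMI serial number recorded.
def hardware_report(nodes)
  nodes.map do |node|
    serial = node.dig(:dmi, :system, :serial_number) || "UNKNOWN"
    "#{serial}\t#{node[:fqdn]}"
  end.sort
end

# Made-up stand-ins for the nodes a search would return.
nodes = [
  { fqdn: "www2.example.org", dmi: { system: { serial_number: "XJS1NF2" } } },
  { fqdn: "www1.example.org", dmi: { system: { serial_number: "XJS1NF1" } } },
  { fqdn: "www4.example.org", dmi: {} }
]

puts hardware_report(nodes)
```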

These are pretty simple examples, but hopefully you can see how easy it is with Chef to use knife scripts to create reports, collect data, and execute one-off commands.

knife, or, my tool is actually a library

The Chef site starts out with, “Chef is a systems integration framework, built to bring the benefits of configuration management to your entire infrastructure.” There is an important point hidden in that description; Chef is not a CM tool. Of course it can be used as one, and many do, but from its beginning it has been leveraged by others such as Engine Yard inside of their own infrastructure. You can safely bet it will be an integral part of the Opscode platform when released as well.

While I was off dealing with my startup’s acquisition this fall, a chef user wrote knife. It was a command line client for interacting with a chef server. An impressively simple prototype, made possible and powered by the chef libraries and API. This has happened before with chef; for instance a while ago in the 0.6 era, an OS X GUI client called Casserole was written by another chef user with an itch to scratch. However, something has happened with knife that is interesting enough I’d like to specifically point out; it got mainlined and heavily expanded.

This happened for a handful of reasons. For one, it was simply a great idea. The kind of user who would be attracted to chef as a tool is very likely to be a power user who would rather not spend their time clicking around a graphical interface. It’s much easier to script a command line tool where needed, passing data in and out for quick hacks to your infrastructure. The folks at Opscode saw this, agreed, and set out to add full functionality to it for the upcoming 0.8 release.

What I think is most important is the planning of a full API from the start. From hearing Adam talk about other tools being “first-class citizens” in chef-land, and knowing his experience writing iClassify as an early open source external node tool for puppet, I know this design was intentional. Using iClassify to tell puppet about your nodes was great, but puppet assumed that you only wanted this tool to classify nodes in the way puppet thought about nodes. Consequently, when you wanted to use data in iClassify about your nodes to make decisions about your infrastructure on the fly, you were forced to do it in templates. This created the expected repetition of loading the iClassify library and accessing iClassify in many templates, but also required you at times to do some fancy footwork to get data between templates when you really wanted puppet to know about the data itself.

Reductive Labs recently announced a dashboard for puppet. I was hoping this meant those barriers had been removed. It certainly creates really nice graphs from your puppet report data. However, from the README it looks like you’re still pushing limited data into puppet using the external node interface. Reductive is going to have to expand this interface greatly if dashboard is to have any meaningful node integration benefits that we didn’t already have two years ago with iClassify.

Just as you can see some concepts from other configuration management tools in chef, you can see parts of iClassify. It was a great start and it was disappointing that the puppet community didn’t engage it further. Perhaps it was simply before its time, but I believe it was that there were too few doors into puppet-land to let you really take advantage of and grow external tools.

I think this was the lesson that Opscode learned, and consequently chef was born with an API. With it we can accomplish nearly anything we dream up. What is most exciting about this is that we can do whatever everyone else dreams up. I can’t wait to see what that is.