Reporting using Chef’s Knife

We have a table in our corporate Confluence wiki that looks something like this. It was a product of a few quick notes to allow the team to build out VMs in parallel, distributed across a number of virtual hosts, and not rely on luck for proper resource utilization. The number fields are the amount of gigabytes of RAM allocated to the guests. As long as the total didn’t exceed a magic number for the entire host, we could keep building and the team remained unblocked. It got the job done, but it is no way to keep track of guests and resources. First, wiki’s have a tendency to get out of date and rot. It takes a fair amount of work to know what needs to be updated and keep it that way on a daily basis. Also, tables in Confluence are not all that great. They are far from Excel. The total row contains no formula to autosum the column, and you find yourself regularly switching between editor modes depending on how you are entering data, such as by hand or using cut and paste.

So, what if your “back of the napkin” calculations could be sourced from real data? This is usually unrealistic because you don’t know what data you need until you need it, so it hasn’t been captured. But we do capture a lot of data about nodes in Chef, so it is sitting there waiting for you to have that bright idea. In this case, I wanted to reconcile the memory usage on the VM hosts. I could ssh to each host, and collect this information from libvirt by hand, and put it in a spreadsheet somewhere or add it up myself for Confluence. But what happens when a teammate builds another server tomorrow? Will they update the documentation? Is that a step we want to keep doing by hand, as we build and destroy VMs on a regular basis? Is it a step we should be doing by hand, these days?

Chef::Log.level= :fatal
printf "%-10s %-12s %-8s %s\n", "host", "guest", "MB RAM", "Run List"
search(:node, 'role:virt').each do |host|
  total_mem = 0
  host[:virtualization][:domains].each do |domain,attribs|
    begin
      guest = nodes.show(domain)
    rescue
      guest = search(:node, "hostname:#{domain}")[0]
    end
    run_list = guest.run_list if guest
    printf "%-10s %-12s %-8s %s\n", host.name, domain, attribs[:memory] / 1024, run_list
    total_mem += attribs[:memory]
  end
  printf "%-10s %-12s %-8s %s\n", host.name, "TOTAL", total_mem / 1024, ""
end

This example is a knife exec script. If you saved this to a file named virt_ram.knife, then you could run it with knife exec virt_ram.knife. While Chef has full blown APIs you can interface with, that can raise the cost of a small project higher than its worth. With knife exec, small proof of concept projects done on the side of your desk are approachable with ease.

Let us take a moment to step through the code.

1 — Set the Chef log level to fatal to surpress warnings generated my line 7 when we look a non-existent node.
2 — Print out a header describing the columns of data we are going to generate
3 — Search chef for all of the nodes with the role “virt” and loop through them, naming the node object ‘host’
5 — Each virtual host object contains a hash of domains in host[:virtualization][:domains]. Step through these assigning the key to ‘domain’ and the value (another hash) to ‘attribs’
6-10 —  Look to see if we have a node in Chef whose name matches the domain name in libvirt. If not, rescue and trap that failure and try to search for a node with that hostname. Your node names in chef don’t have to be your hostnames or fqdns. At Opscode we use short unique identifiers such as EC2 instance IDs, portions of randomly generated GUIDs, and asset tracking numbers.
11 — If we did find a matching node, get its run_list. This really explains what a host does at Opscode, as we tend two only have two or three meta roles applied to a node. Usually one represents the environment it is in, such as “prod” or “dev” and the other is its role like “webserver” or “couchdb”
12 — Print out the information we known about this guest
13 — Then add the memory used by that guest to the running total for the host.
15 — Finally, print out the total memory we’ve calculated for that host.
16 — Go back around and do it all again for the next host.

$ knife exec virt_ram.knife
host guest        MB RAM   Run List
vm1  rv-735a342e  2048     role[prod], role[web]
vm1  rv-8ef1f3d1  4096     role[prod], role[database]
vm1  rv-eb574386  512      role[prod], role[dns]
vm1  TOTAL        6656
vm2  rv-91ba412e  2048     role[prod], role[web]
vm2  rv-8e342d11  4096     role[prod], role[database]
vm2  rv-e3829f86  512      role[prod], role[dns]
vm2  TOTAL        6656
vm3  cobbler1     1024
vm3  rv-e3829f86  512      role[prod], role[dns]
vm3  TOTAL        1536

This data is made up, but I’ve shown on vm3 something that I found in my own infrastructure; there were guests left over from testing that weren’t named properly and never made it into the chef server. I wouldn’t know they were there if I hadn’t done an audit of the servers this way. This exemplifies the Chef philosophy that it should help you do what you want, not model what it thinks you should be doing. This isn’t a carefully engineered reporting feature built around a common practice of virtualization management. This is a script I hacked on with Dan‘s endless helpful guidance while I was waiting for an rsync to finish. I know others have written similar scripts to reconcile EC2 instances by comparing Chef and EC2 via Fog.

I love it. Do you have some spare time? What do you need? Chef will get you there.

Leave a Reply

Your email address will not be published. Required fields are marked *

Time limit is exhausted. Please reload the CAPTCHA.