Tag Archives: configuration management

Configuration Management vs Meatcloud: 5 reasons CM wins

First, a bit of social commentary.

Sometimes we refer to the way something ought to be accomplished as the RightWay[tm], sarcastically noting that every best practice contains a certain degree of opinion. As we dedicate more time to doing something our way, we become invested in it being the RightWay, and risk becoming defensive about our choices. Adam Jacob calls this “survivorship-bias,” and I’ve spent some time listening to him think about what he feels the risks are, and considering them myself. When we make significant personal investment in a choice, it becomes a personal challenge to remain impartial about the merits of that choice over time.

I’ve previously said that Configuration Management is the act of programmatically configuring your systems. Cloud computing says that building servers is undifferentiated heavy lifting; unless your service is building servers, you should really let someone else do it and focus on the product or service you’re actually trying to sell. Configuration Management is the first step in bringing this same ideology to configuring systems. We are not in the business of selling configured servers any more than we are in the business of providing coffee to our employees; we happen to do both. We build our systems to enable our business to conduct business. In my case, operations is enabling our customers to use the web product that we develop.

We are all members of the Configuration Management community, because we believe it is ultimately a better process for building systems. We naturally have different ideas about how that process should execute; among other differentiating factors, our “goals are different but are left unstated” within the community. The level of preference here and the resulting fragmentation are no different from holding an opinion about which open source operating system one should use for a specific task. The merits of our choices are worth discussing, but the implication that tools and libraries should all come to the same conclusions about design is like implying that the world only needs one type of hammer.

So, defining the meatcloud as the established notion that growing your internet presence means hiring more people to rack servers, install software, and update configuration files, I asked around a little: why do we think Configuration Management is better?

  • Abstraction
  • You don’t need to be a mechanic to drive a car; should you need to be a subject matter expert on Apache to run a webserver? Infrastructure as code shows us how, and the resulting communities are starting to implement this. When we spend less time getting the parts working, we can spend more time innovating better solutions with the parts.

  • Self-documenting
  • Ever rebuild a server that someone built long ago and is no longer with the organization, and find many small parts necessary to make it work that nobody bothered to write down? Placing those small changes and required files in configuration management ensures that even if the code doesn’t run flawlessly on an upgraded operating system, you know everything that went into making it work. Since you’re configuring the system through configuration management, a lot less falls through the cracks; documentation is often an afterthought to getting the system working.

  • Consistency and Repeatability
  • What is on that system? Everything you told CM to put there. ‘Golden image’ disk images often leave you in the aforementioned situation where you don’t know how to recreate them, and often you don’t know what else ended up there. Configuration Management allows you to build many copies of the same system easily, from scratch every time.

  • Agility
  • Did sales tell you they had fifty customers and it turned out to be five hundred? How long will it take you to build the extra servers by hand? How many extra people do you have to put into the meatcloud to get that done in time? Business requirements always change, and slow-moving businesses are at a disadvantage in dealing with this. The inability to build and deploy servers fast enough should never be a reason your business fails.

  • Flexibility, or Don’t Repeat Yourself
  • Three applications on one server? Or one application on three servers? Apache doing different jobs on multiple servers? Moving applications between servers and leveraging existing infrastructure code for new projects is easy. We automate tasks that are repeatable, but often scripts are written to accomplish only one repeatable task. Here we ask: how can we treat configuration as sets of modular tasks that we can mix and match? The sketch below shows the idea.
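
As a rough sketch of that mixing and matching, assuming Chef-style recipes (include_recipe is Chef’s recipe DSL, but every cookbook name here is hypothetical), the same modular pieces recombine per server:

# One server running three applications, composed from modular recipes:
include_recipe "apache"
include_recipe "blog"
include_recipe "wiki"
# Elsewhere, one application on three servers reuses the same "blog"
# recipe unchanged; the building blocks stay modular either way.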

Opscode Cookbooks community announced

It’s no secret that I think configuration management is epic. I spent yesterday hopping around the office in excitement due to Opscode’s alpha announcement of their new Cookbooks site. This is exactly the next step needed in the CM community. Shortly thereafter, while talking with another developer about how awesome Cookbooks is, he quipped that it was second in awesome only to chef itself. I argue that chef exists for cookbooks, not the other way around. It was later said, “Chef is the tool, cookbooks are the art.” Cookbooks is all about community.

Chef is a tool, and an open-source one, so it does have a community. A vibrant one at that; a recent ohloh factoid claims “This is one of the largest open-source teams in the world, and is in the top 2% of all project teams on Ohloh.” Cookbooks is the other way around: it is a community first and a tool second. Cookbooks has already been compared to github and launchpad (with PPAs), because like those sites it brings people with a common task together around a tool like git or ubuntu. It has been noted that every configuration management project needs something like Cookbooks, and I agree.

The community that builds around Cookbooks will be a vanguard of agile system administrators looking to reduce their undifferentiated heavy lifting. These include the people who recognized that it is their product they are selling, not their infrastructure, and took up cloud computing as a result. They will soon find that the configuration management community helps them continue to spend less time reinventing the operations wheel, and more time innovating on it. Cookbooks will make it easier to share infrastructure code with like-minded individuals, and is the next step beyond github and wikis for this material. Those models still required concerted effort to share back with the community once you finished your recipe, but the Cookbooks site and its open API stand to change that. We are now poised to leverage the open source model to accelerate the rise of infrastructure as code.

Infrastructure as a code sample

Upon returning from Open Source Bridge in Portland last week, I collected my thoughts from the convergence of configuration management developers and wrote The Configuration Management Revolution, centered around the idea that something bigger is happening than we’re acknowledging.

Today Tim O’Reilly posted a blog entry about the origins of Velocity. He says “I had been thinking in the abstract about the fact that as we move to a software as a service world, one of the big changes was that applications had people “inside” of them, managing them, tuning them, and helping them respond to constantly changing conditions.” which builds on his post three years ago about operations becoming the “elephant in the room”.

That article is worth revisiting. It tails off commenting on the lack of open source deployment tools. That has definitely changed, as we now have a number of open source options in the operations tool space. O’Reilly has published a few books on operations as well, although it hasn’t yet taken the step of considering operations a category in its book list.

The web is full of howtos, blog posts and assorted notes on piecing together open source software to build a server. One doesn’t have to be an expert on all of the ingredients, but rather be able to figure out how to assemble them. As time goes on, the problems of the past become easier to solve; former creative solutions become mainstream and the industry leverages those advantages. This frees up mindshare for something new. I’ll emphasize that this doesn’t mean one no longer needs to have some understanding of why the server works, but the time spent engineering that particular solution is reduced because we already have the wheel, so to speak.

Writing configuration management and thus infrastructure howtos may get one started, but it’s the old way of thinking. If you can write infrastructure as code, you can share infrastructure as code. It is essential that this is achieved in a format that both promotes sharing and is relatively easy. Take the Munin and Ganglia plugin sites, for instance. Munin is relatively easy to get started with and has a simple enough site for exchanging plugins. While I consider Ganglia technically superior, its community is not. I tried submitting to Ganglia’s plugin site once and failed. This step has to be more than a site where files are dumped; it needs community support.

I asked Luke about this at OSBridge and he said Reductive Labs plans to have a module sharing website online soon for puppet. For now, you can find a number of puppet modules in the wiki. Opscode is on track, with their chef cookbooks available as a git repository on github, combined with a ticketing system allowing users to fork, modify and contribute changes. There’s even a wiki page instructing how to leverage these.

Of course, you’ll always need experienced engineers to design and tune your infrastructure. However, the time and mindshare savings from creating a LAMP stack by setting a tag or role to ‘lamp’ are immense. As Opscode produces more open APIs between the parts of their product, my mind imagines the offspring of the Chef UI and virt-manager. How long until the popup touting “New features are available for your web cluster”?
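
Returning to the ‘lamp’ example, a minimal sketch of role-driven configuration in a Chef-style recipe might look like the following. The node[:roles] attribute, the ‘lamp’ role name, and the recipe names are illustrative assumptions, not anything shipped by Opscode:

# Hypothetical top-level recipe: one role decides the whole stack.
if node[:roles].to_a.include?("lamp")
  include_recipe "apache2"
  include_recipe "mysql::server"
  include_recipe "php"
end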

The Configuration Management Revolution

The revolution is coming, and it’s about time I wrote about it.

About a year and a half ago I was settling in to a new system administration job at a startup. I was told a consulting company would be coming in to bootstrap configuration management for us. I had previously glanced at cfengine out of curiosity, but ended up spending only a couple of hours looking at it. In my mind configuration management was analogous to unattended software installation, which I definitely supported, but I had yet to perceive how it would change the way I viewed infrastructure.

That consulting company was HJK Solutions. Some of my coworkers had previously established relationships with a couple of the partners of HJK, but I didn’t know anything about them myself. I was along for the ride. They gave us a presentation where they showed iClassify and puppet working together to automate infrastructure for other clients, but it wasn’t until the next meeting, where we made technical decisions about the implementation, that I really came to appreciate their insight. Why someone makes a choice is much more interesting than the choice itself, and this was the first of many opportunities since to elicit the opinions of Adam Jacob.

A year of using puppet later, not only was I hooked, but my excitement about the possibilities of configuration management had grown beyond what the software could do at the time. Both my excitement and frustration were apparent, and got me a sneak peek at Opscode’s Chef. The design of Chef embodies “the unix way” of chaining many tools together: it takes building blocks that are essentially simple on their own and, almost behind our backs, presents a system revolutionary enough that we nearly fail to recognize the familiar pieces of it.

Chef is a systems integration framework, built to bring the benefits of configuration management to your entire infrastructure.

This is not an article about Chef; it is about the big picture. However, if you take enough steps back from that statement, it becomes apparent that Opscode is building toward that picture. I want to share with you the excitement that short description stirs in me.

Configuration management alone is the act of programmatically configuring your systems. Often the benefits are conveyed in support of process, but in more agile communities different advantages are touted: allowing one to wrangle larger numbers of servers by reducing build times in the name of scalability, building more maintainable infrastructures by leveraging the self-documenting side-effect of configuration languages, and reducing administrator burnout by cutting a swath through the number of repetitive tasks one must perform. These are unarguably significant boons. Nevertheless, one does not have to look hard to find a curmudgeon reluctant to change, claiming they don’t want to learn another language, that having systems run themselves will surely cause failure, or perhaps some skynet-esque doomsday scenario. History is not short of examples of luddites holding steadfast against new technology, but it is somewhat paradoxical to see this mentality held in such a technologically oriented field.

The recent Configuration Management Panel at the Open Source Bridge conference in Portland amassed many relevant core developers in one city long enough to provide a good vibe for the direction of the available tools and underscore our common charge. But the focus was more on how we will get more users of configuration management tools than on why they are going to have to use them. In retrospect, perhaps I should have asked the panel for their views on how configuration management will reshape systems administration.

Configuration management is about more than automation. Some who have foreseen this have started to convey it by discussing managing infrastructures rather than systems. By analogy, the power loom, the Gutenberg press, and the intermodal shipping container were not merely time-saving tools of automation. These inventions reshaped more than their workforce and industry; they reshaped the global economy.

I’m fully aware of the tone set by such a call of prophecy. How will a tool that helps us configure multiple machines at once make such significant ripples in our day-to-day lives of the future? It will, because we will be enabled to solve new problems that we did not yet realize existed. Just as other technological advances served as catalysts for globalization and the industrial and scientific revolutions, changing how we build our information infrastructure leaves us poised for an exciting set of challenges that do not yet exist.

Building scalable operations infrastructure with OSS

I’m the lead systems administrator at Widemile and run operations here. Sometimes I do other things, but operations is most interesting. My linkedin profile may give you an idea of where I’m coming from, but it misses all those late nights working with OSS because I enjoy it. I usually blog on my own site, though that serves more as a technical journal; what we are up to at Widemile will be the differentiator here. As a rule, I’m not a developer. Certain facts may convince you otherwise, but I try to stay out of our product. You’ll start to hear from Peter Crossley, our lead software architect, soon enough. Perhaps some other developers after that. I’ll christen this blog with a round-up of how we’re building our infrastructure at Widemile.

Most recently I’ve been in heavy development on Chef and Ohai. We’ve been using puppet for about a year and a half now. Check out my Infrastructure Engineering slide deck for where we were with that a few months ago. I’ve been happy with it except for a few issues, which ended up requiring mostly major architectural changes to fix. Adam at Opscode has a good post entitled 9 things to like about Chef that outlines some of these differences. There’s a lot of e-drama around Opscode’s decision to write a new product rather than usher changes into puppet. I won’t touch that other than to say that we had problems with puppet that chef fixes.

Almost all of our servers are in configuration management, which means that no one-off work is done on the production servers and all changes are self-documenting. Granted, we’re a small shop and sometimes I’ll do minor debugging on a production server, but any changes do end up in CM.

Our servers are almost all kvm guests under libvirt running on Dell blades. There’s some information about how we got here in a slide deck I made for GSLUG entitled Virtual Infrastructure. Apparently using kvm in production isn’t all that common yet, but since we’re a small shop we’re able to adopt new technology quickly and make the best of it. With the use of vmbuilder, libvirt, kvm and capistrano, we build out new nodes in a matter of minutes. More importantly, it’s just a couple of commands.
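
For the curious, those couple of commands could be wrapped in a Capistrano task roughly like this. This is a sketch, not our actual deployment code; the task name, role, hostname, and vmbuilder arguments are all illustrative assumptions:

# Hypothetical Capistrano task that builds and boots a new kvm guest.
task :build_node, :roles => :vmhost do
  run "sudo vmbuilder kvm ubuntu --suite hardy --hostname node1 --libvirt qemu:///system"
  run "virsh start node1"
end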

Once Chef is integrated with the libvirt API we expect to be able to further simplify our deployment. The idea behind it is that it will be a ghetto version of Solo, which EY built using Chef. Eventually we’ll pull out capistrano. While it’s nice for interacting with multiple machines at once, it really was written for a different purpose than what we use it for. There will be replacement functionality in Chef shortly.

Learning to cook

The chef satire will never die. Adam posted 9 things to like about chef today, which is an expanded and much better version of my original blog post on chef. AJ had an intermediate post that tried to summarize a lot of the controversy and drama. Hopefully that silliness is settling down.

I’ve been coding a lot lately, contributing to both chef and ohai. We’ve been talking about trying to use chef in the NOC at Shmoocon so that next year we can reuse the recipes rather than build the servers again by hand. Most everything runs on borrowed hardware at Shmoocon, so you’re not guaranteed everything is the way you left it a year later. We use FreeBSD for some monitoring at Shmoocon, so I’ve been spending a lot of time getting chef/ohai ready for FreeBSD.

I don’t think I’ve ever contributed to a project to this degree before. Ohloh doesn’t think so either. The last time I can recall really adding code to a project that was more than a couple of files was at an ISP in Maine back in the early 00’s. It was called Panax, and there’s the usual pile of silly isp shop history. It’s funny that while it’s been sucked into an ISP conglomerate, the old color scheme has been maintained. We had an in-house system for user/account management, written in Perl. It had a web front end so none of the tech support folks had to log in to any of the systems to add, remove or manage users. Usually I’m just writing glue scripts, like a good SA. Regardless, it’s been fun and I’ve been learning a lot about Ruby and rspec.

An SE at my last job (who subscribes to python, and whom I still haven’t convinced that CM will change his life) said going into development would be a natural move as I got bored of SA work. Is it that, or is this a shift in what being an SA will mean? Configuration Management is still young, despite cfengine being out for some time now and puppet getting a good following. It may take time for the old SAs to retire and the new deal to take hold. I think as more and more people work in shops with CM implemented, they’ll start to find how hard it is to live without it once they’ve had it. I noticed recently that Slashdot lacks any coverage of Configuration Management in the last few years, but I realize Slashdot is mostly fluffy news these days. While Slashdot is still talking about SCO every day, there is of course talk of new technologies in the new mediums.

The next few months will be exciting as people pick up chef. There are a few very helpful individuals in #chef on freenode who want to see this used and are perfectly willing to fix any bugs you find. So give it a shot and let me know what you think.

configuration management with chef announced

Chef has been announced. Listen to this podcast at Cloud Cafe. There’s no way around comparing puppet and chef. Sure, they’re both configuration management tools. It’s simplest to put it this way:

We’re replacing puppet with chef.

And why? A little while ago I wrote about problems I’ve been having scaling puppet. Off the top of my head, the biggest issues for me working with puppet have been:

  1. Dependency graphs
  2. Limited capabilities of the language (DSL)
  3. Templates are evaluated on the server

Dependency Graphs

There’s talk about vertically scaling puppet, but not a lot about scaling it horizontally. I tend to run everything under puppet. People argue that it’s too much work to put single servers in puppet, and that you should only use it for systems you intend to clone. I disagree. Puppet recipes are self-documenting. The same people who don’t want to take the time to write puppet recipes for single services are the people you have to beat with a sucker rod to get to document anything. Sometimes, if I don’t have the time to put into fully testing a puppet recipe for a new machine, I’ll at least write the recipe as I’m working, to serve both as documentation and as a starting point for if/when I come back to it.

The point is that as I scale out in this fashion, puppet will more often fail with a dependency problem on one run and be fine on the next. I asked Luke about this at a BoF at OSCON 2008, and he basically told me that he really only focuses on the problems his paying customers have, and was anxious to leave and get a beer. That’s fine, I understand it, but since it did nothing to fix my problem it drove me away from the puppet community.

While in theory having puppet do all this work to resolve dependency issues seems fine, in practice it is more complexity and trouble than it is worth. As a systems administrator, I know what the dependencies are. As you build a system, you simply write your recipe in the same order as the steps you’re taking.

Chef takes this idea and runs with it. Recipes are parsed top to bottom. If a package needs to be installed before a service is started, you simply put the package in the recipe first. This not only makes a lot of sense, it makes the dependencies in a complex recipe visually understandable. With puppet you can end up with spaghetti code reminiscent of “goto”, jumping around a number of recipes in an order that’s difficult to understand.
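
For example, a minimal Chef recipe reads in exactly the order the steps happen (package and service are standard Chef resources; “apache2” is just an illustrative name):

# Parsed top to bottom: the package is installed first,
# then the service it provides is enabled and started.
package "apache2" do
  action :install
end

service "apache2" do
  action [ :enable, :start ]
end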

Language

Before the recent 0.24.6, you could not even do:

if $ram > 1024 {
    $maxclient = 500
}

The support for conditionals was rudimentary. I run into a lot of languages, and the biggest problem I have is remembering how to do the same thing in each one. The puppet language does not do what a lot of other languages do. I didn’t need another language to learn, let alone one written from scratch. It was just silly doing something like:

  # iclassify script adds vmware-guest tag based on facter facts
  $is_vmware = tagged('vmware-guest')
  if $is_vmware {
    include vmware
  }

Chef uses ruby for its recipes. This makes the above stupidly simple with something like:

include_recipe "vmware" if node[:manufacturer] =~ /VMware/

Templates

Puppet evaluates recipes and templates on the server. I ended up with this block of code once when I needed to specify the client node’s IP address in a configuration file:

require '/srv/icagent/lib/iclassify'

# iclassify_user and iclassify_password come from the surrounding manifest
ic = IClassify::Client.new("https://iclassify", iclassify_user, iclassify_password)
query = "hostname:#{hostname}"
mip = nil
nodes = ic.search(query)
nodes.each do |node|
  # node.attribs is an array of hashes with :name and :values keys
  node.attribs.each do |attrib|
    if attrib[:name].match(/ipaddress/)
      ip = attrib[:values].to_s
      if ip.match(/^10\.0\.0\./)
        mip = ip
        break
      elsif ip.match(/^10\.0\.1\./)
        mip = ip
        break
      end
    end
  end
end

This was so much work. With chef you can easily get this information in the recipe, because it’s parsed on the node, and it’s even easier in the template if that’s more appropriate. Since the template is parsed on the client, you can grab the information out of variables or directly from the system.
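
As a rough sketch of the difference (the file path, template name, and variable are hypothetical; node[:ipaddress] is the attribute ohai collects on the node):

# The whole iClassify dance above collapses to one attribute lookup,
# evaluated on the node itself.
template "/etc/myapp/listen.conf" do
  source "listen.conf.erb"
  variables :ip => node[:ipaddress]
end
# inside listen.conf.erb the value is simply <%= @ip %>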

As time goes on I’ll surely write more about using chef. We’re using it in production now, and we’re happy with it. In the interim, come to #chef on freenode if you have any questions.

how big is puppet’s envelope?

More and more I run into problems with puppet’s DSL. Today a coworker came to me with problems with a munin plugin definition we have. Normally if you want to add a munin plugin that isn’t in the standard base, you use our munin_plugin_file definition, which calls the remotefile definition that simplifies copying files via puppet, and also calls the munin_plugin definition which essentially makes the symlink to enable the plugin.

Today we wanted to do this with wildcard plugins, but more than one call to munin_plugin_file would fail, because the remotefile would get defined multiple times and puppet can’t handle that.

err: Could not retrieve catalog: Puppet::Parser::AST::Resource failed with error ArgumentError: Duplicate definition: Remotefile[munin-plugin-slapd_] is already defined in file /etc/puppet/site-modules/munin/manifests/definitions/munin_plugin_file.pp at line 10; cannot redefine at /etc/puppet/site-modules/munin/manifests/definitions/munin_plugin_file.pp:10 on node

The solution is to use puppet’s immature conditionals to test whether the type was already defined and not redefine it.

define munin_plugin_file($plugin_config = "/etc/munin/plugins", $plugin_dir = "/usr/share/munin/plugins", $plugin) {
  if defined(Remotefile["munin-plugin-$plugin"]) {
    debug("munin-plugin-$plugin already defined")
  } else {
    remotefile { "munin-plugin-$plugin":
      path => "$plugin_dir/$plugin",
      module => "munin",
      source => "plugins/$plugin",
      owner => root,
      group => root,
      mode => 755,
      require => Package["munin-node"]
    }
  }
  munin_plugin { $name:
    plugin_config => $plugin_config,
    plugin_dir => $plugin_dir,
    plugin => $plugin,
    require => Remotefile["munin-plugin-$plugin"]
  }
}

Note that the debug line is there because puppet conditionals can’t have empty blocks; see bug #1109 (the tracker is down right now, so I’m guessing at that link).

I’m really starting to wonder, because I’ve had these sorts of problems twice today now; normally it’s every once in a while. In shorter form:

Bryan Mclellan [10:59 AM]:
production-sites includes apache::module::php4, which includes the package, and runs apache_module. i wanted the php4-ldap package, which the php4 class installs. so I added an include for php4 in the production-sites.
but php4 also installs the apache2 php4 module, so there was a naming conflict.
so I removed the package from apache::module::php4 and added an include to php4 there, but it simply wouldn’t do the include. perhaps too many levels deep.

You have to put a lot of thought into your design if it’s going to scale, especially when you put everything in puppet like we do. Someone told me recently that our puppet code base is much larger than most.

~/puppet$ find site-modules/ -name '*.pp' -exec cat '{}' \; | wc
   4166   10820  101647
~/puppet$ find site-modules/ -name '*.erb' -exec cat '{}' \; | wc
   3565   12773  112231
$ grep -R class site-modules/ | wc
    152     578   12264

modules and site-modules have a lot of overlap. As others pick up puppet, I wonder how long it takes them to start running into this. Of course, if you avoid nesting definitions and keep all of your classes separate, you won’t see this. But you’re doing a lot of extra work too.