Monthly Archives: January 2010

Configuration Management vs Meatcloud: 5 reasons CM wins

First, a bit of social commentary.

Sometimes we refer to the way something ought to be accomplished as the RightWay[tm], sarcastically noting that every best practice contains a certain degree of opinion. As we dedicate more time to doing something our way, we become invested in it being the RightWay, and risk becoming defensive about our choices. Adam Jacob calls this “survivorship-bias,” and I’ve spent some time listening to him think about what he feels the risks are, and considering them myself. When we make significant personal investment in a choice, it becomes a personal challenge to remain impartial about the merits of that choice over time.

I’ve previously said that Configuration Management is the act of programmatically configuring your systems. Cloud computing says that building servers is undifferentiated heavy lifting; unless your service is building servers, you should really let someone else do it and focus on the product or service you’re actually trying to sell. Configuration Management is the first step in bringing this same ideology to configuring systems. We are not in the business of selling configured servers any more than we are in the business of providing coffee to our employees; we just happen to do both. We build our systems to enable our business to conduct business. In my case, operations enables our customers to use the web product that we develop.

We are all members of the Configuration Management community, because we believe it is ultimately a better process for building systems. We naturally have different ideas about how that process should execute, often because, among other differentiating factors, “goals are different but are left unstated” in the community. The level of preference here and the resulting fragmentation are no different from holding an opinion about which open source operating system to use for a specific task. The merits of our choices are worth discussing, but the implication that tools and libraries should all come to the same conclusions about design is like implying that the world only needs one type of hammer.

So, defining the meatcloud as the established notion that growing your internet presence means hiring proportionally more people to rack servers, install software, and update configuration files, I asked around a little: why do we think Configuration Management is better?

  • Abstraction
  • You don’t need to be a mechanic to drive a car, should you need to be a subject matter expert on Apache to run a webserver? Infrastructure as code shows us how and the resulting communities are starting to implement this. When we spend less time getting the parts working, we can spend more time innovating better solutions with the parts.

  • Self-documenting
  • Ever rebuild a server that was built long ago by someone who is no longer with the organization, and find many small parts necessary to make it work that nobody bothered to write down? Placing those small changes and required files in configuration management ensures that even if the code doesn’t run flawlessly on an upgraded operating system, you know everything that went into making it work. Documentation is often an afterthought to getting the system working, but since you’re configuring the system through configuration management, a lot less falls through the cracks.

  • Consistency and Repeatability
  • What is on that system? Everything you told CM to put there. ‘Golden image’ disk images often leave you in the aforementioned situation: not only do you not know how to recreate them, but often you don’t know what else ended up there either. Configuration Management allows you to build many copies of the same system easily, from scratch every time.

  • Agility
  • Did sales tell you they had fifty customers and it turned out to be five hundred? How long will it take you to build the extra servers by hand? How many extra people do you have to put into the meatcloud to get that done in time? Business requirements always change, and slow-moving businesses are at a disadvantage in dealing with this. The inability to build and deploy servers fast enough should never be a reason your business fails.

  • Flexibility, or Don’t Repeat Yourself
  • Three applications on one server? Or one application on three servers? Apache doing different jobs on multiple servers? Moving applications between servers and leveraging existing infrastructure code for new projects is easy. We automate tasks that are repeatable, but often scripts are written to accomplish one repeatable task. Here we say, how can we treat configuration as sets of modular tasks that we can mix and match?
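
The mix-and-match idea in the last point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Node`, `package`, `config_file`, `role`, and `converge` are hypothetical names invented here, not the API of Chef or any other tool, and the "server" is an in-memory object rather than a real machine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

# All names below are hypothetical; this is not the API of any real CM tool.


@dataclass
class Node:
    """In-memory stand-in for a server: installed packages and file contents."""
    name: str
    packages: Set[str] = field(default_factory=set)
    files: Dict[str, str] = field(default_factory=dict)


Task = Callable[[Node], None]


def package(pkg: str) -> Task:
    """Task that ensures a package is present; applying it twice changes nothing."""
    def apply(node: Node) -> None:
        node.packages.add(pkg)
    return apply


def config_file(path: str, content: str) -> Task:
    """Task that ensures a file has exactly the given content."""
    def apply(node: Node) -> None:
        node.files[path] = content
    return apply


def role(*tasks: Task) -> Task:
    """Compose tasks (or other roles) into one reusable unit."""
    def apply(node: Node) -> None:
        for task in tasks:
            task(node)
    return apply


def converge(node: Node, *roles: Task) -> Node:
    """Apply every role to the node and return it."""
    for r in roles:
        r(node)
    return node


# The Apache setup is written once and reused by both applications.
apache = role(package("apache2"),
              config_file("/etc/apache2/ports.conf", "Listen 80\n"))
blog = role(apache, config_file("/etc/apache2/sites-enabled/blog", "# blog vhost\n"))
wiki = role(apache, config_file("/etc/apache2/sites-enabled/wiki", "# wiki vhost\n"))

# Two applications on one server...
shared = converge(Node("web1"), blog, wiki)
# ...or one application on three servers, each built from scratch.
fleet = [converge(Node(f"blog{i}"), blog) for i in range(1, 4)]
```

Because each task describes a desired end state rather than a step to run, applying the same roles again leaves the node unchanged, which is what makes the pieces safe to recombine.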

Moar unofficial chef 0.7.16.2

Well, a couple of people used the 0.7.16wt1 release, including Thom May, who had issues with that version number. Consequently I took the ‘wt’ out of the version, but this is still an unofficial, unsupported release.

I pulled in the changes on Thom’s internal branch and grabbed some more low hanging fruit. Don’t ask what methodology I used. It’s magic.

Grab the GitHub branch, or the debs in my Launchpad PPA. I’ve only tested this on Ubuntu Karmic, as it goes.

** Bug
* [CHEF-454] – Centos4 yum provider failure
* [CHEF-616] – rake install in chef-repo breaks if there is no git origin
* [CHEF-637] – duplicate copies of FileEdit: file_edit.rb & fileedit.rb
* [CHEF-639] – git resource fails on subsequent checkouts of the same repository
* [CHEF-642] – Services will always issue a WARN when status is not present
* [CHEF-684] – Should be possible for roles to be created without anything in the run_list
* [CHEF-704] – Ruby block device does not have a default action
* [CHEF-722] – Link provider doesn’t understand paths requiring expansion
* [CHEF-843] – FileEdit permission issues

** Improvement
* [CHEF-598] – Upstart service provider
* [CHEF-617] – Install to chef repository on a remote machine
* [CHEF-709] – Support for backing up files in a directory other than that of the original file
* [CHEF-752] – No way to pass parameters to useradd provider

Got recursion not available and Cisco SSL VPN

I’ve periodically been having DNS lookup issues with internal domains and isolated them to remote SSL VPN clients connecting to a Cisco ASA 5520 using the AnyConnect SSL VPN client. I eventually got frustrated and troubleshot the issue by using the command line ‘vpn’ client to initiate a connection from a remote Ubuntu Linux machine while here in the office. nslookup would produce the error “Got recursion not available from x.x.x.x, trying next server” and dig would respond with “status: REFUSED” and “;; WARNING: recursion requested but not available”. By watching Wireshark and enabling DNS debugging on the server, I noticed that traffic was not making it to the Windows Server 2008 DNS server.
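
The two dig messages map onto two fields in the 12-byte DNS message header defined in RFC 1035: “status: REFUSED” is response code 5, and the recursion warning appears when the query asked for recursion (RD bit set) but the reply has the RA bit clear. A minimal Python sketch of decoding those fields follows; the function name and the sample bytes are made up for illustration and were not captured from the VPN client.

```python
import struct

# Response codes from RFC 1035, section 4.1.1.
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def describe_dns_header(message: bytes) -> dict:
    """Decode the flag bits of a raw DNS message header (hypothetical helper)."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    _msg_id, flags = struct.unpack("!HH", message[:4])
    rcode = flags & 0x000F
    return {
        "is_response": bool(flags & 0x8000),          # QR bit
        "recursion_desired": bool(flags & 0x0100),    # RD bit, copied from the query
        "recursion_available": bool(flags & 0x0080),  # RA bit, set by the responder
        "status": RCODES.get(rcode, str(rcode)),
    }


# Hand-built header: a response with RD set, RA clear, and RCODE 5.
sample = struct.pack("!HHHHHH", 0x1234, 0x8105, 1, 0, 0, 0)
info = describe_dns_header(sample)
# info["status"] is "REFUSED" and info["recursion_available"] is False
```

A reply with these flags says nothing about where it was generated, which is why checking whether the query ever reached the real DNS server was the useful next step.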

We were acquired six months ago, and our list of internal domains has grown quite a bit since. I found the ‘split-dns’ setting in the default group access policy still set to the old list of internal domains and changed it to ‘split-dns none’. This resolved the issue. Apparently the client was comparing each query to its list of split-dns domains and, when the match failed, sending the resolver (the operating system) an error message so that it would work through its list of DNS servers until it tried the local server. Rather than trying to list every possible domain name in the company, I’m going to leave this off, since the internal DNS servers have recursion enabled and can handle DNS lookups just fine for the remote clients.
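
The behaviour described above can be modelled as a suffix match against the split-dns list. The Python sketch below only illustrates that reading of the symptoms; it is not AnyConnect’s actual implementation. The function names, the `"tunnel"` and `"refuse"` labels, and the `.example` domains are all hypothetical, and treating ‘split-dns none’ as an empty list where every query goes to the tunnel’s DNS servers is an assumption based on the outcome described.

```python
from typing import Iterable


def matches_split_dns(query_name: str, split_domains: Iterable[str]) -> bool:
    """True if the query name equals or is a subdomain of a listed domain."""
    name = query_name.rstrip(".").lower()
    for domain in split_domains:
        d = domain.strip(".").lower()
        if d and (name == d or name.endswith("." + d)):
            return True
    return False


def route_query(query_name: str, split_domains: Iterable[str]) -> str:
    """Decide what a client might do with a query (labels are hypothetical).

    An empty list stands in for 'split-dns none': nothing is filtered,
    so every query is sent to the DNS servers behind the tunnel.
    """
    domains = list(split_domains)
    if not domains or matches_split_dns(query_name, domains):
        return "tunnel"
    # No match: answer the resolver with an error so it tries its next server.
    return "refuse"


old_list = ["oldcorp.example"]  # stale list from before the acquisition
route_query("intranet.oldcorp.example", old_list)  # "tunnel"
route_query("wiki.newparent.example", old_list)    # "refuse"
route_query("wiki.newparent.example", [])          # "tunnel"
```

Under this model a stale list fails for any internal domain added after it was written, which is consistent with the lookups starting to fail as the domain list grew.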