Archive for the 'Uncategorized' Category

Page 2 of 22

virt-manager keymaps on OS X

I’m not crazy about the lack of a definitive package manager for OS X. I tried for about a day to work with Open Source on OS X, then I built an Ubuntu VM. I’ve been using ssh with X forwarding when I need a graphical interface; OS X has reasonable good built in support for X11. However, others have found that the keymap and meta keys are broken. While I got a kick out of “After some time I discovered that the number 8 is interpreted as Return,” I did need to log in to a guest to do some debugging.

The accepted solution to making Ctrl+Alt release keyboard focus correctly in the vncviewer spawned by virt-manager is to create a .Xmodmap file in your home directory with this content:

clear Mod1
keycode 66 = Alt_L
keycode 69 = Alt_R
add Mod1 = Alt_L
add Mod1 = Alt_R

I killed the X server by focusing on it and choosing quit, and it seemed to be read the .Xmodmap file okay without my needing to restart the entire system.

The workaround for the broken keymap pointed me in the right direction, but I wasn’t happy with the solution. A little digging around the libvirt domain xml reference pointed out that you can add a keymap as an attribute to the vnc element in the domain xml definition. Use ‘virsh edit’ to edit the domain XML and modify the vnc line to add this attribute so it looks like so:

<graphics type='vnc' port='5900' autoport='yes' listen='127.0.0.1' keymap='en-us'/>

I destroyed the guest and restarted it and the keyboard worked now without any “8 is now enter” trickery. I’m pretty sure you can choose any keymap from /usr/share/qemu/keymaps. If you use vmbuilder you will want to add this to /etc/vmbuilder/libvirt/libvirtxml.tmpl as well.

Motorola Backflip charging

Nightmare.

Chargers:

AC1) Motorola DC4050US0301 5.1V DC 850MA
AC2) AT&T 03577 5.0V 1000ma
DC1) AT&T USB VPC03578
DC2) AT&T USB + MiniUSB MV302927

Cables:

M1) Motorola SKN6378A
M2) “Motorola” SKN6238A
M3) Monoprice generic microusb

Dead Phone, AC1, M1 OR M2 OR M3
Green light on Phone, OS starts, displays charging battery

Dead Phone, AC2, M1 OR M2 OR M3
Blue light on AC2, Green light on Phone, OS starts
Green light / OS cycle every 15 seconds

Dead Phone, DC1, M1 OR M2 OR M3
White light on DC1, Green light on Phone, OS starts
Green light / OS cycle every 15 seconds

Dead Phone, DC2, M1 OR M2 OR M3
White light on DC2, Green light alternates on/off on Phone

Phone on, AC1, M1 OR M2 OR M3
Green light on, charge symbol in battery on display

Phone on, AC2, M1
Blue light on AC2 for five seconds

Phone on, AC2, M2 OR M3
Blue light on AC2
Green light on, no charge symbol in battery on display

I have an AT&T AC charger at work that I believe works as well as the stock Motorola. The AT&T AC charger here at home, listed above, is a “five star” model that consumes 0W when not charging, I assume that is what the blue light turning off indicates. Hopefully the combinations that keep the green light on the phone on are charging, just very slowly, and are still somewhat useful. More to come.

Munin Aggregation with Multigraph

Six months ago I made note to the pattern for referring to stacked graph data sources in munin:

load.double.stack one=localhost.localdomain:load.load two=localhost.localdomain:load.load

This syntax evaluates to:
graph.value.stack line=host.domain:plugin.value

I’ve been using multigraph more since then, which is a boon to performance, but it complicates stacked graphs a little. This hurts because it remains very difficult to tell why your graphs are not drawing when you incorrectly reference a data source. To debug, as the munin user (use ’su -l munin’, ’sudo -s -u munin’ or ‘chpst -u munin’) run:
/usr/share/munin/munin-graph --service 'load.double.stack' --debug
Be sure to replace “load.double.stack” with the name of the graph you’re trying to draw.

The munin wiki example for stacked graphs explains data source names as:

snmp_ups_current.inputtotal.sum \
---------------- ---------- ---
        |             |      |
        |             |      `-- The sum mechanism
        |             `--------- One of this virtual plugin's values
        `----------------------- The name of the virtual plugin

ups-5a:snmp_ups_ups-5a_current.inputcurrent \
ups-5b:snmp_ups_ups-5b_current.inputcurrent
------ ----------------------- ------------
   |               |                 |
   |               |                 `------ The "inputcurrent" value from the real plugin
   |               `------------------------ The real plugin's name (symlink)
   `---------------------------------------- The host name from which to seek information

However, with multigraph the name of the plugins symlink isn’t necessarily the name of the graph. The trick I found was to connect the the munin node and call the multigraph plugin, looking for the ‘multigraph’ line.

$ nc localhost 4949
# munin node at server.example.org
cap multigraph # tell munin-node that you are multigraph aware
cap multigraph
fetch diskstats # fetch the diskstats multigraph plugin
multigraph diskstats_latency
sdb_avgwait.value 0
multigraph diskstats_latency.sdb
avgwait.value 0
.

I’ve removed a significant portion of the returned data here. Pay attention to the fact that this plugin returned a “diskstats_latency” graph that contains data for all of the disks, as well as individual graphs for each disk, here “diskstats_latency.sdb” In this example your stacked graph configuration would be:

disk.double.stack \
  one=localhost.localdomain:diskstats_latency.sdb.avgwait \
  two=localhost.localdomain:diskstats_latency.sdb.avgwait
  -1- ----------2---------- -----------3--------- ---4---

(1) The alias and label for this host or data point
(2) The configured node name of the host
(3) The original graphs name, either the plugin or multigraph name
(4) The value from the plugin/graph

Notice that while the period is used to separate the value from the rest of the field, there may be periods in the rest of the field. Also keep in mind that in the past I have seen dashes in configured graph names become underscores at the end of the day.

Silent Ruby install on Windows

I dug up unattended ruby install directions while working on Chef installation directions for windows. Most of the secrets can be found in the RubyInstaller discussion group, such as here and here.

Grab the RubyInstaller for windows, then run: rubyinstaller-1.8.7-p302.exe /tasks="assocfiles,modpath" /silent. The tasks option checks (enables) the options to associated .rb files with ruby and adding the ruby binary directory to the path. You probably wouldn’t want these if you were installing multiple versions of ruby.

Dependant Paradigms

The Systems Administrator is likely the closet technological trade to skilled manual labor. They troubleshoot complex systems that others take for granted, until they fail, with a deceptive ease. Explaining to another how they had a hunch to look at a certain part of the system is either a retrospective tale of why it made sense, or a sarcastic nod to magic. This tale attempts to work out how one could have deduced the solution, but even if someone assembled a collection of symptoms and solutions into a step-by-step guide, they would not be able to replace the role of a Systems Administrator. Like an automotive mechanic can detect a blown head gasket from the smell of the oil, a Systems Administrator can sense infrastructure issues from how systems are behaving. And like a fondness for a make of automobile, we grow attached to Linux distributions that have treated us well and editors whose dark secrets we can manipulate skillfully.

I once had a student who didn’t understand why we couldn’t repair board-level hardware issues ourselves as easily as replacing a stick of memory, as their uncle was capable of repairing any engine problems by opening up the hood and quite literally “jiggling some wires.” A mystic knowledge exists in both worlds that is challenging to articulate to the layman. It can be difficult enough to explain a single component, but when a part of a system falls over and causes cascading failures in other parts of a system, outsiders are tempted to believe that they’ve just learned a truth about the solution. That is, that when certain symptoms occur, it is always caused by the failure of a particular part and that this part should be restarted to ’solve’ the problem. Yet, the experienced know that this only resolves the symptoms and the still problem lurks, now with fewer hints as to its origin.

The future is already here – it is just unevenly distributed. — William Gibson

The trouble with paradigm shifts is that they aren’t necessarily direct improvements on existing technology with a clear lineage. Critics ask why the new ways are better than that which they replace, and we struggle to draw the path that led us to this new place of understanding. The struggle is because instead of making a choice at a clear intersection of a path, we stepped through the bushes to another path not as obviously traveled. This alternate path may lead us to the same end, but its course has an entirely different shape.

To further exacerbate the problem, new innovations stand on the shoulders of giants. Some people have been convinced of the merits of leveraging cloud computing on a strictly financial basis, and have missed the tenants of Undifferentiated Heavy Lifting (UHL), where running servers and building networks may not be ones core business and ultimately a distraction. Some have yet to grasp the concept of treating systems, even built on internal hardware, as disposable, still accustomed to legacy processes of maintaining a system for the lifetime of the hardware.

It is essential to realize that these new technologies are not minor improvements to business as usual. Like the birth of globalization changing business around the world, nursed by the multi-modal shipping container’s head fake as just another way of moving cargo, todays innovations will surely reshape the face of operations permanently, in substantial and non-incremental ways.

Amazon ELB requires CRLF for HTTP Requests

Here’s an interesting bit I stumbled upon while playing with Amazon Web Services (AWS) Elastic Load Balancing (ELB): HTTP requests must have their lines terminated with CRLF and not just a line feed. When using netcat to test a web server by speaking HTTP, it only sends LFs by default (\n). While RFC 2616 specifies:

… a bare CR or LF MUST NOT be substituted for CRLF within any of the HTTP control structures …

Using netcat to connect to a web server typically works just fine. I’m inputting the HTTP requests by hand and [ENTER] is where I hit the enter key.

$ nc www.google.com 80
GET / HTTP/1.0[ENTER]
[ENTER]
HTTP/1.0 200 OK
Date: Fri, 09 Apr 2010 20:07:25 GMT
Expires: -1
[snip]

This works against Apache. However when connecting to an Apache server through ELB, one must run netcat with the -C option to send a CRLF instead of a lone LF upon return.

$ nc -C elb.example.org 80
GET / HTTP/1.0[ENTER]
[ENTER]
HTTP/1.1 302 Found
Content-Type: text/html; charset=iso-8859-1
Date: Fri, 09 Apr 2010 20:09:39 GMT
Location: http://elb.example.org/404/
Server: Apache
Vary: Accept-Encoding
Content-Length: 290
Connection: Close

Sans the -C option, the connection simply hangs. Which asks the question, what is Amazon doing with your HTTP traffic in between?

an evening with Munin graph aggregation

Trending?

I’m often a bit surprised by the lack of substance about trending that leaks out on the Internet. I mean, everybody is doing it. Right? Munin is a great introduction to trending due to its simplicity in getting started and the wealth of plugins.

I’m a believer of collecting as much data as possible and sorting it out later. Without data, you can only speculate wildly at what it might have said. So will others, so it’s nice having a response; often they won’t. I don’t need to be looking at the disk latency or available entropy for dozens of servers every day, but the time saved by being able to look at these graphs when something occurs and make correlations between trends is revolutionary to how you will spend your day. When having too much data can feel overwhelming, it’s time to post-process it into something more bite-size.

Still, I run operations for a web product and there is data I do want to see every day, both to monitor the health of the product and plan capacity for upcoming growth. Aggregating data for multiple systems and creating a sort of executive trending console helps accomplish this.

Getting Started

The best way to get familiar with munin is to install it on a debian or ubuntu workstation. Installing the ‘munin’ (server) and ‘munin-node’ (client) packages will be enough to generate some graphs about your local machine. Go ahead and run:

sudo su munin -s /bin/bash -c 'time /usr/bin/munin-cron'

Then point your browser at file:///var/cache/munin/www/index.html.

Aggregates

Aggregate graphs are created by munin-graph from existing data in the RRDs collected by munin-update. There are two types of aggregates: sum and stack. Sum will show you the total of multiple data sets. The Munin wiki uses the aggregate current between two UPS’s as an example. Sum is most useful when the data sets are relatively meaningless individually. For instance if you wanted to know the total current CPU usage in a 50-node cluster, each node is not particularly interesting alone, but the sum would be. Stack provides the data sets visually stacked on a single graph. The Munin wiki uses the total entropy between two systems as their example, which isn’t particularly interesting. I’ll use some similarly uninteresting examples, but later I’ll show one that produces a stack comparing data in multiple datacenters.

Lets look at a simple example /etc/munin/munin.conf file with an aggregate graph similar to what is in the munin.conf man page:

[localhost.localdomain]
address 127.0.0.1
use_node_name yes

[localdomain;Totals]
update no

load.graph_title 2xload
load.double.stack one=localhost.localdomain:load.load two=localhost.localdomain:load.load

This will create a graph that shows the local systems load twice in a graph by stacking the same value.

Munin separates hosts by domain in more ways than just the html index that munin-html puts out. By default hosts are put into a “group” by their domain name. If an aggregate graph attempts to reference data values from a host in another group, munin may not find it and fail to clearly notify as to why. You can manually place a node in a group as we do above where we put the virtual host “Totals” in the “localdomain” group by entitling the section “[localdomain;Totals]” on line 5. Your groups can be called anything, they don’t have to be a domain name.

The “update no” directive on line 6 tells munin-update to skip this section, or hos since these graphs are created entirely from data collected from other hosts. Please note that you typically still need to run munin-update before munin-graph to get configuration changes to aggregate graphs to appear in the graph. Munin appears to bailout on drawing a graph if it sees no new data for that graph pretty early in the process.

Typically failures in this area of configuration result in a new graph not being created but munin-graph appearing to run successfully otherwise. Note that graph_title is required. If you see an error that looks like:

2010/04/08 18:43:46 [RRD ERROR] Unable to graph /var/cache/munin/www/localdomain/Totals/load-year.png : opening '': No such file or directory

This is because munin was unable to find a data set, or specifically the RRD file, based on the value you specified. Both of the following lines cause this error and the graph to not be drawn:

load.double.stack one=localhost.localdomain:load.load two=localhost.localdomainX:load.load
load.double.stack one=localhost.localdomain:load.load two=localhost.localdomain:load.loadX

This syntax evaluates to:
graph.value.stack line=host.domain:plugin.value

Line, also called alias, ends up being the label for that line. Often dashes are inconsistently converted to underscores in Munin. I have a working plugin called ‘foo_js-3_0′, which I have to specify as ‘foo_js_3_0′ in the above syntax.

[localhost.localdomain]
    address 127.0.0.1
    use_node_name yes

[localdomain;Totals]
  update no

  load.graph_title 2xload
  load.double.sum localhost.localdomain:load.load localhost.localdomain:load.load
  load.double.label Double the load

Here is the same example but displayed as a sum. Note that we’ve added ‘load.double.label’, and this is required. This replaces the ‘alias’ or ‘line’ value we were just discussing in stacked graphs, which you will notice is no longer in the configuration line for ‘fieldname.sum’ on line 9.

Making it useful

Here is a proof of concept configuration that I made that counts some javascript calls in different datacenters

# Aggregrates
[example.org;OTS]
  update no
  contacts no

  js-3_0.update no
  js-3_0.graph_category example
  js-3_0.graph_title CAPI3 OTS Calls
  js-3_0.graph_total Total calls per minute
  js-3_0.graph_scale no
  js-3_0.graph_period minute
  js-3_0.graph_args --base 1000 -l 0
  js-3_0.graph_order iad irl las
  js-3_0.total.graph no
    js-3_0.iad.label IAD calls per minute
    js-3_0.iad.sum \
      iadots02.example.org:example_js_3_0.calls \
      iadots01.example.org:example_js_3_0.calls   

    js-3_0.irl.label IRL calls per minute
    js-3_0.irl.sum \
      irlots02.example.org:example_js_3_0.calls \
      irlots01.example.org:example_js_3_0.calls   

    js-3_0.las.label LAS calls per minute
    js-3_0.las.sum \
      lasots02.example.org:example_js_3_0.calls \
      lasots03.example.org:example_js_3_0.calls \
      lasots06.example.org:example_js_3_0.calls \
      lasots04.example.org:example_js_3_0.calls \
      lasots05.example.org:example_js_3_0.calls \
      lasots01.example.org:example_js_3_0.calls

This creates the below graph. The jagged lines at the left edge are from missing data values while I was working out some of the issues I describe in this post. There are a couple new directives in this configuration. The ‘contacts’ directive on line 4 specifies that if we had munin configured for monitoring (as opposed to trending) we don’t want it to provide any notification based on the graph values for this virtual host. This is the job of munin-limits. The ‘graph_category’ directive allows us to put this graph in a category that we specify, otherwise Munin puts it in ‘other’. This is particularly useful if you have different types of aggregate graphs data such as CPU and Apache related data on the same virtual host. The ‘graph_total’ directive on line 9 isn’t that well documented but provides a simple way to add the black total line you see in the graph and is therefore quite useful. Lines 10-12 control how the graph is drawn and are outside the scope of this post. The ‘graph_order‘ directive seems to give us the ability to control the order in which the fields are drawn on the graph, but is documented as a method to control the order in which the graphs are drawn to specify complex data dependencies.
JS3 Calls Per Day

Configuration Management!

For fun, here is the Chef template that created this, which allows additional nodes be added automatically, but is still ultimately incomplete.

[example.org;OTS]
  update no
  contacts no

  <% wop_datacenters = [ "iad", "irl", "las" ] -%>

  js-3_0.update no
  js-3_0.graph_category example
  js-3_0.graph_title CAPI3 OTS Calls
  js-3_0.graph_total Total calls per minute
  js-3_0.graph_scale no
  js-3_0.graph_period minute
  js-3_0.graph_args --base 1000 -l 0
  js-3_0.graph_order <%= wop_datacenters.join(" ") %>
  js-3_0.total.graph no
  <% wop_datacenters.each do |dc| -%>
    js-3_0.<%= dc %>.label <%= dc.upcase %> calls per minute
    js-3_0.<%= dc %>.sum \
    <% dc_servers = @ots_servers.select { |host| host['hostname'] =~ Regexp.new(dc) }.select { |host| host['hostname'] !~ /pp/ } -%>
    <% dc_servers.each_with_index do |host, index| -%>
      <%= host['fqdn'] %>:example_js_3_0.calls <%= '\\' unless dc_servers.length - 1 == index %>
    <% end -%>

  <% end -%>

When it does not work

Debugging munin can be really tough. I keep stopping myself from breaking into explanation of munin’s process, but something as innocent as as an omitted ‘graph_title’ can cause munin to all but silently fail at producing a graph for you. Normally munin runs every give minutes via cron, usually via the ‘munin-cron’ wrapper, but you can run the parts individually to look for issues. These tools create a lockfile when they run so they won’t interfere with the regular process if it is started by cron.

user@localhost:~$ sudo su - munin -s /bin/bash
munin@localhost:~$ /usr/share/munin/munin-update --debug --nofork
munin@localhost:~$ /usr/share/munin/munin-graph --debug --nofork --nolazy
munin@localhost:~$ /usr/share/munin/munin-html --debug

In larger infrastructures, you can limit munin-update and munin-graph to specific host and service combinations while testing. Be wary that these sometimes will appear more successful than they are:

munin@localhost:~$ /usr/share/munin/munin-update --debug --nofork --host nonexistent --service nonexistent
2010/04/08 17:13:23 [DEBUG] Creating new lock file /tmp/munin-update.lock
2010/04/08 17:13:23 [DEBUG] Creating lock : /tmp/munin-update.lock succeeded
2010/04/08 17:13:23 [INFO]: Starting munin-update
2010/04/08 17:13:23 [DEBUG] Creating new lock file /tmp/munin-datafile.lock
2010/04/08 17:13:23 [DEBUG] Creating lock : /tmp/munin-datafile.lock succeeded
2010/04/08 17:13:23 [INFO]: Munin-update finished (0.00 sec)

Setting a permanent Windows Hostname on EC2

In less than obvious fashion, Amazon EC2 Windows Server AMIs reset their hostname on startup due to the Ec2ConfigService. To disable this feature, select ‘EC2ConfigService Settings’ from the start menu, and uncheck the first checkbox under ‘Set Computer Name’

Configuration Management vs Meatcloud: 5 reasons CM wins

First, a bit of social commentary.

Sometimes we refer to the way something ought to be accomplished as the RightWay[tm], sarcastically noting that every best practice contains a certain degree of opinion. As we dedicate more time to doing something our way, we become invested in it being the RightWay, and risk becoming defensive about our choices. Adam Jacob calls this “survivorship-bias,” and I’ve spent some time listening to him think about what he feels the risks are, and considering them myself. When we make significant personal investment in a choice, it becomes a personal challenge to remain impartial about the merits of that choice over time.

I’ve previously said that Configuration Management is the act of programatically configuring your systems. Cloud computing says that building servers is undifferentiated heavy lifting; unless your service is building servers, you should really let someone else do it and focus on the product or service you’re actually trying to sell. Configuration Management is the first step in bringing this same ideology to configuring systems. We are not in the business of selling configured servers any more than we are in the business of providing coffee to our employees, we happen to do both. We build our systems to enable our business to conduct business. In my case, operations is enabling our customers to use the web product that we develop.

We are all members of the Configuration Management community, because we believe it is ultimately a better process for building systems. We naturally have different ideas about how that process should execute, which among other differentiating factors is often that “goals are different but are left unstated” in the community. The level of preference here and resulting fragmentation is not any different than holding an opinion over what open source operating system one should use for a specific task. The merits of our choices are worth discussing, but the implication that tools and libraries should all come to the same conclusions about design is like implying that the world only needs one type of hammer.

So, defining the meatcloud as the established notion that having your internet presence grow forms a direct relationship with hiring more people to rack servers, install software, and update configuration files; I asked around a little, why do we think Configuration Management is better?

  • Abstractation
  • You don’t need to be a mechanic to drive a car, should you need to be a subject matter expert on Apache to run a webserver? Infrastructure as code shows us how and the resulting communities are starting to implement this. When we spend less time getting the parts working, we can spend more time innovating better solutions with the parts.

  • Self-documenting
  • Ever rebuild a server that someone built long ago and is no longer with the organization, and find many small parts necessary to make it work that nobody bothered to write down? Placing those small changes and required files in configuration management ensures that even if the code doesn’t run flawlessly on an upgraded operating system, you know everything that went in to making it work. Since you’re configuring the system through configuration management, a lot less falls through the cracks because documentation is often an afterthought to getting the system working.

  • Consistency and Repeatability
  • What is on that system? Everything you told CM to put there. ‘Golden image‘ disk images often leave you in the aforementioned situation where you don’t know how to recreate them, but often you don’t know what else ended up there. Configuration Management allows you to build many copies of the same system easily, from scratch every time.

  • Agility
  • Did sales tell you they had fifty customers and it turned out to be five hundred? How long will it take you to build the extra servers by hand? How many extra people do you have to put into the meatcloud to get that done in time? Business requirements always change, and slow moving businesses are at a disadvantage to dealing with this. The inability to build and deploy servers fast enough should never be a reason your business fails.

  • Flexibility, or Don’t Repeat Yourself
  • Three applications on one server? Or one application on three servers? Apache doing different jobs on multiple servers? Moving applications between servers and leveraging existing infrastructure code for new projects is easy. We automate tasks that are repeatable, but often scripts are written to accomplish one repeatable task. Here we say, how can we treat configuration as sets of modular tasks that we can mix and match?

Got recursion not available and Cisco SSL VPN

I’ve periodically been having DNS lookup issues with internal domains and isolated them to remote SSL VPN clients connecting to a Cisco ASA 5520 using the Anyconnect SSL VPN client. I eventually got frustrated and troubleshooted the issue by using the command line ‘vpn’ client to initiate a connection on a remote Ubuntu Linux machine while here in the office. nslookup would produce the error “Got recursion not available from x.x.x.x, trying next server” and dig would respond with “status: REFUSED” and “;; WARNING: recursion requested but not available”. I noticed traffic was not making it to the Windows Server 2008 DNS server by watching wireshark and enabling DNS debugging.

Having been acquired six months ago our list of internal domains increased quite a bit. I found the ’split-dns’ setting in the default group access policy set to the old list of internal domains and set this to ’split-dns none’. This resolved the issue. Apparently the client was comparing the query to its list of split-dns domains, and the match was failing so it was sending the resolver (operating system) an error message so it would go through the list of DNS servers until it tried the local server. Rather than trying to make a list of all the possible domain names in the company, I’m going to leave this off since the internal DNS servers have recursion enabled and can handle DNS lookups just fine for the remote clients.

Error 80070005 accessing SSIS/DTS on SQL 2008 and Server 2008

Retrieving the COM class factory for component with CLSID {BA785E28-3D7B-47AE-A4F9-4784F61B598A} failed due to the following error: 80070005. (Microsoft.SqlServer.ManagedDTS)

Trying to access SSIS (DTS) on Microsoft SQL 2008 with SSMS (SQL Server Management Studio) on Microsoft Windows Server 2008 gave the above error. Trying to create a maintenance plan provided the same error, since it uses SSIS. There were indications online that I should try running SSMS with elevated permissions using the ‘Run as administrator’ option on the context (right-click) menu, however that provided a “The parameter is incorrect” error on startup. Eventually I discovered that the disk that the SQL tools were installed on did not have default (R+X) permissions to the local users group. Once I added this group, I was able to connect to SSIS and create a maintenance plan without issue.

Scripting the root password on Ubuntu 9.10 (karmic)

On Ubuntu 9.04 (jaunty) I had been generating and setting the root password in a bootstrapping script using:

# Generated MD5 encrypted password
/usr/bin/openssl passwd -1
# Set the password
/bin/echo 'root:ENCRYPTED_PASSWORD' | /usr/sbin/chpasswd -e

With shadow 4.1.4, chpasswd now uses PAM, and has dropped the -e option used above, as well as the -c option that I’d used to generate sha512 encrypted passwords. You’ll want to use mkpasswd from the whois package (yeah, weird) for that now, such as:

mkpasswd -m sha-512 -s

The password can be presented to useradd / usermod in encrypted format, such as:

/usr/sbin/useradd -m -p 'ENCRYPTED_PASSWORD' -G admin -s /bin/bash toor

Installing Chef 0.8 alpha on Ubuntu Karmic

Theres push to get Chef 0.8 out the door because we’re all anxious for its awesome list of features and fixes, so we’re all hunkering down on fixing bugs. Scott Likens has similar notes and theres some more to be found in Dan Deleo’s 08boot bootstrap recipe. This should help get you going.

On a fresh Ubuntu Karmic install (a VM makes this easy of course):
# Add the Canonical Ubuntu ‘multiverse’ repository for Java.
sudo vi /etc/apt/sources.list # add multiverse to your ‘deb’ lines if it is not there
sudo apt-get update

# Start Chef Gem bootstrap with some notes
# note that I don’t like to install rubygems from source and use the packages instead. this adds a step or two.
sudo apt-get install ruby ruby1.8-dev libopenssl-ruby1.8 rdoc ri irb build-essential wget ssl-cert rubygems git-core -y
sudo gem sources -a http://gems.opscode.com
sudo gem sources -a http://gemcutter.org # for nanite
sudo gem install ohai chef json –no-ri –no-rdoc

We now have enough chef to bootstrap ourselves
# Create ~/chef.json:

{
  "bootstrap": {
    "chef": {
      "url_type": "http",
      "init_style": "runit",
      "path": "/srv/chef",
      "serve_path": "/srv/chef",
      "server_fqdn": "localhost"
    }
  },
  "recipes": "bootstrap::server"
}
# End of file

# Create ~/solo.rb:

file_cache_path "/tmp/chef-solo"
cookbook_path "/tmp/chef-solo/cookbooks"
# End of file

mkdir /tmp/chef-solo
cd /tmp/chef-solo
# Get kallistec’s 08boot bootstrap cookbook
git clone git://github.com/danielsdeleo/cookbooks.git
cd cookbooks
git checkout 08boot
# Bootstrap chef
sudo /var/lib/gems/1.8/bin/chef-solo -j ~/chef.json -c ~/solo.rb
# If the bootstrap hangs for more than a minute after “Installing package[couchdb] version 0.10.0-0ubuntu3″ then hit ctrl+c and run again

Now prepare to install the development versions
# install some development tools
sudo apt-get install rake librspec-ruby -y
sudo gem install cucumber merb-core nanite jeweler uuidtools
# install missing dependencies
sudo apt-get install libxml-ruby thin -y
# get chef from the repository
mkdir ~/src
cd ~/src
git clone git://github.com/opscode/chef.git
cd chef
rake install
# remove the old version of chef
sudo gem uninstall chef -v0.7.14
# patch up some runit paths
sudo sed -i s_chef-_/var/lib/gems/1.8/gems/chef-solr-0.8.0/bin/chef-_ /etc/sv/chef-solr*/run
# allow access to futon for development purposes (http://IPADDRESS:5984/_utils)
sudo sed -i ’s/;bind_address = 127.0.0.1/bind_address = 0.0.0.0/’ /etc/couchdb/local.ini
sudo apt-get install psmisc # for killall
sudo /etc/init.d/couchdb stop
sudo killall -15 couchdb # stubborn
sudo killall -15 beam.smp # yup
# shut it all down
sudo /etc/init.d/chef-solr stop
sudo /etc/init.d/chef-solr-indexer stop
sudo /etc/init.d/chef-solr-client stop
sudo /etc/init.d/chef-client stop
sudo /etc/init.d/chef-server stop
sudo killall -15 chef-server

Build some data and start up Chef
# start up the integration environment
cd ~/src/chef
sudo rake dev:features
# this will create a database
# now hit ctrl+c
sudo mv /var/lib/couchdb/0.10.0/chef_integration.couch /var/lib/couchdb/0.10.0/chef.couch
sudo chown couchdb:couchdb /var/lib/couchdb/0.10.0/chef.couch
# start it all up
sudo /etc/init.d/couchdb start
sudo /etc/init.d/rabbitmq-server start
sudo /etc/init.d/chef-solr start
sudo /etc/init.d/chef-solr-indexer start
sudo /etc/init.d/chef-server start

Start the web server
# the web server is now a separate application and uses the API to reach the server
sudo cp /tmp/chef_integration/webui.pem /etc/chef
cd ~/src/chef/chef-server-webui
sudo /var/lib/gems/1.8/bin/slice -p 4002

Using knife
From the user interface you can create a client keypair to use knife from the web interface. I recommend using ‘view source’ to copy the private key, and remember to save it without any leading whitespace and run knife like so:

OPSCODE_USER=’btm’ OPSCODE_KEY=’/home/btm/btm.key’ /var/lib/gems/1.8/bin/knife

If you can’t get it to work, you can always use the webui’s key:

sudo OPSCODE_USER=’chef-webui’ OPSCODE_KEY=’/etc/chef/webui.pem’ /var/lib/gems/1.8/bin/knife

Hopefully that is enough to get you going. Jump on #chef on irc.freenode.net or join the chef list if you have any problems. Tickets/bugs/features are tracked in JIRA, and all sorts of other useful information is in the wiki.

HP SMH On Ubuntu Karmic

I recently had to install HP’s System Management Homepage (SMH) on Ubuntu Karmic (9.10) on hardware I had never touched for Hosted Operations to monitor. The hardware wasn’t my choice, but I’m indifferent to it. The operating system is my choice. Apparently they support Debian Lenny (5.0) and Ubuntu Jaunty (9.04), but ours was too new. However, while I commend them for building debs, they’re a little sketchy and broken. Granted, I wasn’t deploying to a supported release, but nonetheless. Here’s a link to download options for the DL360 G6, that may never work because the HP site isn’t meant to be linked to.

Downloading the provided Ubuntu Jaunty iso and mounting it produced a standard debian repository tree for both lenny and jaunty.
sudo mount -o loop HP_ProLiant_Value_Add_Software-8.25-19-12.iso /mnt

I added these packages to our local respository, but you can copy them to every server and install them by hand using ‘dpkg -i DEB’ instead of ‘apt-get install PACKAGE’. You’ll end up installing all of them really. The HP SMH package is mostly an apache fork and a ton of included/vendored libraries.

You’ll log in to HP SMH on port 2381 over HTTPS. As usual, if you get a data stream, you are likely connecting over HTTP by accident. By default a user must be in the ‘root’ group. You can use ‘vigr’ to add another user to the root group as you usually don’t have a root user on Ubuntu. You can edit ‘/opt/hp/hpsmh/conf/smhpd.xml’ and put another group in the ‘admin-group’ element. I put ‘domain-admins’ there because we use likewise to authenticate against a windows domain. I couldn’t figure out how to have groups added via the web interface to save, but that really was a hoop anyway since I wanted to push the configuration files out via configuration management.

I don’t know if HP SMH reads snmpd.conf to figure out how to connect back to snmp locally, but I had to initially run ‘/sbin/hpsnmpconfig’ to generate a few wizardy lines in /etc/snmp/snmpd.conf’. I later pushed this out via configuration management, but if you check that script it does create an “answer file” which looked like a bunch of variables you could export before you run the script non-interactively.

HP SMH gets its information from the HP SNMP agents, so if you log in and don’t see any data, it can not contact the SNMP source. You should see a page like this. Because so many libraries are shipped in the debs rather than being required, libraries are the most common source of issues. I had to restart ‘hp-snmp-agents’ after installation, getting this error on the initial startup in ‘/var/log/hp-snmp-agents/cma.log’:

libcmacommon.so.1: cannot open shared object file: No such file or directory

Another way to say all of this is via my chef recipe:

#
# Cookbook Name:: hpsmh
# Recipe:: default
#
# Copyright 2009, Webtrends
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Restart hp-snmp-agents later. it is buggy and has issues with its own libraries when started on package installation
service "hp-snmp-agents" do
  action :nothing
end

package "hp-health"
package "hpacucli"
package "cpqacuxe"
package "hp-snmp-agents" do
  notifies :restart, resources(:service => "hp-snmp-agents")
end
package "hp-smh-templates"
package "hpsmh"

service "hpsmhd" do
  action [ :start, :enable ]
end

service "snmpd" do
  action [ :start, :enable ]
end

remote_file "/opt/hp/hpsmh/conf/smhpd.xml" do
  source "smhpd.xml"
  owner "root"
  group "root"
  mode 0644
  notifies :restart, resources(:service => "hpsmhd")
end

remote_file "/etc/snmp/snmpd.conf" do
  source "snmpd.conf"
  owner "root"
  group "root"
  mode 0644
  notifies :restart, resources(:service => "snmpd")
end

Talent is Human

As I look back at growing up in a small town, there was a surprising lack of everyone wanting to move to the city as soon as they could. Perhaps that was because there is not a recognizible city anywhere near coastal eastern Maine. Despite, there still was a lingering belief that people were different elsewhere. Granted, they’re different, but in the same ways.

The majority of those I consider my colleagues have not worked for the same companies that I have. While our projects are of importance to our companies, it is usually our passion and not our employment that drive them. Some days I feel certain this is commonly understood, but it only takes a personal blog policy or a social media marketing drive to remind me that I’m actually isolated on an island of like-minded individuals hiding under the radar like stowaways. You can’t escape culture, but you can find different ones.

In Paul Graham’s recent essay about Apple, he markedly warns of mistreating the developers of your platform, lest they form a distaste for your brand altogether. Before I read the essay I was feeling quite sure that it was commonly understood today that developers are your greatest asset. Perhaps more valuable than even your big idea. Likely due to being mentioned by name in the essay, I was reminded of the great Google interview algorithm; commonly known for streamlining their processes at the cost of the interviewee. This seems to only alienate the prospect, unless they happen to enjoy passing tests over creating value. As the strengths of mass-collaboration become more accepted, it strikes me odd that on the whole we’re still missing that it is made up of individual human talent.

The product of our creativity is no longer hidden behind towering walls of corporations. We are global citizens innovating for the sake of it. You won’t see this on a college transcript, in ones knowledge of inodes, or in a six month product road map of release stability. The pieces are not exactly hidden either. I’m tempted to point out how slowly we’re changing by example with the United States’ difficulty transitioning from educating factory workers to innovators now that globalization has helped much of the the rest of the world catch up as industrial nations. However I can’t help but remember that we’ve gotten this far on our own.

Despite reminding us that we are living in a small town, the murmuring you’ve heard from pundits and rabble-rousers but could not make out sounds perfectly clear here. We are not going to wait for you to get it. The catch is that we no longer need to move to the city, because we’re building it every day. Coming?