Infrastructure as a code sample

Upon returning from Open Source Bridge in Portland last week, I collected my thoughts from the convergence of configuration management developers and wrote The Configuration Management Revolution, centered around the idea that something bigger is happening than we’re acknowledging.

Today Tim O’Reilly posted a blog entry about the origins of Velocity. He says, “I had been thinking in the abstract about the fact that as we move to a software as a service world, one of the big changes was that applications had people “inside” of them, managing them, tuning them, and helping them respond to constantly changing conditions.” This builds on his post from three years ago about operations becoming the “elephant in the room”.

That article is worth revisiting. It tails off commenting on the lack of open source deployment tools. That has definitely changed, as we have a number of open source options in the operations tool space now. O’Reilly has published a few books on operations as well, although it hasn’t taken the step of considering it a category in its book list yet.

The web is full of howtos, blog posts and assorted notes on piecing together open source software to build a server. One doesn’t have to be an expert on all of the ingredients, but rather be able to figure out how to assemble them. As time goes on, the problems of the past become easier to solve; former creative solutions become mainstream and the industry leverages those advantages. This frees up mindshare for something new. I’ll emphasize that this doesn’t mean one no longer needs to have some understanding of why the server works, but the time spent engineering that particular solution is reduced because we already have the wheel, so to speak.

Writing configuration management and thus infrastructure howtos may get one started, but it’s the old way of thinking. If you can write infrastructure as code, you can share infrastructure as code. It is essential that this is achieved in a format that both promotes sharing and is relatively easy. Take the Munin and Ganglia plugin sites for instance. Munin is relatively easy to get started with and has a simple enough site for exchanging plugins. While I consider Ganglia technically superior, its community is not. I tried submitting to Ganglia’s plugin site once and failed. This step has to be more than a site where files are dumped; it needs community support.

I asked Luke about this at OSBridge and he said Reductive Labs plans to have a module sharing website online soon for puppet. For now, you can find a number of puppet modules in the wiki. Opscode is on track, with their chef cookbooks available as a git repository on github, combined with a ticketing system allowing users to fork, modify and contribute changes. There’s even a wiki page explaining how to leverage these.

Of course, you’ll always need experienced engineers to design and tune your infrastructure. However, the time and mindshare savings from creating a LAMP stack by setting a tag or role to ‘lamp’ are immense. As Opscode produces more open APIs between parts of their product, my mind imagines the offspring of the Chef UI and virt-manager. How long until the popup touting “New features are available for your web cluster”?

The Configuration Management Revolution

The revolution is coming, and it’s about time I wrote about it.

About a year and a half ago I was settling in to a new system administration job at a startup. I was told a consulting company would be coming in to bootstrap configuration management for us. I had previously glanced at cfengine out of curiosity, but ended up spending only a couple of hours looking at it. In my mind configuration management was analogous to unattended software installation, which I was definitely in support of, but had yet to perceive how it was going to change how I viewed infrastructure.

That consulting company was HJK Solutions. Some of my coworkers had previously established relationships with a couple of the partners of HJK, but I didn’t know anything about them myself. I was along for the ride. They gave us a presentation where they showed iClassify and puppet working together to automate infrastructure for other clients, but it wasn’t until the next meeting, where we made technical decisions about the implementation, that I really came to appreciate their insight. Why someone makes a choice is much more interesting than the choice itself, and this was the first of my many opportunities since to solicit the opinions of Adam Jacob.

A year of using puppet later, not only was I hooked but my excitement about the possibilities of configuration management had grown beyond what the software could do at the time. Both my excitement and my frustration were apparent and got me a sneak peek at Opscode’s Chef. The design of Chef embodies “the unix way” of chaining many tools together, insofar as it allows us to take building blocks that are essentially simple on their own but, from behind our backs, present a system that is revolutionary enough that we almost fail to recognize the familiar pieces of it.

Chef is a systems integration framework, built to bring the benefits of configuration management to your entire infrastructure.

This is not an article about Chef, this is about the big picture. However, if you take enough steps back from that statement it becomes apparent that Opscode is building toward that picture. I want to share with you the excitement that short description garners inside of me.

Configuration management alone is the act of programmatically configuring your systems. Often the benefits are conveyed in support of process, but in more agile communities different advantages are touted, such as allowing one to wrangle a larger number of servers by reducing build times in the name of vertical scalability, building more maintainable infrastructures by leveraging the self-documenting side effect of configuration languages, and reducing administrator burnout by cutting a swath through the number of repetitive tasks one must perform. These are unarguably significant boons. Nevertheless, one does not have to look hard to find a curmudgeon reluctant to change, claiming they don’t want to learn another language, that having systems run themselves will surely cause failure, or perhaps some skynet-esque doomsday scenario. History is not short of examples of luddites holding steadfast against new technology, but it is somewhat paradoxical to see this mentality held in such a technologically oriented field.

The recent Configuration Management Panel at the Open Source Bridge conference in Portland amassed many relevant core developers in one city long enough to provide a good vibe for the direction of the available tools and underscore our common charge. But the focus was more about how we will get more users of configuration management tools than why they are going to have to use them. In retrospect, perhaps I should have asked of the panel their views of how configuration management will reshape systems administration.

Configuration management is about more than automation. Some who have foreseen this have started to convey it by discussing managing infrastructures rather than systems. By analogy, the power loom, the Gutenberg press, and the intermodal shipping container were not merely time-saving tools of automation. These inventions reshaped not only their workforce and industry but also the global economy.

I’m fully aware of the tone set by such a call of prophecy. How will a tool that helps us configure multiple machines at once make such significant ripples in our day to day lives of the future? It will because we will be enabled to solve new problems that we did not yet realize existed. Just as other technological advances served as catalysts for globalization and for the industrial and scientific revolutions, changing how we build our information infrastructure leaves us poised for an exciting set of challenges that do not yet exist.

LSI mptlinux / mptsas 3.12.29 on ubuntu

I recently upgraded Dell OMSA to 6.0.1 on a number of Ubuntu Intrepid and Jaunty hosts using sara.dl’s packages and got a warning that the mptsas driver version 3.04.07 was below the minimum supported version. The version from ‘modinfo mptsas’ confirmed I was on the right track looking at this driver. A quick look revealed no update in 2.6.29.4 or 2.6.30-rc8, so I went searching for the driver’s source.

LSI’s site is terrible. I have Dell 1955 blades, and the Dell SAS5/iR chipsets are really LSI SAS1068s. I searched the drivers page for SAS1068 eventually and found the right download page. I grabbed the 4.18.00 archive file.

After decompressing it I found a dkms folder and rpm. I eventually gave up and used this to build a dkms deb with the following commands:

sudo apt-get install dkms
sudo rpm -i --nodeps mptlinux-4.18.00.00-1dkms.noarch.rpm
sudo dkms mkdeb -m mptlinux -v 4.18.00.00
scp /var/lib/dkms/mptlinux/4.18.00.00/deb/mptlinux-dkms_4.18.00.00_all.deb OTHERHOST:

Then install that deb on the otherhost (with the LSI based chipset) and it will install the correct modules via dkms. I rebooted and used modinfo to verify that mptsas was now version 4.18.00 and ‘omreport storage controller’ now reports ‘Ok’ instead of ‘Degraded’ again.
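
The kind of check OMSA is complaining about can be sketched in a few lines of Python. This is only an illustration, not OMSA’s actual logic: the function names and the 4.00.00 threshold are hypothetical (the warning didn’t tell me the real minimum), and only 3.04.07 and 4.18.00.00 come from the hosts above.

```python
def parse_version(text):
    """Turn a dotted version such as '3.04.07' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # Hypothetical threshold, for illustration only.
    minimum = "4.00.00"
    for version in ("3.04.07", "4.18.00.00"):
        state = "ok" if meets_minimum(version, minimum) else "too old"
        print(version, state)
```

Comparing tuples of integers rather than raw strings avoids surprises with zero-padded fields like ‘04’.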

Recovering from a Windows Server 2003 mirrored dynamic disk failure

I’m no fan of software raid. Pretty much, ever. At my last job, for which I still consult, my predecessor was really into technology creep. All of the workstations used that awesome fake raid that is actually implemented in the mass storage driver and is therefore pretty useless and can actually reduce your paths to recovery from disk failure. I’ll leave out the list of arguments against software raid. It just simply isn’t worth it.

I showed up to a call about a server with a 0x7b error. Of course, Microsoft has this cool feature by default where servers automatically reboot when they blue screen, so nobody knew this was the error until I showed up and tried the “don’t automatically restart on BSOD” option under the F8 startup menu. I’m used to this error from moving system images between hardware, especially with virtual machines. As it turns out, the other values inside the parentheses are actually useful. If the second value inside the parentheses is 0x00000010, then you’re likely dealing with a disk in a software raid mirror set (dynamic disk) that Windows has marked as failed, and thus won’t start from.

The trick, which took me a while to nail down, is getting a boot.ini set up to boot from another disk. Since you can’t actually access this partition even in the Recovery Console, you can’t edit the boot.ini to tell it to start from the other disk. In the end, I formatted a floppy using simply ‘format A:’ on an XP desktop (would you believe this entire data center lacks a Windows server with a floppy drive?), then copied ntldr, ntdetect.com and boot.ini from another Server 2003 machine with the same service pack to this floppy. Then I changed the boot.ini to contain:

[boot loader]
timeout=60
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="DISK 0" /noexecute=optout /fastdetect /3GB
multi(0)disk(0)rdisk(1)partition(1)\WINDOWS="DISK 1" /noexecute=optout /fastdetect /3GB
multi(0)disk(0)rdisk(2)partition(1)\WINDOWS="DISK 2" /noexecute=optout /fastdetect /3GB
multi(0)disk(0)rdisk(3)partition(1)\WINDOWS="DISK 3" /noexecute=optout /fastdetect /3GB

If you’re not familiar with this file, you may want to read about ARC paths. Remember that ntldr and ntdetect.com are hidden, system and read-only by default, although it’s fine to leave these attributes unset on the floppy copies. ‘attrib -s -h -r C:\ntldr’ will make the file accessible so you can copy it to a floppy. I have to assume that when you format a floppy from an NT based operating system it puts a bit of code in the boot sector to look for these files.
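
Since the entries differ only in the rdisk() number, the file is easy to generate. Here is a rough Python sketch that builds the same boot.ini text as above; the function names are mine, and it assumes every candidate disk keeps Windows on partition 1 in \WINDOWS, which was true for this server but may not be for yours.

```python
def arc_path(rdisk, partition=1, windows_dir="WINDOWS"):
    """Build an ARC path for a disk on the first controller."""
    return "multi(0)disk(0)rdisk({})partition({})\\{}".format(
        rdisk, partition, windows_dir
    )


def build_boot_ini(disk_count=4, timeout=60,
                   options="/noexecute=optout /fastdetect /3GB"):
    """Return boot.ini text with one menu entry per candidate disk."""
    lines = [
        "[boot loader]",
        "timeout={}".format(timeout),
        "default=" + arc_path(0),
        "[operating systems]",
    ]
    for rdisk in range(disk_count):
        lines.append('{}="DISK {}" {}'.format(arc_path(rdisk), rdisk, options))
    # boot.ini is a DOS text file, so use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_boot_ini())
```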

I then booted from the floppy and chose ‘DISK 1’, and the system started up fine. I then pulled the failed disk (carefully guessing which disk it was from the disk order in disk management and the scsi id jumper settings) and replaced it. In disk management, right click the good disk, choose “remove mirror” and pick the missing disk. Then right click again, choose “add mirror” and pick the new disk. Drink coffee.

It’s late and I can’t figure out how to run ‘fixboot’ and ‘fixmbr’ with a disk mirror, so I’m still using the floppy disk to boot and choose either disk to start from.

Migrating Virtual PC Windows servers to KVM

Windows XP / 2003 have always been treacherously unstable when moving them between hardware, so much so that fresh installs are wired into my head as being the only option. I’m tired of having three different virtualization platforms, and I don’t want to rebuild these machines. ‘Virtual Server 2005 R2’ is the first to go.

To deal with getting 0x000000ce / 0xce BSODs with processr.sys, set either or both of these values to ‘4’ in the registry before migrating the machine, or on the first startup:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Processor\Start
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Intelppm\Start
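
If you’d rather import the change than click through regedit, the two values can be written out as a .reg file. A minimal Python sketch, assuming the usual ‘Windows Registry Editor Version 5.00’ header and that Start is a DWORD (4 marks a service as disabled); the function name is made up for illustration.

```python
SERVICES = ("Processor", "Intelppm")


def build_reg_file(services=SERVICES, start_value=4):
    """Return .reg file text that sets Start for each listed service."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for service in services:
        lines.append(
            "[HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\{}]".format(
                service
            )
        )
        # DWORD values are written as eight hex digits.
        lines.append('"Start"=dword:{:08x}'.format(start_value))
        lines.append("")
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file())
```

Save the output as something like disable-processor.reg and import it on the guest before the move.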

To avoid getting a 0x0000007b / 0x7b BSOD on startup due to the mass storage controller changing, run the registry modifications in MS KB 314082. I saved the text inside the “copy here” block to a .reg file and ran it before converting below, and it was enough.

Then use the free VMware vCenter Converter to convert the Virtual PC image (it has to be off) to a VMware image. On the third step of the wizard, under options, I set the disk controller from automatic to ide. Copy the resulting vmdk file to your KVM host.

Use qemu-img or kvm-img to convert the disk image from vmdk to qcow2:

kvm-img convert -O qcow2 server-ide.vmdk server-ide.qcow2

You can then use KVM to run this disk image. I use libvirt, so I simply copied another libvirt xml file, removed the MAC addresses and the uuid from it, updated the guest name, and pointed it to this disk, specifying ‘ide’ as the ‘bus’ in the disk’s target element.
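
I did those edits by hand, but they are mechanical enough to script. The following is a sketch with Python’s standard xml.etree module, not a libvirt tool: the TEMPLATE document, its uuid and MAC address, and the clone_domain name are all invented for the example.

```python
import xml.etree.ElementTree as ET

# A made-up guest definition standing in for "another libvirt xml file".
TEMPLATE = """<domain type='kvm'>
  <name>template</name>
  <uuid>00000000-0000-0000-0000-000000000000</uuid>
  <devices>
    <disk type='file' device='disk'>
      <source file='/srv/kvm/template/template.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='bridge'>
      <mac address='52:54:00:00:00:01'/>
      <source bridge='br0'/>
    </interface>
  </devices>
</domain>"""


def clone_domain(xml_text, name, disk_path):
    """Rename the guest, drop uuid and MACs, and attach the disk on the ide bus."""
    root = ET.fromstring(xml_text)
    root.find("name").text = name
    # Without a uuid or MAC addresses, libvirt generates fresh ones on define.
    for uuid in root.findall("uuid"):
        root.remove(uuid)
    for iface in list(root.iter("interface")):
        for mac in iface.findall("mac"):
            iface.remove(mac)
    for disk in list(root.iter("disk")):
        if disk.get("device") != "disk":
            continue
        disk.find("source").set("file", disk_path)
        target = disk.find("target")
        target.set("dev", "hda")
        target.set("bus", "ide")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(clone_domain(TEMPLATE, "server", "/srv/kvm/server/server-ide.qcow2"))
```

The result can be saved to a file and loaded with ‘virsh define’.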

It takes the keyboard and mouse a few moments to work on the console the first time; I assume the devices are being detected in the background. I had one ‘PCI device’ detected without a driver, and I let it stay that way. I set the resolution up by hand. On an early run I had issues with the VGA driver, but I can no longer recreate this.

If you were using a static address, you may need to follow the directions in MS KB 269155 to delete the old network interface that is now hidden or you’ll get an error about the address being in use.

Definitely leave a comment about how this works for you. It’s like playing with fire.

Here’s a libvirt xml for kicks:

<domain type='kvm'>
  <name>server</name>
  <memory>786432</memory>
  <currentMemory>786432</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='i686' machine='pc'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='file' device='disk'>
      <source file='/srv/kvm/server/server-ide.qcow2'/>
      <target dev='hda' bus='ide'/>
    </disk>
    <interface type='bridge'>
      <source bridge='br0'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' listen='127.0.0.1'/>
  </devices>
</domain>

Etherchannel and trunking with Cisco 3524xl and 6509

The Cisco 3524XL doesn’t support PAgP or LACP; you simply configure etherchannel by adding ‘port group N’ to each interface. The port group takes the configuration of the first interface in the port group.

! Cisco 3524XL
interface FastEthernet0/1
 description uplink to 6509
 port group 1
 switchport trunk encapsulation dot1q
 switchport mode trunk
end

The 6509 supports more dynamic protocols, and will try to use them unless you specify ‘switchport nonegotiate’ on the portchannel interface, which is key. Otherwise, every time you turn on ‘channel-group 4 mode on’ the ports will go down on the 3524XL and the ports on the 6509 will go into the ‘err-disabled’ state until you ‘shut’ / ‘no shut’ them.

! Cisco 6509
interface GigabitEthernet7/7
 description sw03 - rack 3
 no ip address
 switchport
 switchport mode trunk
 switchport nonegotiate
 channel-group 4 mode on
end

interface Port-channel4
 description sw03 - rack 3
 no ip address
 switchport
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport nonegotiate
end
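
With more than one uplink in the channel, every member port on the 6509 needs the same stanza. A small Python sketch that stamps it out, using only the lines from my config above; the function name is mine and GigabitEthernet7/8 is a hypothetical second port.

```python
def member_interface_config(interface, description, group):
    """Return the 6509 member-port stanza for a static etherchannel."""
    return "\n".join([
        "interface {}".format(interface),
        " description {}".format(description),
        " no ip address",
        " switchport",
        " switchport mode trunk",
        # Stops the 6509 from trying to negotiate with the 3524XL.
        " switchport nonegotiate",
        " channel-group {} mode on".format(group),
    ])


if __name__ == "__main__":
    # 7/7 is from the post; 7/8 is a made-up second member port.
    for port in ("GigabitEthernet7/7", "GigabitEthernet7/8"):
        print(member_interface_config(port, "sw03 - rack 3", 4))
        print("!")
```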

libvirt: unknown OS type hvm

It took me a little while to narrow this down. Building a kvm guest with vmbuilder via libvirt, I was getting the error “unknown OS type hvm”. When I compared the output of ‘virsh capabilities’ on a good host and the one that wasn’t working, the latter was missing the kvm hvm entries. When I checked out the init script for kvm, I realized that the kernel module wasn’t loaded, and a quick check of dmesg confirmed that virtualization was disabled in the bios.
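
The comparison I did by eye can be automated. This is a sketch that assumes the capabilities XML has the shape I saw (guest elements containing os_type and arch/domain entries); the two sample documents are trimmed-down inventions, not real output.

```python
import xml.etree.ElementTree as ET

# Trimmed, made-up examples of 'virsh capabilities' output.
GOOD_HOST = """<capabilities>
  <guest>
    <os_type>hvm</os_type>
    <arch name='x86_64'>
      <domain type='qemu'/>
      <domain type='kvm'/>
    </arch>
  </guest>
</capabilities>"""

BAD_HOST = """<capabilities>
  <guest>
    <os_type>hvm</os_type>
    <arch name='x86_64'>
      <domain type='qemu'/>
    </arch>
  </guest>
</capabilities>"""


def kvm_hvm_available(capabilities_xml):
    """Return True if any hvm guest entry offers a kvm domain type."""
    root = ET.fromstring(capabilities_xml)
    for guest in root.findall("guest"):
        if guest.findtext("os_type") != "hvm":
            continue
        for arch in guest.findall("arch"):
            for domain in arch.findall("domain"):
                if domain.get("type") == "kvm":
                    return True
    return False


if __name__ == "__main__":
    print("good host:", kvm_hvm_available(GOOD_HOST))
    print("bad host:", kvm_hvm_available(BAD_HOST))
```

On a real host you would feed it the output of ‘virsh capabilities’ instead of the samples.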

Why can’t sysadmins build networks?

Why can’t System Administrators get network design?

Sometime around 1997 I built my first ISP. I was doing computer repair for a man at the time. Internet access was just getting situated in my small city. This man wanted in, but showed up at my house in frustration one night because he couldn’t figure out how to get the router to work. He came sporting a $100 bill and told me it was mine if I fixed it. I suppose it was going to be much more than he had been paying me hourly, but I was more interested in the problem than the pay, and he was frustrated. He had a Livingston Portmaster 2ER, a pile of external modems, and a 56K frame relay uplink to another local ISP. This ISP was always more network gear than computers, mostly because he was “thrifty”, despite owning a computer store. There was an NT 3.51 box, a Linux box, and for a little while before it got reappropriated, a FreeBSD machine as well. As fanciness like 56k modems came out and customers grew, hardware scaled out. It remained mostly network hardware.

Ever since then, every network I’ve inherited has been a mess. There have been design ideals focused around age old buzzwords like “security” that result in a pile of expensive security gear that’s essentially useless because proper implementation and design simply wasn’t understood. All of them have grown their L2 infrastructure out horizontally, usually with terribly cheap switches, but often with terrible not so cheap switches as well. Patch panels and cabling have always run amok, usually with patch cables two to three times longer than necessary stuffed into the cable ducts.

VLANs are almost always used on a single switch, then individual switches are plugged into access ports to provide a switch for every VLAN. Or worse, the switches are all broken up into multiple VLANs, with an uplink cable for each VLAN. It’s obvious that concepts like trunking and VTP are simply not understood. These don’t add complexity cost; they simplify what otherwise tends to be a disaster.

I find myself up early lying in bed thinking about the second round of ripping out erroneous unmanaged switches and migrating a live production network to a proper hierarchical design. Suddenly I realized it shouldn’t have to be this way, and I really wish more administrators had at least the knowledge of a CCNA. Small companies don’t usually get the benefit of administrators who take the time to understand technology, and usually make do with consultants who draw a direct line between something functioning and it being right and, unfortunately, between something not working and it being wrong as well. The latter is almost always because they failed to understand the problem and instead blamed the vendor or technology, from then on spouting that using a SAN creates a SPOF, domain controllers can’t be virtual machines, or portable A/C doesn’t actually do anything.

As I trudge through my memory recalling these kinds of misguided attempts at wisdom, they all have a common denominator: not knowing the cause of the problems they are having. You have to understand the technology you’re leveraging. It’s absolutely essential that you know why your network works, not only that it does at the moment.

Displaying the time in wordpress posts with K2

K2 defaults to adding:

‘Published by btm on April 16, 2009 in Uncategorized’

to posts. That doesn’t include the time, which is sometimes contextually important. This is controlled in ‘theloop.php’ in K2, which uses the date_format setting; you can change it under ‘Settings -> General’ in the wordpress configuration. The format is the php date format. Simply using ‘r’ is the best, since it provides a nice RFC 2822 formatted date like:

‘Published by btm on Mon, 20 Apr 2009 09:28:48 -0700 in Uncategorized’.
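
If you want to see what an RFC 2822 date looks like outside of php, Python’s standard library produces the same layout. This is just an illustration of the format, not anything wordpress or K2 runs; the timestamp is the one from the example above and the helper name is mine.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def rfc2822(moment):
    """Format a timezone-aware datetime the way php's 'r' format does."""
    return format_datetime(moment)


if __name__ == "__main__":
    pacific_daylight = timezone(timedelta(hours=-7))
    published = datetime(2009, 4, 20, 9, 28, 48, tzinfo=pacific_daylight)
    print(rfc2822(published))  # Mon, 20 Apr 2009 09:28:48 -0700
```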

Configuring LVM preseed on Ubuntu intrepid

It recently clicked in my head that all the blades with small swap partitions were because they had their OS installed when they had very little RAM in them. So I set out to modify the Ubuntu 8.10 preseed install to create a larger swap partition and configure LVM while we were at it.

This proved difficult. Mostly because the better documentation of debian-installer (preseed, partman-auto) has features that aren’t in the version in Ubuntu.

Just got this working:

d-i partman-auto/disk string /dev/sda
d-i partman-auto/method string lvm
d-i partman-lvm/device_remove_lvm boolean true
d-i partman-lvm/device_remove_lvm_span boolean true
d-i partman-auto/purge_lvm_from_device  boolean true
d-i partman-auto-lvm/new_vg_name string system
#d-i partman-auto/init_automatically_partition \
#  select Guided - use entire disk and set up LVM
d-i partman-auto/expert_recipe string                         \
      boot-root ::                                            \
              40 300 300 ext3                                 \
                      $primary{ }                             \
                      $bootable{ }                            \
                      method{ format } format{ }              \
                      use_filesystem{ } filesystem{ ext3 }    \
                      mountpoint{ /boot }                     \
              .                                               \
              2000 10000 1000000000 ext3                      \
                      $lvmok{ }                               \
                      method{ format } format{ }              \
                      use_filesystem{ } filesystem{ ext3 }    \
                      mountpoint{ / }                         \
              .                                               \
              8000 8000 200% linux-swap                       \
                      $lvmok{ }                               \
                      method{ swap } format{ }                \
              .

d-i partman-lvm/confirm boolean true
d-i partman/confirm_write_new_label boolean true
d-i partman/choose_partition select Finish partitioning and write changes to disk
d-i partman/confirm boolean true
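
The three numbers that open each stanza of the recipe are the part I kept getting wrong. As I understand the partman-auto recipe format (this is my reading of it, so treat it as an assumption), they are the minimum size, a priority, and the maximum size in megabytes, and a maximum written as a percentage is relative to the machine’s RAM. A small Python sketch of that reading, with made-up function names:

```python
def parse_size_line(line):
    """Split a recipe header such as '8000 8000 200% linux-swap'."""
    minimum, priority, maximum, filesystem = line.split()
    return {
        "minimum": minimum,
        "priority": int(priority),
        "maximum": maximum,
        "filesystem": filesystem,
    }


def resolve_size(value, ram_mb):
    """Return a size in MB, treating 'N%' as a percentage of RAM."""
    if value.endswith("%"):
        return ram_mb * int(value[:-1]) // 100
    return int(value)


if __name__ == "__main__":
    swap = parse_size_line("8000 8000 200% linux-swap")
    ram_mb = 16384  # hypothetical blade with 16 GB of RAM
    print("swap minimum MB:", resolve_size(swap["minimum"], ram_mb))
    print("swap maximum MB:", resolve_size(swap["maximum"], ram_mb))
```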

Quick bridging with KVM on Ubuntu jaunty

It took me a little while to put the pieces together to figure out how to take a vm-builder created vm and use bridging with it instead of kvm/qemu’s user-mode networking. All the pieces are available on the internet, but there was some emphasis lacking to make it all clear to me.

You’ll need to have a bridge set up on your host. Install the ‘bridge-utils’ package first. Then the relevant section of my /etc/network/interfaces file looks like:

# The primary network interface
auto eth0
iface eth0 inet manual
up ifconfig $IFACE up

auto br0
iface br0 inet static
address 10.0.0.60
netmask 255.255.255.0
gateway 10.0.0.1
bridge_ports eth0
bridge_stp off
bridge_maxwait 0
bridge_fd 0
bridge_hello 0

You could probably use ‘dhcp’ instead of a ‘static’ address on the bridge. The point is that your ipv4 address should be on the bridge, not on the actual interface.

Then create a ‘br-ifup’ script in your vm directory. This is based on /etc/qemu-ifup. The script is passed the name of the interface (tap0); it brings the interface up and then adds it to your bridge:

#!/bin/sh
#sudo -p "Password for $0:" /sbin/ifconfig $1 172.20.0.1
sudo /sbin/ifconfig $1 up
sudo /usr/sbin/brctl addif br0 $1

Then run kvm with something like this:

sudo kvm -m 128 -smp 1 -drive file=disk0.qcow2 -net nic -net tap,script=br-ifup

‘/etc/kvm-ifup: could not launch network script’ means that the script passed in ‘script=’ could not be found.

‘Could not initialize device 'tap'’ means that kvm is unable to create the TAP/TUN interface. Running kvm as root via sudo is the easy solution.

‘warning: could not open /dev/net/tun: no virtual network emulation’ probably means that the ‘tun’ module isn’t loaded. You can load it with ‘sudo modprobe tun’.
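
If you wrap kvm in a launcher script, the three messages above are easy to turn into hints. A minimal Python sketch; the diagnose function and the HINTS table are my own invention and only encode the explanations given in this post.

```python
# (substring of the kvm message, explanation from the notes above)
HINTS = (
    ("could not launch network script",
     "the script passed in script= could not be found"),
    ("could not initialize device",
     "kvm could not create the TAP/TUN interface; try running it via sudo"),
    ("could not open /dev/net/tun",
     "the tun module is probably not loaded; try: sudo modprobe tun"),
)


def diagnose(stderr_text):
    """Return the hints whose message fragment appears in kvm's stderr."""
    lowered = stderr_text.lower()
    return [hint for needle, hint in HINTS if needle in lowered]


if __name__ == "__main__":
    sample = "warning: could not open /dev/net/tun: no virtual network emulation"
    for hint in diagnose(sample):
        print(hint)
```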

The tap interface is removed from the bridge when the guest is shut down.

Beware of MAC address generation on libvirt 0.4.4

Two or three times now libvirt (0.4.4-3ubuntu3.1, Ubuntu Intrepid 8.10) has automatically generated overlapping MAC addresses on me. I can’t find the source for this MAC address generation in 0.4.4, but in 0.6.1 which is included in Ubuntu Jaunty 9.04 it’s virGenerateMacAddr in src/util.c. This leads me to believe it’s been rewritten, and I’m hoping it’s better. It looks perfectly fine.
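
Until I’m on the newer version, one workaround is to pick the MAC address myself and check it against the ones already in use. The sketch below is not libvirt’s code: it assumes the 52:54:00 prefix commonly used for QEMU/KVM guests, and the function name and the in_use set are hypothetical.

```python
import random

# Prefix commonly used for QEMU/KVM guests (an assumption, see above).
KVM_PREFIX = (0x52, 0x54, 0x00)


def generate_mac(in_use, prefix=KVM_PREFIX, rng=random, attempts=1000):
    """Return a random MAC under the prefix that is not already in use."""
    taken = {mac.lower() for mac in in_use}
    for _ in range(attempts):
        octets = list(prefix) + [rng.randint(0x00, 0xFF) for _ in range(3)]
        mac = ":".join("{:02x}".format(octet) for octet in octets)
        if mac not in taken:
            return mac
    raise RuntimeError("no free MAC address found")


if __name__ == "__main__":
    existing = {"52:54:00:00:00:01"}  # made-up address of an existing guest
    print(generate_mac(existing))
```

The in_use set would come from the mac elements of your existing guest definitions.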

Comodo is shady

A few minutes ago I got a cold call on my cell phone. I almost didn’t answer; I tend not to answer calls to my cellphone from unknown numbers. I have teams of lawyers and medical people out there looking for me sometimes, so sometimes I must.

The caller said that my SSL certificate was expiring soon with Company A (who I forget because it’s an old certificate for email I don’t use anymore since I switched to Google for mail) and they’d like the chance to win me over. I paused as I added this all up in my head. After I realized it was just telemarketing, I said “No, thanks” and hung up. Then I got an email from them. Scroll down and read it, then come back.

I like the Creating Trust Online part. Is this a strong arm technique meant to scare me into purchasing from them? Are they trying to create some kind of trust in a “we know more than you, buy our stuff” way? Is this Louis character rogue or is this standard operating procedure?

Ways to get me to never buy products or services from you:
1) Call me
2) Call me, then send me an email

I almost filed the call under weird and forgot about it. Thanks for the email, which I can search for later when I’m shopping for SSL certificates so I know who not to call.

Delivered-To: btm@loftninjas.org
Received: by 10.142.215.17 with SMTP id n17cs645196wfg;
        Thu, 12 Mar 2009 10:48:23 -0700 (PDT)
Received: by 10.150.95.15 with SMTP id s15mr422861ybb.247.1236880102854;
        Thu, 12 Mar 2009 10:48:22 -0700 (PDT)
Return-Path: 
Received: from sharon.nj.office.comodo.net (mail.nj.office.comodo.net [38.104.66.254])
        by mx.google.com with ESMTP id 1si2384323gxk.79.2009.03.12.10.48.18;
        Thu, 12 Mar 2009 10:48:19 -0700 (PDT)
Received-SPF: pass (google.com: domain of louis.cicero@comodo.com designates 38.104.66.254 as permitted sender) client-ip=38.104.66.254;
Authentication-Results: mx.google.com; spf=pass (google.com: domain of louis.cicero@comodo.com designates 38.104.66.254 as permitted sender) smtp.mail=louis.cicero@comodo.com
Received: (qmail 13908 invoked by uid 1001); 12 Mar 2009 17:48:17 -0000
Received: from mmonroe.comodo.net (HELO louisc) (192.168.68.79)
    by sharon.nj.office.comodo.net (qpsmtpd/0.40) with ESMTP; Thu, 12 Mar 2009 13:48:17 -0400
From: "Louis Cicero" 
To: 
Subject: Info on compromised root key
Date: Thu, 12 Mar 2009 13:48:16 -0400
Message-ID: <00a201c9a33a$b955fa20$4f44a8c0@comodo.net>
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_00A3_01C9A319.32445A20"
X-Mailer: Microsoft Office Outlook 11
Thread-Index: AcmjOrkMPeS02oldT1mZI5bKFnL3rA==
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3350
X-Comodo-Virus-Checked: Checked by ClamAV on sharon.nj.office.comodo.net
X-Comodo-ClamAV-Virus-Program: ClamAV 0.92.1

This is a multi-part message in MIME format.

------=_NextPart_000_00A3_01C9A319.32445A20
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit

http://www.computerworld.com/action/article.do?command=viewArticleBasic
 &articleId=9124558&intsrc=it_blogwatch

http://bits.blogs.nytimes.com/2008/12/30/outdated-security-software-threaten
s-web-commerce/

1024-bit encryption is 'compromised'

Upgrade to 2048-bit, says crypto expert

Written by James Middleton

vnunet.com  

According to a security debate sparked off by cryptography expert Lucky
Green on Bugtraq yesterday, 1,024-bit RSA encryption should be "considered
compromised".

The Financial Cryptography conference earlier this month, which largely
focused on a paper   published by
cryptographer Dan Bernstein last October detailing integer factoring
methodologies, revealed "significant practical security implications
impacting the overwhelming majority of deployed systems utilising RSA as the
public key algorithm".

Based on Bernstein's proposed architecture, a panel of experts estimated
that a 1,024-bit RSA factoring device can be built using only commercially
available technology for a price range of several hundred million to $1bn.

These costs would be significantly lowered with the use of a chip fab. As
the panel pointed out: "It is a matter of public record that the National
Security Agency [NSA] as well as the Chinese, Russian, French and many other
intelligence agencies all operate their own fabs."

And as for the prohibitively high price tag, Green warned that we should
keep in mind that the National Reconnaissance Office regularly launches
Signal Intelligence satellites costing close to $2bn each.

"Would the NSA have built a device at less than half the cost of one of its
satellites to be able to decipher the interception data obtained via many
such satellites? The NSA would have to be derelict of duty to not have done
so," he said.

The machine proposed by Bernstein would be able to break a 1,024-bit key in
seconds to minutes. But the security implications of the practical
'breakability' of such a key run far deeper.

None of the commonly deployed systems, such as HTTPS, SSH, IPSec, S/MIME and
PGP, use keys stronger than 1,024-bit, and you would be hard pushed to find
vendors offering support for any more than this.

What this means, according to Green, is that "an opponent capable of
breaking all of the above will have access to virtually any corporate or
private communications and services that are connected to the internet".

"The most sensible recommendation in response to these findings at this time
is to upgrade your security infrastructure to utilise 2,048-bit user keys at
the next convenient opportunity," he advised.

But a comment   from
well known cryptographer Bruce Schneier casts doubt on Bernstein's findings
in practical application.

"It will be years before anyone knows exactly whether, and how, this work
will affect the actual factoring of practical numbers," he said.

But Green, much to the clamour of "overreaction" from the Slashdot
community, added: "In light of the above, I reluctantly revoked all my
personal 1,024-bit PGP keys and the large web-of-trust that these keys have
acquired over time. The keys should be considered compromised."

Whatever the practical security implications, one sharp-witted Slashdot
reader pointed out: "Security is about risk management. If you have
something to protect that's worth $1bn for someone to steal, and the only
protection you have on it is 1,024-bit crypto, you deserve to have it stolen


Generating sha512 passwords

Normally I would use ‘openssl passwd’ to generate encrypted passwords for scripts and config files, but it doesn’t appear to support sha256 and sha512 yet, and there doesn’t appear to be an openssl ticket for this either. Ubuntu has switched to using SHA512 by default (see ENCRYPT_METHOD in /etc/login.defs). In the course of tracking down the use of passwd/root-password-crypted not working in a jaunty pxe/network install (LP: 340841), I needed to generate a sha512 password to replace the md5 password in the d-i config file.

15:11 < cjwatson> $ echo cjwatson:foo | chpasswd -S -c SHA512
15:11 < cjwatson> cjwatson:$6$K./rc/OhIRi$ylKWgewTkGP3TyXfwj8nnKyIhph66WucLseLjGKKzRM0oRcuRzng2szcC/JZpY13dLxmlILx7eSfdfMHTruH40
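If chpasswd isn’t handy, a similar hash can be sketched in Ruby. This assumes the platform’s crypt(3) understands the ‘$6$’ (SHA-512) prefix, as glibc does; the helper names and the salt generation are illustrative only and are not taken from the d-i documentation.

```ruby
require 'securerandom'

# Characters permitted in a crypt(3) salt.
SALT_CHARS = (('a'..'z').to_a + ('A'..'Z').to_a + ('0'..'9').to_a + %w[. /]).freeze

# Hypothetical helper: builds a random salt of the given length.
def random_salt(length = 16)
  Array.new(length) { SALT_CHARS[SecureRandom.random_number(SALT_CHARS.size)] }.join
end

# Hypothetical helper: hashes a password via the system crypt(3).
# The '$6$' prefix selects SHA-512 on glibc; other platforms may
# ignore it and fall back to a different scheme.
def sha512_crypt(password, salt = random_salt)
  password.crypt('$6$' + salt)
end

puts sha512_crypt('foo')
```

The output has the same ‘$6$salt$hash’ shape as the chpasswd line above and can be pasted into the d-i config in the same way.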

Samba/winbind 3.3.1 on Ubuntu jaunty

I’ve been working on testing jaunty before it goes live. Winbind stopped working and I initially assumed it was another configuration change. In the end, it was. The caching functionality wasn’t very straightforward, so it took me a while to get to a point where I could test configurations without the cache messing with the results. Intrepid to Jaunty is Samba 3.2.3 to 3.3.1, which, being a different major version, includes some changes. The internet is chock full of examples that don’t specify the version of Samba they’re for, and it’s been changing a lot.

It looks like 3.0.21a added support for ‘idmap backend = ad’ for retrieving uid/gid information from active directory. At some point ‘idmap config’ showed up, for maintaining multiple domains. I assume this was around 3.0.25, where ‘idmap domains’ showed up. Apparently with 3.3.0 the ‘idmap backend’ option, which became deprecated with the 3.0.25 changes, is back. There is talk in the release notes of using ‘idmap uid’ and ‘idmap gid’. I’ve seen errors about these not existing, so I just went without. Without further ado, here’s my working winbind config:

[global]
security = ADS
server string = %h server (Samba %v)
workgroup = WM
realm = CORP.WIDEMILE.COM
idmap config WM : backend = ad
idmap config WM : schema_mode = rfc2307
idmap config WM : range = 1000-20000
winbind enum users = Yes
winbind enum groups = Yes
winbind use default domain = Yes
winbind nested groups = Yes
template shell = /bin/bash
template homedir = /home/%U
allow trusted domains = No

The other interesting thing was the caching. I eventually read the code while watching the output of ‘winbind -i -d10 -n -s /etc/samba/smb.test.conf’ and saw that ‘-n’ which is supposed to disable caching doesn’t affect the idmap cache. The ‘winbindd_cache.tdb’ and ‘winbind_idmap.tdb’ files were not said cache. It ended up hiding in ‘/var/run/samba/gencache.tdb’, with who knows what else. You need to delete this file manually each run. I filed a bug over it too.

The Public Domain

I just finished reading The Public Domain. Before I had even finished the book, I had purchased multiple copies online, tried to arrange to get more copies in the library [and failed], and began scheming up ways to get others to read it.

I’ve always had a community oriented mindset. Having limits on copyright, patents and their ilk has always been an important issue to me. However this book frames the issue from many directions, helping you see just how much we stand to lose if the tides do not change.

Songs written by Ray Charles, who played a part in the birth of soul, may never have been released in today’s environment, where copyright extends far beyond the life of the artist.

Do you remember before Wikipedia? It’s an excellent question: when was the last time you looked up something in a regular encyclopedia? What would the Internet be like today if we had argued about net neutrality fifteen years ago? Would you have put your faith in a world-wide band of individual software developers to change the way blue chip companies like IBM do business? Really?

The book touches on mashups in music and how it’s now nearly impossible to do the sampling you could do a few years ago. We’re not just talking about sampling new music either; copyright has been extended beyond the life of the artist retroactively, so that the few copyrights with a viable business model can be maintained. That was never the reason for the monopoly power behind copyright; it exists to fuel innovation, not create new business models. If we risked so many musical genres of the past (like soul, aforementioned), what are we losing out on because of the limits today?

What about all the music, books, and material that cannot be archived and digitized because of the copyrights? We can’t begin to fathom how immensely important this information could be to us in fifteen years. The Internet is a perfect example of an amazing source of creativity that couldn’t have been planned for in a study.

Read this book, it’s even online under the Creative Commons. Pass it on. I’ll even send you a copy if you promise to.

stack level too deep with rcov on Ubuntu 8.10

/usr/lib/ruby/1.8/rexml/formatters/pretty.rb:129:in `wrap': stack level too deep (SystemStackError)

I’ve had this issue for a while but just started looking for a solution. There’s a number of REXML workarounds in ‘/usr/lib/ruby/1.8/rcov/report.rb’ of debian rcov package version 0.8.1.2-2 for Ruby 1.8.6. Since we’re using ubuntu ruby package 1.8.7.72-1ubuntu0.1 now, these workarounds aren’t used. The cheap workaround is to edit this file directly and change 1.8.6 to 1.8.7 on line 15.

if RUBY_VERSION == "1.8.7" && defined? REXML::Formatters::Transitive

Writing to the clipboard from the command line in Linux

I needed to paste a bunch of data to my browser to get it into a gist and didn’t want to copy and paste a page at a time. Install the ‘xsel’ package and you can use it to manipulate the clipboards.

ohai | xsel -b

This takes the output of the program and puts it on the ‘clipboard selection’ instead of the ‘primary selection’, which I needed to do to make firefox happy. You can also see the selections from the command prompt with ‘xsel -o’.

Learning to cook

The chef satire will never die. Adam posted 9 things to like about chef today, which is an expanded and much better version of my original blog post on chef. AJ had an intermediate post that tried to summarize a lot of controversy and drama. Hopefully that silliness is settling down.

I’ve been coding a lot lately, contributing to both chef and ohai. We’ve been talking about trying to use chef in the NOC at Shmoocon so that next year we can reuse the recipes rather than build the servers again by hand. Most everything runs on borrowed hardware at Shmoocon, so you’re not guaranteed everything is the way you left it a year later. We use FreeBSD for some monitoring at Shmoocon, so I’ve been spending a lot of time getting chef/ohai ready for FreeBSD.

I don’t think I’ve ever contributed to a project to this degree before. Ohloh doesn’t think so either. The last time I can recall really adding code to a project that was more than a couple files was at an ISP in Maine back in the early 00’s. It was called Panax, and there’s the usual pile of silly isp shop history. It’s funny that while it’s been sucked into an ISP conglomerate the old color scheme has been maintained. We had an in-house system for user/account management, written in Perl. It had a web front end so none of the tech support folks had to log in to any of the systems to add, remove or manage users. Usually I’m just writing glue scripts, like a good SA. Regardless, it’s been fun and I’ve been learning a lot about Ruby and rspec.

An SE at my last job (who subscribes to python, and whom I still haven’t convinced that CM will change his life) said going into development would be a natural move as I got bored of SA work. Is it that, or is this a shift in what being an SA will mean? Configuration Management is still young, despite cfengine being out for some time now, and puppet getting a good following. It may take time for the old SAs to retire and the new deal to take hold. I think more and more as people work in shops with CM implemented, they’ll start to find how hard it is to live without it once you’ve had it. I noticed recently that Slashdot lacks any coverage on Configuration Management in the last few years, but I realize Slashdot is mostly fluffy news these days. While Slashdot is still talking about SCO every day, there is of course talk of new technologies in the new mediums.

The next few months will be exciting to see people pick up chef. There’s a few very helpful individuals in #chef on freenode who want to see this used and are perfectly willing to fix any bugs you find. So give it a shot and let me know what you think.

Replacing munin with ganglia

I’ve been using munin for some time for server trending. It works well out of the box, but it gets really difficult to get it to scale. The poller runs every five minutes and if it doesn’t finish, the next run is simply skipped. As you add more and more data points, this becomes more likely and more common. You simply can’t use SNMP with it (well, you CAN) because the poll is real time and so slow it increases the poller run time significantly.

Adam Jacob at HJK put together a replacement poller called Moonin, but they’ve been busy with chef and it appears to be in maintenance mode (or worse). We currently run Moonin, until we find a better solution. John Allspaw talks everywhere about using Ganglia at flickr, so I’ve been testing that.

Ganglia definitely lacks the community that munin has, but I like its design much better. It was written for monitoring clusters and supports all sorts of business, like using multicast to share traffic data about the cluster. I also like that its interface for exchanging data is XML, as opposed to the custom stuff in munin. This makes it easier to share data around. It’s fast, though. When you write plugins for it using gmetric, you give the data to the monitoring daemon, gmond, instead of it polling. Then you collect the data from your clusters using gmetad, and eventually display the data with the web front end.

The lesson I’ve learned so far is that, at least as of 3.1.1, you can only have one cluster per multicast address/port combination. Regardless of the setting in your gmond configuration, all nodes get reported as a part of the cluster that the machine running gmond is in when gmetad contacts it. I’ve had to deal with this by setting each cluster to use a different port. This isn’t a big deal, because I’m using chef so the gmond configuration file is a ruby template anyhow, but I consider it a bug. In the gmetad configuration you then poll a gmond in each cluster (you can poll multiple nodes in each cluster for redundancy), which forms a grid. Each gmetad instance only supports a single grid for now. The point is this is all very scalable.
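As a rough illustration of the port-per-cluster workaround, here is a minimal ERB sketch of a gmond.conf fragment. The cluster names, the port mapping and the render_gmond helper are assumptions made up for the example; in a real chef template the values would come from node attributes instead. The multicast address shown is gmond’s stock default.

```ruby
require 'erb'

# Fragment of a gmond.conf template. Only the cluster name and port
# vary between clusters in this sketch.
GMOND_TEMPLATE = <<~CONF
  cluster {
    name = "<%= cluster_name %>"
  }
  udp_send_channel {
    mcast_join = 239.2.11.71
    port = <%= port %>
  }
  udp_recv_channel {
    mcast_join = 239.2.11.71
    port = <%= port %>
  }
  tcp_accept_channel {
    port = <%= port %>
  }
CONF

# Hypothetical mapping of cluster name to port.
CLUSTER_PORTS = { 'web' => 8649, 'db' => 8650 }.freeze

# Renders the fragment for one cluster, using that cluster's port
# for the send, receive and accept channels alike.
def render_gmond(cluster_name)
  port = CLUSTER_PORTS.fetch(cluster_name)
  ERB.new(GMOND_TEMPLATE).result(binding)
end

puts render_gmond('db')
```

The matching gmetad data_source line for each cluster would then point at that cluster’s port.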

The bonus of clusters for us is you can group each type of server, say all your front end web servers, into a cluster, and you get aggregate graphs out of the box. They are limited to a couple default metrics like CPU, but it’s nice. In regard to aggregates for other metrics, I don’t know yet if you can do it or how to go about it.

In my first attempt at adding additional metrics, I wrote a ruby script to poll jboss for statistics data, which you can then pass to gmetric using cron. I’m going to dump it here so it’s on the net. If I keep writing these I’ll put them on github or somewhere.


#!/usr/bin/ruby
#
# tomcat-stat - Collects statistics from tomcat via the status interface,
#   and provides the data for use in other scripts
#
# Copyright 2009 Bryan McLellan (btm@loftninjas.org)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# To use with ganglia add a cron entry such as:
# * * * * * /usr/bin/gmetric -n 'tomcat threads max' -t uint32 -v `/usr/local/bin/tomcat-stat --thread-max`
#
require 'optparse'
require 'net/http'
require 'rexml/document'

include REXML

options = {}
OptionParser.new do |opts|
  options[:host] = "localhost"
  options[:port] = "8080"

  opts.banner = "Usage: tomcat-stat [options]"

  opts.on("-h HOST", "--host HOST", "Host to connect to") { |host| options[:host] = host }
  opts.on("-p PORT", "--port PORT", "Port to connect to") { |port| options[:port] = port }

  opts.separator " "
  opts.separator "Choose one:"
  opts.on("--memory-free", "Return free memory") { |free| options[:memoryfree] = free }
  opts.on("--memory-total", "Return total memory") { |total| options[:memorytotal] = total }
  opts.on("--memory-max", "Return max memory") { |max| options[:memorymax] = max }

  opts.on("--thread-max", "Return max threads") { |max| options[:threadmax] = max }
  opts.on("--thread-count", "Return count threads") { |count| options[:threadcount] = count }
  opts.on("--thread-busy", "Return busy threads") { |busy| options[:threadbusy] = busy }

  opts.on("--request-mtime", "Return max request time") { |mtime| options[:requestmtime] = mtime }
  opts.on("--request-ptime", "Return request processing time") { |ptime| options[:requestptime] = ptime }
  opts.on("--request-count", "Return request count") { |count| options[:requestcount] = count }
  opts.on("--request-error", "Return error count") { |error| options[:requesterror] = error }
  opts.on("--request-received", "Return bytes received") { |received| options[:requestreceived] = received }
  opts.on("--request-sent", "Return bytes sent") { |sent| options[:requestsent] = sent }
end.parse!
# build a url from options
url = "http://#{options[:host]}:#{options[:port]}/status?XML=true"

# retrieve xml document
tomcat_xml = Net::HTTP.get_response(URI.parse(url)).body
doc = REXML::Document.new(tomcat_xml)

puts doc.elements["//jvm/memory"].attributes["total"] if options[:memorytotal]
puts doc.elements["//jvm/memory"].attributes["free"] if options[:memoryfree]
puts doc.elements["//jvm/memory"].attributes["max"] if options[:memorymax]

puts doc.elements["//connector[@name='http-0.0.0.0-#{options[:port]}']"].elements["threadInfo"].attributes["maxThreads"] if options[:threadmax]
puts doc.elements["//connector[@name='http-0.0.0.0-#{options[:port]}']"].elements["threadInfo"].attributes["currentThreadCount"] if options[:threadcount]
puts doc.elements["//connector[@name='http-0.0.0.0-#{options[:port]}']"].elements["threadInfo"].attributes["currentThreadsBusy"] if options[:threadbusy]

puts doc.elements["//connector[@name='http-0.0.0.0-#{options[:port]}']"].elements["requestInfo"].attributes["maxTime"] if options[:requestmtime]
puts doc.elements["//connector[@name='http-0.0.0.0-#{options[:port]}']"].elements["requestInfo"].attributes["processingTime"] if options[:requestptime]
puts doc.elements["//connector[@name='http-0.0.0.0-#{options[:port]}']"].elements["requestInfo"].attributes["requestCount"] if options[:requestcount]
puts doc.elements["//connector[@name='http-0.0.0.0-#{options[:port]}']"].elements["requestInfo"].attributes["errorCount"] if options[:requesterror]
puts doc.elements["//connector[@name='http-0.0.0.0-#{options[:port]}']"].elements["requestInfo"].attributes["bytesReceived"] if options[:requestreceived]
puts doc.elements["//connector[@name='http-0.0.0.0-#{options[:port]}']"].elements["requestInfo"].attributes["bytesSent"] if options[:requestsent]

configuration management with chef announced

Chef has been announced. Listen to this podcast at Cloud Cafe. There’s no way around comparing puppet and chef. Sure, they’re both configuration management tools. It’s simplest to put it this way:

We’re replacing puppet with chef.

And why? A little while ago I wrote about problems I’ve been having scaling puppet. Off the top of my head, the biggest issues for me working with puppet have been:

  1. Dependency graphs
  2. Limited capabilities of the language (DSL)
  3. Templates are evaluated on the server

Dependency Graphs

There’s talk about vertically scaling puppet, but not a lot about horizontally scaling. I tend to run everything under puppet. People argue that it’s too much work to put single servers in puppet, and you should only use it for systems you intend to clone. I disagree. Puppet recipes are self documenting. The same people who don’t want to take the time to write puppet recipes for the single services are the people you have to beat with a sucker rod to get to document anything. Sometimes if I don’t have the time to put into fully testing a puppet recipe for a new machine, I’ll at least write the recipe as I’m working, to serve both as documentation and as a starting point for if/when I come back to it.

The point is that as I scale out in this fashion, more often puppet will fail with a dependency problem on one run, and be fine on the next. I asked Luke about this at a BoF at OSCON 2008, and he basically told me that he really only focuses on the problems his paid customers have and was anxious to leave and get a beer. That’s fine, I understand it, but since it does nothing to fix my problem it drove me away from the puppet community.

While in theory having puppet do all this work to resolve dependency issues seems fine, it is more complexity and trouble than it is worth. As a systems administrator I know what the dependencies are. As you build a system you simply write your recipe in the same order as the steps you’re taking.

Chef takes this idea and runs with it. Recipes are parsed top to bottom. If a package needs to be installed before a service is started, you simply put the package in the recipe first. This not only makes a lot of sense, it makes dependencies in a complex recipe visually understandable. With puppet you can end up with spaghetti code reminiscent of “goto”, jumping around a number of recipes in an order that’s difficult to understand.
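To make the top-to-bottom idea concrete, here is a toy sketch in plain ruby. ToyRecipe and its package and service methods are invented for illustration and are not chef’s implementation; the point is only that declaration order is the execution order, with no dependency graph involved.

```ruby
# Toy stand-in for a recipe: resources are recorded in the order
# they are declared, then reported in that same order.
class ToyRecipe
  attr_reader :resources

  def initialize(&block)
    @resources = []
    instance_eval(&block)
  end

  # Records a package resource at the current position.
  def package(name)
    @resources << [:package, name]
  end

  # Records a service resource at the current position.
  def service(name)
    @resources << [:service, name]
  end

  # Returns one line per resource, in declaration order.
  def run
    @resources.map { |type, name| "#{type} #{name}" }
  end
end

recipe = ToyRecipe.new do
  package 'apache2'   # handled first because it is written first
  service 'apache2'   # handled second
end

puts recipe.run
```

Swapping the two lines inside the block swaps the order of the output, which is all the ordering logic there is.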

Language

Before the recent 0.24.6, you could not even do:

if $ram > 1024 {
    $maxclient = 500
}

The support for conditionals was rudimentary. I run into a lot of languages and the biggest problem I have is remembering how to do the same thing in each language. The puppet language does not do what a lot of other languages do. I didn’t need another language to learn, let alone one written from scratch. It was just silly doing something like:

  # iclassify script adds vmware-guest tag based on facter facts
  $is_vmware = tagged('vmware-guest')
  if $is_vmware {
    include vmware
  }

Chef uses ruby for its recipes. This makes the above stupidly simple with something like:

include_recipe "vmware" if node[:manufacturer] =~ /VMware/

Templates
Puppet evaluates recipes and templates on the server. I ended up with this block of code once when I needed to specify the client node’s IP Address in a configuration file:

require '/srv/icagent/lib/iclassify'
ic = IClassify::Client.new("https://iclassify", iclassify_user, iclassify_password)
query = "hostname:#{hostname}"
mip = nil
nodes = ic.search(query)
nodes.each do |node|
  # node.attribs is an array of hashes. keys is 'name' value is 'values'
  node.attribs.each do |attrib|
    if attrib[:name].match(/ipaddress/)
      ip = attrib[:values].to_s
      # dots are escaped so they only match a literal '.'
      if ip.match(/10\.0\.0\./)
        mip = ip
        break
      elsif ip.match(/10\.0\.1\./)
        mip = ip
        break
      end
    end
  end
end

This was so much work. Of course with chef you can easily get this information in the recipe because it’s parsed on the node, let alone the ease of doing it in the template if that’s more appropriate. Since the template’s parsed on the client, you grab the information out of variables or directly from the system.
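For comparison, here is a small sketch of picking the same address directly on the node using only the ruby standard library. The management_ip helper and the 10.0.0.0/23 range (which covers both 10.0.0.x and 10.0.1.x) are assumptions based on the block above; in an actual chef recipe you would more likely read the address from node attributes.

```ruby
require 'ipaddr'
require 'socket'

# 10.0.0.0/23 covers both 10.0.0.x and 10.0.1.x.
MANAGEMENT_NET = IPAddr.new('10.0.0.0/23')

# Returns the first address inside the management network, or nil
# if none of the given IPv4 address strings fall within it.
def management_ip(addresses)
  addresses.find { |addr| MANAGEMENT_NET.include?(IPAddr.new(addr)) }
end

# Ask the local system for its IPv4 addresses.
local = Socket.ip_address_list.select(&:ipv4?).map(&:ip_address)
puts management_ip(local).inspect
```

Because this runs on the client, there is no search against a central server at all.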

As time goes on I’ll surely write more about using chef. We’re using it in production now, and are happy with it. In the interim, come to #chef on freenode if you have any questions.

Creating user vms with libvirt and kvm

I used virt-manager to create a local vm to build a debian guest. I usually use vm-builder, but it doesn’t support debian at this time.

I was a little confused at first why I could see the vm in virt-manager, but the xml file wasn’t in /etc/libvirt/qemu nor could I see it in virsh.

virt-manager appears to have a connection open by default called “localhost (User)”, as opposed to “localhost (System)”, which you can only access by opening a new connection to localhost from the menu. The latter is what you connect to when you run virsh. To make the former connection, run ‘virsh --connect qemu:///session’, as opposed to ‘virsh --connect qemu:///system’, which is the default.

System vm’s are stored in ‘/etc/libvirt/qemu’, user vm’s are stored in ‘~/.libvirt/qemu’.

No valid PXE rom found for network device

Using virt-manager (libvirt) to build a KVM debian etch guest on ubuntu intrepid via pxe boot produced the error: “No valid PXE rom found for network device”.

Reading LP Bug #193531 showed the need to install the ‘kvm-pxe’ package (sudo apt-get install kvm-pxe).

Then I got “Out of space while reading console startup output”, which I haven’t solved. I’m probably giving up on the backporting I would need to try to solve it, due to a number of hurdles.

Using an ISO image as an apt repository

I picked up an MSI Wind desktop recently for $140 + $20 or so for a 2GB SO-DIMM from Frys and put it together with a SATA hard drive I had kicking around. I didn’t want to spend the money on a SATA CDROM that I would use just for the install, or bother pulling the one out of my mother’s identical PC I just built for her. I did a PXE network install of Ubuntu 8.10, not choosing to install anything over base. Then I installed openssh-server and removed all input devices.

To install the ubuntu-desktop virtual package, I wanted to use apt-cdrom to allow using the iso image as a repository rather than download 500MB worth of packages.

sudo mv ubuntu-8.10-desktop-i386.iso /media
sudo mkdir /media/iso
#add  "/media/ubuntu-8.10-desktop-i386.iso /media/iso iso9660 user,loop 0 0" to /etc/fstab
sudo mount -a
sudo apt-cdrom add -d /media/iso -m

This turned out to be completely useless though, as there are only a few debs on the Ubuntu LiveCD. The LiveCD uses Ubiquity to install which just copies the CD to the new partition. I almost always use the alternative installer via PXE booting, so I never noticed this before.

gem fetch errors with Errno::ENOENT

This one isn’t too hard to figure out, but annoying and frustrating. On rubygems <= 1.3 (Including Ubuntu Intrepid’s 1.3.0~RC1really1.2.0):

$ gem fetch erubis
ERROR: While executing gem ... (Errno::ENOENT)
No such file or directory - /var/lib/gems/1.8/cache/erubis-2.6.2.gem

This is a silly permissions problem. Instead, run ‘sudo gem fetch erubis’.

See Rubygems bug #21134.