vmware timekeeping part 3

earlier posts here and here.

A review:
1) We removed ntp from the linux guests and left it running on the vmware hosts.
2) We installed open-vm-tools on the guest and live enabled timesync using vmware-guestd

Notes revealed we were gaining about 40s a day.

3) set clock=pit (use clocksource=pit now) in the grub config as a kernel option and restarted a guest

That looks like about 40s over three weeks.

4) today I noticed a lot of “/dev/vmmon[3685]: host clock rate change request 500 -> 998” messages on the vmware hosts (linux) and I set up the recommendations here which is ‘host.cpukHz = cpuspeedinkhz’, ‘host.noTSC = TRUE’, and ‘ptsc.noTSC = TRUE’ to work around possible speed step issues.

I accidentally used khz = mhz * 100 instead of khz = mhz * 1000, which made the time get way off when I stopped and then started the vm I was testing on. This was interesting though, because I was afraid I’d have to stop vmware-server, not just an individual vmware-vmx process, to get it to re-read /etc/vmware/config.
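To avoid repeating that unit slip, here is a minimal ruby sketch that builds the three config lines from a clock speed in MHz. The helper name and the 2660 MHz figure are made up for illustration; read the real speed off your own host.

```ruby
# Build the /etc/vmware/config lines mentioned above from a CPU speed in MHz.
# The conversion is kHz = MHz * 1000 (not * 100).
# vmware_clock_settings is a hypothetical helper, not part of any vmware tool.
def vmware_clock_settings(cpu_mhz)
  khz = (cpu_mhz * 1000).round
  [
    "host.cpukHz = #{khz}",
    "host.noTSC = TRUE",
    "ptsc.noTSC = TRUE"
  ]
end

# Example with an assumed 2.66 GHz host.
puts vmware_clock_settings(2660.0)
```

Paste the output into /etc/vmware/config by hand; the sketch deliberately does not touch the file.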

Looping ntpdate shows a gain of about 8/10ths of a second over 20 minutes. Still more gain than I’d like to see. Will watch the graph and then try again in a week or two.
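For comparing these numbers it helps to put them in the same unit. A rough sketch of the arithmetic follows; the helper names are my own, and a 20 minute sample is short, so treat the extrapolation as a ballpark only.

```ruby
SECONDS_PER_DAY = 86_400.0

# Extrapolate an observed gain to seconds gained per day.
def drift_per_day(gained_seconds, elapsed_seconds)
  gained_seconds * SECONDS_PER_DAY / elapsed_seconds
end

# The same drift expressed in parts per million.
def drift_ppm(gained_seconds, elapsed_seconds)
  gained_seconds / elapsed_seconds.to_f * 1_000_000
end

# 0.8s gained over 20 minutes
puts drift_per_day(0.8, 20 * 60).round(1)        # 57.6 seconds per day
puts drift_ppm(0.8, 20 * 60).round(1)            # 666.7 ppm

# 40s gained over three weeks
puts drift_per_day(40, 21 * 86_400).round(1)     # 1.9 seconds per day
```

Taken at face value, the 20 minute sample extrapolates to a faster gain than the three week figure, which is one more reason to watch the graph for longer before drawing conclusions.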

Avocent KVMoIP LDAP Configuration

The manual is way too confusing about this:

It works like this:
LDAP Overview:
LDAPS works fine with Server 2003 R2 AD, and is preferred (leave it on port 636). If you’re using fqdn’s, make sure you have DNS servers set in the network section.

On the Search page:
‘Search DN/Password’ is the Bind DN/Password.
‘Search Base’ is similarly the ‘Base DN’.
‘UID Mask’ should be ‘attribute=%1’, replace attribute with the name of the attribute storing the username, so generally with AD this is ‘sAMAccountName=%1’

Query page:
If ‘Group Container Mask’ = ‘ou=%1’ and Group Container = ‘KVM’ then we’re looking for ou=KVM in the above configured BaseDN. This is where we’ll set everything up. I recommend staying at the top of the tree for simplicity.

Target mask should be ‘cn=%1’ because we’re looking for objects, and ‘Access Control Attribute’ will be ‘info’ because that corresponds to ‘notes’ in the ADUC UI.
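Here is how I picture the masks being expanded. This is only my reading of the settings above, not anything taken from Avocent's firmware; the base DN, the username and the helper name are placeholders.

```ruby
# Substitute a value for the %1 placeholder in a mask.
# expand_mask is a hypothetical helper used only for illustration.
def expand_mask(mask, value)
  mask.gsub("%1") { value }
end

base_dn = "dc=example,dc=org"   # assumed Search Base

# UID Mask: how the login name becomes a search filter.
user_filter = expand_mask("sAMAccountName=%1", "jdoe")

# Group Container Mask plus Group Container, under the Search Base.
group_container_dn = "#{expand_mask('ou=%1', 'KVM')},#{base_dn}"

# Target Mask: the appliance's own computer object.
target_rdn = expand_mask("cn=%1", "KVM01")

puts user_filter
puts group_container_dn
puts target_rdn
```

The sketch does no LDAP escaping, so it is only good for thinking through what each field ends up looking for.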

In this OU container:

1) Create a computer object with the same name as the KVM name under ‘Appliance -> Overview’. I renamed this to KVM01. I had to do this on a DC as MMC was crashing on my terminal server when creating a computer object, probably unrelated.

2) Now create a group, call it whatever. In the notes section put ‘KVM Appliance Admin’. This is how we define what you can do. Add the KVM computer object to this group, and any users (or groups, ie domain admins) you want.

3) These people will have full access to the kvm and all objects. It sounds like adding access into individual objects requires being in a group with info of ‘KVM User’ and the computer objects for the actual server names in the group as well. Bah.

Vista trust relationship login failures

A local Vista computer started having intermittent login failures: when a domain user tried to log in, Vista complained about a trust problem with the account database.

Vista disables the local administrator account even though it had a password, so I used Nordahl’s ntpasswd linux boot cd to enable it (if I hadn’t known the password I could have changed it as well). Of course the CD requires access to the syskey as the SAM is encrypted, but it always finds it automatically since nobody puts the syskey on floppy.

Then I logged in and removed the computer from the domain, changed its name, and rejoined it and things were fine.

Domain profiles were kept intact by the way.

FHS Compliance for NFS mounts

Where should one mount shared NFS data?

FHS 2.3 has no advice. All the NFS talk is about how you might NFS mount /usr and the likes.

Options: /opt, /mnt, /srv. mnt is the old school way, but FHS uses the key word ‘temporary’ which makes sense these days, even though we’ve started using /media for most things temporary. opt? I stay away from opt since I touched oracle.

/srv : Data for services provided by this system
Sounds good to me. nfs mounts will go in /srv, since it’s all data for services provided by this system.

further vmware timekeeping

I’ve talked about this in other posts. I’ve been automating vmware guest creation and configuration. Time has been one of the bigger hassles. The best reading about it is here.

I automated vmware tools install using the open-vm-tools deb (backport to etch). Then used puppet to run vmware-cmd to enable timesync on all of the guests. See here.

This keeps time from falling behind, but we set up some munin graphs and saw time was gaining about 40s a day. So I just wrote another puppet exec to add ‘clock=pit’ to the end of the kernel lines. Newer kernels use time algorithms that try to correct time for lost cycles, and lost cycles are common in virtualized environments. I’ll note how this works out after a week or so.
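The puppet exec itself isn't shown here, so as a stand-in, this is a small ruby sketch of the edit it performs: append an option to every kernel line of a grub menu.lst, without adding it twice. The helper name and sample menu.lst are assumptions for illustration. On Debian, update-grub may regenerate these lines from the kopt comment, which this sketch does not handle.

```ruby
# Append a kernel option to each "kernel" line of a grub (legacy) menu.lst,
# skipping lines that already carry it. Operates on text, not on the file.
def add_kernel_option(menu_lst, option)
  menu_lst.lines.map do |line|
    if line =~ /^\s*kernel\s/ && !line.split.include?(option)
      "#{line.rstrip} #{option}\n"
    else
      line
    end
  end.join
end

# Made-up menu.lst stanza for the example.
sample = <<~MENU
  title Debian GNU/Linux, kernel 2.6.18-6-amd64
  root (hd0,0)
  kernel /boot/vmlinuz-2.6.18-6-amd64 root=/dev/sda1 ro
  initrd /boot/initrd.img-2.6.18-6-amd64
MENU

puts add_kernel_option(sample, "clock=pit")
```

Running it over its own output changes nothing, which is what you want from something puppet will apply repeatedly.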

Anti-spam gateway design notes

Once again I’m rebuilding an anti-spam gateway. This time I’m puppetizing it as I go, so I wanted to take some time today to think about the design.

MTA (flame war #1)
About four years ago I built a personal mail server and used qmail. Before that I don’t remember what I used, probably sendmail. Qmail’s nice because it’s small and well designed, but the author had some RFC fixation and support for things like TLS had to be patched in. This qmail install was on gentoo though, and the emerge auto-patched over 20 features in as it built it. I believe the idea was that these features wouldn’t make it into the official source, so they wouldn’t be in a binary build either. Pain in the ass really.

I do have memories of using sendmail. Actually, horrible dreams of youthful innocence being torn to shreds by m4. We’ll stay away from the beast.

A couple years ago I built an anti-spam gateway using postfix and it was easy enough.

Queueing
In the past I’ve used amavisd with postfix to run the clamav and spamassassin checks. This has worked by taking incoming smtp messages to postfix and routing them to amavisd on another locally bound port, which scans them and then redelivers them to another locally bound port. One neat thing about this design is you could have amavis running on separate boxes, with one doing spam, one doing antivirus, and just route between them all, with the final one doing the delivery to the internal mail servers.

qmail had qmail-scanner-queue, which tied all of this together in a way that looks similar to MailScanner: it picks up the messages in one folder and, when it’s done, leaves them somewhere else.

postfix uses content_filter to tie into antispam otherwise. The trouble with this is that it’s already accepted a message by the time it’s gotten all of this far.

When you decide something is spam, you can do a couple things. If you’re still in the SMTP phase, you can reject it before you accept it. I prefer this. Otherwise you’ve accepted it and you can delete it, return it, tag it (modify the subject), or grey list it somewhere. Option #1 is bad because it may not have been spam. #2 is bad because you have to generate an email message back to the sender address saying “We think this is spam” and if it was spam, whoever gets it is certainly not the person that sent it. This is better than #1 though because you get fewer support calls for disappearing email. #3 and #4 are annoying because you still have to look at the mail.

In the past I’ve used RBLs in postfix to reject mail, which gets a lot of spam, then tagging in spamassassin so it’ll filter into users’ JunkMail folders so at least they only look at it if they’re looking for something. This is probably acceptable still. Sometimes I’ll delete mail based on spamassassin score if it’s really high, because if someone sends you a legitimate email that gets a score that high, you probably don’t want to talk to them anyway.
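That policy boils down to a small decision function. Here is a sketch of it in ruby; the thresholds are numbers I picked for the example, not values from any real config, and in practice the RBL check happens in postfix before spamassassin ever sees the message.

```ruby
TAG_SCORE    = 5.0    # assumed threshold for tagging into JunkMail
DELETE_SCORE = 15.0   # assumed "really high" threshold

# Decide what to do with a message.
# rbl_listed: true if the sending host was on an RBL (checked during SMTP).
# score: spamassassin score, or nil if the message never got that far.
def disposition(rbl_listed, score)
  return :reject if rbl_listed            # refuse before accepting the mail
  return :delete if score >= DELETE_SCORE
  return :tag    if score >= TAG_SCORE
  :deliver
end

puts disposition(true, nil)     # reject
puts disposition(false, 22.4)   # delete
puts disposition(false, 7.1)    # tag
puts disposition(false, 0.3)    # deliver
```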

Ubuntu 7.10 GRUB Error 21

A recent install of Ubuntu Gutsy 7.10 on a slave IDE disk (cable select) with an existing master IDE disk with XP Pro on it rebooted and got a GRUB Error 21. I was about to boot off the network again to go into rescue mode and look at the grub configs, but when I saw the boot menu I wondered what the boot order was since I had just added the disk. When I got into BIOS I saw that Primary Slave was OFF. Ubuntu had seen the disk even though the BIOS had it disabled, and since GRUB talks to the BIOS it couldn’t find the disk. Enabling the disk by setting it to auto in the Dell BIOS fixed GRUB.

dimdim on centos (fail)


I managed to track down a copy of centos 4.5 i386 and made a VM to try to get dimdim running. I had all sorts of fun earlier trying to get it running on the much preferred debian. I was talking to a friend of mine about this attempt and he noted that when someone requests him to install some OSS software, one of his major filters is “does it install on debian?”. If it doesn’t have a deb, it fails the bar. This is a pretty good bar. There are exceptions for things like java before they relicensed it. Perhaps, “does it install on ubuntu?” is a better question.

That the “installer” for dimdim installs a pile of rpms from dimdim’s website that have nothing to do with the product (glibc? wtf?) is a great example of why we don’t use rpm based linux distributions.

1) People who don’t understand the differences between rpm/deb distros tend to not respect why packaging is essential, and do stupid shit like put system library rpms in their installer.

2) RPMs suck, and therefore RPM based distros suck. I’m not going to get into a flame war over this, but simply try to take your major RPM distro and upgrade it from one major version to the next. Then try to convince me how the steps you took are not cruel and unusual punishment. (“apt-get update && apt-get upgrade && apt-get dist-upgrade” Wow.)

Anyways, I ran the installer per the PDF documentation that reads like it was made by the marketing department. It managed to make it through after doing a bunch of kooky stuff to remind me that it is just a shell script, not a packaging system. (Note that if you run it twice, it’ll fail because lighttpd is already installed. Maybe this is the bug that was supposedly fixed last year?)

Once you run the startup script, if you connect to the host you’ll get something like this:


404 Not Found
The path '/' was not found.

Traceback (most recent call last):
File "/usr/lib/python2.3/site-packages/cherrypy/_cprequest.py", line 551, in respond
cherrypy.response.body = self.handler()
File "/usr/lib/python2.3/site-packages/cherrypy/_cperror.py", line 198, in __call__
raise self
NotFound: (404, "The path '/' was not found.")

You need to go to http://host/dimdim/, the trailing slash is essential.

This time around the site was less responsive. Sometimes when you start a meeting and you install the plugins the first time, the connection to the meeting fails. Attempts to start a new meeting fail with “Exceeded server limit of meetings”. I thought this was a bug, which I worked around by restarting the server. But this time I restarted the server, joined a meeting, then tried to create another one and got this message. Let’s make this clear since dimdim doesn’t.

The Open Source Edition of Dimdim is intentionally crippled.

You can only have one active meeting at a time. While their editions page mentions that ‘dimdim pro’, a SaaS product, only allows one meeting at a time, the OSS column merely says ‘Free’ in that box. This is really perturbing. It wouldn’t be so bad if they were up front about it. There’s a thread here and here on the official sourceforge forums with no official responses. Someone there talks of having reverse engineered the limitation, but it’s a “email me” type talk, not an open discussion.

Grepping for ‘maxConcurrentConferences’ in the dimdim install shows it set to 50 in the dimdim.properties file. The forum post refers to a comment of:

## NOTE : In this Open Source Edition only 1 Meeting at a time is allowed. If you need a Dimdim Meeting Server with higher capabilities then please
## contact sales@dimdim.com.

However my dimdim.properties lacks any such note. Perhaps the note is in the source code rather than in the slightly older centos installer. The value is set to 50 by default in my config files, and I recall seeing some mention somewhere that the real limit was in a jar file.

I later found a thread by a user complaining that only five or six users could get into a meeting. This response appears to be by a dimdim employee and states:

Open Source SF edition of dimdim is a personal edition of the meeting server and is meant to cater to single meeting. We have currently placed the restrction to upto 5 participants. For larger meetings, the resources required increase significantly and require dedicated servers.
Please use the hosted dimdim edition – for hosting larger meetings. We also provide an enterprise server build for on-premise installations.

Someone replies with the same sort of arguments that seem obvious to any OSS fan, and links to a webarchive copy of dimdim’s website where they say:

Dimdim makes extensive usage of open source components and products and hopes that someday Dimdim itself will be useful to others in the way others have been useful to it. Big thanks to the communities and individuals of all the open source projects used in Dimdim.

I assume at some point the company had OSS fans, and management has pushed it away from OSS.

Sigh. Dimdim is a very pretty waste of time.

Programming an old EM01 Websensor


I have an old EM01b websensor made by eesensors. They’re an awesome product, basically a small webserver that senses humidity, temperature and illumination. We’ve been using an old one as a nagios monitor for the server room temperature. Both the older and the newer model are called an em01b; the one pictured is the older model. I recently picked up the newer model as it comes with one of three options: contact closure, thermistor (additional temperature monitor), or voltage monitoring (great for UPS batteries). I got one with contact closure and tied it into the Common Alarm circuit on our HVAC unit because one of them recently shut down due to a high water level (drain was clogged) and we didn’t know until nagios threw a temperature warning. Now nagios can poll for the contact closure and will know of an HVAC alarm immediately.

The EM01b isn’t cheap, but I’m sure it’s cheaper than a separate monitoring unit for a UPS or HVAC unit, and since many people use nagios, it ties in pretty well. I also wrote a ruby munin module for it recently, which I’ll post later when I get permission from work to keep the copyright on it and GPL it. This is awesome for temperature trending so you can see how all those servers you’ve added over the last six months have affected environmentals in the data center.

Once I had the new EM01b set up, which you program via the network interface using HTTP requests, I went about reconfiguring the old one. The old ones are a little tougher as there is no information about them on the web. I had to email eesensors and I was sent this link to the old cdrom. Maybe nobody else will have this problem, but since I hadn’t bought the old em01b, I had no idea how to configure it. It comes with another module, the es00r, which is an esbus to serial interface. You need to connect this to the 6pin esbus interface on the em01b using a 6pin phone cable. Power up the em01b with the es00r connected and run the Com2ex*.exe file in the EM01_Configuration folder of the zip file. You need to connect the es00r to the computer with a regular M-F RS232 cable. Select your COM port and hit connect. If it doesn’t say “communications established” on the bottom of the program, it’s likely you don’t have a true RS232 cable. I had to try a couple to find one that would work.

Once you get an established link, restart the em01b. Re-establish the link, then click read/verify to ensure the communications are good. Enter the configuration you want in, and click transfer to send it to the em01b via the es00r. When it’s complete, restart the em01b, reconnect, and hit read/verify to make sure it got there ok.

update: I was getting the same values from the early em01b on every query, so I emailed eesensors about leaving the es00r connected and they said:

The Es00r cannot be plugged in – it may interfere with the Websensor data which could explain the reason you are seeing the same values. In addition, the 6 random digits should be appended to the back of the “em” command (ie. “em123456”) on earlier models.

I disconnected the es00r and power cycled the em01 and I’m getting different readings now over time. I’m still querying index.htm?em though, as the v4.2 manual says this is okay and it seems to work for me:

Compatibility with the earlier models of Websensor has been maintained. Any version of the Websensor will always return temperature, relative humidity and illumination data by sending: http://192.168.254.102/index.html?em
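For anyone scripting this, here is a small sketch that builds both forms of the query URL: the plain ‘em’ command from the manual and the earlier-model form with six random digits appended, as eesensors described. The helper name and its arguments are mine, and I’m not showing any response parsing since I don’t want to guess at the format.

```ruby
# Build the websensor query URL.
# legacy: append six random digits to the "em" command, as suggested
#         for earlier models.
# rng:    injectable random source, mainly so the result can be tested.
def websensor_url(host, legacy: false, rng: Random.new)
  suffix = legacy ? format("%06d", rng.rand(1_000_000)) : ""
  "http://#{host}/index.html?em#{suffix}"
end

puts websensor_url("192.168.254.102")
puts websensor_url("192.168.254.102", legacy: true)
```

The resulting string can be handed to whatever HTTP client your nagios or munin plugin already uses.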

Widemile takes over world by way of multivariate testing

At this point, it’s official, Widemile is taking over the world. What?! You want proof?

I’d like to pretend sometimes I don’t know a whole lot about business, but in actuality there’s a bunch of experience kicking around in my past and I tend to pick up more than the average bear. The difference is that I’ve never considered myself a business person, or that it was my primary responsibility by happenstance (other than while consulting). But I’ve done lots of supportish things, lots of consulting, have had to manage people and the likes. More than I’m willing to admit to even myself. Anyways, the point is I tend to only do business related things when I don’t feel like someone else competent is doing them. So I notice things, but keep them to myself.

I work at Widemile as a Systems Administrator. I don’t even know what that title means anymore. I think I’m the first full time, non development sysadmin there. I do a number of things, like helping users find the any key, remind them cdrom drives aren’t cup holders… actually, I spend most my time building the operations platform. So I do development, like puppet, ruby, shell scripts and the likes, but I’m not a developer. Or so I say. Endlessly. Fortunately those people with developer in their title know what they’re doing.

When I first started working at Widemile, I wasn’t particularly interested in the business plan. Linux systems engineering? Sounds good. What do you do? Web 2.0 Product? Check… I’ve heard it. People sometimes don’t realize how socialized a sysadmin gets, everybody wants to be your friend when something doesn’t work. (There is no friend checkbox in RT. People don’t make note of this.) So I hear a lot of chatter about our product and the results it brings in. I figured, “automated testing of a web page? Sounds good, makes sense, but it’s novel right? I mean, how much can it really make a difference?”.

The answer is tons. The term they use is Conversion Marketing. I’m sure this means something to SEO/SEM types, but what we really do is “Make more people buy your stuff.” Which, after all, is kind of the point of business. At this point, I have no reluctance to put forth that using Widemile’s product will make more people buy your stuff. It works kind of like this:

You sell stuff to farmers. You pay some carebear 1000 gold pieces to hang around the farmers yakking about how great your stuff is. On average, you make 10,000 gold pieces. Now what if you had some ‘multivariate testing‘ pixie dust to sprinkle on that carebear such that there was less yakking, and more of what people wanted? Wait, you ask, how do we know what people want? Magic! (Math…) You give us 250g, and we find you a better carebear with Math Dust for 750g and now you’re making 20,000 gold.

Jokes aside (it’s hard, really). All the talk I hear is of our customers actually having huge success. I’m not in sales, I can’t be quoting things, but from my techie point of view with secret business experience, it’s magic “something for nothing” sort of success. When I’ve managed to convey to people what Widemile does, a couple educated few have said, “Oh, like Google Optimizer.” No, actually. Congrats on knowing someone in the optimization business, even if it is Google. It’s basically like this, google has a thinger. They get cool thingers, like take Dodgeball. I love dodgeball. Second to Google Search, it’s my most used google product, even more than google maps. How much Dodgeball changed… in years? Little, it’s no secret. Some things Google makes are cool, don’t get me wrong, but there are lots of reasons Google has products, and they’re not always to be innovative.

Widemile is a Landing Page Optimization (LPO) pioneer. They have the secret sauce (ooh, see what I did there? I linked to an article promoting operations, slam!). Seriously though, people are being sold on LPO that’s called LPO but it doesn’t compare to what we do. There is secret sauce out here, real stuff. If you care to know the ingredients, I encourage you to go read every character on Billy’s blog. I don’t have a lot of free time, and LPO isn’t a package management system that generates me more free time, so I’ll leave it to you business types to figure this stuff out. But it’s neat all the same.

The reality is, from a personal point of view: somehow fewer and fewer companies seem to get what I want on the web. I recall hearing talk over the years about what kind of time opportunity you had to capture someone’s interest in traditional marketing. It was pretty short, I forget what it was exactly. I’ll tell you this though. If I don’t have an established relationship with a company (which, if I did, it’s not really marketing when I go to their site, since I’m going there anyways), how long will I fudge around trying to find where to click next to get what I want? A very short amount of time. What do I want from you, web? Simplicity with endless bounds. I want the tubes to be lego. By itself, it’s just a little piece of plastic, but with a handful, you’ve got a Space Elevator. Alright, maybe not the best example, but that’s the point.

Today I was trying to find support information for a Netgear ReadyNAS. The web has been defeated in the world of driver searching (search for a dell driver if you haven’t experienced this), training me to start at a vendor’s website and drill down, rather than just search. Netgear’s web site is terrible. What do you get if you just search for ReadyNAS support? Netgear, and look, a community oriented site! Communities have it figured out because they’re usually filled with information created by people who were once trying to figure things out. Black boxes are alright if a) we buy them to do something and b) they do it.

Try going to newegg and finding RJ45 crimp connectors without searching. Then try with searching. It’s tough. Most websites make it tough to get what you want. This is why tags are getting popular on web 2.0 sites like flickr and delicious. People choose tags that are meaningful to them because we want to be able to find what we want. How do you know what other people are looking for? If you don’t know, it only makes sense to me to test. This is where split testing sounds so silly to me: of all the possibilities, you’re trying two, which you probably thought up yourself. Isn’t this supposed to be a test to see what other people want? There really is Magic in Widemile’s platform, and I’m serious when I say there’s spiffy math behind it and a secret sauce design, but software that finds out what variation is most successful? It’s easy to understand how awesome that is. If you’re spending any significant amount of money on online advertising and not doing LPO, you’re throwing away money.

debugging netgear readynas (was infarant)

I’ve talked in the past about how cool it is to have a root shell on your NAS. I’d like to take a moment to second that.

Some software that copies web logs off one of our readynas 1100s wasn’t working today. I got looking and it used a domain account. I realized pretty quickly it had stopped working when we had upgraded the NAS devices to the new domain, but we don’t use this one setup often enough to have noticed it had stopped running.

I logged into the readynas and used wbinfo to verify that winbind was working right. While poking around the log files I saw an error about proftpd and PAM. I’m lucky to have two of these readynas boxes, so I verified that the pam configs hadn’t changed compared to the production system. I then checked the proftpd binary and it had changed size. Raidiator appears to be debian based, you can see woody packages in a ‘dpkg -l’. Interestingly ‘dpkg -s proftpd’ shows version ‘1.3.0-9.netgear6’ on both machines, although it had definitely changed. I copied the proftpd binary from the production nas to the backup nas and restarted proftpd and authentication started working again.
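Since the package version was the same on both boxes, comparing the files themselves was the only way to see the change. I only eyeballed the size, but a checksum is the more careful version of that check. A minimal sketch, assuming both copies are reachable as local paths (for example after copying one over); the throwaway temp files just stand in for the two proftpd binaries.

```ruby
require "digest/sha2"
require "tempfile"

# True when both files have the same size and the same SHA-256 digest.
def same_file?(path_a, path_b)
  return false unless File.size(path_a) == File.size(path_b)
  Digest::SHA256.file(path_a).hexdigest == Digest::SHA256.file(path_b).hexdigest
end

# Write a throwaway file for the demo and return the Tempfile object.
def write_temp(content)
  file = Tempfile.new("proftpd-sample")
  file.binmode
  file.write(content)
  file.close
  file
end

production = write_temp("binary-from-production")
untouched  = write_temp("binary-from-production")
changed    = write_temp("binary-after-upgrade!!")

puts same_file?(production.path, untouched.path)   # true
puts same_file?(production.path, changed.path)     # false
```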

5% chance it was a fluke, but I think it’s a real bug that slipped past QA and if not for being open source based I’d be sitting in a support queue rather than having the problem fixed and blogging about it already. Forum post here too.

They’ve been adding lots of cool features to the ReadyNAS line, like a built in bittorrent client and some neat photo support. It already supports CIFS and things like rsync, making it pretty accessible and functional out of the box, besides what looks like decent support for third party development. That there’s a real usable website separate from the netgear main site points to there being some decent smart people behind the project, and possibly at Netgear for letting their acquisition do some things the right way.

Despite the RND4000 (4 Disk desktop model without disks) being about $800, I want one just to hack on raidiator. Too bad it’s not a fully open source distro.

security questions, offline banking?

It’s an odd thing to say, but I’ve been considering -not- paying bills, banking, etc online anymore. Why? Security Questions. My bank account just made me add some, and I’ve been struggling with Sallie Mae for some time, having had to reset my account twice since they’ve added security questions and not before. I suppose it’s not as bad as how Key Bank liked to ask my debit card and pin for security verification. If there was anything not to enter into a web site, I think a debit card pin would be near the top of the list.

Worst of the security questions is they require exact answers. Gone are the days of “what is your mother’s maiden name”, instead we have “What is the street your favorite residence is on?”. How the hell do I remember if it’s “26th” or “26th Ave” or just “26” or some other combination? Name of my first teacher? Which one?

The solution? I use a password as the answer to all security questions now. Where’s the version of Dell IdeaStorm that applies to the web on the whole? How long is it going to take until the increase in support calls to reset accounts makes web sites realize this is the worst idea I’ve seen to date? Meh.

An Exchange 2007 server on which an address list service is active cannot be found

While modifying the mailbox quotas on a user mailbox on exchange 2007 I got the error “An Exchange 2007 server on which an address list service is active cannot be found”. Lots of chatter here, but I did look and see that the ‘Microsoft Exchange System Attendant’ service wasn’t running, although it was set to automatic. I went to Start -> Run -> services.msc, started it, and replayed the actions, and the changes worked afterwards. Not sure why it wasn’t running.

dimdim on debian etch

update2: I couldn’t get it working right on centos either, although I spent less time on it. I did verify that the OSS edition of dimdim is crippled. Do not use dimdim.

update:this install managed to get the conference server going, possibly the streaming server, but not the media server. there’s good information in it though.

‘Opensource’. Heh. I think a decent community makes things much more open source than a license does, but semantics…

-worst build system ever- What’s the point of packaging tar, sed, python with your distribution? a) you’re using rpms and don’t know better b) you only want to ride the OSS wave, but you don’t actually want to be part of the OSS community?

We’ll install a ton of shit via apt rather than touch those dirty dirty rpms that come with the offline installer.

download the fancy “centos” offline installer.

unzip *zip
chmod 755 *run
mkdir dimdim
./*run --tar -xvf -Cdimdim
# install lots of crap. who knows?
apt-get install sun-java5-jre openoffice.org libaio1
cd /usr/local ; tar -xvzf ~/dimdim/dimdimrepository/dimdim.tar.gz

Make sure nothing is running on port 80 (netstat -lnp), stop it if it is.

Go to /usr/local/dimdim
Read Linux_Readme_1.5.0.txt

vi server.xml, replace DIMDIM_PORT_NUMBER with 80, edit the servernames at the top
vi wrapper.conf, replace wrapper.java.command= with /etc/alternatives/java
# the above is a link into the above installed jvm by way of the alternatives system
./dimdim start ; tail -f wrapper.log

Seems to.. do something?
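If you’d rather script the two vi edits, here is a sketch of the substitutions in ruby. It works on strings, so reading and writing the files is left to you. The sample file contents are invented for the example; only the DIMDIM_PORT_NUMBER placeholder and the wrapper.java.command key come from the steps above.

```ruby
# Replace the port placeholder in server.xml text.
def set_port(server_xml, port)
  server_xml.gsub("DIMDIM_PORT_NUMBER") { port.to_s }
end

# Point wrapper.java.command at the given java binary in wrapper.conf text.
def set_java_command(wrapper_conf, java_path)
  wrapper_conf.gsub(/^wrapper\.java\.command=.*$/) do
    "wrapper.java.command=#{java_path}"
  end
end

# Invented sample contents, just to show the effect.
server_xml   = %(<Connector port="DIMDIM_PORT_NUMBER" />\n)
wrapper_conf = "wrapper.java.command=java\nwrapper.logfile=wrapper.log\n"

puts set_port(server_xml, 80)
puts set_java_command(wrapper_conf, "/etc/alternatives/java")
```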

edit?: ConferenceServer/apache-tomcat-5.5.17/webapps/dimdim/WEB-INF/classes/resources/streaming.properties

The Conference Server appears to be in ConferenceServer/, and is the main web interface that you want running on port 80. the dimdim.properties and server.xml in /usr/local/dimdim are the most important files. ./dimdim start will start it, then you can watch wrapper.log

The Streaming Server is in StreamingServerCluster/server1. There’s information about duplicating it in Linux_Readme_1.5.0.txt. StreamingServerCluster/server1/conf/red5.properties contains its port configurations, this is what runs on 1935/30001. I don’t really know what the http.port is supposed to point to.

The Media Server… Who knows? I think this is what dimdim.dmsServerAddress in dimdim.properties is supposed to point to. Before I set this, I could connect to dimdim but portions didn’t work. After I set this, the site would lock up just after the browser checks and future attempts to log in reported that the server was full of meetings or something like that.

automating vmware guest deployment with capistrano

This will get some more work, but I didn’t find much out there so this is a good starting point for someone

It appears straightforward enough, but feel free to ask any questions. You’ll need the rest of your operations platform pre-built, such as existing vmware hosts, pxe booting a debian install, etc.

I don’t think blogger is killing anything important. Some day I’ll setup an actual repository instead of using blogger for this crap. On the new server, next vacation. 🙂


# Capistrano recipe to build a vmware guest
# Bryan McLellan -- bryanm@widemile.com

require 'erb'

logger.info("Vmware guest creation script logs in as root")
set(:user, "root")

vmxtemplate = %q{
#!/usr/bin/vmware
config.version = "8"
virtualHW.version = "4"
scsi0.present = "TRUE"
scsi0.virtualDev = "<%=disktype %>"
scsi0:0.present = "TRUE"
scsi0:0.redo = ""
priority.grabbed = "normal"
priority.ungrabbed = "normal"
guestOS = "other26xlinux-64"
ide1:0.startConnected = "FALSE"
floppy0.startConnected = "FALSE"

displayName = "<%=fqdn %>"
scsi0:0.fileName = "<%=fqdn %>.vmdk"
memsize = "<%=memory %>"

Ethernet0.present = "TRUE"
Ethernet0.virtualDev = "e1000"
ethernet0.addressType = "generated"
ethernet0.generatedAddressOffset = "0"
Ethernet0.connectionType = "custom"
Ethernet0.vnet = "<%=eth0 %>"

Ethernet1.present = "TRUE"
Ethernet1.virtualDev = "e1000"
ethernet1.addressType = "generated"
ethernet1.generatedAddressOffset = "10"
Ethernet1.connectionType = "custom"
Ethernet1.vnet = "<%=eth1 %>"

tools.syncTime = "TRUE"
}

pxetemplate = %q{
DEFAULT etch_i386_install_auto
TIMEOUT 100

LABEL etch_i386_install_auto
kernel debian/etch/i386/linux
append vga=normal initrd=debian/etch/i386/initrd.gz preseed/url=http://debian.example.org/preseed/autoserver-etch.cfg debian-installer/locale=en_US console-keymaps-at/keymap=us hostname=<%=hostname %> domain=<%=domain %> interface=eth0 --
}

# Return the IP of the most recent lease in dhcpd.leases for the given MAC.
def lastdhcpip(ourmac)
  curLeaseIp = nil
  curLeaseMac = nil
  lastip = nil

  f = File.open("/var/lib/dhcp/dhcpd.leases")
  f.each do |line|
    case line
    when /lease (.*) \{/
      curLeaseIp = $1
    when /hardware ethernet (.*);/
      curLeaseMac = $1
      if ourmac == curLeaseMac
        lastip = curLeaseIp
      end
    end
  end

  f.close
  return lastip
end

set(:disktype, "lsilogic")
set(:disksize, "3Gb")
set(:memory, "768")

#set(:hostname, fqdn.match(/^[0-9A-Za-z-]*/))
#puts("hostname: #{hostname}")

task :build, :roles => :host do
  set(:host) do
    Capistrano::CLI.ui.ask "vmware hostname: "
  end unless exists?(:host)

  role :host, host

  set(:hostname) do
    Capistrano::CLI.ui.ask "guest hostname (vm16-dev-ots04): "
  end unless exists?(:hostname)

  set(:network) do
    Capistrano::CLI.ui.ask "guest network (prod/corp/test): "
  end unless exists?(:network)

  case network
  when /prod/
    set(:fqdn, "#{hostname}.prod.example.org")
    set(:domain, "prod.example.org")
    set(:eth0, "/dev/vmnet4")
    set(:eth1, "/dev/vmnet11")
  when /corp/
    set(:fqdn, "#{hostname}.corp.example.org")
    set(:domain, "corp.example.org")
    set(:eth0, "/dev/vmnet0")
    set(:eth1, "/dev/vmnet0")
  when /test/
    set(:fqdn, "#{hostname}.test.example.org")
    set(:domain, "test.example.org")
    set(:eth0, "/dev/vmnet2")
    set(:eth1, "/dev/vmnet14")
  end
  puts("fqdn: #{fqdn}")

  result = ERB.new(vmxtemplate).result(binding)

  run("mkdir /srv/vmware/#{fqdn}")
  logger.info("Building vmx configuration file")
  put(result, "/srv/vmware/#{fqdn}/#{fqdn}.vmx", :mode => 0755)

  logger.info("Creating virtual disk")
  run("/usr/bin/vmware-vdiskmanager -c -a #{disktype} -s #{disksize} -t 2 /srv/vmware/#{fqdn}/#{fqdn}.vmdk")

  # start and stop vm to generate uuid and MACs
  logger.info("starting VM")
  #run("/usr/bin/vmware-cmd -s unregister /srv/vmware/#{fqdn}/#{fqdn}.vmx")
  run("/usr/bin/vmware-cmd -s register /srv/vmware/#{fqdn}/#{fqdn}.vmx")
  run("/usr/bin/vmware-cmd /srv/vmware/#{fqdn}/#{fqdn}.vmx start")
  sleep 1
  run("/usr/bin/vmware-cmd /srv/vmware/#{fqdn}/#{fqdn}.vmx stop hard")
  macaddr0 = nil
  run("cat /srv/vmware/#{fqdn}/#{fqdn}.vmx") do |ch, stream, data|
    case data
    when /ethernet0.generatedAddress = "(.+)"/
      macaddr0 = $1
    end
  end
  # fail with a clear message if vmware never wrote a generated MAC
  raise "no generated MAC found in #{fqdn}.vmx" if macaddr0.nil?
  macaddr0dash = macaddr0.gsub(/:/, "-")

  # pxelinux looks for a config named 01-<mac with dashes> for this guest
  pxeConfig = File.new("/srv/tftp/pxelinux.cfg/01-#{macaddr0dash}", "w", 0644)
  result = ERB.new(pxetemplate).result(binding)
  pxeConfig.puts(result)
  pxeConfig.close

  # Box gets a different ip sometimes on install than on first boot. annoying
  run("/usr/bin/vmware-cmd /srv/vmware/#{fqdn}/#{fqdn}.vmx start")
  logger.info("Sleeping 30 seconds for network startup")
  sleep 30
  ipaddr0 = lastdhcpip(macaddr0)
  logger.info("host #{fqdn} is now building and will be available at #{ipaddr0}")
  File.delete("/srv/tftp/pxelinux.cfg/01-#{macaddr0dash}")
end
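
The two fiddly bits of the recipe, finding the last lease for a MAC and naming the per-host pxelinux config, are easier to poke at outside of capistrano. Here is a standalone sketch of both that works on a string instead of the real leases file. The function names, sample leases and MAC addresses are made up, and unlike the recipe it compares MACs case-insensitively.

```ruby
# Return the IP from the last lease block whose hardware address matches.
def last_dhcp_ip(leases_text, our_mac)
  current_ip = nil
  last_ip = nil
  leases_text.each_line do |line|
    case line
    when /lease (\S+) \{/
      current_ip = $1
    when /hardware ethernet (\S+);/
      last_ip = current_ip if $1.casecmp(our_mac).zero?
    end
  end
  last_ip
end

# pxelinux per-host config name: "01-" plus the MAC, lower case, with dashes.
def pxe_config_name(mac)
  "01-" + mac.downcase.tr(":", "-")
end

# Invented leases for the example; later entries win.
leases = <<~LEASES
  lease 10.0.0.50 {
    hardware ethernet 00:0c:29:aa:bb:cc;
  }
  lease 10.0.0.51 {
    hardware ethernet 00:0c:29:11:22:33;
  }
  lease 10.0.0.52 {
    hardware ethernet 00:0c:29:aa:bb:cc;
  }
LEASES

puts last_dhcp_ip(leases, "00:0c:29:aa:bb:cc")   # 10.0.0.52
puts pxe_config_name("00:0C:29:AA:BB:CC")        # 01-00-0c-29-aa-bb-cc
```

Because the guest can pick up a different address on install than on first boot, taking the last matching lease rather than the first is the part that matters.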