Monthly Archives: December 2007

no sound on youtube (flash)

Doing the Christmas holiday fix-people’s-computers bit. This one had no sound on youtube, but sound worked fine elsewhere. Flash video was fine, just no sound in both IE and Firefox. I finally found my answer here although there’s a lot of information here. The links were annoyingly hard to find.

In the registry the key HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Drivers32\wavemapper was set to ‘msacm32.dvr’ not ‘msacm32.drv’. The youtube video that was playing in the background immediately started producing sound when I fixed the entry. Who knows what mucked it up. There are lots of stories about the Flash 9 upgrade causing trouble, but I blame Norton and AOL, since they’re both on this computer and their bloatedness gives me nightmares about support problems.
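A quick way to check for the same transposed letters on another machine is to read the value and compare it. This is only a sketch, assuming Python is available on the Windows box; the helper names are made up for illustration, and it only reads the registry, it doesn't fix anything.

```python
import sys

# Key path and expected value as described in the post above.
DRIVERS32_KEY = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Drivers32"
EXPECTED_WAVEMAPPER = "msacm32.drv"


def wavemapper_is_sane(value):
    """Return True if the value names the stock wave mapper driver."""
    return value.strip().lower() == EXPECTED_WAVEMAPPER


def read_wavemapper():
    """Read the 'wavemapper' value from HKLM (Windows only)."""
    import winreg  # standard library, but only present on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, DRIVERS32_KEY) as key:
        value, _value_type = winreg.QueryValueEx(key, "wavemapper")
    return value


if __name__ == "__main__" and sys.platform == "win32":
    current = read_wavemapper()
    print(current, "OK" if wavemapper_is_sane(current) else "looks wrong")
```

The comparison is case-insensitive on the assumption that Windows doesn't care about case in the driver file name.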

Unable to download NAT policy for ACE

On an ASA 5520 with Cisco Adaptive Security Appliance Software Version 8.0(3), I had set up an SSL VPN. It was working okay as it was set up for dhcp to an existing subnet, which I assume was proxy-arping for its clients. Attempts to access other subnets using split tunneling were producing “No translation group found for ..” errors. I tried every possible combination of nat exceptions, which, believe me, was a lot. ASDM would sometimes throw errors when adding them, and when I’d do this by hand I’d sometimes get an “Unable to download NAT policy for ACE” error. Frustrated, I removed almost everything I did and rebooted the damn thing.

This fixed the problem. Seriously, the Cisco needed a reboot. It wasn’t a config thing, as I saved just before the reboot. Perturbing. Now I’m using an address pool, and I’ve gotta go back and set up routes on our disappointingly static-routed network to the new pool.

Troubleshooting Office Communications Server 2007

Microsoft definitely takes the cake on useless error messages (There is no cake). While setting up an edge server, I started with this (OCS Protocol Stack, Error 14501, Office Communications Server event log):

A significant number of invalid certificates have been provided by remote IP address 24.18.22.8 when attempting to establish an MTLS peer. There have been 10 such failures in the last 1 minutes.
Certificate Names associated with this peer were

The serial number of this certificate is
.
The issuer of this certificate is
The specific failure types and their counts are identified below.
Instance count – Failure Type
10 C3E93D6D

First off, it’s totally awesome that you don’t get this error right away, you have to fail a bunch of times before it shows up in the event log. I hate certificates, so I’ve been assuming it was that. I ran C3E93D6D through ‘lcserror’ which comes with the resource kit.

lcserror C3E93D6D
0xC3E93D6D -> (SIPPROXY_E_CONNECTION_EXTERNAL_INTERNET_ACCESS_DISABLED) (C:\Program Files (x86)\Microsoft Office Communications Server 2007\ResKit\RTCERes.dll)

Remote user client cannot connect because access is disabled.

Yup. More useful. My client says “cannot sign in because the server is temporarily unavailable. If the problem persists, contact your system administrator.” I thought about calling myself, but I’m an asshole and figured it wouldn’t do either of us any good. Another user on Vista x64 said he was getting a certificate error but I didn’t see it.

OMG! As an aside, if you’re doing this on x64, and you’re doing an edge server in a box not in the domain as I think the whole point is, you have to manage this through the Computer Management mmc snap-in. Long time windows fixers would be tempted to right click ‘My Computer’ and choose manage. Don’t. It doesn’t show up there. Run, or create a shortcut for ‘%windir%\system32\compmgmt.msc -32’. I don’t know how I found that out, deep in a document somewhere.

In here, I went to the ‘External User Access’ section of the status tab and saw that ‘Remote User Access:’ had the big red X for off. Right click, Properties on the server, Access Methods tab, check ‘Allow remote user access to your network’. I guess. I mean, is this needed? Why else would I have an edge server? I guess for federation.

I feel like doing this caused a little more ssl traffic to pass, but it’s mostly binary data that means nothing to me. I still see the server’s public ssl web cert being passed (via wireshark, remember to turn off chimney) though. A coworker said something about having to disable MTLS on the last version of LCS, but I couldn’t find it. It sort of made sense as the first error seems to be saying that the client is passing a bunk cert. Page 39 of the Edge Server Deployment Guide has a table for firewall configuration that lists:

Local Port: 5061 TCP (SIP/MTLS)
Local Port: 443 TCP (SIP/TLS)
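Before reaching for wireshark, it can be worth confirming that both ports even accept a TCP connection from where the client sits. A minimal sketch; the function name is made up, and a successful connect only shows the port is open, not that TLS or MTLS will negotiate.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Calling tcp_port_open() once for 5061 and once for 443 against the edge server's external name covers the two rows of the table.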

I was thinking at first 443 was all the rage. But after looking at a diagram in the above deployment guide I started wiresharking for traffic on tcp port 5061 as well, and there’s a bunch of TLS crap going on there. Poking around on the ‘internal interface’ I see traffic is actually going to the ‘front end’ server. Running wireshark there, I see some similar business and more fun errors, getting slightly more useful:

A significant number of invalid certificates have been provided by remote IP address ipaddressofedgeserver when attempting to establish an MTLS peer. There have been 10 such failures in the last 30 minutes.
Certificate Names associated with this peer were
actualinternalservername

The serial number of this certificate is
4e7f38cd00000000001e.
The issuer of this certificate is corp.widemile.com
The specific failure types and their counts are identified below.
Instance count – Failure Type
10 C3E93D6A

Run it through the damn tool again. Couldn’t this just be put in the event log?

lcserror C3E93D6A
0xC3E93D6A -> (SIPPROXY_E_ROUTING_UNKNOWN_SERVER) (C:\Program Files (x86)\Microsoft Office Communications Server 2007\ResKit\RTCERes.dll)
The FQDN in the peer’s certificate subject name is not a configured server.
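Since the event log only gives the hex failure type, pulling the codes out of the event text makes it less tedious to feed them to lcserror. A rough sketch, assuming the failure lines always look like a count followed by an eight-digit hex code, as in the two events quoted here; the function name is invented.

```python
import re

# A failure line is "<count> <8 hex digits>", e.g. "10 C3E93D6D".
FAILURE_LINE = re.compile(r"^\s*(\d+)\s+([0-9A-Fa-f]{8})\s*$")


def failure_codes(event_text):
    """Return {hex_code: total_count} for the failure lines in an event."""
    codes = {}
    for line in event_text.splitlines():
        match = FAILURE_LINE.match(line)
        if match:
            count, code = match.groups()
            code = code.upper()
            codes[code] = codes.get(code, 0) + int(count)
    return codes


def lcserror_commands(event_text):
    """Build one 'lcserror CODE' command line per distinct failure code."""
    return ["lcserror " + code for code in sorted(failure_codes(event_text))]
```

Lines such as the certificate serial number don't match because they have no leading count.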

I recently added the edge server’s ip on the Host Authorization tab of the Front End Properties for the group (under ‘Standard Edition Servers’ under the forest in the mmc snap-in on the full server, not the edge server). Doing so added the name of the server under ‘access edge servers’ in ‘edge servers settings’ on the status tab for the forest in said mmc snap-in. The dns didn’t match though. My dns is a mess. The actual place where this comes from though is ‘Global Properties’ on the forest, ‘Edge Servers’ tab.

Followed by more failing, which you gotta keep repeating to get the darn error messages to show up in the event log:

The server configuration validation mechanism detected some serious problems.

1 errors and 0 warnings were detected.

ERRORS:
Two server roles at FQDN [myedgeserversfqdn.test.org] have different ‘Throttle As Server’ options. First server has GUID {D00AC7FC-3E15-4CD6-A33A-0A823BE24EA1} and role ‘Authorized Host’ (option is not set). Second server has GUID {C8F1ADE5-9CB5-56AE-BDE3-D53B9D9B3BF9} and role ‘Edge Server’ (option is set).

WARNINGS:
No warnings

Cause: The configuration is invalid and the server might not behave as expected.
Resolution:
Review and correct the errors listed above, then restart the service. You also wish to review any warnings present.

So I go into the server group, right click, properties, front end properties, host authorization tab and set ‘throttle as server’.

This time I get a log in prompt on the click (Office Communicator 2007). Enter credentials, wait a while, and it works!

Older notes:

1) I had a bunch of problems getting past the certificate wizard because I thought importing certificates was enough, but this doesn’t actually assign them to the interfaces.

2) Speaking of interfaces, you need two. Well, two IPs. You can just add an IP Alias to the existing interface (TCP/IP Properties, advanced). Otherwise an early wizard fails with a useless error when it gets to the external interface page. See this forum thread for a little more info.

cisco anyconnect ssl vpn client random debugging

Cisco AnyConnect VPN (SSL VPN Software) was throwing “Failed to install and start agent. Installation has failed.” on me after logging in but before connecting (which was failing). I ended up randomly trying to go to the https site in firefox and it threw an error about the ssl certificate, some random number that was pretty meaningless.

This inspired me to go to ASDM, Configuration, Remote Access VPN, Advanced, SSL Settings and under certificates remove the certificate I was using for the interface. The problem went away! I have no idea what it didn’t like about the certificate. It would have been nice if it said something about this though. Seeing some errors about the client trying to access “https://ipaddr/CACHE/stc/1/VPNManifest.xml” is actually what got me playing around with the browser; otherwise the only readable error in the event log (the software makes its own section in the event viewer) is “WINDOWS_ERROR_CODE”. Nice. The debugging in ASDM wasn’t helping much either.

After that I was getting a bit further but failing and this time ASDM which was set to logging debugging was giving me a “TunnelGroup GroupPolicy User IP No address available for SVC connection” error. This turned out to be because I was using dhcp-server on the policy and pointing it at the broadcast address. This had worked for other cisco gear in the past. I watched the DORA process using wireshark on a server and saw that it was just repeating the discovers and offers over and over. When I changed dhcp-server to point at the unicast address for the server, the connection finally worked.
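If you're not sure whether the address in a dhcp-server line is a broadcast address, the standard ipaddress module can tell you. A small sketch with a made-up function name; the subnet in the usage note is an example, not the one from this network.

```python
import ipaddress

LIMITED_BROADCAST = ipaddress.ip_address("255.255.255.255")


def is_broadcast(address, network):
    """True if address is the directed broadcast of network or 255.255.255.255."""
    # strict=False lets a host address with a prefix length stand in for the subnet.
    net = ipaddress.ip_network(network, strict=False)
    addr = ipaddress.ip_address(address)
    return addr == net.broadcast_address or addr == LIMITED_BROADCAST
```

For example, is_broadcast("192.168.10.255", "192.168.10.0/24") is true, while the DHCP server's own unicast address on that subnet is not.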

SCEP is not an enrollment proxy.

It’s taken me a couple weeks of staring at books and screens to figure this out. Please let me know if you think I’m bullshitting you.

Cisco Systems’ Simple Certificate Enrollment Protocol (SCEP) is a protocol that lets Cisco’s routers, vpn concentrators (vpn 3000), access points (1130AG) and firewalls (asa 5500) get the root certificate and their own certificates “in band”. “In band” means it does this over the network rather than you having to paste in a BER x.509 certificate or use some other “out of band” method.

It does not, as I was thinking for some reason, allow the client to obtain its own certificate from the CA by way of the device.

SCEP really isn’t needed for EAP, as the certificates are passed to the RADIUS server (IAS) which then does the required work (comparisons, validation, etc). PEAP (Protected EAP) supports using both certificates (and smart cards) and ‘secured’ passwords for authentication.

eap trials (notes) with a cisco 1130ag

It seems like while Cisco always supports lots of security protocols, it’s a horrible world of acronyms… and I work in a horrible world of acronyms.

I did get PEAP going, finally. Prerequisites: IAS installed, with the remote access policy set to grant and ‘EAP Method’ on the Authentication tab of the Profile allowing PEAP. Best install MSCEP too.

Note that getting certificates and passwords working amounts to: set ‘EAP Methods:’ to only the PEAP type. Edit that. On the EAP Types here, add both ‘Smart Card or other certificate’ and ‘Secured Password’. I have Smart Card or other certificate first. Don’t worry much about ‘Fast Reconnect’ unless you’re using multiple APs, in which case I hope you have a better idea of what’s going on than me.

First, set the time. I couldn’t find ntp support, so use the ‘clock set’ command to set the time. After some debugging trying to get certificates working with MSCEP, I figured out that it wasn’t accepting the certificate because it thought it was 2002, and thus the certs weren’t valid yet. Make sure you set up the correct summer-time, heh. My time config as well, set in global configuration mode:

clock timezone PST -8
clock summer-time PDT recurring 2 Sun Mar 2:00 1 Sun Nov 2:00
clock save interval 12
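The clock problem comes down to a simple comparison: a certificate is only usable while the device's idea of "now" falls between its notBefore and notAfter dates. A toy sketch of that check; the function name and the dates in the usage note are invented for illustration.

```python
from datetime import datetime


def cert_time_problem(device_clock, not_before, not_after):
    """Return None if the clock is inside the validity window, else a reason."""
    if device_clock < not_before:
        return "not yet valid"
    if device_clock > not_after:
        return "expired"
    return None
```

With a device clock stuck at datetime(2002, 3, 1) and a certificate issued on datetime(2007, 12, 1), this returns "not yet valid", which is the situation described above.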

On that note, if you’re having trouble with certificates, this should help, as I was just getting “% Error in saving certificate: status = FAIL” until I set the following:

debug crypto verbose
terminal monitor

There are more debug crypto commands too; some helped, some were just too verbose. This was probably the best cisco reference. This was useful in troubleshooting, as were the event logs on the IAS server, which showed the wrong usernames and such. In one case my laptop, which isn’t in the domain, was sending ‘LAPTOPNAME\localuser’ without asking me. To fix that, go to the wireless card, properties, ‘Wireless Networks’ tab, click on the AP profile, properties, Authentication tab, enable 802.1x, set EAP type to PEAP, properties. At the bottom under ‘Select Authentication Method’ choose ‘Smart Card or other certificate’ or ‘Secured Password’, then Configure; both have a check box about asking you for a username or not.

I came into all of this half-configured, so there’s probably more to it, but hopefully I saved some people some troubleshooting.

As a side note, don’t delete these, heh:

aaa authentication login default local
aaa authorization exec default local

Removing the latter while cleaning up was letting me log in but was dropping me into level 1 access and giving me “% Error in authentication.” when I tried to enable. I was trying to clean up because there were piles of different authentication methods from previous attempts by multiple people to get this going.

update 12/18:

Figured out NTP, heh, ‘sntp server ipaddress’ in global config mode and ‘show sntp’ in exec mode.

Also ended up switching to WPA2 as Vista wasn’t liking the WEP and I really couldn’t tell why. Unfortunately this also means that XP needs the WPA2 patch. Ended up with this:

dot11 ssid YOURSSID
authentication open eap YOURGROUPNAME
authentication network-eap YOURGROUPNAME
authentication key-management wpa
guest-mode

interface Dot11Radio0
encryption mode ciphers aes-ccm tkip
ssid YOURSSID
! snip some default stuff

aaa group server radius YOURGROUPNAME
server RADIUSIP auth-port 1645 acct-port 1646

radius-server host RADIUSIP auth-port 1645 acct-port 1646 key 7 YOURSECRET

This is a good place to start.

moving a database from mssql 2000 to 2005

I had to move a database to Microsoft SQL 2005 today. I managed to get both the servers to attach in SQL Management Studio (2005) and use the Copy Database Wizard (CDW) (Right click database, tasks, copy database) to set up a job (SQL Server Agent -> Jobs in the management studio) but it was failing. Event logs indicated it was because it was trying to log into the remote server as ‘DOMAIN\ANONYMOUS LOGIN’ or something stupid.

I tried with both the ‘detach and attach method’, where you have to provide a share path to the data files (I just used the administrative share on the source server), and the ‘sql management object method’. While I had domain admin credentials on both servers, they were in different domains (part of the reason for the migration). I think the failure was the attempt to use the “Integration Services Proxy Account”, which probably is not set up at all, or at least not correctly.

Event log shows Source: SQLISPackage, Event ID: 12550:

Event Name: OnError
Message: Failed to connect to server SOURCESERVER.
StackTrace: at Microsoft.SqlServer.Management.Common.ConnectionManager.Connect()
at Microsoft.SqlServer.Dts.Tasks.TransferObjectsTask.TransferObjectsTask.OpenConnection(Server& server, ServerProperty serverProp)
InnerException–>Login failed for user ‘NT AUTHORITY\ANONYMOUS LOGON’.

I ended up detaching the database via the management studio and copying the ldf and mdf files to the new server’s data directory, then running a query to attach it (one could probably do this visually). I got the query template from here rather than trying to remember how to type it. The “FOR ATTACH” part is important, heh.

‘Converting database ‘blah’ from version 539 to the current version 611.’ showed up in the query results, so it looks like it “upgraded” on its own.
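For reference, an attach query has roughly the shape below. This is a sketch that only builds the T-SQL string; it is not the linked template, the function name is invented, and the database name and file paths in the usage note are placeholders.

```python
def attach_statement(db_name, data_files):
    """Build a CREATE DATABASE ... FOR ATTACH statement as a string."""
    if not data_files:
        raise ValueError("need at least the primary .mdf file")
    # Escape ] inside the bracketed name and ' inside the path literals.
    name = db_name.replace("]", "]]")
    files = ",\n".join(
        "    (FILENAME = N'{}')".format(path.replace("'", "''"))
        for path in data_files
    )
    return "CREATE DATABASE [{}]\nON\n{}\nFOR ATTACH;".format(name, files)
```

Printing attach_statement("blah", [r"D:\data\blah.mdf", r"D:\data\blah_log.ldf"]) gives a statement you can paste into a query window; without the trailing FOR ATTACH clause SQL Server would try to create a new database instead.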

wireshark/winpcap and windows server 2003 SP2 (chimney)

This problem has bugged me for a while and I finally resolved it this week. I had a good idea of the cause but it just wasn’t important enough to solve. I wrote this up in a mailing list post, but I think the winpcap-users list is slow, or my antispam is causing trouble, as it usually takes a great deal of time for me to get messages from the list.

Running Wireshark 0.99.6a with Winpcap 4.0.1 on Windows Server 2003 x64 R2 SP2 wasn’t showing all the traffic. For the most part, I was only seeing the TCP handshake, although occasionally I’d see another packet or two, such as the bindrequest in an LDAP query and nothing else. This is on Dell 1955 blades with Broadcom BCM5708S NetXtreme II GigE (NDIS VBD Client) chipsets. I assumed it was some kind of offloading like TCP checksum but more advanced. I eventually found that the Microsoft ‘Scalable Networking Pack’ was integrated into Server 2003 SP2, which includes ‘Chimney’, their advanced TCP Offloading Engine.

I had looked in the advanced driver properties and saw that ‘Large Send Offload’ and ‘TCP Check Offload’ were both disabled and gained a false sense of security from this. Apparently Chimney isn’t controlled through here.

‘Netsh int ip set chimney DISABLED’ turns off chimney though. I don’t know that you want it on except on high performance webservers or whatnot, so it would have been nice if it wasn’t on by default. There’s a bunch of registry entries as well, but that command works without a reboot of any kind.

Xerox Workcentre Pro: Job must have at least one To: recipient

After an upgrade to Exchange 2007 SP1 from 2007 RTM, the Xerox WorkCentre Pro stopped sending email scans. I originally thought it was an LDAP problem, because I had seen an LDAP error the night before and WireShark on the DC was showing the bind request but no other traffic. I finally resolved the Chimney problem with winpcap, but in the interim just used a port mirror on the catalyst.

Attempts to Email Scan on the WorkCentre Pro were throwing out an error page that said ‘Job Status: FAILED Job must have at least one “To:” recipient.’ Under the ‘Message Settings:’ header it listed From: with a valid address but the To: line was empty. Getting wireshark going showed that it was a 5.7.1 SMTP error and the Xerox just wasn’t communicating as much. Whitelisting the IP for the Xerox on the SMTP server resolved the problem.
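You can get at the real SMTP answer without a packet capture by playing the scanner's part by hand. A hedged sketch using smtplib; the function names and the probe hostname are made up, and it stops after RCPT TO so no mail is actually sent.

```python
import re
import smtplib

# Enhanced status codes look like "5.7.1" at the start of the reply text.
ENHANCED_CODE = re.compile(r"^([245])\.(\d{1,3})\.(\d{1,3})\b")


def is_policy_rejection(code, text):
    """True for a permanent failure whose enhanced status is 5.7.x."""
    match = ENHANCED_CODE.match(text.strip())
    return code >= 500 and bool(match) and match.group(1, 2) == ("5", "7")


def probe_recipient(host, sender, recipient, port=25, timeout=10.0):
    """Return (code, text) for the MAIL FROM / RCPT TO exchange."""
    # local_hostname is a placeholder name sent in EHLO/HELO.
    with smtplib.SMTP(host, port, local_hostname="scanner-probe.invalid",
                      timeout=timeout) as smtp:
        smtp.ehlo_or_helo_if_needed()
        code, message = smtp.mail(sender)
        if code == 250:
            code, message = smtp.rcpt(recipient)
        return code, message.decode("ascii", "replace")
```

Run from the Xerox's subnet, a reply that is_policy_rejection() flags would point at the relay restrictions rather than at LDAP; results from another machine may differ, since the restriction is usually per source IP.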

practical packet analysis, book review

I picked up “Practical Packet Analysis: Using wireshark to solve real-world network problems” recently by Chris Sanders. Overall, it’s a little fluffy for my taste, I should know better than buying non ORA, AWP or Cisco Press books, but I’ve heard decent things about No Starch Press, somewhere.

If you’ve done any intermediate networking, the first chapter should be a quick review for you. If you’ve done any network troubleshooting with tcpdump/ethereal/wireshark, the next few chapters should be review as well. The rest of the book is mostly examples of traffic and how to tell what’s going on. This is nice, but at least once I felt the screenshots weren’t in the right order (DHCP DORA). All in all, I was really hoping this book would be more advanced than it was, but when you head in that direction, the author tells you to read an RFC.

There wasn’t any talk about TCP checksum offloading that I saw, or the heavier stuff like Chimney that microsoft is doing now that I don’t know anything about. Since I picked up the book because I’m having weird TCP segment timeouts and getting different traffic out of wireshark on the server and my workstation, I was hoping for more information that would be specific to this problem. Back to RFCs and google I guess.

If you consider yourself a senior linux systems administrator, most of this should be review for you. If you haven’t done much traffic analysis in the past, it’s worth picking it up and reading it. It’s a pretty light read, I didn’t skip any chapters but did skim a few and it was only a few hours of reading.

ISBN-13: 978-1-59327-149-7

SAS5/iR firmware on 1955 debian blades

I did finally get OMSA running fully on the Dell Poweredge 1955 blade with the SAS 5/iR (LSI SAS1068 PCI-X Fusion-MPT SAS) RAID controller (not PERC5, apparently, which means it’s the mptsas driver and not megaraid_sas like PERC5). This confirmed my suspicion that ‘omreport storage controller’ listing the device as degraded was because the firmware was below the ‘minimum firmware’ version and not because of the virtual disk, which ‘omreport storage vdisk’ listed as Ready and OK. Rainer Sabelka helped me out again, noting that you could run the RHEL firmware update off support.dell.com.

The hard part was that I kept selecting RHEL5 in the pull downs, finding the utility and clicking “Download Now”, which would send the standard Dell self-extracting executable, which was very dos looking. I was thinking maybe you could take the firmware file and use some utility to upgrade it, whatever replaced omupgrade. Eventually I found that if you click on the file on Dell’s site instead of download now, it gives you additional options, one of which is a BIN file for redhat. You can download this, chmod 755 it, and run it. It’ll page through the release notes in less, then start the install when you say y. I shut down all the dell omsa stuff via init.d scripts before running this, based on recommendations in the release notes. I think ‘file.bin --version’ brings up the notes such that ‘q’ just exits instead of continuing the install. Note that the install doesn’t force a reboot at the end, but recommends it. I didn’t see any changes in the omreport output until the reboot, so you’ll want to do the upgrade when you can reboot to avoid possible wonkiness.

As a side note, this is a decent starting place for Debian on Poweredge gear, although it says:

The three PowerEdge components that cause the most difficulty for new Debian sysadmins are the PERC 5 SAS RAID controllers (megaraid_sas driver), the SAS 5 non-RAID controllers (mptsas driver), and the Broadcom gigabit network adapters (bnx2 driver).

Which is a little confusing as the 1955 only has the mptsas driver running under etch, not the megaraid_sas, but it’s obviously doing raidish things and since we’re talking about linux and not windows I was thinking it wasn’t fakeraid. This is doubly interesting though:

vmware14:/proc/scsi# cat scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: FUJITSU Model: MAY2073RC Rev: D108
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: FUJITSU Model: MAY2073RC Rev: D108
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi0 Channel: 01 Id: 00 Lun: 00
Vendor: Dell Model: VIRTUAL DISK Rev: 1028
Type: Direct-Access ANSI SCSI revision: 05

So it shows both disks, which are “raid1”, and this “Virtual Disk”. There’s some jabber here about how it’s not RAID because it’s showing the disk, and the other agrees and claims faulty assumptions, but there’s no reference or linkage. Here a Dell engineerish guy says “Yes the SAS 5/iR is just very light weight hardware RAID.”

Supposedly there’s an ‘lsiutil’ package hiding in a driver package somewhere but I couldn’t find it in a brief search. The OMSA utilities appear to be working, although I still haven’t tried SNMP yet.

patching debian etch 64 for sara.nl omsa packages login

I’ve been playing with omsa on debian etch for a bit (dell 1955 blades) and wanted to post the notes on here. I’ve been using the sara.nl packages. After install, you’ve gotta start the daemon up: ‘/etc/init.d/dsm_om_connsvc start’.

Then you login as root/yourrootpassword. I’ve seen docs somewhere, I forget, that any user in /etc/passwd can login and those that are in the root group are admins. But it doesn’t work. If you check out /var/log/user.log you’ll see pam errors.

For some awesome reason when googling the dell list server archives you always get hits on the monthly gz digests but not the individual messages. I saw a few about this but I’m not going to dig them all back up, but this is good.

Basically you need 32bit versions of the following files and to put them in /lib32:

/lib/libsepol.so.1
/lib/libselinux.so.1
/lib/security/pam_unix.so
/lib/security/pam_nologin.so

You can figure out what package each file comes from with ‘dpkg -S file’, i.e. ‘dpkg -S /lib/security/pam_unix.so’, which comes from libpam-modules. It’s worth noting this package comes from the PAM master package, so it’ll be in pool/main/p/pam in the repo rather than pool/main/libp or whatever. Grab the i386 version of each package (libsepol1, libselinux1, libpam-modules) and extract them with dpkg -x package temp, then go into the temp tree (cd temp/lib) and copy the respective files to /lib32. I just copied the whole security folder from libpam-modules by the way. I used ‘dpkg -l | grep package’ to find out what version of each package was installed, then grabbed the same version in i386.
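The copy step can be scripted if you have more than one blade to patch. A sketch only: it assumes the i386 packages were already extracted with dpkg -x into one directory, the function name is invented, and it copies just the four files listed above rather than the whole security folder.

```python
import shutil
from pathlib import Path

# The four files from the list above, relative to the extracted package root.
NEEDED = [
    "lib/libsepol.so.1",
    "lib/libselinux.so.1",
    "lib/security/pam_unix.so",
    "lib/security/pam_nologin.so",
]


def copy_32bit_libs(extracted_root, dest="/lib32"):
    """Copy the needed files from a 'dpkg -x' tree into dest.

    Files under lib/security/ end up in dest/security/. Returns the list of
    destination paths that were written; missing sources are skipped.
    """
    copied = []
    for rel in NEEDED:
        src = Path(extracted_root) / rel
        if not src.exists():
            continue
        target = Path(dest) / Path(rel).relative_to("lib")
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)
        copied.append(str(target))
    return copied
```

Comparing the returned list against the four expected names shows quickly whether one of the packages was missed.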

Run ‘ldconfig’ then tail /var/log/user.log just to keep an eye on things, and you should be able to log in now.

update: it’s worth linking that the sara.nl folks have revealed a wiki site with many of these answers here.

migrating an openldap database with slapcat

I had to move openldap from one box to another, different version.

The slap* tools that come with openldap appear to access the files directly while the ldap* tools in ldap-utils go through the LDAP server.

I started by exporting the old server to ldif with ‘slapcat -l oldserver.ldif’. Then copy this file to the new server.

Here were some problems. I couldn’t slapadd because some of this already existed. I ran ‘dpkg-reconfigure slapd’ to get the config file to match what was on the old server. In retrospect I could have just copied the old config, but I wanted to make sure there weren’t new config options that were needed. Reconfigure asks you what OU and DNS entries you want to use and creates a new database. Of course, this will have admin entries, so you’ll still have this above problem.

Stopped slapd. I went into /var/lib/ldap and removed all the files. Then used ‘slapadd -l oldserver.ldif’ to import the entries, which created a new database. Make sure all these files are owned by the user that slapd will run as (chown -R openldap.openldap /var/lib/ldap).

Now start slapd and see if it’s running. The init.d script on debian etch was starting it, but it was closing with no errors. I changed the ‘loglevel’ line in /etc/ldap/slapd.conf to ‘511’ and restarted, seeing “bdb_db_open: alock package is unstable” in /var/log/syslog before slapd shut down. Somehow I had gotten this created in all my screwing around. I removed alock and log.* from /var/lib/ldap, double checked the permissions, which I think slapadd had also mangled at some point, and started the server. Came up fine now. Tested with ldapsearch -x -W -b ‘basedn’ where basedn is the basedn =P.

This helped me figure out some of the differences between slap* and ldap* and this made me realize that the “DB_KEYEXIST: Key/data pair already exists” error was because the administrative entry was already in the preconfigured database so I had to start from scratch.
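To see which entries would collide before running slapadd, you can list the DNs in each export and compare them. A minimal sketch that only handles what slapcat output normally contains (folded lines and base64 'dn::' values); the function name is made up.

```python
import base64


def ldif_dns(ldif_text):
    """Return the DN of every entry in an LDIF export, in order."""
    # Unfold: a line starting with a single space continues the previous line.
    lines = []
    for raw in ldif_text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)

    dns = []
    for line in lines:
        if line.startswith("dn::"):
            # Double colon means the value is base64 encoded.
            dns.append(base64.b64decode(line[4:].strip()).decode("utf-8"))
        elif line.startswith("dn:"):
            dns.append(line[3:].strip())
    return dns
```

Taking set(ldif_dns(old)) & set(ldif_dns(new)) on the old export and a slapcat of the freshly reconfigured database shows the entries, such as the admin entry, that would trigger the "already exists" error.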

dell omsa for sas raid on debian etch 64bit

I’ve got a dell blade (1955 or 1855 or something) that has a failed SAS disk and I need to get into the RAID array config without rebooting. The box runs debian etch 64bit. While dell has this crafty linux site with lots of info, it’s pretty rpm-based distro oriented (redhat/sles). There are some debian bits hiding in places though like omsa contributions here. (Be sure to go to IdeaStorm and vote for better debian support for OMSA here and here.) OMSA is the “Open Manage Server Administrator”. It looks pretty java oriented. I installed dellomsa_5.1.0-5_amd64.deb from here though, roughly following these directions.

It’s worth noting that the RAID is the new (PERC 5? PERC 6?) “generation” of SAS RAID, using the MPT Fusion drivers from LSI. lspci provides “LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)” so I tried using the MegaCLI tools, but there’s a lack of documentation and I had to download an rpm, extract it, and try running the MegaCli and MegaCli64 tools by hand, which both said I had no controllers (‘MegaCli64 -adpcount’ I think). This uses the ‘mptsas’ kernel module.

There’s a qwoot dell wiki project that links to some docs on OMSA. The standard dell download interface totally pisses me off, as its table design is hard to read and often broken, like here where a lot of the links just go to gpg signatures.

I couldn’t get logged into the web interface, which you start with /etc/init.d/dsm_om_connsvc and runs on port 1311. If you’re connecting and getting a “application/octetstream” MIME type download prompt, it’s probably because you’re not using ssl and it is, reconnect with ‘https://ip.ip.ip.ip:1311’. It’s pointed out here that password information is stored in /opt/dell/srvadmin/iws/config/iws.ini but doesn’t appear to be used. It looks like PAM is used, but I didn’t get so far as to debug PAM to make sure it was getting hit. Just try logging in with the root/rootpassword credentials.

I tried using omreport and whatnot in /opt/dell/srvadmin/oma/bin (use omhelp to get information on omreport, i.e. omreport chassis info), but these tools said they couldn’t find any controllers or chassis info. I’m not sure what’s missing there.

I removed those debs then installed dellomsa_5.1.0-2_i386.deb with ‘dpkg --force-architecture -i dellomsa_5.1.0-2_i386.deb’ as it’s i386 and this is amd64. Now ‘omreport chassis info’ brought back real data. ‘omreport system summary’ is useful, but ‘omreport storage controller’ still reports ‘No controllers found’.

Maybe I’ll go back to working with the megacli stuff. I wish I could get source instead of rpms.

Seriously people. What the hell is the use of a RAID controller that I’ve got to shut the box down to manage?

update: this looks common (1, 2, 3), I searched the linux-poweredge dell mailing list. Looks like an etch thing, mebbe an etch 64 thing. I just rebooted the box and rebuilt from the bios.

update2: Rainer pointed out on the poweredge list here that he needed to load mptctl before dataeng. so ‘ /etc/init.d/dataeng stop ; modprobe mptctl ; /etc/init.d/dataeng start’ got the storage portion of omreport working for me. Also Bas van der Vlies asked if I had tried their packages so I removed the i386 dellomsa, reinstalled the latest package from their distro on sara.nl (dellomsa_5.2.0-2_amd64.deb) and that worked alright. Interestingly ‘omreport storage controller’ says the array state is degraded. I put a new disk in last night and started the sync in BIOS, so I’m curious if it didn’t finish.
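Whether mptctl is loaded can be checked by looking at /proc/modules before restarting dataeng. A small sketch with an invented function name; it takes the file contents as text so it can be tried anywhere.

```python
def module_loaded(name, proc_modules_text):
    """True if name is the first field of any line of /proc/modules text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # Each line starts with the module name, followed by size, use count, etc.
        if fields and fields[0] == name:
            return True
    return False
```

On the blade itself the text would come from open("/proc/modules").read(); if module_loaded("mptctl", text) is false, that matches the 'No controllers found' symptom described earlier.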

I still can’t login to the web portion, but this is good enough.

update3: noted that he’s arranged many of these notes in a secret clean wiki here, the FAQ has many of the same tips without my rambling. This includes the bits that I figured out from mail list posts on how to get the IWS (web) login working correctly, which I documented here.