I’m no fan of software raid. Pretty much, ever. At my last job, where I still consult, my predecessor was really into technology creep. All of the workstations used that awesome fake raid that is actually implemented in the mass storage driver and is therefore pretty useless and can actually reduce your paths to recovery from disk failure. I’ll leave out the list of arguments against software raid. It simply isn’t worth it.
I showed up to a call with a server with an 0x7b error. Of course, Microsoft has this cool feature by default where servers automatically reboot when they blue screen. So nobody knew this was the error until I showed up and tried the “don’t automatically restart on BSOD” option under the F8 startup menu. I’m used to this error from moving system images between hardware, especially with virtual machines. As it turns out, the other values inside the parentheses are actually useful. If the second value inside the parentheses is 0x00000010, then you’re likely dealing with a disk in a software raid mirror set (dynamic disk) that Windows has marked as failed, and thus won’t start from.
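A minimal sketch of pulling those values out of a STOP line you have copied down from the blue screen. The example line is hypothetical (only the 0x7B code and the 0x00000010 second value come from the situation above; the other numbers are made up), and the check at the end just encodes the rule of thumb from this post rather than anything official.

```python
import re

# Hypothetical STOP line typed in from the blue screen. The first, third and
# fourth parameter values are invented for illustration.
STOP_LINE = "STOP: 0x0000007B (0xF78D2524, 0x00000010, 0x00000000, 0x00000000)"


def parse_stop_line(line):
    """Return (bug_check_code, [parameters]) as integers."""
    match = re.search(r"STOP:\s*(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)", line)
    if match is None:
        raise ValueError("not a STOP line: %r" % line)
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


code, params = parse_stop_line(STOP_LINE)

# Rule of thumb from this post: 0x7B with a second value of 0x10 suggests a
# mirror member that Windows has marked as failed.
suspect_failed_mirror = code == 0x7B and params[1] == 0x10
print(hex(code), [hex(p) for p in params], suspect_failed_mirror)
```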
The trick, which took me a while to nail down, is getting a boot.ini set up to boot from another disk. Since you can’t actually access this partition even in the Recovery Console, you can’t edit the boot.ini to tell it to start from the other disk. In the end, I formatted a floppy using simply ‘format A:’ on an XP desktop (would you believe this entire data center lacks a Windows server with a floppy drive?), then copied ntldr, ntdetect.com and boot.ini from another Server 2003 machine with the same service pack to this floppy. Then I changed the boot.ini to contain:
[boot loader]
timeout=60
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="DISK 0" /noexecute=optout /fastdetect /3GB
multi(0)disk(0)rdisk(1)partition(1)\WINDOWS="DISK 1" /noexecute=optout /fastdetect /3GB
multi(0)disk(0)rdisk(2)partition(1)\WINDOWS="DISK 2" /noexecute=optout /fastdetect /3GB
multi(0)disk(0)rdisk(3)partition(1)\WINDOWS="DISK 3" /noexecute=optout /fastdetect /3GB
If you’re not familiar with this file, you may want to read about ARC paths. Remember that ntldr and ntdetect.com are hidden, system and read-only by default, although it’s fine to leave these options unset on the floppy copies. ‘attrib -s -h -r C:\ntldr’ will make the file accessible so you can copy it to a floppy. I have to assume that when you format a floppy from an NT-based operating system, it puts a bit of code in the boot sector to look for these files.
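If you would rather generate the entries than type them, here is a rough sketch that builds the same boot.ini text, one entry per physical disk. The function names and defaults are my own for illustration; the switches are simply the ones from the file above, and you should adjust the disk count, partition number and Windows directory to match your own server.

```python
def arc_path(rdisk, partition=1, windows_dir="WINDOWS"):
    """Build an ARC path for a disk on the first controller.

    rdisk is the physical disk number (counted from 0) and partition is
    counted from 1.
    """
    return "multi(0)disk(0)rdisk(%d)partition(%d)\\%s" % (
        rdisk, partition, windows_dir)


def build_boot_ini(disk_count=4, timeout=60,
                   switches="/noexecute=optout /fastdetect /3GB"):
    """Return boot.ini text with one menu entry per physical disk."""
    lines = [
        "[boot loader]",
        "timeout=%d" % timeout,
        "default=" + arc_path(0),
        "[operating systems]",
    ]
    for rdisk in range(disk_count):
        lines.append('%s="DISK %d" %s' % (arc_path(rdisk), rdisk, switches))
    # Use DOS-style line endings, since this file is meant for a Windows floppy.
    return "\r\n".join(lines) + "\r\n"


boot_ini = build_boot_ini()
print(boot_ini)
```

Write the result to boot.ini on the floppy in binary mode so the line endings are not translated a second time.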
I then booted from the floppy, chose ‘DISK 1’, and the system started up fine. I then pulled the failed disk (carefully guessing which disk it was from the disk order in disk management and the scsi id jumper settings) and replaced it. In disk management, right click the good disk, “remove mirror” and choose the missing disk. Then right click again, “add mirror” and choose the new disk. Drink coffee.
It’s late and I can’t figure out how to run ‘fixboot’ and ‘fixmbr’ with a disk mirror, so I’m still using the floppy disk to boot and choose either disk to start from.
I find that Bart PE is very useful when I’m trying to fix Windows.
I used something similar, and disk management wasn’t interested in bringing any of the software mirrors online for me. Chalk up another reason this should have been a hardware raid mirror.
Devil’s advocate: I recently had a hardware RAID adapter (SAS) utterly fail to handle errors on one of the drives in a mirrored pair, and so reported IO errors back to the OS. Pretty failboat, especially for an expensive piece of hardware whose only job in life is to do exactly not that. C’est la vie, firmware updates inc.
Sure. I had a RAID card fail a long time ago and it sucked bringing the array back up on another controller. But compared to the number of times I have had disks fail under hardware raid, where recovery is a matter of swapping the dead disk out with a new one and waiting for the array to recover from the hot spare, fixing software raid is a nightmare.
When you create a software RAID mirror through Disk Administrator it also creates a second line in the boot.ini to boot off of the mirrored drive. Did the previous administrator remove them?
I’ve had more headaches caused by hardware RAID implementations (3ware cards specifically) than software, so I tend to run with software mirroring on my computers.
The second disk in the mirror didn’t have an MBR, I tried booting off this disk and didn’t get the bootloader. There wasn’t a line added to the boot.ini in the mirror for the second disk. After I added the new disk to the mirror, it didn’t have an MBR either and I couldn’t boot off it. As I mentioned, trying to use the recovery console to run fixboot/fixmbr didn’t work as the “c:” drive wasn’t available in the recovery console.
KB 167045 notes a lot of workarounds for a failed primary disk in a mirror. It seems to put a lot of emphasis on the “fault tolerant boot floppy”, so I wonder why there’s no mention of the boot.ini being fixed? Perhaps they are assuming that the primary disk completely failed and you can no longer use the boot code on it, as opposed to Windows marking the disk failed with errors, causing the 0x7b error.
I’ve definitely had the majority of my RAID headaches with software. Every couple of weeks I have a disk failure in a hardware raid and it’s simply a matter of performing the hot swap while the machine continues running. No having to add a floppy drive to the server, creating boot floppies, etc. I’d have to say it’s as close to magic as I can get.
The last hardware raid controller failure I had was near a decade ago on some janky used piece of hardware. I suppose you get what you pay for.
At my work I have exclusively software mirrors (not by choice). They are slower and lots more trouble to repair when a disk fails. Once a few weeks ago I had a customer’s hardware raid controller and scsi disks die after he didn’t turn on the a/c after a power failure.. lol!
Of course both the hard disks and the raid controller went to the scrap pile. The box was 10 years old… I’m not surprised. I have often used Bryan’s method as well for repairing failed mirror disks. By the way, the boot files are not “copied” to the second mirror disk on a software mirror… it won’t mirror the boot files, just the data. That is why you have to reboot and usually create a boot disk, depending on whether the failed disk had the boot files.
I have had excellent luck with hardware raid.. the only problem I have had was once when a cheap promise controller (those stupid ones that use a promise software raid driver with a hard disk controller card and pretend it’s a raid card) had its drivers get corrupted and cause all kinds of blue screens.
Hence on any kind of important server I always recommend a hardware raid controller due to how much time it takes to recover a failed software mirror.
Saved my ass! Love your work. It went exactly as you wrote. Drive 0 was the problem. Software RAID is the work of the Devil!!
So you’re still booting the MBR from a floppy?
I take it that you cannot see the System Partition from the Recovery Console so you can’t transfer a MBR using FIXBOOT.
My understanding is this - I guess the reason you wouldn’t be able to see the partition is because you need third-party drivers for the SCSI controller, so you’d need to create a Driver floppy disk with the SCSI drivers, then when booting into the Recovery Console, when it says to press to load 3rd-party SCSI drivers, do so, then you should be able to run FIXBOOT?
“I’m no fan of software raid.”
That’s because the software was written by Microsoft.
I use MD on Linux and Solaris (RAID1 for all disks inside the box).
The Linux boxes usually need to be powered off to swap out the failed drive (I haven’t had a hot-swappable SATA drive fail yet) so a re-boot is required.
All the drive failures I’ve had on Solaris boxes were hot swappable SCSI so they just kept running.
How do I do this with a CD? I don’t have access to a floppy drive.
Regards
Srinivas
If you’re relatively technical, you can use a tool like virtual floppy drive and apply a boot disk image to it, then use a cd recording program like nero to burn a bootable cdrom by pointing it at the virtual floppy disk at the corresponding point in the process. Otherwise, just go buy a USB floppy drive, they’re good to have around.
Software raids are a big pain. I have dealt with a few software raids that went bad. I called Microsoft and they were unable to recover them.
hi,
I’m currently in practically the same situation: dynamic disk mirrors, the first disk failed, and when I leave only the mirror disk in to boot the server, it gives me the message: error reading disk press ctrl-alt-del to reboot
I’ll use your technique and see if it helps me get the server back
Any more suggestions?
regards