One of my servers went down recently. Another chance for my brother to take a dig at Linux. “Linux sucks,” he says, but don’t be fooled by the statement. He is a big Mac fanboy but not a Linux hater, and in many situations he has recommended Linux. He just loves bugging me. Well, back to the story. The power supply seems to have caused problems, the hard disk holding the root partition got corrupted, and the system crashed. For the Linux skeptics: a hardware problem, not a software one. So now I had to come up with a solution so that if one hard disk failed, I could boot off another. RAID!!
For those who don’t know what RAID is, it’s a Redundant Array of Inexpensive Disks, and more recently people refer to it as a Redundant Array of Independent Disks. Basically it’s a method to combine disks together or to keep redundant copies of disks automatically, a bit like an automated backup (although a mirror only protects against a disk failing, not against deleted files). For more information look at the article on Wikipedia.
So I bought two 500 GB hard disks and two 1.5 TB hard disks. I decided the first two would hold the /boot and / partitions. The 1.5 TB disks were for the /var partition. I started the installation of CentOS 5.2. The system detected my disks as:
/dev/sda, /dev/sdc: 500 GB
/dev/sdb, /dev/sdd: 1.5 TB
I created one 200 MB RAID partition on each of the 500 GB disks and combined the two into a RAID level 1 (mirroring) array for /boot. I then created another RAID partition on each of those disks, again combined as RAID level 1, and used that array for LVM (Logical Volume Manager). I created swap, / and /home inside this LVM. Then I created a RAID partition on each of the 1.5 TB disks, mirrored them the same way, and created another LVM out of that. /var found itself in this LVM.
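I did all of this through the CentOS installer, but for anyone building the same layout by hand, it corresponds roughly to the mdadm and LVM commands below. This is only a sketch: the partition numbers (sda1, sda2 and so on), the md device names and the volume group names vg_system and vg_var are my assumptions, not something the installer showed me. It is written as a dry run that only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of the layout described above.
# Assumed names: partitions sda1/sda2/sdc1/sdc2/sdb1/sdd1, arrays md0-md2,
# volume groups vg_system and vg_var. Adjust to your own system.

# Print each command instead of executing it.
# To run for real (as root), change the body to: "$@"
run() { echo "$@"; }

plan_raid() {
    # /boot mirror: the 200 MB partition on each 500 GB disk
    run mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdc1
    # Mirror on the 500 GB disks that will carry swap, / and /home via LVM
    run mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdc2
    # Mirror on the 1.5 TB disks that will carry /var via LVM
    run mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdd1
    # Turn the two large mirrors into LVM physical volumes
    run pvcreate /dev/md1 /dev/md2
    # One volume group per mirror
    run vgcreate vg_system /dev/md1
    run vgcreate vg_var /dev/md2
}

plan_raid
```

The logical volumes for swap, /, /home and /var would then be created inside those volume groups with lvcreate. Keeping /boot on a plain mirror outside LVM matters because the old GRUB cannot read LVM.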
Thus I had two 500 GB disks mirroring each other and two 1.5 TB disks mirroring each other. Installation took a while and everything went smoothly. After setting up the basic necessities I tested the implementation. I pulled out the second 500 GB disk, /dev/sdc, and the system worked like a charm. I pulled out the second 1.5 TB disk and again the system worked. It was getting late and I had to hit the bed. Synchronization of the RAID was taking a long time after I plugged the disks back in, so I went home hoping the test would work just as well the other way round.
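While waiting for the mirrors to rebuild, the kernel reports the progress in /proc/mdstat. The helper below is a small sketch that boils that file down to one line per array. The function name mdstat_summary is my own invention, and it assumes the usual RAID 1 layout of /proc/mdstat, where a status such as [UU] means both halves are present and [U_] means one is missing.

```shell
#!/bin/sh
# Summarise software RAID status: one line per array with its member
# status ([UU], [U_], ...) and the resync/recovery percentage, if any.
# Reads /proc/mdstat by default; pass another file name to read that instead.
mdstat_summary() {
    awk '
        # Array header line, e.g. "md0 : active raid1 sdc1[1] sda1[0]"
        /^md[0-9]+ :/ { name = $1 }
        # Status line, e.g. "200704 blocks [2/2] [UU]"
        /blocks/ && match($0, /\[[U_]+\]/) {
            state[name] = substr($0, RSTART, RLENGTH)
            order[++n] = name
        }
        # Progress line, e.g. "[=>....]  recovery =  8.5% (...) finish=..."
        /(resync|recovery) *=/ {
            if (match($0, /[0-9.]+%/))
                progress[name] = substr($0, RSTART, RLENGTH)
        }
        END {
            for (i = 1; i <= n; i++) {
                a = order[i]
                printf "%s %s %s\n", a, state[a], (a in progress ? "syncing " progress[a] : "idle")
            }
        }
    ' "${1:-/proc/mdstat}"
}

# Usage on a live system:  mdstat_summary
```

Once every array shows its full set of U's and is idle again, it is safe to pull the next disk for testing.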
Next morning, I was back at work at 7. My boss came over at 9 and we tried to test it. I unplugged the main 500 GB disk and started the system. FLOP! The system did not boot; it got stuck in GRUB. All the system showed was
All efforts in vain, I thought, as my boss gave a cynical smile. I took a breather and resorted to consulting the Gandalf of all search engines, Google. And I was right to call it Gandalf: the solution popped up as the second entry. The Linux Raid wiki! You can refer to it for instructions on creating your own RAID system.
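From there I figured out that I had to configure the system to boot from multiple devices (see the “Prepare to Boot” subdivision). I had to make GRUB understand that if the first disk, /dev/sda, failed, it needed to use the second one, /dev/sdc. The installer had put GRUB only on the first disk, so the fix is to install it on the second one too. The device line in the middle tells GRUB to treat /dev/sdc as its first disk (hd0) for the second setup, which is exactly what that disk becomes once /dev/sda is gone. Executing these commands worked fine: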
grub --no-floppy <<EOF
root (hd0,0)
setup (hd0)
device (hd0) /dev/sdc
root (hd0,0)
setup (hd0)
EOF
It was testing time again. I unplugged the first disk, /dev/sda, and booted. At last, the system booted into CentOS 5.3 (yeah, I had updated it). I literally ran off shouting my boss’s name. The cynical grin on his face was gone once he had analysed it. Well, he was happy; he needed a fail-safe system.
Thus the server was up and all work returned to normal. I hope things keep working. One never knows, I might have to try RAID 5!