Thursday, May 31, 2007

FSVS for sysadmins

Notes: This entry was based on v1.1.4. The 1.1.5 and later versions of FSVS also place a few files in /etc/fsvs.

Original post follows

Okay, I'm taking another run at FSVS for system configuration management. The FSVS website has some documentation, but for the full documentation you'll want to download the source tarball and look in the "doc/" folder.

Resource links (and other useful information):

FSVS (fsvs.tigris.org) - the home page for FSVS. See also the Purpose of FSVS and Backup pages.

Subversion (software) - The Subversion explanatory page over at Wikipedia.

SSH tricks - One of the most important documents to read if you want to set up secure SSH access to your SVN server. It explains how to lock things down so that "/usr/bin/svnserve" is the only thing a client can run with a particular public key.

Setting up Subversion in Linux

Creating Subversion Repositories

Now, here's what I do know about FSVS:

- It uses Subversion for the backend storage, which means that if you already have an SVN server up and running, you can use it for FSVS storage. This also means that you could pull configuration files down onto your laptop with SVN to take a gander at the revision history of a particular file.

- FSVS doesn't pollute your directories with ".svn" folders. Instead, it keeps a central storage database elsewhere (by default this goes in /var/spool/fsvs, but you can move it). This WAA (Working copy Administrative Area) directory only contains file lists and hashes. It does not contain "pristine copies" of any files, so it will use up a lot less space than ".svn" folders.

- FSVS will keep track of file metadata (such as timestamps, chmod flags, etc. - see the FSVS website for particulars). I'm not sure whether this includes information needed by SELinux.

- You can use FSVS to push changes to machines. Not something that I'm interested in (yet), but I might use it down the road.

- And most (all?) tricks you can use in SVN repositories apply to FSVS, such as cloning a machine from another's configuration using "svn cp" (see the sketch after this list), comparing files between two machines, or creating a branch for a new configuration.

- FSVS allows you to make "empty" commits to the repository. If nothing has changed in the system and you do a fsvs ci -m "commit message" then FSVS will create a new revision in the repository, but with no actual changes. That may come in handy in certain circumstances.
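
For example, cloning can be a plain server-side "svn cp" between paths, as long as both machines live in the same repository ("svn cp" cannot copy across separate repositories, which is one argument for the shared-repository layout discussed below). The account name, hostname and paths here are made up:

# svn cp -m "Clone fw1's config as a starting point for fw2" \
    svn+ssh://sys-admin@svn.example.com/machines/fw1 \
    svn+ssh://sys-admin@svn.example.com/machines/fw2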

The method to my madness

My preference for system administration is to have a separate user account and SSH key for each machine that I manage. This allows me to use a no-password SSH key on the machine so that I can run svn/fsvs commands easily (or script them). Because the SSH server is locked down, and the keys are locked down with the "command=" syntax of SSH, I'm not terribly worried about keys that don't have passwords. Since SVN doesn't allow you to permanently delete files from a repository, there's a limited amount of damage that an attacker could do if they swipe the private key.

(Lastly, because the SSH private key is stored inside of /root/, it means they've cracked the server security already. We're just trying to limit the damage and keep them from being able to erase things on the central SVN server.)

Naturally, after talking about SSH keys, I'm only using the "svn+ssh" method of accessing the central repositories.

SSH Security considerations

On the SVN server, you should edit /etc/ssh/sshd_config and verify that the following are enforced in the SSH daemon configuration:

AllowTcpForwarding no
X11Forwarding no
PermitTunnel no

That eliminates most possible abuses, even if someone edits their ~/.ssh/authorized_keys file on the SVN server.

Creating a user account on the SSH server

Make sure you have a naming scheme in place. For us, regular developers and administrators get normal-looking usernames (e.g. "thomas" or "tgh" or "haroldt"). For machine accounts, we prefix the server name with "sys-" to create the username (e.g. "sys-fw1", "sys-mail1", "sys-gracie"). Which should make it easy to see if a machine has been added to groups that it shouldn't be in. Any groups that are used to control access to SVN directories are prefixed with "svn-repositoryname", such as "svn-sys-fw" (which owns the "sys-fw" repository).

A) On the client machine:

Login as root (or "su" to root).

# cd /root/
(skip the next 2 commands if the .ssh subfolder already exists)
# mkdir .ssh
# chmod 700 .ssh
# cd .ssh
# /usr/bin/ssh-keygen -N '' -C 'svn key for root@hostname' -t rsa -b 2048 -f root@hostname
# cp /root/.ssh/root@hostname /root/.ssh/id_rsa
# cat root@hostname.pub
(copy this into the clipboard or send it to the SVN server or the SVN server administrator)

B) On the SVN server

Note: You should use some sort of random password creator (or the output of /dev/random or /dev/urandom) to create a long password that can be copied and pasted into the password prompt. Since we're using SSH keys, the account doesn't need a password that anyone knows.

(I know there's a better way to make an account with an unguessable password, yet still allow SSH access via pub keys, but I can't find it at the moment.)
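
One candidate (untested here, so verify it on your distro before relying on it) is to skip the password entirely and lock the account's password field, since public-key logins never consult the password hash:

# useradd -m username
# passwd -l username
(the "-l" prefixes the shadow hash with "!", which blocks password logins)

The steps below use the long-random-password approach instead.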

# useradd -m username
(i.e. "useradd -m sys-fw1-pri")
# passwd username
(paste in a super-long randomized password, such as a few bytes from /dev/urandom shoved through md5sum)
# cd /home/username
# su username
$ mkdir .ssh
$ chmod 700 .ssh
$ cd .ssh
$ cat > root@hostname.pub
(paste in the public key file from the client system)
$ cat root@hostname.pub >> authorized_keys
$ chmod 600 *

Now to lock the key down, edit the ~/.ssh/authorized_keys file and put the following on the front of the key line that will be used by the client machine:

command="/usr/bin/svnserve -t -r /var/svn",no-agent-forwarding,no-pty

This forces the connection to run the "svnserve" command in tunnel mode, so this SSH key cannot be used to log in or run any other commands on the server. It also changes the SVN root path to /var/svn. You will also want to add "no-port-forwarding" and "no-X11-forwarding" if you have not disabled those in your /etc/ssh/sshd_config file.
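
The finished line in ~/.ssh/authorized_keys ends up looking something like this (key material elided; it is all one long line):

command="/usr/bin/svnserve -t -r /var/svn",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAAAB3Nza...== svn key for root@hostname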

We now have a user account that can be used with SVN. Go ahead and [Ctrl-D] to escape the "su username" session and get back into the root account.

Setting up the repository (on the SVN server)

Reasons to use a common repository for all similar machines:
- Ability to do a svn diff between two different machines
- Ability to clone machines ("svn cp")
- Backup scripts are less complex (fewer repositories)
- You can "svn diff" between machines

Reasons to use individual repositories
- Easy to secure using chmod/chown
- Easy to get report sizes on how much space a machine is using on the repository server
- Easy to dump/load an individual machine's repository
- Easy to take a machine offline and remove the repository to save space
- A machine's SSH key can only be used to look at their repository (unless you configure per-directory authentication in SVN)
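
If you do go the shared-repository route, you can still restrict each account with svnserve's path-based authorization (a sketch; this needs Subversion 1.3 or newer, and the path and account names below are made up). In the repository's conf/svnserve.conf:

[general]
authz-db = authz

Then in conf/authz:

[/]
* =
[/fw1]
sys-fw1 = rw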

Note: I'm using "username", "hostname" and "machinename" fairly interchangeably on this page. (Someday I'll go back and clean it up.)

On the SVN server:

# cd /var/svn
(your repositories may be stored elsewhere)
# svnadmin create /var/svn/sys-machinename
# chmod -R 770 sys-machinename
# chmod -R g+s sys-machinename/db
# chown -R sys-machinename:sys-machinename sys-machinename

Notes:
- A chmod of "770" allows read/write access to everyone in the same group.
- A chmod of "700" only allows read/write to the owner account.
- The third digit should probably always be "0" to prevent the repository from being world-readable.

Verifying the connection (on the client)

You will need to customize the following URL to point at your SVN repository location.

# svn info svn+ssh://sys-machinename@svn.intra.example.com/sys-machinename/

That should prompt you to accept the server's public key, then display a response from the SVN server. If things don't work, then you've got a connectivity problem (firewall), an account problem (wrong name? wrong key?) or a permissions problem (chmod or chown goofs?).

Installing FSVS (on the client)

Notes:

In order for the install to succeed, you must have installed the "subversion", "subversion-devel", "apr", "apr-devel", "gcc" and "ctags" packages. Two others that you need are "gdbm" and "pcre" (and the associated developer packages). There may be other dependencies that will also be installed that are required by those packages. The following command worked for me on CentOS5:

# yum install subversion subversion-devel ctags apr apr-devel gcc gdbm gdbm-devel pcre pcre-devel

Head on over to the official project page at freshmeat.net and download the tarball. This is currently "fsvs-1.1.4.tar.gz". Extract the tarball to a folder somewhere (e.g. /root/fsvs-1.1.4) and use a terminal session to go to that folder.

# cat README
(look for the section that talks about the install)
# cd src
# make
(you will receive a message that the Makefile has now been updated and that you need to run make again)
# make
(you should see a large stream of gcc output with the following at the end)
-rwxr-xr-x 1 root root 201510 May 31 09:54 fsvs
(you must see the above line to know that you got a good compile)
# cp fsvs /usr/local/bin

At this point, FSVS *should* be installed correctly.

Getting started with FSVS

By default, FSVS will want to use /var/spool/fsvs unless you define the "WAA" variable and point it somewhere else. So according to the README you will need to create that folder and then initialize it (which lets FSVS create the administrative files it needs).
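
If you do want it somewhere else, the README describes pointing the WAA variable at the new location before running any fsvs commands (a sketch; double-check the exact variable name against your version's README):

# mkdir -p /home/fsvs-waa ; chmod 700 /home/fsvs-waa
# export WAA=/home/fsvs-waa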

Note #1: I'm not sure how large /var/spool/fsvs will get. You may eventually want to break it off into a separate LVM volume of its own. Since it is mostly file lists and hashes, it shouldn't get too large, but you may want to put it on an ext3 volume with more inodes than normal.

Note #2: You should read both the README and the output of "fsvs help urls".

# mkdir -p /var/spool/fsvs
# chmod 700 /var/spool/fsvs
# cd /
# fsvs urls svn+ssh://username@machine/path/to/repos
(The "fsvs urls" won't display any confirmation text.)

The above will connect the root folder to the repository path. You could also do multiple URLs and only link sub-folders (such as /etc, /usr, /home) up against the SVN repository.

Setting up ignore filters

After telling FSVS that we want to use "/" as our working copy, we'll want to also tell it to ignore various directories and files. While this tends to be somewhat similar across Linux distributions, you should also plan on modifying this list to match your distribution.

Make sure you read "fsvs-*/doc/IGNORING" and the output of "# fsvs help ignore".

# fsvs ignore ./backup
# fsvs ignore ./dev
# fsvs ignore ./mnt
# fsvs ignore './proc/*'
# fsvs ignore ./sys
# fsvs ignore ./tmp
# fsvs ignore ./var/tmp
# fsvs ignore ./var/spool

In addition you may wish to initially ignore all of the binary file directories (such as ./lib, ./lib64, ./sbin, ./usr, ./var) and focus solely on /etc, /home, /boot and /root. That will give you a much slimmer listing when you "fsvs status" from the root directory.

You can use the "fsvs ignore dump" and "fsvs ignore load" commands to backup your listing, edit it, then load it back into FSVS. Note that you must be in the base directory of your working copy, otherwise "fsvs ignore dump" will return an empty listing.

# cd /
# fsvs ignore dump > ~/fsvs-ignore.txt
# vi ~/fsvs-ignore.txt
# sort ~/fsvs-ignore.txt | fsvs ignore load

My initial listing on CentOS5 is:

./backup
./dev
./lost+found
./media
./mnt
./proc/*
./selinux
./sys
./tmp
./var/named/chroot/proc
./var/spool
./var/tmp


Putting /etc under version control

Assuming that you set up FSVS where "/" (root) is the base of the working copy, we can now add the contents of /etc to SVN.

# cd /
# fsvs commit -m "Base check-in of /etc" /etc
# fsvs commit -m "Base check-in of /boot" /boot

If you want to deep-commit a single folder (such as /usr/local/sbin) without doing the intervening folders:

# cd /
# fsvs commit -m "Base check-in" /usr/local/sbin

Working with FSVS

Create a test file in /etc

# cat > /etc/testfile
foo
(end the file with Ctrl-D)
# fsvs commit -m "Checking in a test file" /etc/testfile

Now delete the test file

# rm /etc/testfile
# fsvs status .
D... 4 ./etc/testfile
.mC. dir ./etc
# fsvs commit -m "Removed test file" .
Committing to svn+ssh://sys-fw1-pri@svn.example.com/sys-fw1-pri
.mC. dir ./etc
D... 4 ./etc/testfile
committed revision 7 on 2007-06-04T00:17:13.808327Z as sys-fw1-pri


That just scratches the surface, but covers the majority of day-to-day use. Unlike SVN, FSVS knows (assumes) that when a file is missing, it should implicitly do a "delete" operation in the repository to make the repository match the file system.

Other useful commands to know are "fsvs unversion" and "fsvs diff".
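
Quick (hypothetical) examples of each:

# fsvs diff /etc/hosts
(shows pending local changes against the repository copy)
# fsvs unversion /etc/shadow
(keeps the file on disk, but drops it from version control at the next commit)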

Tuesday, May 29, 2007

SVK for system management

I'm a big fan of using a version control system in conjunction with system administration. There's a great feeling in knowing that even if I screw up a configuration file, I have an easily accessible way to revert or track changes. To accomplish this, I was using Subversion (SVN) as an administration tool.

However, SVN comes with some downsides, primarily that it creates ".svn" folders throughout the directory tree. Which can cause issues and maybe even lead to security holes (yes/no? I'm unsure about this).

So, maybe SVK is better suited.

Installing SVK on CentOS5

(See SVK - Distributed Version Control - Part I (Ron Bieber, 2004) for a good tutorial.)

1. Open up the package manager and make sure you have installed the following packages:

"subversion" (possibly not required)
"subversion-perl" (Perl bindings)

Or, using the command-line (for x86_64):

# yum install subversion.x86_64 subversion-perl.x86_64

At least... I think the above command works. I used the GUI package manager for this step.

2. Use Perl and CPAN to install the SVK system.

# perl -MCPAN -e 'install SVK'

You'll be presented with about a dozen questions, and you'll need to install all sorts of modules if this is your first time running that command. It's all pretty self-explanatory (and I was on the phone while doing that, so I wasn't able to jot everything down).

...

Well, after mucking with this for a few hours and getting self-test errors, I'm going to shelve this for now and go look at FSVS instead.

Remote install of CentOS5 using VNC

I haven't tried this yet, but there are times when it could come in handy.

Centos 5 vnc remote installation

Post details: Upgrading to CentOS4, over a remote vnc connection

I figured it was possible, I just hadn't looked.

Sunday, May 27, 2007

Squid, SELinux and using a separate volume for the cache_dir

This was a slightly tricky one. I'm running CentOS5 with SELinux and I was trying to set up Squid to put its cache_dir on an LVM volume (to keep it from using up space on the root partition).

# /etc/init.d/squid stop
# cd /var/spool
# lvcreate -L64G -nvar-spool-squid vg
# mke2fs -j /dev/vg/var-spool-squid
# mkdir /mnt/squid ; mount /dev/vg/var-spool-squid /mnt/squid
# cp -a /var/spool/squid/* /mnt/squid/
# cd /var/spool/squid
# rm -rf *
# cd /var/spool
# umount /mnt/squid
# mount /dev/vg/var-spool-squid squid
# /etc/init.d/squid start

Starting squid: /etc/init.d/squid: line 53: 9440 Aborted $SQUID $SQUID_OPTS >>/var/log/squid/squid.out 2>&1
[FAILED]

# tail /var/log/messages

May 27 21:50:48 fw1-hosho setroubleshoot: SELinux is preventing /usr/sbin/named (named_t) "write" access to named (named_conf_t). For complete SELinux messages. run sealert -l 663ea169-d194-4c49-a5bb-a6a4bb707990
May 27 22:39:26 fw1-hosho squid: cache_dir /var/spool/squid: (13) Permission denied

# /usr/bin/sealert -l 626e75b4-32aa-4a61-88f7-f36a68fecd35
Summary
SELinux is preventing access to files with the label, file_t.

Detailed Description
SELinux permission checks on files labeled file_t are being denied. file_t
is the context the SELinux kernel gives to files that do not have a label.
This indicates a serious labeling problem. No files on an SELinux box should
ever be labeled file_t. If you have just added a new disk drive to the
system you can relabel it using the restorecon command. Otherwise you
should relabel the entire file system.

Allowing Access
You can execute the following command as root to relabel your computer
system: "touch /.autorelabel; reboot"

Additional Information

Source Context user_u:system_r:squid_t
Target Context user_u:object_r:file_t
Target Objects /var/spool/squid/00 [ dir ]
Affected RPM Packages squid-2.6.STABLE6-4.el5 [application]
Policy RPM selinux-policy-2.4.6-30.el5
Selinux Enabled True
Policy Type targeted
MLS Enabled True
Enforcing Mode Enforcing
Plugin Name plugins.file
Host Name fw1-hosho.intra.example.com.
Platform Linux fw1-hosho.intra.example.com. 2.6.18-8.1.4.el5
#1 SMP Thu May 17 03:16:52 EDT 2007 x86_64 x86_64
Alert Count 10
Line Numbers

Raw Audit Messages

avc: denied { getattr } for comm="squid" dev=dm-0 egid=23 euid=23
exe="/usr/sbin/squid" exit=-13 fsgid=23 fsuid=23 gid=23 items=0 name="00"
path="/var/spool/squid/00" pid=9584 scontext=user_u:system_r:squid_t:s0 sgid=23
subj=user_u:system_r:squid_t:s0 suid=0 tclass=dir
tcontext=user_u:object_r:file_t:s0 tty=(none) uid=23


...

So, the problem is that SELinux had not yet been told to look at the newly created volume (an LVM volume mounted on /var/spool/squid). Fixing this was rather simple once I knew about the restorecon command.

# cd /var/spool/squid
# /usr/sbin/squid -z
# /sbin/restorecon -R *
# /etc/init.d/squid start
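
To double-check the labels afterward (a quick sanity check; the cache directories should come up with a squid-specific type rather than file_t):

# ls -dZ /var/spool/squid /var/spool/squid/00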

Friday, May 25, 2007

iSCSITarget on CentOS5

Setting up our test iSCSI SAN box this week. The original plans were to run this on top of Gentoo (which is very powerful and flexible) but after 3 years, I'm not very pleased with Gentoo as a server OS. Which is a whole different topic. So we've migrated over to using CentOS5, which is derived from Red Hat Enterprise Linux 5, a distro that is more suited for corporate use.

There's not much to talk about in terms of the base system. It's a pretty vanilla 64bit CentOS5 install (from DVD) running on a pair of dual-core Socket F Opterons. The primary packages that I've installed so far are "Yum Extender" (from stock repositories) and "rdiff-backup" (downloaded as an RPM). The OS runs on a 3-disk Software RAID1 (mirror, all drives active) for safety.

I use a semi-customized partition layout on the (3) operating system disks. I have:

a) /boot
b) / (root, the primary OS install area)
c) swap
d) a backup root partition (which is basically a clone of the primary, except for a small change in /etc/fstab) designed for quick recovery from a situation that would hose the primary root partition
e) /var/log (broken out to its own area)
f) /backup/system (a place to store system backups)
g) LVM area (no allocated areas yet)

I mention all that because the first step before installing iscsitarget is to make sure I can recover if things go awry. Since installing iscsitarget involves mucking with the running kernel, I want a good backup of /boot along with making sure GRUB offers me options to boot an older kernel. I'll also freshen my root backup partition.

Step 1 - Backing up /, /boot, and the existing kernel

Simplicity is often best when dealing with the base OS. My methods are crude, but designed to get me back up and running without needing much in the way of software. The primary requirement is a bootable USB pen drive or bootable LiveCD (such as RIPLinuX) with the necessary tools. You could also use the CentOS5 boot DVD.

I'll run with the CentOS install DVD since that's what I have sitting in the optical drive at the moment. When CentOS boots up, enter "linux rescue" at the boot prompt. Note: if you have multiple NICs installed, it's probably better not to start networking (because the CentOS rescue mode takes forever to initialize unconnected NICs).

Select "Skip" when asked about mounting the existing install at /mnt/sysimage. We'll be doing things our own way instead.

Start up Software RAID on the key partitions (/boot, /, the backup root, and the backup storage partition). The following commands will (usually) start up your existing RAID devices automatically.

# mdadm --examine --scan >> /etc/mdadm.conf
# mdadm --assemble --scan

In my case "md0" is /boot, "md2" is my base CentOS install, "md3" is the backup root partition, and "md5" is where I can store image files. So let's double-check that.

# mkdir /mnt/root ; mount /dev/md2 /mnt/root
# mkdir /mnt/backuproot ; mount /dev/md3 /mnt/backuproot

If we then examine the output of "df -h" or use "ls" on the mounted volumes, we can verify that we know which is which. Let's mount our backup area and create image files. I prefer to kick off the "dd" commands in the background so that I can monitor progress and keep multiple CPUs busy.

# mkdir /mnt/backup ; mount /dev/md5 /mnt/backup
# cd /mnt/backup ; mkdir images ; cd images
# dd if=/dev/md0 | gzip > dd-md0-boot-20070525.img.gz &
# dd if=/dev/md2 | gzip > dd-md2-root-20070525.img.gz &
# dd if=/dev/md3 | gzip > dd-md3-bkproot-20070525.img.gz &

We should also backup the master boot records on each of the hard drives in the unit.

# for i in a b c; do dd if=/dev/sd$i count=1 bs=512 of=dd-sd$i-mbr-20070525.img; done

Unfortunately, the CentOS5 DVD doesn't include tools like "G4L" (Ghost for Linux) or I'd make a second set of backup files using that. I may boot my RIPLinuX CD and see what tools are there. (Because you can never have too many backups.)

Now I can dump the contents of "md2" (the original root) to "md3" (our backup root). Unmount both copies first, since we're about to read and write the raw devices:

# umount /mnt/root /mnt/backuproot
# dd if=/dev/md2 of=/dev/md3

Now for some cleanup stuff...

# mount /dev/md3 /mnt/backuproot
# vi /mnt/backuproot/etc/fstab

We'll need to change any references to "md2" into "md3". Basically flip them around so that "md3" is the official root when /etc/fstab gets processed. I also like to change the prompt and system name to remind myself that I'm using the emergency system. Again, our primary goal is to be able to get a box back up and operational in the case where the primary root partition is hosed. Get it up quickly, then schedule some downtime to deal with it properly.

Now would also be a good time to tune the ext3 file system on your partitions.
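
For example (a sketch, entirely optional), labeling the backup filesystem and relaxing the forced-fsck schedule:

# tune2fs -L backuproot /dev/md3
# tune2fs -c 0 -i 0 /dev/md3
("-c 0" and "-i 0" disable the mount-count and time-interval fsck checks)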

The last thing we need to do is edit GRUB's configuration so that we can select our backup root OS from the selection menu.

# mkdir /mnt/boot
# mount /dev/md0 /mnt/boot
# vi /mnt/boot/grub/grub.conf

Things that we'll want to do here (you could also accomplish this by booting the server in normal mode and editing grub.conf there using a more comfortable text editor):

a) Change the timeout=5 value to timeout=15 (or 30 or 60). By default, CentOS doesn't give you very long to pick an alternate boot. I find 5 seconds to be too short a window, especially on a unit where the storage controller takes a minute or two to scan and set up the drives.

b) Copy the latest "title" section and change "root=/dev/md2" to "root=/dev/md3". I always make the "EMERGENCY" boot option the 2nd one in the list.
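
The added section ends up looking roughly like this (the kernel and initrd file names are from this particular install; yours will differ):

title CentOS EMERGENCY (backup root on /dev/md3)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-8.1.4.el5 ro root=/dev/md3
        initrd /initrd-2.6.18-8.1.4.el5.img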

# mkdir /mnt/backuproot
# mount /dev/md3 /mnt/backuproot
# vi /mnt/backuproot/etc/sysconfig/network

I like to change the hostname to have "-emergency" tacked onto the end. Which should make it fairly obvious that we are booting up in emergency mode using the backup root partition. I also edit root's .bash_profile to set PS1.

Okay, that was a lot of setup work just to prepare for implementing iSCSITarget (or any other kernel rebuild), but it's always worth it.

Final notes:

- When I test booted the emergency root partition, things didn't work as planned. So while my concept is sound, I may have screwed something up. I think it's an error with /etc/fstab in the emergency partition, so I'll troubleshoot that later.

- It's also possible that you'll need to do a GRUB install on all (3) of the primary mirror disks.

Step 2 - Downloading and compiling the iSCSITarget software

So far, I've found (2) links to be useful here. One is Moving on.... and the other is iSCSI Enterprise Target を CentOS5 にインストールする ("Installing iSCSI Enterprise Target on CentOS5"). While the 2nd link is in Japanese, it shows the commands in English.

Head over to The iSCSI Enterprise Target page and download the latest tarball containing the source code. The current version is 0.4.15. If you're using Firefox in CentOS's Gnome shell, it will probably prompt you to open the file with the archive manager. I created the subfolder /root/iscsitarget-0.4.15 and extracted the contents there.

You will also need to go into Applications -> Add/Remove Software and add the development tools and libraries to your system. (Mostly you just need gcc.)

As noted on jackshck's page, you will also need to install the following packages:

openssl-devel (I installed the x86_64 version)
kernel-devel (again, I'm using the x86_64 version)

Open up a terminal window and go to where you extracted the iscsitarget tarball (I put mine in /root/iscsitarget-0.4.15).

# ls -l /usr/src/kernels
(make note of the kernel folder)
# make KSRC=/usr/src/kernels/2.6.18-8.1.4.el5-x86_64/
# make KSRC=/usr/src/kernels/2.6.18-8.1.4.el5-x86_64/ install

Now we can start up the ietd daemon:

# /etc/init.d/iscsi-target start

And add it to our default runlevel (this is similar to the rc-update command in Gentoo Linux):

# chkconfig iscsi-target on

Step 3 - Creating a target

This is where we get into the nitty-gritty and where I need to take a break and do some research. The /etc/ietd.conf file already exists at this point, but only contains a commented out sample configuration.
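
To give a flavor of what goes in there, a minimal target definition looks roughly like this (the IQN and the LVM path are made-up placeholders):

Target iqn.2007-05.com.example:storage.test1
        Lun 0 Path=/dev/vg/iscsi-test,Type=fileio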

Notes:

Dec 21 2007 - The comment about iSCSITarget software for Microsoft Windows really isn't on-topic, but I'll go ahead and list the link to it (just not as an HTML link). Pricing for the real version is currently $395 (Server) or $995 (Professional). And personally, there's no way that I'd recommend running a SAN on top of Microsoft Windows (even Server 2003, which is a nice product).

Thursday, May 10, 2007

Dealing with a failed Software RAID device

As part of my server setup, I like to make sure that plans are working as expected... which means intentionally breaking things like RAID sets.

In this particular case I have a triple-active RAID1 mirror set on the first 3 disks in the system (/dev/sda, /dev/sdb, /dev/sdc). In this RAID1 set, all 3 disks are active, with no hot-spare. I prefer this over a (2) active (1) hot-spare setup because it allows for up to 2 disks to fail before you lose data. And if I'm already dedicating a hot-spare spindle solely to the RAID1 set, I may as well get to use it. The output of /proc/mdstat looks similar to the following (note that none of the slices are tagged with an "(S)"):

md2 : active raid1 sdc2[2] sdb2[1] sda2[0]
7911936 blocks [3/3] [UUU]


Each disk has quite a few md devices associated with it. In this particular case I have /dev/md0 up through /dev/md5 created. Probably one of the few downsides to Software RAID is that you end up with quite a few md devices to keep track of. But such is the price for just about the ultimate flexibility.

GRUB Note: In a mirrored setup, you must make sure to install GRUB to the MBR (master boot record) on all of the mirror disks. Some Linux distros don't do this on their own and you'll have to do it yourself. Otherwise, when the first disk in the mirror set fails, you'll find you're left with an unbootable system. This is also why I like to make copies of the MBR for each disk in the system (# dd if=/dev/sda of=dd-sda-mbr-date.img bs=512 count=1).
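
A sketch of doing that by hand from the grub shell (mapping each remaining disk in as hd0 so that its own MBR points at its own /boot; this assumes /boot is the first partition):

# grub
grub> device (hd0) /dev/sdb
grub> root (hd0,0)
grub> setup (hd0)
grub> device (hd0) /dev/sdc
grub> root (hd0,0)
grub> setup (hd0)
grub> quit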

So, after making very good backups, I decided it was time to test whether I could pull a disk and survive. To make sure that I had taken care of the GRUB issue, I shut down the server and pulled the primary drive in the RAID set.

# cat /proc/mdstat
md2 : active raid1 sdc2[2] sdb2[1]
7911936 blocks [3/2] [_UU]


Ah good, mdadm is *not* happy here (as expected). It knows that one of the disks in the array has failed. So let's shut down and replace the failed drive with a blank one. In this case, I used a spare drive I had lying around that had been previously wiped. (Or, with care, you could zero out the drive that you pulled.)

I recommend using "sfdisk" in dump mode to configure the new drive. So if your failed drive is "sda" and one of the good ones is "sdb", you could use:

# sfdisk -d /dev/sdb | sfdisk /dev/sda

After which, you can use the "mdadm" command to add the new slices to the existing RAID arrays.

# mdadm --add /dev/mdX /dev/sdYZ
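
With the layout from earlier in this post (md0 = /boot, md2 = root) and a replaced sda, that works out to something like (one array at a time):

# mdadm --add /dev/md0 /dev/sda1
# mdadm --add /dev/md2 /dev/sda2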

Last, don't forget to install GRUB to the MBR on the new disk.

Monday, May 07, 2007

Brute force disaster recovery for CentOS5

Today's trick is moving a CentOS5 system from an old set of disks over to a new set of disks. Along the way, I'll create an image of the system to allow me to restore it later on.

The CentOS5 system is a fresh install running RAID-1 across (3) disks using Linux Software RAID (mdadm). There are (4) primary partitions (boot, root, swap, LVM) with no data on the LVM partition.

(Why 3 active disks? The normal setup for this server was RAID-1 across 2 disks with a hot-spare. Rather than have a risky window of time where one disk has failed and the hot-spare is synchronizing with the remaining good disk, I prefer to have all 3 disks running. That way, when a disk dies, we still have 2 disks in action. The mdadm / Software RAID doesn't seem to care, and it doesn't seem to affect performance at all.)

Because this is RAID-1, capturing the disk layout and migrating over to the new disks will be very easy. It's also a very fresh install, so I'm just going to grab the disk contents using "dd" (most of the partition's sectors are still zeroed out from the original install). Once I've backed up the (3) partitions on the first drive, I'm going to pull the (3) drives and replace them with the new ones.

I'll get the machine up and running with the first replacement drive, then configure the blank 2nd and 3rd drives and add them to the RAID set. That is, if mdadm doesn't beat me to the punch and start the sync on the 2nd/3rd disks automatically.

If things go bad, I can always drop the original disks back in the unit and power it back up. I plan on keeping them around for a few days, just in case. I'll have to recreate the LVM volumes, but there aren't any yet (just a PV and a VG).

One advantage of pulling the old drives out completely and rebuilding using fresh drives - I'll end up with a tested disaster recovery process.

Now for the nitty gritty. I'm using a USB pocket drive formatted with ext3 for the rescue work. Make sure that you plug this in before booting the rescue CD.

  1. Login to the system and power it down.
  2. Boot the CentOS5 install DVD
  3. At the "boot:" enter "linux rescue"
  4. Work your way through the startup dialogs
  5. When prompted whether to mount your linux install, choose "Skip"

This should give you a command shell with useful tools. So let's poke around and check on our system.

  1. Looking at "cat /proc/mdstat" shows that while the mdadm software is running, it has not assembled any RAID arrays.
  2. The "fdisk -l" command shows us that the (3) existing disks are named sda, sdb, sdc. Each has (4) partitions (boot, root, swap, LVM).
  3. My USB drive showed up as "/dev/sdd" so I'll create a "/backup" folder and mount it using "mkdir /backup ; mount /dev/sdd1 /backup ; df -h"

Naturally, we should create a sub-folder under /backup for each machine and possibly create another folder underneath it using today's date. We should grab information about the current disk layout and store it in a text file (fdisk.txt).

  1. # cd /backup ; mkdir machinename ; cd machinename
  2. # mkdir todaysdate ; cd todaysdate
  3. # fdisk -l > fdisk.txt

Now to grab the boot loader and image the two critical partitions (boot and root). We'll grab the boot loader off of all (3) drives because it's so small (and it may not be properly synchronized).

  1. dd if=/dev/sda bs=512 count=1 of=machinename-date-sda.mbr
  2. dd if=/dev/sdb bs=512 count=1 of=machinename-date-sdb.mbr
  3. dd if=/dev/sdc bs=512 count=1 of=machinename-date-sdc.mbr
  4. dd if=/dev/sda1 | gzip > machinename-date-ddcopy-sda1.img.gz
  5. dd if=/dev/sda2 | gzip > machinename-date-ddcopy-sda2.img.gz

Total disk space for my system was around 1.75GB worth of compressed files (8GB root, 250MB boot). You could also use bzip2 if you need more compression. Unfortunately, the CentOS5 DVD does not include the "split" command, which could cause issues if you're trying to write to a filesystem that can't handle files over 2GB in size.

Now you should shut the box back down, burn those files to DVD-R, install the new (blank) disks, and boot from the install DVD again. Again, mount the drive that holds the rescue image files to a suitable path.

  1. dd of=/dev/sda bs=512 count=1 if=machinename-date-sda.mbr
  2. fdisk /dev/sda (fix the last partition)
  3. dd if=/dev/sda bs=512 count=1 of=/dev/sdb
  4. dd if=/dev/sda bs=512 count=1 of=/dev/sdc

That will restore the MBR and partition table from the old drive to the new one. If your new drive has a different size, then the last partition will be incorrectly sized for the disk. Fire up "fdisk" and delete / recreate the last partition on the disk.

Restore the two partition images:

  1. # gzip -dc machinename-date-ddcopy-sda1.img.gz | dd of=/dev/sda1
  2. # gzip -dc machinename-date-ddcopy-sda2.img.gz | dd of=/dev/sda2

At this point, we should be able to boot the system on the primary drive and have Software RAID come up in degraded mode for the arrays. Things that will need to be done once the unit boots:

  1. Tell mdadm about the 2nd (and 3rd) disks and tell it to bring those partitions into the arrays and synchronize them.
  2. Create a new swap area
  3. Recreate the LVM physical volume (PV) and volume group (VG)
  4. Restore any data from the LVM area (we had none in this example)

Getting the Software RAID back up and happy is the trickiest of the steps.

  1. Login as root, open up a terminal window
  2. # cat /proc/mdstat
  3. # fdisk -l
  4. Notice that the swap area on our system is sda3, sdb3, sdc3 and will need to be loaded as /dev/md1.
  5. # mdadm --create /dev/md1 -v --level=raid1 --raid-devices=3 /dev/sda3 /dev/sdb3 /dev/sdc3
  6. # mkswap /dev/md1 ; swapon /dev/md1
  7. Now we're ready to recreate the LVM area
  8. # mknod /dev/md3 b 9 3
  9. # mdadm --create /dev/md3 -v --level=raid1 --raid-devices=3 /dev/sda4 /dev/sdb4 /dev/sdc4
  10. # pvcreate /dev/md3 ; vgcreate vg /dev/md3
  11. Finally, we should add the 2nd and 3rd drives back into md0 and md2 (one at a time; see the note below).
  12. # mdadm --add /dev/md0 /dev/sdb1
  13. # mdadm --add /dev/md0 /dev/sdc1
  14. # mdadm --add /dev/md2 /dev/sdb2
  15. # mdadm --add /dev/md2 /dev/sdc2

Note: If your triple mirror RAID array puts the additional disks in as spares, make sure that you have (a) grown the number of raid devices to 3 for the RAID1 set and (b) made sure that there are no other arrays synchronizing at the same time. It's also best to add the elements one at a time, rather than adding both at the same time. I'm not sure if it's a bug in mdadm or just the way it works, but it took me two tries to get my triple mirror back up with all disks marked as "active" instead of (2) active and (1) hot-spare.
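
If the new members do land as spares, growing the array looks something like this (a sketch; verify the syntax against your mdadm version):

# mdadm --grow /dev/md2 --raid-devices=3
# cat /proc/mdstat
(watch the resync and confirm every member ends up active, not "(S)")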