Friday, May 20, 2011

FSVS on Ubuntu 10.04 LTS (Lucid Lynx)

(See also my older post on installing FSVS on CentOS 5.5.)

While I prefer CentOS / RHEL for our servers, I do have a few Ubuntu machines lying around that I use as desktops.  And given my desire to track things using FSVS as much as possible, that means I need to install FSVS under Ubuntu as well.

Note: While you can install FSVS with "apt-get install fsvs", the version currently in the Ubuntu repositories is only FSVS 1.1.17.  This is fairly old code from around 2008.  The latest version is 1.2.3, which was released in January 2011.

Step 1: Create the server user and repository

On our SVN server, we'll need to set up a user account and create a repository to hold the files.  All of our repositories are kept under /var/svn and we create users and groups named "svn-sys-somesystem".  The individual system repository gets named sys-somesystem.

# cd /var/svn
# svnadmin create sys-somesystem
# chmod -R 750 sys-somesystem
# chmod -R g+s sys-somesystem/db
# useradd -m svn-sys-somesystem
# chown -R svn-sys-somesystem:svn-sys-somesystem sys-somesystem
# passwd  svn-sys-somesystem
(give it a very long, very random password)
Changing password for user svn-sys-somesystem.
New UNIX password:
Retype new UNIX password:
passwd: all authentication tokens updated successfully.
# su svn-sys-somesystem
$ mkdir ~/.ssh
$ chmod 700 ~/.ssh
$ cd ~/.ssh

At which point we're ready to paste the SSH key from the other system in.  Switch to the system that you will be adding FSVS to.

Step 2: Setting up SSH keys

Log in to the system which you will be adding as an FSVS client.  Under Ubuntu, this means a lot of 'sudo' work.  Note that lines ending in '\' should be concatenated together to form a single command.  You'll need to create a .ssh/config file so that SSH knows how to talk to the SVN server.

$ sudo mkdir /root/.ssh
$ sudo chmod 700 /root/.ssh
$ sudo /usr/bin/ssh-keygen -N '' \
-C 'svn key for root@hostname' \
-t rsa -b 2048 -f /root/.ssh/fsvs-key
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/fsvs-key.
Your public key has been saved in /root/.ssh/fsvs-key.pub.
The key fingerprint is:
ff:ee:dd:cc:bb:aa:99:88:77:66:55:44:33:22:11:00 svn key for root@hostname
$ sudo vim /root/.ssh/config
Host svn.yoursvnserver.com
Port 22
User svn-sys-somesystem
IdentityFile /root/.ssh/fsvs-key
$ sudo chmod 600 /root/.ssh/config
$ sudo chmod 600 /root/.ssh/fsvs-key
$ sudo chmod 600 /root/.ssh/fsvs-key.pub
$ sudo cat /root/.ssh/fsvs-key.pub

Copy this key into the clipboard or send it to the SVN server or the SVN server administrator. Back on the SVN server, you'll need to finish configuration of the user that will add files to the SVN repository.

# su svn-sys-somesystem
$ cd ~/.ssh
$ cat >> ~/.ssh/authorized_keys

The line for the SSH key should start with the following, which locks down the SSH key a bit and should only allow it to be used to run /usr/bin/svnserve.

command="/usr/bin/svnserve -t -r /var/svn",no-agent-forwarding,no-pty,no-port-forwarding,no-X11-forwarding

So a full SSH key line in the authorized_keys files will end up looking like:


command="/usr/bin/svnserve -t -r /var/svn",no-agent-forwarding,no-pty,no-port-forwarding,no-X11-forwarding ssh-rsa (long SSH key) (ssh key comment)

Press Enter after the pasted key, then hit Ctrl-D to end the input.

$ chmod 600 ~/.ssh/authorized_keys
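
If you would rather not paste and edit the line by hand, the prefix can be glued onto the public key with a small helper.  This is only a sketch: restricted_key_line is a made-up function name, and /tmp/fsvs-key.pub is an assumed location for the key file copied over from the client.

```shell
#!/bin/sh
# Sketch: prefix a public key with the forced-command options shown above.
# restricted_key_line is a hypothetical helper, not part of OpenSSH.
SVN_OPTS='command="/usr/bin/svnserve -t -r /var/svn",no-agent-forwarding,no-pty,no-port-forwarding,no-X11-forwarding'

restricted_key_line() {
    # $1 = path to a .pub file that holds a single key line
    printf '%s %s\n' "$SVN_OPTS" "$(cat "$1")"
}

# Example, run as svn-sys-somesystem on the SVN server
# (the /tmp path is an assumption):
#   restricted_key_line /tmp/fsvs-key.pub >> ~/.ssh/authorized_keys
```

Appending with ">>" keeps any keys that are already in the file.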

Now we can go back to the client machine where FSVS will be installed and test that our SSH connection works.

$ sudo ssh svn.yoursvnserver.com
The authenticity of host '[svn.yoursvnserver.com]:22 ([192.168.0.1]:22)' can't be established.
RSA key fingerprint is 99:88:77:66:55:44:66:33:22:11:00:55:ff:ee:dd:aa.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[svn.yoursvnserver.com]:22,[192.168.0.1]:22' (RSA) to the list of known hosts.
PTY allocation request failed on channel 0
( success ( 2 2 ( ) ( edit-pipeline svndiff1 absent-entries commit-revprops depth log-revprops partial-replay ) ) ) Connection to svn.yoursvnserver.com closed.

If you don't get the SVN pipeline information, then the SSH keys are not configured properly, or you forgot to chmod a file back to 600 (usually the authorized_keys file).
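
Since wrong permissions are the usual culprit, a quick check along these lines may save some head-scratching.  It is a sketch only: check_perm is a hypothetical helper, and it compares the first ten characters of "ls -ld" output against the mode string you expect.

```shell
#!/bin/sh
# Sketch: report whether a file or directory has the expected mode.
# check_perm is a hypothetical helper name.
check_perm() {
    # $1 = path, $2 = expected mode string as shown by "ls -l"
    actual=$(ls -ld "$1" | cut -c1-10)
    if [ "$actual" = "$2" ]; then
        echo "ok    $1"
    else
        echo "WRONG $1 (is $actual, expected $2)"
        return 1
    fi
}

# Example, as the svn-sys-somesystem user on the SVN server:
#   check_perm ~/.ssh drwx------
#   check_perm ~/.ssh/authorized_keys -rw-------
```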

Step 3: Installing FSVS

The FSVS install tarball is available at fsvs.tigris.org.

$ cd /usr/local/src
$ sudo wget http://download.fsvs-software.org/fsvs-1.2.3.tar.bz2
$ sudo tar xjf fsvs-1.2.3.tar.bz2
$ sudo chown -R username:username fsvs-1.2.3/
$ cd fsvs-1.2.3/

Now we are ready to configure and compile FSVS.  The following command will check the environment and tell us whether libraries are missing.

$ ./configure

Since we already know that we'll need to install a bunch of things, here is the apt-get command.  Note that if you need to find a development version of a particular package, then "apt-cache search apr | grep 'dev'" may be useful.

$ sudo apt-get update
$ sudo apt-get install build-essential
$ sudo apt-get install libpcre3-dev
$ sudo apt-get install libaprutil1-dev
$ sudo apt-get install libsvn-dev
$ sudo apt-get install libgdbm-dev

Once all that is installed, the "./configure" should run cleanly.  If it doesn't, then you're probably missing some library and will have to add it.

$ ./configure
$ make

Which will compile and link the FSVS program.

$ sudo cp src/fsvs /usr/local/sbin/
$ sudo chown root:root /usr/local/sbin/fsvs
$ sudo chmod 700 /usr/local/sbin/fsvs

Step 4: Association with the SVN repository

$ cd /
$ sudo mkdir /var/spool/fsvs
$ sudo mkdir /etc/fsvs/
$ cd /
$ sudo fsvs urls svn+ssh://svn.yoursvnserver.com/sys-somesystem/

Step 5: Telling FSVS what to ignore

When constructing ignore patterns, it generally works best to add a few directories at a time to the SVN repository. Everyone has different directories that they won't want to version, so you'll need to tailor the following to match your configuration. However, I generally recommend starting with the following (this is the output from "fsvs ignore dump", which you can redirect into a file, edit, then feed back into "fsvs ignore load"):

group:ignore,./backup/
group:ignore,./bin/
group:ignore,./cdrom/
group:ignore,./dev/
group:ignore,./etc/fsvs/
group:ignore,./etc/gconf/
group:ignore,./etc/gdm/
group:ignore,./etc/shadow*
group:ignore,./etc/ssh/ssh_host_key
group:ignore,./etc/ssh/ssh_host_dsa_key
group:ignore,./etc/ssh/ssh_host_rsa_key
group:ignore,./home/
group:ignore,./lib/
group:ignore,./lib32/
group:ignore,./lib64/
group:ignore,./lost+found
group:ignore,./media/
group:ignore,./mnt/
group:ignore,./proc/
group:ignore,./root/
group:ignore,./sbin/
group:ignore,./selinux/
group:ignore,./srv/
group:ignore,./sys/
group:ignore,./tmp/
group:ignore,./usr/bin/
group:ignore,./usr/games/
group:ignore,./usr/include/
group:ignore,./usr/lib/
group:ignore,./usr/lib32/
group:ignore,./usr/lib64/
group:ignore,./usr/local/games/
group:ignore,./usr/sbin/
group:ignore,./usr/share/
group:ignore,./usr/src/
group:ignore,./var/backups/
group:ignore,./var/cache/
group:ignore,./var/games/
group:ignore,./var/lib/
group:ignore,./var/lock/
group:ignore,./var/log/
group:ignore,./var/mail/
group:ignore,./var/opt/
group:ignore,./var/run/
group:ignore,./var/spool/
group:ignore,./var/tmp/

$ vim ~/fsvs-ignores-201105
$ sudo fsvs ignore load < ~/fsvs-ignores-201105
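
If you have more directories to exclude, lines in the same "group:ignore,./path/" format can be generated rather than typed.  A minimal sketch follows; ignore_lines is a hypothetical helper, and the output is meant to be merged into the file you edit before loading it.

```shell
#!/bin/sh
# Sketch: print one ignore line per directory, in the format shown above.
# ignore_lines is a hypothetical helper name.
ignore_lines() {
    for dir in "$@"; do
        dir=${dir#/}    # strip one leading slash
        dir=${dir%/}    # strip one trailing slash
        printf 'group:ignore,./%s/\n' "$dir"
    done
}

# Example:
#   ignore_lines /opt/scratch /var/www/cache >> ~/fsvs-ignores-201105
```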

You can check what FSVS is going to version by using the "sudo fsvs status pathname" command (such as "fsvs status /etc"). Once you are happy with the selection in a particular path, you can do the following command:

$ sudo fsvs ci -m "base check-in" /etc

Repeat this for the various top level trees until you have checked everything in. Then you should do one last check-in at the root level that catches anything you might have missed.

Wednesday, May 18, 2011

Subversion - splitting apart a very large repository

Back when we started using SVN in 2006, we went for ease-of-use and easy administration by putting all of our projects into a single repository.  At the time it was a few gigabytes in size and not a big deal.  Fast-forward 4 years and we're starting to wish we had split the repository up by client / project boundaries.  The tree looked like:

/A/ClientA/ProjectA1
/A/ClientA/ProjectA2
/A/ClientA2/ProjectA2A
/B/ClientB/ProjectB1
...

So my current project is to take the 18GB repository with about 13,000 revisions and split it out and re-base the paths so that the project directories are the top level of the repository.  Unfortunately, over the years, files have been copied / moved, folders have vanished / moved / been renamed, etc., so there's the potential for interesting fun.  This is made even trickier since we're doing the re-base a few levels down.

Warning: When you do a split, the default result is that the new split repositories will have the same SVN repository UUID (unique ID) as the original repository.  That is why the last step in this process is "svnadmin setuuid /path/to/new/repo".  You can see the UUID of an existing repository by using "svnlook uuid /path/to/repo".

Step 1: Raw Dump

First off, I suggest making a raw dump of the original repository, piped through 'gzip' which will make the next few steps faster.  Naturally, if anyone commits things to the old repository after this point those changes won't be migrated.  So you will want to address that issue by limiting access to the original repository, or by working on the repository in sections and periodically updating your raw dump to capture new changes before you start on the next section.  For our purposes, we simply said "these particular projects are off-limits until Thursday" and worked on a set of projects each week.

Note: all of the following is a single command.

# svnadmin dump --quiet /path/to/svn-repo |
gzip > /path/to/svn-raw-dump-svn-repo.may2011.dump.gz

That will create a .gz file that is about 30-50% larger than the old repository.  Our gzip'd dump file ended up at 44% larger (26GB vs 18GB).  Without gzip, the uncompressed dump file would have been a lot larger (between 5x and 6x larger than the gzip'd file).  The main benefits are that it gives you a static source to work with, shortens up the later command lines slightly, and it's easier to see how all this works if you do it bit by bit.  You'll probably want to also copy that .gz file off to permanent archival storage after this is all done.

(bzip2 would have created a 15-20% smaller file, but it also would have taken 2x-3x longer to create the file.  As it is, the CPU was the bottleneck for creating this initial dump file and is the bottleneck in some other steps as well.)
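
The 44% figure is just (26 - 18) / 18.  If you want to plan disk space for your own repository, the same arithmetic can be sketched as below; pct_larger is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: how much larger (in whole percent) is the dump than the repository?
# pct_larger is a hypothetical helper name.
pct_larger() {
    # $1 = new size, $2 = old size, both in the same unit
    awk -v new="$1" -v old="$2" \
        'BEGIN { printf "%.0f\n", (new - old) / old * 100 }'
}

pct_larger 26 18    # prints 44
```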

Step 2: Filtering the dump file

This process breaks out the single project directory that we want and puts it in its own dump file.  We will repeat this command once for every project that we're breaking out to a separate repository.  We drop any empty revisions and renumber those that remain during this step.  It will renumber the revisions starting at 1 and the new file will end up with a much lower revision count.  We're not adjusting the paths within the repository during this step.


# gunzip -c /path/to/svn-raw-dump-svn-repo.may2011.dump.gz |
svndumpfilter include --quiet --drop-empty-revs --renumber-revs \
A/ClientA/ProjectA1 > /var/svn/svn-raw-ClientA-ProjectA1.dump

Notes:
- Leave the leading '/' off of the path that you want to include.
- Leave the trailing '/' off of the path that you want to include.
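
When the filter is repeated for many projects, it is easy to slip a stray slash into one of the paths.  A small normalizing step, sketched here with a hypothetical include_path helper, strips them before the path is handed to svndumpfilter.

```shell
#!/bin/sh
# Sketch: strip every leading and trailing '/' from a repository path.
# include_path is a hypothetical helper name.
include_path() {
    p=$1
    while [ "${p#/}" != "$p" ]; do p=${p#/}; done
    while [ "${p%/}" != "$p" ]; do p=${p%/}; done
    printf '%s\n' "$p"
}

# Example, using the svndumpfilter options from this step:
#   gunzip -c /path/to/svn-raw-dump-svn-repo.may2011.dump.gz |
#   svndumpfilter include --quiet --drop-empty-revs --renumber-revs \
#   "$(include_path /A/ClientA/ProjectA1/)" > /var/svn/svn-raw-ClientA-ProjectA1.dump
```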

Running a search on the new dump file reveals the new revision numbers.

# grep 'Revision-number' /var/svn/svn-raw-ClientA-ProjectA1.dump
...
Revision-number: 60
Revision-number: 61
Revision-number: 62
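
For a one-line summary instead of the full list, the same header lines can be counted.  This is a sketch with a hypothetical revision_summary helper; it assumes no file content inside the dump happens to start a line with "Revision-number: ".

```shell
#!/bin/sh
# Sketch: count the revisions in an uncompressed dump file.
# revision_summary is a hypothetical helper name.
revision_summary() {
    # $1 = path to the dump file
    awk '/^Revision-number: / { n++; last = $2 }
         END { printf "%d revisions, last is %s\n", n, last }' "$1"
}

# Example:
#   revision_summary /var/svn/svn-raw-ClientA-ProjectA1.dump
```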

Note that if you attempt to load the project-specific dump file into a new repository at this point it will fail.  That is because the parent directories do not exist in the repository that you are loading into.  But if you create those parent folders, you can then import the dump file into the new repository at this point.  I suggest creating a new scratch repository with "svnadmin create /var/svn/ProjectA-Test1", creating the necessary parent folders, and then running "svnadmin load /path/to/repo < /path/to/dump" to verify that you understand this step.
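
To see which parent folders a given project path needs, the path can be expanded into its ancestors.  The sketch below uses a hypothetical parent_dirs helper; the "svn mkdir" loop in the comment is one possible way to create the folders in the scratch repository, not the only one.

```shell
#!/bin/sh
# Sketch: print each ancestor directory of a path, shallowest first.
# parent_dirs is a hypothetical helper name.
parent_dirs() {
    path=${1%/*}
    [ "$path" = "$1" ] && return 0    # no '/' in the path: nothing to create
    acc=
    old_ifs=$IFS
    IFS=/
    for part in $path; do
        acc=${acc:+$acc/}$part
        printf '%s\n' "$acc"
    done
    IFS=$old_ifs
}

# Example against the scratch repository:
#   parent_dirs A/ClientA/ProjectA1 | while read -r d; do
#       svn mkdir -m "add parent $d" "file:///var/svn/ProjectA-Test1/$d"
#   done
```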


Step 3: Re-basing the project

Note: Depending on how many folder renames are in your original repository, you may have lots of trouble with the following.  In which case you should skip this and just load the dump file into the new repository without re-basing the paths.  Don't forget to change the UUID on the new repositories after loading.

The next step is to move A/ClientA/ProjectA1 back to the root of the repository during the import process.  We will do this by editing the dump file with 'sed' before loading it back in.  In the dump file, there are two types of lines that contain path information.  One starts with 'Node-path:' and the other starts with 'Node-copyfrom-path:'.  This is how 'svnadmin load' keeps track of what goes where in the repository tree.

# grep '^Node-path:' /var/svn/svn-raw-ClientA-ProjectA1.dump
Node-path: A/ClientA/ProjectA1
Node-path: A/ClientA/ProjectA1/Data
Node-path: A/ClientA/ProjectA1/Doc
Node-path: A/ClientA/ProjectA1/Trunk
...
# grep '^Node-copyfrom-path:' /var/svn/svn-raw-ClientA-ProjectA1.dump


Notes:
- There is never a leading slash ('/') and never a trailing slash ('/').
- The Node-path: argument cannot be empty.
- The parent directory must already exist in the SVN repository in order for a load to succeed.  So in order to load the above node paths, you would have to manually create the "A/ClientA" directory tree first.


As stated, we can use 'sed' to transform these path names on the fly.  And the following set of lines is all a single command.

# cat /var/svn/svn-raw-ClientA-ProjectA1.dump |
sed 's/^Node-path: A\/ClientA\//Node-path: /' |
sed 's/^Node-copyfrom-path: A\/ClientA\//Node-copyfrom-path: /' > \
/var/svn/svn-newbase-ClientA-ProjectA1.dump

So if a line reads "Node-path: A/ClientA/ProjectA1" in the input, it will look like "Node-path: ProjectA1" in the output.
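
It is worth trying the substitution on a few sample lines before running it over a multi-gigabyte file.  The sketch below wraps it in a hypothetical rebase_paths helper and anchors the patterns at the start of the line, so that text which merely mentions "Node-path:" is left alone.  It assumes the prefix contains no characters that are special in a sed regular expression.

```shell
#!/bin/sh
# Sketch: strip a leading path prefix from the two kinds of path lines.
# rebase_paths is a hypothetical helper name.
rebase_paths() {
    # $1 = prefix to strip, without leading or trailing '/', e.g. A/ClientA
    # '|' is the sed delimiter, so slashes in the prefix need no escaping.
    sed -e "s|^Node-path: $1/|Node-path: |" \
        -e "s|^Node-copyfrom-path: $1/|Node-copyfrom-path: |"
}

# Try it on three sample lines; only the first two should change.
printf '%s\n' \
    'Node-path: A/ClientA/ProjectA1/Trunk' \
    'Node-copyfrom-path: A/ClientA/ProjectA1/Doc' \
    'text mentioning Node-path: A/ClientA/ProjectA1' | rebase_paths A/ClientA
```

Even with the anchors, a versioned file whose content has a line starting with "Node-path: A/ClientA/" would still be altered, so check the load output carefully.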

Now you can load this into the new repository.

# svnadmin load /path/to/new/repo < /var/svn/svn-newbase-ClientA-ProjectA1.dump


Step 4: Changing the UUID of the new repository

As I mentioned before, when you do a split like this, the repository UUID will end up as the UUID of the original repository after the "svnadmin load" step.  You can verify this behavior using the "svnlook uuid /path/to/repo" command.  You can change the UUID manually, or just have a new one assigned automatically with the "svnadmin setuuid" command.

# svnadmin setuuid /path/to/new/repo

Step 5: Verify the new repository, make backups

After you load the new repository, take an hour and verify that all of the project folders made it intact and that the version history is intact.  Then make a backup of the new repository.

Tuesday, May 10, 2011

Linux Rescue: mdadm will not assemble arrays

I'm in the process of trying to fix a server with a dead motherboard where we used software RAID.  During one of the repair steps, I had to boot the CentOS install CD and use the "linux rescue" option because I needed to edit files on the hard drive to reflect the new disk layout.

Normally, you would expect this to be dead easy: just tell mdadm to assemble the arrays that you're interested in and then mount the file systems.  Instead, you're going to end up stuck on the following error:

mdadm --assemble /dev/md# /dev/sda# /dev/sdb#
mdadm: no devices found for /dev/md#

Why?  Because in CentOS / Red Hat rescue mode, mdadm requires the use of UUIDs and not device partition names.  So for every array that you want to mount (if you don't let the rescue CD do it automatically), you will have to use the format:

mdadm --assemble --uuid 01234568:90abcdef:01234568:90abcdef /dev/md#

To get the UUID, you can use the --examine option of mdadm.

mdadm --examine /dev/sda# | grep 'UUID'
UUID : 01234568:90abcdef:01234568:90abcdef
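
Typing a UUID by hand in a rescue shell is error-prone, so the two commands can be chained.  This is a sketch only: md_uuid is a hypothetical helper, and /dev/sda1 and /dev/md0 in the example stand in for your own devices.

```shell
#!/bin/sh
# Sketch: pull the first UUID out of "mdadm --examine" output on stdin.
# md_uuid is a hypothetical helper name.
md_uuid() {
    awk '/UUID :/ { sub(/^.*UUID : /, ""); print $1; exit }'
}

# Example (device names are placeholders):
#   mdadm --assemble --uuid "$(mdadm --examine /dev/sda1 | md_uuid)" /dev/md0
```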

(This occurred with the CentOS 5.5 install CDs.)