Tuesday, June 27, 2006

DHCP and Dynamic DNS Updates with BIND

After a few hours last night (and a lot of help from DNS & BIND Cookbook) I finally got my DHCP server on the Gentoo box to automatically add records to my local LAN's DNS zone. It's not terribly difficult to do, just a little tricky until you get all the ducks in a row and put everything in the right place.

First off, I can't recommend enough making use of SubVersion to keep track of configuration / zone file changes. It greatly simplifies things in case you have to revert to a previous version. It's also a good way to get comfortable with SVN command-line usage because you're not doing anything complex (mostly 'svn add' and 'svn ci' commands).

Requirements (I will assume the following):

  • net-misc/dhcp-3.0.1-r1 or later
  • net-dns/bind-9.3.2 or later
  • net-dns/bind-tools-9.2.5 or later
  • Simple network configuration using a single Class-C private network range
  • You have SubVersion configured
  • You have configured symbolic links to map /etc/named to /etc/bind (or vice-versa), /var/named to /var/bind (or vice-versa), and /var/log/named to /var/log/bind (or vice-versa).


Most home / small office networks use the DHCP server on the router (usually a LinkSys, D-Link, NetGear, etc. appliance between the internet and the internal LAN) and make use of their ISP's DNS servers. This works well until you want to reference other machines on your local network by name. Once you need that, you should look into setting up DNS and DHCP services on your Gentoo/Linux server.

A. The first step is to define the address range for your home network. One suggestion that I make to ALL of my clients is to use anything other than "0" or "1" in the 3rd octet of the network address (i.e. avoid 192.168.0.XXX and 192.168.1.XXX). Those are the defaults on most consumer routers, so having zero or one in the 3rd octet causes address conflicts later if you want to link two networks together using a VPN tunnel. So while you're making the change-over to a new DHCP/DNS server you may as well change your network address assignments.

For the example network, I will be using an address range of: 192.168.102.XXX. In addition, I've defined:

192.168.102.1 - The default gateway (internal address of the router)
192.168.102.2 - Static address of our Linux server
192.168.102.100 to .199 - DHCP address range

The DNS zone for my internal network is "lan.example.com", so each machine will get a DNS name of "machine-a.lan.example.com" when it gets assigned a DHCP address. This makes it easy for one machine to contact another machine on the same network segment.

B. Make sure /etc/bind and /etc/dhcp are in SubVersion

# cd /etc
etc # svn add -N bind
etc # svn add -N named
etc # svn add -N dhcp
etc # svn ci -m "initial entry of dynamic DNS configuration"
etc # svn add bind/*
etc # svn add dhcp/*
etc # svn ci -m "initial entry of dynamic DNS configuration"
etc # cd /var
var # svn add -N bind
var # svn add -N named
var # svn ci -m "initial entry of dynamic DNS configuration"
etc # cd /var/bind
bind # svn add *
bind # svn ci -m "initial entry of dynamic DNS configuration"


C. Create a symmetric authentication (TSIG) key

In order for the DHCP server to update the DNS zone files in a secure manner, you need to create a symmetric key using the "dnssec-keygen" command. BIND uses it as a shared secret to authenticate (sign) the updates, not to encrypt them. For HMAC-MD5 the key can be anywhere from 1 to 512 bits, but I would recommend at least 128 bits if not 256 bits. The "dnssec-keygen" command will create a pair of files that contain the key (since it's a symmetric key, both files hold the same secret).

# cd /etc/bind
bind # dnssec-keygen -a HMAC-MD5 -b 256 -n HOST dhcp.lan.example.com
Kdhcp.lan.example.com.+157+20479
bind # ls -l Kdhcp*
-rw------- 1 root root 84 Jun 27 16:24 Kdhcp.lan.example.com.+157+20479.key
-rw------- 1 root root 101 Jun 27 16:24 Kdhcp.lan.example.com.+157+20479.private
bind # cat Kdhcp.lan.example.com.+157+20479.key
dhcp.lan.example.com. IN KEY 512 3 157 swxJJ6mo6tAoSlAlUv6yGxvbCz5DKCLX1FF3U4Jl4Qc=
bind # cat Kdhcp.lan.example.com.+157+20479.private
Private-key-format: v1.2
Algorithm: 157 (HMAC_MD5)
Key: swxJJ6mo6tAoSlAlUv6yGxvbCz5DKCLX1FF3U4Jl4Qc=
bind # svn add K*
A Kdhcp.lan.example.com.+157+20479.key
A Kdhcp.lan.example.com.+157+20479.private
bind # svn ci -m "created DDNS update key"
Adding bind/Kdhcp.lan.example.com.+157+20479.key
Adding bind/Kdhcp.lan.example.com.+157+20479.private
Transmitting file data ..
Committed revision 10.
bind #


Nothing terribly complicated here. Just make sure that you replace "dhcp.lan.example.com" with your DHCP server's FQDN (fully qualified domain name).
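
The secret is what gets pasted into the "key" block of named.conf in the next step. Here is a small sketch for pulling it out of the ".private" file instead of copying it by hand. The function name tsig_secret is my own invention, not part of BIND, and it assumes the file layout shown in the listing above.

```shell
#!/bin/sh
# tsig_secret is a hypothetical helper (not part of BIND).
# It prints the base64 secret stored in a dnssec-keygen ".private" file.
tsig_secret() {
    # The secret is the second field of the line that starts with "Key:"
    awk '$1 == "Key:" { print $2 }' "$1"
}

# Usage (file name taken from the listing above):
#   tsig_secret /etc/bind/Kdhcp.lan.example.com.+157+20479.private
```

Reading the value from the file avoids copy/paste mistakes such as dropping the trailing "=" padding.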

D. Now we can construct the named.conf file. I use a semi-complex method with sub-files in order to make things simpler in the long run.

You'll need to replace 192.168.102.XXX with your local LAN address range and change "dhcp.lan.example.com" to your DHCP server's FQDN.

bind # vim named.conf
options {
directory "/var/named"; // sets root dir, use full path to escape
statistics-file "/var/named/named.stats"; // stats are your friend
dump-file "/var/named/named.dump";
zone-statistics yes;
allow-recursion { 127.0.0.1; 192.168.102.0/24; }; // allow recursive lookups
// allow-transfer { 192.168.102.3; }; // allow transfers to these IP's
// notify yes; // notify the above IP's when a zone is updated
// location of pid file:
pid-file "/var/run/named/named.pid";
transfer-format many-answers; // Generates more efficient zone transfers
};

key dhcp.lan.example.com. {
algorithm hmac-md5;
secret "swxJJ6mo6tAoSlAlUv6yGxvbCz5DKCLX1FF3U4Jl4Qc=";
};

// Include logging config file
include "/var/named/conf/logging.conf";

// Include ACLs
include "/var/named/conf/acls.conf";

// Include custom
include "/var/named/conf/lan.conf";
include "/var/named/conf/reverse.conf";
bind # svn ci -m "updating for DDNS"


E. Create the logging.conf and acls.conf files in /var/bind/conf.

# cd /var/bind/conf
conf # vim logging.conf
logging {

channel default_file { file "/var/log/named/default.log" versions 3 size 5m; severity dynamic; print-time yes; };
channel general_file { file "/var/log/named/general.log" versions 3 size 5m; severity dynamic; print-time yes; };
channel database_file { file "/var/log/named/database.log" versions 3 size 5m; severity dynamic; print-time yes; };
channel security_file { file "/var/log/named/security.log" versions 3 size 5m; severity dynamic; print-time yes; };
channel config_file { file "/var/log/named/config.log" versions 3 size 5m; severity dynamic; print-time yes; };
channel resolver_file { file "/var/log/named/resolver.log" versions 3 size 5m; severity dynamic; print-time yes; };
channel xfer-in_file { file "/var/log/named/xfer-in.log" versions 3 size 5m; severity dynamic; print-time yes; };
channel xfer-out_file { file "/var/log/named/xfer-out.log" versions 3 size 5m; severity dynamic; print-time yes; };
channel notify_file { file "/var/log/named/notify.log" versions 3 size 5m; severity dynamic; print-time yes; };
channel client_file { file "/var/log/named/client.log" versions 3 size 5m; severity dynamic; print-time yes; };
channel unmatched_file { file "/var/log/named/unmatched.log" versions 3 size 5m; severity dynamic; print-time yes; };
channel queries_file { file "/var/log/named/queries.log" versions 3 size 5m; severity dynamic; print-time yes; };
channel network_file { file "/var/log/named/network.log" versions 3 size 5m; severity dynamic; print-time yes; };
channel update_file { file "/var/log/named/update.log" versions 3 size 5m; severity dynamic; print-time yes; };
channel dispatch_file { file "/var/log/named/dispatch.log" versions 3 size 5m; severity dynamic; print-time yes; };
channel dnssec_file { file "/var/log/named/dnssec.log" versions 3 size 5m; severity dynamic; print-time yes; };
channel lame-servers_file { file "/var/log/named/lame-servers.log" versions 3 size 5m; severity dynamic; print-time yes; };

category default { default_file; };
category general { general_file; };
category database { database_file; };
category security { security_file; };
category config { config_file; };
category resolver { resolver_file; };
category xfer-in { xfer-in_file; };
category xfer-out { xfer-out_file; };
category notify { notify_file; };
category client { client_file; };
category unmatched { unmatched_file; };
category queries { queries_file; };
category network { network_file; };
category update { update_file; };
category dispatch { dispatch_file; };
category dnssec { dnssec_file; };
category lame-servers { lame-servers_file; };

};
conf # vim acls.conf
acl "our-networks" {
192.168.102.0/24;
127.0.0.1;
};
conf #


F. Create the two config files for the LAN and the reverse DNS.

# cd /var/bind/conf
conf # vim lan.conf
zone "lan.example.com" {
type master;
file "lan/lan.example.com";
update-policy {
grant dhcp.lan.example.com. wildcard *.lan.example.com. A TXT;
};
};
conf # vim reverse.conf
zone "102.168.192.in-addr.arpa" {
type master;
file "reverse/192.168.102.0";
update-policy {
grant dhcp.lan.example.com. wildcard *.102.168.192.in-addr.arpa. PTR;
};
};
conf # svn add *
conf # svn ci -m "updating config for DDNS"


Make sure that you set the user/group ownership of the config files to "named". In the listing below, reverse.conf is still owned by root, which the chown/chgrp commands fix:

conf # ls -l *.conf
total 16
-rw-r--r-- 1 named named 70 Dec 12 2005 acls.conf
-rw-r--r-- 1 named named 214 Jun 27 18:28 lan.conf
-rw-r--r-- 1 named named 2662 Dec 12 2005 logging.conf
-rw-r--r-- 1 root root 224 Jun 27 18:28 reverse.conf
conf # chown named *.conf
conf # chgrp named *.conf


G. Create the zone files for the forward and reverse DNS zones

Note: Remember to add folders and files to SubVersion and to assign the user/group to "named".

# cd /var/bind/lan
lan # vim lan.example.com
$ORIGIN .
$TTL 600 ; 10 minutes
lan.example.com IN SOA dhcp.lan.example.com. dns.example.com. (
2006062604 ; serial
3600 ; refresh (1 hour)
900 ; retry (15 minutes)
1209600 ; expire (2 weeks)
3600 ; minimum (1 hour)
)
NS dhcp.lan.example.com.
A 192.168.102.2
$ORIGIN lan.example.com.
dhcp A 192.168.102.2
router A 192.168.102.1
localhost A 127.0.0.1
lan # cd /var/bind/reverse
reverse # vim 192.168.102.0
$TTL 600
; 192.168.102.0 (reverse DNS)
@ IN SOA dhcp.lan.example.com. (
dns.example.com.
2006062602 ; serial
1h ; refresh
15m ; retry
2w ; expire
1h ; minimum
)
IN NS dhcp.lan.example.com.

; static PTR records
2.102.168.192.in-addr.arpa. IN PTR servername.lan.example.com.
2.102.168.192.in-addr.arpa. IN PTR dhcp.lan.example.com.
reverse #
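
The steps above only cover the BIND side. The DHCP server also has to be told which key to use and where to send its updates. What follows is NOT my actual dhcpd.conf; it is a sketch of what I would expect an ISC dhcpd 3.0.x configuration to look like for the example network, using the addresses, zone names and key from the earlier steps. The helper name print_dhcpd_conf and every setting in the heredoc are assumptions, so check them against the dhcpd.conf man page before using them.

```shell
#!/bin/sh
# print_dhcpd_conf is a hypothetical helper that writes an example
# dhcpd.conf to stdout. Only the addresses, zone names and key come
# from the steps above; the settings themselves are assumptions.
print_dhcpd_conf() {
    cat <<'EOF'
# Assumed ISC dhcpd 3.0.x configuration for dynamic DNS updates
ddns-update-style interim;
ddns-domainname "lan.example.com";
authoritative;

# Same name and secret as the key block in named.conf
key dhcp.lan.example.com. {
    algorithm HMAC-MD5.SIG-ALG.REG.INT;
    secret swxJJ6mo6tAoSlAlUv6yGxvbCz5DKCLX1FF3U4Jl4Qc=;
};

# Send updates for both zones to the local BIND instance
zone lan.example.com. {
    primary 127.0.0.1;
    key dhcp.lan.example.com.;
}

zone 102.168.192.in-addr.arpa. {
    primary 127.0.0.1;
    key dhcp.lan.example.com.;
}

subnet 192.168.102.0 netmask 255.255.255.0 {
    range 192.168.102.100 192.168.102.199;
    option routers 192.168.102.1;
    option domain-name-servers 192.168.102.2;
    option domain-name "lan.example.com";
}
EOF
}

# Print to the screen for review; redirect to a file once it looks right.
print_dhcpd_conf
```

The key name and secret have to match the named.conf key block exactly, otherwise BIND will reject the updates.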


H. That should be it

That should be everything that is needed. If not, leave me a note in a comment and I'll re-examine my notes.
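
If updates don't show up, it helps to test the BIND side on its own with nsupdate before blaming the DHCP server. The sketch below only prints an nsupdate batch. The helper name ddns_test_batch, the host name "testhost" and the address 192.168.102.250 are made-up test values, and I'm assuming named is listening on 127.0.0.1.

```shell
#!/bin/sh
# ddns_test_batch is a hypothetical helper that prints an nsupdate batch
# adding one A record to the forward zone. Host and address are test values.
ddns_test_batch() {
    host=$1
    addr=$2
    cat <<EOF
server 127.0.0.1
zone lan.example.com
update add ${host}.lan.example.com. 600 A ${addr}
send
EOF
}

# Print the batch for inspection. To apply it, pipe it into nsupdate with
# the key generated in step C, then look the name up:
#   ddns_test_batch testhost 192.168.102.250 | \
#       nsupdate -k /etc/bind/Kdhcp.lan.example.com.+157+20479.private
#   dig @127.0.0.1 testhost.lan.example.com A +short
ddns_test_batch testhost 192.168.102.250
```

If the update is refused, the update.log and security.log files defined in logging.conf are the first place to look.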

Monday, June 26, 2006

SPF gaining traction?

As I was signing up for a few mailing lists, I was glancing through the confirmation / welcome messages and I see that there are now some SPF headers showing up in those checks. The ezmlm list manager program includes the full body of the subscribe/confirmation message in its responses:

Received-SPF: pass (asf.osuosl.org: domain of tgh@tgharold.com designates 69.36.9.168 as permitted sender)
Received: from [69.36.9.168] (HELO omega.jtlnet.com) (69.36.9.168)
by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 25 Jun 2006 19:04:13 -0700


A quick glance through my other mailing list confirmation messages didn't turn up any more. Other mailing list software doesn't reflect back the sign-up mail message so it's not possible to see if SPF checks are being used or not.

I run a very strict SPF record for this domain. I wish companies would check the SPF record before bouncing messages back to me (due to joe-jobs by spammers).

Tuesday, June 20, 2006

VIA EPIA Gentoo Migration

Until now, I've been using a VIA EPIA system as my music server, but now I'm thinking about turning it into a smart router for the home office. This will entail adding an Ethernet card to the unit in addition to migrating from a pair of 300GB hard drives to a pair of notebook drives (for less power). Since I'm using Software RAID, moving from one set of disks to another should be nearly seamless.

The hardware has changed a little bit since my previous attempt at building the system back in March 2005, but the BIOS settings are identical. The current hardware consists of:

(1) VIA EPIA ME6000 (EPIA M series), 600MHz fanless CPU
(2) 300GB 5400rpm hard drives
(1) DVD-ROM
(1) Morex Venus 668 Black Case
(1) 1GB PC2100 DIMM

I'm going to replace the two 300GB 3.5" drives with less power-hungry 60GB laptop drives. The basic process is:

  1. Detach the DVD-ROM (which happens to be the master drive on the 2nd cable) and connect the laptop drive. That will allow me to work with the new drives one at a time while I migrate from the old to the new.
  2. Copy the boot sector from the old /dev/hda to the new laptop drive: dd if=/dev/hda bs=512 count=1 of=/dev/hdc
  3. Verify that the first (3) partitions on the new drive are sized identically to those on the original drive.
  4. Copy the filesystem from the old drive to the new drive: dd if=/dev/hda1 of=/dev/hdc1
  5. Install grub on the new disk.
  6. Shut down
  7. Remove the old /dev/hda 300GB drive, move the new laptop drive into place
  8. Restart the system, verify the Software RAID. I had to tell mdadm to add the /dev/hda partitions to the arrays: mdadm /dev/md0 -a /dev/hda1


Note: I forgot to install grub on the new disk this last time. So I need to boot from the LiveCD, chroot into the O/S and re-install grub on the new disks.
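
Since the grub step is the one I keep forgetting, here is a sketch of it. It assumes GRUB legacy (the 0.9x series) and that /boot is the first partition on the new disk. The helper name grub_setup_batch is hypothetical, and the function only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# grub_setup_batch is a hypothetical helper that prints GRUB (legacy)
# shell commands for installing the boot loader on one disk.
# Assumption: /boot is the first partition of that disk.
grub_setup_batch() {
    disk=$1
    cat <<EOF
device (hd0) ${disk}
root (hd0,0)
setup (hd0)
quit
EOF
}

# Review the commands first; to run them, pipe into "grub --batch":
#   grub_setup_batch /dev/hdc | grub --batch
grub_setup_batch /dev/hdc
```

Mapping the new disk to (hd0) with the "device" command means the boot loader is written as if the disk were already the first drive, which is where it ends up after the swap.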

hde: dma_intr: bad DMA status (dma_stat=35)

Getting the following messages in my system log:

nogitsune etc # grep 'Jun 16 07' /var/log/messages
Jun 16 07:38:51 nogitsune ntpd[6095]: peer 205.166.121.66 now invalid
Jun 16 07:52:27 nogitsune ntpd[6095]: peer 205.166.121.66 now valid
Jun 16 07:07:52 nogitsune hde: dma_intr: bad DMA status (dma_stat=35)
Jun 16 07:07:52 nogitsune hde: dma_intr: status=0x50 { DriveReady SeekComplete }
Jun 16 07:07:52 nogitsune ide: failed opcode was: unknown
Jun 16 07:16:09 nogitsune hde: dma_intr: bad DMA status (dma_stat=35)
Jun 16 07:16:09 nogitsune hde: dma_intr: status=0x50 { DriveReady SeekComplete }
Jun 16 07:16:09 nogitsune ide: failed opcode was: unknown
Jun 16 07:17:10 nogitsune hde: dma_intr: bad DMA status (dma_stat=35)
Jun 16 07:17:10 nogitsune hde: dma_intr: status=0x50 { DriveReady SeekComplete }
Jun 16 07:17:10 nogitsune ide: failed opcode was: unknown
Jun 16 07:18:13 nogitsune hde: dma_intr: bad DMA status (dma_stat=35)
Jun 16 07:18:13 nogitsune hde: dma_intr: status=0x50 { DriveReady SeekComplete }
Jun 16 07:18:13 nogitsune ide: failed opcode was: unknown
Jun 16 07:23:35 nogitsune hdg: dma_intr: bad DMA status (dma_stat=35)
Jun 16 07:23:35 nogitsune hdg: dma_intr: status=0x50 { DriveReady SeekComplete }
Jun 16 07:23:35 nogitsune ide: failed opcode was: unknown
Jun 16 07:25:52 nogitsune hdg: dma_intr: bad DMA status (dma_stat=35)
Jun 16 07:25:52 nogitsune hdg: dma_intr: status=0x50 { DriveReady SeekComplete }
Jun 16 07:25:52 nogitsune ide: failed opcode was: unknown
Jun 16 07:39:52 nogitsune hde: dma_intr: bad DMA status (dma_stat=35)
Jun 16 07:39:52 nogitsune hde: dma_intr: status=0x50 { DriveReady SeekComplete }
Jun 16 07:39:52 nogitsune ide: failed opcode was: unknown
Jun 16 07:42:35 nogitsune hdg: dma_intr: bad DMA status (dma_stat=35)
Jun 16 07:42:35 nogitsune hdg: dma_intr: status=0x50 { DriveReady SeekComplete }
Jun 16 07:42:35 nogitsune ide: failed opcode was: unknown
Jun 16 07:43:11 nogitsune hdg: dma_intr: bad DMA status (dma_stat=35)
Jun 16 07:43:11 nogitsune hdg: dma_intr: status=0x50 { DriveReady SeekComplete }
Jun 16 07:43:11 nogitsune ide: failed opcode was: unknown
Jun 16 07:45:15 nogitsune hdg: dma_intr: bad DMA status (dma_stat=35)
Jun 16 07:45:15 nogitsune hdg: dma_intr: status=0x50 { DriveReady SeekComplete }
Jun 16 07:45:15 nogitsune ide: failed opcode was: unknown
Jun 16 07:48:05 nogitsune hde: dma_intr: bad DMA status (dma_stat=35)
Jun 16 07:48:05 nogitsune hde: dma_intr: status=0x50 { DriveReady SeekComplete }
Jun 16 07:48:05 nogitsune ide: failed opcode was: unknown
Jun 16 07:48:59 nogitsune hdg: dma_intr: bad DMA status (dma_stat=35)
Jun 16 07:48:59 nogitsune hdg: dma_intr: status=0x50 { DriveReady SeekComplete }
Jun 16 07:48:59 nogitsune ide: failed opcode was: unknown
Jun 16 07:52:03 nogitsune hdg: dma_intr: bad DMA status (dma_stat=35)
Jun 16 07:52:03 nogitsune hdg: dma_intr: status=0x50 { DriveReady SeekComplete }
Jun 16 07:52:03 nogitsune ide: failed opcode was: unknown
Jun 16 07:56:23 nogitsune hde: dma_intr: bad DMA status (dma_stat=35)
Jun 16 07:56:23 nogitsune hde: dma_intr: status=0x50 { DriveReady SeekComplete }
Jun 16 07:56:23 nogitsune ide: failed opcode was: unknown
nogitsune etc #


Basically, they happen anytime that I put a sustained load onto the 2 drives attached to that PCI card. At the moment, I'm not sure (might be in the archives) which IDE host adapter I'm using for hde and hdg.

From my limited digging, it appears that it may have something to do with lost interrupts, except that I'm not seeing any other messages in the logs regarding that.
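
To see whether one drive is worse than the other, a quick tally per device helps. This is a rough sketch that assumes the syslog layout shown above, where the fifth field is the device name followed by a colon. The helper name count_dma_errors is my own.

```shell
#!/bin/sh
# count_dma_errors is a hypothetical helper: read syslog lines on stdin
# and print "<drive> <count>" for each drive reporting a bad DMA status.
count_dma_errors() {
    awk '/bad DMA status/ {
             drive = $5              # fifth field looks like "hde:"
             sub(/:$/, "", drive)    # strip the trailing colon
             count[drive]++
         }
         END { for (d in count) print d, count[d] }' | sort
}

# Usage:
#   count_dma_errors < /var/log/messages
```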

Saturday, June 17, 2006

Editing user IDs associated with a GPG key

Back when one of my users created their GPG keys, they put some bogus text in the Comment field because they didn't realize the public nature of the field. So their name looks like:

Joe Smith (bogus text) jsmith@example.com

Which isn't really what we wanted it to look like. So the question is how to adjust the key on the fly using WinPT and publish the changes. We could just revoke the key and create a new one, but that would require re-signing and re-doing trust information for the new key.

According to various sources, the proper way to do this is with the "REVUID" command (not the "DELUID" command). While you can never remove a UID associated with your keys from the public key servers, a revocation tells people that the old identity (UID) is no longer used.

Performing the key edit in WinPT:

  1. Open up the WinPT key manager, find your key, right-click and choose "Key Edit"
  2. The "Key Edit" window will display showing the keys associated with this key set (top pane) and the User IDs (UIDs) associated with the key set (lower pane).
  3. Highlight the incorrect UID, pick "REVUID" from the Command list and click "OK"
  4. You will be prompted for your passphrase. Enter it.
  5. You will be asked to confirm this operation.
  6. The Validity column for this UID will now say "Revoked".
  7. In the Command list, choose "ADDUID" and click "OK".
  8. Enter the correct information for your new ID. Remember that all 3 fields are public information. Comment is typically either the name of your company or a website URL.
  9. Backup your keys (especially the secret key).
  10. Distribute your updated public key.


For information on backing up a key, see my previous post on GPG4Win.

Note #1: If you have multiple UIDs associated with a key, you can use the "PRIMARY" command to flag one of the UIDs as the default UID to display in the key list. Simply select "PRIMARY" from the Command list, highlight the UID you want as the primary and click "OK". However, this only works in WinPT... in most other implementations, the default UID is the last one added to the key.

Note #2: Prior to exporting the key or giving it to anyone else, you can use the DELUID command to remove UIDs from the key. But once you have published a UID for a particular key, only the REVUID command will do what you want.

Thursday, June 15, 2006

TrueCrypt - Encrypted USB Drive

TrueCrypt comes in handy for securing external USB or Firewire drives, especially when those drives are used for backups of sensitive files or when you are going to ship the drives from point A to point B. It also helps if you are worried about someone swiping the drive and mounting it on another workstation to access files that you have stored there.

Plus, as long as you know the passphrase and/or have the keyfiles used to decrypt the volume, you can move the USB device from workstation to workstation without losing access to the content.

A. right-click on My Computer, choose "Manage"

  1. Under "Storage", go to "Disk Management"
  2. Find the USB drive that you wish to convert to TrueCrypt (note that this will DESTROY all data on the USB drive)
  3. Remove any existing partitions / drive letters assigned to the USB drive.

B. Create the new partition on the USB drive

  1. Right-click, New Partition
  2. Create a "Primary" partition
  3. Use the entire drive (or only part of the drive if you wish)
  4. Do not assign a drive letter
  5. Do not format the partition
  6. Click "Finish", note the "Disk #"

C. Create the TrueCrypt drive on the partition

  1. Open up TrueCrypt, click on "Create Volume"
  2. Create a standard TrueCrypt volume
  3. Click on "Select Device" and choose the empty USB disk and partition
  4. Double-check that you've selected the correct device
  5. Encryption algorithm: AES, Hash: RIPEMD-160
  6. Size cannot be adjusted
  7. Enter your passphrase twice
  8. Begin the format (NTFS for anything over a few gigabytes)


Once the partition has been formatted with TrueCrypt you can then return to the TrueCrypt window and mount the drive to a drive letter. If this drive is always connected to the system you may wish to mount it upon login by making it a "favorite" volume in TrueCrypt.

Wednesday, June 14, 2006

SubVersion for Linux Administrators

Updated: 26 Aug 2006

I *think* I have this figured out. After banging my head against the wall for a few months, I finally figured out how to put my /etc configuration folder (and config files) into SubVersion so that I have version control over them.

Assumptions:

  1. You need to have SubVersion 1.2 or 1.3 (or later) installed
  2. You have a folder called /var/svn where you keep your repositories
  3. I'm assuming that you've su'd to the root account
  4. Make sure you have a clean system (run etc-update first)

I think those are the only requirements.

Step #1 - Create the repository

The name of the repository can be anything you want. I tend to name it after the machine name (but you could use the name "system", "config", "admin" or anything else). In this particular case, I'm setting it up on a machine called "nogitsune" which is my Gentoo AMD64 system.

Creating the repository is easy:

# svnadmin create /var/svn/nogitsune

Replace "nogitsune" with the preferred name of your repository. I'd suggest keeping the name as short as possible in case you have to type it by hand later (i.e. "nogitsune", "copper", "alpha", "san1pri", "xen-athens").

Step #2 - Checkout the repository to the root folder

nogitsune etc # cd /
nogitsune / # svn co file:///var/svn/nogitsune .
Checked out revision 0.
nogitsune / # svn status
? media
? lib64
? tgh
? root
? home
? var
? lost+found
? software
? sbin
? mnt
? tmp
? opt
? boot
? proc
? backup
? lib
? bin
? usr
? lib32
? etc
? dev
? sys
nogitsune / #


As long as the "svn status" command returns something like the above, we know that we've connected properly to the SubVersion repository. You can also look for the ".svn" directory in the root.

nogitsune / # ls -la .svn
total 40
drwxr-xr-x 7 root root 4096 Jun 14 21:28 .
drwxr-xr-x 24 root root 4096 Jun 14 21:28 ..
-r--r--r-- 1 root root 118 Jun 14 21:28 README.txt
-r--r--r-- 1 root root 0 Jun 14 21:28 empty-file
-r--r--r-- 1 root root 283 Jun 14 21:28 entries
-r--r--r-- 1 root root 2 Jun 14 21:28 format
drwxr-xr-x 2 root root 4096 Jun 14 21:28 prop-base
drwxr-xr-x 2 root root 4096 Jun 14 21:28 props
drwxr-xr-x 2 root root 4096 Jun 14 21:28 text-base
drwxr-xr-x 6 root root 4096 Jun 14 21:28 tmp
drwxr-xr-x 2 root root 4096 Jun 14 21:28 wcprops
nogitsune / # cat .svn/entries

<?xml version="1.0" encoding="utf-8"?>
<wc-entries
   xmlns="svn:">
<entry
   committed-rev="0"
   name=""
   committed-date="2006-06-15T01:26:51.238552Z"
   url="file:///var/svn/nogitsune"
   kind="dir"
   uuid="21f0cb31-3916-0410-ae38-e44852334012"
   revision="0"/>
</wc-entries>

nogitsune / #


Step #3 - Adding directories and files to SubVersion

It's important to understand a little bit how "svn add" and "svn commit" work hand-in-hand. Just because we've issued the "svn add" command does not mean that our changes have been pushed to the repository. That requires using the "svn commit" command.

For the first example, I'm going to push the contents of /boot into the Subversion repository.

nogitsune / # mount /boot
nogitsune / # ls -la /boot
total 41261
drwxr-xr-x 4 root root 2048 Nov 29 2005 .
drwxr-xr-x 24 root root 4096 Jun 14 21:28 ..
-rw-r--r-- 1 root root 0 Jul 27 2005 .keep
-rw-r--r-- 1 root root 917680 Nov 12 2005 System.map-2.6.13-12Nov2005
-rw-r--r-- 1 root root 910027 Nov 12 2005 System.map-2.6.13-12Nov2005-2300
-rw-r--r-- 1 root root 910027 Nov 12 2005 System.map-2.6.13-12Nov2005-2330
-rw-r--r-- 1 root root 898088 Nov 12 2005 System.map-2.6.13-13Nov2005
-rw-r--r-- 1 root root 898098 Nov 13 2005 System.map-2.6.13-13Nov2005-1700
-rw-r--r-- 1 root root 917799 Nov 13 2005 System.map-2.6.13-13Nov2005-1810
-rw-r--r-- 1 root root 917372 Nov 13 2005 System.map-2.6.13-13Nov2005-1948
-rw-r--r-- 1 root root 837348 Nov 14 2005 System.map-2.6.13-14Nov2005-1500
-rw-r--r-- 1 root root 918716 Nov 14 2005 System.map-2.6.13-14Nov2005-1600
-rw-r--r-- 1 root root 846033 Nov 21 2005 System.map-2.6.13-21Nov2005-2300
-rw-r--r-- 1 root root 888601 Nov 22 2005 System.map-2.6.13-22Nov2005-0030
-rw-r--r-- 1 root root 890145 Nov 29 2005 System.map-2.6.13-29Nov2005-2148
-rw-r--r-- 1 root root 916779 Nov 8 2005 System.map-2.6.13-8Nov2005
-rw-r--r-- 1 root root 919480 Nov 9 2005 System.map-2.6.13-9Nov2005
lrwxrwxrwx 1 root root 1 Nov 8 2005 boot -> .
-rw-r--r-- 1 root root 23975 Nov 12 2005 config-2.6.13-12Nov2005
-rw-r--r-- 1 root root 23749 Nov 12 2005 config-2.6.13-12Nov2005-2300
-rw-r--r-- 1 root root 23738 Nov 12 2005 config-2.6.13-12Nov2005-2330
-rw-r--r-- 1 root root 23782 Nov 12 2005 config-2.6.13-13Nov2005
-rw-r--r-- 1 root root 23197 Nov 13 2005 config-2.6.13-13Nov2005-1700
-rw-r--r-- 1 root root 24091 Nov 13 2005 config-2.6.13-13Nov2005-1810
-rw-r--r-- 1 root root 24106 Nov 13 2005 config-2.6.13-13Nov2005-1948
-rw-r--r-- 1 root root 22579 Nov 14 2005 config-2.6.13-14Nov2005-1500
-rw-r--r-- 1 root root 23986 Nov 14 2005 config-2.6.13-14Nov2005-1600
-rw-r--r-- 1 root root 23084 Nov 21 2005 config-2.6.13-21Nov2005-2300
-rw-r--r-- 1 root root 23427 Nov 22 2005 config-2.6.13-22Nov2005-0030
-rw-r--r-- 1 root root 23416 Nov 29 2005 config-2.6.13-29Nov2005-2148
-rw-r--r-- 1 root root 23986 Nov 8 2005 config-2.6.13-8Nov2005
-rw-r--r-- 1 root root 23964 Nov 9 2005 config-2.6.13-9Nov2005
drwxr-xr-x 2 root root 1024 Nov 9 2005 grub
-rw-r--r-- 1 root root 2139460 Nov 12 2005 kernel-2.6.13-12Nov2005
-rw-r--r-- 1 root root 2081829 Nov 12 2005 kernel-2.6.13-12Nov2005-2300
-rw-r--r-- 1 root root 2081802 Nov 12 2005 kernel-2.6.13-12Nov2005-2330
-rw-r--r-- 1 root root 2056584 Nov 12 2005 kernel-2.6.13-13Nov2005
-rw-r--r-- 1 root root 2064792 Nov 13 2005 kernel-2.6.13-13Nov2005-1700
-rw-r--r-- 1 root root 2105823 Nov 13 2005 kernel-2.6.13-13Nov2005-1810
-rw-r--r-- 1 root root 2099454 Nov 13 2005 kernel-2.6.13-13Nov2005-1948
-rw-r--r-- 1 root root 1958868 Nov 14 2005 kernel-2.6.13-14Nov2005-1500
-rw-r--r-- 1 root root 2142469 Nov 14 2005 kernel-2.6.13-14Nov2005-1600
-rw-r--r-- 1 root root 2008155 Nov 21 2005 kernel-2.6.13-21Nov2005-2300
-rw-r--r-- 1 root root 2017013 Nov 22 2005 kernel-2.6.13-22Nov2005-0030
-rw-r--r-- 1 root root 2024425 Nov 29 2005 kernel-2.6.13-29Nov2005-2148
-rw-r--r-- 1 root root 2139807 Nov 8 2005 kernel-2.6.13-8Nov2005
-rw-r--r-- 1 root root 2151371 Nov 9 2005 kernel-2.6.13-9Nov2005
drwx------ 2 root root 12288 Nov 8 2005 lost+found
nogitsune / # svn add -N boot
A boot
nogitsune / # cd boot
nogitsune boot # ls -la .svn
total 11
drwxr-xr-x 7 root root 1024 Jun 14 21:33 .
drwxr-xr-x 5 root root 2048 Jun 14 21:33 ..
-r--r--r-- 1 root root 118 Jun 14 21:33 README.txt
-r--r--r-- 1 root root 0 Jun 14 21:33 empty-file
-r--r--r-- 1 root root 190 Jun 14 21:33 entries
-r--r--r-- 1 root root 2 Jun 14 21:33 format
drwxr-xr-x 2 root root 1024 Jun 14 21:33 prop-base
drwxr-xr-x 2 root root 1024 Jun 14 21:33 props
drwxr-xr-x 2 root root 1024 Jun 14 21:33 text-base
drwxr-xr-x 6 root root 1024 Jun 14 21:33 tmp
drwxr-xr-x 2 root root 1024 Jun 14 21:33 wcprops
nogitsune boot # svn add .keep System* config* kernel* grub boot
A .keep
A System.map-2.6.17-25Aug2006-2300
A config-2.6.17-25Aug2006-2300
A (bin) kernel-2.6.17-25Aug2006-2300
A grub
A grub/menu.lst
A (bin) grub/splash.xpm.gz
A grub/grub.conf.sample
A (bin) grub/e2fs_stage1_5
A (bin) grub/fat_stage1_5
A (bin) grub/ffs_stage1_5
A (bin) grub/iso9660_stage1_5
A (bin) grub/jfs_stage1_5
A (bin) grub/minix_stage1_5
A (bin) grub/reiserfs_stage1_5
A (bin) grub/stage1
A (bin) grub/stage2
A (bin) grub/stage2_eltorito
A (bin) grub/ufs2_stage1_5
A (bin) grub/vstafs_stage1_5
A (bin) grub/xfs_stage1_5
A grub/grub.conf
A boot
nogitsune boot # svn status
? boot
? lost+found
A .
A grub
A grub/grub.conf
A grub/stage1
A grub/stage2
A grub/e2fs_stage1_5
A grub/xfs_stage1_5
A grub/vstafs_stage1_5
A grub/fat_stage1_5
A grub/grub.conf.sample
A grub/menu.lst
A grub/ffs_stage1_5
A grub/stage2_eltorito
A grub/iso9660_stage1_5
A grub/ufs2_stage1_5
A grub/jfs_stage1_5
A grub/reiserfs_stage1_5
A grub/minix_stage1_5
A grub/splash.xpm.gz
A .keep
A System.map-2.6.17-25Aug2006-2300
A kernel-2.6.17-25Aug2006-2300
A config-2.6.17-25Aug2006-2300
nogitsune boot # cd /
nogitsune / # svn commit -m "Initial snapshot of /boot"
Adding boot
Adding boot/.keep
Adding boot/System.map-2.6.17-25Aug2006-2300
Adding boot/boot
Adding boot/config-2.6.17-25Aug2006-2300
Adding boot/grub
Adding (bin) boot/grub/e2fs_stage1_5
Adding (bin) boot/grub/fat_stage1_5
Adding (bin) boot/grub/ffs_stage1_5
Adding boot/grub/grub.conf
Adding boot/grub/grub.conf.sample
Adding (bin) boot/grub/iso9660_stage1_5
Adding (bin) boot/grub/jfs_stage1_5
Adding boot/grub/menu.lst
Adding (bin) boot/grub/minix_stage1_5
Adding (bin) boot/grub/reiserfs_stage1_5
Adding (bin) boot/grub/splash.xpm.gz
Adding (bin) boot/grub/stage1
Adding (bin) boot/grub/stage2
Adding (bin) boot/grub/stage2_eltorito
Adding (bin) boot/grub/ufs2_stage1_5
Adding (bin) boot/grub/vstafs_stage1_5
Adding (bin) boot/grub/xfs_stage1_5
Adding (bin) boot/kernel-2.6.17-25Aug2006-2300
Transmitting file data ......................
Committed revision 1.
nogitsune / #


Most of that should be self-explanatory. You can see that I use "svn add -N boot" from the / (root) directory to add the /boot directory, then I move into the /boot folder and issue a selective "svn add". Pay attention to the "-N" option, which prevents the add from recursing down through subdirectories. You should also take care to only add files that you control as an administrator to the svn repository (avoid adding things like "lost+found" or "._cfg*" files).

If you want to version control something that is 3 levels deep, you need to "svn add -N foldername" for each level in the tree until you get deep enough to add the file. It might be possible to do it faster in one command, but I'm still learning SubVersion.

For the second example, I'm going to add everything in /etc to SubVersion.

# cd /
# svn add -N etc
# cd etc
# svn add *
# svn commit -m "Initial snapshot of /etc"


That's the basics. For the third example, I'll show how to add custom scripts stored in /usr/local/sbin.

# cd /
# svn add -N usr ; cd usr
# svn add -N local ; cd local
# svn add -N sbin ; cd sbin
# svn add *
# cd /
# svn commit -m "Initial snapshot of /usr/local/sbin"
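
The level-by-level pattern in the third example can be wrapped in a small helper. This is only a sketch: path_levels is a hypothetical function name, it assumes path components without spaces, and the loop below is a dry run that prints the commands rather than running them.

```shell
#!/bin/sh
# path_levels is a hypothetical helper: print every directory level of a
# path, shallowest first, so each one can be passed to "svn add -N".
path_levels() {
    acc=""
    # Split the path on "/" and rebuild it one component at a time
    for part in $(echo "$1" | tr '/' ' '); do
        acc="${acc:+$acc/}$part"
        echo "$acc"
    done
}

# Dry run from the root of the working copy: show the commands only.
for dir in $(path_levels usr/local/sbin); do
    echo "svn add -N $dir"
done
# prints:
#   svn add -N usr
#   svn add -N usr/local
#   svn add -N usr/local/sbin
```

Drop the "echo" to run the commands for real. Levels that are already under version control should just produce a warning from svn.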


Step #4 - Creating a cron job to backup your SubVersion repositories

On my systems, I create a folder called /backup which is a separate set of spindles that I mount for quick backups. Under that folder, I create a sub-folder called "subversion".

# ls -l /backup/subversion
total 96
-rw-r--r-- 1 root root 20 Nov 30 2005 dev.svnadmin.dump.2005.11.gz
-rw-r--r-- 1 root root 20 Dec 31 02:00 dev.svnadmin.dump.2005.12.gz
-rw-r--r-- 1 root root 20 Jan 31 02:00 dev.svnadmin.dump.2006.01.gz
-rw-r--r-- 1 root root 20 Feb 28 02:00 dev.svnadmin.dump.2006.02.gz
-rw-r--r-- 1 root root 20 Mar 31 02:00 dev.svnadmin.dump.2006.03.gz
-rw-r--r-- 1 root root 20 Apr 30 02:00 dev.svnadmin.dump.2006.04.gz
-rw-r--r-- 1 root root 20 May 31 02:00 dev.svnadmin.dump.2006.05.gz
-rw-r--r-- 1 root root 20 Jun 14 02:00 dev.svnadmin.dump.2006.06.gz
-rw-r--r-- 1 root root 20 Nov 30 2005 photo.svnadmin.dump.2005.11.gz
-rw-r--r-- 1 root root 20 Dec 31 02:00 photo.svnadmin.dump.2005.12.gz
-rw-r--r-- 1 root root 20 Jan 31 02:00 photo.svnadmin.dump.2006.01.gz
-rw-r--r-- 1 root root 20 Feb 28 02:00 photo.svnadmin.dump.2006.02.gz
-rw-r--r-- 1 root root 20 Mar 31 02:00 photo.svnadmin.dump.2006.03.gz
-rw-r--r-- 1 root root 20 Apr 30 02:00 photo.svnadmin.dump.2006.04.gz
-rw-r--r-- 1 root root 20 May 31 02:00 photo.svnadmin.dump.2006.05.gz
-rw-r--r-- 1 root root 20 Jun 14 02:00 photo.svnadmin.dump.2006.06.gz
-rw-r--r-- 1 root root 20 Nov 30 2005 web.svnadmin.dump.2005.11.gz
-rw-r--r-- 1 root root 20 Dec 31 02:00 web.svnadmin.dump.2005.12.gz
-rw-r--r-- 1 root root 20 Jan 31 02:00 web.svnadmin.dump.2006.01.gz
-rw-r--r-- 1 root root 20 Feb 28 02:00 web.svnadmin.dump.2006.02.gz
-rw-r--r-- 1 root root 20 Mar 31 02:00 web.svnadmin.dump.2006.03.gz
-rw-r--r-- 1 root root 20 Apr 30 02:00 web.svnadmin.dump.2006.04.gz
-rw-r--r-- 1 root root 20 May 31 02:00 web.svnadmin.dump.2006.05.gz
-rw-r--r-- 1 root root 20 Jun 14 02:00 web.svnadmin.dump.2006.06.gz


As you can see from my backup folder, I have 3 repositories being backed up (dev, photo, web) and I rotate to a new backup filename every month. It's not ideal because a bad backup could cause me to lose up to 30 days of work, but it meets my needs. More risk-averse admins may want to switch to a new backup file on a daily basis.

To create this backup, I use the following script:

nogitsune / # cd /usr/local/sbin
nogitsune sbin # ls -l svndaily.sh
-rwxr-xr-x 1 root root 646 Nov 27 2005 svndaily.sh
nogitsune sbin # cat svndaily.sh
#!/bin/sh
# backup subversion repositories to /backup/subversion/filename.year.month.gz
# notice the use of the backtick character (`) instead of single-quote character (')
# overwrites the backup file every day

BACKUPDATE=`date +%Y.%m`
#echo $BACKUPDATE

# svnadmin dump /var/svn/reponame | gzip -c > /backup/subversion/reponame.svnadmin.dump.${BACKUPDATE}.gz
svnadmin dump /var/svn/dev | gzip -c > /backup/subversion/dev.svnadmin.dump.${BACKUPDATE}.gz
svnadmin dump /var/svn/photo | gzip -c > /backup/subversion/photo.svnadmin.dump.${BACKUPDATE}.gz
svnadmin dump /var/svn/web | gzip -c > /backup/subversion/web.svnadmin.dump.${BACKUPDATE}.gz

nogitsune sbin #


Note: Each of the "svnadmin dump" lines should be all on one line and not split across two lines.

As far as I know, "svnadmin dump" is adequate to the task of backing up SubVersion repositories.
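
For admins who want the daily rotation mentioned earlier, here is a rough variant of svndaily.sh. It is a sketch, not the script shown above: the function names dump_name and backup_repos are invented for this example, and it assumes the same /var/svn and /backup/subversion layout. The only real change is adding the day (%d) to the date stamp. The crontab line in the comment is likewise an assumption, based on the 02:00 timestamps in the directory listing.

```shell
#!/bin/sh
# Sketch of a daily-rotation variant of svndaily.sh (names are illustrative).
BACKUPDIR=/backup/subversion
REPODIR=/var/svn

# Build the backup filename: $1 = repository name, $2 = date stamp
dump_name() {
    echo "${BACKUPDIR}/$1.svnadmin.dump.$2.gz"
}

# Dump each named repository into a file stamped year.month.day
backup_repos() {
    stamp=$(date +%Y.%m.%d)
    for repo in "$@"; do
        svnadmin dump "${REPODIR}/${repo}" | gzip -c > "$(dump_name "$repo" "$stamp")"
    done
}

# Uncomment to run against the three repositories from the listing:
# backup_repos dev photo web

# Assumed crontab entry (run at 02:00 every day):
# 0 2 * * * /usr/local/sbin/svndaily.sh
```

Daily files add up quickly, so old dumps would need to be pruned by hand or by a separate job.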

Tuesday, June 13, 2006

Now hosted at JTLNetworks

I ended up picking JTL Networks for my Linux hosting. Now I'm in the middle of making sure that everything is correct in DNS and that the site is configured properly.

The old hosting company (HostOnce) is being a pain about closing down the accounts and allowing me to transfer my 2 other personal domains. But, frankly, their hosting service and my needs have been rapidly diverging for close to 2 years.

As always, getting set up at a new hosting provider involves a bit of trial and error. One thing that I've done is set up a separate FTP account for my blogs so that the blog software cannot write to directories other than the blog folders. (Those sub-accounts also use different passwords than my primary FTP account.)

Monday, June 12, 2006

Getting started with GPG4Win

EMail-Security using GnuPG for Windows - GPG4Win offers better integration of GnuPG into Windows than past products (such as using WinPT with the command-line version of GnuPG). That means that the user experience is a lot nicer and it doesn't seem as clunky.

You can download GPG4Win here. The current version is: 1.0.2

Notes:

  • GPGol (the MS Outlook plugin) only works with Microsoft Outlook 2003 (or later?), so if you are using an older version of MS Outlook, be sure *not* to install it
  • You probably won't need to install Sylpheed-Claws either, unless you are looking for a new e-mail program
  • I prefer WinPT over GPA, but your tastes may be different

Installation:

  1. Download and run the gpg4win-1.0.2.exe file
  2. When you reach the "Choose Components" screen, you should deselect GPGol, GPA and Sylpheed-Claws. Unless you speak German, you should also deselect the Novice Manual and Advanced Manual components. So most users will only be installing: GnuPG, WinPT and GPGee.
  3. Click "Next" and proceed.
  4. At the "Install Options", I recommend only installing links to the "Start Menu" (and not the Desktop or Quick Launch bar).
  5. Finally, proceed forward (using the "Next" button) until you reach the "Install" button.
  6. Clicking on "Install" will begin the installation.
  7. After installation finishes, you can click on "Next" and "Finish" to exit the installation wizard.

Getting started

  1. Go to "Start" --> "Programs" --> GnuPG for Windows --> WinPT
  2. That will start the WinPT application.
  3. If you have pre-existing GnuPG keyrings, you should probably select the import option (Copy GnuPG keyrings from another location). But you can also import existing keys at a later time.
  4. For now, we will create a GnuPG key pair
  5. Click on the "Expert" button
  6. Key type: DSA and ELG (default)
  7. Subkey size in bits: 2048 (you may wish to use 3072 or 4096)
  8. Real name: (enter the name that you wish to associate with this key) This name will appear alongside your key on public keyservers.
  9. Comment (optional): (typically a company name) Note that comments are public information and will appear alongside your key on the keyserver. Most people put their company name in this field, while others enter their website address (e.g. "www.tgharold.com").
  10. Email address: (enter the e-mail address associated with the key) Again, this is public information that will be on the keyservers to allow people to find your public key.
  11. Expire Date: Uncheck "Never" and enter an expiration date of a few years (I'd recommend 2 or 3 years).
  12. Click the "Start" button
  13. Enter the passphrase that you wish to use when protecting this key. I would recommend a rather strong one made up of numerous randomly picked words, letters, numbers and symbols. I will talk about protecting this passphrase later on.
  14. Repeat your passphrase in the new window. This is done to ensure that you didn't mistype it the first time.
  15. The progress dialog will now appear as GnuPG creates the keys for you. This can take a while as GnuPG needs to obtain random data from the system. You can speed the process up by typing nonsense into a document and moving the mouse in an erratic manner.
  16. When GnuPG finishes, it will pop up a window that says "Key Generation Completed"
  17. You will be offered the chance to back up your keyring. Click "Yes" and choose a location. I would recommend a USB key or a floppy disk as a backup target.
  18. The key has been created and is now listed in the WinPT Key Manager

Configuring WinPT Options

  1. Right-click on the WinPT icon in the System Tray
  2. Select Preferences --> WinPT
  3. Any options that I do not mention are optional and can be set to anything you desire. (Meaning that I don't have a specific recommendation for that option.)
  4. CHECK - Do not use any temporary files
  5. CHECK - Use clipboard viewer to display the plaintext
  6. Cache passphrase for N minutes should be set to a value that you are comfortable with. If you set your machine to automatically lock after 5 minutes, you could cache the passphrase for longer. But if you don't automatically lock your workstation whenever you are away from the machine you should choose a shorter timeout period.
  7. CHECK - Automatic keyring backup
  8. SELECT "Backup to" and choose a folder location that is on a drive other than C: (such as a USB key drive or a TrueCrypt volume)

Configuring GnuPG Options

  1. Right-click on the WinPT icon in the System Tray
  2. Select Preferences --> GPG
  3. There's nothing in particular that I feel needs to be changed here, but it does let you add a comment line for ASCII armored files.

Importing old keys into WinPT

  1. Right-click on the WinPT icon in the System Tray
  2. Select "Key Manager"
  3. Under the "Key" menu, select "Import"
  4. Browse to your old secring.gpg file
  5. Highlight the keys that you want to import and click "Import"
  6. For each key that you've imported, you will need to set the "trust" level of the key. Note that you can only set "owner/trust" values for keys that have not expired (see the "Validity" column in the key manager).
  7. Right-click the key and choose "Properties"
  8. If you are able to change the trust level, the "Change" button next to the "Ownertrust" field will be enabled. Click on "Change" and set your trust level for a particular key.
  9. Note: Trust values are important. Never set a trust level higher than you feel comfortable with. Verify that you have the right key and that you have validated the fingerprint of the key through a secure channel.
  10. 2nd Note: WinPT does sometimes crash after importing large quantities of keys. And you sometimes have to exit the Key Manager before you can see newly imported keys.

Final notes:

  • I would recommend not using the "encrypt current window" functionality of WinPT. It is not working properly for me at the moment. However, the encrypt/decrypt clipboard functionality works fine.
  • Make sure that you back up your secret key files

Backing up your secret key and passphrase on paper

  1. In the WinPT Key Manager, highlight your key
  2. From the menu, choose "Key" then "Export Secret Key"
  3. Export this key to a secure location (such as a USB key drive, a floppy disk, or an encrypted volume / folder)
  4. Open the .ASC file in Notepad
  5. Change the font using "Format, Font...". I would suggest "Courier New" at 11 or 12 points.
  6. Print out a copy of your private key block. That way, in a worst-case scenario, you could hand-enter (or OCR) it back into a new machine.
  7. Jot a note to yourself at the bottom of the page to remind yourself what the passphrase is for this secret key. You may wish to be explicit or simply leave yourself vague hints.
  8. Fold the paper up and place it into a "security" envelope. Security envelopes have printing on the inside of the envelope which is designed to prevent the contents of the letter from being read without opening the envelope. For additional security, you may wish to wrap a 2nd sheet of paper around your original sheet.
  9. You may also include the floppy diskette containing the secret key inside of the envelope.
  10. Seal the envelope
  11. Write something memorable (signature, today's date, a song that is playing on the radio) along the sealed flap. That will give you a chance to detect tampering if the attacker does not reseal the envelope in a way that the markings still line up.
  12. For additional security, place clear tape over the flap edge (and over your writing). That makes it more difficult to open without destroying your writing.
  13. Jot a note to yourself on the outside of the envelope (today's date, the e-mail address of the key)
  14. Place the envelope in a secure location (such as a bank vault, document safe), preferably at a location that is physically distant from your computer. You should keep this envelope as secure as you would your will or other important financial papers.
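
Before sealing the envelope, it may be worth checking that the exported .ASC file really contains a complete secret key block. The sketch below assumes a Unix-style shell is available (Cygwin or a Linux box, for example); the function name and the file name are hypothetical. It looks only for the armor header and footer lines that GnuPG writes around an exported secret key, so it would catch a truncated or wrong file, not a corrupted one.

```shell
#!/bin/sh
# Hypothetical check: does the file contain both the armor header and the
# armor footer of an exported secret key?
# $1 = path to the exported .ASC file
looks_like_secret_key() {
    grep -q -e '-----BEGIN PGP PRIVATE KEY BLOCK-----' "$1" &&
    grep -q -e '-----END PGP PRIVATE KEY BLOCK-----' "$1"
}

# Example usage (secret-key.asc is a placeholder file name):
# looks_like_secret_key secret-key.asc && echo "armor header and footer found"
```

The same check is handy after hand-entering or OCR-ing the printed key back into a file.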

Sunday, June 11, 2006

Failing hard drive in a Software RAID

So today's fun is that I have a drive that is failing in my 566MHz Celeron server. This is a small server with (3) 120GB hard drives.

hda - 120GB (primary drive, 4 partitions)
hdc - CD-ROM
hde - 120GB (second drive in the RAID1 sets, 4 partitions)
hdg - 120GB (backup drive)

During the rebuild of md3 (which is hda4+hde4) I'm getting constant aborts due to a bad block (or blocks) on hda.

# tail -n 500 /var/log/messages
Jun 10 23:17:16 coppermine hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Jun 10 23:17:16 coppermine hda: task_in_intr: error=0x40 { UncorrectableError }, LBAsect=100789712, sector=100789712
Jun 10 23:17:16 coppermine ide: failed opcode was: unknown
Jun 10 23:17:16 coppermine end_request: I/O error, dev hda, sector 100789712
Jun 10 23:17:20 coppermine hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Jun 10 23:17:20 coppermine hda: task_in_intr: error=0x40 { UncorrectableError }, LBAsect=100789723, sector=100789720
Jun 10 23:17:20 coppermine ide: failed opcode was: unknown
Jun 10 23:17:20 coppermine end_request: I/O error, dev hda, sector 100789720
Jun 10 23:17:20 coppermine raid1: hda: unrecoverable I/O read error for block 92537216
Jun 10 23:17:20 coppermine md: md3: sync done.
Jun 10 23:17:20 coppermine RAID1 conf printout:
Jun 10 23:17:20 coppermine --- wd:1 rd:2
Jun 10 23:17:20 coppermine disk 0, wo:0, o:1, dev:hda4
Jun 10 23:17:20 coppermine disk 1, wo:1, o:1, dev:hde4
Jun 10 23:17:20 coppermine RAID1 conf printout:
Jun 10 23:17:20 coppermine --- wd:1 rd:2
Jun 10 23:17:20 coppermine disk 0, wo:0, o:1, dev:hda4
Jun 10 23:17:20 coppermine RAID1 conf printout:
Jun 10 23:17:20 coppermine --- wd:1 rd:2
Jun 10 23:17:20 coppermine disk 0, wo:0, o:1, dev:hda4
Jun 10 23:17:20 coppermine disk 1, wo:1, o:1, dev:hde4
Jun 10 23:17:20 coppermine md: syncing RAID array md3
Jun 10 23:17:20 coppermine md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jun 10 23:17:20 coppermine md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Jun 10 23:17:20 coppermine md: using 128k window, over a total of 115926464 blocks.


And mdadm will continue to attempt to rebuild the array until the end of time, which is rather pointless. So the next step is to examine /dev/hda more closely and see whether the errors keep hitting the same block number.

# grep 'hda:' /var/log/messages
May 29 08:43:06 coppermine hda: Maxtor 4R120L0, ATA DISK drive
May 29 08:43:06 coppermine hda: max request size: 128KiB
May 29 08:43:06 coppermine hda: 240121728 sectors (122942 MB) w/2048KiB Cache, CHS=65535/16/63
May 29 08:43:06 coppermine hda: cache flushes supported
May 29 08:43:06 coppermine hda: hda1 hda2 hda3 hda4
Jun 8 22:32:02 coppermine hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Jun 8 22:32:02 coppermine hda: task_in_intr: error=0x40 { UncorrectableError }, LBAsect=80342494, sector=80342480
Jun 8 22:32:04 coppermine hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Jun 8 22:32:04 coppermine hda: task_in_intr: error=0x40 { UncorrectableError }, LBAsect=80342494, sector=80342488
Jun 8 22:32:05 coppermine raid1: hda: unrecoverable I/O read error for block 72089984
Jun 9 05:10:36 coppermine hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Jun 9 05:10:36 coppermine hda: task_in_intr: error=0x40 { UncorrectableError }, LBAsect=100789712, sector=100789712
Jun 9 05:10:39 coppermine hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Jun 9 05:10:39 coppermine hda: task_in_intr: error=0x40 { UncorrectableError }, LBAsect=100789722, sector=100789720
Jun 9 05:10:39 coppermine raid1: hda: unrecoverable I/O read error for block 92537216
Jun 9 08:26:40 coppermine hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Jun 9 08:26:40 coppermine hda: task_in_intr: error=0x40 { UncorrectableError }, LBAsect=54393160, sector=54393152
Jun 9 08:26:42 coppermine hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Jun 9 08:26:42 coppermine hda: task_in_intr: error=0x40 { UncorrectableError }, LBAsect=54393160, sector=54393160
Jun 9 08:26:42 coppermine raid1: hda: unrecoverable I/O read error for block 46140544
Jun 9 13:13:53 coppermine hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Jun 9 13:13:53 coppermine hda: task_in_intr: error=0x40 { UncorrectableError }, LBAsect=100789712, sector=100789712
Jun 9 13:13:55 coppermine hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Jun 9 13:13:55 coppermine hda: task_in_intr: error=0x40 { UncorrectableError }, LBAsect=100789722, sector=100789720
Jun 9 13:13:55 coppermine raid1: hda: unrecoverable I/O read error for block 92537216
Jun 10 18:30:21 coppermine hda: Maxtor 4R120L0, ATA DISK drive
Jun 10 18:30:21 coppermine hda: max request size: 128KiB
Jun 10 18:30:21 coppermine hda: 240121728 sectors (122942 MB) w/2048KiB Cache, CHS=65535/16/63
Jun 10 18:30:21 coppermine hda: cache flushes supported
Jun 10 18:30:21 coppermine hda: hda1 hda2 hda3 hda4
Jun 10 23:17:16 coppermine hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Jun 10 23:17:16 coppermine hda: task_in_intr: error=0x40 { UncorrectableError }, LBAsect=100789712, sector=100789712
Jun 10 23:17:20 coppermine hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Jun 10 23:17:20 coppermine hda: task_in_intr: error=0x40 { UncorrectableError }, LBAsect=100789723, sector=100789720
Jun 10 23:17:20 coppermine raid1: hda: unrecoverable I/O read error for block 92537216
Jun 11 04:08:06 coppermine hda: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Jun 11 04:08:06 coppermine hda: task_in_intr: error=0x40 { UncorrectableError }, LBAsect=100789712, sector=100789712
Jun 11 04:08:08 coppermine raid1: hda: unrecoverable I/O read error for block 92537216


This shows me that I have a drive that almost always fails at the same block number each time. Another grep of the log files makes this even more clear:

# grep 'unrecoverable' /var/log/messages
Jun 8 22:32:05 coppermine raid1: hda: unrecoverable I/O read error for block 72089984
Jun 9 05:10:39 coppermine raid1: hda: unrecoverable I/O read error for block 92537216
Jun 9 08:26:42 coppermine raid1: hda: unrecoverable I/O read error for block 46140544
Jun 9 13:13:55 coppermine raid1: hda: unrecoverable I/O read error for block 92537216
Jun 10 23:17:20 coppermine raid1: hda: unrecoverable I/O read error for block 92537216
Jun 11 04:08:08 coppermine raid1: hda: unrecoverable I/O read error for block 92537216
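
To make the pattern easier to see, the same log lines can be counted per block. The helper names below are made up for this sketch, and the second function rests on an assumption: that hda4 starts exactly on the cylinder boundary reported in the fdisk listing further down (cylinder 8188, with 1008 sectors per cylinder), so its first sector is (8188 - 1) * 1008 = 8252496. Under that assumption, the disk sector from the IDE error minus the partition start gives the block number that raid1 reports.

```shell
#!/bin/sh
# Count how often each block shows up in the raid1 error lines of a log file.
# $1 = log file; output is "count block", most frequent first.
count_bad_blocks() {
    grep 'unrecoverable I/O read error' "$1" | awk '{print $NF}' | sort | uniq -c | sort -rn
}

# Convert an absolute disk sector to a sector offset inside hda4.
# Assumes hda4 begins at cylinder 8188 with 1008 sectors per cylinder.
sector_in_hda4() {
    part_start=$(( (8188 - 1) * 1008 ))
    echo $(( $1 - part_start ))
}

# count_bad_blocks /var/log/messages
sector_in_hda4 100789712   # prints 92537216, the block raid1 keeps reporting
```

Four of the six errors in the grep output land on block 92537216, and the first failing sector of the June 8 error (80342480) maps to block 72089984 the same way.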


So the first step (after backing up the system) is to stop the software RAID from attempting to constantly rebuild array "md3". You can do this with the mdadm tool's "manage mode" commands.

Well, maybe not. I've done a lot of digging in Google, but I can't figure out how to force mdadm to stop a sync that is in progress. So, I'm booting back to the original 2005.1 Gentoo boot CD so that I can manually control the process.
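
One approach that might have worked without rebooting, though it was not tried on this system, is to fail and remove the member that is being rebuilt (hde4 here). That should leave md3 running degraded on hda4 with nothing left for the kernel to resync. The sketch below only prints the mdadm manage-mode command so it can be reviewed first; the function name is invented, and the device names are taken from the log above.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the manage-mode command that marks
# a member as failed and then removes it from the array.
# $1 = array device, $2 = member device
print_drop_member() {
    echo "mdadm $1 --fail $2 --remove $2"
}

print_drop_member /dev/md3 /dev/hde4
```

Printing rather than executing is deliberate: dropping the wrong member of a degraded RAID1 would take the array offline.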

Note that an excellent resource is:
LVM2 and Software RAID in Linux (May 2005)

livecd ~ # fdisk -l

Disk /dev/hda: 122.9 GB, 122942324736 bytes
16 heads, 63 sectors/track, 238216 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

Device Boot Start End Blocks Id System
/dev/hda1 * 1 249 125464+ fd Linux raid autodetect
/dev/hda2 250 4218 2000376 fd Linux raid autodetect
/dev/hda3 4219 8187 2000376 fd Linux raid autodetect
/dev/hda4 8188 238200 115926552 fd Linux raid autodetect

Disk /dev/hde: 122.9 GB, 122942324736 bytes
16 heads, 63 sectors/track, 238216 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

Device Boot Start End Blocks Id System
/dev/hde1 * 1 249 125464+ fd Linux raid autodetect
/dev/hde2 250 4218 2000376 fd Linux raid autodetect
/dev/hde3 4219 8187 2000376 fd Linux raid autodetect
/dev/hde4 8188 238200 115926552 fd Linux raid autodetect

Disk /dev/hdg: 122.9 GB, 122942324736 bytes
16 heads, 63 sectors/track, 238216 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

Device Boot Start End Blocks Id System
/dev/hdg1 1 238000 119951968+ 8e Linux LVM

livecd ~ # modprobe md
livecd ~ # modprobe raid1
livecd ~ # ls -l /dev/md*
livecd ~ # for i in 0 1 2 3; do mknod /dev/md$i b 9 $i; done
livecd ~ # ls -l /dev/md*
brw-r--r-- 1 root root 9, 0 Jun 12 00:01 /dev/md0
brw-r--r-- 1 root root 9, 1 Jun 12 00:01 /dev/md1
brw-r--r-- 1 root root 9, 2 Jun 12 00:01 /dev/md2
brw-r--r-- 1 root root 9, 3 Jun 12 00:01 /dev/md3
livecd ~ # mdadm --assemble /dev/md0 /dev/hda1 /dev/hde1
mdadm: /dev/md0 has been started with 2 drives.
livecd ~ # cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 hda1[0] hde1[1]
125376 blocks [2/2] [UU]

unused devices: <none>
livecd ~ #


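So that starts up the /boot partition. Now I can check it for errors using e2fsck. The "-c" option runs a read-only bad block scan and adds any bad blocks it finds to the bad block inode, "-C" displays progress information (it normally expects a file descriptor argument, such as "-C 0"), "-y" answers 'yes' to any questions, and "-v" turns on verbose output.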

livecd ~ # e2fsck -c -C -y -v /dev/md0
e2fsck 1.37 (21-Mar-2005)
Checking for bad blocks (read-only test): done 376
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/md0: ***** FILE SYSTEM WAS MODIFIED *****

40 inodes used (0%)
3 non-contiguous inodes (7.5%)
# of inodes with ind/dind/tind blocks: 12/6/0
12593 blocks used (10%)
0 bad blocks
0 large files

26 regular files
3 directories
0 character device files
0 block device files
0 fifos
0 links
2 symbolic links (2 fast symbolic links)
0 sockets
--------
31 files
livecd ~ #


Next, I assemble the RAID1 set for the root volume.

livecd ~ # mdadm --assemble /dev/md2 /dev/hda3 /dev/hde3
mdadm: /dev/md2 has been started with 2 drives.
livecd ~ # cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 hda3[0] hde3[1]
2000256 blocks [2/2] [UU]

md1 : active raid1 hda2[0] hde2[1]
2000256 blocks [2/2] [UU]

md0 : active raid1 hda1[0] hde1[1]
125376 blocks [2/2] [UU]

unused devices: <none>
livecd ~ # e2fsck -c -C -y -v /dev/md2
e2fsck 1.37 (21-Mar-2005)
Checking for bad blocks (read-only test): done 064
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/md2: ***** FILE SYSTEM WAS MODIFIED *****

6434 inodes used (2%)
16 non-contiguous inodes (0.2%)
# of inodes with ind/dind/tind blocks: 75/2/0
390601 blocks used (78%)
0 bad blocks
0 large files

928 regular files
153 directories
1055 character device files
4025 block device files
0 fifos
0 links
264 symbolic links (264 fast symbolic links)
0 sockets
--------
6425 files
livecd ~ #


The rest of the system is more complex: LVM2 volumes on top of software RAID.

livecd ~ # modprobe dm-mod
livecd ~ # pvscan
PV /dev/hdg1 VG vgbackup lvm2 [114.39 GB / 82.39 GB free]
Total: 1 [114.39 GB] / in use: 1 [114.39 GB] / in no VG: 0 [0 ]
livecd ~ # vgscan
Reading all physical volumes. This may take a while...
Found volume group "vgbackup" using metadata type lvm2
livecd ~ # lvscan
inactive '/dev/vgbackup/backup' [32.00 GB] inherit
livecd ~ # lvchange -a y /dev/vgbackup/backup
/dev/cdrom: open failed: Read-only file system
livecd ~ # lvscan
ACTIVE '/dev/vgbackup/backup' [32.00 GB] inherit
livecd ~ # e2fsck -c -C -y -v /dev/vgbackup/backup
e2fsck 1.37 (21-Mar-2005)
Checking for bad blocks (read-only test): done 608
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/vgbackup/backup: ***** FILE SYSTEM WAS MODIFIED *****

70 inodes used (0%)
14 non-contiguous inodes (20.0%)
# of inodes with ind/dind/tind blocks: 35/18/0
954693 blocks used (11%)
0 bad blocks
0 large files

55 regular files
6 directories
0 character device files
0 block device files
0 fifos
0 links
0 symbolic links (0 fast symbolic links)
0 sockets
--------
61 files
livecd ~ #


So far so good. But most of the errors are in /dev/md3. So I'm going to assemble /dev/md3 using just one of the drives (/dev/hde4).

livecd ~ # mdadm -v --assemble /dev/md3 /dev/hde4
mdadm: looking for devices for /dev/md3
mdadm: /dev/hde4 is identified as a member of /dev/md3, slot 2.
mdadm: added /dev/hde4 to /dev/md3 as 2
mdadm: /dev/md3 assembled from 0 drives and 1 spare - not enough to start the array.
livecd ~ # cat /proc/mdstat
Personalities : [raid1]
md3 : inactive hde4[2]
115926464 blocks
md2 : active raid1 hda3[0] hde3[1]
2000256 blocks [2/2] [UU]

md1 : active raid1 hda2[0] hde2[1]
2000256 blocks [2/2] [UU]

md0 : active raid1 hda1[0] hde1[1]
125376 blocks [2/2] [UU]

unused devices: <none>


Unfortunately, mdadm is refusing to start /dev/md3 using just /dev/hde4. So we have to force it:

livecd ~ # mdadm --create /dev/md3 --level 1 --force --raid-disks=1 /dev/hde4
mdadm: Cannot open /dev/hde4: Device or resource busy
mdadm: create aborted
livecd ~ # mdadm --stop /dev/md3
livecd ~ # mdadm --create /dev/md3 --level 1 --force --raid-disks=1 /dev/hde4
mdadm: /dev/hde4 appears to be part of a raid array:
level=1 devices=2 ctime=Sat Oct 22 20:51:12 2005
Continue creating array? y
mdadm: array /dev/md3 started.
livecd ~ # cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 hde4[0]
115926464 blocks [1/1] [U]

md2 : active raid1 hda3[0] hde3[1]
2000256 blocks [2/2] [UU]

md1 : active raid1 hda2[0] hde2[1]
2000256 blocks [2/2] [UU]

md0 : active raid1 hda1[0] hde1[1]
125376 blocks [2/2] [UU]

unused devices: <none>
livecd ~ #


Now I can scan for LVM2 volumes on the md3 array.

livecd ~ # pvscan
PV /dev/md3 VG vgmirror lvm2 [110.55 GB / 52.55 GB free]
PV /dev/hdg1 VG vgbackup lvm2 [114.39 GB / 82.39 GB free]
Total: 2 [224.95 GB] / in use: 2 [224.95 GB] / in no VG: 0 [0 ]
livecd ~ # vgscan
Reading all physical volumes. This may take a while...
Found volume group "vgmirror" using metadata type lvm2
Found volume group "vgbackup" using metadata type lvm2
livecd ~ # lvscan
inactive '/dev/vgmirror/tmp' [4.00 GB] inherit
inactive '/dev/vgmirror/vartmp' [4.00 GB] inherit
inactive '/dev/vgmirror/opt' [2.00 GB] inherit
inactive '/dev/vgmirror/usr' [4.00 GB] inherit
inactive '/dev/vgmirror/var' [4.00 GB] inherit
inactive '/dev/vgmirror/home' [4.00 GB] inherit
inactive '/dev/vgmirror/pgsqldata' [16.00 GB] inherit
inactive '/dev/vgmirror/www' [4.00 GB] inherit
inactive '/dev/vgmirror/svn' [16.00 GB] inherit
ACTIVE '/dev/vgbackup/backup' [32.00 GB] inherit
livecd ~ # lvchange -a y /dev/vgmirror/tmp
/dev/cdrom: open failed: Read-only file system
livecd ~ # lvchange -a y /dev/vgmirror/vartmp
/dev/cdrom: open failed: Read-only file system
livecd ~ # lvchange -a y /dev/vgmirror/opt
/dev/cdrom: open failed: Read-only file system
livecd ~ # lvchange -a y /dev/vgmirror/usr
/dev/cdrom: open failed: Read-only file system
livecd ~ # lvchange -a y /dev/vgmirror/var
/dev/cdrom: open failed: Read-only file system
livecd ~ # lvchange -a y /dev/vgmirror/home
/dev/cdrom: open failed: Read-only file system
livecd ~ # lvchange -a y /dev/vgmirror/pgsqldata
/dev/cdrom: open failed: Read-only file system
livecd ~ # lvchange -a y /dev/vgmirror/www
/dev/cdrom: open failed: Read-only file system
livecd ~ # lvchange -a y /dev/vgmirror/svn
/dev/cdrom: open failed: Read-only file system
livecd ~ #
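
Typing one lvchange per volume gets tedious. As a sketch, LVM2's vgchange can activate every logical volume in a volume group in one go, and the lvscan output can be filtered to see what is still inactive. The function name inactive_lvs is made up for this example, and the parsing assumes the lvscan output format shown above (state first, then the device path in single quotes).

```shell
#!/bin/sh
# Read lvscan output on stdin and print the device path of each inactive volume.
# Splitting on the single quote puts the state in field 1 and the path in field 2.
# The match is case-sensitive, so "ACTIVE" lines are skipped.
inactive_lvs() {
    awk -F"'" '$1 ~ /inactive/ { print $2 }'
}

# Activate everything in vgmirror at once, then list anything still inactive:
# vgchange -a y vgmirror
# lvscan | inactive_lvs
```

If the second command prints nothing, every volume is active and ready to be checked.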


Now I can check all of the LVM2 file systems:

livecd ~ # lvscan
ACTIVE '/dev/vgmirror/tmp' [4.00 GB] inherit
ACTIVE '/dev/vgmirror/vartmp' [4.00 GB] inherit
ACTIVE '/dev/vgmirror/opt' [2.00 GB] inherit
ACTIVE '/dev/vgmirror/usr' [4.00 GB] inherit
ACTIVE '/dev/vgmirror/var' [4.00 GB] inherit
ACTIVE '/dev/vgmirror/home' [4.00 GB] inherit
ACTIVE '/dev/vgmirror/pgsqldata' [16.00 GB] inherit
ACTIVE '/dev/vgmirror/www' [4.00 GB] inherit
ACTIVE '/dev/vgmirror/svn' [16.00 GB] inherit
ACTIVE '/dev/vgbackup/backup' [32.00 GB] inherit
livecd ~ # e2fsck -c -C -y -v /dev/vgmirror/tmp
e2fsck 1.37 (21-Mar-2005)
Checking for bad blocks (read-only test): done 576
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/vgmirror/tmp: ***** FILE SYSTEM WAS MODIFIED *****

15 inodes used (0%)
0 non-contiguous inodes (0.0%)
# of inodes with ind/dind/tind blocks: 0/0/0
16472 blocks used (1%)
0 bad blocks
0 large files

2 regular files
4 directories
0 character device files
0 block device files
0 fifos
0 links
0 symbolic links (0 fast symbolic links)
0 sockets
--------
6 files
livecd ~ # e2fsck -c -C -y -v /dev/vgmirror/vartmp
e2fsck 1.37 (21-Mar-2005)
Checking for bad blocks (read-only test): done 576
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/vgmirror/vartmp: ***** FILE SYSTEM WAS MODIFIED *****

4771 inodes used (0%)
524 non-contiguous inodes (11.0%)
# of inodes with ind/dind/tind blocks: 285/1/0
52582 blocks used (5%)
0 bad blocks
0 large files

4480 regular files
282 directories
0 character device files
0 block device files
0 fifos
0 links
0 symbolic links (0 fast symbolic links)
0 sockets
--------
4762 files
livecd ~ # e2fsck -c -C -y -v /dev/vgmirror/opt
e2fsck 1.37 (21-Mar-2005)
Checking for bad blocks (read-only test): done 288
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/vgmirror/opt: ***** FILE SYSTEM WAS MODIFIED *****

12 inodes used (0%)
0 non-contiguous inodes (0.0%)
# of inodes with ind/dind/tind blocks: 0/0/0
16443 blocks used (3%)
0 bad blocks
0 large files

1 regular file
2 directories
0 character device files
0 block device files
0 fifos
0 links
0 symbolic links (0 fast symbolic links)
0 sockets
--------
3 files
livecd ~ # e2fsck -c -C -y -v /dev/vgmirror/usr
e2fsck 1.37 (21-Mar-2005)
Checking for bad blocks (read-only test): done 576
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/vgmirror/usr: ***** FILE SYSTEM WAS MODIFIED *****

202520 inodes used (38%)
3582 non-contiguous inodes (1.8%)
# of inodes with ind/dind/tind blocks: 2317/17/0
439977 blocks used (41%)
0 bad blocks
0 large files

172474 regular files
26704 directories
0 character device files
0 block device files
0 fifos
2487 links
3333 symbolic links (3248 fast symbolic links)
0 sockets
--------
204998 files
livecd ~ # e2fsck -c -C -y -v /dev/vgmirror/var
e2fsck 1.37 (21-Mar-2005)
Checking for bad blocks (read-only test): done 576
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/vgmirror/var: ***** FILE SYSTEM WAS MODIFIED *****

30344 inodes used (5%)
181 non-contiguous inodes (0.6%)
# of inodes with ind/dind/tind blocks: 54/1/0
100055 blocks used (9%)
0 bad blocks
0 large files

29856 regular files
474 directories
0 character device files
0 block device files
0 fifos
0 links
3 symbolic links (3 fast symbolic links)
2 sockets
--------
30335 files
livecd ~ # e2fsck -c -C -y -v /dev/vgmirror/home
e2fsck 1.37 (21-Mar-2005)
Checking for bad blocks (read-only test): done 576
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/vgmirror/home: ***** FILE SYSTEM WAS MODIFIED *****

58 inodes used (0%)
0 non-contiguous inodes (0.0%)
# of inodes with ind/dind/tind blocks: 0/0/0
24717 blocks used (2%)
0 bad blocks
0 large files

33 regular files
15 directories
0 character device files
0 block device files
0 fifos
0 links
1 symbolic link (1 fast symbolic link)
0 sockets
--------
49 files
livecd ~ # e2fsck -c -C -y -v /dev/vgmirror/pgsqldata
e2fsck 1.37 (21-Mar-2005)
Checking for bad blocks (read-only test): done 304
Pass 1: Checking inodes, blocks, and sizes
Inode 1802356, i_blocks is 26312, should be 23952. Fix? yes

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -(3625600--3625704) -(3625710--3625711) -(3625716--3625719) -(3625724--3625907)
Fix? yes

Free blocks count wrong for group #110 (6797, counted=7092).
Fix? yes

Free blocks count wrong (4056868, counted=4057163).
Fix? yes


/dev/vgmirror/pgsqldata: ***** FILE SYSTEM WAS MODIFIED *****

1003 inodes used (0%)
90 non-contiguous inodes (9.0%)
# of inodes with ind/dind/tind blocks: 167/19/0
137141 blocks used (3%)
0 bad blocks
0 large files

964 regular files
30 directories
0 character device files
0 block device files
0 fifos
0 links
0 symbolic links (0 fast symbolic links)
0 sockets
--------
994 files
livecd ~ # e2fsck -c -C -y -v /dev/vgmirror/www
e2fsck 1.37 (21-Mar-2005)
Checking for bad blocks (read-only test): done 576
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/vgmirror/www: ***** FILE SYSTEM WAS MODIFIED *****

478 inodes used (0%)
0 non-contiguous inodes (0.0%)
# of inodes with ind/dind/tind blocks: 0/0/0
25147 blocks used (2%)
0 bad blocks
0 large files

455 regular files
14 directories
0 character device files
0 block device files
0 fifos
0 links
0 symbolic links (0 fast symbolic links)
0 sockets
--------
469 files
livecd ~ # e2fsck -c -C -y -v /dev/vgmirror/svn
e2fsck 1.37 (21-Mar-2005)
Checking for bad blocks (read-only test): done 304
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/vgmirror/svn: ***** FILE SYSTEM WAS MODIFIED *****

128 inodes used (0%)
17 non-contiguous inodes (13.3%)
# of inodes with ind/dind/tind blocks: 15/10/0
146674 blocks used (3%)
0 bad blocks
0 large files

98 regular files
21 directories
0 character device files
0 block device files
0 fifos
0 links
0 symbolic links (0 fast symbolic links)
0 sockets
--------
119 files
livecd ~ #


So all of the filesystems on /dev/hde4 check out okay. Now I want to take a closer look at the drives to verify that they have no bad blocks. The best way to do this is with a read-only disk test using badblocks.

# badblocks -sv /dev/hdg1
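
Since the same read-only test has to be run against each drive in turn, it can be wrapped in a small loop. This is only a sketch: the function name scan_drives and the BADBLOCKS override variable are my own inventions for illustration (the override just allows a dry run), and the device names in the example are placeholders.

```shell
# Sketch: read-only badblocks scan over several drives.
# BADBLOCKS is a hypothetical override so the loop can be dry-run
# (for example BADBLOCKS=echo); by default the real badblocks is used.
scan_drives() {
    for dev in "$@"; do
        echo "== $dev =="
        # -s shows progress, -v is verbose. With neither -n nor -w,
        # badblocks performs a read-only test.
        "${BADBLOCKS:-badblocks}" -sv "$dev"
    done
}

# Example (placeholder device names, run as root):
# scan_drives /dev/hda /dev/hde
```

A read-only scan of a 120GB drive takes a long time, so running the drives one after another like this is something to leave going in the background.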

From the looks of my testing on the various drives, hda is the problem drive with a few surface errors. So I'm going to wholly replace drive hda with a fresh 120GB drive.

So I've moved the hda cable over to the drive that used to be hde, and I've put a new 120GB hard drive into the hde position. Since I set up the box properly way back when (installing grub to both disks), things are working very well and the machine booted right back up.

First I copy the partition layout from hda to hde, then I copy the boot sector (the first 512 bytes) from hda to hde.

coppermine thomas # sfdisk -d /dev/hda | sfdisk /dev/hde
Checking that no-one is using this disk right now ...
OK

Disk /dev/hde: 238216 cylinders, 16 heads, 63 sectors/track
Old situation:
Warning: The partition table looks like it was made
for C/H/S=*/255/63 (instead of 238216/16/63).
For this listing I'll assume that geometry.
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

Device Boot Start End #cyls #blocks Id System
/dev/hde1 0+ 14945 14946- 120053713+ 6 FAT16
/dev/hde2 0 - 0 0 0 Empty
/dev/hde3 0 - 0 0 0 Empty
/dev/hde4 0 - 0 0 0 Empty
New situation:
Units = sectors of 512 bytes, counting from 0

Device Boot Start End #sectors Id System
/dev/hde1 * 63 250991 250929 fd Linux raid autodetect
/dev/hde2 250992 4251743 4000752 fd Linux raid autodetect
/dev/hde3 4251744 8252495 4000752 fd Linux raid autodetect
/dev/hde4 8252496 240105599 231853104 fd Linux raid autodetect
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes: dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
coppermine thomas # dd if=/dev/hda bs=512 count=1 of=/dev/hde
1+0 records in
1+0 records out
coppermine thomas # fdisk -l /dev/hda

Disk /dev/hda: 122.9 GB, 122942324736 bytes
16 heads, 63 sectors/track, 238216 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

Device Boot Start End Blocks Id System
/dev/hda1 * 1 249 125464+ fd Linux raid autodetect
/dev/hda2 250 4218 2000376 fd Linux raid autodetect
/dev/hda3 4219 8187 2000376 fd Linux raid autodetect
/dev/hda4 8188 238200 115926552 fd Linux raid autodetect
coppermine thomas # fdisk -l /dev/hde

Disk /dev/hde: 122.9 GB, 122942324736 bytes
16 heads, 63 sectors/track, 238216 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

Device Boot Start End Blocks Id System
/dev/hde1 * 1 249 125464+ fd Linux raid autodetect
/dev/hde2 250 4218 2000376 fd Linux raid autodetect
/dev/hde3 4219 8187 2000376 fd Linux raid autodetect
/dev/hde4 8188 238200 115926552 fd Linux raid autodetect
coppermine thomas #
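
Rather than comparing the two fdisk listings by eye, the sfdisk dumps of both disks can be compared mechanically once the device name is stripped out. The following is a rough sketch, not something I ran at the time: normalize_dump and same_layout are hypothetical helper names, and it assumes the only difference between the two dumps should be the device name itself.

```shell
# Strip the device name from an `sfdisk -d` dump read on stdin,
# so dumps taken from two different disks can be compared directly.
normalize_dump() {
    sed "s|$1|DISK|g"
}

# Succeeds (exit status 0) when both disks report the same partition layout.
# Needs root, because sfdisk has to read the disks.
same_layout() {
    a=$(sfdisk -d "$1" | normalize_dump "$1")
    b=$(sfdisk -d "$2" | normalize_dump "$2")
    [ "$a" = "$b" ]
}

# Example:
# same_layout /dev/hda /dev/hde && echo "partition tables match"
```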


Now I need to add the new partitions to the software RAID arrays.

coppermine thomas # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 hda2[1]
2000256 blocks [2/1] [_U]

md2 : active raid1 hda3[1]
2000256 blocks [2/1] [_U]

md3 : active raid1 hda4[0]
115926464 blocks [1/1] [U]

md0 : active raid1 hda1[1]
125376 blocks [2/1] [_U]

unused devices: <none>
coppermine thomas # mdadm /dev/md0 -a /dev/hde1
mdadm: hot added /dev/hde1
coppermine thomas # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 hda2[1]
2000256 blocks [2/1] [_U]

md2 : active raid1 hda3[1]
2000256 blocks [2/1] [_U]

md3 : active raid1 hda4[0]
115926464 blocks [1/1] [U]

md0 : active raid1 hde1[2] hda1[1]
125376 blocks [2/1] [_U]
[=>...................] recovery = 9.7% (12928/125376) finish=0.5min speed=3232K/sec

unused devices: <none>
coppermine thomas #


Repeat the above for the other 3 RAID1 arrays that are degraded.
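
The repetition can be sketched as a loop. This assumes the pairing visible in /proc/mdstat above, where array mdN is built from partition N+1 of each disk (md0/hde1, md1/hde2, and so on). The function name add_to_arrays and the MDADM override variable are hypothetical, added only so the commands can be previewed before they are run.

```shell
# Hot-add partition N+1 of the new disk to array mdN, for each N given.
# Assumes the mdN <-> partition N+1 pairing shown in /proc/mdstat.
# Set MDADM=echo to print the arguments instead of running mdadm.
add_to_arrays() {
    disk=$1
    shift
    for n in "$@"; do
        "${MDADM:-mdadm}" "/dev/md$n" -a "${disk}$((n + 1))"
    done
}

# Example: preview first, then run for real as root.
# MDADM=echo add_to_arrays /dev/hde 1 2 3
# add_to_arrays /dev/hde 1 2 3
```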

At this point, I'm basically done. It's time to make another backup and maybe swap the hda/hde cables to verify that I copied the boot sector correctly.

...

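The big problem is that md3 is showing up as a one-drive array ("[1/1] [U]") instead of a degraded two-drive array ("[2/1] [U_]"). So I need to figure out how to tell mdadm to add /dev/hde4 to the array and force it to resync. (To fix this, you use the "--grow" option of mdadm to raise the number of drives the array expects, and then add the new partition.)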

coppermine thomas # mdadm --grow /dev/md3 --raid-disks=2
coppermine thomas # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 hde2[0] hda2[1]
2000256 blocks [2/2] [UU]

md2 : active raid1 hde3[0] hda3[1]
2000256 blocks [2/2] [UU]

md3 : active raid1 hda4[0]
115926464 blocks [2/1] [U_]

md0 : active raid1 hde1[0] hda1[1]
125376 blocks [2/2] [UU]

unused devices: <none>
coppermine thomas # mdadm /dev/md3 --add /dev/hde4
mdadm: hot added /dev/hde4
coppermine thomas # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 hde2[0] hda2[1]
2000256 blocks [2/2] [UU]

md2 : active raid1 hde3[0] hda3[1]
2000256 blocks [2/2] [UU]

md3 : active raid1 hde4[2] hda4[0]
115926464 blocks [2/1] [U_]
[>....................] recovery = 0.0% (6656/115926464) finish=1153.4min speed=1664K/sec

md0 : active raid1 hde1[0] hda1[1]
125376 blocks [2/2] [UU]

unused devices: <none>
coppermine thomas #
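
With a resync that is going to take the better part of a day, it helps to have a quick way of listing which arrays are still missing a member. Here is a minimal sketch that reads /proc/mdstat text on stdin and prints the name of every array whose status brackets contain an underscore. The function name degraded_arrays is my own.

```shell
# Print the names of arrays that are missing a member.
# Reads /proc/mdstat-style text on stdin.
degraded_arrays() {
    awk '
        # Remember which array the following lines describe.
        /^md[0-9]+ :/ { name = $1 }
        # A status such as [_U] or [U_] means a member is missing.
        /\[[U_]*_[U_]*\]/ { print name }
    '
}

# Example:
# degraded_arrays < /proc/mdstat
```

Note that this would not have caught the md3 problem above: an array that only expects one drive shows "[1/1] [U]" and looks perfectly healthy, so the expected drive count still has to be checked by eye.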

Saturday, June 10, 2006

lm_sensors and Gigabyte GA-6VA7+ (take 2)

A while back I had tried to configure lm_sensors on an old Gigabyte GA-6VA7+ motherboard. That didn't work out so well, so I'm going to give it another shot. It matters more now because I had a drive overheat this week due to a failed fan.

There's a new version of lm_sensors out (2.10.0) while I'm still running the old 2.9.2. Most of the steps are the same: I'm simply emerging the new version and then following the instructions. On my slow little 566MHz Celeron box, this takes a while, especially since I'm also rebuilding a failed RAID element (thank goodness for software RAID).

Output of the end of the emerge process:

>>> /etc/init.d/fancontrol
*
* Next you need to run:
* /usr/sbin/sensors-detect
* to detect the I2C hardware of your system and create the file:
* /etc/conf.d/lm_sensors
*
* You will also need to run the above command if you're upgrading from
* <=lm_sensors-2.9.0, as the needed entries in /etc/conf.d/lm_sensors has
* changed.
*
* Be warned, the probing of hardware in your system performed by
* sensors-detect could freeze your system. Also make sure you read
* the documentation before running lm_sensors on IBM ThinkPads.
*
* Please see the lm_sensors documentation and website for more information.
*
>>> Regenerating /etc/ld.so.cache...
>>> sys-apps/lm_sensors-2.10.0 merged.

sys-apps/lm_sensors
selected: 2.9.2
protected: 2.10.0
omitted: none

>>> 'Selected' packages are slated for removal.
>>> 'Protected' and 'omitted' packages will not be removed.

>>> Waiting 5 seconds before starting...
>>> (Control-C to abort)...
>>> Unmerging in: 5 4 3 2 1
>>> Unmerging sys-apps/lm_sensors-2.9.2...
No package files given... Grabbing a set.
--- !mtime obj /usr/share/man/man8/sensors-detect.8.gz
...
(snip)
...
--- !targe sym /usr/lib/libsensors.so
>>> Regenerating /etc/ld.so.cache...
>>> Regenerating /etc/ld.so.cache...
>>> Auto-cleaning packages ...

>>> No outdated packages were found on your system.


* GNU info directory index is up-to-date.
* IMPORTANT: 14 config files in /etc need updating.
* Type emerge --help config to learn how to update config files.

#


Output of the sensors-detect phase:

# /usr/sbin/sensors-detect
# sensors-detect revision 1.413 (2006/01/19 20:28:00)

This program will help you determine which I2C/SMBus modules you need to
load to use lm_sensors most effectively. You need to have i2c and
lm_sensors installed before running this program.
Also, you need to be `root', or at least have access to the /dev/i2c-*
files, for most things.
If you have patched your kernel and have some drivers built in, you can
safely answer NO if asked to load some modules. In this case, things may
seem a bit confusing, but they will still work.

It is generally safe and recommended to accept the default answers to all
questions, unless you know what you're doing.

We can start with probing for (PCI) I2C or SMBus adapters.
You do not need any special privileges for this.
Do you want to probe now? (YES/no): Yes
Probing for PCI bus adapters...
Use driver `i2c-matroxfb' for device 01:00.0: MGA G200 AGP
Use driver `i2c-viapro' for device 00:07.3: VIA Technologies VT82C596 Apollo ACPI
Probe succesfully concluded.

We will now try to load each adapter module in turn.
Load `i2c-matroxfb' (say NO if built into your kernel)? (YES/no): yes
FATAL: Module i2c_matroxfb not found.
Loading failed... skipping.
Module `i2c-viapro' already loaded.
If you have undetectable or unsupported adapters, you can have them
scanned by manually loading the modules before running this script.

To continue, we need module `i2c-dev' to be loaded.
If it is built-in into your kernel, you can safely skip this.
i2c-dev is not loaded. Do you want to load it now? (YES/no): yes
Module loaded succesfully.

We are now going to do the adapter probings. Some adapters may hang halfway
through; we can't really help that. Also, some chips will be double detected;
we choose the one with the highest confidence value in that case.
If you found that the adapter hung after probing a certain address, you can
specify that address to remain unprobed. That often
includes address 0x69 (clock chip).

Next adapter: SMBus Via Pro adapter at 5000
Do you want to scan it? (YES/no/selectively): yes
Client at address 0x50 can not be probed - unload all client drivers first!
Client at address 0x51 can not be probed - unload all client drivers first!
Client at address 0x52 can not be probed - unload all client drivers first!
Client found at address 0x69

Next adapter: ISA main adapter
Do you want to scan it? (YES/no/selectively): yes

Some chips are also accessible through the ISA bus. ISA probes are
typically a bit more dangerous, as we have to write to I/O ports to do
this. This is usually safe though.

Do you want to scan the ISA bus? (YES/no): yes
Probing for `National Semiconductor LM78'
Trying address 0x0290... Failed!
Probing for `National Semiconductor LM78-J'
Trying address 0x0290... Failed!
Probing for `National Semiconductor LM79'
Trying address 0x0290... Failed!
Probing for `Winbond W83781D'
Trying address 0x0290... Failed!
Probing for `Winbond W83782D'
Trying address 0x0290... Failed!
Probing for `Winbond W83627HF'
Trying address 0x0290... Failed!
Probing for `Winbond W83627EHF'
Trying address 0x0290... Failed!
Probing for `Silicon Integrated Systems SIS5595'
Trying general detect... Failed!
Probing for `VIA Technologies VT82C686 Integrated Sensors'
Trying general detect... Failed!
Probing for `VIA Technologies VT8231 Integrated Sensors'
Trying general detect... Failed!
Probing for `ITE IT8712F'
Trying address 0x0290... Failed!
Probing for `ITE IT8705F / SiS 950'
Trying address 0x0290... Failed!
Probing for `IPMI BMC KCS'
Trying address 0x0ca0... Failed!
Probing for `IPMI BMC SMIC'
Trying address 0x0ca8... Failed!

Some Super I/O chips may also contain sensors. Super I/O probes are
typically a bit more dangerous, as we have to write to I/O ports to do
this. This is usually safe though.

Do you want to scan for Super I/O sensors? (YES/no): yes
Probing for `ITE 8702F Super IO Sensors'
Failed! (skipping family)
Probing for `Nat. Semi. PC87351 Super IO Fan Sensors'
Failed! (skipping family)
Probing for `SMSC 47B27x Super IO Fan Sensors'
Failed! (skipping family)
Probing for `VT1211 Super IO Sensors'
Failed! (skipping family)
Probing for `Winbond W83627EHF/EHG Super IO Sensors'
Failed! (skipping family)

Do you want to scan for secondary Super I/O sensors? (YES/no): yes
Probing for `ITE 8702F Super IO Sensors'
Failed! (skipping family)
Probing for `Nat. Semi. PC87351 Super IO Fan Sensors'
Failed! (skipping family)
Probing for `SMSC 47B27x Super IO Fan Sensors'
Failed! (skipping family)
Probing for `VT1211 Super IO Sensors'
Failed! (skipping family)
Probing for `Winbond W83627EHF/EHG Super IO Sensors'
Failed! (skipping family)

Sorry, no chips were detected.
Either your sensors are not supported, or they are
connected to an I2C bus adapter that we do not support.
See doc/FAQ, doc/lm_sensors-FAQ.html, or
http://www2.lm-sensors.nu/~lm78/cvs/lm_sensors2/doc/lm_sensors-FAQ.html
(FAQ #4.24.3) for further information.
If you find out what chips are on your board, see
http://secure.netroedge.com/~lm78/newdrivers.html for driver status.
#


Mmm, doesn't look good, does it?

The chips that I know are on this motherboard (VIA Apollo chipset series) are:

Winbond 83977 (I/O chipset)
VIA VT82C596B
VIA VT82C693A

Hmm... those VIA chips are supposedly supported via the i2c-viapro module. I'll need to reboot at some point and check out my BIOS settings to make sure the proper things are turned on.
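
Before rebooting, it is easy to double-check which of the drivers suggested by sensors-detect are actually loaded. This is a small sketch under a couple of assumptions: the sensors-detect output has been saved to a file (the name sensors-detect.log in the example is hypothetical), and the helper names suggested_drivers and module_loaded are my own.

```shell
# Print the driver names that sensors-detect suggested, one per line.
# Reads saved sensors-detect output on stdin and picks out the name
# between the backquote and the apostrophe on each "Use driver" line.
suggested_drivers() {
    awk -F"[\`']" '/^Use driver/ { print $2 }'
}

# Succeeds if the named module is listed in /proc/modules. The kernel
# lists modules with underscores, so i2c-viapro is looked up as i2c_viapro.
# Drivers compiled into the kernel do not show up there.
module_loaded() {
    grep -q "^$(printf '%s' "$1" | tr '-' '_') " /proc/modules
}

# Example:
# suggested_drivers < sensors-detect.log | while read -r drv; do
#     module_loaded "$drv" && echo "$drv: loaded" || echo "$drv: not loaded"
# done
```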

Monday, June 05, 2006

Toshiba introduces 200GB 2.5" drive

Toshiba has now released a hard drive based on perpendicular recording technology. It's 200GB in a 2.5" form factor (useful for notebooks / laptops). Areal density is 178.8 gigabits per square inch.

In comparison, Hitachi has demonstrated 230 gigabits per square inch (see also PCWorld, which says current drives top out at 120-130 gigabits/sq inch while perpendicular recording should allow an areal density of 230 gigabits/sq inch). Back in 2004, Toshiba demonstrated a density of 133 gigabits per square inch using the older recording technology. Back in 2002, Seagate announced drives that had areal densities of 100 gigabits/sq in. Seagate expects areal densities to hit 500 gigabits/sq in over the next few years, compared with today's density of ~110 gigabits/sq in.

So it looks like perpendicular recording will definitely allow hard drive sizes to double and possibly quadruple over the next few years. That means we can reasonably expect 1TB drives in the short-term with the possibility of 2TB drives down the road.