Wednesday, January 28, 2009

Removing a failed, non-existent drive from Software RAID

So, you have a drive that has failed, you've replaced the drive on the fly (using hot-swap SATA) and now you need to remove the old RAID slice.

For example:

md0 : active raid1 sdi1[0] sdc1[2] sdb1[3](F) sda1[1]
264960 blocks [3/3] [UUU]


In this case, sdb1 is marked as failed (the "(F)" flag), and sdi1 is the slice from the newly added drive (via SATA hot-plug). So we want to remove sdb1 with mdadm's remove option:

# mdadm /dev/md0 --remove /dev/sdb1
mdadm: cannot find /dev/sdb1: No such file or directory


Oops, we can't do that because we already swapped out the failed drive (sdb).

The answer is found in the mdadm man page for the remove feature:

-r, --remove remove listed devices. They must not be active. i.e. they should be failed or spare devices. As well as the name of a device file (e.g. /dev/sda1) the words failed and detached can be given to --remove. The first causes all failed devices to be removed. The second causes any device which is no longer connected to the system (i.e. an open returns ENXIO) to be removed. This will only succeed for devices that are spares or have already been marked as failed.

So instead of specifying the name of the failed RAID slice, we should use the following command:

# mdadm /dev/md0 -r detached  
mdadm: hot removed 8:17


And there you have it: the failed RAID slice that is no longer connected to the system has been removed, and it no longer shows up in /proc/mdstat.
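To double-check, neither of the following should list the stale sdb1 slice any more (the --detail output will also confirm the array state):

# cat /proc/mdstat
# mdadm --detail /dev/md0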

Wednesday, January 14, 2009

Yum Error - Metadata file does not match checksum

Ran into this issue today when the pgsql folks updated their repository. All of our CentOS 5 machines are behind a transparent HTTP proxy cache server (squid).

filelists.sqlite.bz2      100% |=========================| 157 kB    00:00     
http://yum.pgsqlrpms.org/8.3/redhat/rhel-5-x86_64/repodata/filelists.sqlite.bz2: [Errno -1] Metadata file does not match checksum
Trying other mirror.
Error: failure: repodata/filelists.sqlite.bz2 from pgdg83: [Errno 256] No more mirrors to try.


It doesn't really matter which package is involved; the key part is the "[Errno -1] Metadata file does not match checksum" error message, which here was caused by the proxy serving stale cached metadata.

Solution:

1) Edit /etc/yum.conf and add the following line

http_caching=packages

2) Run "yum clean metadata"

3) Retry the yum install
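Taken together, the fix looks something like this (the package name on the last line is just a placeholder; the http_caching option belongs in the [main] section of yum.conf):

# echo "http_caching=packages" >> /etc/yum.conf
# yum clean metadata
# yum install <package>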

References:

FedoraForum.org > Fedora Support > Installation Help > yum "Metadata file does not match checksum" problem

Monday, January 12, 2009

PostgreSQL - Errors during insertions

ERROR: could not access status of transaction 84344832
DETAIL: Could not open file "pg_clog/0050": No such file or directory.

I ran into this issue while doing heavy inserts into a table. The error also shows up when doing a "vacuum analyze" on the table. This is with PostgreSQL 8.3.5.

Troubleshooting steps:

1) # grep "was terminated" /var/lib/pgsql/data/pg_log/*.log

postgresql-2008-12-23_125649.log:LOG: server process (PID 15996) was terminated by signal 6: Aborted

postgresql-2008-12-28_131324.log:LOG: server process (PID 25745) was terminated by signal 6: Aborted

postgresql-2009-01-01_212245.log:LOG: startup process (PID 27003) was terminated by signal 6: Aborted

postgresql-2009-01-01_212334.log:LOG: startup process (PID 27097) was terminated by signal 6: Aborted


All of the terminations are due to "signal 6: Aborted", so I don't think there's anything more to learn there.

2) Next, look for "Could not open file" in the log files. The command is:

# grep "Could not open file" /var/lib/pgsql/data/pg_log/*.log

postgresql-2009-01-11_000000.log:DETAIL:  Could not open file "pg_clog/0050": No such file or directory.

postgresql-2009-01-11_000000.log:DETAIL: Could not open file "pg_clog/0050": No such file or directory.

postgresql-2009-01-11_170257.log:DETAIL: Could not open file "pg_clog/0050": No such file or directory.

postgresql-2009-01-11_172436.log:DETAIL: Could not open file "pg_clog/0050": No such file or directory.

postgresql-2009-01-12_000000.log:DETAIL: Could not open file "pg_clog/0050": No such file or directory.


Those are the errors I'm seeing; they started on January 11th.

...

Fix attempt #1

A) Shutdown the database, make sure you have good backups.

B) Find out the size of the pg_clog files. The following shows that ours are 256KB in size.

# ls -lk /var/lib/pgsql/data/pg_clog/ | head

-rw------- 1 postgres postgres 256 Dec 29 17:56 007E
-rw------- 1 postgres postgres 256 Dec 29 19:05 007F
-rw------- 1 postgres postgres 256 Dec 29 20:15 0080
-rw------- 1 postgres postgres 256 Dec 29 21:23 0081
-rw------- 1 postgres postgres 256 Dec 29 22:29 0082
-rw------- 1 postgres postgres 256 Dec 29 23:23 0083
-rw------- 1 postgres postgres 256 Dec 30 00:15 0084
-rw------- 1 postgres postgres 256 Dec 30 01:08 0085

C) Create the missing clog file:

# dd if=/dev/zero of=/var/lib/pgsql/data/pg_clog/0050 bs=1k count=256
# chmod 600 /var/lib/pgsql/data/pg_clog/0050
# chown postgres:postgres /var/lib/pgsql/data/pg_clog/0050

D) Restart the pgsql service.
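E) (Optional) Re-run the vacuum that was failing to confirm the error is gone. A rough sketch, with a placeholder database name:

# su - postgres -c "vacuumdb --analyze --verbose mydatabase"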

...

Final notes: this is almost certainly NOT the proper way to fix this error. Proceed at your own risk; the chance of data loss is VERY HIGH.

For us, the affected table was append-only and was being filled with test data, so I'm not all that concerned.

Saturday, January 10, 2009

Subversion Backups - Finding SVN repositories

The first trick when backing up SVN repositories is finding them so that you can run the svnadmin hotcopy command. Well, you *could* just set up a hard-coded DIRS list in your backup script - but as you add new SVN repositories, you have to constantly edit that script.

Caveat #1: This setup probably only works for FSFS repositories. I don't use BerkeleyDB (a.k.a. BDB) repositories, so I can't guarantee that it correctly locates them. I've chosen to look for folders that contain the "db/uuid" file as our "marker" file, which should result in zero false positives or misidentified repositories.

Caveat #2: I'm making the assumption that all of your repositories are stored in a central location (/var/svn), but they do *not* all have to be at the same depth.

Step #1 - Find the uuid files

This is pretty simple, we're just going to use find and grep.

# find /var/svn -name uuid | grep "db/uuid"

/var/svn/tgh-photo/db/uuid
/var/svn/tgh-dev/db/uuid
/var/svn/tgh-web/db/uuid

Step #2 - Clean up the pathnames

Even better, we can tack a sed command onto the end to trim off the "/db/uuid" portion, which gets us exactly what we need for passing to the "svnadmin hotcopy" command.

# find /var/svn -name uuid | grep "db/uuid" | sed 's/\/db\/uuid//'

/var/svn/tgh-photo
/var/svn/tgh-dev
/var/svn/tgh-web

(Make sure that you get all of the "\" and "/" in the right places.)

Step #3 - Strip off the base directory

Since I'm going to create a script variable called "BASE" that equals "/var/svn/", I'm also going to strip that off of the front of the paths.

However, we'll need to convert the BASE variable into something that sed can properly deal with. Otherwise the slashes won't be escaped properly for the sed replacement.

# echo "/var/svn/" | sed 's/\//\\\//g'
\/var\/svn\/

# find /var/svn -name uuid | grep "db/uuid" | sed 's/\/db\/uuid//' | sed 's/^\/var\/svn\///'
tgh-photo
tgh-dev
tgh-web

Or, even better, we can use a different delimiter for sed. That gives us a search line of:

DIRS=`find ${BASE} -name uuid | grep 'db/uuid$' | sed 's:/db/uuid$::' | sed 's:^/var/svn/::'`

That puts a list of directory names into the DIRS variable in our bash script.

Step #4 - Putting it all together

Here's our basic script.

#!/bin/bash
BASE="/var/svn/"
HOTCOPY="/var/svn-hotcopy/"
DIRS=`find ${BASE} -name uuid | grep 'db/uuid$' | sed 's:/db/uuid$::' | sed 's:^/var/svn/::'`

for DIR in ${DIRS}
do
    echo "svnadmin hotcopy ${BASE}${DIR} to ${HOTCOPY}${DIR}"
    rm -r ${HOTCOPY}${DIR}
    svnadmin hotcopy ${BASE}${DIR} ${HOTCOPY}${DIR}
done


However, we're not quite done yet, because the svnadmin hotcopy command doesn't like it when the destination folder does not exist (and the rm -r will also fail on the first run). So let's add the following scrap of code to the loop, before the rm:

if ! test -d ${HOTCOPY}${DIR}
then
    mkdir -p ${HOTCOPY}${DIR}
fi


The final script

#!/bin/bash

BASE="/var/svn/"
HOTCOPY="/var/svn-hotcopy/"

FIND=/usr/bin/find
GREP=/bin/grep
RM=/bin/rm
SED=/bin/sed
SVNADMIN=/usr/bin/svnadmin

DIRS=`$FIND ${BASE} -name uuid | $GREP 'db/uuid$' | $SED 's:/db/uuid$::' | $SED 's:^/var/svn/::'`

for DIR in ${DIRS}
do
    echo "svnadmin hotcopy ${BASE}${DIR} to ${HOTCOPY}${DIR}"

    # Make sure the destination path exists (matters on the first run)
    if ! test -d ${HOTCOPY}${DIR}
    then
        mkdir -p ${HOTCOPY}${DIR}
    fi

    # Remove the previous copy, then take a fresh hotcopy
    $RM -r ${HOTCOPY}${DIR}
    $SVNADMIN hotcopy ${BASE}${DIR} ${HOTCOPY}${DIR}
done

# insert rdiff-backup line here


Hopefully that works out. Note the use of "rm -r", which could cause data loss if there are errors in the script, so be very careful when modifying it.
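For what it's worth, the rdiff-backup step left as a placeholder above might look something like the following (the remote host and destination path are purely illustrative):

# push the hotcopy area to a remote host, keeping incremental history
rdiff-backup /var/svn-hotcopy/ backuphost::/backup/svn-hotcopy/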

PostgreSQL - Backup Speed Tests

Our backup script for pgsql dumps the databases out in plain text SQL format (my preferred method for a variety of reasons). The question was, do we leave it as plain text and/or which compressor do we use?

...

Here are sample times for a plain-text SQL backup.

real 3m30.523s
user 0m14.053s
sys 0m5.132s

The raw database cluster is 22GB (22150820 KB), but that includes indexes. The resulting size of the backups is 794MB (812436 KB). The specific command line used is:

pg_dump -a -b -O -t $s.$t -x $d -f $DIR/$d/$s.$t.sql

($s, $t, $d and $DIR are variables denoting the schema, table, database, and base backup directory)

...

gzip (default compression level of "6")

pg_dump -a -b -O -t $s.$t -x $d | gzip -c > $DIR/$d/$s.$t.sql.gz

CPU usage is pegged at 100% on one of the four CPUs in the box during this operation (due to gzip compressing the streams). So we are bottlenecked by the somewhat slow CPUs in the server.

real 3m0.337s
user 2m17.289s
sys 0m6.740s

So we burned up a lot more CPU time (user 2m 17s) compared to the plain text dump (user 14s). But the overall operation still completed fairly quickly. So how much space did we save? The resulting backups are only 368MB (376820 KB), which is a good bit smaller.

(It would be better, but a large portion of our current database is comprised of the various large "specialized" tables, which are extremely random and difficult to compress. I can't talk about the contents of those tables, but the data in them is generated by a PRNG.)

...

gzip (compression level of "9")

pg_dump -a -b -O -t $s.$t -x $d | gzip -c9 > $DIR/$d/$s.$t.sql.gz

We're likely to be even more CPU-limited here due to telling gzip to "try harder". The resulting backups are 369MB (376944 KB), which is basically the same size.

real 9m39.513s
user 7m28.784s
sys 0m12.585s

So we burn up 3.2x more CPU time, but we don't really change the backup size. Probably not worth it.

...

bzip2

pg_dump -a -b -O -t $s.$t -x $d | bzip2 -c9 > $DIR/$d/$s.$t.sql.bz2

real 19m45.280s
user 3m52.559s
sys 0m11.709s

Interesting: while bzip2 used roughly twice the CPU time of gzip at its default compression level, it still used far less CPU than gzip at its maximum compression setting. The resulting backup files are only 330MB (337296 KB), which is a decent improvement over gzip's output.

Now, the other interesting thing is that bzip2 took a lot longer to run (in wall-clock terms) than gzip, but the server is pretty busy at the moment, so take the elapsed times with a grain of salt.

...

Ultimately, we ended up going with bzip2 for a variety of reasons.

- Better compression
- The additional CPU usage was not an issue
- We could drop to a smaller block size (bzip2 -2, i.e. 200k blocks) to be more friendly to rsync (see the sketch below)
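For reference, the numbers above were presumably gathered by wrapping each dump pipeline in the shell's time builtin; a minimal sketch, using the bzip2 variant and whatever values the backup script assigns to the variables:

# time (pg_dump -a -b -O -t $s.$t -x $d | bzip2 -c2 > $DIR/$d/$s.$t.sql.bz2)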

Thursday, January 08, 2009

PostgreSQL - Basic backup scheme

Here's a basic backup scheme. We're using pg_dump in plain-text mode, compressing the output with bzip2, and writing the results out to files named after the database, schema and table. It's not the most efficient method, but it allows us to go back to:

- any of the past 7 days
- any Sunday within the past month
- the last week of each month in the past quarter
- the last week of each quarter within the past year
- the last week of each year

That works out to roughly 24-25 copies of the data stored on disk, so make sure the backup drive has enough space to hold them all.

Most of the grunt work is handled by the include script; the daily / weekly / monthly / quarterly / yearly scripts simply set up a few variables and then source it.
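These scripts are presumably driven from cron; a sketch of what the crontab entries might look like (paths, times and the choice of days are all assumptions, not part of the original setup):

# min hour dom month dow  command
30 1 * * *           $HOME/bin/backup_daily.sh
30 2 * * 0           $HOME/bin/backup_weekly.sh      # Sundays
30 3 28 * *          $HOME/bin/backup_monthly.sh     # near month end
30 4 28 3,6,9,12 *   $HOME/bin/backup_quarterly.sh   # last month of each quarter
30 5 28 12 *         $HOME/bin/backup_yearly.sh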

backup_daily.sh
#!/bin/bash
# DAILY BACKUPS (writes to a daily folder each day)
DAYNR=`date +%w`
echo $DAYNR
DIR=/backup/pgsql/daily/$DAYNR/
echo $DIR

source ~/bin/include_backup_compressed.sh


backup_weekly.sh
#!/bin/bash
# WEEKLY BACKUPS
# Backups go to five directories based on the day of the month,
# converted into 1-5. The fifth directory is only written on days
# 29-31, so it can hold a stale copy for a while.
WEEKNR=`date +%d`
echo $WEEKNR
# force base 10 so that days 08 and 09 aren't treated as invalid octal
let "WEEKNR = (10#$WEEKNR + 6) / 7"
echo $WEEKNR
DIR=/backup/pgsql/weekly/$WEEKNR/
echo $DIR

source ~/bin/include_backup_compressed.sh


backup_monthly.sh
#!/bin/bash
# MONTHLY BACKUPS
# Backups go to three directories based on the month of year
# converted into 1-3 based on modulus arithmetic.
MONTHNR=`date +%m`
echo $MONTHNR
# force base 10 so that months 08 and 09 aren't treated as invalid octal
let "MONTHNR = ((10#$MONTHNR - 1) % 3) + 1"
echo $MONTHNR
DIR=/backup/pgsql/monthly/$MONTHNR/
echo $DIR

source ~/bin/include_backup_compressed.sh


backup_quarterly.sh
#!/bin/bash
# QUARTERLY BACKUPS
# Backups go to four directories based on the quarter of the year
# converted into 1-4 based on modulus arithmetic.
QTRNR=`date +%m`
echo $QTRNR
# force base 10 so that months 08 and 09 aren't treated as invalid octal
let "QTRNR = (10#$QTRNR + 2) / 3"
echo $QTRNR
DIR=/backup/pgsql/quarterly/$QTRNR/
echo $DIR

source ~/bin/include_backup_compressed.sh


backup_yearly.sh
#!/bin/bash
# ANNUAL BACKUPS
YEARNR=`date +%Y`
echo $YEARNR
DIR=/backup/pgsql/yearly/$YEARNR/
echo $DIR

source ~/bin/include_backup_compressed.sh


include_backup_compressed.sh
#!/bin/bash
# Compressed backups to $DIR
echo $DIR

# List all databases except the templates
DBS=$(psql -l | grep '|' | awk '{ print $1}' | grep -vE '^-|^Name|template[01]')
for d in $DBS
do
    echo $d
    DBDIR=$DIR/$d
    if ! test -d $DBDIR
    then
        mkdir -p $DBDIR
    fi
    # List the user schemas in this database
    SCHEMAS=$(psql -d $d -c '\dn' | grep '|' | awk '{ print $1}' \
        | grep -vE '^-|^Name|^pg_|^information_schema')
    for s in $SCHEMAS
    do
        echo $d.$s
        # List the tables in this schema
        TABLES=$(psql -d $d -c "SELECT schemaname, tablename FROM pg_catalog.pg_tables WHERE schemaname = '$s';" \
            | grep '|' | awk '{ print $3}' | grep -vE '^-|^tablename')
        for t in $TABLES
        do
            echo $d.$s.$t
            # Dump each table to its own bzip2-compressed file (200k blocks)
            if [ $s = 'public' ]
            then
                pg_dump -a -b -O -t $t -x $d | bzip2 -c2 > $DIR/$d/$s.$t.sql.bz2
            else
                pg_dump -a -b -O -t $s.$t -x $d | bzip2 -c2 > $DIR/$d/$s.$t.sql.bz2
            fi
        done
    done
done


We tried using gzip instead of bzip2, but found that bzip2 worked a little better even though it uses up more CPU. We use a block size of only 200k for bzip2 in order to be more friendly to an rsync push to an external server.

Wednesday, January 07, 2009

Very basic rsync / cp backup rotation with hardlinks

Here's a very basic script that I use with rsync; it makes use of hard links to reduce the overall size of the backup folder. How the setup works, and its limitations:

- Every morning, a server copies the current version of all files across SSH (using scp) into a "current" folder. There are two folders on the source server that get backed up daily (/home and /local).

- Later on that day, we run the following script to rsync any new files into a daily folder (daily.0 through daily.6).

- In order to bootstrap those daily.# folders, you have to use "cp -al current/* daily.2/" on each, which fills out the seven daily backup folders with hardlinks. Change the number in "daily.2" to 0-6 and run the command once for each of the seven days. Do this after the "current" folder has been populated with data pushed by the source server.

- Ideally, the source server should be pushing changes to the "current" folder using rsync. But in our case, the source server is an old Solaris 9 box without rsync, which means that our backups are likely to be about 2x to 3x larger than they should be.

- RDiff-Backup may have been a better solution for this particular problem (and we may switch).

- This shows a good example of how to calculate the current day of week number (0-6) as well as calculating what the previous day number was (using modulus arithmetic).

- I make no guarantees that permissions or ownership will be preserved. But since the source server strips all of that information in the process of sending the files over the wire with scp, it's a moot point for our current situation. (rdiff-backup is probably a better choice for that.)

#!/bin/bash
# DAILY BACKUPS (writes to a daily folder each day)
DAYNR=`date +%w`
echo DAYNR=${DAYNR}
# Previous day of the week (0-6), via modulus arithmetic
let "PREVDAYNR = ((DAYNR + 6) % 7)"
echo PREVDAYNR=${PREVDAYNR}
DIRS="home local"

for DIR in ${DIRS}
do
    echo "----- ----- ----- -----"
    echo "Backup:" ${DIR}
    SRCDIR=/backup/cfmc1/$DIR/current/
    DESTDIR=/backup/cfmc1/$DIR/daily.${DAYNR}/
    PREVDIR=/backup/cfmc1/$DIR/daily.${PREVDAYNR}/
    echo SRCDIR=${SRCDIR}
    echo DESTDIR=${DESTDIR}
    echo PREVDIR=${PREVDIR}

    # Populate today's folder with hard links to yesterday's files
    cp -al ${PREVDIR}* ${DESTDIR}
    # Bring today's folder in sync with the "current" copy
    rsync -a --delete-after ${SRCDIR} ${DESTDIR}

    echo "Done."
done


It's not pretty, but it will work better once the source server starts pushing the daily changes via rsync instead of completely overwriting the "current" directory every day.

The code should be pretty self-explanatory, but I'll explain the two key lines.

cp -al ${PREVDIR}* ${DESTDIR}

This populates ${DESTDIR} (today's folder) with hard links to yesterday's files, overwriting last week's copies in the process. Files that have been deleted since last week are left behind until the rsync step.

rsync -a --delete-after ${SRCDIR} ${DESTDIR}

This then brings today's folder up to date with any changes relative to the source directory (a.k.a. "current"). It also deletes any files in today's folder that no longer exist in the source directory.
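You can spot-check that unchanged files really are shared between the daily folders by comparing inode numbers; identical inode numbers mean the entries are hard links to the same data (the file name here is just an example):

# ls -li /backup/cfmc1/home/daily.0/somefile /backup/cfmc1/home/daily.1/somefile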

References:

Easy Automated Snapshot-Style Backups with Linux and Rsync

Local incremental snap shots with rsync

Tuesday, January 06, 2009

Setting up FreeNX/NX on CentOS 5

Quick guide to setting up FreeNX/NX. These are roughly the minimum steps on a fresh CentOS 5.1 box. We're limiting things to public-key authentication from the outside, and we run a second ssh daemon (listening on localhost, allowing password authentication - see the second half of this post).

Note: If you have ATRPMs configured as a repository, make sure that you exclude nx* and freenx*. (Add/edit the exclude= line in the ATRPMs .repo file.)

# yum install nx freenx
# cp /etc/nxserver/node.conf.sample /etc/nxserver/node.conf
# vi /etc/nxserver/node.conf

Change the following lines in the node.conf file:

ENABLE_SSH_AUTHENTICATION="1"
-- remove the '#' at the start of the line

ENABLE_SU_AUTHENTICATION="1"
-- remove the '#' at the start of the line
-- change the zero to a one

ENABLE_FORCE_ENCRYPTION="1"
-- remove the '#' at the start of the line
-- change the zero to a one

Change the server's public/private key pair:

# mv /etc/nxserver/client.id_dsa.key /etc/nxserver/client.id_dsa.key.OLD
# mv /etc/nxserver/server.id_dsa.pub.key /etc/nxserver/server.id_dsa.pub.key.OLD
# ssh-keygen -t dsa -N '' -f /etc/nxserver/client.id_dsa.key
# mv /etc/nxserver/client.id_dsa.key.pub /etc/nxserver/server.id_dsa.pub.key
# cat /etc/nxserver/client.id_dsa.key

You'll need to give the DSA Private Key information to people who should be allowed to use FreeNX/NX to access the server.

You'll also need to put the new public key into the authorized_keys2 file:

# cat /etc/nxserver/server.id_dsa.pub.key >> /var/lib/nxserver/home/.ssh/authorized_keys2

# vi /var/lib/nxserver/home/.ssh/authorized_keys2

Comment out the old key, put the following at the start of the good key line.

no-port-forwarding,no-X11-forwarding,no-agent-forwarding,command="/usr/bin/nxserver"
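The edited line should end up looking roughly like this (the key material and trailing comment are placeholders for whatever ssh-keygen generated on your system):

no-port-forwarding,no-X11-forwarding,no-agent-forwarding,command="/usr/bin/nxserver" ssh-dss AAAAB3...rest-of-key...= root@yourserver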

Restart the FreeNX/NX service:

# service freenx-server restart

You should now be able to connect (assuming that you specify the proper SSH port and paste the private key into the configuration).

Setup sshd to run a second instance

In order to lock down the servers the way I prefer, yet still allow FreeNX/NX to work, I have to set up a second copy of the sshd daemon. The FreeNX/NX client requires an sshd that allows password authentication (not just public key), but we prefer to allow only public-key access to our servers.

I did the following on CentOS 5, it should also work for Fedora or Red Hat Enterprise Linux (RHEL). But proceed at your own risk.

1) Create a hard link to the sshd program. This lets us distinguish the second instance in the process list. (Note that an RPM update of OpenSSH replaces /usr/sbin/sshd with a new file, which breaks the hard link, so you'll want to recreate the link after sshd is patched.)

# ln /usr/sbin/sshd /usr/sbin/sshd_nx

2) Copy /etc/init.d/sshd to a new name

This is the startup / shutdown script for the base sshd daemon. Make a copy of this script:

# cp -p /etc/init.d/sshd /etc/init.d/sshd_nx
# vi /etc/init.d/sshd_nx

Change the following lines:

# processname: sshd_nx
# config: /etc/ssh/sshd_nx_config
# pidfile: /var/run/sshd_nx.pid
prog="sshd_nx"
SSHD=/usr/sbin/sshd_nx
PID_FILE=/var/run/sshd_nx.pid
OPTIONS="-f /etc/ssh/sshd_nx_config -o PidFile=${PID_FILE} ${OPTIONS}"
[ "$RETVAL" = 0 ] && touch /var/lock/subsys/sshd_nx
[ "$RETVAL" = 0 ] && rm -f /var/lock/subsys/sshd_nx
if [ -f /var/lock/subsys/sshd_nx ] ; then

Note: The OPTIONS= line is probably new and will have to be added right after the PID_FILE= line in the file. There are also multiple lines that reference /var/lock/subsys/sshd; you will need to change all of them.

3) Copy the old sshd configuration file.

# cp -p /etc/ssh/sshd_config /etc/ssh/sshd_nx_config

4) Edit the new sshd configuration file and make sure that it uses a different port number.

Port 28822

5) Clone the PAM configuration file.

# cp -p /etc/pam.d/sshd /etc/pam.d/sshd_nx

6) Set the new service to startup automatically.

# chkconfig --add sshd_nx

...

Test it out

# service sshd_nx start
# ssh -p 28822 username@localhost

Check for errors in the log file:

# tail -n 25 /var/log/secure

...

At this point, I would go back and change the secondary configuration to only listen on the localhost ports:

ListenAddress 127.0.0.1
ListenAddress ::1
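After that change, restart the second daemon and confirm that it is only listening on the loopback addresses:

# service sshd_nx restart
# netstat -tlnp | grep 28822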

...

References:

How to Add a New "sshd_adm" Service on Red Hat Advanced Server 2.1

How to install NX server and client under Ubuntu/Kubuntu Linux (revised)

...

Follow-on notes:

This setup is specifically for cases where you do not want to allow password access to the server from outside. The way that the NXClient authenticates is basically:

1) NXClient uses a SSH public key pair to authenticate with the server

2) The NX server then logs in as the actual user with the supplied username/password, via SSH to the localhost-only daemon.

So by setting things up this way, we keep public-key authentication as the only way to log in to the server from the outside, while the NX server daemon is still able to cross-connect to a localhost-only SSH daemon that allows passwords.

Friday, January 02, 2009

Samba3: Upgrading to v3.2 on CentOS 5

CentOS 5 currently only has Samba 3.0.28 in its BASE repository. The DAG/RPMForge projects don't have updated Samba3 RPMs either (although I do see an OpenPkg RPM). So the question I've been dealing with for the past few weeks is: where do I get newer Samba RPMs?

Ideally, I would get these RPMs from a repository, so that I could be notified via "yum check-update" for when there are security / feature updates. While I don't mind the occasional source package in .tar.gz or .tar.bz2 format, they rapidly become a maintenance nightmare. Especially for security-sensitive packages like Samba which tend to be attack targets.

What I've found that looks promising is:

http://ftp.sernet.de/pub/samba/recent/centos/5/

Which has a .repo file and looks like it might be usable as a repository for yum. (See "Get the latest Samba from Sernet" for confirmation of this.)

# cd /etc/yum.repos.d/
# wget http://ftp.sernet.de/pub/samba/recent/centos/5/sernet-samba.repo

Now, the major change is that the RedHat/CentOS packages are named "samba.x86_64" while the sernet.de packages are named "samba3.x86_64". Also, the sernet.de folks don't sign their packages, so you will need to add "gpgcheck=0" to the end of the .repo file.

(At least, I don't think they do...)
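For reference, the downloaded .repo file ends up looking something like this once the gpgcheck line is appended (treat the section name and description as approximations):

[sernet-samba]
name=SerNet Samba packages for CentOS 5
baseurl=http://ftp.sernet.de/pub/samba/recent/centos/5/
enabled=1
gpgcheck=0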

Note: As always, before doing a major upgrade like this, make backups. At a minimum, make sure you have good backups of your Samba configuration files. We use FSVS with a SVN backend for all of our configuration files, which makes an excellent change tracking tool for Linux servers.

# yum remove samba.x86_64
# yum install samba3.x86_64
# service smb start

With luck, you should now be up and running with v3.2 of Samba. You can verify this by running "smbd -V" or by looking at the latest log file in the /var/log/samba/ directory.

Thursday, January 01, 2009

LVM Maximum number of Physical Extents

I'm working on a 15-disk system (750GB SATA drives), so this issue came up again:

What is the maximum number of physical extents in LVM?

The answer is that there's no limit on the number of PEs within a Physical Volume (PV) or Volume Group (VG). The limitation is, instead, the maximum number of PEs that can be formed into a single Logical Volume (LV), roughly 65k (65,536) in the older LVM format. So a 32MB PE size allows for LVs up to 2TB in size (65,536 x 32MB).

Note: Older LVM implementations may only allow a maximum of 65k PEs across the entire PV. The VG can be composed of multiple PVs however.

So if you want to be safe, make your PE size large enough that you end up with fewer than 65k PEs. Just remember that all PVs within a VG need to use the same PE size. So if you're planning for lots of expansion down the road, with very large PVs to be added later, you may wish to bump up your PE size by a factor of 4 or 8.

A good example of this is a Linux box using Software RAID across multiple 250GB disks. The plan down the road is to replace those disks with larger models, create new Software RAID arrays across the extended areas on the larger disks, then extend VGs across new PVs. At the start, you might only have a net space (let's say RAID6, 2 hot-spares, 15 total disks) of around 2.4TB. That's small enough that a safe PE size might be 64MB (about 39,000 PEs).

Except that down the road, disks are getting much larger (1.5TB drives are now easily obtainable). So if we had stuck with a 64MB PE size, our individual LVs could be no larger than 4TB. If we were to put in 2TB disks (net space of about 20TB), the number of PEs would end up growing by about 8x (roughly 312,000). We might even see 4TB drives in a 3.5" size, which would be closer to 40TB of net space.

A PE size of 256MB might have served us better when we set up that original PV area. It would allow individual LVs sized up to 16TB. The only downside is that you won't want to create LVs smaller than 256MB, and you'll want to make sure all LVs are multiples of 256MB.

Bottom line, when setting up your PE sizes, plan for a 4x or 8x growth.

Example:

The following shows how to create a volume group (VG) with 512MB physical extents instead of the default 4MB extent size (also known as "PE size" in the vgdisplay command output).

# vgcreate -s 512M vg6 /dev/md10
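You can confirm the extent size afterwards with vgdisplay; look for the "PE Size" and "Total PE" lines in the output:

# vgdisplay vg6 | grep -E "PE Size|Total PE"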

References:

LVM Manpage - Talks about the limit in IA32.
Maximum Size Of A Logical Volume In LVM - Walker News (blog entry)
Manage Linux Storage with LVM and Smartctl - Enterprise Networking Planet