Using a USB external hard disk for backups with Linux
Monday 23rd April, 2007
In this article, I show how I set up a recently purchased USB external hard disk drive as a backup drive for my Linux desktop PC. I'll delete the default FAT32 partition, create a new partition, make a reiserfs filesystem, and show how to use rsync to backup your important data.
- Why backup with Linux?
- Considerations
- Getting the hardware working
- Partition the disk
- Creating a new filesystem
- The mountpoint
- Running the backup (/home)
- Running the restore (/home)
- Running the backup (/etc)
- Automating the backup
- Other options
Why backup with Linux?
The question should really be "Why backup with Linux instead of Windows, if I have a dual boot PC?" Personally, I don't dual boot, so for me the question never arises. However, I can think of a number of reasons why I would use Linux as my OS for running backups.
- Windows cannot read your Linux partitions without third-party addons.
- Conversely, Linux can (natively) read your FAT32 and NTFS partitions.
- It is straightforward to create a backup script which mounts those partitions, backs them up, then unmounts them.
- Backing up to Linux renders any virus-infected files inert.
- Would you leave a Windows PC running overnight, unattended?
Considerations
What to backup?
The old school attitude is to backup everything, the whole system. As they say, "There's no school like the old school", and that really is the safest approach. It's not really a space issue to run full system backups since a fully installed system might take up a few gig, perhaps 3-10 GB including system logs and other stuff.
On a desktop system I'm not convinced this is necessary. My distro of choice (Arch Linux), and many popular distros today such as the many Debian-based distros, incorporate a rolling release. Installing and updating such systems is pretty easy (as well as being a useful learning experience). I'm not sure how much time you would save by running a full system backup.
You might want to backup some essential directories like /etc and parts of /var. /root may be useful too; it depends how you run your system.
It will also be useful to have a list of all the installed packages on your system. pacman -Qi will provide this for Arch, dpkg --list will provide similar for Debian & Co. Redirect the output to a file in your home directory. If you're going to do this, cron it to run daily.
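As a rough sketch of that idea, the function below writes the package list to a file. The function name save_package_list and the default output file ~/pkglist.txt are my own placeholders, not anything standard, and I've used the shorter pacman -Q (name and version only) rather than pacman -Qi to keep the file small.

```shell
#!/bin/sh
# Hypothetical helper: save the list of installed packages to a file.
# The default output path is an assumption; pass another path as $1 if you prefer.
save_package_list() {
    out="${1:-$HOME/pkglist.txt}"
    if command -v pacman >/dev/null 2>&1; then
        pacman -Q > "$out"       # Arch: one "name version" pair per line
    elif command -v dpkg >/dev/null 2>&1; then
        dpkg --list > "$out"     # Debian & Co.
    else
        echo "no supported package manager found" >&2
        return 1
    fi
}
```

Saved as a script with a final line calling save_package_list, it could be run daily from your crontab with an entry such as 0 6 * * * $HOME/bin/save-pkglist.sh (the script path is again only an example).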
The important stuff for most desktop systems will be in /home: documents, e-mails, mp3s, movies, photos, etc., so backing up /home is what I'll be concentrating on for this article.
Backup as root or normal user?
A sensible rule is to try and run as much as possible without using the root account. Backup your system files as root, and your home directory separately using your normal user account. This makes it a little easier to restore your normal user account data, since it will be easily accessible as yourself rather than root.
Leave backup partition mounted or unmount after each backup?
You should try to ensure that your backup partition is unmounted after each backup. This will help prevent data corruption in the event of power surges or outages.
Only use your backup disk for backups. Don't be tempted to store live data on it. If your backup disk fails, then you've lost everything on it. If it was only storing your backups, you can get a replacement and press it into service quickly. If it also had live data on there, then that live data is gone.
What backup software to use?
This article focuses on rsync. I like rsync. It has some very clever features, which include:
- Internal pipelining reduces latency for multiple files.
- If a file has changed, it sends just the differences in the files to the new location.
- It can be used over a network; I used to use rsync with SSH to backup to a test machine I have at home. This test machine is sometimes off (electricity is getting expensive; it's hard to justify having all these machines on all the time, it puts extra load on my UPS, it's noisy, etc), and so sometimes several days would go by before the backup got run. The USB connection is faster and is always on, so I never miss a backup now ;) .
Getting the hardware working
This varies according to which external USB HDD you've purchased. Read the documentation which comes with your kit. Before you start, run tail -f /var/log/messages.log in a separate terminal (as root). This shouldn't really be needed, and is just to demonstrate that your system recognises the USB drive, which it should unless you have a very old (or perhaps a self-compiled) kernel. The basic steps probably involve something like:
- Connect the USB drive to your system.
- Attach power cables.
- Power up.
Once you've got your drive connected and powered up, you should be able to see it on your system.
Your terminal running tail should display something like:
Apr 16 23:17:40 aquilonia usb 1-7: new high speed USB device using ehci_hcd and address 7
Apr 16 23:17:40 aquilonia usb 1-7: configuration #1 chosen from 1 choice
Apr 16 23:17:40 aquilonia scsi4 : SCSI emulation for USB Mass Storage devices
Apr 16 23:17:40 aquilonia scsi 4:0:0:0: Direct-Access SAMSUNG HD400LD WQ10 PQ: 0 ANSI: 0
Apr 16 23:17:40 aquilonia SCSI device sda: 781422768 512-byte hdwr sectors (400088 MB)
Apr 16 23:17:40 aquilonia sda: Write Protect is off
Apr 16 23:17:40 aquilonia SCSI device sda: 781422768 512-byte hdwr sectors (400088 MB)
Apr 16 23:17:40 aquilonia sda: Write Protect is off
Apr 16 23:17:40 aquilonia sda: sda1
Apr 16 23:17:40 aquilonia sd 4:0:0:0: Attached scsi disk sda
We can see here that the new USB disk drive is presented as SCSI device /dev/sda with a single partition /dev/sda1. Press Ctrl-c to close your tail command.
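Since the next steps are destructive, it doesn't hurt to double-check which device name the kernel has assigned before going any further. A minimal sketch, assuming a Linux system with /proc mounted; list_partitions is just an illustrative helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the kernel's view of all disks and partitions.
# On my system the USB disk shows up as sda / sda1; yours may well differ
# (for example sdb / sdb1 if you already have a SATA or SCSI disk).
list_partitions() {
    cat /proc/partitions
}

list_partitions
```

The size column (in 1 KiB blocks) is usually enough to tell the new external disk apart from your internal ones.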
Your newly visible partition will probably be a FAT32 filesystem. If you're okay with that, you can make an entry in /etc/fstab something like this to access it:
# <file system> <dir>        <type> <options>       <dump> <pass>
...
/dev/sda1       /mnt/usb     vfat   user,noauto,rw  0      0
The options mean respectively that any user can mount the filesystem, but only the user that mounted the filesystem can unmount it (user), the filesystem will not mount at boot time (noauto), and the filesystem will be mounted with read and write access (rw).
You can test your new hard disk as a normal user:
$ mount /mnt/usb
$ df -k
Filesystem  1K-blocks  Used  Available  Use%  Mounted on
...
/dev/sda1   390610848    64  390610784    1%  /mnt/usb
The default FAT32 filesystem is convenient in that it is already there, ready to be used. The disadvantages are:
- Fragmentation.
- No journaling/poor recovery in the event of a power failure etc.
- Poor security - FAT32 files automatically belong to whoever mounts the filesystem.
Partition the disk
I prefer reiserfs (Reiser 3). I know I'll never need to backup a Windows machine since I don't do Windows, so putting up with the inadequacies of FAT32 is simply not required. You may wish to rethink which filesystem to go for, or perhaps a different partitioning strategy, if you have a separate Windows PC. I'm going to go for a single large Reiser 3 partition. To do this, I first need to use cfdisk to delete the old partition and create a new one. You will probably need root access or sudo for this, depending on how your system is configured.
# cfdisk /dev/sda
cfdisk 2.12r
Disk Drive: /dev/sda
Size: 400088457216 bytes, 400.0 GB
Heads: 255 Sectors per Track: 63 Cylinders: 48641
Name Flags Part Type FS Type [Label] Size (MB)
------------------------------------------------------------------------------
sda1 Primary W95 FAT32 400085.85
[Bootable] [ Delete ] [ Help ] [Maximize] [ Print ] [ Quit ] [ Type ] [ Units ] [ Write ]
Toggle bootable flag of the current partition
A FAT32 partition. Let's get rid of it and then see what we have left.
Select [ Delete ] from the menu.
cfdisk 2.12r
Disk Drive: /dev/sda
Size: 400088457216 bytes, 400.0 GB
Heads: 255 Sectors per Track: 63 Cylinders: 48641
Name Flags Part Type FS Type [Label] Size (MB)
------------------------------------------------------------------------------
sda1 Primary W95 FAT32 400085.85
[Bootable] [ Delete ] [ Help ] [Maximize] [ Print ] [ Quit ] [ Type ] [ Units ] [ Write ]
Delete the current partition
cfdisk 2.12r
Disk Drive: /dev/sda
Size: 400088457216 bytes, 400.0 GB
Heads: 255 Sectors per Track: 63 Cylinders: 48641
Name Flags Part Type FS Type [Label] Size (MB)
------------------------------------------------------------------------------
Pri/Log Free Space 400085.85
[ Help ] [ New ] [ Print ] [ Quit ] [ Units ] [ Write ]
Print help screen
Right, nothing left. Time to create a new partition. This will be a primary partition, and I'll only make one - this whole disk is for backups. Select [ New ], [Primary] to create a new primary partition. Accept the default size offered, which should be all the disk space available.
cfdisk 2.12r
Disk Drive: /dev/sda
Size: 400088457216 bytes, 400.0 GB
Heads: 255 Sectors per Track: 63 Cylinders: 48641
Name Flags Part Type FS Type [Label] Size (MB)
------------------------------------------------------------------------------
sda1 Primary Linux 400085.85
[Bootable] [ Delete ] [ Help ] [Maximize] [ Print ] [ Quit ] [ Type ] [ Units ] [ Write ]
Toggle bootable flag of the current partition
Now write the new partition table to disk.
cfdisk 2.12r
Disk Drive: /dev/sda
Size: 400088457216 bytes, 400.0 GB
Heads: 255 Sectors per Track: 63 Cylinders: 48641
Name Flags Part Type FS Type [Label] Size (MB)
------------------------------------------------------------------------------
sda1 Primary Linux 400085.85
[Bootable] [ Delete ] [ Help ] [Maximize] [ Print ] [ Quit ] [ Type ] [ Units ] [ Write ]
Write partition table to disk (this might destroy data)
cfdisk 2.12r
Disk Drive: /dev/sda
Size: 400088457216 bytes, 400.0 GB
Heads: 255 Sectors per Track: 63 Cylinders: 48641
Name Flags Part Type FS Type [Label] Size (MB)
------------------------------------------------------------------------------
sda1 Primary Linux 400085.85
Are you sure you want to write the partition table to disk? (yes or no):
Warning!! This may destroy data on your disk!
Type yes and press Enter to continue.
cfdisk 2.12r
Disk Drive: /dev/sda
Size: 400088457216 bytes, 400.0 GB
Heads: 255 Sectors per Track: 63 Cylinders: 48641
Name Flags Part Type FS Type [Label] Size (MB)
------------------------------------------------------------------------------
sda1 Primary Linux 400085.85
[Bootable] [ Delete ] [ Help ] [Maximize] [ Print ] [ Quit ] [ Type ] [ Units ] [ Write ]
No primary partitions are marked bootable. DOS MBR cannot boot this.
Toggle bootable flag of the current partition
You should now be able to [ Quit ].
So to recap, I've created a single primary Linux partition, using all the available disk space on my USB drive.
Creating a new filesystem
Now to create our filesystem of choice. I'll be using reiserfs; you can choose ext3 or whichever you want.
# /sbin/mkreiserfs /dev/sda1
mkreiserfs 3.6.20
Copyright (C) 2001-2005 by Hans Reiser, licensing governed by reiserfsprogs/COPYING.
A pair of credits:
Continuing core development of ReiserFS is mostly paid for by Hans Reiser from money made selling licenses in addition to the GPL to companies who don't want it known that they use ReiserFS as a foundation for their proprietary product. And my lawyer asked 'People pay you money for this?'. Yup. Life is good. If you buy ReiserFS, you can focus on your value add rather than reinventing an entire FS.
Vladimir Saveliev started as the most junior programmer on the team, and became the lead programmer. He is now an experienced highly productive programmer. He wrote the extent handling code for Reiser4, plus parts of the balancing code and file write and file read.
Guessing about desired format.. Kernel 2.6.20-ARCH is running.
Format 3.6 with standard journal
Count of blocks on the device: 97677200
Number of blocks consumed by mkreiserfs formatting process: 11192
Blocksize: 4096
Hash function used to sort names: "r5"
Journal Size 8193 blocks (first block 18)
Journal Max transaction length 1024
inode generation number: 0
UUID: 19cfe77a-8913-4aed-bc25-775b42c9ce88
ATTENTION: YOU SHOULD REBOOT AFTER FDISK!
ALL DATA WILL BE LOST ON '/dev/sda1'!
Continue (y/n):y
Initializing journal - 0%....20%....40%....60%....80%....100%
Syncing..ok
Tell your friends to use a kernel based on 2.4.18 or later, and especially not a kernel based on 2.4.9, when you use reiserFS. Have fun.
ReiserFS is successfully created on /dev/sda1.
Because this is an external hard disk being used for backups only, we can ignore the bit about rebooting.
The mountpoint
I'm going to create a new, more appropriately named mount point at /mnt/backup:
# mkdir /mnt/backup
Now we need to make an entry in /etc/fstab:
# <file system> <dir>        <type>    <options>       <dump> <pass>
...
/dev/sda1       /mnt/backup  reiserfs  user,noauto,rw  0      0
Let's mount the filesystem as our normal user (we need to be able to access it). Exit from any root shell you might be in.
$ mount /mnt/backup
$ df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
...
/dev/sda1   373G   33M   373G    1%  /mnt/backup
Remember that this is a Linux filesystem. FAT32 is somewhat relaxed when it comes to ownership of files; here we need to change ownership of the new filesystem to our normal user (you'll only need to do this once):
$ su -
# cd /mnt
# chown -R rob:rob backup
# exit
$ cd /mnt/backup
I'll create a sub-directory named after the host being backed up (aquilonia in this example), just in case I want to backup more than one host (I might have a laptop as well as a desktop, for example).
$ mkdir /mnt/backup/aquilonia
Running the backup (/home)
Finally we need some software to backup our data. As I mentioned earlier, I'm going to use rsync. Install it if required.
One of the many things I like about rsync is the ability to use include or exclude lists. An include list allows you to specify exactly what you want to be backed up, while an exclude list allows you to specify exactly what you don't want to be backed up.
I prefer the exclude list approach, since if I forget to add a file or directory to the list, it just gets backed up. If I later decide I don't want it, I can add it to the exclude list and delete it from the backup archive. If you forget to add something to an include list, the first time you'll find out that you wanted that file is probably when you try to run a restore!
The command I now use for backing up my home directory (/home/rob) is:
(This section has been edited. See the comments by Allen below this article for the reasons.)
$ rsync -vrlptg /home/rob/ /mnt/backup/aquilonia/home/rob --exclude-from=/home/rob/.rsync/exclude
Running through this quickly, the rsync options and parameters I've used are as follows:
- -v: verbose output
- -r: recurse into directories
- -l: copy symlinks as symlinks
- -p: preserve permissions
- -t: preserve times
- -g: preserve group
- /home/rob/: source: my home directory (note the trailing slash)
- /mnt/backup/aquilonia/home/rob: destination: the backup archive location
- --exclude-from=: use an exclude list
- /home/rob/.rsync/exclude: read exclude patterns from this file
Exclude list format
The format of the exclude list is fairly straightforward. Mine mainly consists of dot files put there by applications. Wild cards can be used. If you get the rsync command right (thanks Allen!), you can place a leading slash in front of each entry in your exclude list. If you don't put a leading slash in front of an entry, for example putting tmp in the exclude list rather than /tmp, then you will end up excluding all files and directories which are named "tmp" within your hierarchy of files to be backed up, rather than excluding just ~/tmp. (This could be useful for an entry like core.)
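If you want to convince yourself of the difference before trusting it with real data, here is a small experiment in a throwaway directory. Everything in it (the temporary directory, the file names, the two exclude files) is made up for the demonstration; only the rsync options are real.

```shell
#!/bin/sh
# Throwaway sandbox: compare an anchored (/tmp) and an unanchored (tmp) exclude.
demo=$(mktemp -d)
mkdir -p "$demo/src/tmp" "$demo/src/projects/tmp"
echo a > "$demo/src/tmp/scratch"
echo b > "$demo/src/projects/tmp/notes"

# "/tmp" is anchored at the top of the transfer: only src/tmp is skipped.
printf '/tmp\n' > "$demo/anchored"
rsync -r --exclude-from="$demo/anchored" "$demo/src/" "$demo/dest-anchored"

# "tmp" matches at any depth: both tmp directories are skipped.
printf 'tmp\n' > "$demo/unanchored"
rsync -r --exclude-from="$demo/unanchored" "$demo/src/" "$demo/dest-unanchored"

# Show which files made it across in each case.
find "$demo/dest-anchored" "$demo/dest-unanchored" -type f
```

Only projects/tmp/notes under dest-anchored should be listed. Note that the leading slash is relative to the source directory given on the command line (hence the trailing slash on the source), not to the root of your real filesystem.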
Here is a heavily abridged example.
/.adobe
/.aspell.en.prepl
/.aspell.en.pws
/.backup_fsck_last_run
/.bash_history
/.cddb
/.cddbslave
/.config
/.dbus
/download/gkrellm/plugins
/.gconf
/.gconfd
/.gimp-2.2
/.gkrellm2
/.mozilla
/.nautilus
/.openoffice.org2
/paniclog
/.qf
/.qt
/.realplayerrc
/.recently-used
/.sane
/.serverauth.*
/tmp
/.viminfo
/.Xauthority
/.xsession-errors
If you're going to be using rsync, I would recommend reading the rsync manpage. There are lots of options which you may find useful.
My first backup (most of my home directory) ran at 21.5 MB/s which is 172 Mb/s, not quite the 480 Mb/s it says on the tin, but much faster than my home Ethernet connection.
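The conversion is just a factor of eight (bits per byte). As a quick sanity check, here is a one-line helper; the function name mbytes_to_mbits is my own invention:

```shell
#!/bin/sh
# Hypothetical helper: convert a rate in megabytes/s to megabits/s (x8).
mbytes_to_mbits() {
    awk -v mb="$1" 'BEGIN { printf "%.0f\n", mb * 8 }'
}

mbytes_to_mbits 21.5    # prints 172
```

That works out at a little over a third of the nominal 480 Mb/s of USB 2.0.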
Having run the backup, you can now navigate to your backup archive and see what's been backed up. You'll see a very familiar file structure (a replica of your home directory, with all the bits you don't want missing!), which can be easily navigated, and from which you can easily copy files and directories as required.
Running the restore (/home)
In the event of a disk failure or other catastrophe, you can restore the whole of your home directory (what's been backed up, anyway :) ) with a single command.
$ rsync -vrlptg /mnt/backup/aquilonia/home/rob /home
The options are the same as before, except the source is the backup archive (/mnt/backup/aquilonia/home/rob) and the destination is your home directory. You only need to specify /home as the destination because you've already specified the rob sub-directory as part of the archive source. rsync will expect /home/rob to be present and to be owned (or at least writeable) by the account running the command.
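The trailing-slash rule is easy to get wrong, so here is the same restore rehearsed in a sandbox rather than against a real /home. The directory layout under the temporary directory is invented purely to mirror the paths above.

```shell
#!/bin/sh
# Sandbox rehearsal of the restore: a fake backup archive and a fake /home.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/backup/aquilonia/home/rob" "$sandbox/home"
echo hello > "$sandbox/backup/aquilonia/home/rob/file.txt"

# No trailing slash on the source, so the directory "rob" itself is
# recreated underneath the destination, giving .../home/rob/file.txt.
rsync -vrlptg "$sandbox/backup/aquilonia/home/rob" "$sandbox/home"

ls "$sandbox/home/rob"
```

Had the source been written with a trailing slash, the contents of rob would have been copied straight into the destination directory instead. Adding -n (--dry-run) to the real command lets you preview what would be transferred without changing anything.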
Running the backup (/etc)
Very briefly (as root):
# rsync -av /etc /mnt/backup/aquilonia
This will create /mnt/backup/aquilonia/etc and copy the contents of /etc into it. You'll notice that file ownership and permissions are maintained, which is a good thing for security. Keep that USB hard disk safe!
If you're going to be backing up your system files with rsync, the manpage is your friend. Check your options. You can replace /etc in the command above with /var or /root as required, and you can always use exclude lists if you want.
Automating the backup
Your backup should be automated. An automated backup will work best if your PC is on all the time.
I recommend that you write a script if you're going to automate your backups. You can then run the script regularly using cron.
Ideally, this script should:
- check if your backup disk is mounted / mount it if required / if it can't be mounted, fail gracefully
- run the backup
- unmount the backup disk
- log the output of the backup somewhere
I'll leave that as an exercise for you!
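If you'd like a starting point, the four steps above might be sketched roughly as follows. This is not a script I run myself: the variable names, the log file location and the "run" argument are all assumptions, and it relies on the mountpoint utility being installed and on the /etc/fstab entry shown earlier (with the user option) so that a normal user can mount and unmount the disk.

```shell
#!/bin/sh
# Sketch of an automated backup wrapper. Every name and path below is an
# assumption for illustration; adjust them to match your own setup.
BACKUP_MNT="${BACKUP_MNT:-/mnt/backup}"                      # mount point from /etc/fstab
BACKUP_SRC="${BACKUP_SRC:-$HOME/}"                           # note the trailing slash
BACKUP_DEST="${BACKUP_DEST:-$BACKUP_MNT/$(uname -n)$HOME}"   # e.g. /mnt/backup/aquilonia/home/rob
EXCLUDES="${EXCLUDES:-$HOME/.rsync/exclude}"                 # exclude list, must exist
BACKUP_LOG="${BACKUP_LOG:-$HOME/backup.log}"                 # hypothetical log file

is_mounted() {
    # True if the given directory is currently a mount point.
    mountpoint -q "$1"
}

ensure_mounted() {
    # Mount the backup disk only if it is not mounted already.
    is_mounted "$1" || mount "$1"
}

run_backup() {
    # 1. Check / mount the disk; fail gracefully if that is not possible.
    if ! ensure_mounted "$BACKUP_MNT"; then
        echo "$(date) cannot mount $BACKUP_MNT, giving up" >> "$BACKUP_LOG"
        return 1
    fi
    # 2. Run the backup, 4. logging all output.
    echo "$(date) backup started" >> "$BACKUP_LOG"
    mkdir -p "$BACKUP_DEST"
    rsync -vrlptg --exclude-from="$EXCLUDES" \
        "$BACKUP_SRC" "$BACKUP_DEST" >> "$BACKUP_LOG" 2>&1
    status=$?
    # 3. Unmount the disk whether or not rsync succeeded.
    umount "$BACKUP_MNT" >> "$BACKUP_LOG" 2>&1
    echo "$(date) backup finished with status $status" >> "$BACKUP_LOG"
    return $status
}

# Only do anything when called as "backup.sh run", e.g. from cron.
case "${1:-}" in
    run) run_backup ;;
esac
```

A crontab entry along the lines of 30 2 * * * $HOME/bin/backup.sh run would then run it nightly at 02:30; the script name and location are, once more, only an example. The log file will grow over time, so you may want to rotate or truncate it occasionally.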
Other options
If you don't have a USB hard drive, there are other options available to you. All of these can be used with rsync.
- Backup to a second internal hard disk drive
  - pro: faster
  - con: more hassle setting up or transferring to another machine
- Backup to Network Attached Storage (Samba or NFS)
- Backup to a networked host using rsync with SSH
- Backup to an external hard disk drive (eSATA)
I hope this has been helpful. Happy backing up.