basically tech

Using a USB external hard disk for backups with Linux

Monday 23rd April, 2007

In this article, I show how I set up a recently purchased USB external hard disk drive as a backup drive for my Linux desktop PC. I'll delete the default FAT32 partition, create a new partition, make a reiserfs filesystem, and show how to use rsync to back up your important data.

Why back up with Linux?

The question should really be "Why back up with Linux instead of Windows, if I have a dual boot PC?" Personally, I don't dual boot, so for me the question never arises. However, I can think of a number of reasons why I would use Linux as my OS for running backups.

  • Windows cannot read your Linux partitions without third-party addons.
  • Conversely, Linux can (natively) read your FAT32 and NTFS partitions.
  • It is straightforward to create a backup script which mounts those partitions, backs them up, then unmounts them.
  • Backing up to Linux renders any virus-infected files inert.
  • Would you leave a Windows PC running overnight, unattended?

Considerations

What to back up?

The old school attitude is to back up everything, the whole system. As they say, "There's no school like the old school", and that really is the safest approach. Space is not really an issue for full system backups, since a fully installed system might take up only a few gigabytes, perhaps 3-10 GB including system logs and other stuff.

On a desktop system I'm not convinced this is necessary. My distro of choice (Arch Linux), and many popular distros today such as the many Debian-based distros, incorporate a rolling release. Installing and updating such systems is pretty easy (as well as being a useful learning experience). I'm not sure how much time you would save by running a full system backup.

You might want to back up some essential directories like /etc and parts of /var. /root may be useful too, depending on how you run your system.

It will also be useful to have a list of all the installed packages on your system. pacman -Qi will provide this for Arch; dpkg --list will provide something similar for Debian & Co. Redirect the output to a file in your home directory. If you're going to do this, cron it to run daily.
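As a rough sketch of that idea, the script below prints the package list with whichever of pacman or dpkg it finds. The function name list_packages, the script path in the crontab comment and the output file are all placeholders, and pacman -Q (a plain name-and-version list) is swapped in for -Qi here simply to keep the file short.

```shell
#!/bin/sh
# Print one line per installed package, using whichever package manager is present.
list_packages() {
    if command -v pacman >/dev/null 2>&1; then
        pacman -Q            # Arch: "name version" per line
    elif command -v dpkg >/dev/null 2>&1; then
        dpkg --list          # Debian and derivatives
    else
        echo "no supported package manager found" >&2
        return 1
    fi
}

# If an output file is given as the first argument, write the list there.
# A crontab entry such as (hypothetical paths)
#   0 3 * * * /home/rob/bin/pkglist.sh /home/rob/pkglist.txt
# would refresh it daily at 03:00.
if [ -n "${1:-}" ]; then
    list_packages > "$1"
fi
```

Because the list then lives in your home directory, it gets picked up by the /home backup without any extra work.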

The important stuff for most desktop systems will be in /home; documents, e-mails, mp3s, movies, photos, etc., so backing up /home is what I'll be concentrating on for this article.

Back up as root or normal user?

A sensible rule is to try and run as much as possible without using the root account. Back up your system files as root, and your home directory separately using your normal user account. This makes it a little easier to restore your normal user account data, since it will be easily accessible as yourself rather than root.

Leave backup partition mounted or unmount after each backup?

You should try to ensure that your backup partition is unmounted after each backup. This will help prevent data corruption in the event of power surges or outages.

Only use your backup disk for backups. Don't be tempted to store live data on it. If your backup disk fails, then you've lost everything on it. If it was only storing your backups, you can get a replacement and press it into service quickly. If it also had live data on there, then that live data is gone.

What backup software to use?

This article focuses on rsync. I like rsync. It has some very clever features, which include:

  • Internal pipelining reduces latency for multiple files.
  • If a file has changed, it sends just the differences in the files to the new location.
  • It can be used over a network; I used to use rsync with SSH to back up to a test machine I have at home. This test machine is sometimes off (electricity is getting expensive; it's hard to justify having all these machines on all the time, it puts extra load on my UPS, it's noisy, etc), and so sometimes several days would go by before the backup got run. The USB connection is faster and is always on, so I never miss a backup now ;) .

Getting the hardware working

This varies according to which external USB HDD you've purchased. Read the documentation which comes with your kit. Before you start, run tail -f /var/log/messages.log in a separate terminal (as root). This shouldn't really be needed, and is just to demonstrate that your system recognises the USB drive, which it should unless you have a very old (or perhaps a self-compiled) kernel. The basic steps probably involve something like:

  • Connect the USB drive to your system.
  • Attach power cables.
  • Power up.

Once you've got your drive connected and powered up, you should be able to see it on your system.

Your terminal running tail should display something like:

Apr 16 23:17:40 aquilonia usb 1-7: new high speed USB device using ehci_hcd and address 7
Apr 16 23:17:40 aquilonia usb 1-7: configuration #1 chosen from 1 choice
Apr 16 23:17:40 aquilonia scsi4 : SCSI emulation for USB Mass Storage devices
Apr 16 23:17:40 aquilonia scsi 4:0:0:0: Direct-Access     SAMSUNG  HD400LD          WQ10 PQ: 0 ANSI: 0
Apr 16 23:17:40 aquilonia SCSI device sda: 781422768 512-byte hdwr sectors (400088 MB)
Apr 16 23:17:40 aquilonia sda: Write Protect is off
Apr 16 23:17:40 aquilonia SCSI device sda: 781422768 512-byte hdwr sectors (400088 MB)
Apr 16 23:17:40 aquilonia sda: Write Protect is off
Apr 16 23:17:40 aquilonia sda: sda1
Apr 16 23:17:40 aquilonia sd 4:0:0:0: Attached scsi disk sda

We can see here that the new USB disk drive is presented as SCSI device /dev/sda with a single partition, /dev/sda1.

Press Ctrl-C to stop the tail command.
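The device name will not be /dev/sda on every machine, so before partitioning anything it may be worth confirming that none of the disk's partitions are mounted somewhere you care about. The following is only a sketch: mounted_parts is a made-up helper name, and it assumes the kernel's mount table is readable at /proc/mounts.

```shell
#!/bin/sh
# List mounted partitions of a disk by scanning a mounts table
# (default /proc/mounts). Prints nothing if none are mounted.
mounted_parts() {
    disk=$1
    table=${2:-/proc/mounts}
    # Field 1 is the device, field 2 the mount point.
    awk -v d="$disk" 'index($1, d) == 1 { print $1, $2 }' "$table"
}

# Example: mounted_parts /dev/sda
```

If this prints / or /home next to the device you were about to repartition, you have the wrong disk.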

Your newly visible partition will probably be a FAT32 filesystem. If you're okay with that, you can make an entry in /etc/fstab something like this to access it:

# <file system>        <dir>         <type>    <options>          <dump> <pass>
...
/dev/sda1              /mnt/usb  vfat      user,noauto,rw          0      0

The options mean respectively that any user can mount the filesystem, but only the user that mounted the filesystem can unmount it (user), the filesystem will not mount at boot time (noauto), and the filesystem will be mounted with read and write access (rw).

You can test your new hard disk as a normal user:

$ mount /mnt/usb
$ df -k

Filesystem           1K-blocks      Used Available Use% Mounted on
...
/dev/sda1            390610848        64 390610784   1% /mnt/usb

The default FAT32 filesystem is convenient in that it is already there, ready to be used. The disadvantages are:

  • Fragmentation.
  • No journaling/poor recovery in the event of a power failure etc.
  • Poor security - FAT32 files automatically belong to whoever mounts the filesystem.

Partition the disk

I prefer reiserfs (Reiser 3). I know I'll never need to back up a Windows machine since I don't do Windows, so putting up with the inadequacies of FAT32 is simply not required. You may wish to rethink which filesystem to go for, or perhaps a different partitioning strategy, if you have a separate Windows PC. I'm going to go for a single large Reiser 3 partition. To do this, I first need to use cfdisk to delete the old partition and create a new one. You will probably need root access or sudo for this, depending on how your system is configured.

# cfdisk /dev/sda

                                  cfdisk 2.12r

                              Disk Drive: /dev/sda
                       Size: 400088457216 bytes, 400.0 GB
             Heads: 255   Sectors per Track: 63   Cylinders: 48641

    Name        Flags      Part Type  FS Type          [Label]        Size (MB)
 ------------------------------------------------------------------------------
    sda1                    Primary   W95 FAT32                       400085.85 











     [Bootable]  [ Delete ]  [  Help  ]  [Maximize]  [ Print  ]
     [  Quit  ]  [  Type  ]  [ Units  ]  [ Write  ]

                 Toggle bootable flag of the current partition

A FAT32 partition. Let's get rid of it and then see what we have left. Select [ Delete ] from the menu.

                                  cfdisk 2.12r

                              Disk Drive: /dev/sda
                       Size: 400088457216 bytes, 400.0 GB
             Heads: 255   Sectors per Track: 63   Cylinders: 48641

    Name        Flags      Part Type  FS Type          [Label]        Size (MB)
 ------------------------------------------------------------------------------
    sda1                    Primary   W95 FAT32                       400085.85 











     [Bootable]  [ Delete ]  [  Help  ]  [Maximize]  [ Print  ]
     [  Quit  ]  [  Type  ]  [ Units  ]  [ Write  ]

                          Delete the current partition

                                  cfdisk 2.12r

                              Disk Drive: /dev/sda
                       Size: 400088457216 bytes, 400.0 GB
             Heads: 255   Sectors per Track: 63   Cylinders: 48641

    Name        Flags      Part Type  FS Type          [Label]        Size (MB)
 ------------------------------------------------------------------------------
                            Pri/Log   Free Space                      400085.85 











     [  Help  ]  [  New   ]  [ Print  ]  [  Quit  ]  [ Units  ]
     [ Write  ]

                               Print help screen

Right, nothing left. Time to create a new partition. This will be a primary partition, and I'll only make one - this whole disk is for backups. Select [  New   ], [Primary] to create a new primary partition. Accept the default size offered, which should be all the disk space available.

                                  cfdisk 2.12r

                              Disk Drive: /dev/sda
                       Size: 400088457216 bytes, 400.0 GB
             Heads: 255   Sectors per Track: 63   Cylinders: 48641

    Name        Flags      Part Type  FS Type          [Label]        Size (MB)
 ------------------------------------------------------------------------------
    sda1                    Primary   Linux                           400085.85 











     [Bootable]  [ Delete ]  [  Help  ]  [Maximize]  [ Print  ]
     [  Quit  ]  [  Type  ]  [ Units  ]  [ Write  ]

                 Toggle bootable flag of the current partition

Now write the new partition table to disk.

                                  cfdisk 2.12r
 
                              Disk Drive: /dev/sda
                       Size: 400088457216 bytes, 400.0 GB
             Heads: 255   Sectors per Track: 63   Cylinders: 48641
 
    Name        Flags      Part Type  FS Type          [Label]        Size (MB)
 ------------------------------------------------------------------------------
    sda1                    Primary   Linux                           400085.85 
 
 
 
 
 
 
 
 
 
 

     [Bootable]  [ Delete ]  [  Help  ]  [Maximize]  [ Print  ]
     [  Quit  ]  [  Type  ]  [ Units  ]  [ Write  ]

            Write partition table to disk (this might destroy data)

                                  cfdisk 2.12r
 
                              Disk Drive: /dev/sda
                       Size: 400088457216 bytes, 400.0 GB
             Heads: 255   Sectors per Track: 63   Cylinders: 48641
 
    Name        Flags      Part Type  FS Type          [Label]        Size (MB)
 ------------------------------------------------------------------------------
    sda1                    Primary   Linux                           400085.85 
 
 
 
 
 
 
 
 
 
 

     Are you sure you want to write the partition table to disk? (yes or no):
     
                 Warning!!  This may destroy data on your disk!

Type yes and press Enter to continue.

                                  cfdisk 2.12r
 
                              Disk Drive: /dev/sda
                       Size: 400088457216 bytes, 400.0 GB
             Heads: 255   Sectors per Track: 63   Cylinders: 48641
 
    Name        Flags      Part Type  FS Type          [Label]        Size (MB)
 ------------------------------------------------------------------------------
    sda1                    Primary   Linux                           400085.85 
 
 
 
 
 
 
 
 
 
 

     [Bootable]  [ Delete ]  [  Help  ]  [Maximize]  [ Print  ]
     [  Quit  ]  [  Type  ]  [ Units  ]  [ Write  ]
      No primary partitions are marked bootable. DOS MBR cannot boot this.
                 Toggle bootable flag of the current partition

You should now be able to [  Quit  ].

So to recap, I've created a single primary Linux partition, using all the available disk space on my USB drive.

Creating a new filesystem

Now to create our filesystem of choice. I'll be using reiserfs; you can choose ext3 or whichever you want.

# /sbin/mkreiserfs /dev/sda1

mkreiserfs 3.6.20
Copyright (C) 2001-2005 by Hans Reiser, licensing governed by reiserfsprogs/COPYING.
A pair of credits:
Continuing core development of ReiserFS is  mostly paid for by Hans Reiser from
money made selling licenses  in addition to the GPL to companies who don't want
it known that they use ReiserFS  as a foundation for their proprietary product.
And my lawyer asked 'People pay you money for this?'. Yup. Life is good. If you
buy ReiserFS, you can focus on your value add rather than reinventing an entire
FS.

Vladimir Saveliev started as the most junior programmer on the team, and became
the lead programmer.  He is now an experienced highly productive programmer. He
wrote the extent  handling code for Reiser4,  plus parts of  the balancing code 
and file write and file read.


Guessing about desired format.. Kernel 2.6.20-ARCH is running.
Format 3.6 with standard journal
Count of blocks on the device: 97677200
Number of blocks consumed by mkreiserfs formatting process: 11192
Blocksize: 4096
Hash function used to sort names: "r5"
Journal Size 8193 blocks (first block 18)
Journal Max transaction length 1024
inode generation number: 0
UUID: 19cfe77a-8913-4aed-bc25-775b42c9ce88
ATTENTION: YOU SHOULD REBOOT AFTER FDISK!
   ALL DATA WILL BE LOST ON '/dev/sda1'!
Continue (y/n):y
Initializing journal - 0%....20%....40%....60%....80%....100%
Syncing..ok

Tell your friends to use a kernel based on 2.4.18 or later, and especially not a
kernel based on 2.4.9, when you use reiserFS. Have fun.

ReiserFS is successfully created on /dev/sda1.

Because this is an external hard disk being used for backups only, we can ignore the bit about rebooting.

The mountpoint

I'm going to create a new, more appropriately named mount point at /mnt/backup:

# mkdir /mnt/backup

Now we need to make an entry in /etc/fstab:

# <file system>        <dir>         <type>    <options>          <dump> <pass>
...
/dev/sda1              /mnt/backup reiserfs user,noauto,rw         0      0

Let's mount the filesystem as our normal user (we need to be able to access it). Exit from any root shell you might be in.

$ mount /mnt/backup
$ df -h

Filesystem            Size  Used Avail Use% Mounted on
...
/dev/sda1             373G   33M  373G   1% /mnt/backup

Remember that this is a Linux filesystem. FAT32 is somewhat relaxed when it comes to file ownership. We need to change ownership of the new filesystem to our normal user (you'll only need to do this once):

$ su -
# cd /mnt
# chown -R rob:rob backup
# exit
$ cd /mnt/backup

Now the specifics of how you go about this are down to you. I like to create a directory named for the host I'm backing up (aquilonia in this example), just in case I want to back up more than one host (I might have a laptop as well as a desktop, for example).

$ mkdir /mnt/backup/aquilonia

Running the backup (/home)

Finally we need some software to back up our data. As I mentioned earlier, I'm going to use rsync. Install it if required.

One of the many things I like about rsync is the ability to use include or exclude lists. An include list allows you to specify exactly what you want to be backed up, while an exclude list allows you to specify exactly what you don't want to be backed up.

I prefer the exclude list approach, since if I forget to add a file or directory to the list, it just gets backed up. If I later decide I don't want it, I can add it to the exclude list and delete it from the backup archive. If you forget to add something to an include list, the first time you'll find out that you wanted that file is probably when you try to run a restore!

The command I now use for backing up my home directory (/home/rob) is:

(This section has been edited. See the comments by Allen below this article for the reasons.)

$ rsync -vrlptg /home/rob/ /mnt/backup/aquilonia/home/rob --exclude-from=/home/rob/.rsync/exclude

Running through this quickly, the rsync options and parameters I've used are as follows:

  • -v   verbose output
  • -r   recurse into directories
  • -l   copy symlinks as symlinks
  • -p   preserve permissions
  • -t   preserve times
  • -g   preserve group
  • /home/rob/   source: my home directory (note the trailing slash)
  • /mnt/backup/aquilonia/home/rob   destination: the backup archive location
  • --exclude-from=   use an exclude list
  • /home/rob/.rsync/exclude   read exclude patterns from this file
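Before running a backup for real, rsync's --dry-run (-n) option can show what would be copied without touching the destination. The sketch below tries this on a throwaway directory tree rather than a real home directory; every path in it is a stand-in created just for the demonstration.

```shell
#!/bin/sh
# Preview a backup with --dry-run: rsync lists what it would copy
# but changes nothing. Paths are throwaway stand-ins for the real ones.
work=$(mktemp -d)
mkdir -p "$work/home/.cache" "$work/home/docs"
touch "$work/home/docs/letter.txt" "$work/home/.cache/junk"
printf '/.cache\n' > "$work/exclude"

if command -v rsync >/dev/null 2>&1; then
    # Same options as the real command, plus --dry-run.
    preview=$(rsync -vrlptg --dry-run --exclude-from="$work/exclude" \
        "$work/home/" "$work/backup")
    printf '%s\n' "$preview"
fi
```

With the real paths, the same trick is a cheap way to check a new exclude list before trusting it.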

Exclude list format

The format of the exclude list is fairly straightforward. Mine mainly consists of dot files put there by applications. Wild cards can be used. If you get the rsync command right (thanks Allen!), you can place a leading slash in front of each entry in your exclude list. If you don't put a leading slash in front of an entry, for example putting tmp in the exclude list rather than /tmp, then you will end up excluding all files and directories which are named "tmp" within your hierarchy of files to be backed up, rather than excluding just ~/tmp. (This could be useful for an entry like core.)

Here is a heavily abridged example.

/.adobe
/.aspell.en.prepl
/.aspell.en.pws
/.backup_fsck_last_run
/.bash_history
/.cddb
/.cddbslave
/.config
/.dbus
/download/gkrellm/plugins
/.gconf
/.gconfd
/.gimp-2.2
/.gkrellm2
/.mozilla
/.nautilus
/.openoffice.org2
/paniclog
/.qf
/.qt
/.realplayerrc
/.recently-used
/.sane
/.serverauth.*
/tmp
/.viminfo
/.Xauthority
/.xsession-errors
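The difference between an anchored entry (/tmp) and an unanchored one (tmp) is easiest to see on a small test tree. This is a sketch using made-up directory names under a temporary directory, not my real exclude list:

```shell
#!/bin/sh
# Demonstrates anchored (/tmp) versus unanchored (tmp) exclude patterns on a
# throwaway directory tree. All names here are made up for the demonstration.
demo=$(mktemp -d)
mkdir -p "$demo/src/tmp" "$demo/src/projects/tmp"
touch "$demo/src/tmp/a" "$demo/src/projects/tmp/b" "$demo/src/projects/notes"

if command -v rsync >/dev/null 2>&1; then
    # Anchored: only the top-level tmp directory is skipped.
    printf '/tmp\n' > "$demo/anchored"
    rsync -r --exclude-from="$demo/anchored" "$demo/src/" "$demo/out1"

    # Unanchored: every file or directory named tmp is skipped.
    printf 'tmp\n' > "$demo/unanchored"
    rsync -r --exclude-from="$demo/unanchored" "$demo/src/" "$demo/out2"

    # Show what ended up in each copy.
    find "$demo/out1" "$demo/out2" | sort
fi
```

The first copy should keep projects/tmp and lose only the top-level tmp, while the second loses both.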

If you're going to be using rsync, I would recommend reading the rsync manpage. There are lots of options which you may find useful.

My first backup (most of my home directory) ran at 21.5 MB/s which is 172 Mb/s, not quite the 480 Mb/s it says on the tin, but much faster than my home Ethernet connection.
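For anyone who wants to check the arithmetic, here is the conversion as a small sketch. The variable names are arbitrary, and 480 Mb/s is the nominal USB 2.0 rate quoted above.

```shell
#!/bin/sh
# Convert the measured rate from megabytes to megabits per second and
# compare it with the nominal USB 2.0 rate of 480 Mb/s.
rate_mbytes=21.5
rate_mbits=$(LC_ALL=C awk -v r="$rate_mbytes" 'BEGIN { printf "%.0f", r * 8 }')
share=$(LC_ALL=C awk -v r="$rate_mbits" 'BEGIN { printf "%.1f", r / 480 * 100 }')
echo "$rate_mbits Mb/s is about $share% of 480 Mb/s"
```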

Having run the backup, you can now navigate to your backup archive and see what's been backed up. You'll see a very familiar file structure (a replica of your home directory, with all the bits you don't want missing!), which can be easily navigated, and from which you can easily copy files and directories as required.

Running the restore (/home)

In the event of a disk failure or other catastrophe, you can restore the whole of your home directory (what's been backed up, anyway :) ) with a single command.

$ rsync -vrlptg /mnt/backup/aquilonia/home/rob /home

The options are the same as before, except the source is the backup archive (/mnt/backup/aquilonia/home/rob) and the destination is your home directory. You only need to specify /home as the destination because you've already specified the rob sub-directory as part of the archive source. rsync will expect /home/rob to be present and to be owned (or at least writeable) by the account running the command.
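The presence or absence of a trailing slash on the source is what decides whether rsync copies the rob directory itself or only its contents. A small sketch on a temporary tree (the directory names are placeholders) makes the difference visible:

```shell
#!/bin/sh
# Shows how a trailing slash on the rsync source changes where files land.
# Uses a throwaway tree; the directory names are placeholders.
t=$(mktemp -d)
mkdir -p "$t/archive/rob" "$t/home_a" "$t/home_b"
touch "$t/archive/rob/file.txt"

if command -v rsync >/dev/null 2>&1; then
    # No trailing slash: the rob directory itself is copied into the destination.
    rsync -r "$t/archive/rob" "$t/home_a"
    # Trailing slash: only the contents of rob are copied.
    rsync -r "$t/archive/rob/" "$t/home_b"
    find "$t/home_a" "$t/home_b" | sort
fi
```

This is why the backup command uses /home/rob/ with a trailing slash, while the restore command uses the archive path without one.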

Running the backup (/etc)

Very briefly (as root):

# rsync -av /etc /mnt/backup/aquilonia

This will create /mnt/backup/aquilonia/etc and copy the contents of /etc into it. You'll notice that file ownership and permissions are maintained, which is a good thing for security. Keep that USB hard disk safe!

If you're going to be backing up your system files with rsync, the manpage is your friend. Check your options. You can replace /etc in the command above with /var or /root as required, and you can always use exclude lists if you want.

Automating the backup

Your backup should be automated. An automated backup will work best if your PC is on all the time.

I recommend that you write a script if you're going to automate your backups. You can then run the script regularly using cron. Ideally, this script should:

  • check if your backup disk is mounted / mount it if required / if it can't be mounted, fail gracefully
  • run the backup
  • unmount the backup disk
  • log the output of the backup somewhere

I'll leave that as an exercise for you!
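If you would like a starting point, the following is one possible sketch of such a script, not a finished tool. It reuses the paths from this article; the log file location, the run_backup function name and the --run argument are my own placeholders, and it assumes the mountpoint utility (part of util-linux) is installed.

```shell
#!/bin/sh
# Sketch of a cron-driven backup wrapper. Paths are the article's examples;
# LOG is a hypothetical location.
MNT=/mnt/backup
SRC=/home/rob/
DEST=$MNT/aquilonia/home/rob
EXCLUDES=/home/rob/.rsync/exclude
LOG=/home/rob/.rsync/backup.log

run_backup() {
    # Mount the backup disk if needed; give up cleanly if that fails.
    if ! mountpoint -q "$MNT"; then
        if ! mount "$MNT"; then
            echo "$(date): could not mount $MNT, backup skipped" >> "$LOG"
            return 1
        fi
    fi

    # Run the backup, appending rsync's output to the log.
    status=0
    rsync -vrlptg --exclude-from="$EXCLUDES" "$SRC" "$DEST" >> "$LOG" 2>&1 || status=$?

    # Unmount whether or not rsync succeeded, then report rsync's status.
    umount "$MNT" >> "$LOG" 2>&1 || echo "$(date): could not unmount $MNT" >> "$LOG"
    echo "$(date): backup finished with rsync exit status $status" >> "$LOG"
    return "$status"
}

# Only do real work when asked to, e.g. from cron:
#   30 2 * * * /home/rob/bin/backup.sh --run
if [ "${1:-}" = "--run" ]; then
    run_backup
fi
```

Because /etc/fstab has the user option set for /mnt/backup, this can run from your normal user's crontab rather than root's.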

Other options

If you don't have a USB hard drive, there are other options available to you. All of these can be used with rsync.

  • Backup to a second internal hard disk drive
    • pro: faster
    • con: more hassle setting up or transferring to another machine
  • Backup to Network Attached Storage (Samba or NFS)
  • Backup to a networked host using rsync with SSH
  • Backup to an external hard disk drive (eSATA)

I hope this has been helpful. Happy backing up.
