9.7 Developing a Backup Strategy
This is a hardware book, so we
don't spend much time on software. But, in our
experience, many people who buy a tape drive have no idea how to use
it effectively. We won't try to explain how to use
your backup software because the specifics vary and nearly any
software bundled with a tape drive is sufficient for the task, but we
will devote some space to explaining how to get the most from your
tape drive and backup software.
9.7.1 File Attributes and the Archive Bit
If you have a tape drive
large enough to back up your entire hard disk and the time necessary
to use only complete backups, the status of any particular file
doesn't matter. Every file gets backed up every
time, whether it was created that day or has been sitting unchanged
for a year. But if you need to use some combination of complete and
partial backups, the status of each file becomes critical. If a file
is unchanged since the last complete backup, you want to ignore it
when doing partial backups. If the file was created or changed since
the last complete backup, it needs to be copied to the partial backup
tape.
Windows maintains a file attribute for each file called the
archive bit. When a file is created or changed,
Windows toggles the archive bit on, indicating that that file is a
candidate for backup. Backup software can manipulate the archive bit,
either turning it off after it backs up the file, or leaving it on so
that file will again be backed up the next time you do a partial
backup.
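As a minimal sketch of how a program can read this flag, the following Python fragment tests the archive bit in a file's attribute word. The function names `archive_bit_set` and `file_needs_backup` are hypothetical, chosen for illustration. The sketch assumes Python 3.5 or later, where `os.stat` exposes `st_file_attributes` on Windows only; on other systems the second function reports that no archive bit is available.

```python
import os
import stat


def archive_bit_set(attributes: int) -> bool:
    """Return True if the archive bit is on in a Windows attribute word."""
    return bool(attributes & stat.FILE_ATTRIBUTE_ARCHIVE)


def file_needs_backup(path: str):
    """Return True/False from the file's archive bit, or None if unavailable.

    st_file_attributes exists only on Windows, so on other platforms
    this hypothetical helper returns None instead of guessing.
    """
    attributes = getattr(os.stat(path), "st_file_attributes", None)
    if attributes is None:
        return None
    return archive_bit_set(attributes)
```

Backup software does considerably more than this, but the selection test for a partial backup comes down to a bit check of this kind.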
The archive bit exists to provide a reliable indication that a file
requires archiving. Early Windows versions stored one timestamp for a
file. In theory, that timestamp was changed when the file was created
or modified. In practice, an application could modify a file without
changing the timestamp, so a backup application that depended on the
timestamp could fail to back up a file whose contents had changed.
The archive bit was therefore the only reliable indicator of whether
a file required archiving.
Linux stores more timestamp information about each file, including
when it was last accessed, last modified, and last had its status
changed, and the Windows NT/2000/XP NTFS filesystem similarly records
creation, last-access, and last-modification times. In theory, that
means such systems can be backed up reliably based on timestamp
information. In practice, where an archive bit is available, we still
prefer using it as a flag because that bit always indicates the
archive status of a file. If a backup is done based on timestamp, no
indication remains with the file itself as to when (or whether) it
was last backed up.
9.7.2 Understanding Backup Types
Backup software can use or ignore the
archive bit in determining which files to back up, and can either
turn the archive bit off or leave it unchanged when the backup is
complete. How the archive bit is used and manipulated determines what
type of backup is done, as follows:
- Full backup
-
A full
backup, which Microsoft calls a normal
backup, backs up every selected file, regardless of the
status of the archive bit. When the backup completes, the backup
software turns off the archive bit for every file that was backed up.
Note that "full" is a misnomer
because a full backup backs up only the files you have
selected, which may be as little as one
directory or even a single file, so in that sense
Microsoft's terminology is actually more accurate.
Given the choice, full backup is the method to use because all files
are on one tape, which makes it much easier to retrieve files from
tape when necessary. Relative to partial backups, full backups also
increase redundancy because all files are on all tapes. That means
that if one tape fails, you may still be able to retrieve a given
file from another tape.
- Differential backup
-
A
differential backup is a partial backup that
copies a selected file to tape only if the archive bit for that file
is turned on, indicating that it has changed since the last full
backup. A differential backup leaves the archive bits unchanged on
the files it copies. Accordingly, any differential backup set
contains all files that have changed since the last full backup. A
differential backup set run soon after a full backup will contain
relatively few files. One run soon before the next full backup is due
will contain many files, including those contained on all previous
differential backup sets since the last full backup. When you use
differential backup, a complete backup set comprises only two tapes
or tape sets: the tape that contains the last full backup and the
tape that contains the most recent differential backup.
- Incremental backup
-
An
incremental backup is another form of partial
backup. Like differential backups, incremental backups copy a
selected file to tape only if the archive bit for that file is turned
on. Unlike the differential backup, however, the incremental backup
clears the archive bits for the files it backs up. An incremental
backup set therefore contains only files that have changed since the
last full backup or the last incremental backup.
If you run an incremental backup daily, files changed on Monday are
on the Monday tape, files changed on Tuesday are on the Tuesday tape,
and so forth. When you use an incremental backup scheme, a complete
backup set comprises the tape that contains the last full backup and
all of the tapes that contain every incremental backup done since the
last full backup. The only advantages of incremental backups are
that they minimize backup time and keep multiple versions of files
that change frequently. The disadvantages are that backed-up files
are scattered across multiple tapes, making it difficult to locate
any particular file you need to restore, and that there is no
redundancy. That is, each version of a file is stored on only one
tape.
- Full copy backup
-
A full copy backup
(which Microsoft calls a copy backup) is
identical to a full backup except for the last step. The full backup
finishes by turning off the archive bit on all files that have been
backed up. The full copy backup instead leaves the archive bits
unchanged. The full copy backup is useful only if you are using a
combination of full backups and incremental or differential partial
backups. The full copy backup allows you to make a duplicate
"full" backup (e.g., for storage offsite) without altering the
archive bits of the files you are backing up. Altering them would
destroy the integrity of the partial backup rotation.
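The four backup types differ only in which files they select and whether they clear the archive bit afterward. The following is a minimal sketch of that logic, not how any particular backup program is implemented. It models the archive bits as a plain dictionary, and the names `BackupType` and `run_backup` are hypothetical.

```python
from enum import Enum


class BackupType(Enum):
    FULL = "full"                  # Microsoft: normal backup
    DIFFERENTIAL = "differential"
    INCREMENTAL = "incremental"
    COPY = "copy"                  # full copy backup


def run_backup(archive_bits: dict, kind: BackupType) -> set:
    """Simulate one backup run over the selected files.

    archive_bits maps a file name to True (bit on) or False (bit off).
    Returns the set of files copied to tape and updates the bits in place.
    """
    if kind in (BackupType.FULL, BackupType.COPY):
        # Full and full copy backups ignore the archive bit when selecting.
        copied = set(archive_bits)
    else:
        # Partial backups copy only files whose archive bit is on.
        copied = {name for name, bit in archive_bits.items() if bit}

    if kind in (BackupType.FULL, BackupType.INCREMENTAL):
        # Only full and incremental backups turn the archive bit off.
        for name in copied:
            archive_bits[name] = False
    return copied
```

For example, after a full backup of three files, changing one file (setting its entry to `True`) and running two differential backups in a row copies that same file both times, whereas a second incremental backup copies nothing, because the first one cleared the bit.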
Some Microsoft backup software provides a
bizarre backup method Microsoft calls a daily copy
backup. This method ignores the archive bit entirely and
instead depends on the date- and timestamp of files to determine
which files should be backed up. The problem is,
it's quite possible for software to change a file
without changing the date- and timestamp, or to change the date- and
timestamp without changing the contents of the file. For this reason,
we regard the daily copy backup as entirely unreliable and recommend
you avoid using it.
9.7.3 Choosing a Tape Rotation Method
A tape rotation
method is a procedure that specifies when each particular
tape will be used, and what will be backed up to it. For example, for
a simple tape rotation scheme, you might label five tapes Monday
through Friday and then do a full backup to the
corresponding tape each day. Some tape rotation methods are simple
and use only a few tapes. Others are immensely complex and use many
tapes. Choosing the most appropriate tape rotation method is a
critical step in developing and implementing your backup plan.
On one extreme, you could use the same tape
every day, but that has obvious dangers, including the risk of that
one tape being lost or damaged, the inability to retrieve a file that
was deleted or corrupted more than a day earlier, and the inability
to keep an offsite copy. On the other extreme, Robert once did some
consulting for a law firm that never reuses a backup tape. Every
evening they do a complete backup and compare of their
"active" volumes to a new tape,
which is then stored indefinitely in their vault. They regard the
small daily cost of a new backup tape as trivial relative to the
benefit of being able to reconstruct their data exactly for any
specified day.
Chances are, the best tape rotation method for you falls somewhere
between those extremes. Here are some issues to think about when you
choose a tape rotation method:
- Availability
-
When you need to do a restore, whether of a single file accidentally
deleted or of an entire volume whose hard drive crashed, time is
often important. A proper tape rotation scheme ensures that the most
recent backup data is immediately available to restore.
- Archiving
-
The most recent version of your backup data may not be good enough.
Perhaps a file was accidentally deleted or a database improperly
modified some time ago, but that was only recently discovered. The
most recent backup may, for various reasons, be missing the file you
need. An ideal tape rotation method allows you to retrieve a version
of a file from days, weeks, or months previous, before the file had
been deleted or improperly modified. Tape sets created with the best
and most powerful tape rotation methods allow you to select from
multiple versions of the file so that you can retrieve the most
recent good version. A good tape rotation method also makes provision
for periodically removing a tape from the rotation and archiving it
for historical reasons.
- Redundancy
-
Tapes
can break or be misplaced. Someone may overwrite the wrong tape. A
good tape rotation scheme recognizes these facts, and uses redundancy
to minimize the effect of such problems. If the file
can't be retrieved from one tape, it should be
retrievable from another.
- Equalized tape wear
-
Ideally, you'd like all tapes in the set to be used
equally often to distribute wear evenly across the set. The simpler
tape rotation methods usually fall down in this regard. For example,
the popular Grandfather-Father-Son rotation, described later in this
section, requires writing to some tapes in the set once a week, to
others once a month, and to still others only once a year. Although
equalizing tape wear is a less important consideration for most users
than the others described, doing so is desirable in that it minimizes
the chance that a tape will break, stretch, or otherwise become
unusable because it has been used too frequently.
Many standard tape rotation methods exist. Some are simple and use
few tapes, but fail to meet some of the goals described earlier.
Others meet each goal, or nearly so, but are difficult to manage and
require many tapes. Some methods use only full backups, others use
both full and partial backups, and still others may be modified to
use either only full backups or a combination of full and partial
backups.
If you have a choice, use only full backups. Use partial backups only
if you are forced to do so by limited tape drive capacity or a backup
window that is too short to allow using all complete backups. When it
comes time to restore, you will find that it makes your life much
easier to have the entire data set in one place rather than
distributed among multiple tapes.
Here are the most common
backup rotations:
- Daily full
-
The simplest rotation is to do a
full backup each day, assuming you have both adequate tape drive
capacity and a long enough backup window. Most sites that use this
method use 10 tapes, labeled "Monday
A" through "Friday
A" and "Monday B"
through "Friday B." Using this
method offers the considerable advantages of simple administration
and extreme data redundancy. It's always obvious
which tape you should be backing up to. If you start a restore and
your most recent backup tape breaks, you simply use the next most
recent tape. All tapes receive equal wear, and can be replaced
periodically as a set. You can cycle each backup tape offsite as it
is replaced by today's backup, leaving your most
recent backup available onsite for easy restores, while having an
offsite tape that is only one day old. The sole disadvantage of this
rotation is that, assuming you use 10 tapes, you can retrieve
historical data from no more than two weeks back. This problem is
easily addressed. Simply add four Quarterly tapes or 12 Monthly tapes
to the rotation, and do a duplicate backup to the appropriate archive
tape at the end of each quarter or month.
- Weekly full with daily differential
-
This is probably the most commonly used
rotation on PC-class systems. In its simplest form, it requires only
three tapes: "Weekly A,"
"Weekly B," and
"Daily." On
"odd" Fridays, you do a full backup
to Weekly A. On "even" Fridays, you
do a full backup to Weekly B. Monday through Thursday, you do a
differential backup to the Daily tape. This rotation is simple to
manage and requires few tapes, but has the following disadvantages:
Historical data can be retrieved for a period of at most two weeks. If you accidentally delete a file and don't realize it for a couple of weeks, that file is gone for good.
If the Daily tape fails during a restore, your next most recent tape is the last Weekly tape, which means you may lose as much as four days' worth of data.
Only one current copy of the full backup exists, so you must either keep it onsite for easy retrieval or offsite for safety.
Tape wear is very uneven, since the Daily tape is used eight times more often than either Weekly tape.
Simply adding more tapes and making minor changes to the rotation
solves most of these problems. For example, add a tape to do a second
full backup each Friday, and store that tape offsite. Add a second
Daily tape and alternate using them, or simply use a tape for each
workday. To extend historical data, add four Quarterly tapes (or 12
Monthly tapes), do a full backup to the appropriate tape on the final
day of the corresponding quarter (or month), and then store the tape.
The weekly full with daily differential rotation
described earlier is an excellent choice for most people, but beware
the similar-sounding weekly full with daily
incremental rotation, which is the worst possible choice
short of not backing up at all. For some reason, this rotation is
recommended in many books and even in some tape drive manuals.
Don't use it if you value your
data! Its only advantage is that it minimizes backup
times, but at the expense of data security. Because this rotation
uses incremental backups, each Daily tape contains a different group
of files. Restoring one file may require looking at multiple tapes to
ensure that you are restoring the most recent version. Doing a
complete restore requires that you be able to restore the most recent
full backup tape and all subsequent Daily tapes successfully. If
any Daily tape fails during the restore, you must either revert to
the last full backup, losing all subsequent changes to files, or
risk incoherent file versions caused by restoring only some of the
Daily tapes.
- The Grandfather-Father-Son rotation
-
The
Grandfather-Father-Son (GFS) tape rotation method is more commonly
used on servers than on personal systems, but it's
worth considering if your data is very valuable and you think
it's worth going to some trouble and expense to
secure it. GFS is the easiest to manage of any of the
"complex" tape rotations, requires
relatively few tapes, and is supported directly by most backup
programs on the market. A typical GFS rotation tape set requires 21
tapes, as follows:
Daily tapes. Label four tapes Monday through Thursday. Back up each day to the tape for the corresponding day, overwriting each tape once a week.
Weekly tapes. Label five tapes Friday-1 through Friday-5. Back up each Friday to the corresponding weekly tape, using the Friday-5 tape only in months that have five Fridays. Weekly tapes 1 through 4 are overwritten once a month, with Friday-5 being overwritten less frequently.
Monthly tapes. Label 12 tapes January through December. Back up the first (or last) of each month to the corresponding monthly tape. Monthly tapes are overwritten only once per year.
GFS meets most of the goals of an ideal tape rotation method. You can
keep recent tape sets onsite, and migrate others offsite. GFS
provides weekly granularity for the preceding month and monthly
granularity for the preceding year. GFS provides numerous copies of
both recent and older data. The disadvantage to GFS is that tape wear
is uneven. Daily tapes are written once a week, weekly tapes once a
month, and monthly tapes only once a year. Uneven tape wear is a
small price to pay for the other advantages of GFS, however. Most GFS
rotations use differential backup for daily tapes and full backup for
weekly and monthly tapes, but nothing prevents you from using full
backup for all tapes.
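The GFS labeling rules described above can be sketched as a small function that maps a calendar date to the tape to use. This is an illustration under stated assumptions, not a feature of any particular backup program: the function name `gfs_tape` is hypothetical, the monthly backup is assumed to run on the last day of the month and to take precedence over the daily or weekly tape for that date, and no backup is assumed on other weekend days.

```python
import calendar
from datetime import date
from typing import Optional

DAILY = ["Monday", "Tuesday", "Wednesday", "Thursday"]
MONTHLY = ["January", "February", "March", "April", "May", "June",
           "July", "August", "September", "October", "November", "December"]


def gfs_tape(day: date) -> Optional[str]:
    """Return the label of the GFS tape to write on the given date.

    Assumptions (not from the text): monthly backups run on the last day
    of the month and override the daily/weekly tape; other Saturdays and
    Sundays have no scheduled backup, so None is returned.
    """
    last_day = calendar.monthrange(day.year, day.month)[1]
    if day.day == last_day:
        return MONTHLY[day.month - 1]       # monthly tape (21-tape set: 12)
    weekday = day.weekday()                 # Monday is 0, Sunday is 6
    if weekday < 4:
        return DAILY[weekday]               # daily tapes (4)
    if weekday == 4:
        # The nth Friday of the month goes to tape Friday-n (5 tapes).
        return "Friday-{}".format((day.day - 1) // 7 + 1)
    return None
```

Calling `gfs_tape(date(2024, 3, 29))`, the fifth Friday of March 2024, selects the Friday-5 tape, which is why that tape is overwritten less often than the other weekly tapes.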