You Will Lose Data!

Last modified 28 June 2009 00:30 ET

Data is destroyed all the time

You can go on thinking that it won’t happen, but you will lose data. You’ll save the wrong data, or toss out something you need later, or your kid will toast your files, or your operating system will fry your disk directory, or your disk drive hardware will fail. Or some combination of the above.

It’s not a question. It will happen to you! It’s just a matter of time. Even if you’ve bought the most expensive redundant RAID setup, it’s still all too possible to trump disk redundancy by overwriting data or tossing something out.

What's the worst that could happen?

It’s best to think of these horrible ideas ahead of time so you can have a plan in place.

Who am I to tell you this?

I’m a system manager. I’ve worked with Windows, UNIX (MacOS X, Linux, Tru64, Solaris, AIX), MacOS, and VMS. I’ve seen it happen everywhere, from large corporations to home users. It doesn’t matter how much you spend on better disk drive hardware: if you destroy the data yourself, it’s still gone.

As an aside, I’m not trying to sell you anything. What I’m trying to do is make sure you’re aware of the risks involved with computer systems. You can then, with your own situation in mind, decide what to do. While I could design a backup solution for you, no one I’m not already working for has ever hired me for that, so I’m not expecting it. Of course, you could be the first.

What can you do about it?

Like the Boy Scouts say, “Be prepared”. It’s too late when your data is figuratively, or literally, disk parts spattered all over the walls. Right now is the best time to start, because you can’t start yesterday or last week.

You’ve got to make backups. The typical home user can start small, as you’ve got less (quantitatively) to lose. Determine what’s important. You may not need to copy everything you’ve got, as you can reinstall the operating system and applications from their distribution disks or by downloading them again, but your personal data (financial info, pictures of your family, recipes you developed over the years, the song you wrote last night) can’t be replaced that way.

Determine what you need to back up and how big it is, collectively. That will help determine how you’re going to do your backups.
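As a rough sketch of the “how big is it, collectively” step, something like the following Python adds up the sizes of the files you care about. The folder names at the bottom are only placeholders I’ve assumed for illustration; point it at wherever your own irreplaceable data lives.

```python
import os

def total_size_bytes(paths):
    """Add up the sizes of all regular files under the given folders or files."""
    total = 0
    for path in paths:
        if os.path.isfile(path):
            total += os.path.getsize(path)
            continue
        for folder, _subfolders, filenames in os.walk(path):
            for name in filenames:
                full = os.path.join(folder, name)
                # Skip symbolic links so nothing is counted twice.
                if os.path.isfile(full) and not os.path.islink(full):
                    total += os.path.getsize(full)
    return total

def human_readable(num_bytes):
    """Format a byte count using the decimal units drive vendors use."""
    size = float(num_bytes)
    for unit in ("bytes", "KB", "MB", "GB"):
        if size < 1000:
            return f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} TB"

if __name__ == "__main__":
    # Hypothetical folders; substitute your own.
    important = [os.path.expanduser("~/Documents"), os.path.expanduser("~/Pictures")]
    print(human_readable(total_size_bytes(important)))
```

The number it prints is what you compare against the capacity of whatever you’re thinking of backing up to.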

For small backups such as a single file, a small ( < 2GB ) memory chip or a CD-R disk might do. For a ton of pictures or video of your child, you’ll need something bigger and more expensive. How much is that data worth to you? Can you afford to lose it? Can you recreate it?

Backup Types

There are two kinds of backups.

Archival Backups

For simple backups, you can use a small memory chip, a CD-R, or a DVD-R. You probably already have at least one of these available, so the cost is low.
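For the memory chip case, where the device simply shows up as another folder, a simple backup can be as little as copying a folder into a date-stamped folder on the chip. This is only a minimal sketch: the function name and folder layout are my own assumptions, and writing to CD-R or DVD-R normally goes through burning software instead.

```python
import shutil
import time
from pathlib import Path

def copy_backup(source_folder, backup_root):
    """Copy source_folder into a new, date-stamped folder under backup_root."""
    source = Path(source_folder)
    stamp = time.strftime("%Y-%m-%d_%H%M%S")
    destination = Path(backup_root) / f"{source.name}_{stamp}"
    # copytree refuses to write into an existing folder, so an older
    # copy is never silently overwritten.
    shutil.copytree(source, destination)
    return destination
```

Because every run creates a new folder, older copies stay around until you delete them, which gives you a little protection against saving the wrong data over the right data.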

More comprehensive solutions cost more. You could get a tape drive for your system. These generally come with software to run them and to manage your backups by tracking which file is on what tape. Unfortunately, they're horribly expensive for home users.

I’m lazy. I use a set of disk drives so that I can back up everything (operating system, applications, data, and settings/preferences) and not have to concern myself with reinstalling everything if there’s a major problem. I used to use a tape drive. Before I got my first tape drive in 1994, I backed up my 100MB disk drive to more than 100 800KB floppy disks. Sitting there putting floppies in and pulling them out again for an hour or two got rather old very quickly. In order to give the appearance of archival backups, in case I needed to recover something older than the newest backup, I had two of these sets of 100+ floppy disks, used in turn. This was so much fun to maintain.

As an aside, note that the harder your backup solution is to keep up to date, the less likely you are to do so. I was happy to run my floppy-based backups every two or three weeks. The kicker is that the more out of date your most recent backup is, the less useful it is if you need to recover something. Anyway...

The situation has changed somewhat since 1994. You can now get a relatively cheap tape drive for only a few hundred dollars (US), and have a much easier time with your backups. Unfortunately, cheap tape drives are small compared to the size of the files (music, images, and other documents) we keep these days. That means you have to buy lots of media (tape cartridges) and swap them in and out of the drive frequently, like my 100+ floppies.
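To see how quickly the media swapping adds up, here’s a small calculation. It’s a sketch which assumes decimal units (1MB = 1000KB) and ignores compression and any per-disk overhead, so real counts will differ a little.

```python
KB = 1000
MB = 1000 * KB
GB = 1000 * MB

def media_needed(data_bytes, media_bytes):
    """Return how many pieces of media it takes to hold data_bytes."""
    if media_bytes <= 0:
        raise ValueError("media capacity must be positive")
    # Integer ceiling division: a partly filled disk or tape still counts as one.
    return -(-data_bytes // media_bytes)

if __name__ == "__main__":
    print(media_needed(100 * MB, 800 * KB))  # my old 100MB drive on 800KB floppies
    print(media_needed(1000 * GB, 36 * GB))  # a full 1TB drive on 36GB DDS5 tapes
```

With those assumptions, the 100MB drive works out to 125 floppies and the 1TB drive to 28 uncompressed DDS5 tapes.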

It’s not necessary to save everything if you don’t mind reinstalling everything in the event of a hardware failure. It is cheaper up-front to use CD-R disks, for instance, but it’ll take longer to get back to normal when you need to restore everything after a hardware failure since you’ll have to find all your installation media and download sites before you can restore your data.

I do not recommend using a disk drive for archival backups. While easy to understand (’tis a big floppy disk, guv’nor!), it has the same problems as any other single disk drive, and can only be expanded by buying a second one. Some optical disks might be better, but at this time they’re either too small (CD-R) or unproven (DVD-R, Blu-ray). Tape is slower than a disk drive and costs more up front. Once upon a time, adding more tapes (read: more capacity) was significantly cheaper than adding more external disks. This is no longer the case. While I’ve got 20+ year old 9-track tapes which still contained their data last time I tried them, and adding more 9-track tapes would be cheap if I could find them at all, the biggest such tape holds about 120MB. That’s megabytes. Much bigger tapes exist, but a single multi-hundred gigabyte (GB) cartridge costs approximately the same as a bare disk drive which holds more. I got a 1TB (terabyte) drive for $90 a few weeks ago (early June 2009). Adding an external enclosure to the drive cost less than $50. One terabyte for under $150, and it’ll only get cheaper.
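One way to compare these options is cost per gigabyte. The sketch below uses only prices quoted on this page (2009 figures); the DVD-R and DAT numbers cover media only, not the drive, and “under $20” is treated as $20, so that one is an upper bound.

```python
def cost_per_gb(price_dollars, capacity_gb):
    """Return cost in dollars per gigabyte of native (uncompressed) capacity."""
    if capacity_gb <= 0:
        raise ValueError("capacity must be positive")
    return price_dollars / capacity_gb

# Prices and sizes as quoted on this page; media only for DVD-R and DAT.
options = {
    "1TB bare disk drive": cost_per_gb(90, 1000),
    "DVD-R spool (50 x 4.7GB)": cost_per_gb(25, 50 * 4.7),
    "DDS5 DAT tape (36GB native)": cost_per_gb(20, 36),
}

if __name__ == "__main__":
    for name, dollars in sorted(options.items(), key=lambda item: item[1]):
        print(f"{name}: about ${dollars:.2f} per GB")
```

Cost per gigabyte isn’t the whole story, of course. It says nothing about how many pieces of media you’ll be swapping, or how long the media will last.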

On my Macintosh, I currently use Retrospect and an arrangement of external disk drives. I have a 1TB drive for Retrospect, and a 300GB drive for Time Machine. Retrospect is archival backup, and Time Machine is more like live backup, as it runs every hour. It’s not as good as a RAID, but I had the 300GB drive on a shelf and Time Machine came with Mac OS X, so the combination cost me zero. Free is good.

The two drives I added a few weeks ago, two of the 1TB drives I described above, are for off-site backup. Off-site backup means it’s not anywhere near your computer; the farther away, the better. I bring them back and update them occasionally, then get them off-site again.

This won’t be the right answer for everyone, but for the amount of data I have, and because I don’t want to have to reinstall every little thing, it’s right for me.

Redundant Backups

In order to prevent everything from coming to a halt when you lose a disk drive, you can use more than one. Here’s where I’ll talk about RAID, which stands for, depending on who you ask, either Redundant Array of Inexpensive Disks, or Redundant Array of Independent Disks, and probably others as well.

Terms I’ll be using:

There are a number of different kinds of RAID.

RAID level: Description

0 (Striping): This is a way to make a larger virtual disk out of a bunch of smaller physical disks. It’s not redundant and you shouldn’t use it by itself because loss of any one of its component physical disks loses all your data. It’s useful in combination with other kinds of RAID.

1 (Mirroring): RAID 1 makes one virtual disk from two identical physical disks with the same content. This has several advantages. It’s fast for reads, as a read is completed by whichever disk gets to it first; a write has to go to both disks. It’s redundant, since both disks contain the same thing at all times. The loss of one of the disks just means you’ve lost redundancy, not that you’ve lost data. Best for databases because it’s fast, but it uses twice the number of disks. You can use this with RAID 0 to make a bigger redundant disk.

5 (Distributed parity): Using at least 3 physical disks, this is a version of RAID 0 with redundant error-correcting data (parity) scattered about the physical disks. This creates a nicely distributed storage arrangement wherein the loss of a single disk not only doesn’t lose any of your data, but is easy to fix. The data on the lost disk can be recreated by using the remaining disks. It’s slower than RAID 1 because it has to access all the disks to get or save your data.
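How can RAID 5 recreate a lost disk from the ones that remain? Here’s a toy illustration of the XOR parity idea behind it, not what any particular controller actually runs. The three “disks” are just short byte strings I made up, and a real RAID 5 also rotates which disk holds the parity from stripe to stripe.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks, as if each lived on a different physical disk.
disk_a = b"FAMILY P"
disk_b = b"HOTOS AN"
disk_c = b"D TAXES!"

# The parity block is the XOR of all the data blocks.
parity = xor_blocks([disk_a, disk_b, disk_c])

# Pretend disk_b has died: XOR the survivors with the parity to rebuild it.
rebuilt_b = xor_blocks([disk_a, disk_c, parity])

if __name__ == "__main__":
    print(rebuilt_b == disk_b)
```

This is also why losing two disks at once is fatal: with two blocks missing, there’s no longer enough information to work backwards.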

There are also combinations of the above used for various reasons, like mirroring combined with striping, written as 0+1 or 10 depending on which is layered on top of which. I did leave several options off the list. Feel free to Google on “RAID types” for more if you’re interested. The most common ones are levels 1 and 5.

With appropriate “hot swappable” hardware, in the event of a disk failure with RAID 1 or 5, you just pull out the bad drive and replace it. Your system never sees anything significant go wrong, doesn’t lose any data, and there’s no downtime.

Note that RAID 1 and 5 assume that you’ll never lose more than one disk at a time. Since that’s usually a safe bet, they’re good enough the vast majority of the time. Delaying too long before replacing a bad drive might allow a second drive to fail. If that happens, you’ve lost your data. If you need to have more redundancy, other RAID levels such as 6 can survive losing two disks at once. Also, some implementations of RAID 1 allow you to provide a third redundant disk, just in case. Or you could have a spare disk available to the RAID controller, for use in automatically repairing a reduced RAID set. This is most often used in corporate environments where there are a number of RAID sets connected to one controller and the spare disk will be used to automatically replace the next drive that fails, in any of the RAID sets. A home user is less likely to do this. It’s expensive.

RAID is mainly useful for situations where “zero downtime” is imperative, such as in businesses that run 24×7. Home users probably don’t need it, but they might choose it depending on their needs.

A word about hardware versus software RAID

RAID requires a certain amount of computing power. RAID 1 requires the data to be written out twice, and read from whichever device answers first. RAID 5 requires the error correcting data to be computed and the data to be written out to all the drives, or read from all the drives and checked against the error correcting data to protect against corruption.

If you use a hardware RAID solution, the extra computing power is handled by specialized hardware, leaving the rest of your computer free to continue with whatever it’s doing. On the other hand, a software RAID solution requires your CPU and its memory to do all this grunt work, which means it’s handling your disk requests more than it would be otherwise, costing you performance.

Also, while either software or hardware RAID can use code which has bugs, it’s generally less likely that the firmware in a hardware RAID solution will have big known problems, especially if you do your research before buying your hardware RAID implementation. Ask around on-line, look at reviews by many other users, and get a solution which has the fewest problems. Currently, I give 3ware a big thumbs up, but I’ve not tried everything, and the ones I’ve tried (Promise, Intel, 3ware, and Adaptec) may have newer versions which are better. Especially consider the utility software and how the product handles recovery. DO YOUR RESEARCH!!

How to Select a Backup Option

What to use for your backup method hinges on how much you need to back up. For instance, if you’re just backing up a Quicken accounts file, you can probably get away with a small memory chip. If, on the other hand, you’re backing up your whole disk drive, you probably want to use something larger. There are steps in-between, and issues regarding cost and speed.

Let’s make a list of the various options, what they cost, and what they’re good for.

Backup device: Internet
Cost: how much are you willing to pay?
Size: 100MB to 2GB or more, depending on cost
What it is good for: Small backups of things you don’t mind sending over the public network. There’s a security issue with this.

Backup device: Floppy disk
Cost: effectively free
Size: 1.44MB (HD)
What it is good for: This is good for tiny backups that will fit on a single floppy disk. It probably no longer comes with your computer, so isn’t a real contender any more.

Backup device: ZIP disk
Cost: drive: ?; media: $15/100MB
Size: 250MB, 750MB
What it is good for: In my experience, with the older 100MB units, they’ve got a history of being rather less reliable than a floppy disk. I can’t speak to the reliability of the bigger drives.

Backup device: CD-R
Cost: drive: under $100; media: well under $1 in quantity
Size: 650MB to 800MB
What it is good for: Good for portability.

Backup device: DVD-R
Cost: drive: under $100; media: $25 for spools of 50
Size: 4.7GB to 8GB+
What it is good for: Drives to read and write these are common. I use the term “DVD-R” to include the various DVD writing standards.

Backup device: DAT tape
Cost: drive: up to ~$700 for DDS5; media: under $20
Size: 2GB to 36GB (native)
What it is good for: Used to be very nice, but it has reached end of life. Or has it? I’ve recently seen references to DDS5, which holds 36GB (72GB optimal compressed).

Backup device: AIT tape
Cost: drive: $750 to $2200+; media: $50±
Size: 25GB to 400GB (native)
What it is good for: The highest level of this, AIT5, claims to get up to 1024GB using compression.

Backup device: VXA tape
Cost: drive: starts at $699 (list); media: $14 to $63
Size: 33GB to 160GB, with a maximum of 640GB projected so far (native)

Backup device: DLT tape
Size: up to 110GB (native)

Backup device: SDLT tape
Size: 110GB to 800GB (native)

Backup device: LTO tape
Size: 100GB to 800GB (native) for now

Backup device: external disk drive
Size: whatever size you like
What it is good for: You connect one of these, back up to it, then disconnect it from the computer.

Backup device: RAID controller
Size: depends entirely on what you buy for disks
What it is good for: This is good between archival backups.

Hardware Failure Warning Signs

Disk drives and fans both display similar “impending doom” signs: increased noise. If your disk drive is making more noise than it used to, such as a high whine or (uh oh) a scraping sound, it’s time to make a final backup and replace it. In fact, if you hear unusual scraping sounds from a drive, it may be too late already.

A disk utility or operating system feature might tell you of errors on the disk, and you might be able to determine that sectors on the disk are going bad. Even without noise, this is a Very Bad Thing. Those sectors won’t magically heal. The rate at which they fail will only increase. Once this starts happening, replace the drive.

Replacing the drive is simple, in theory. You get another, perhaps bigger, disk drive, back up everything you want to keep, replace the hardware, and restore your backup. If you’ve got some kind of RAID 1 or 5, you may even be able to skip the backup/restore if the RAID hardware will do it for you. Check your RAID controller manual for details.

In practice, replacing a drive may be somewhat more difficult. If your controller requires all the drives to be the same size, and no one makes that size of drive any more, you’ll have to substitute a bigger drive and accept that you’ll only get to use as much of it as the original drive held.

If you want more from it than that, you can back up everything, and rebuild your RAID set using drives with the new size. If you can’t spare either the time or the money to do this, accept the wasted disk space or buy spare disks up-front.

That probably wasn’t very clear. Here’s an example to explain it better, ’cause this is fairly important.

Here’s a RAID card. [Image: RAID card]
Here’s a 4GB disk drive. [Image: 3 1/2 inch disk drive]

(for any wise guys out there: if you notice issues about incompatibility between that disk drive and that RAID card, do please feel free to keep them to yourself.)

If I bought the RAID card and two of those physical disk drives, then put them all in a computer and configured them as a 4GB RAID 1 (mirror) virtual disk, I might think I’m set for all time. It doesn’t work that way.

Unfortunately, one of the two physical drives will fail some day, putting my nice RAID 1 into a reduced state, where it’s no longer redundant. If I want to restore the redundancy of my RAID 1, I have to get another drive with at least as much storage space as the physical disk that failed. Since it’s been several years since they made 4GB disks, I’d have to get the smallest I could find, which these days is probably an 80GB drive.

I’d put the 80GB drive into the existing 4GB RAID 1 set, and resume working with my 4GB of redundant disk space. Where’d the rest of the new physical disk go? Nowhere. It was ignored by the RAID controller.
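The arithmetic behind that is simple enough to sketch. This assumes, as in my example, a controller which caps a mirror at the size of its smallest member; the function names are my own invention, and other RAID implementations may behave differently.

```python
def raid1_usable_gb(disk_sizes_gb):
    """Usable size of a mirror: every member must hold a full copy,
    so the smallest disk sets the limit."""
    if len(disk_sizes_gb) < 2:
        raise ValueError("a mirror needs at least two disks")
    return min(disk_sizes_gb)

def raid1_ignored_gb(disk_sizes_gb):
    """Total space across all members that the mirror cannot use."""
    usable = raid1_usable_gb(disk_sizes_gb)
    return sum(size - usable for size in disk_sizes_gb)

if __name__ == "__main__":
    print(raid1_usable_gb([4, 80]), raid1_ignored_gb([4, 80]))
    print(raid1_usable_gb([80, 80]), raid1_ignored_gb([80, 80]))
```

For the mixed 4GB and 80GB pair, that’s 4GB usable and 76GB ignored. For a matched pair of 80GB drives, it’s 80GB usable and nothing ignored.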

I can fix this by getting a second 80GB drive, backing up the reduced (not redundant) RAID set, creating a new RAID set with the 80GB drives, and restoring the data to the new 80GB of redundant storage. What happens to the working 4GB drive? It goes in the closet. Or it gets recycled like the one that failed.

Or, I could have bought three 4GB disk drives in the first place, used two of them for the RAID 1 set, and put the spare in a closet. When one of the active disks failed, I could replace it with the one I set aside earlier.

Some RAID controllers will even allow you to set up all three units, designate two of them as a RAID 1, and the third as a “hot spare”. The “hot spare” drive will be used automatically when one of the active drives fails. You take your chances on the spare failing before it actually gets used, but that can happen if you leave it in the closet too. It’s quite a bit less likely to fail in the closet, I think, because its moving parts aren’t moving.

“You pays yer money and you takes yer chances.”

It’s a matter of how much risk you can accept. It’s possible that a reduced RAID 1 will lose the remaining drive before you can replace the bad unit, but it’s not likely under normal circumstances. That can happen with RAID 5 too, which is only designed to survive losing one drive at a time. If you lose two, you’re sunk. There’s almost always a way to pay more money for more redundancy if your budget is high.

Since most people don’t have company-sized budgets, we have to figure out other answers. After your RAID 1 has a double failure, you can go back to your archival backup and restore it to the replacement hardware. You’ll lose whatever you did since that backup was taken, of course. Also, you might want to have a spare RAID card available in case the RAID card itself fails. The replacement card will tell you if there’s a problem with your RAID set. If there is, you recreate your RAID set and restore your data from archival backups.

External Disk Backups

Many people do their backups to external disks. Lots of vendors sell these. They’re just a disk drive like the one in your computer, but they’re in an enclosure you can connect to the computer from an external USB, Firewire (IEEE 1394), SCSI, SATA, or SAS plug. When you’re not actively doing a backup, you disconnect the drive from the computer, or at least turn it off. Preferably, you disconnect it then take it “off site”, meaning to a far distant location which will survive if there’s a meteor strike on your house, or your house simply burns down.

From previous discussion, you might notice a problem with this idea. What if... the external drive fails? That’s why you have more than one external drive you back up to! Or maybe you have an external drive you back up to sometimes, and a DVD-R drive you use at other times, and you use them in turn: external hard disk this week, DVD-R next week, then external hard disk again the following week. Or you get a self-enclosed external RAID box. Again, it’s a balance between how much risk you can afford, and how much cash you can afford.
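If you’d rather not have to remember whose turn it is, a few lines of Python can tell you. This is just a sketch: the list of media is an example, and it counts Monday-to-Sunday weeks from a fixed starting point so the alternation doesn’t hiccup at the end of the year.

```python
import datetime

# Example rotation; list whatever backup devices you actually own.
MEDIA_ROTATION = ["external hard disk", "DVD-R"]

def medium_for(day, rotation=MEDIA_ROTATION):
    """Pick this week's backup medium, alternating week by week."""
    # toordinal() counts days from a fixed origin which fell on a Monday,
    # so this numbers whole Monday-to-Sunday weeks with no yearly reset.
    week_number = (day.toordinal() - 1) // 7
    return rotation[week_number % len(rotation)]

if __name__ == "__main__":
    print(medium_for(datetime.date.today()))
```

Add a third device to the list and it becomes a three-week rotation with no other changes.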

Summary

Ideally, nothing bad ever happens, but since we all know that bad stuff will happen, it’s best to be prepared.


This site is maintained by Howard Shubs. You can reach him at shubs.net.