How can we be sure we'll remember our digital past?
As technology evolves, data from outmoded machines is put at risk; panel addresses pathways and costs.
Mark Thomson
When Donald Sweeney heard that his wife's computer had crashed, he was embarrassed.
Countless couples have lost all their files this way. But Mr. Sweeney felt particularly sheepish because he is the project manager for the Large Synoptic Survey Telescope (LSST), which may one day amass the largest database of night-sky photographs ever collected.
If all goes according to plan, the LSST's 6-foot-3-inch digital camera will power up in 2015 and start snapping 200,000 pictures of space (or 1.3 million gigabytes of data) every year.
"Here I am, protecting one of the world's biggest databases, and I let my wife lose all of her computer files because the disk crashed," he jokes.
So what if the LSST collection finds a similar fate? "We just can't let that happen," Sweeney says, his tone turning serious. "We're doing a lot of things to make sure that it never does."
Losing personal files can be upsetting. But failing to protect academic, government, or corporate data could erase irreplaceable pieces of history, says Francine Berman. She co-chairs a newly formed panel of experts tasked to ask how the world can protect its digital past, and answer a more nagging question: Who's going to pay for it?
Unfortunately, she says, the same culture that makes creating our digital lives so easy makes protecting that data very difficult. Consumers expect faster computers, smarter software, and new gadgets every few years.
Consequently, "it's hard to read the information on floppy disks these days," says Dr. Berman, who is director of the San Diego Supercomputer Center. "Very few people still have the drives. It's hard to play LPs. They were everywhere only a decade ago. But now many people can't read them."
And if diskettes or vinyl aren't kept in the right environment, it won't matter if people have the right drives. The disks will decay. The records will warp.
"It's the great challenge of the Information Age," she says, and a problem that her Blue Ribbon Task Force on Sustainable Digital Preservation will explore over the next two years.
The international panel brings together computer scientists, lawyers, archivists, and economists from universities, corporations, and federal agencies. The effort is backed by the National Science Foundation, the Library of Congress, and several other organizations.
"I'm excited," says Ann Ferguson, who is not on the panel, but has wrestled with the issue as project manager for the Digital Futures Alliance in Seattle. "The task force has all the right people and represents a cross section of the major interests. You need to have a large panel, because if there were an easy answer, we would have done it by now."
Since 2000, the Library of Congress has collected a trove of recent history that was "born digital," particularly websites and YouTube videos from presidential elections.
But the project's funding has faced significant cutbacks from Congress. This stoked the debate on how to make such collections financially sustainable.
Fading media, formats
The problem of digital preservation spans two layers. There's the media – floppies, CDs, hard drives – and there's the format of the files themselves: Does it run in DOS, HyperCard, or ClarisWorks 2.0?
Microsoft tackles this issue of "legacy" computing by running a kind of corporate museum. The company protects its multiplatform history by preserving old copies of "every major hardware and software change," says Lee Dirks, director of Scholarly Communications at Microsoft and a task force member.
"We've got computers stored on campus that go back to the Altair, the first computer [to run Microsoft software]," he says. "In fact, we bought multiple copies of the Altair just in case."
But maintaining antique computers is a costly way to keep the past alive.
A concept that is gaining momentum, Mr. Dirks says, is emulation, where programmers trick modern computers into thinking the way their classic cousins did. This lets them run old software without retro machines. But a problem arises when the emulator itself is written for last generation's operating systems. Do you write an emulator to handle the original emulator?
A more likely approach to long-term preservation is migration, says Berman. This calls for updating the file format every generation – without changing the contents, one hopes. This method has problems, as well. Some of the original context will be lost in translation, says Dirks. Also, the scale of the conversion will snowball as the number, size, and back-catalog of the files increase with each passing generation of technology.
For example, after one year of photographing the night sky, LSST will likely produce more digital information on space than all past efforts combined, says Berman.
"So, again, who will pay for this?" asks Berman. "I don't expect being able to tell anyone in two years that it will be free."
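To make the migration idea concrete, here is a minimal sketch in Python of moving a file from one format to another and then checking that only the container changed, not the contents. Everything in it is an assumption made for illustration: the CSV and JSON formats, the sample star data, and the function names are hypothetical and do not describe how LSST, the task force, or any archive actually migrates files.

```python
import csv
import io
import json


def parse_legacy_csv(text):
    """Read the old format into plain records (a list of dicts)."""
    return list(csv.DictReader(io.StringIO(text)))


def migrate_to_json(text):
    """Rewrite the same records in the newer format."""
    return json.dumps(parse_legacy_csv(text), indent=2, sort_keys=True)


def contents_match(legacy_text, migrated_text):
    """Check that the records survived the migration unchanged."""
    return parse_legacy_csv(legacy_text) == json.loads(migrated_text)


# Hypothetical sample data standing in for an old file.
legacy = "object,magnitude\nVega,0.03\nSirius,-1.46\n"
migrated = migrate_to_json(legacy)
print(contents_match(legacy, migrated))
```

The check compares the records themselves rather than the raw bytes, since the bytes are expected to differ after a format change. A real archive would also have to worry about the context Dirks mentions, such as fonts, layout, or software behavior, which a record-by-record comparison like this cannot capture.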
A pay model?
While panel members are careful not to discuss possible recommendations in too much detail this early in the project, several of them mentioned basic economic models for making data accessible and sustainable.
They include an iTunes-style pay-per-use model, where users would be charged to download old books, census data, etc.; a privatized model, where businesses that already host pictures or files online agree to keep them for decades into the future; or a public-good model, where governments or endowments fund preservation.
"I think it's unlikely that we'll map out, 'Well, this fee structure will go to this kind of data and this model is for this industry,'" says Amy Friedlander, who serves on the task force and is director of programs at the Council on Library and Information Resources in Washington. "We should assume there will be a mix of strategies, because no model is mutually exclusive."
Deciding which files are worth saving is a judgment call the panel will leave to the community.
While the task force spends the next two years reflecting, many other alliances will be researching the problem as well.
At the same time that it's funding the Blue Ribbon Task Force, the National Science Foundation has offered $100 million for five organizations to design a "DataNet" for sustainable data preservation "over a decades-long timeline."
Across the Atlantic, the European Union launched its own push for preservation in June 2006.
Known as Planets, the four-year effort by several national libraries hopes to save the $4.3 billion worth of European data that's "at risk of digital obsolescence," reports the project.
Bursting with digital data of your own? Four ways to store it.
While government and industry spend the next few years thinking about large-scale digital preservation, computer experts say households should start protecting their files now.
"Only 10 to 30 percent of consumers back up their stuff, and really it's closer to 10," says Natalie Del Conte, senior editor for the technology news and reviews website CNET.com. "Everyone knows somebody who's lost all their files, yet we're still lazy."
So what's the best storage option for families? Countless formats that were once popular are now passé. (Remember Zip drives?) How do you pick one that will last at least until your next computer? At the moment, there are four key storage formats out there: discs (CD-Rs or DVD-Rs), external hard drives, solid-state drives, and online storage. Which is best for you is really a matter of personal preference.
Just as cars no longer have tape decks, computers may one day stop shipping with CD drives. (Apple's new MacBook Air doesn't come with one.) While we are all comfortable with discs, they are inherently limited by the fact that they come in only one size: 4-3/4 in. Manufacturers can squeeze more information onto a disc, but then you get into tricky format wars over whose new approach is best.
Inside an external hard drive is a spinning magnetic disc. That whirring sound your hard disc makes when it's moving large files is the physical churning of the drive. These twirlers are clunky, loud, and shouldn't be jostled (the magnetic pen that writes the data could scrape the wrong part of the disc).
Solid-state memory, however, has no moving parts – it's basically a circuit with a chip attached. It is silent and smaller, but vastly more expensive. Because of this price-per-gigabyte equation, you probably can't back up your whole music collection on an affordable flash drive. However, small, solid-state "thumb drives" have become the new cheap floppy disks. Digital cameras can fit one GB on a card the size of a nickel. And Apple can now cram 16 GB of solid-state memory into its iPhone. Nonetheless, consumers can buy more than 10 times that much storage for the same price with standard spinning hard drives. Just look at the 160 GB iPod model, or several thousand gigabyte hard drives that go into today's powerhouse PCs.
Because of this price difference, Ms. Del Conte suggests that families invest in an 80-gigabyte external hard drive. "It's the most affordable, and it's a good jumping-off point," she says. Price: around $100. Make sure that the drive works with USB plugs, which she says won't become obsolete anytime soon. Or, if your computer can use FireWire plugs, it's a faster connection.
The newest option, online storage, has been possible for years. But many people have simply not warmed to the idea of storing all their songs and pictures on someone else's machine. For the tech-savvy crowd, Microsoft offers a Home Server that can store files from every computer in the house. "But if I tried to explain that to my mother, her head would explode," jokes Del Conte. Know your needs, and buy with care.
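For readers comfortable with a little scripting, a backup to an external drive can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file name and the "external_drive" folder are hypothetical stand-ins, the demonstration runs in a temporary folder rather than on a real drive, and the helper names are invented here. It copies a file and then compares checksums to confirm the copy matches the original.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, backup_dir):
    """Copy one file into backup_dir and verify the copy."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)  # copies the data and file timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"backup of {source} does not match the original")
    return target


# Demonstration in a temporary folder; the paths are hypothetical.
with tempfile.TemporaryDirectory() as workspace:
    photo = Path(workspace) / "family_photo.jpg"
    photo.write_bytes(b"stand-in for real photo data")
    copy = back_up(photo, Path(workspace) / "external_drive")
    backup_ok = copy.read_bytes() == photo.read_bytes()
print(backup_ok)
```

In practice, the backup folder would point at wherever the external drive appears on your computer, and the script would loop over the folders you care about. Off-the-shelf backup software does the same job without any code.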