Thursday, March 4, 2010

Rock, Paper, Digital Preservation

Digital preservation is da bomb.

It's an initiative at the Library of Congress. It's got a Wikipedia article, albeit a contested one at the time I'm typing this. It's got a library foundation. In my professional life in IT at UC Berkeley, digital preservation is a major concern. At UCB's School of Information, faculty I respect regularly give lectures, write papers, and teach courses on the topic. Then there was last August's forum on the hot issue of both last year and this one, The Google Books Settlement and the Future of Information Access (that's me on the far side of the auditorium ... well, never mind, I can't see me either).

There's no question that in these times an enormous fraction of cultural activity is produced and disseminated electronically. In many cases, life-on-the-internet is the only life a cultural artifact has. Take these blog posts, for example. (Or, as Rodney Dangerfield might have begged, take these blog posts, please.) Where will they go when blogspot goes bye-bye? Will the Internet Archive be enough to save the world from oblivion? (Hint: the answer has two letters.)

Far more important -- circling back to turning books into Unicode -- digitization, digital preservation, and digital access are the only hope that millions of out-of-print, traditionally published texts buried in obscure library stacks have of ever being read. Digitization or bust. And then there are the scholars of materials whose originals are scattered across the world, in archives that can't accommodate even those researchers able to scrape together the funding to visit. For these researchers, digitization is the only hope of assembling a corpus of material suitable to their inquiries. So digital preservation probably does matter, and it may even matter a great deal.

My inner skeptic, however, is skeptical.

It's worth considering, I think, that the technologies by which human-created information has been preserved for the longest period of time are cave painting and clay tablets. Never mind that scholars of cuneiform texts are deeply reliant on projects like the Cuneiform Digital Library housed at UCLA, Penn, and elsewhere (because for them, too, complete corpora are widely dispersed). Seriously, though, there's really no contest. From 3350 BCE to 2010 CE: that's more than five thousand years of track record for preservation of human-written information. Hard to top that. And let's not forget the durability of stone inscriptions and stamped metal coins. Vellum, papyrus, paper ... it's all got better stats than any manufactured medium that preserves bits and bytes.

Last month I spent a bunch of hours rescuing the contents of hundreds of my own personal zip disks, 3.5" diskettes, and -- yes, wait for it -- 5.25" floppies. This mass of obsolete technology contained a quarter century's worth of personal data, timestamped from the mid-80s to just last year. My household is about to recycle our last computer that has the right motherboard connector to accommodate a cable for the antique 5.25" drive I've been saving in a closet for just this purpose. It was now or never, really. The technology to read some of my old backup materials is on the point of disappearing. The lot -- all 25 years' worth -- takes up 800MB, uncompressed. Do they even make flash drives that small anymore? I wrote the whole mess to a single mini-DVD. One shiny little 8cm disc. Imagine the drawer space I've freed up!!

But how many people put in the time and effort to save their digital ephemera? And just because I've saved mine -- this year -- doesn't mean that my DVD won't go bad, or that DVD technology won't go the way of Betamax video before another couple of decades elapse (it will). Just because Google now allows you to "upload any file!" doesn't mean they'll never go out of business.

Hoping it's not bad form to refer to the same issue of The New Yorker twice in the same week, I have to comment here on one of my favorite magazine covers ever. If you ask me, there are real-world ideas in this speculative cover art (June 8-15, 2009). Come the day when the electricity stops, people (and aliens) sitting on the ruins of our electronica will still be able to lean against an old, crumbling wall, open a fine old book to its first paper page, and have a delicious, satisfying read.

So here's a question, circling back to the Great E-books Debate I've been mulling over in prior posts: if an author wants to see her own words last, not to mention the rest of literary culture, is digital publication -- the e-books thing as a sole channel for dissemination and preservation -- a truly viable option? Or is it a setup for near-certain, worldwide, catastrophic failure?


  1. It's a tough balance, too, when it comes to digitizing old things. On one hand, having the stuff in a format where people can access it is a great thing. But, realistically, even today's best efforts to digitize "once and for all" probably won't make it 15 years before people are clamoring to do it again with better technology. And then the folks in charge of preservation have to decide whether the materials can handle another round of it.

    In that respect, digital files have it easier -- you can convert the format of a document as many times as you want without damaging the original (so long as you remember to "save as"). Conversion may introduce errors, but it's hard to argue that the occasional bad shadow creeping into the text area of a scanned, tightly bound book isn't "corrupted data" too.

    As for the e-book question, I think the biggest factor in having your work last "for the ages" is getting it into the hands of people who are serious about preservation. If e-books become a major medium of distribution that major libraries adopt, they'll inevitably also hire a team of digital preservationists to watch over the materials, shepherd them into the next format, keep backups of backups, etc. An e-book under that kind of watchful eye has a better chance of surviving than a printed book that never makes it into libraries that are serious about preservation, or -- if money is tight -- doesn't make it onto the "be sure to preserve" list.

  2. You might be on to something with rocks...

  3. That's a great link, Quinn, thanks. Always looking for the cloud obscuring the silver lining, I took away this quote: Crucially, the nature of these digital markings will be determined by a universal agreement on a common storage language that will hopefully last thousands of years. That is yet to come [...].

    The larger point, of course, is that digital preservation is a real problem in the long term ... meaning one that has no solution (on a paper- or rock-durability scale). Yet.

  4. Yeah, I actually chuckled a bit when I saw that line. I imagine the task of defining and maintaining a standard like this might well make the technology piece look easy by comparison.

  5. Perhaps in the future, we'll have another medium that combines digital and physical aspects to make data storage more permanent and long-lasting. But in the near future, data management services will have a bigger role to play in preserving our data, both physical and digital. Most people believe they can secure their digital information just by keeping a backup of their files, and that they can access it anywhere as long as they have a computer. But as we know, digital media don't last forever. There has to be another way to secure your data, and data management services provide better solutions for data protection.

    Williams Data Management