Wednesday, August 5, 2009


(or, a brief history of unarchivable document formats)

What were you doing 20 years ago? Did you work in an office? Was much of your job centered around creating or editing documents?

Let's assume it was. You may have been using a PC with an Intel 386 CPU.

This likely ran MS-DOS, possibly version 4.0. Your word processor of choice might have been WordPerfect 5.0, and Lotus 1-2-3 may have been how you kept your books balanced.

Although modern office software (e.g. Microsoft Office 2007) can import many (but not all) of the documents made by the above two programs, there is very little assurance that they will look the same as they did 20 years ago. In another 20 years, who knows if the word processor you use will even be able to do that?

(WK3 files made by Lotus 1-2-3 cannot be opened by Excel 2007)

This might seem new, but we've been rushing ahead with new document technologies for many years. Cheap books printed on acidic paper (common between 1850 and 1950) have lifespans measured in decades unless they are actively maintained. Many different formats for creating and viewing microforms were created during the 20th century, and most are no longer in use. Migrating from one document format to another has in many cases been expensive and time-consuming. As slowing or stopping the tide of new applications seems unlikely, stable standards for document archival are needed to ensure that you can find and accurately display your documents in the future.

To whom does this matter? It matters to any person or organization wishing to retain memories of their past. Will a term paper you wrote in college become relevant years after it was written? Will you want to read a birthday card you made or a love letter you wrote? Will a contract you've signed matter twenty years hence?

The problem is that we don't know what will become history until it does, we don't know the sentimental value of something until we've grown older, and we don't know who can benefit from our work until they've read it.

The solution to this, ideally, is to store as many documents as is feasible, ensure that the documents are self-contained, make them easily searchable and use an open standard (like PDF) to create them. After that, make regular backups (both onsite and offsite), and you will be far better prepared for the future.

(PDF/A documents will display the same way for decades, regardless of OS)

Obviously, this is easier said than done. But archiving your documents as PDF/A-1b compliant files is a good first step.
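
If you already have a pile of PDFs, a quick way to see which ones even claim to be PDF/A is to look for the identification fields in their embedded XMP metadata. The sketch below is only a rough illustration, not a validator: it assumes the metadata is stored uncompressed (which I understand PDF/A-1 requires) and simply scans the raw bytes for the `pdfaid:part` and `pdfaid:conformance` fields. The function name `claimed_pdfa_level` is made up for this example.

```python
import re

# The XMP fields may appear as elements (<pdfaid:part>1</pdfaid:part>)
# or as attributes (pdfaid:part="1"), so both forms are matched.
_PART = re.compile(rb'pdfaid:part(?:>|=["\'])\s*(\d)')
_CONF = re.compile(rb'pdfaid:conformance(?:>|=["\'])\s*([A-Za-z])')


def claimed_pdfa_level(data: bytes):
    """Return a string such as '1b' if the bytes declare a PDF/A level, else None.

    This reports only what the file claims about itself; it does not
    check that the file actually conforms to the standard.
    """
    part = _PART.search(data)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conf = _CONF.search(data)
    if conf is not None:
        level += conf.group(1).decode("ascii").lower()
    return level
```

To try it on a real file, read the file in binary mode and pass the bytes in, for example `claimed_pdfa_level(open("term-paper.pdf", "rb").read())` (the file name is a placeholder). A result of `None` means no claim was found; a result like `'1b'` still needs confirming with a dedicated PDF/A validator before you rely on it.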