Posted by: stackedfivehigh | October 23, 2011

More than alphabet soup

I decided to go for a little bit of overkill – an 8-hour workshop on electronic records management, and then the majority of shorter sessions on the same subject. First session – “File Formats: More than Alphabet Soup?”

The first speaker discussed the options an archives has for preserving digital records: the pros and cons of open source vs. proprietary formats (often cost/support/longevity comparisons), how to determine whether a format is preservation-quality (is it stable? will the company that made it disappear? is it uncompressed and unencrypted?), and the benefits of PDF/A-1, level A (not to be confused with level B). Results? Do the best you can; there aren’t a lot of standards for preservation formats.

The second speaker talked about the problems she has had digitizing files at the Smithsonian Institution. One of the most common: the font the Smithsonian uses in its letterhead/logo will not embed in a PDF/A, so there is an ongoing issue with file normalization. Two other problems involve file type: misnamed files (a BMP with the extension .JPG, for instance) and files with no extension at all, or an entirely obsolete one. The speaker said they’ve recently accessioned files with extensions like .ltr, .env, etc.
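The misnamed-file problem (a BMP carrying a .JPG extension, or no extension at all) is the kind of thing you can screen for by looking at the first few bytes of a file instead of trusting its name. Below is a minimal sketch of that idea in Python. It is my own illustration, not anything the speaker showed: the function names `sniff_format` and `check_file` are made up, and the signature table covers only a handful of formats. A real workflow would lean on a proper format identification tool with a maintained signature registry.

```python
from pathlib import Path

# Leading "magic" bytes for a few common formats. This table is deliberately
# tiny and illustrative; note that "BM" is only two bytes, so a text file that
# happens to start with "BM" would be misread as a bitmap.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpg",
    b"%PDF-": "pdf",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
}

# Extensions treated as the same format (hypothetical, extend as needed).
ALIASES = {"jpeg": "jpg"}


def sniff_format(header: bytes):
    """Return a format name guessed from the leading bytes, or None."""
    for magic, fmt in SIGNATURES.items():
        if header.startswith(magic):
            return fmt
    return None


def check_file(path):
    """Compare the extension a file claims with what its bytes suggest."""
    path = Path(path)
    with path.open("rb") as fh:
        header = fh.read(16)

    detected = sniff_format(header)
    ext = path.suffix.lower().lstrip(".")
    claimed = ALIASES.get(ext, ext) or None  # empty extension becomes None

    if detected is None:
        status = "unknown"            # not in our small table
    elif claimed is None:
        status = "missing-extension"  # bytes recognized, but no extension
    elif claimed != detected:
        status = "mismatch"           # e.g. a BMP named scan.JPG
    else:
        status = "ok"
    return {"claimed": claimed, "detected": detected, "status": status}
```

Running `check_file` over an accession and collecting everything that is not "ok" gives a quick worklist. Obsolete extensions like .ltr or .env would simply land in "unknown" here, which is a fair reflection of how much a byte check alone can tell you about them.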

The third speaker discussed file formats as well as the scale of digitization (exabytes of data) and the sustainability of preservation formats. One of the problems reiterated throughout the session was that there is no set of steps you can take to preserve digital files after which you’re done and can walk away. This speaker differed from the others in accepting lossy compression as a real option. His argument is that it is not practical to avoid compression, and at times lossy compression (so long as the quality of the record is not impacted) may be the only real solution due to space considerations. We were reminded that a loss in data quality is far more likely to result from operator error, internal/external attacks, or organizational/economic failure than from a file having been compressed in a lossy format.

The session provided some realistic considerations for preservation-level digitization, how to make the best preservation plan for your organization, and where to look in the future for the next formats. I personally would love to see an internationally supported open source standard that would allow a greater range of embedded data than the PDF/A-1, but the PDF/A-1 standard is a respectable option at this time for many record types.

Also – a useful reference is the Federal Agencies Digitization Guidelines Initiative, which has some great information about different record types and format options for preservation.

