

CT Visionary :: Michael Keller

1/1/2008

DIGITAL REPOSITORIES: A GLOBAL WORK EFFORT

Stanford's Keller weighs in on petabyte-scale digital object storage systems.

Stanford University (CA) Librarian Michael Keller was among the leading digital archiving experts who headed to Paris this past November for the inaugural meeting of the Sun Preservation and Archiving Special Interest Group, a Sun Microsystems-sponsored community dedicated to working on the unique problems of storage and data management, workflow, and architecture for very large digital repositories. Sun PASIG brings together a large group of organizations for an ongoing global discussion of their research and to share best practices for preservation and archiving. Here, CT asks Keller for his perspectives on the effort, and on Sun PASIG's overall goals.

What sparked your professional interest in the work of Sun PASIG? More than 10 years ago, we in the library profession began to realize that we had to take responsibility for preserving, both for the long term and for access, the digital objects that were coming to us in ever-larger waves and from a growing number of sources. Over those 10 years, a lot of developments took place and a lot of projects launched, but none of them were particularly large-scale, at least among the ones that anybody can talk about. We know that the government and secret agencies are doing a lot of big-scale gathering, but we don't know whether they are preserving anything. So, we need both software and hardware technology that can [work well] across very complex hardware arrays, but can also ingest across a very wide variety of data formats and what we might call "digital genres."

What is your own institution's perspective on that need? At Stanford, we recognized about five or six years ago that the university was producing various kinds of digital information on the order of 40 terabytes per year, as well as consuming information on the order of 40 terabytes per year. And, of course, that number has only increased in the intervening five years. Within those years, Stanford also signed on for the Google Book Search project, which, if our original ambitions are realized, will initially yield something on the order of a petabyte-and-a-half of digital information, for an initial database at Stanford of the books sent forward for Google to digitize. And that would be the first copy of the material, before we do anything to it. So, with those kinds of numbers floating around, we realized that we had to have a comprehensive solution to the problem of preservation of bits and bytes, the problem of access to copies of those files [for redundancy], and the problem of ingesting at a very, very high level in order to get the digital goods into the digital repository.


