1/1/2008
DIGITAL REPOSITORIES: A GLOBAL WORK EFFORT
Stanford's Keller weighs in on petabyte-scale digital object storage systems.
Stanford University (CA) Librarian Michael Keller was among the leading digital archiving experts who headed to Paris this past November for the inaugural meeting of the Sun Preservation and Archiving Special Interest Group (Sun PASIG), a Sun Microsystems-sponsored community dedicated to working on the unique problems of storage and data management, workflow, and architecture for very large digital repositories.
Sun PASIG brings together a large group of organizations for an ongoing global discussion of their research and to share best practices for preservation and archiving. Here, CT asks Keller for his perspectives on the effort, and on Sun PASIG's overall goals.
What sparked your professional interest in the work of Sun PASIG?
More than 10 years ago, we in the library profession began to realize that we had to take responsibility for preserving, both for the long term and for access, the digital objects that were coming to us in increasing waves and numbers of flows, from varying sources. Over those 10 years, a lot of developments took place and a lot of projects were launched, but none of them were particularly large-scale, at least the ones that anybody can talk about. We know that the government and secret agencies are doing a lot of big-scale gathering, but we don't know whether they are preserving anything. So, we need both software and hardware technology that can [work well] across very complex hardware arrays, but can also ingest across a very wide variety of data formats and what we might call "digital genres."
What is your own institution's perspective on that need?
At Stanford, we recognized about five or six years ago that the university was producing various kinds of digital information on the order of 40 terabytes per year, as well as consuming information on the order of 40 terabytes per year. And, of course, that number has only increased in the intervening five years. Within those years, Stanford also signed on for the Google Book Search project, which, if our original ambitions are realized, will initially yield something on the order of a petabyte-and-a-half of digital information, for an initial database at Stanford of the books sent forward for Google to digitize. And that would be the first copy of the material, before we do anything to it. So, with those kinds of numbers floating around, we realized that we had to have a comprehensive solution to the problem of preservation of bits and bytes, the problem of access to copies of those files [for redundancy], and the problem of ingesting at a very, very high level in order to get the digital goods into the digital repository.