Digital Repositories: A Global Work Effort

A brief interview with Michael Keller

Stanford University librarian Michael Keller will join other leading digital archiving experts November 14-16 in Paris for the inaugural meeting of the Sun Preservation and Archiving Special Interest Group, a group dedicated to working on the unique problems of storage and data management, workflow, and architecture for very large digital repositories. The Sun PASIG brings together a large group of organizations for an ongoing global discussion of their research and sharing of best practices for preservation and archiving. Here, CT asks Keller for his perspectives on the effort and the goals of the Sun PASIG.

 

CT: What has the Sun PASIG set out to accomplish? Could you comment on its work from the perspective of your own institution’s needs?

Keller: More than 10 years ago we in the library profession began to realize that we had to take responsibility for preserving—both for the long term and for access—the digital objects that were coming to us in increasing waves and increasing numbers of flows from varying sources.

Over those 10 years a lot of developments occurred and a lot of projects started, but none of them were particularly large-scale—at least the ones that anybody can talk about. We know that the government and the secret agencies are doing a lot of big-scale gathering, but we don’t know whether they are preserving anything. [So, we need] technology both in the form of software and hardware that can manage across very complex hardware arrays, but also ingest across a very wide variety of data formats and what we might call digital genres.

At Stanford, we recognized about five or six years ago that the university was producing on the order of 40 terabytes per year, and consuming on the order of 40 terabytes per year, of various kinds of digital information. And of course that number has only increased in the intervening five years. So starting four years ago we acquired a big tape robot and some spinning disk, which were intended to help us understand how to manage the huge flow of digital objects that we need to ingest and preserve…

What was missing was a very effective spinning disk technology. We’d experimented with a few of them, and frankly, for various reasons there were points of failure that revealed themselves in operation. In the case of Honeycomb, we’ve tested it very extensively. We’ve subjected it to all the same stress tests, and the same experiences as other technologies and it’s proven to be quite robust.

That said, we know that we have to have a combination of magnetic disk technology, near-line tape storage, and off-line tape storage—which we will have to carefully manage for the very long term, until we see different technologies becoming available to us or different technologies becoming more robust and more appropriate to the various missions we have set for ourselves.

At an initial meeting of the preservation and archive SIG, this past June, we had a dozen institutions, such as the British Library, Oxford University, the Bibliothèque nationale de France, Johns Hopkins, and the Swedish National Library. And what we decided as the focus for this group was to set it up as a kind of instant peer review group. That meeting was focused on business drivers and architecture… The [meeting in November] is going to be larger in terms of numbers of institutions and people, and its concerns will be expanded to workflows, policy, and use cases. We will still spend time on the architecture specifications, design specifications, and software and hardware choices, but we intend to broaden the conversation… [There are] very serious issues for what will be very, very large digital archives…

And previous developments, things like DSpace and Fedora, are providing us with evidence that if we work hard in the initiating institutions and produce some experiences that we will talk about in our professional publications and meetings, we may end up in the next generation in a place where institutions without the same kind of IT prowess, without the kind of great IT support in the form of programmers, database analysts, systems administrators, and managers, may be able to have equally large digital archives running without having to make that initial big investment.

About the Author

Mary Grush is Editor and Conference Program Director, Campus Technology.
