Where Did All the Books Go?

In 2000, an ad for Qwest Communications showed a weary traveler arriving at an out-of-the-way motel. He asks if they have movies in the rooms. A bored motel clerk, barely lifting her eyes from the pages of a paperback book, captures the dream of the modern communications industry: "We have every movie ever made, in every language, any time, night or day," she deadpans to the startled guest.

Qwest is not there yet. But it is not alone. Joining it are members of several digital federations working on an interrelated set of objectives: creating, preserving, sharing, and providing access to digital content in a cost-effective manner.

For librarians, bandwidth is only one of the challenges. Members of the Internet2 project (www.Internet2.org) are working with librarians to address data rates. Other problems concern information that is licensed rather than owned: librarians want assurance that after a license expires they will still have access to archives of previously available material.

It Began with Gutenberg
Many readers are familiar with some of the early projects designed to open access to content via the World Wide Web. Notable is Project Gutenberg (PG), www.gutenberg.org, begun by Michael Hart when he received a grant of computer time at the University of Illinois in 1971. He decided that the greatest value created by computers would not be computing itself, but the storage, retrieval, and searching of what was stored in our libraries.

From the first document he typed in, the "Declaration of Independence," the project has grown to 6,267 full-text electronic books available for free distribution. PG has been successful at attracting worthwhile content and a loyal following. At the current rate of growth, it's been projected that PG's 10,000th book will be added sometime in 2003.

Project Gutenberg has long attracted gifted programmers and visionaries to its cause. In 1993, World Wide Web inventor Tim Berners-Lee pondered how best to get PG's ASCII wares out to the public. Others have written scripts to better handle proofing issues, formatting, FTPing, etc. Recently, Charles Franks has applied technology in a manner that completely redefines the editing process. He has enlisted the help of a loyal army of volunteer proofreaders—one page at a time. Dubbed "Distributed Proofreading in Support of Project Gutenberg" (http://texts01.archive.org/dp/), this project expects to be a major contributor to Hart's effort to have more than one million books available in the Project Gutenberg archive. They post the daily upload count on their Web site and have recently exceeded 5,000 pages proofread and uploaded in one day.

Hart has an ambitious goal. It may seem impossible, but with people like Franks on his side, and a new means by which anybody can help create eBooks by producing even one page per day, it could well happen in the PG founder's lifetime. Fortunately, there are more than 50 mirror sites around the world that will be able to share the load as added content will result in additional searches and downloads.

Sources: "Distributed Proofing site g'es through the roof" by David Moynihan.
http://promo.net/pg/Proof_Article_Nov2002.html; originally posted at www.theinquirer.net/?article=6167 Nov. 11, 2002.

If You Build it, Will They Come?
With a growing number of sites placing e-content online, John Mark Ockerbloom, a graduate student at Carnegie Mellon University, realized that a key bottleneck to widespread use of the content was the lack of a central "card catalog" that would allow people to find it. He conceived The Online Books Page (OBP), http://onlinebooks.library.upenn.edu/, as a gateway that would not merely be a place where people could get content, but would also be a guide to the vast number of other sites providing similar services.

Now housed at the University of Pennsylvania, the OBP includes links to more than 18,000 books, more than 110 serials, and 300-plus archives of material. The archives are large, general-purpose collections with substantial English-language listings, topical collections, and foreign-language collections.

"We are now at the point where so much information is ‘born digital' that it makes sense the libraries provide a way to easily archive and search this information," says Ockerbloom, now a digital architect at the University of Pennsylvania. He recognizes that the issues are many and beyond the scope of any one library to tackle by itself.

Beyond Books and Access
The work of projects such as PG and the OBP is only part of the growing digital library movement. Its leaders want not only to digitize books and serials, but also to make sure information is preserved, easily accessed, and assembled into topical collections across domains. However, much work remains if library systems are to talk with each other and meet the needs of teachers, scholars, and publishers. With goals and challenges like these, it is not surprising that the movement has spread beyond the early adopters to groups of libraries and colleges working collaboratively. The need for a federated approach is understandable when one realizes that the objective is sharing library holdings, as well as the costs of converting, preserving, and providing access.

One of the groups leading the way to the "library of the future" is the Digital Library Federation (DLF). The DLF is a consortium of libraries and agencies that are pioneering the use of electronic information technologies to extend their collections and services. Funded in large part by the government and foundations, the DLF operates through its members to provide leadership for libraries by identifying standards and best practices for digital collections and network access. The organization also helps coordinate research and development in libraries' use of information technology and helps start projects and services that libraries need but cannot develop individually.

Members include the Library of Congress and the National Archives along with 25 university libraries across the country. Other members are the New York Public Library and the California Digital Library. The goal of the DLF is to transform library services and the way libraries handle their patrons, their content providers, and each other.

As the DLF notes on its Web site (http://www.diglib.org/collections.htm): "In the digital library, collections are transformed through the integration of new formats, licensed (as opposed to owned) content, and third-party information over which the library has little or no direct curatorial control. Collection strategies and practices are not yet fully developed to take account of these changing circumstances, nor are their legal, organizational, and business implications fully understood."

As an academic federation, the DLF promotes strategies for developing sustainable, scalable digital collections and encourages the development of new collections and collection services. Although it is fairly young, the federation and its members have already produced a number of digital library architectures, systems, and tools that are being used by its members and affiliates. The synergy is already producing enhancements to members' digital collections that can be used by the public.

Making Rare Books Accessible
One of the major goals of the DLF is to make available the rare book and artifact material that previously could be accessed only under restricted-use conditions. One has only to think of the rare book room in most university libraries to remember the books held in climate-controlled conditions that had to be read while wearing gloves.

But with the advent of scanning technology, the Internet, and improved indexing, this material can now be widely distributed. For example, the Digital Imaging and Media Technology Initiative at the University of Illinois (http://images.library.uiuc.edu/) was a joint project between the university and Follett Corp. to digitize historical maps of the state of Illinois. Though containing just 84 records, the collection shows North America, the Northwest Territory, the state of Illinois, and counties and townships within the state. The maps range in time from 1650 to 1994. A thumbnail image and a full indexing record are included for each image, and the thumbnail can be enlarged for better viewing. All the maps used in the project are from the Map and Geography Library and the Rare Book and Special Collections Library at the University of Illinois at Urbana-Champaign, surely not a place to which many people have access. However, thanks to digital technology, we all do.

It is one thing to digitize text and images and make them "public" when they sit as HTML pages that can be easily seen by standard search engines. It is another to find content that is housed in databases and formed into pages only when retrieved. The larger issue is how to find material when a "collection" might exist in multiple databases housed in physically separate facilities. One goal of the DLF is to develop a set of metadata standards that can be used across databases and facilities.

An implementation of this approach by the University of Michigan, called OAIster (http://oaister.umdl.umich.edu/o/oaister/), uses the Open Archives Initiative Protocol for Metadata Harvesting (www.openarchives.org/OAI/2.0/openarchivesprotocol.htm) to create a collection of "freely available, difficult-to-access, academically oriented digital resources." It currently searches more than one million records from 144 institutions.

One goal of the project was to be able to find items that are "hidden" from view. The project finds that: "Digital resources are often hidden from the public because a Web search using a search engine like Google or AltaVista won't be picking up information about these resources. Robots in use by such search services don't delve into the CGI [Common Gateway Interface] that ports this resource information to the Web. Consequently, these resources are generally accessible only to those who know to look in digital library repositories, generally at universities who are developing the collections in these repositories."
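To make this concrete, the sketch below shows how a harvester like OAIster consumes an OAI-PMH repository's records. It is a minimal illustration in Python, not OAIster's actual code: the sample XML fragment, the repository URL in the comment, and the record identifier are all invented for the example, though the element names and namespaces follow the OAI-PMH 2.0 and Dublin Core conventions. A real harvester would fetch this XML over HTTP and loop through `resumptionToken`s to page through large collections.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative fragment of an OAI-PMH ListRecords response.
# A real harvester would retrieve this over HTTP from a repository's base
# URL, e.g. http://example.edu/oai?verb=ListRecords&metadataPrefix=oai_dc
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.edu:map-1650</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Map of the Northwest Territory, 1650</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

# XML namespaces used by OAI-PMH envelopes and Dublin Core metadata.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest_titles(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iter(OAI + "record"):
        ident = record.find(OAI + "header/" + OAI + "identifier").text
        title = record.find(".//" + DC + "title").text
        results.append((ident, title))
    return results

print(harvest_titles(SAMPLE_RESPONSE))
```

Because every compliant repository exposes the same verbs and envelope structure, one small parser like this can aggregate metadata from dozens of institutions into a single searchable index, which is exactly the "union catalog" effect OAIster exploits.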

A New Publishing Model
Another issue facing the DLF is the authorship, peer review, and commercialization of content. Most academic content originates in the mind of a professor and goes to a computer screen. It then moves to peer review among other professors and is finally published in an academic journal, which is then sold back to the academy in the form of individual or institutional subscriptions. Academics are asking, "Who needs publishers if we can self-publish and distribute ourselves?" The thought is, if the content starts out on computer screens owned by the colleges and gets consumed on computer screens owned by the colleges, what value does a "publisher" add?

One project that will provide some muscle for librarians as they deal with the publishing industry is the DSpace repository system developed by MIT and Hewlett-Packard Co. DSpace (www.dspace.org) is an open source software platform that enables institutions to capture and describe digital works using a submission workflow module. It also distributes an institution's digital works over the Internet through a search and retrieval system and preserves digital works over the long term.

According to MacKenzie Smith, project director of DSpace, the project will allow faculty members to pool findings and share everything from articles, technical reports, conference papers, data sets, and databases to media clips, visual aids, and simulations used in class. Moreover, she says, the project will "transform the way in which content is made available. MIT libraries seek to make significant progress in the development of scholarly communication and the scholarly record."

Already proven at MIT, DSpace has formed its own federation and is providing the open source software to others. As a critical mass of professors create and post content that can be easily retrieved by their colleagues, the academy will keep looking at the possibility of disintermediating the publisher from the process. The transformation of the archives is underway; the stacks will never look the same.
