Where Did All the Books Go?
In the year 2000, an ad for Qwest Communications featured a weary traveler arriving
at an out-of-the-way motel. He asks if they have movies in the rooms. A bored
motel clerk, barely lifting her eyes from a paperback book, captures the dream
of the modern communications industry: "We have every movie ever made, in every
language, any time, night or day," she deadpans to the startled guest.
Qwest is not there yet. But it is not working alone. Joining Qwest are members
of several digital federations pursuing an interrelated set of objectives:
creating, preserving, sharing, and providing access to digital content in a
cost-effective manner.
For librarians, bandwidth is only one of the challenges. Members of the
Internet2 (www.Internet2.org) project are working with librarians to address
data-rate issues. Other problems concern information that is licensed rather
than owned: librarians want assurance that, after a license expires, archives
of previously available material will remain accessible.
It Began with Gutenberg
Many readers are familiar with some of the early projects designed to open access
to content via the World Wide Web. Notable among them is Project Gutenberg (PG), www.gutenberg.org,
begun by Michael Hart when he received a grant of computer time at the University
of Illinois in 1971. He decided that the greatest value created by computers
would not be computing itself, but the storage, retrieval, and searching of
what was stored in our libraries.
From the first document he typed in, the "Declaration of Independence," the
project has grown to more than 6,267 full-text electronic books available for free
distribution. PG has been successful at attracting worthwhile content and a
loyal following. At the current rate of growth, PG's 10,000th book is projected
to be added sometime in 2003.
Project Gutenberg has long attracted gifted programmers and visionaries
to its cause. In 1993, World Wide Web inventor Tim Berners-Lee pondered how best
to get PG's ASCII wares out to the public. Others have written scripts to better
handle proofing, formatting, FTPing, and the like. Recently, Charles Franks has
applied technology in a manner that completely redefines the proofreading process:
he has enlisted a loyal army of volunteer proofreaders to help in the effort,
one page at a time. Dubbed "Distributed Proofreading in Support of Project
Gutenberg" (http://texts01.archive.org/dp/), the project expects to be a major
contributor to Hart's effort to place more than one million books in the Project
Gutenberg archive. The team posts its daily upload count on the Web site and has
recently exceeded 5,000 pages proofread and uploaded in a single day.
Hart's goal is ambitious. It may seem impossible, but with people like Franks
on his side, and a new means by which anybody can help create eBooks by producing
even one page per day, it could well happen in the PG founder's lifetime. Fortunately,
more than 50 mirror sites around the world will be able to share the load as
the added content brings additional searches and downloads.
Sources:
"Distributed Proofing Site Goes Through the Roof" by David Moynihan,
http://promo.net/pg/Proof_Article_Nov2002.html; originally posted at www.theinquirer.net/?article=6167, Nov. 11, 2002.
If You Build It, Will They Come?
With a growing number of sites placing e-content online, John Mark Ockerbloom,
then a graduate student at Carnegie Mellon University, realized that a key bottleneck
to widespread use was the lack of a central "card catalog" that would let people
find the content. He conceived the Online Books Page (OBP),
http://onlinebooks.library.upenn.edu/, as a gateway that would not merely be a
place where people could get content, but would also be a guide to the vast
number of other sites providing similar services.
Now housed at the University of Pennsylvania, the OBP includes links to more
than 18,000 books, more than 110 serials, and 300-plus archives of material.
The archives range from large, general-purpose collections with substantial
English-language listings to topical and foreign-language collections.
"We are now at the point where so much information is ‘born digital' that it
makes sense the libraries provide a way to easily archive and search this information,"
says Ockerbloom, now a digital architect at the University of Pennsylvania.
He recognizes that the issues are many and beyond the scope of any one library
to tackle by itself.
Beyond Books and Access
The work of projects such as PG and the OBP is only one part of the growing
digital library movement. Leaders of the movement want not only to digitize
books and serials, but also to make sure information is preserved, easily
accessed, and assembled into topical collections across domains. Much work
remains, however, if library systems are to talk with each other and meet the
needs of teachers, scholars, and publishers. With goals and challenges like
these, it is not surprising that the movement has spread beyond the early
adopters to groups of libraries and colleges working collaboratively. The need
for a federated approach is understandable once one realizes that the objective
is to share not just library holdings, but also the costs of converting,
preserving, and providing access to them.
One of the groups leading the way to the "library of the future" is the Digital
Library Federation (DLF). The DLF is a consortium of libraries and agencies
that are pioneering the use of electronic information technologies to extend
their collections and services. Funded in large part by the government and foundations,
the DLF operates through its members to provide leadership for libraries by
identifying standards and best practices for digital collections and network
access. The organization also helps coordinate research and development in libraries'
use of information technology and helps start projects and services that libraries
need but cannot develop individually.
Members include the Library of Congress and the National Archives, along with
25 university libraries across the country, the New York Public Library, and
the California Digital Library. The goal of the DLF is to transform library
services and the way libraries deal with their patrons, their content providers,
and each other.
As the DLF notes on its Web site (http://www.diglib.org/collections.htm):
"In the digital library, collections are transformed through the integration
of new formats, licensed (as opposed to owned) content, and third-party information
over which the library has little or no direct curatorial control. Collection
strategies and practices are not yet fully developed to take account of these
changing circumstances, nor are their legal, organizational, and business implications
fully understood."
As an academic federation, the DLF promotes strategies for developing sustainable,
scalable digital collections and encourages the development of new collections
and collection services. Although it is fairly young, the federation and its
members have already produced a number of digital library architectures, systems,
and tools that are in use by members and affiliates. The synergy is already
producing enhancements to members' digital collections that the public can use.
Making Rare Books Accessible
One of the major goals of the DLF is to make available the rare book and artifact
material that can only be accessed under restricted-use conditions. One has
only to think of the rare book room in most university libraries to remember
the books held in climate-controlled conditions that had to be read while wearing
gloves.
But with the advent of scanning technology, the Internet, and improved indexing,
this material can now be widely distributed. For example, the Digital Imaging
and Media Technology Initiative at the University of Illinois (http://images.library.uiuc.edu/)
was a joint project between the university and Follett Corp. to digitize historical
maps of the state of Illinois. Though it contains just 84 records, the collection
covers North America, the Northwest Territory, the state of Illinois, and counties
and townships within the state, with maps ranging in date from 1650 to 1994. Each
image includes a thumbnail, which can be enlarged for better viewing, and a full
indexing record. All the maps in the project come from the Map and Geography
Library and the Rare Book and Special Collections Library at the University of
Illinois at Urbana-Champaign, surely not places to which many people have access.
Thanks to digital technology, however, we all do.
It is one thing to digitize text and images and make them "public" as HTML
pages that standard search engines can easily see. It is another to expose
content that is housed in databases and assembled into pages only when retrieved.
The larger issue is how to find material when a "collection" may exist in
multiple databases housed in physically separate facilities. One goal of the
DLF is to develop a set of metadata that can be used across databases and
facilities.
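In practice, the lowest common denominator for such cross-database metadata is
unqualified Dublin Core, a set of fifteen generic descriptive elements (title,
creator, date, and so on) into which repositories map their local records. Here
is a minimal sketch of one such record in Python; the item and all of its values
are invented for illustration:

# A Dublin Core record reduced to its generic elements. The element names
# are the standard DC set; the values below are invented for illustration.
record = {
    "title": "Map of the Northwest Territory",   # hypothetical item
    "creator": "Unknown",
    "date": "1787",
    "type": "Image",
    "format": "image/jpeg",
    "language": "en",
    "identifier": "oai:example.edu:maps/42",     # invented OAI identifier
}

Because every participating repository can fill in these same few fields, a
central service can search across collections without knowing anything about
each database's internal schema.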
An implementation of this approach by the University of Michigan, called OAIster
(http://oaister.umdl.umich.edu/o/oaister/), uses the Open Archives Initiative
Protocol for Metadata Harvesting, or OAI-PMH (www.openarchives.org/OAI/2.0/openarchivesprotocol.htm),
to create a collection of "freely available, difficult-to-access, academically
oriented digital resources." It currently searches more than one million records
from 144 institutions.
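The protocol itself is deliberately simple: a harvester issues plain HTTP GET
requests carrying a "verb" parameter and parses the XML that comes back. The
following sketch, written against OAI-PMH 2.0 using only Python's standard
library, is illustrative rather than OAIster's actual code; the base_url it
takes is whatever OAI endpoint a repository publishes:

# A minimal OAI-PMH harvesting sketch. It walks a repository's records with
# the ListRecords verb and follows resumption tokens, which is how the
# protocol paginates large result sets.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"   # protocol namespace
DC = "{http://purl.org/dc/elements/1.1/}"        # Dublin Core namespace

def harvest(base_url, metadata_prefix="oai_dc"):
    """Yield (identifier, title) pairs for every record in the repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            tree = ET.parse(response)
        for record in tree.iter(OAI + "record"):
            header = record.find(OAI + "header")
            identifier = header.findtext(OAI + "identifier")
            # Deleted records carry a header but no metadata, so the
            # title comes back as None for them.
            title = record.findtext(".//" + DC + "title")
            yield identifier, title
        # A non-empty resumptionToken means more pages remain.
        token = tree.findtext(".//" + OAI + "resumptionToken")
        if not token:
            break
        # Per the protocol, follow-up requests carry only the verb and token.
        params = {"verb": "ListRecords", "resumptionToken": token}

A service like OAIster runs this kind of harvest against many repositories,
then indexes the collected records centrally, so users search one catalog
rather than 144 separate sites.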
One goal of the project was to find items that are "hidden" from view. As the
project explains: "Digital resources are often hidden from the public
because a Web search using a search engine like Google or AltaVista won't be
picking up information about these resources. Robots in use by such search services
don't delve into the CGI [Common Gateway Interface] that ports this resource
information to the Web. Consequently, these resources are generally accessible
only to those who know to look in digital library repositories, generally at
universities who are developing the collections in these repositories."
A New Publishing Model
Another issue facing the DLF is the authorship, peer review, and commercialization
of content. Most academic content originates in the mind of a professor and
goes to a computer screen. It then moves to peer review among other professors
and is finally published in an academic journal, which is then sold back to the
academy in the form of individual or institutional subscriptions. Academics
are asking, "Who needs publishers if we can self-publish and distribute our
work ourselves?" The thought is: if the content starts out on computer screens
owned by the colleges and gets consumed on computer screens owned by the colleges,
what value does a "publisher" add?
One project that will provide some muscle for librarians as they deal with
the publishing industry is the DSpace repository system developed by MIT and
Hewlett-Packard Co. DSpace (www.dspace.org)
is an open source software platform that enables institutions to capture and
describe digital works using a submission workflow module, to distribute those
works over the Internet through a search and retrieval system, and to preserve
them over the long term.
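DSpace also exposes its holdings through the OAI-PMH interface described above,
so a harvester like the sketch earlier can read an institution's repository
directly. A hypothetical usage follows; the endpoint URL is invented for
illustration:

# Hypothetical usage of the harvest() sketch above against a DSpace
# repository's OAI-PMH endpoint. The URL is invented for illustration.
for identifier, title in harvest("https://dspace.example.edu/oai/request"):
    print(identifier, title)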
According to MacKenzie Smith, project director of DSpace, the project will
allow faculty members to pool findings and share everything from articles,
technical reports, conference papers, data sets, and databases to media clips,
visual aids, and simulations used in class. Moreover, she says, the project will
"transform the way in which content is made available. MIT libraries seek to
make significant progress in the development of scholarly communication and
the scholarly record."
Already proven at MIT, DSpace has formed its own federation and is providing
its open source software to others. As a critical mass of professors create
and post content that colleagues can easily retrieve, the academy will keep
looking at the possibility of disintermediating the publisher from the process.
The transformation of the archives is underway; the stacks will never look
the same.