Universities Working to Make Library Metadata Searchable on the Web

With a $4 million Mellon grant, Stanford Libraries is leading the shift to a "linked data" metadata environment.

Since the 1960s, academic libraries have been using their own standards for the communication of metadata about resources in their catalogs. Originally designed for magnetic tape-based computers, machine-readable cataloging (MARC) standards are only understood by library systems. Failure to speak the language of the web has isolated libraries from the broader world of information developing there.

Determined to take advantage of the semantic web, Stanford Libraries is working with the libraries of Cornell, Harvard and the University of Iowa to continue the development of a "linked data" metadata environment.

Only libraries can understand what any of the MARC encoding means, explained Philip Schreur, associate university librarian for technical services at Stanford Libraries. "When a company like Google gets that data, it just sees an incomprehensible mass of free text. We are trying to shift to linked data so we can use well-articulated identifiers for things like people, subjects or dates. Then when people search for something on the web, they can actually identify what all those bits of data are and make the results much cleaner."
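
To make the contrast Schreur describes concrete, here is a minimal sketch in Python. It is not taken from the Stanford project: the sample record, the example.org identifiers and the `objects` helper are all hypothetical, and the schema.org property names were chosen only for illustration. It shows one author statement first as a MARC-style text field, then as linked-data statements built on identifiers.

```python
# Illustrative only: the record, URIs, and helper below are hypothetical.

# A MARC-style author field. Its meaning depends on knowing the tag (100)
# and the subfield codes ($a, $d); to a web crawler it is just text.
marc_field = "100 1# $a Austen, Jane, $d 1775-1817."

# The same information as linked-data triples (subject, predicate, object).
# People and works get identifiers instead of free-text strings.
BOOK = "https://example.org/book/1"
PERSON = "https://example.org/person/42"
triples = [
    (BOOK, "https://schema.org/author", PERSON),
    (PERSON, "https://schema.org/name", "Jane Austen"),
    (PERSON, "https://schema.org/birthDate", "1775"),
]


def objects(subject, predicate):
    """Return every object recorded for a subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Follow the identifier from the book to its author, then to the name.
author = objects(BOOK, "https://schema.org/author")[0]
author_name = objects(author, "https://schema.org/name")[0]
print(author_name)
```

Because the author is an identifier rather than a string, any other dataset that uses the same identifier can be joined to this one without parsing library-specific codes.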

Libraries have been working on this effort for several years. "We have reached the point where we think we can now make the shift toward this new way of encoding the data," Schreur said.

With a $4 million grant from the Andrew W. Mellon Foundation, Stanford is also collaborating with the Program for Cooperative Cataloging (PCC) and the Library of Congress to expand the number of libraries implementing linked data. (PCC is a membership organization of U.S. libraries set up to develop cataloging procedures that libraries will abide by. It provides the community with a forum for the development of policy and training programs for member libraries.)

Stanford is developing a cloud-based sandbox environment that will allow the community to access, adopt and implement linked data. "We expect to have that sandbox up and running by Jan. 1 of next year," Schreur said.

The grant team also will focus on the creation of open source tools and policies to be adopted across the academic library community for transitioning to and implementing linked data.

The transition from MARC to linked data has been a struggle, Schreur added. "All of our very expensive internal systems make use of the MARC system. We buy a lot of data from vendors and they supply it to us in that format. So although it makes the data understandable on the web, the shift toward linked data is a very expensive shift to have to make. Many people and vendors are reluctant to do it just because of the expense involved."

There are many policy decisions to be worked through in the transition. "In the environment we have now, the data is contained and you can stamp it with an award of quality and everyone knows what it means," Schreur explained. "But if we are sharing data in an open environment, how do we assure that same level of quality and consistency to people who want to use the data?"

Among the advantages of linked data, Schreur said, will be access to many more international resources. "There is a lot of data created by libraries, such as the national libraries of Germany and France, that once we make this shift, we will be able to make available. It really expands what resources the library can present to people."

About the Author

David Raths is a Philadelphia-based freelance writer focused on information technology. He writes regularly for several IT publications, including Healthcare Innovation and Government Technology.
