Weaving Semantics Into the Web

What a tangled web we weave, when success our first hyperlinks achieved. (My apologies to Sir Walter Scott.)

The ubiquity of the World Wide Web is changing the way we live, learn, and teach. For example, the other day, in what has become a household ritual before going out to the movies, my son logged on to a movie ticket Web site to order tickets for the show we planned to see that evening. Why? Besides a congenital intolerance for queuing up at theaters, it simply made life easier. That alone speaks volumes for the Web's penetration in our daily lives.

The fundamental nature of the Web underlies its powerful simplicity. Anything can be linked to anything else. We experience this essential quality every time we execute a search with our favorite search engine. For example, as of this writing, a search for links to J.R.R. Tolkien generated about 650,000 hits from Google in 0.05 seconds. (Even Google doesn't seem to be sure when the numbers get this large!)

However, the links don't distinguish between a scribbled draft about the author and a published manuscript, between commercial spin-offs and academic scholarship, or among cultural references (Middle Earth's or any others), languages, or media about his works. That leaves the process of sifting through the returned morass to one's biological computer. With luck, something of interest will emerge in the first 20 or so hits returned, before the biological computer enters "the zone," that place where one's eyes glaze over with information overload and one begins to internally hyperlink to something else. ("Do I need to refill my coffee?")

So what's missing? We've invented machines to extend our muscles, remember our appointments, and convey our thoughts on paper and media. Now we need to apply the same machine leverage to the meaning of the links that we gleefully retrieve with every search. In short, we need to give our surrogates working on the Web the ability to comprehend what is meant by a particular connection.

Actually, work has been progressing on this topic for a number of years. Tim Berners-Lee, inventor of the World Wide Web and most recently the recipient of the Japan Prize for 2002, has embarked on an extension of the Web called the Semantic Web. The goal of the project is to enable computers to share and process data as efficiently as people do. To do this, computers must have access to structured descriptions of information and to inference rules that let them perform automated reasoning.

As it turns out, artificial intelligence researchers were working on systems like this to represent knowledge before the Web was a gleam in the eye of Berners-Lee. Such systems typically depend on centralized representations of meaning for overlapping concepts such as "parent" or "child of" in a genealogy application, for example.

Yet central control is the antithesis of the Web. Indeed, the Web has exploded without it, even though people claimed for years that without a well-organized central library of Web resources, no one could ever be sure of finding everything relevant to a search topic.

Any system for representing knowledge that is complex enough to be useful is also likely to encounter questions it cannot answer. Star Trek fans might recall the episode in which Captain Kirk out-reasoned an intelligent computer holding him and his crew captive by posing a simple riddle it could not answer. Developers of the Semantic Web aren't immune to the problem, but their solution is far more elegant: they simply accept that some questions will be unanswerable and move on. Such exceptions must be handled gracefully, without smoke and melting integrated circuits.

Two technologies are being developed to give the Semantic Web structure and meaning. First, using Extensible Markup Language (XML), any Web author can create a set of descriptive tags for an object. But although XML can give Web information structure, it says nothing about what a particular author means by the tags; that meaning must still be conveyed some other way.
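To make the point concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The two documents, their tag names, and the helper tag_structure are invented for illustration: two hypothetical authors describe the same magazine column with tags of their own choosing.

```python
import xml.etree.ElementTree as ET

# Two hypothetical authors describe the same column, each with tags of their own choosing.
doc_a = """<column>
  <writer>A. Author</writer>
  <venue>A Magazine</venue>
</column>"""

doc_b = """<article>
  <creator>A. Author</creator>
  <publishedIn>A Magazine</publishedIn>
</article>"""


def tag_structure(xml_text):
    """Return the root tag and its (child tag, text) pairs: structure only, no meaning."""
    root = ET.fromstring(xml_text)
    return root.tag, [(child.tag, child.text) for child in root]


print(tag_structure(doc_a))
print(tag_structure(doc_b))
```

The parser recovers both structures perfectly, yet nothing in the markup tells a program that <writer> and <creator>, or <venue> and <publishedIn>, mean the same thing.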

In the Semantic Web, meaning is conveyed by the Resource Description Framework (RDF). Using triples rather like the subject, verb, and object of a sentence, structured documents can make assertions that particular things (say, "a person") have properties (such as "is an author of") with certain values (such as "a column in a magazine"). Unfortunately, anyone can now generate RDF descriptions, often with overlapping definitions for terms that, in effect, represent the same thing.
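A triple can be sketched as little more than a three-field record. The following Python sketch is an illustrative simplification, not the full RDF data model (real RDF also has literals, blank nodes, and standard serializations). The example.org identifiers, the two vocabularies, and the match helper are all hypothetical.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str    # the thing described, e.g. a person
    predicate: str  # the property, e.g. "is an author of"
    obj: str        # the value, e.g. a column in a magazine


# Placeholder identifiers; real RDF names things with URIs like these.
graph = {
    Triple("http://example.org/people#jdoe",
           "http://vocab-a.example.org/isAuthorOf",
           "http://example.org/columns#semantic-web"),
    Triple("http://example.org/people#jdoe",
           "http://vocab-b.example.org/wrote",
           "http://example.org/columns#xml-primer"),
}


def match(graph, s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return {t for t in graph
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)}


# Asking with vocabulary A's property finds only one of jdoe's two columns,
# because vocabulary B expresses the same idea with a different identifier.
print(match(graph, p="http://vocab-a.example.org/isAuthorOf"))
```

This is exactly the overlap problem: both triples say "jdoe wrote this column," but a program keyed to one vocabulary sees only half the story.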

Enter ontologies. Philosophers use ontology to mean the science of what is, often synonymously with metaphysics. In the Semantic Web, however, an ontology is a document or file that formally defines the relations among terms, usually as a taxonomy plus a set of inference rules. Now we have a translator among different RDF vocabularies. Things are starting to get interesting.
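Here is a toy ontology in Python, under stated assumptions: a small taxonomy plus a one-way mapping from a hypothetical vocabulary B into vocabulary A, applied by a simple forward-chaining loop that repeats until no new triples appear. It borrows the ideas behind subclassing and property equivalence but is not an implementation of any W3C ontology language; every class, property, and fact in it is invented.

```python
# Taxonomy: child class -> parent class (hypothetical terms).
SUBCLASS_OF = {
    "Columnist": "Author",
    "Author": "Person",
}

# Hypothetical translation from vocabulary B's property into vocabulary A's.
EQUIVALENT_PROPERTY = {
    "vocabB:wrote": "vocabA:isAuthorOf",
}

facts = {
    ("jdoe", "type", "Columnist"),
    ("jdoe", "vocabA:isAuthorOf", "column-1"),
    ("jdoe", "vocabB:wrote", "column-2"),
}


def infer(facts):
    """Apply the inference rules repeatedly until no new triples appear."""
    known = set(facts)
    while True:
        new = set()
        for s, p, o in known:
            # Rule 1: membership in a class implies membership in its parent class.
            if p == "type" and o in SUBCLASS_OF:
                new.add((s, "type", SUBCLASS_OF[o]))
            # Rule 2: restate a vocabulary-B property in vocabulary A's terms.
            if p in EQUIVALENT_PROPERTY:
                new.add((s, EQUIVALENT_PROPERTY[p], o))
        if new <= known:  # nothing new was derived, so we have reached a fixed point
            return known
        known |= new


closed = infer(facts)
print(sorted(o for s, p, o in closed if p == "vocabA:isAuthorOf"))  # ['column-1', 'column-2']
print(("jdoe", "type", "Person") in closed)                         # True
```

After inference, a program that speaks only vocabulary A finds both columns, and it can conclude that jdoe is a person even though no one said so directly.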

As the power of the Semantic Web unfolds, actors will emerge that process the inference rules and read the meaning of documents to provide the biological computer with intelligent responses to inquiries or commands. These actors, or software agents, will exchange information among themselves as they reason for us.
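A hypothetical pair of such agents might look something like this Python sketch. The Agent class and its tell, ask, and follow methods are invented for illustration and stand in for no particular agent framework. Note that ask returns None, meaning "unknown," when the facts don't cover a question: a small nod to handling unanswerable questions gracefully rather than failing.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class Agent:
    """A hypothetical software agent holding (subject, predicate, object) facts."""
    name: str
    facts: Set[Fact] = field(default_factory=set)

    def tell(self, other: "Agent") -> None:
        # Share everything this agent knows with another agent.
        other.facts |= self.facts

    def ask(self, subject: str, predicate: str) -> Optional[Set[str]]:
        """Return the known values, or None for 'unknown' instead of raising an error."""
        values = {o for s, p, o in self.facts if s == subject and p == predicate}
        return values or None

    def follow(self, subject: str, first: str, second: str) -> Optional[Set[str]]:
        """Answer a two-step question by chaining facts, e.g. where the author of X speaks."""
        middle = self.ask(subject, first)
        if middle is None:
            return None
        results: Set[str] = set()
        for m in middle:
            results |= self.ask(m, second) or set()
        return results or None


librarian = Agent("librarian", {("column-1", "author", "jdoe")})
scheduler = Agent("scheduler", {("jdoe", "speaksAt", "a conference")})

before = scheduler.follow("column-1", "author", "speaksAt")
librarian.tell(scheduler)
after = scheduler.follow("column-1", "author", "speaksAt")

print(before)                                 # None
print(after)                                  # {'a conference'}
print(scheduler.ask("column-1", "reviewer"))  # None
```

Neither agent could answer "Where can I hear the author of column-1?" alone; once the librarian shares what it knows, the scheduler can chain the two facts together.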

Marvin Minsky once remarked that in the future, we'll look back and think how strange it was that books we brought home to put in our personal libraries didn't talk to one another, exchanging information on their topics, their authors, and whom they cited. It makes sense, of course, that the new book would add itself to our personal catalog, modify our interest profile, and broaden our personal knowledge base. The work in the Semantic Web is a small step in that direction. At the very least, it should eliminate the infamous "Error 404: Not Found" message, replacing it, perhaps, with a question about what we really intended to find.

The Semantic Web

Berners-Lee, T., J. Hendler, and O. Lassila. "The Semantic Web." Scientific American, May 1, 2001.
http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21&pageNumber=1&catID=2

"Berners-Lee wins Japan Prize for invention of World Wide Web." MIT News Office, December 17, 2001.
web.mit.edu/newsoffice/nr/2001/japanprize.html

Berners-Lee, T. "Semantic Web Road Map." World Wide Web Consortium, September 1998.
www.w3.org/DesignIssues/Semantic

Miller, E. "Semantic Web Activity Statement."
www.w3.org/2001/sw/Activity

World Wide Web Consortium. "Resource Description Framework (RDF) Model and Syntax Specification."
www.w3.org/TR/REC-rdf-syntax

The Semantic Web Community
www.w3.org/2001/sw/Activity


Digital Rights Management

INDECS (Interoperability of Data in E-Commerce Systems)
www.indecs.org

Semantic Web News

XML News
www.xmlnews.org

PRISM
www.prismstandard.org

RDF Site Summary
www.purl.org/rss/1.0

Vocabularies for Distributed Online Learning

The Dublin Core Metadata Initiative
www.dublincore.org

IMS Global Learning Consortium
imsproject.org
