Weaving Semantics Into the Web
What a tangled web we weave, when success our first hyperlinks achieved. (My
apologies to Sir Walter Scott.)
The ubiquity of the World Wide Web is changing the way we live, learn, and
teach. For example, the other day, in what has become a household ritual before
going out to the movies, my son logged on to a movie ticket Web site to order
tickets for the show we planned to see that evening. Why? Besides a congenital
intolerance for queuing up at theaters, it simply made life easier. That alone
speaks volumes for the Web's penetration in our daily lives.
The fundamental nature of the Web underlies its powerful simplicity. Anything
can be linked to anything else. We experience this essential quality every time
we execute a search with our favorite search engine. For example, as of this
writing, a search for links to J.R.R. Tolkien generated about 650,000 hits from
Google in 0.05 seconds. (Even Google doesn't seem to be sure when the numbers
get this large!)
However, the links don't distinguish between a scribbled draft about the
author and a published manuscript, between commercial spin-offs and academic
scholarship, or among cultural references (Middle-earth's or any others),
languages, or media about his works. That leaves the process of sifting through
the returned morass to one's biological computer. With luck, something
of interest will emerge in the first 20 or so hits returned, before the biological
computer enters "the zone," that place where one's eyes glaze
over with information overload and one begins to internally hyperlink to something
else. ("Do I need to refill my coffee?")
So what's missing? We've invented machines to extend our muscles,
remember our appointments, and convey our thoughts on paper and media. Now we
need to apply the same machine leverage to the meaning of the links that we
gleefully retrieve with every search. In short, we need to give our surrogates
working on the Web the ability to comprehend what is meant by a particular connection.
Actually, work has been progressing on this topic for a number of years. Tim
Berners-Lee, inventor of the World Wide Web and most recently the recipient of the
Japan Prize for 2002, has embarked on an extension of the Web called the Semantic
Web. The goal of the project is to enable computers to share and process data
as efficiently as people do. To do this, computers must have access to structured
descriptions of information and inference rules that enable them to perform
As it turns out, artificial intelligence researchers were working on systems
like this to represent knowledge before the Web was a gleam in the eye of Berners-Lee.
Such systems typically depend on centralized representations of meaning for
overlapping concepts such as "parent" or "child of" in a
genealogy application, for example.
Yet central control is the antithesis of the Web. Indeed, the Web has exploded
even though people have claimed for years that without a well-organized central
library of Web resources, no one would ever be sure of finding everything relevant
to a search topic.
Any system for representing knowledge that is complex enough to be useful is
also likely to encounter questions it cannot answer.
Star Trek fans might recall the episode where Captain Kirk out-reasoned an
intelligent computer that held him and his crew captive by posing it a simple
riddle that had no answer.
Developers of the Semantic Web aren't immune to the problem, but their
solution is much more elegant: they simply accept that some problems will be
unanswerable and move on. These exceptions must be handled gracefully, without
smoke and melting integrated circuits.
Two technologies are being developed to provide logic to the Semantic Web.
First, using Extensible Markup Language (XML), any Web author can create a set
of descriptive tags to describe an object. But although XML can provide structure
to Web information, what a particular author means by the tags must still be
In the Semantic Web, meaning is conveyed by the Resource Description Framework
(RDF). Using triples rather like the subject, verb, and object of sentence grammar,
structured documents can make assertions about things (say "a person")
having properties (such as "is an author of") with certain values
(such as "a column in a magazine"). Unfortunately, RDF descriptions can be
generated by anyone, often with overlapping definitions for tags
that, in effect, represent the same thing.
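To make the idea of a triple concrete, here is a minimal sketch in Python. It is not real RDF syntax; the identifiers (person:alice, is_author_of, wrote, and so on) are invented for illustration. It shows two hypothetical authors recording the same kind of fact under different tags, and how a simple lookup then misses one of them.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One assertion: a thing, a property it has, and that property's value."""
    subject: str
    prop: str
    value: str


# Two hypothetical authors describe the same kind of fact with different tags.
triples = [
    Triple("person:alice", "is_author_of", "column:magazine-42"),
    Triple("person:bob", "wrote", "column:magazine-43"),
]


def find(store, subject=None, prop=None, value=None):
    """Return every triple matching the fields given; None matches anything."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (prop is None or t.prop == prop)
        and (value is None or t.value == value)
    ]


# Searching on one tag finds only the author who happened to use that tag.
authors = [t.subject for t in find(triples, prop="is_author_of")]
print(authors)  # ['person:alice']
```

Bob is just as much an author as Alice, but a program that knows only the tag "is_author_of" has no way to tell.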
Enter ontologies. Philosophers refer to ontology as the science of what is,
often synonymously with metaphysics. On the Semantic Web, the term refers to a document or file that formally
defines the relations among terms, usually a taxonomy with a set of inference
rules. Now we have a translator among different RDFs. Things are starting to
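As a rough illustration of an ontology acting as a translator, the Python sketch below pairs a small table of equivalent tags with a tiny taxonomy and one inference rule. Everything in it (the tag names, the class names, the rule itself) is an assumption made for the example, not part of any Semantic Web standard.

```python
# Hypothetical facts published by two independent authors, as
# (thing, property, value) triples.
facts = {
    ("person:alice", "is_author_of", "column:1"),
    ("person:bob", "wrote", "column:2"),
    ("column:1", "is_a", "Column"),
    ("column:2", "is_a", "Column"),
}

# A toy ontology: tags that mean the same thing, plus a small taxonomy.
EQUIVALENT = {"wrote": "is_author_of"}
SUBCLASS_OF = {"Column": "Article", "Article": "Document"}


def normalize(store):
    """Rewrite every equivalent tag to one preferred name."""
    return {(s, EQUIVALENT.get(p, p), v) for s, p, v in store}


def infer(store):
    """Apply one rule until nothing new appears:
    if X is_a C and C is a subclass of D, then X is_a D."""
    known = set(store)
    while True:
        new = {
            (s, "is_a", SUBCLASS_OF[v])
            for s, p, v in known
            if p == "is_a" and v in SUBCLASS_OF
        } - known
        if not new:
            return known
        known |= new


kb = infer(normalize(facts))

authors = sorted(s for s, p, v in kb if p == "is_author_of")
print(authors)  # ['person:alice', 'person:bob']
print(("column:2", "is_a", "Document") in kb)  # True
```

With the equivalence applied, both authors are found under a single tag, and the taxonomy lets a program conclude that a column is also a document even though no one stated it directly.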
As the power of the Semantic Web unfolds, actors will emerge that process the
inference rules and read the meaning of documents to provide the biological
computer with intelligent responses to inquiries or commands. These actors,
or software agents, will exchange information among themselves as they reason
Marvin Minsky once remarked that in the future, we'll look back and think
how strange it was that books we brought home to put in our personal libraries
didn't talk to one another, exchanging information on their topics, their
authors, and who they referenced. It makes sense, of course, that the new book
would add itself to our personal catalog, modify our interest profile, and broaden
our personal knowledge base. The work in the Semantic Web is a small step in
that direction. At the very least, it should eliminate the infamous "Error
404: Not Found" message, replacing it, perhaps, with a question about what
we really intended to find.
The Semantic Web
T. Berners-Lee, J. Hendler, and O. Lassila. "The Semantic Web." Scientific
American, May 1, 2001.
wins Japan Prize for invention of World Wide Web." MIT News Office,
December 17, 2001.
"Semantic Web Road Map." World Wide Web Consortium, September
"Semantic Web Activity Statement."