Semantic Search: Could the Web Think? -- Campus Technology

By Trent Batson
07/16/08

Semantics is a subfield of linguistics that studies how language conveys meaning. The Semantic Web we're still reaching for takes its name from that idea: it will be based on a set of definitions, languages, and standards that let a search detect meaning rather than just match a character string. The Semantic Web will at least be smarter than the current Web.

Results from a Current Web Search

On the current Web, I did a quick search posing as someone who knows nothing about gardening and is looking for reasons either to try, or not to try, a garden. So, I typed in "Should I grow a garden?" and received a lot of links about what I should grow in my garden. In other words, the results begged the question: they assumed I had already decided to garden.

So, I tried "Why should I grow a garden?" For this query, I got links that ignored the "why" in my query. Again, I found tips about growing a garden, as if I'd already decided to garden and actually knew a fair amount about soil testing and my local extension agent. My gardening novice persona still had no answer.

Then I tried "Benefits of growing a garden." The results told me "You get fresh vegetables," or "You are doing good for the planet," and "You can find peace while hoeing your garden." Still not really the answers I was looking for.

Finally, I tried "Reasons not to garden." My responses: "Five reasons not to move to New Jersey," and "Reasons not to own a cat," and "Reasons not to hate winter." (These examples are all taken from real search results!)

This was going nowhere. I didn't know how to ask the right question (using advanced search only showed that my specific question is not answered anywhere on the Web).

Somehow I would have had to guess the particular terms and phrases someone might have used in a paper or posting that would help me decide if I wanted to put any effort into learning about gardening. Finding the meaning I needed would be a long process given the current state of search.
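
To see why "Reasons not to garden" can surface a page about cats, here is a toy sketch of matching on character strings alone. It is not how any real search engine works; the scoring function and the third page title are assumptions made up for illustration. The first two titles are the ones from my results above.

```python
import re

def tokens(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def keyword_score(query, title):
    """Count the words the query and title share, ignoring meaning entirely."""
    return len(tokens(query) & tokens(title))

query = "Reasons not to garden"
titles = [
    "Five reasons not to move to New Jersey",
    "Reasons not to own a cat",
    # Hypothetical page that actually answers the question.
    "Why a vegetable plot may be more trouble than it is worth",
]

# Rank titles by how many query words they contain, best first.
ranked = sorted(titles, key=lambda t: keyword_score(query, t), reverse=True)
for title in ranked:
    print(keyword_score(query, title), title)
```

The two off-topic titles each share three words with the query, while the one page that speaks to the meaning of the question shares none and lands at the bottom.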

To Search Semantically, You Need Ontologized Content

Of course, it's not just the search engines that are failing my gardening novice persona. The Web now grabs anything that is posted to it, no matter how the resource is rhetorically organized. So, it's really no surprise that from this grab bag of millions of potential links I didn't find a single match to the meaning of my query. No matter how smart the search engine, it can't do a good job searching uncharacterized and undefined content.

One way to improve search, then, is to start organizing information better to begin with. If you were searching a library where books had just been dumped willy-nilly (as have resources on the Web), you'd also have difficulty finding the right printed information. One current approach to better organization up-front is the Resource Description Framework (RDF):

"The RDF metadata model is based upon the idea of making statements about Web resources in the form of subject-predicate-object expressions, called triples in RDF terminology. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion "The sky has the color blue" in RDF is as the triple: a subject denoting "the sky", a predicate denoting "has the color", and an object denoting "blue"." --Wikipedia

In other words, we're using natural language (ordered in this case on the syntax of Western languages) as a model for our standard descriptors of Web information.
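
To make the triple idea concrete, here is a minimal sketch in plain Python rather than in any real RDF syntax or library. The `http://example.org/` namespace, the predicate names, and the `match` helper are all assumptions invented for illustration; only the "sky has the color blue" statement comes from the description quoted above.

```python
from collections import namedtuple

# A statement about a resource: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

EX = "http://example.org/"  # hypothetical namespace, not a real vocabulary

triples = [
    Triple(EX + "sky", EX + "hasColor", "blue"),
    Triple(EX + "grass", EX + "hasColor", "green"),
    Triple(EX + "sky", EX + "locatedAbove", EX + "grass"),
]

def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that match the given fields; None is a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]

# Ask a question by meaning: which resources have the color blue?
blue_things = [t.subject for t in match(triples, predicate=EX + "hasColor", obj="blue")]
print(blue_things)
```

The query asks for resources whose described color is blue. A page that merely contains the word "blue" somewhere in its text would not match.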

Now, we have a way to describe individual resources. These descriptions define the resources, so search can produce more relevant results. But what about resources that are related semantically but don't have the full set of descriptors in the search?

If our search is meant to find a web of related resources, those resources also need definitions of their relationships to other resources. We need a Web ontology. The W3C has produced the Web Ontology Language (OWL) for this purpose:

"The data described by an OWL ontology is interpreted as a set of "individuals" and a set of "property assertions" which relate these individuals to each other. An OWL ontology consists of a set of axioms which place constraints on sets of individuals (called "classes") and the types of relationships permitted between them. These axioms provide semantics by allowing systems to infer additional information based on the data explicitly provided." --Wikipedia

But, Does it Work?

Machines should now be capable of using a new Web language to talk to "individuals" who have properties. And then these individuals (or the database in which they reside) will lead people to other, semantically related individuals (sets of data). This improved search targets content that is more reliably relevant than what current searches produce, and it places that content within a context of meaningfully related other content.
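
The inference step in the OWL description, where a system derives information nobody stated explicitly, can be sketched in a few lines. This is not OWL syntax and not a real reasoner: it applies a single rule (membership in a subclass implies membership in its superclass), and every class and individual name below is a made-up assumption.

```python
# Explicit data: the class each individual is asserted to belong to.
type_assertions = {
    ("tomato_guide", "VegetableGardeningGuide"),
    ("rose_guide", "FlowerGardeningGuide"),
}

# Axioms: each pair says the first class is a subclass of the second.
subclass_axioms = {
    ("VegetableGardeningGuide", "GardeningGuide"),
    ("FlowerGardeningGuide", "GardeningGuide"),
    ("GardeningGuide", "HobbyResource"),
}

def infer_types(assertions, axioms):
    """Apply the subclass rule repeatedly until no new facts appear."""
    inferred = set(assertions)
    changed = True
    while changed:
        changed = False
        for individual, cls in list(inferred):
            for sub, sup in axioms:
                if cls == sub and (individual, sup) not in inferred:
                    inferred.add((individual, sup))
                    changed = True
    return inferred

all_types = infer_types(type_assertions, subclass_axioms)

# Neither guide was ever labeled a HobbyResource, yet both are found.
hobby_resources = sorted(i for i, c in all_types if c == "HobbyResource")
print(hobby_resources)
```

A search for hobby resources now reaches both guides even though neither was tagged that way, which is the kind of semantically related result a plain string match would miss.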

The question right now is when will enough organizations ontologize their resources so that a true semantic search will be possible? I used Hakia, http://www.hakia.com/, "a new semantic search engine," to do the same search about gardening with no better results than Google. (Google already includes some semantic elements in its algorithm, however, which probably made its results somewhat closer to Hakia's.)

There is hope, however, and maybe a hint of a trend. A large number of major corporations, and other large organizations, are in the process of semanticizing their Web holdings. See: http://www.w3.org/2004/01/sws-testimonial.

And stay tuned. Artificial intelligence research, on which the goal of a Semantic Web is based, always seems to take longer to produce results than we expect. The Semantic Web is not a reality yet. When it is a reality, will it be able to "think"? Not really, but I hope it can at least convince me that gardening is more work than it's worth.

About the Author

Trent Batson is the president and CEO of AAEEBL (http://www.aaeebl.org), serving on behalf of the global electronic portfolio community. He was a tenured English professor before moving to information technology administration in the mid-1980s. Batson has been among the leaders in the field of educational technology for 25 years, the last 10 as an electronic portfolio expert and leader. He has worked at 7 universities but is now full-time president and CEO of AAEEBL. Batson’s ePortfolio: http://trentbatsoneportfolio.wordpress.com/