Databases

Cornell Research Finds Route to Faster, Better Page Ranking

A technique developed by Cornell University researchers offers the promise of speeding up search rankings to real-time speeds. In a visual representation of a database, circles, referred to as "nodes," represent data items; lines, called "edges," represent the links between the nodes. The new mechanism, described in a paper presented at last summer's ACM SIGKDD Conference on Knowledge Discovery and Data Mining in Sydney, focuses on the most important edges.

When a user enters a term in a search engine, recommendations are often personalized based on what that user has recently been entering in search. For example, hunting in Google for a particular product one night could affect how results are displayed the next day for related search terms. "They have computed the rankings offline, based on your choice," said Wenlei Xie, primary author of the paper and a PhD candidate in the Department of Computer Science.

The research project is part of a larger initiative at Cornell to gain a greater understanding of data "that goes well beyond classical database queries," the university wrote in a $2.4 million National Science Foundation grant request. The overall goal is to create "the foundations" for a new kind of database, called the "causal database," where queries go beyond mining for "statistically significant patterns in data" to take causality and "explanations" into account in generating results.

Xie's work lays out the standard route for how a computer performs a search ranking: It "walks" around the database, frequently following a route set by nodes and edges that have been "weighted" based on how many times a particular site has been visited or a type of product has been viewed. Node-weighted algorithms tend to be a bit random. It's akin, the researcher explained, to ranking something in Twitter based on the topics of tweets rather than the relationship between two people.

The approach proposed by Xie's paper recommends "edge weighting." These are already used, but they tend to be slow. The paper offers a way to speed up edge-weighted ranking by reducing the granularity of the walking route and focusing on nods that have linked edges — by way of similar interests or strong connections. The overall goal is to develop "general, fast methods for edge weight personalization."

The researchers tested their approach with a database of scholarly publications and a blog search system. The results: The new method "outperformed existing methods by nearly five orders of magnitude."

As the paper reported, "This huge performance gain over previous work allows us — for the very first time — to solve learning-to-rank problems for edge weight personalization at interactive speeds, a goal that had not previously been achievable for this class of problems."

The "reduced model," as it's called, also speeds up "learn to rank" systems, where the computer learns user preferences by making note of which items in a list the user clicks on.

Further performance gains might be possible by downloading the reduced model onto the user's computer to perform the calculations there and to update the reduced model continuously as new data comes in.

The work, which also involved Cornell researchers David Bindel, Alan Demers and Johannes Gehrke, was supported by the National Science Foundation and the National Research Council of Norway.

The project has a GitHub site where it provides access to the script, data and results from its experimentation. The paper is available online there also.

About the Author

Dian Schaffhauser is a senior contributing editor for 1105 Media's education publications THE Journal and Campus Technology. She can be reached at dian@dischaffhauser.com or on Twitter @schaffhauser.

comments powered by Disqus

Campus Technology News

Sign up for our newsletter.

I agree to this site's Privacy Policy.