Cornell Research Finds Route to Faster, Better Page Ranking

A technique developed by Cornell University researchers offers the promise of speeding up search rankings to real-time speeds. In a visual representation of a database, circles, referred to as "nodes," represent data items; lines, called "edges," represent the links between the nodes. The new mechanism, described in a paper presented at last summer's ACM SIGKDD Conference on Knowledge Discovery and Data Mining in Sydney, focuses on the most important edges.

When a user enters a term in a search engine, recommendations are often personalized based on what that user has recently been entering in search. For example, hunting in Google for a particular product one night could affect how results are displayed the next day for related search terms. "They have computed the rankings offline, based on your choice," said Wenlei Xie, primary author of the paper and a PhD candidate in the Department of Computer Science.

The research project is part of a larger initiative at Cornell to gain a greater understanding of data "that goes well beyond classical database queries," the university wrote in a $2.4 million National Science Foundation grant request. The overall goal is to create "the foundations" for a new kind of database, called the "causal database," where queries go beyond mining for "statistically significant patterns in data" to take causality and "explanations" into account in generating results.

Xie's work lays out the standard route for how a computer performs a search ranking: It "walks" around the database, frequently following a route set by nodes and edges that have been "weighted" based on how many times a particular site has been visited or a type of product has been viewed. Node-weighted algorithms tend to be a bit random. It's akin, the researcher explained, to ranking something in Twitter based on the topics of tweets rather than the relationship between two people.

The approach proposed by Xie's paper recommends "edge weighting." These are already used, but they tend to be slow. The paper offers a way to speed up edge-weighted ranking by reducing the granularity of the walking route and focusing on nods that have linked edges — by way of similar interests or strong connections. The overall goal is to develop "general, fast methods for edge weight personalization."

The researchers tested their approach with a database of scholarly publications and a blog search system. The results: The new method "outperformed existing methods by nearly five orders of magnitude."

As the paper reported, "This huge performance gain over previous work allows us — for the very first time — to solve learning-to-rank problems for edge weight personalization at interactive speeds, a goal that had not previously been achievable for this class of problems."

The "reduced model," as it's called, also speeds up "learn to rank" systems, where the computer learns user preferences by making note of which items in a list the user clicks on.

Further performance gains might be possible by downloading the reduced model onto the user's computer to perform the calculations there and to update the reduced model continuously as new data comes in.

The work, which also involved Cornell researchers David Bindel, Alan Demers and Johannes Gehrke, was supported by the National Science Foundation and the National Research Council of Norway.

The project has a GitHub site where it provides access to the script, data and results from its experimentation. The paper is available online there also.

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.

Featured

  • student reading a book with a brain, a protective hand, a computer monitor showing education icons, gears, and leaves

    4 Steps to Responsible AI Implementation

    Researchers at the University of Kansas Center for Innovation, Design & Digital Learning (CIDDL) have published a new framework for the responsible implementation of artificial intelligence at all levels of education.

  • glowing digital brain interacts with an open book, with stacks of books beside it

    Federal Court Rules AI Training with Copyrighted Books Fair Use

    A federal judge ruled this week that artificial intelligence company Anthropic did not violate copyright law when it used copyrighted books to train its Claude chatbot without author consent, but ordered the company to face trial on allegations it used pirated versions of the books.

  • server racks, a human head with a microchip, data pipes, cloud storage, and analytical symbols

    OpenAI, Oracle Expand AI Infrastructure Partnership

    OpenAI and Oracle have announced they will develop an additional 4.5 gigawatts of data center capacity, expanding their artificial intelligence infrastructure partnership as part of the Stargate Project, a joint venture among OpenAI, Oracle, and Japan's SoftBank Group that aims to deploy 10 gigawatts of computing capacity over four years.

  • laptop displaying a phishing email icon inside a browser window on the screen

    Phishing Campaign Targets ED Grant Portal

    Threat researchers at cybersecurity company BforeAI have identified a phishing campaign spoofing the U.S. Department of Education's G5 grant management portal.