Cornell Research Finds Route to Faster, Better Page Ranking
        
        
        
			- By Dian Schaffhauser
 - 02/08/16
 
		
        A technique developed by Cornell  University researchers offers the promise of speeding up search rankings to  real-time speeds. In a visual representation of a database, circles, referred  to as "nodes," represent data items; lines, called "edges,"  represent the links between the nodes. The new mechanism, described in a paper presented at last  summer's ACM SIGKDD Conference on  Knowledge Discovery and Data Mining in Sydney, focuses on the most important  edges.
When a user enters a term in a search engine,  recommendations are often personalized based on what that user has recently  been entering in search. For example, hunting in Google for a particular  product one night could affect how results are displayed the next day for  related search terms. "They have computed the rankings offline, based on  your choice," said Wenlei Xie, primary author of the paper and a PhD  candidate in the Department of Computer  Science.
The research project is part of a larger initiative at  Cornell to gain a greater understanding of data "that goes well beyond  classical database queries," the university wrote in a $2.4 million National Science  Foundation grant request. The overall goal is to create "the  foundations" for a new kind of database, called the "causal  database," where queries go beyond mining for "statistically  significant patterns in data" to take causality and "explanations"  into account in generating results.
Xie's work lays out the standard route for how a computer  performs a search ranking: It "walks" around the database, frequently  following a route set by nodes and edges that have been "weighted"  based on how many times a particular site has been visited or a type of product  has been viewed. Node-weighted algorithms tend to be a bit random. It's akin,  the researcher explained, to ranking something in Twitter based on the topics  of tweets rather than the relationship between two people.
The approach proposed by Xie's paper recommends "edge  weighting." These are already used, but they tend to be slow. The paper  offers a way to speed up edge-weighted ranking by reducing the granularity of  the walking route and focusing on nods that have linked edges — by way of  similar interests or strong connections. The overall goal is to develop  "general, fast methods for edge weight personalization."
The researchers tested their approach with a database of  scholarly publications and a blog search system. The results: The new method  "outperformed existing methods by nearly five orders of magnitude."
As the paper reported, "This huge performance gain over  previous work allows us — for the very first time — to solve learning-to-rank  problems for edge weight personalization at interactive speeds, a goal that had  not previously been achievable for this class of problems."
The "reduced model," as it's called, also speeds  up "learn to rank" systems, where the computer learns user  preferences by making note of which items in a list the user clicks on.
Further performance gains might be possible by downloading  the reduced model onto the user's computer to perform the calculations there  and to update the reduced model continuously as new data comes in.
The work, which also involved Cornell researchers David  Bindel, Alan Demers and Johannes Gehrke, was supported by the National Science Foundation and the National  Research Council of Norway.
The project has a GitHub site where it provides  access to the script, data and results from its experimentation. The paper is available online there also.
        
        
        
        
        
        
        
        
        
        
        
        
            
        
        
                
                    About the Author
                    
                
                    
                    Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.