The Web

The Little Search Engine That Could

In response to creeping subscription fees, Rice University created a powerful campus search engine application all its own.

When Jeff Frey and Michael O’Connor were asked to pinpoint areas where Rice University’s (TX) IT department could cut costs, their minds immediately went to the Google Search Appliance (GSA) for which the Houston-based college was paying $40,000 in upgrade/support fees every two years. Charged with indexing documents and pages, allowing for Google-search-like retrieval, and customizing returned results, the “box,” as it’s known, served as the search engine for the university’s website.

“Google gave us the box a while ago at no charge, but then the upgrades and subscription fees increased over the years as we grew, and the university wound up paying a fair amount of money to use it,” explains Frey, manager of web services, a department which provides IT solutions to the university on a cost recovery basis. “Though the GSA had many great search features, we looked at the $40,000 tab, realized that we could build something for less than 25 percent of the cost that the university was paying for the GSA, and decided not to become committed to its purchase cycle.”

According to O’Connor, web application developer, Rice was also running up against page limits within its Google licensing agreement. As the school’s web presence grew, those limits resulted in a lower percentage of pages being indexed by the box.

“That was a real problem,” says O’Connor, whose team explored Rice’s existing search interfaces (including the integration of directory search, group directory, and web search) and found that the entire system was ripe for an overhaul. “We saw the opportunity to integrate each segment while implementing some new search modalities,” says O’Connor, “and to do it in a way that would make the system faster and easier to use.”

The DIY Approach

With the financial and technical cases in place for an in-house search engine application, Rice’s IT team started working on the project during the summer of 2009. The first few months were spent creating mockups of the new interface, and then shopping the concept around to get buy-in from faculty members, staff, and other interested parties. Professors, particularly those who had complained about the previous search engine’s lack of speed, usability, and cohesiveness across web and people searches, were specifically sought out for their input.

The results were very positively received, according to Frey. “Overall, pretty much everyone loved the new interface that we presented,” he says. “It also brought us a long way in terms of campus users being able to control search information, and people really took to that.”

Development of the search engine application itself required less time than the proof-of-concept period. The philosophical cornerstone of the implementation: Any user should be able to get what he or she needs with one click of the mouse. Another design goal was to build a complete system that not only met Rice’s current needs, but could also be scaled up and adapted to include new technologies and innovations in the future.

The IT team used open source tools and freely available APIs, including Python, the Django framework, the jQuery library, and Yahoo BOSS, to develop its platform. “Our architecture was heavily oriented toward providing a layer that would also allow us to plug in new search interfaces,” explains Frey. His team spent time figuring out how the search engine interface would work, factoring in the possibility that a search provider could change license terms or go down at any point, leaving Rice without a sufficient search tool.

After exploring many different options, the IT team decided that Bing was the best search-results provider at the moment. “It came down to search-result quality, and Bing consistently provided better results,” says Frey, who points out that Yahoo has recently transitioned to using Bing as its search-results provider (see “One Bing Now Rules Them All in US and Canada”; campustechnology.com/articles/2010/08/25/one-bing-now-rules-them-all-in-us-and-canada.aspx). Rice’s interface is search-provider agnostic and includes the option of seamlessly switching among providers as well as falling back to its original GSA, should the situation warrant it, or even using a completely different engine, such as Apache Solr. “If a new service comes out that’s cheap and easy to integrate, we’ll be able to plug it in and use it without much effort,” says Frey. “Google’s free Ajax API for public search is also very nice, and given the way our interface turned out, we may wind up revisiting it as an option in the future.”
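
As a rough illustration of what “search-provider agnostic” can mean in practice, here is a minimal Python sketch of a fallback chain. It is not Rice’s code: the function names, the `SearchUnavailable` exception, and the example.edu URL are all hypothetical, and the two providers are stand-ins rather than real calls to Bing, the GSA, or Solr.

```python
from typing import Callable, List, Sequence

# A provider is any callable that takes a query and returns result URLs.
# All names here are illustrative placeholders, not Rice's actual code.
Provider = Callable[[str], List[str]]


class SearchUnavailable(Exception):
    """Raised by a provider that cannot answer right now."""


def search_with_fallback(query: str, providers: Sequence[Provider]) -> List[str]:
    """Try each provider in order and return the first successful answer."""
    for provider in providers:
        try:
            return provider(query)
        except SearchUnavailable:
            continue  # this provider is down; try the next one
    return []  # every provider failed


def primary_provider(query: str) -> List[str]:
    # Stand-in for a hosted search API that is currently unreachable.
    raise SearchUnavailable("primary provider is down")


def backup_provider(query: str) -> List[str]:
    # Stand-in for a fallback such as a local appliance or index.
    return [f"https://example.edu/search-result?q={query}"]


results = search_with_fallback("psychology", [primary_provider, backup_provider])
```

Because the interface only sees a list of callables, swapping or reordering providers in a design like this is a one-line change, which is the kind of flexibility Frey describes.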

Cutting Costs

Because Rice’s new in-house search engine application uses an established web- and database-hosting setup, it created no new direct costs for the college. Total cost for the project’s first phase, which rolled out on campus in the spring of 2010, was about $13,000 worth of time from the IT web services department’s designer, developer, and database administrator. Several customizations (including a mobile version of the site, which took about three hours to develop because of the modular nature of the code) have been completed since then, bringing Rice’s total cost to about $25,000—or just over one year’s worth of Google appliance subscription fees. Maintaining the system requires “about two or three hours of manpower—or a few hundred dollars—every month,” says Frey.
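
The comparison can be checked with a few lines of Python. The dollar figures come from the article, with one exception: the monthly maintenance amount is an assumption, since Frey says only “a few hundred dollars,” and $300 is used here purely for illustration.

```python
# Figures quoted in the article.
gsa_fee_per_cycle = 40_000   # dollars, paid every two years
cycle_years = 2
total_build_cost = 25_000    # first phase plus later customizations

gsa_fee_per_year = gsa_fee_per_cycle / cycle_years           # 20,000 per year
years_of_fees_covered = total_build_cost / gsa_fee_per_year  # 1.25 years

# Assumption: "a few hundred dollars" a month is taken as 300.
assumed_monthly_maintenance = 300
annual_maintenance = assumed_monthly_maintenance * 12        # 3,600 per year
annual_saving_after_build = gsa_fee_per_year - annual_maintenance
```

Under that assumption, the in-house system would cost roughly $16,400 less per year to run than the appliance once the build cost is recovered.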

Anyone who accesses Rice’s website and keys search terms into the box situated in the upper right corner of each page uses the in-house system. Input “psychology” into the search field, for example, and the interface presents programmatic recommended results that link to the different psychology departments, followed by a listing of other top psychology results provided by Bing. On the right-hand side of every page appears a group directory listing, which is also used for the university’s printed phone book. Both the recommended result boxes and the group directory are items that departments are able to access and edit themselves.
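
One simple way to produce a page like this is to look up department-maintained recommendations first and then append the provider’s results. The Python sketch below assumes a plain keyword table; the `RECOMMENDED` table, the `build_results_page` function, and the example.edu URLs are hypothetical and do not reflect how Rice actually stores or ranks its recommendations.

```python
from typing import Dict, List

# Hypothetical department-maintained table: keyword -> recommended links.
RECOMMENDED: Dict[str, List[str]] = {
    "psychology": ["https://example.edu/psychology"],
}


def build_results_page(query: str, provider_results: List[str]) -> List[str]:
    """Put recommended links first, then provider results without repeats."""
    recommended = RECOMMENDED.get(query.strip().lower(), [])
    remaining = [url for url in provider_results if url not in recommended]
    return recommended + remaining


page = build_results_page(
    " Psychology ",
    ["https://example.edu/psychology", "https://example.edu/psych-lab"],
)
```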

Under “groups” is a people directory that contains an alphabetical listing of names and contact information—everyone from graduate students to professors to departmental staff—related to the search. Frey says the system also deciphers incomplete names, vanity e-mail addresses, and phone numbers to find the appropriate person or department, while its mobile application allows “cell phone and PDA users to experience an interface very similar to what you’d see on our website.”
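
A people search of this kind usually starts by guessing what sort of query it has been given. The Python sketch below shows one possible approach using regular expressions and prefix matching; the patterns, function names, and sample directory entries are illustrative assumptions, not a description of Rice’s implementation.

```python
import re
from typing import List

# Deliberately loose patterns; a production system would need stricter rules.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^[\d\s().+-]{7,}$")


def classify_query(query: str) -> str:
    """Guess whether a query is an e-mail address, a phone number, or a name."""
    q = query.strip()
    if EMAIL_RE.match(q):
        return "email"
    if PHONE_RE.match(q) and sum(ch.isdigit() for ch in q) >= 7:
        return "phone"
    return "name"


def match_partial_name(fragment: str, names: List[str]) -> List[str]:
    """Return names in which any word starts with the typed fragment."""
    frag = fragment.strip().lower()
    return [
        name for name in names
        if any(part.lower().startswith(frag) for part in name.split())
    ]


# Hypothetical directory entries for illustration.
directory = ["Jane Doe", "John Smith"]
matches = match_partial_name("smi", directory)
```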

Sharing the Wealth

By all accounts, the search engine application has met and exceeded expectations, yet IT gets requests for enhancements and customizations every day, so the development team still considers it a work in progress. O’Connor says the school may consider “packaging the technology up and licensing it to other universities for a fee or free/open source, depending on what it will take to polish the release for more generalized use,” but has yet to make any moves in that direction. “We’re talking about it, based on all of the interest we’ve seen from other institutions,” he confides.

To schools looking to emulate Rice’s success at developing an in-house search engine application, Frey’s best advice is to look closely at all of the alternatives available on the market and choose only those that make sense for your school’s situation and that accommodate its future plans. “Google certainly dominates the market,” says Frey, “and for good reason, but there are many new options out there, and a lot more flexibility than there was five years ago.”

O’Connor adds that college IT departments that shy away from developing their own in-house applications for any system could be missing the boat, particularly when it comes to having truly customized technology options at a low cost. “Before you start shopping around for vendors,” says O’Connor, “it’s really worth the time to look at your options and resources, and figure out if the project isn’t something that you can do better yourself.”

Resources

Apache Solr: lucene.apache.org/solr

Bing: bing.com

Django: djangoproject.com

Google Ajax API: code.google.com/apis/ajaxsearch

Google Search Appliance: google.com/enterprise/search/gsa.html

jQuery: jquery.com

Python: python.org

Rice University search engine application: search.rice.edu

Yahoo BOSS: developer.yahoo.com/search/boss
