Carnegie Mellon Algorithm Uncovers Online Social Site Fraudsters -- Campus Technology

By Dian Schaffhauser
09/14/16

A research team at Carnegie Mellon University has developed open source software that will allow social media sites to identify fraudulent accounts, reviews and followers. In an experiment using Twitter data, Fraudar, as it's called, successfully detected numerous fake accounts with tweets showing they used "follower-buying" services. These accounts had gone undetected by the social networking service for the seven years since the data was originally collected.

The basic idea of Fraudar is to "see through camouflage" set up to make fake traffickers look legitimate, according to Christos Faloutsos, a professor of machine learning and computer science and principal investigator for the project.

Faloutsos has long been researching how to identify fraudulent online activity, particularly involving online reviews. "They influence our decisions over an extremely wide spectrum of daily and professional activities: e.g., where to eat, where to stay, which products to purchase, which doctors to see, which books to read, which universities to attend and so on. However, the credibility and trustworthiness of online reviews are at stake," explained an abstract submitted to the National Science Foundation, which has funded the work.

A previous initiative led to the development of "NetProbe," a "fast and scalable system" for detecting fraud on online auction websites such as eBay.

As explained in a recent article on the university website, Fraudar works by identifying a "bipartite core." A bipartite graph is a way of diagramming paired sets of data wherein no node from one set is connected to any other node in the same set; the connections only go from a node in one set to a node in the other set.

In the case of fraud detection, each node represents a user, and the transactions between users are shown as lines, or "edges." A bipartite core is a group of users who have transactions with members of a second group but no transactions with each other. The existence of the core "suggests a group of fraudsters, whose only purpose is to inflate the reputations of others by following them, by having fake interactions with them or by posting flattering or unflattering reviews of products and businesses," as the article noted. The fraudsters try to look normal by linking their fraudulent accounts to "popular sites or celebrities," or they exploit "legitimate user accounts they have hijacked."
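To make the idea of a bipartite core concrete, here is a minimal Python sketch. It is an illustration only, not code from the Fraudar project: the account names, the edge list and the helper functions `block_density` and `is_bipartite_core` are all hypothetical. It stores each "follows" relationship as a (follower, followee) pair and measures how completely one group of followers is connected to one group of followees.

```python
from itertools import product

# Hypothetical follower -> followee edges (illustrative data only).
edges = {
    ("bot1", "shopA"), ("bot1", "shopB"),
    ("bot2", "shopA"), ("bot2", "shopB"),
    ("bot3", "shopA"), ("bot3", "shopB"),
    ("alice", "shopA"), ("alice", "celebrity"),
    ("bob", "celebrity"),
}


def block_density(edges, followers, followees):
    """Fraction of possible follower -> followee pairs that are real edges."""
    possible = len(followers) * len(followees)
    if possible == 0:
        return 0.0
    present = sum((u, v) in edges for u, v in product(followers, followees))
    return present / possible


def is_bipartite_core(edges, followers, followees):
    """True when every follower in the group links to every followee."""
    return block_density(edges, followers, followees) == 1.0


bots = {"bot1", "bot2", "bot3"}
shops = {"shopA", "shopB"}
print(is_bipartite_core(edges, bots, shops))  # every bot follows every shop
print(block_density(edges, {"alice", "bob"}, {"shopA", "celebrity"}))
```

In this toy data the three bot accounts follow both shops, so they form a complete core, while the two ordinary users cover only three of their four possible links.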

Fraudar cuts away at the camouflage by first identifying and eliminating the legitimate accounts — those that follow random people or post only occasional reviews or show other signs of normal activity. What's left more readily exposes the bipartite cores.
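The peeling step described above can be sketched in a few lines of Python. This is a simplified stand-in, not the team's released code: it uses plain edge counts as the measure of suspiciousness, and the function name `greedy_peel`, the account names and the data are hypothetical. The sketch repeatedly removes the least-connected account and remembers the point at which the remaining accounts had the most edges per account.

```python
def greedy_peel(edges):
    """Peel low-degree nodes; return (best_density, followers, followees)."""
    # Tag each node with its side so a follower and a followee never collide.
    adj = {}
    for u, v in edges:
        adj.setdefault(("L", u), set()).add(("R", v))
        adj.setdefault(("R", v), set()).add(("L", u))
    n_edges = len(set(edges))

    best_density, best_nodes = 0.0, set()
    while adj:
        # Density here is simply edges per remaining node (an assumption).
        density = n_edges / len(adj)
        if density > best_density:
            best_density, best_nodes = density, set(adj)
        # Remove the node with the fewest remaining edges (ties by name).
        node = min(adj, key=lambda n: (len(adj[n]), n))
        for neighbor in adj.pop(node):
            adj[neighbor].discard(node)
            n_edges -= 1

    followers = {name for side, name in best_nodes if side == "L"}
    followees = {name for side, name in best_nodes if side == "R"}
    return best_density, followers, followees


# Hypothetical data: five bots follow four shops; two bots also follow a
# celebrity as camouflage; three ordinary users follow a few accounts.
bots = [f"bot{i}" for i in range(1, 6)]
shops = [f"shop{i}" for i in range(1, 5)]
edges = [(b, s) for b in bots for s in shops]
edges += [("bot1", "celebrity"), ("bot2", "celebrity")]
edges += [("alice", "celebrity"), ("alice", "shop1"),
          ("bob", "celebrity"), ("carol", "cafe")]

density, followers, followees = greedy_peel(edges)
print(sorted(followers), sorted(followees))
```

On this toy data the ordinary users and the celebrity are peeled away first, leaving the five bot accounts and the four shops they promote as the densest remaining block.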

To test the Fraudar algorithm, the research team turned to a Twitter database extracted in 2009 for research. The technology identified 4,000 accounts that appeared "highly suspicious."

Then the team randomly chose 125 followers and 125 "followees" from the suspicious group as well as two control groups of 100 users who hadn't been identified by the algorithm. Each user's account was examined for links associated with malware or scams or clear "bot-like behavior," such as replying to large numbers of tweets with identical messages. The researchers found that 57 percent of the followers and 40 percent of the followees in the suspicious group were labeled as fraudulent, compared to 12 percent and 25 percent in the control groups.

"We're not identifying anything criminal here, but these sorts of frauds can undermine people's faith in online reviews and behaviors," Faloutsos said. He added that social media platforms do their best to "flush out such fakery." However, the highly scaled approach offered by Fraudar could be useful, he said, in keeping up with the latest practices of fraudsters. "We hope that by making this code available as open source, social media platforms can put it to good use."

The algorithm is available at andrew.cmu.edu. The paper that describes the work, "Fraudar: Bounding Graph Fraud in the Face of Camouflage," is available on the Carnegie Mellon website.

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.