Carnegie Mellon: Signals from Twitter Mimic Traditional Public Opinion Polls

Twitter could become a new mechanism for pollsters to gauge public opinion as natural language processing improves, according to research from Carnegie Mellon University. A team from the university's School of Computer Science applied simple text analysis to a billion microblog messages posted to Twitter during 2008 and 2009 (posts averaging 11 words each) to identify messages about the economy or politics and then to find words within the text that indicated positive or negative sentiment.
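
As a rough illustration of the kind of word counting described above, here is a minimal Python sketch. It is not the researchers' code: the word lists, the topic keywords, and the function names are hypothetical placeholders, and the choice to score a day's messages as a ratio of positive to negative matches is an assumption made for the example.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
# A real analysis would rely on a much larger sentiment lexicon.
POSITIVE = {"good", "great", "hope", "better", "confident"}
NEGATIVE = {"bad", "worse", "fear", "lost", "worried"}

# Hypothetical keywords used to pick out messages about one topic.
TOPIC_KEYWORDS = {"economy", "job", "jobs"}


def tokenize(text):
    """Lowercase a message and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_ratio(messages, keywords=TOPIC_KEYWORDS):
    """Return (# on-topic messages with a positive word) divided by
    (# on-topic messages with a negative word), or None if no
    on-topic message contains a negative word."""
    positive_count = 0
    negative_count = 0
    for message in messages:
        words = set(tokenize(message))
        if not words & keywords:
            continue  # skip messages that are not about the topic
        if words & POSITIVE:
            positive_count += 1
        if words & NEGATIVE:
            negative_count += 1
    if negative_count == 0:
        return None
    return positive_count / negative_count
```

Computing this ratio once per day over that day's messages would yield a daily time series that can then be compared with poll results.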

The sentiment measured by computer analysis was fairly similar to the results of well-established public opinion polls, such as the Index of Consumer Sentiment (ICS) from the Reuters/University of Michigan Surveys of Consumers, Pollster.com, and the Gallup Organization's Economic Confidence Index.

"The findings suggest that analyzing the text found in streams of tweets could become a cheap, rapid means of gauging public opinion on at least some subjects," the university reported in a statement.

The opinion measurements derived from Twitter were much more volatile from day to day than the polling data. But when the researchers "smoothed" the results by averaging them over a period of days, the results often correlated closely with the polling data, said Brendan O'Connor, a graduate student in Carnegie Mellon's Language Technologies Institute and one of the authors of the study. For example, consumer confidence as measured from Twitter followed the same general slide through 2008 and the same rebound in February/March of 2009 seen in the poll data. The researchers said the ICS and Gallup data had a correlation of 86 percent over the period; the Twitter-derived sentiment measure had between 72 percent and 79 percent correlation with the Gallup data, depending on the number of days averaged to smooth the data.
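
The smoothing and comparison steps can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the study's method: it uses a plain trailing moving average and the standard Pearson correlation coefficient, and the function names are hypothetical.

```python
from math import sqrt


def moving_average(values, window):
    """Trailing moving average: each output is the mean of the current
    value and the (window - 1) values before it. The result is
    (window - 1) items shorter than the input."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            running_total -= values[i - window]  # drop the oldest value
        if i >= window - 1:
            smoothed.append(running_total / window)
    return smoothed


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)
```

Because the smoothed series starts `window - 1` days late, a daily poll series would need to be trimmed to match before calling `pearson`, for example with `poll[window - 1:]`. Trying several window sizes is one way to see how the correlation depends on the number of days averaged.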

"With 7 million or more messages being tweeted each day, this data stream potentially allows us to take the temperature of the population very quickly," said Noah Smith, assistant professor of language technologies and machine learning in the School of Computer Science. "The results are noisy, as are the results of polls. Opinion pollsters have learned to compensate for these distortions, while we're still trying to identify and understand the noise in our data. Given that, I'm excited that we get any signal at all from social media that correlates with the polls."

"The Web is so mainstream now that there's no question that the Web is representative somehow of the population," O'Connor said. But pinning down Web demographics is still difficult, he noted, pointing out that Twitter traffic alone increased by a factor of 50 during the two-year span of the study.

Improved natural language processing tools, as well as query-driven analysis and use of demographic and time stamp data available on some social media sites, could increase the sophistication and reliability of microblog analysis, the researchers reported.

A paper on the topic, "From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series," will be presented at the International Conference on Weblogs and Social Media in Washington, DC, in late May.

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.
