Carnegie Mellon Stanford Researchers Devise Method To Share Password Data Safely -- Campus Technology

Breaking News

Carnegie Mellon Stanford Researchers Devise Method To Share Password Data Safely

By Dian Schaffhauser
02/23/16

An unfortunate reality for cybersecurity researchers is that real-world data for their research too often comes via a security breach. Now computer scientists have devised a way to let organizations share statistics about their users' passwords without putting those same customers at risk of being hacked.

The work at Carnegie Mellon University (CMU) and Stanford University, part of an emerging field on rigorous human authentication, persuaded Yahoo! to publicly share password frequency statistics for about 70 million of its users.

"This is the first time a major company has released frequency information on user passwords," said Anupam Datta, associate professor of computer science and electrical and computer engineering at CMU. "It's the kind of information that legitimate researchers can use to assess the impact of a security breach and to make informed decisions about password defenses. This is extremely valuable, so we hope other organizations will follow Yahoo's lead."

The researchers are presenting their method on Wednesday at the Network and Distributed System Security Symposium in San Diego. Their method distorts numbers in the dataset so the list is "differentially private," a precise mathematical definition that guarantees the released statistics don't reveal whether any specific individual's password is included in the dataset.

The information at issue isn't actual passwords or user IDs, but password frequency lists — the number of times passwords are selected by a group of users. In a simplified case involving 10 users, if eight users select "123456" as a password, and two users select "abc123," the frequency list would be (8,2).

Password frequency lists for large user groups can be analyzed to help organizations set authentication policies that balance security with usability, or to predict which user accounts are most vulnerable, said Jeremiah Blocki, a post-doctoral researcher at Microsoft Research who began this study while a post-doc at Carnegie Mellon.

But getting access to frequency lists is difficult because of the potential for misuse. Alone, frequency lists don't help hackers identify individual passwords, Blocki said, but they could potentially provide important clues if cross-referenced to other databases. For instance, in the earlier example, if an adversary knew the passwords for nine of 10 users, it would be child's play to figure out the 10th password knowing that the frequency was (8,2).

Most companies are reluctant to provide access to their frequency lists, so researchers make do with data that has been inadvertently released, such as the 32 million user accounts of the defunct RockYou social app site, which suffered a data breach in 2009.

Several years ago, Joseph Bonneau, a Stanford post-doctoral researcher and a technology fellow with the Electronic Frontier Foundation, obtained samples of password frequency from Yahoo. He was able to publish some aggregate statistics, but Yahoo wouldn't let him publicly share the raw data because of potential privacy concerns.

"Here was this data that was incredibly useful to people like me, but we couldn't get access to it," Blocki said.

So Blocki, Datta and Bonneau created a new algorithm to add just enough distortion to the frequency lists to make them useless to hackers, but still enable researchers to see the high-level patterns they seek in the data.

Their algorithm is based on a powerful differentially private tool called the exponential mechanism, which introduces minimal distortion but is not computationally efficient in general. By exploiting the inherent mathematical structure of a password frequency list, the researchers were able to develop a computationally efficient version of the exponential mechanism tailored to the lists.

"With our new approach, we can provide precise guarantees about privacy," Bonneau said. "I hope this convinces more organizations to share data publicly about passwords and potentially other data that might be useful for security."

Blocki said getting additional organizations to release password frequency lists would enable researchers to explore the impact of differing password policies. The method also might be extended to social networks — enabling the study of degree distribution lists that track the number of friends users have — and to more complicated data structures.

This research was supported by the National Science Foundation, the Air Force Office of Scientific Research, the Simons Institute for the Theory of Computing and the Open Technology Fund.

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.

E-Mail this page

Printable Format

Featured

UC Berkeley Announces Sky-T1-32B Open Source AI Model, Offering High Performance at a Fraction of the Cost

UC Berkeley researchers have introduced Sky-T1-32B, an open source, reasoning-focused language model that delivers high performance at an unprecedented cost of under $450 for training.
From the Kuali Days 2025 Conference: A CEO's View of Planning for AI

How can a company serving higher education navigate the changes AI brings to ed tech? What will customers expect? CT talks with Kuali CEO Joel Dehlin, who shared his company's AI strategies with attendees at Kuali Days 2025 in Anaheim.
AWS Updates AI Offerings with Amazon Nova Premier, Llama 4, Anonymous User Q Business Chatbots

Amazon Web Services (AWS) has made a number of AI moves to maintain its position alongside fellow cloud giants Microsoft and Google.
Report: 85% of Organizations Are Using Some Form of AI

Eighty-five percent of organizations today are leveraging some form of AI, according to the latest State of AI in the Cloud 2025 report from Wiz. While AI's role in innovation and disruption continues to expand, security vulnerabilities and governance challenges remain pressing concerns.