Carnegie Mellon: Put Lawyers To Work Automating Privacy Compliance in Big Data Systems

Turning lawyers into programmers may not seem like the most obvious way to ensure that big data systems comply with an organization's privacy policies. But that's one of the outcomes of a recent project by a team of researchers at Carnegie Mellon and Microsoft Research.

The researchers set out to replace the tedious manual work of safeguarding user data in large Web services, such as those run by Facebook, Google and Microsoft, with an automated system. The project used Microsoft's search engine Bing as a test case.

The problem is a major one, said lead student researcher Shayak Sen, a Ph.D. candidate in computer science who interned at Microsoft Research India. "Tens of millions of lines of code are already in the pipeline," he noted. "And during our implementation on Bing, we found that more than 20 percent of the code was changing on a daily basis." Without automation, there's no way to keep up with the verification of compliance.

That's where the lawyers come in. The researchers found that those who develop privacy policies within an organization (often lawyers) don't typically speak the same language as the software developers. So the students developed a language, Legalease, that is simple enough to be used by non-programmers who understand the technicalities of the privacy policies. Legalease enforces syntactic restrictions so that encoded policy clauses are structured similarly to the policy text that defines how user data may be handled.

In usability testing, 12 Microsoft employees were given a one-page document explaining Legalease and spent an average of under five minutes studying the directions. It took them an average of under 15 minutes to program nine Bing policy clauses laying out how user information could be used. "They were able to perform this task with a high degree of accuracy, which is encouraging," said Sen.

But the research didn't end there. As their report, "Bootstrapping Privacy Compliance in Big Data Systems," describes, what the researchers actually developed was a workflow for privacy compliance that targets large codebases written in languages that support the Map-Reduce programming model. That workflow uses Legalease, along with a self-bootstrapping data inventory mapper developed by Microsoft Research that ties low-level data types in the code to the high-level policy concepts. The mapper, called Grok, was deployed by Bing a year before the research began to automate policy compliance, but at that time the developers found writing policies for Grok too cumbersome.
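To make the division of labor concrete, here is a minimal Python sketch of the two pieces the workflow combines: an inventory that labels low-level columns with high-level policy concepts, and a policy made of allow/deny clauses with exceptions. It is an illustration only, not Legalease syntax or Grok's actual interface. The column names, concept names, purposes, the `Clause` class and the `violations` helper are all hypothetical, as is the choice to ignore unlabeled columns and to permit anything no clause covers.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical inventory: low-level column names -> high-level policy concepts.
INVENTORY = {
    "client_ip": "IPAddress",
    "query_text": "SearchQuery",
}

@dataclass
class Clause:
    """A toy policy clause: an effect, what it covers, and nested exceptions."""
    effect: str                       # "ALLOW" or "DENY"
    data_type: Optional[str] = None   # None matches any data type
    purpose: Optional[str] = None     # None matches any purpose
    exceptions: List["Clause"] = field(default_factory=list)

    def decide(self, data_type: str, purpose: str) -> Optional[str]:
        """Return "ALLOW" or "DENY", or None if this clause does not apply."""
        if self.data_type not in (None, data_type):
            return None
        if self.purpose not in (None, purpose):
            return None
        # An applicable exception overrides the clause that contains it.
        for exc in self.exceptions:
            verdict = exc.decide(data_type, purpose)
            if verdict is not None:
                return verdict
        return self.effect

def violations(policy: Clause, columns: List[str], purpose: str) -> List[str]:
    """List the columns a job reads that the policy denies for this purpose."""
    found = []
    for col in columns:
        concept = INVENTORY.get(col)
        if concept is None:
            continue  # assumption: unlabeled columns are ignored in this sketch
        if policy.decide(concept, purpose) == "DENY":
            found.append(col)
    return found

# Hypothetical policy: deny all use of IP addresses, except for abuse detection.
POLICY = Clause(
    "DENY",
    data_type="IPAddress",
    exceptions=[Clause("ALLOW", data_type="IPAddress", purpose="AbuseDetection")],
)

if __name__ == "__main__":
    print(violations(POLICY, ["client_ip", "query_text"], "Advertising"))
    print(violations(POLICY, ["client_ip"], "AbuseDetection"))
```

In this sketch the policy author only ever names concepts such as `IPAddress`, while the inventory carries the knowledge of which columns hold them, which mirrors the separation between privacy teams and developers that the article describes.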

"Legalease was the final piece of the automated privacy compliance jigsaw puzzle," said Anupam Datta, associate professor of computer science and electrical and computer engineering and co-author. "Legalease bridged privacy teams with Grok, and through Grok, with the developers."

Datta emphasized that automating the process of compliance checks could push the industry to adopt stronger privacy protection policies. "Sometimes, companies want to make their policies stronger, but hesitate because they are not sure they can ensure compliance in these large systems," he added.

The work was presented at the 35th IEEE Symposium on Security & Privacy, May 18-21, in San Jose, CA, where it won a Google award for the best student paper.

This work was supported, in part, by the Air Force Office of Scientific Research and the National Science Foundation.

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.
