Carnegie Mellon: Put Lawyers To Work Automating Privacy Compliance in Big Data Systems

Turning lawyers into programmers may not seem like the most obvious way to ensure that big data systems comply with an organization's privacy policies. But that is one of the outcomes of a recent project by a team of researchers at Carnegie Mellon University and Microsoft Research.

The researchers set out to replace the tedious manual work of safeguarding user data in large Web services, such as those run by Facebook, Google and Microsoft, with an automated system. The project used Microsoft's Bing search engine as its test case.

The problem is a major one, said lead student researcher Shayak Sen, a Ph.D. candidate in computer science who interned at Microsoft Research India. "Tens of millions of lines of code are already in the pipeline," he noted. "And during our implementation on Bing, we found that more than 20 percent of the code was changing on a daily basis." Without automation, compliance checks cannot keep pace with that rate of change.

That's where the lawyers come in. The researchers found that the people who write an organization's privacy policies (often lawyers) typically don't speak the same language as its software developers. So the students developed Legalease, a language simple enough for non-programmers who understand the technical details of the privacy policies. Legalease imposes syntactic restrictions so that each encoded policy clause mirrors the structure of the policy text that specifies how user data may be handled.
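To make the idea of a clause that "mirrors the policy text" more concrete, here is a minimal Python sketch. The clause shape it assumes (an allow or deny verdict, attribute restrictions, and nested exceptions that override their parent) and every name in it, such as `Clause`, `decide`, `data_type`, `purpose` and `treatment`, are illustrative assumptions, not Legalease's actual syntax or semantics.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Clause:
    """Hypothetical layered clause (not Legalease's real grammar): an allow/deny
    verdict that applies when every restricted attribute matches, refined by
    nested exceptions that override it."""
    allow: bool
    # Attribute name -> values this clause covers; unlisted attributes match anything.
    restrictions: dict[str, set[str]] = field(default_factory=dict)
    exceptions: list[Clause] = field(default_factory=list)

    def covers(self, use: dict[str, str]) -> bool:
        return all(use.get(attr) in values for attr, values in self.restrictions.items())

    def decide(self, use: dict[str, str]) -> bool | None:
        """Return True (allowed) or False (denied), or None if the clause does not apply."""
        if not self.covers(use):
            return None
        # The first applicable exception overrides this clause's own verdict.
        for exception in self.exceptions:
            verdict = exception.decide(use)
            if verdict is not None:
                return verdict
        return self.allow


# "Allow by default, except deny IP addresses and search queries for advertising,
# except when the data has been anonymized."
policy = Clause(allow=True, exceptions=[
    Clause(
        allow=False,
        restrictions={"data_type": {"IPAddress", "SearchQuery"}, "purpose": {"Advertising"}},
        exceptions=[Clause(allow=True, restrictions={"treatment": {"anonymized"}})],
    ),
])

print(policy.decide({"data_type": "SearchQuery", "purpose": "Advertising", "treatment": "raw"}))         # False
print(policy.decide({"data_type": "SearchQuery", "purpose": "Advertising", "treatment": "anonymized"}))  # True
print(policy.decide({"data_type": "IPAddress", "purpose": "Analytics", "treatment": "raw"}))            # True
```

The nesting is what keeps the encoded policy close to prose: each level reads as "except ..." attached to the sentence above it, which is the kind of correspondence a non-programmer can check by eye.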

In usability testing, 12 Microsoft employees were given a one-page document explaining Legalease and spent on average less than five minutes studying it. They then took on average less than 15 minutes to encode nine Bing policy clauses specifying how user information could be used. "They were able to perform this task with a high degree of accuracy, which is encouraging," said Sen.

But the research didn't end there. As their paper, "Bootstrapping Privacy Compliance in Big Data Systems," describes, what the researchers actually developed was a privacy compliance workflow that targets large codebases written in languages that support the MapReduce programming model. That workflow pairs Legalease with Grok, a self-bootstrapping data inventory mapper developed by Microsoft Research that maps low-level data types in the code to high-level policy concepts. Bing had deployed Grok to automate policy compliance a year before the research began, but at that time its developers found writing policies for Grok too cumbersome.
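The core check in such a workflow can be sketched in a few lines of Python: seed some datasets with policy-level labels, push those labels through MapReduce-style jobs until nothing changes, then flag any job whose declared purpose conflicts with a label on its inputs. The dataset names, job list, labels and deny rule below are hypothetical, and how Grok actually infers and tracks labels is not modeled here.

```python
from collections import defaultdict

# Hypothetical seed labels: policy-level data types known to appear in each raw dataset.
seed_labels = {
    "raw_logs": {"IPAddress", "SearchQuery"},
    "ad_catalog": set(),
}

# Hypothetical jobs: (job name, input datasets, output datasets, declared purpose).
jobs = [
    ("sessionize", ["raw_logs"], ["sessions"], "Analytics"),
    ("ad_features", ["sessions", "ad_catalog"], ["ad_training"], "Advertising"),
]

# Hypothetical policy: (data type, purpose) pairs that are not allowed.
denied = {("IPAddress", "Advertising")}


def propagate(seed, jobs):
    """Copy every input label onto every output until a fixed point is reached."""
    labels = defaultdict(set, {name: set(types) for name, types in seed.items()})
    changed = True
    while changed:  # repeat so labels flow through multi-step pipelines
        changed = False
        for _, inputs, outputs, _ in jobs:
            incoming = set().union(*(labels[d] for d in inputs))
            for out in outputs:
                if not incoming <= labels[out]:
                    labels[out] |= incoming
                    changed = True
    return labels


def violations(labels, jobs, denied):
    """List (job, input dataset, data type) triples that break a deny rule."""
    found = []
    for name, inputs, _, purpose in jobs:
        for d in inputs:
            for data_type in sorted(labels[d]):
                if (data_type, purpose) in denied:
                    found.append((name, d, data_type))
    return found


labels = propagate(seed_labels, jobs)
print(violations(labels, jobs, denied))  # [('ad_features', 'sessions', 'IPAddress')]
```

Propagating labels at whole-dataset granularity is a deliberately conservative simplification: every output inherits every input label, so the sketch can over-report violations, but it will not miss a flow that its own job graph records.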

"Legalease was the final piece of the automated privacy compliance jigsaw puzzle," said Anupam Datta, associate professor of computer science and electrical and computer engineering and co-author. "Legalease bridged privacy teams with Grok, and through Grok, with the developers."

Datta emphasized that automating compliance checks could push the industry to adopt stronger privacy protection policies. "Sometimes, companies want to make their policies stronger, but hesitate because they are not sure they can ensure compliance in these large systems," he added.

The work was presented at the 35th IEEE Symposium on Security & Privacy, May 18-21, in San Jose, CA, where it won a Google award for the best student paper.

This work was supported, in part, by the Air Force Office of Scientific Research and the National Science Foundation.

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.
