Carnegie Mellon: Put Lawyers To Work Automating Privacy Compliance in Big Data Systems

Turning lawyers into programmers may not seem like the most obvious way to ensure that big data systems comply with an organization's privacy policies. But that's one of the outcomes of a recent project by a team of researchers at Carnegie Mellon and Microsoft Research.

The researchers undertook the challenge of developing an automated system to replace the tedious manual work of safeguarding user data in large Web services, such as those run by Facebook, Google and Microsoft. The project specifically used Microsoft's search engine Bing as a test case.

The problem is a major one, said lead student researcher Shayak Sen, a Ph.D. candidate in computer science who interned at Microsoft Research India. "Tens of millions of lines of code are already in the pipeline," he noted. "And during our implementation on Bing, we found that more than 20 percent of the code was changing on a daily basis." Without automation, there's no way to keep up with the verification of compliance.

That's where the lawyers come in. The researchers found that those who develop privacy policies within an organization (often lawyers) don't typically speak the same language as the software developers. So the team developed a language, Legalease, that is simple enough to be used by non-programmers who understand the technicalities of the privacy policies. Legalease enforces syntactic restrictions to ensure that encoded policy clauses are structured similarly to the policy text defining how user data is allowed to be handled.
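To make the idea of policy clauses that mirror policy text more concrete, here is a minimal Python sketch. It is not Legalease syntax, which the article does not show. The `Clause` class, the "deny, except allow" nesting, and the data type and purpose names are all assumptions chosen for illustration. The sketch encodes a sentence such as "IP addresses may not be used for advertising, except when truncated" as one clause with one exception.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Clause:
    """Hypothetical policy clause: an effect plus nested exceptions."""
    effect: str                      # "ALLOW" or "DENY"
    data_type: str                   # e.g. "IPAddress" or "IPAddress:Truncated"
    purpose: Optional[str] = None    # None matches any purpose
    exceptions: List["Clause"] = field(default_factory=list)

    def matches(self, data_type: str, purpose: str) -> bool:
        # A clause covers its own data type and any refinement of it,
        # written here as "Parent:Child" (an assumed naming scheme).
        same_type = (data_type == self.data_type
                     or data_type.startswith(self.data_type + ":"))
        return same_type and self.purpose in (None, purpose)

    def decide(self, data_type: str, purpose: str) -> Optional[str]:
        """Return "ALLOW" or "DENY" if the clause applies, else None."""
        if not self.matches(data_type, purpose):
            return None
        # The first exception that applies overrides this clause's effect.
        for exception in self.exceptions:
            verdict = exception.decide(data_type, purpose)
            if verdict is not None:
                return verdict
        return self.effect


# "IP addresses may not be used for advertising, except when truncated."
ip_policy = Clause(
    effect="DENY",
    data_type="IPAddress",
    purpose="Advertising",
    exceptions=[Clause(effect="ALLOW", data_type="IPAddress:Truncated")],
)
```

Calling `ip_policy.decide("IPAddress", "Advertising")` yields a denial, while the truncated variant is allowed. The point of the nesting is that each exception sits next to the rule it qualifies, much as it would in the written policy.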

In usability testing, 12 Microsoft employees were given a one-page document explaining Legalease and spent an average of under five minutes studying the directions. It took them an average of under 15 minutes to program nine Bing policy clauses laying out how user information could be used. "They were able to perform this task with a high degree of accuracy, which is encouraging," said Sen.

But the research didn't end there. As their report, "Bootstrapping Privacy Compliance in Big Data Systems," describes, what the researchers actually developed was a workflow for privacy compliance that targets large codebases written in languages that support the Map-Reduce programming model. That workflow uses Legalease along with Grok, a self-bootstrapping data inventory mapper developed by Microsoft Research that ties low-level data types in the code to the high-level policy concepts. Bing had deployed Grok a year before the research began for the purpose of automating policy compliance, but at that time the developers found writing policies for Grok too cumbersome.
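The workflow described above, in which low-level data in the code is tied to high-level policy concepts and then checked against policy, can be sketched in a few lines of Python. This is an illustration only: the article does not describe Grok's internals, and the column names, job descriptions, labels and the `find_violations` helper below are all hypothetical.

```python
# Hypothetical inventory: low-level column names -> high-level policy concepts.
COLUMN_LABELS = {
    "client_ip": "IPAddress",
    "query_text": "SearchQuery",
    "ts": "Timestamp",
}

# Hypothetical policy: (concept, purpose) pairs that are not allowed.
DENIED = {("IPAddress", "Advertising")}

# Hypothetical jobs, each declaring its purpose and the columns it reads.
JOBS = [
    {"name": "ad_targeting", "purpose": "Advertising",
     "reads": ["client_ip", "query_text"]},
    {"name": "abuse_detection", "purpose": "Security",
     "reads": ["client_ip", "ts"]},
]


def find_violations(jobs):
    """Return (job name, column, concept) for each denied use of a labelled column."""
    violations = []
    for job in jobs:
        for column in job["reads"]:
            concept = COLUMN_LABELS.get(column)  # unlabelled columns are skipped
            if concept and (concept, job["purpose"]) in DENIED:
                violations.append((job["name"], column, concept))
    return violations


if __name__ == "__main__":
    for job_name, column, concept in find_violations(JOBS):
        print(f"{job_name}: column '{column}' ({concept}) violates policy")
```

In this toy version the labels are written by hand. The article's description of the mapper as "self-bootstrapping" suggests that in the real system much of that labelling is automated, which is what would make daily re-checking of a fast-changing codebase feasible.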

"Legalease was the final piece of the automated privacy compliance jigsaw puzzle," said Anupam Datta, associate professor of computer science and electrical and computer engineering and co-author. "Legalease bridged privacy teams with Grok, and through Grok, with the developers."

Datta emphasized that automating the process of compliance checks could push the industry to adopt stronger privacy protection policies. "Sometimes, companies want to make their policies stronger, but hesitate because they are not sure they can ensure compliance in these large systems," he added.

The work was presented at the 35th IEEE Symposium on Security & Privacy, May 18-21, in San Jose, CA, where it won a Google award for the best student paper.

This work was supported, in part, by the Air Force Office of Scientific Research and the National Science Foundation.

About the Author

Dian Schaffhauser is a former senior contributing editor for 1105 Media's education publications THE Journal, Campus Technology and Spaces4Learning.
