How will the GDPR impact machine learning?

By Andrew Burt (Source @ O'Reilly),

Much has been made about the potential impact of the EU’s General Data Protection Regulation (GDPR) on data science programs. But there’s perhaps no more important—or uncertain—question than how the regulation will impact machine learning (ML), in particular. Given the recent advancements in ML, and given increasing investments in the field by global organizations, ML is fast becoming the future of enterprise data science.

This article aims to demystify this intersection between ML and the GDPR, focusing on the three biggest questions I’ve received at Immuta about maintaining GDPR-compliant data science and R&D programs. Granted, with an enforcement data of May 25, the GDPR has yet to come into full effect, and a good deal of what we do know about how it will be enforced is either vague or evolving (or both!). But key questions and key challenges have already started to emerge.

1. Does the GDPR prohibit machine learning?

The short answer to this question is that, in practice, ML will not be prohibited in the EU after the GDPR goes into effect. It will, however, involve a significant compliance burden, which I’ll address shortly.

Technically, and misleadingly, however, the answer to this question actually appears to be yes, at least at first blush. The GDPR, as a matter of law, does contain a blanket prohibition on the use of automated decision-making, so long as that decision-making occurs without human intervention and produces significant effects on data subjects. Importantly, the GDPR itself applies to all uses of EU data that could potentially identify a data subject—which, in any data science program using large volumes of data, means that the GDPR will apply to almost all activities (as study after study has illustrated the ability to identify individuals given enough data).

When the GDPR uses the term “automated decision-making,” the regulation is referring to any model that makes a decision without a human being involved in the decision directly. This could include anything from the automated “profiling” of a data subject, like bucketing them into specific groups such as “potential customer” or “40-50 year old males,” to determining whether a loan applicant is directly eligible for a loan.

As a result, one of the first major distinctions the GDPR makes about ML models is whether they are being deployed autonomously, without a human directly in the decision-making loop. If the answer is yes—as, in practice, will be the case in a huge number of ML models—then that use is likely prohibited by default. The Working Party 29, an official EU group involved in drafting and interpreting the GDPR, has said as much, despite the objections of many lawyers and data scientists (including yours truly).

So why is interpreting the GDPR as placing a ban on ML so misleading?

Because there are significant exceptions to the prohibition on the autonomous use of ML—meaning that “prohibition” is way too strong of a word. Once the GDPR goes into effect, data scientists should expect most applications of ML to be achievable—just with a compliance burden they won’t be able to ignore.

Now, a bit more detail on the exceptions to the prohibition.