AI and GDPR - what do you need to know?

by Ed Day

For over a year, GDPR has been the topic of conversation for businesses around the globe. In May 2018, the General Data Protection Regulation (GDPR) went into force as the new single data protection legislation in the European Union. Its remit affects those within Europe and outside - anyone processing and holding the personal data of data subjects residing in the European Union, regardless of a company’s location.

If you are using, or planning to use, artificial intelligence (AI) in your organisation, you need to consider how the GDPR affects the way you use data in your AI. The consequences of getting this wrong can be substantial: a fine of up to 4% of your annual global turnover or €20 million, whichever is higher.

In this article, I consider some of the tensions between the use of AI, Machine Learning (ML) and the GDPR.

AI poses data privacy challenges

Under GDPR, when you collect personal data, you have to say what it will be used for, and not use it for anything else. You cannot collect data simply to do an ML trawl on it. Specific uses of AI may be more acceptable, such as using data to calculate a credit score, but even then, you must take care that the scope of the credit scoring system does not widen. Always try to minimise the data you hold.

Not only are you required to minimise the amount of data you collect and keep - limiting it to what is strictly necessary for your stated purposes - you must also put limits on how long you hold the data. ML, on the other hand, wants as much data as it can get. The more data ML gets, the better it is at spotting patterns, and it is desirable to keep the data as long as possible, since historic patterns can better inform ML decisions. I think these limits will inevitably reduce the performance of some ML systems. It is likely that companies like Amazon, which use ML in a number of systems such as recommendation engines, will need to gain consent from their customers for each of the different uses, which might result in their ML being simplified.
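To make the retention point concrete, here is a minimal sketch of a purge step that keeps a record only while it is inside the retention period for its declared purpose. The purposes, the retention periods and the `Record` structure are all hypothetical, chosen purely for illustration; real values depend on your own stated purposes and legal advice.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods per declared purpose (illustrative only).
RETENTION = {
    "credit_scoring": timedelta(days=365),
    "recommendations": timedelta(days=90),
}


@dataclass
class Record:
    subject_id: str
    purpose: str
    collected_on: date


def purge_expired(records, today):
    """Return only the records still inside the retention period for their purpose."""
    kept = []
    for record in records:
        limit = RETENTION.get(record.purpose)
        # A record with no declared purpose in RETENTION is dropped outright.
        if limit is not None and today - record.collected_on <= limit:
            kept.append(record)
    return kept


records = [
    Record("u1", "credit_scoring", date(2018, 6, 1)),
    Record("u2", "recommendations", date(2018, 6, 1)),
    Record("u3", "marketing_trawl", date(2018, 12, 1)),
]
kept = purge_expired(records, today=date(2018, 12, 31))
print([record.subject_id for record in kept])
```

Only the credit scoring record survives here: the recommendations record has passed its 90-day limit, and the third record was collected for a purpose that was never declared.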

Additionally, once data has been collected, you have to be able to tell people what data you hold on them, and what’s being done with it. You also need to be able to alter or get rid of people’s personal data if requested. So, data needs to be identifiable and accessible at an individual level, and this might mean that some ML systems have to remove data that does not contain individual-level identifiers.
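One way to picture "identifiable and accessible at an individual level" is a store keyed by data subject, so that access, alteration and erasure requests each become a single lookup. The sketch below is an in-memory illustration under that assumption; the class name, method names and sample values are hypothetical, and a real system would also have to reach backups, logs and any trained models.

```python
from collections import defaultdict


class PersonalDataStore:
    """Toy store that keeps every item of personal data under its subject's ID."""

    def __init__(self):
        self._by_subject = defaultdict(list)

    def add(self, subject_id, field, value, purpose):
        self._by_subject[subject_id].append(
            {"field": field, "value": value, "purpose": purpose}
        )

    def export(self, subject_id):
        """Access request: everything held on one person, with its purpose."""
        return list(self._by_subject.get(subject_id, []))

    def rectify(self, subject_id, field, new_value):
        """Alteration request: returns the number of items changed."""
        changed = 0
        for item in self._by_subject.get(subject_id, []):
            if item["field"] == field:
                item["value"] = new_value
                changed += 1
        return changed

    def erase(self, subject_id):
        """Erasure request: returns the number of items removed."""
        return len(self._by_subject.pop(subject_id, []))


store = PersonalDataStore()
store.add("u1", "email", "old@example.com", "credit_scoring")
store.add("u1", "postcode", "CT1 1QU", "credit_scoring")
store.rectify("u1", "email", "new@example.com")
print(store.export("u1"))
erased = store.erase("u1")
print(erased, store.export("u1"))
```

Data that cannot be tied back to a subject ID cannot be served by any of these three methods, which is the practical reason such data may have to go.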

As with any legislation, it is not yet known exactly what each provision of the GDPR means, and some aspects will only become clear once they have been tested in court. Because of GDPR, some US-based websites currently block requests coming from the EU. This is obviously not an ideal solution, and I think sooner rather than later most companies will reduce the amount of data they are collecting. I think this will make ML less effective in some areas. However, a lot of ML, such as image recognition systems, is not necessarily based on customer data, so there will be a number of uses of ML that the GDPR will have little effect on.

Increasing AI transparency with GDPR

There is also the issue of process transparency under the GDPR. Traditional algorithms are rule-based: a smallish set of logical rules is created based on expected inputs and outputs. The decision-making processes for such algorithms can easily be explained: the process is transparent. On the other hand, some ML algorithms create rules on the fly, and so are very difficult to make transparent. This also means the use of some ML algorithms - so-called black-box AI - is problematic, since how the black boxes come to their decisions is impossible to explain.
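The transparency of a rule-based system can be shown in a few lines. The sketch below is a toy credit decision in which every rule is written down and every outcome comes with the list of rules that fired. The rules, point values and threshold are invented for illustration and are not taken from any real scoring system.

```python
# Each rule: (plain-language description, test on the applicant, points awarded).
# All rules and point values are hypothetical.
RULES = [
    ("income below 15000", lambda a: a["income"] < 15000, -2),
    ("missed payments in last year", lambda a: a["missed_payments"] > 0, -3),
    ("more than 3 years at current address", lambda a: a["years_at_address"] > 3, 1),
]


def decide(applicant, threshold=0):
    """Return (approved, score, reasons), where reasons lists every rule that fired."""
    score = 0
    reasons = []
    for description, test, points in RULES:
        if test(applicant):
            score += points
            reasons.append(f"{description} ({points:+d})")
    return score >= threshold, score, reasons


approved, score, reasons = decide(
    {"income": 12000, "missed_payments": 0, "years_at_address": 5}
)
print(approved, score)
for reason in reasons:
    print(" -", reason)
```

The `reasons` list is the explanation: it can be handed to the applicant as it stands. A model that learns its own internal rules offers no equivalent list without extra tooling.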

Some argue this means GDPR prohibits the use of ML. Others argue that the interpretation of GDPR is crucial, and that it is unlikely GDPR would prohibit ML. Again, test cases are necessary to clarify what the courts decide the legislation intends, but I think it highly unlikely that the EU would want to ban ML. It will survive under the GDPR, albeit probably in a less potent form.

Article 9 of the GDPR prohibits the processing of personal data revealing

…racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person's sex life or sexual orientation...

although such data can be used in some cases, for example when explicit consent is given. This type of data can be extremely useful, in fact vital, for some systems such as those in healthcare and pharma where genetic information is particularly important. But this data should be handled extremely carefully, and at the very least informed consent must be obtained to use individuals' data. Indeed, consent should always be clearly obtained when collecting any data, not just personal data.

AI and GDPR compliance: what is the future?

ML conflicts with GDPR in many ways:

purpose limitation - often ML is a trawling exercise
data minimisation - often as much as possible is collected
data retention - many AI systems never get rid of the large amounts of data they collect.

However, despite ML and the GDPR being somewhat at odds with one another, ML can be used to aid in compliance with GDPR, for example by spotting personal data that should not be held. Vast amounts of personal data are stored by large organisations, so automated processes using ML tools such as MinerEye Data Tracker are necessary to identify such data.
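As a much simpler stand-in for the ML tools mentioned above, the sketch below scans free text for two kinds of personal data using regular expressions. This is plain pattern matching, not ML, and it is not how MinerEye Data Tracker works; the patterns are deliberately rough assumptions that will miss many real cases and flag some false positives.

```python
import re

# Rough, illustrative patterns only; real detection needs far broader coverage.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}


def find_personal_data(text):
    """Return (label, matched_text) pairs for every pattern hit in the text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


sample = "Contact jane.doe@example.com or call +44 20 7946 0958 about order 1234."
for label, value in find_personal_data(sample):
    print(label, value)
```

Fixed patterns like these cope with well-structured identifiers but not with names, addresses or free-form notes, which is where learned models earn their place in large organisations.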

To be GDPR compliant, at the very least you need to minimise the amount of data you hold and the time you hold it, make sure it is identifiable and removable, limit what you do with the data, be able to explain what you are doing with it, and obtain clear, unambiguous consent. There is a lot to think about when using AI in terms of data usage, but following these principles means you will have made large strides towards being GDPR compliant.

Have you thought about how to get your AI GDPR compliant? Ed Day is delivering an AI Fundamentals online course with the Innovation Academy on how you can lead your organisation into the fourth industrial revolution.

Ed Day is an expert in AI and Machine Learning with over thirty years’ technical consulting experience. Since 2006, he has been a Senior Lecturer at Canterbury Christ Church University, where he also managed Big Data computing projects combining Machine Learning, Hadoop and Spark technologies. He has worked in a range of industries across both the public and private sectors, from financial institutions including American Express, to the Arizona Department of Corrections.