Risks of big data not clearly identified in GDPR

28 August 2019

The General Data Protection Regulation (GDPR) has applied since May 2018. It was intended to protect the rights and freedoms of individual citizens from the risks of personal data processing. Meanwhile, the phenomenon known as big data has continued to advance at a fast pace. PhD defence on 12 September.

Michiel Rhoen studied Applied Physics, and in his subsequent work as an active-duty fire officer he was involved in the control of technological risks. He went on to study law, graduating in 2013. 'Around this time, the term big data became familiar to the general public, and one aspect of the discussion almost immediately concerned the risks to privacy. This focused mainly on the dominant position that large corporations could acquire because they were able to build profiles of almost everyone', Rhoen explains. 'These profiles are based on an increasing amount of personal data that is collected during daily activities, such as chatting with friends, reading the paper or listening to music. So this is something that affects us several times a day'. At the same time, an initial text was published for what would later become the GDPR. 'I wondered to what extent the GDPR, as it was proposed then, would outline the risks arising from big data and what would be done about them.'

The line separating data from big data

'It's not easy to draw a line separating data from big data. The first concerns about big data arose in the 1970s, when government authorities began to digitise population records. Big data is actually the main result of computerisation. As soon as you use a computer for something, it provides you with data. But at the same time, the computer also generates data about what you are doing; this is known as metadata. The number of computers generating metadata is perhaps far greater than you would expect: when you use the Internet, a mobile phone or a smart TV, for example, you connect to perhaps dozens of other computers, and these all generate data on your activities. Metadata can yield highly effective profiles, since it reveals when, how and even with whom you were active online. And anyone who controls a piece of the Internet can collect metadata. Sometimes this is even obligatory because the authorities want to be able to see who mailed whom and when, for example in case a crime has been committed.'
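To make the point about metadata concrete, the sketch below builds a simple profile from nothing but "who mailed whom and when". It is an illustration only, not taken from Rhoen's research: the records, the names and the `build_profiles` helper are all hypothetical, and no message content appears anywhere in the data.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical metadata records: (sender, recipient, ISO timestamp).
# These are assumptions for illustration; there is no message content.
RECORDS = [
    ("alice", "bob", "2019-08-01T23:10:00"),
    ("alice", "bob", "2019-08-02T23:45:00"),
    ("alice", "clinic", "2019-08-03T09:05:00"),
    ("alice", "bob", "2019-08-04T22:30:00"),
    ("carol", "alice", "2019-08-04T12:00:00"),
]


def build_profiles(records):
    """Summarise, per sender, whom they contact and at which hours."""
    contacts = defaultdict(Counter)
    hours = defaultdict(Counter)
    for sender, recipient, stamp in records:
        contacts[sender][recipient] += 1
        hours[sender][datetime.fromisoformat(stamp).hour] += 1
    return {
        sender: {
            # Most frequently contacted recipient
            "top_contact": contacts[sender].most_common(1)[0][0],
            # Number of messages per recipient
            "contacts": dict(contacts[sender]),
            # Distinct hours of the day with activity, in ascending order
            "active_hours": sorted(hours[sender]),
        }
        for sender in contacts
    }


profiles = build_profiles(RECORDS)
print(profiles["alice"])
```

Even this toy profile shows a closest contact, a habit of late-evening activity and a single message to a clinic, all without reading a word of what was written.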

In his PhD research, Rhoen examines to what extent the GDPR reflects current scientific views on how to deal with technological risks and dominant market positions. 'I looked in particular at situations where consumers agree to the processing of their personal data. Think of the numerous "I agree" buttons you have to click on when visiting a website.' Rhoen compared the GDPR with environmental legislation, which also regulates risks, and with consumer protection laws. 'Together with a researcher from the Faculty of Science of Utrecht University, I investigated whether the GDPR's anti-discrimination provisions are sufficient to prevent, identify and combat the automatic discrimination that can occur when data is processed by algorithms.'

People with certain things in common can often be identified by the data they generate, Rhoen goes on to explain. For example, the owner of a supermarket will be able to see that there are groups of customers who never buy pork, and the owner of an Internet search engine can keep tabs on who is searching for information about flu symptoms or social diseases. 'Even if you're not specifically searching for people of a certain religion or with a certain illness, an algorithm can still identify that one group of people is different from other groups. Of course, there's a reason for having an algorithm: it has to decide, for instance, on the next advertisement that you see online. If people in a certain group are treated unfairly compared to people outside that group, it is possible that those people are being discriminated against.'

The GDPR in principle prohibits the processing of data related to health, religion or sexual orientation. 'But now that data and metadata on almost all your activities are collected, an algorithm, even one that only uses neutral data, can still end up measuring everyone who shares a single sensitive attribute by the same yardstick. This appears to be a fundamental feature of human society, and the GDPR does not take this into account as such. Discrimination, therefore, cannot be consistently prevented or identified.'
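A small sketch can show how a rule that only looks at neutral data may still treat a sensitive group differently. Everything in it is an assumption made for illustration: the customers, the groups "A" and "B", and the discount rule are invented and do not come from the dissertation. The sensitive attribute is stored only so that the outcome can be checked afterwards; the decision rule itself never reads it.

```python
# Hypothetical customers. "group" stands in for a sensitive attribute and is
# used only to audit the outcome; the decision rule below never reads it.
CUSTOMERS = [
    {"id": 1, "buys_pork": False, "group": "A"},
    {"id": 2, "buys_pork": False, "group": "A"},
    {"id": 3, "buys_pork": False, "group": "A"},
    {"id": 4, "buys_pork": True, "group": "A"},
    {"id": 5, "buys_pork": True, "group": "B"},
    {"id": 6, "buys_pork": True, "group": "B"},
    {"id": 7, "buys_pork": True, "group": "B"},
    {"id": 8, "buys_pork": False, "group": "B"},
]


def gets_discount(customer):
    """Decision rule that uses only a 'neutral' purchase attribute."""
    return customer["buys_pork"]


def discount_rate(customers, group):
    """Share of customers in the given group who receive the discount."""
    members = [c for c in customers if c["group"] == group]
    return sum(gets_discount(c) for c in members) / len(members)


rate_a = discount_rate(CUSTOMERS, "A")
rate_b = discount_rate(CUSTOMERS, "B")
print(rate_a, rate_b)  # 0.25 0.75
```

In this invented data set the purchase pattern happens to correlate with group membership, so the two groups receive the discount at very different rates even though the rule never sees the sensitive attribute. Noticing the gap required the attribute to be available for the audit, which illustrates why such effects are hard to identify consistently.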

Underlying aim of GDPR unclear

Although the GDPR was intended to limit the risks to an individual's rights, Rhoen says that he was unable to establish whether the legislature had a clear picture of what those risks are and how to deal with them. 'None of the models of risk and balance of power that I studied was clearly applied in the GDPR. In this respect the GDPR deviates from regulations dealing with consumer or environmental protection. This is remarkable.'

Rhoen expects, therefore, that it will be difficult to establish how well the GDPR is actually functioning: if you don't know exactly what you want to prevent, how can you establish whether you have been successful? 'If in a few years you want to see whether the GDPR should be improved, there is no clear way to determine what needs to be improved. There will, of course, be legal proceedings concerning the GDPR, and a similar problem will arise then. If it is unclear what the underlying aim is, interpreting and applying the rules will be difficult. In legal practice this will unavoidably lead to complicated legal discussions.'

The researcher hopes that his dissertation will contribute to a meaningful discussion on the law on privacy and data protection, 'an important area of law which will continue to develop in the coming years'.

Professor G.J. Zwenne on Michiel Rhoen’s research:

'The GDPR is based on premises that are difficult to reconcile with the developments we are currently witnessing in big data and machine learning. Serious questions exist about the effectiveness and "sell-by date" of the new privacy rules. Michiel's dissertation provides a framework within which these questions can be answered.'

Text: Floris van den Driesche