Leiden: Silicon Valley of FAIR data

If researchers make their data FAIR, computers can link large quantities of data and identify patterns, thus greatly accelerating the process of arriving at new insights. In Leiden, the birthplace of ‘FAIR data’, Professor Barend Mons explains the meaning of this term.

Imagine that a computer programme has access through the internet to all the results of all the medical research in the world. The programme will then be able to detect relationships that no physician in the world could otherwise have detected, simply because the quantity of data involved is more than a human being can process. This could lead to new insights, better diagnoses and new drugs. From the technical point of view, this is already possible, not only for medical science but for all disciplines.

However, before this can happen, the data must first be FAIR: Findable, Accessible, Interoperable and Reusable. Only when all these criteria are met will we come closer to achieving the envisaged future scenario.

Accessibility and privacy

‘Academic publications based on publicly funded Dutch research already have to meet Open Access criteria,’ says Barend Mons, Professor of BioSemantics at LUMC. ‘However, the fact that everyone can read the article doesn’t mean that the underlying research data are findable and accessible for a computer.’ This requires metadata structures: metadata stations that tell the computer programme what kind of data can be found where, such as the medical data on smokers.

This must not, of course, disrupt the balance between data linking and privacy. ‘The metadata stations therefore clearly indicate the level of accessibility: are smokers’ data, for example, accessible to everyone, or do you have to contact the person conducting the research?’
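As a rough illustration of what such a metadata station might hold, the Python sketch below models a few records that state what the data are about, where they can be found and how accessible they are. Every name in it (MetadataRecord, Access, STATION, the dataset labels) is hypothetical and invented for this example; it does not describe any existing FAIR software.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    """Two illustrative access levels, mirroring the smokers example."""
    OPEN = "accessible to everyone"
    ON_REQUEST = "contact the person conducting the research"


@dataclass(frozen=True)
class MetadataRecord:
    topic: str       # what kind of data this is
    location: str    # where the data can be found (placeholder label)
    access: Access   # who may use the data


# Hypothetical station contents; these entries are invented.
STATION = [
    MetadataRecord("smokers", "dataset-A", Access.ON_REQUEST),
    MetadataRecord("smokers", "dataset-B", Access.OPEN),
    MetadataRecord("non-smokers", "dataset-C", Access.OPEN),
]


def find(station, topic):
    """Return every record on the given topic, whatever its access level."""
    return [record for record in station if record.topic == topic]


def directly_usable(records):
    """Keep only the records a programme may read without asking anyone."""
    return [record for record in records if record.access is Access.OPEN]


smoker_records = find(STATION, "smokers")
open_locations = [r.location for r in directly_usable(smoker_records)]
print(open_locations)
```

The point of the sketch is that the programme can find both smoker datasets, while the access level on each record determines which of them it may actually read.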

Finally, the data must be interoperable and reusable by the computer programme. A computer isn’t good at handling ambiguities, such as the abbreviation PSA, for instance, which not only stands for Prostate Specific Antigen but also has more than 100 other meanings. Every possible term in the world would therefore need to be given a unique numerical code that is centrally recognised.
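The idea of unique codes can be sketched in a few lines of Python. The identifiers below ("TERM-0001" and so on), the field names and the resolve function are all made up for illustration; a real system would rely on a centrally maintained registry rather than a small dictionary.

```python
# Hypothetical identifiers, invented for illustration only.
MEANINGS = {
    "PSA": {
        "medicine": ("TERM-0001", "Prostate Specific Antigen"),
        "broadcasting": ("TERM-0002", "Public Service Announcement"),
    },
}


def resolve(abbreviation, field=None):
    """Map an abbreviation to one unique code, or refuse if it is ambiguous."""
    candidates = MEANINGS.get(abbreviation, {})
    if field is not None:
        if field not in candidates:
            raise LookupError(f"{abbreviation!r} is not known in {field!r}")
        return candidates[field][0]
    if len(candidates) == 1:
        # Only one meaning is registered, so no context is needed.
        return next(iter(candidates.values()))[0]
    raise LookupError(f"{abbreviation!r} is ambiguous or unknown without a field")


print(resolve("PSA", "medicine"))
```

Once two datasets both record the code instead of the bare abbreviation, a programme can link them without having to guess which PSA is meant.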

Reward for data sharing

‘It all sounds more complicated than it really is,’ says Mons. ‘The problem is 80% cultural. At the moment, there aren’t enough incentives to share scientific data. Researchers are rewarded for publishing their paper and for their citation/journal impact factor.’ As Mons sees it, the scientific paper itself takes second place. ‘An impact factor needs to be assigned to the data output of research: the researcher will be rewarded if the data are combined with another dataset.’

Mons is the chair of an EU advisory group in this area. He takes the view that it won’t be long before the outlined future scenario gradually becomes reality. ‘From 2017, researchers will only receive funding from the Horizon 2020 programme if they make their data FAIR. Once other funding bodies follow suit, so too will the researchers. But ideally, they themselves will soon see the tremendous advantages of FAIR data and good data stewardship.’

Making research data FAIR is on academic agendas all round the world. However, the concept originated in Leiden, says Mons with some pride. ‘The principles of FAIR data were first formulated during a workshop in the Lorentz Center about two-and-a-half years ago. Now experts in the area of linked data come here from all over the world to implement FAIR data. If the government invests enough, the Netherlands could become a very important FAIR data player, and Leiden would be a kind of Silicon Valley of FAIR data.’

Professor Barend Mons - Social machines & FAIR data

Publication: The FAIR Guiding Principles for scientific data management and stewardship
Dutch Techcentre for Life Sciences (DTLS)
Video: Vision on Open Science

