Universiteit Leiden


10th LCDS meeting: no more data graveyards

How can we make sure data is universally findable, citable and reusable? This question was raised, and answered, during the meeting BioSemantics & FAIR Data, which was organised by the Leiden Centre of Data Science on January 14th.

Methods from the new field of data science can help us understand the mechanisms underlying human genetic diseases. For instance, by keeping track of the frequency of gene-disease associations in the queries that people make, actual associations between diseases and genes can be found. During the meeting, professor Barend Mons and researcher Erik Schultes from the LUMC BioSemantics lab explained how this innovative method works, and how it makes use of nanopublications.
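The counting idea behind this can be illustrated with a minimal sketch. It is not the BioSemantics lab's actual method and does not model nanopublications; the gene list, disease list and query log below are invented for illustration, and matching is done by naive substring search.

```python
from collections import Counter
from itertools import product

# Hypothetical vocabularies and query log, invented for illustration only.
GENES = {"HTT", "CFTR", "BRCA1"}
DISEASES = {"huntington disease", "cystic fibrosis", "breast cancer"}

QUERIES = [
    "HTT huntington disease repeat length",
    "CFTR cystic fibrosis mutation",
    "HTT huntington disease onset",
    "BRCA1 breast cancer risk",
    "CFTR breast cancer",
]


def count_pairs(queries, genes, diseases):
    """Count how often each (gene, disease) pair is mentioned in the same query."""
    counts = Counter()
    for query in queries:
        text = query.lower()
        # Naive substring matching; a real system would need proper entity recognition.
        found_genes = [g for g in genes if g.lower() in text]
        found_diseases = [d for d in diseases if d in text]
        counts.update(product(found_genes, found_diseases))
    return counts


pair_counts = count_pairs(QUERIES, GENES, DISEASES)
for (gene, disease), n in pair_counts.most_common():
    print(gene, disease, n)
```

Pairs that co-occur often become candidates for a real association. A single co-occurrence, such as CFTR with breast cancer in this toy log, would carry little weight on its own.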

Messy data

As professor Jaap Heringa (VU) pointed out, the field of bioinformatics has changed drastically over the past years. ‘Thirty years ago, data was hardly available. Nowadays, the biggest driver of bioinformatics is technology.’ More and more scientists use large amounts of data in their research. This often leads to new insights, but it also causes two recurring problems. Firstly, messy data leads to unreliable outcomes, so data cleansing takes up a lot of researchers' time. Paul Groth from Elsevier Labs put it this way: ‘Sixty percent of research time is spent on data preparation. We are wasting highly educated professionals’ time on this.’ Secondly, as LIACS professor Fons Verbeek pointed out, hard disks full of data are lost or put away after researchers have completed their projects, which results in data graveyards.


In order to provide a smooth research path for PhD students and seasoned researchers, data should be FAIR, the speakers argued. This means that data should be Findable, Accessible, Interoperable, and Reusable for both humans and computers. Thus, data should be well annotated, stored for the long term and ready to be combined with other datasets. This way, large datasets can be built and data from different research groups can be integrated. More information on the FAIR data initiative can be found here.
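As a rough illustration of what "well annotated" could mean for a computer, the sketch below checks a dataset description for one field per FAIR principle. The field names and the example record are assumptions made for this sketch, not an official FAIR checklist or any speaker's proposal.

```python
# Hypothetical mapping from metadata field to the FAIR principle it supports.
REQUIRED_FIELDS = {
    "identifier": "Findable: a persistent identifier",
    "access_url": "Accessible: where the data can be retrieved",
    "format": "Interoperable: a documented, open format",
    "license": "Reusable: clear terms of reuse",
}


def missing_fair_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Placeholder record; every value here is invented for illustration.
record = {
    "identifier": "example-dataset-001",
    "access_url": "https://example.org/datasets/001",
    "format": "text/csv",
    "license": "",
}

for field in missing_fair_fields(record):
    print("missing:", field, "-", REQUIRED_FIELDS[field])
```

A check like this only covers the presence of annotations. Long-term storage and actual compatibility with other datasets need separate safeguards.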


During its monthly meetings, LCDS brings together data scientists and researchers/professionals from other disciplines. Each of the events is centered around a specific topic or field of study. If you would like to stay updated on the events LCDS organizes, please subscribe to our mailing list.