Universiteit Leiden

8th LCDS meeting: data science goes back to its roots

How are data science and statistics related? And what can data scientists learn from statisticians? These were the central questions of the 8th meeting of the Leiden Centre of Data Science (LCDS), which took place on Friday, November 13th. Recent developments in the field of statistics, including new statistical approaches to dealing with large amounts of data, were discussed.

Room B.03 at the Snellius Building was packed. A varied audience of mathematicians, computer scientists, professionals from organisations such as the NFI (Netherlands Forensic Institute) and CBS (Statistics Netherlands), as well as master's students, had come to Leiden’s Bio Science Park to attend the meeting.

After a word of welcome by Jaap van den Herik (chair of the board of LCDS), Jacqueline Meulman (Statistics professor and board member of LCDS) gave an introduction on how the fields of statistics and data science are related. Or, related? In essence, data science is statistics, Meulman pointed out, quoting Biostatistics Professor Karl Broman: ‘When physicists do mathematics, they don’t say they’re doing “number science”. They’re doing math. If you’re analyzing data, you’re doing statistics. You can call it data science or informatics or analytics or whatever, but it’s still statistics.’

Genomic networks

The tone was set. What followed was an interesting talk by Professor Aad van der Vaart, who has recently been awarded a Spinoza Prize for his work on Bayesian statistics. Van der Vaart showed how Bayesian statistics and sparse estimation can be applied in research on genomic networks. This is an important new approach that can be used, for instance, to identify genes that play a role in cancer.

Up next was machine learning expert Tim van Erven, who spoke about online sequential prediction: how to make predictions when data becomes available one piece at a time, instead of all at once. This is used in spam filtering, for instance, in which an algorithm learns how to classify incoming emails. Van Erven was followed by Johannes Schmidt-Hieber, assistant professor at the Mathematical Institute, who gave a presentation on the Bayesian method and how it can be used for high-dimensional data.
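For readers unfamiliar with the online setting, the spam-filtering example can be sketched in a few lines of Python. This is only an illustration, not the method presented in the talk: it assumes a classic mistake-driven perceptron as a stand-in learner, and the function name `online_perceptron` and the toy emails are hypothetical. The key point is the order of operations: the algorithm must predict each label before that label is revealed, and only then may it update.

```python
from collections import defaultdict


def online_perceptron(stream):
    """Process (words, label) pairs one at a time.

    label is +1 for spam and -1 for not spam. The prediction is made
    before the true label is used, which is what makes the setting online.
    """
    weights = defaultdict(float)  # one weight per word, starting at 0
    mistakes = 0
    for words, label in stream:
        unique_words = set(words)
        # Predict using only what has been learned from earlier emails.
        score = sum(weights[w] for w in unique_words)
        prediction = 1 if score > 0 else -1
        # The true label is now revealed; update only after a mistake.
        if prediction != label:
            mistakes += 1
            for w in unique_words:
                weights[w] += label
    return weights, mistakes


# Hypothetical toy stream of emails arriving one by one.
emails = [
    (["win", "money", "now"], 1),
    (["meeting", "agenda"], -1),
    (["win", "prize"], 1),
    (["agenda", "money"], -1),
]
weights, mistakes = online_perceptron(emails)
```

The number of mistakes made along the way, rather than the accuracy on a held-out test set, is the natural performance measure in this setting.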

Criminal behaviour

After a coffee break, two presentations were held on the applied side of the matter. Cor Veenman, specialist at the Netherlands Forensic Institute (NFI), elaborated on the forensic applications of statistics and data science. Data mining methods can be used to discover criminal behaviour in large data sets, with few known target samples and using limited resources, Veenman argued. The final speaker was Nees Jan van Eck, who performs scientometric research at the Centre for Science and Technology Studies (CWTS). With brightly coloured visualisations, Van Eck showed how citation networks are structured, how research is less globalised than we tend to believe, and how data-driven research is much more prevalent in some scientific disciplines than in others.

Afterwards, the attendees had a chance to meet up, have a drink and continue their discussions at the Foobar. There, undoubtedly, the foundations for future contacts and collaborations were laid. ‘This meeting once more showed that statistics is the foundation of data science’, said Jaap van den Herik. ‘It stresses the importance of close collaborations between data scientists and statisticians.’

(JvdB)

During its monthly meetings, LCDS brings together data scientists and researchers and professionals from other disciplines. Each event is centred around a specific topic or field of study. The next meeting, which will take place on December 14th, will be about Linked Open Data. The exact programme and location of the event will be announced later; please subscribe to our mailing list if you would like to stay updated.