Leiden University launches Data Science research programme

Leiden University is investing 4 million euros in a new Data Science research programme. This is a joint initiative of all the faculties, headed by Dean Geert de Snoo at the Faculty of Science. The programme will focus on Leiden scientific data.

Knowledge exchange is the essence

De Snoo: 'The Leiden Data Science programme is important for all our faculties, in particular for domains where scientists are increasingly working with large amounts of data, as in language research, environmental research, medicine, archaeology and biology. Pattern recognition using big data generates new insights: it reveals links between specific data that were not previously recognised.' The essence of the research programme is the exchange of knowledge: it is multidisciplinary. Data scientists from the Faculty of Science work in teams with experts from different disciplines at other faculties. The motto of the programme is 'Learning from one another and growing stronger together.'

Social and Behavioural Sciences

Hanna Swaab, Dean of the Faculty of Social and Behavioural Sciences, is delighted with this initiative: 'The programme is a good match for our Faculty. It fits well within the current research themes and it offers us the chance to work more closely together with researchers from other disciplines and to explore new lines of research with them.' A number of big data projects are currently running in the faculty, focusing on such questions as how we can better predict development risks, how we can evaluate the benefits of science by analysing large datasets, and how we can discover patterns in psychopathology that will help us identify opportunities for new treatment methods for patients.

Fundamental maths approach

The fast-developing field of data science aims to discover patterns in large amounts of data and to convert these patterns into usable information. Leiden University is strong in data science, but the knowledge is currently linked to specific disciplines and is consequently spread throughout the University. Data science in the Faculty of Science focuses on methodology, combining statistics and information science. The faculty is known for its fundamental mathematical approach; Leiden University was one of the initiators of particular standards in Data Science. 

Socially relevant

The new research programme is in line with the diverse societal themes in the National Research Agenda, where big data is one of the pathways: it will play a role in improving healthcare (e-health), preserving our heritage, opening up texts, making energy more sustainable and promoting the effectiveness and acceptance of legislation, jurisprudence and the maintenance of law. 

Digging up new knowledge

But this is not the only reason for pressing ahead, commented Joost Kok, Director of the research programme. 'We have more data than ever at our disposal and big advances are being made with new algorithms in statistics and artificial intelligence. Not only that, with the current status of computing power, we can say that data management has reached maturity. We can now go looking for the knowledge that is hidden in all that data.'  

PhD candidates are the key

The PhD candidates are the key to the research programme. Seven of them will be appointed, each one focusing on the fundamental development of new algorithms that can be applied in the fields of research at Leiden University. The seven faculties will each appoint a PhD candidate who will work on a specific data science project. The next step will be to combine the data from different fields. The PhD candidates will each have two supervisors or co-supervisors, one a specialist in data science, and one from the faculty's research field. The aim is for the programme to have an open structure: besides the PhD candidates, other researchers are very welcome to join in the research and students can work on projects. 

Leiden Centre of Data Science

The research programme is intended to strengthen the Leiden Centre of Data Science (LCDS), which opened in 2014 at Leiden University. LCDS is the network of all Leiden researchers who are involved in big data. With the arrival of the new Data Science Lab, equipped with specialist computers that have enormous computing power, the LCDS will have a physical base. Of the total investment in Leiden Data Science, 2.6 million euros will come from Leiden University's central Innovation Fund, and 1.4 million from the faculties to fund the faculty PhD candidates.


New teaching applications will also be developed as part of the data science programme, and September 2016 will see the start of the Master's in Statistical Science and Computer Science. The aim is also to provide support for the teaching in other programmes. A Small Private Online Course or a Massive Open Online Course will be developed.

Excellent starting position

Kok believes that the time is ripe to start this research programme in Data Science. 'A lot of data has always been produced and amassed in Leiden: in libraries, museums, labs, hospitals and at the Leiden Observatory. These are data in the form of documents and images, gathered through cohort studies or by observations using telescopes. Data are being gathered continuously, in all places and at all times, and this trend is set to continue and even increase. However, that's not the ultimate aim: the goal is to discover the unknown in what we do know.' All the necessary elements are in place: Leiden statistician Aad van der Vaart was awarded the Spinoza Prize in 2015 and Leiden University holds 7th place worldwide in the field of mathematics and computer science & engineering in the CWTS ranking.

Examples of big data projects at Leiden University

Political Science  
Dr Daniela Stockmann studies the mobilisation of people by means of social media under an authoritarian regime, as in China. The data science aspect is in following politically tinted tweets, for example: are they forwarded or retweeted? Users' click behaviour is analysed using specific software. Stockmann also looks at offline behaviour: is there a relation between conduct on social media and 'real' behaviour, for instance in the likelihood someone will take part in a demonstration?

The time when astronomers stared at stars using a telescope through an opening in the roof and made notes is not so very far behind us. However, in the present day enormous telescopes gather mind-boggling amounts of data. Analysing this data using powerful computers generates information on the Universe; on black holes, for example, the domain of Professor Simon Portegies Zwart. He not only conducts astronomical research, but also looks at what data is needed to generate new astronomical knowledge.

Institute for Area Studies 
Professor Hilde de Weerdt at the Faculty of Humanities conducts research on the impact of informal groups and networks on the political process in China. Together with Dr Brent Ho, she developed the MARKUS platform for annotating and visualising data in Chinese texts. MARKUS is used for all kinds of applications in Chinese humanities. 

Barend Mons, Professor of Biosemantics at the LUMC, is engaged in very different work. As a bioinformation specialist, he is working on a standard format for storing data files in open repositories so that they can also be accessed by other users. In medicine, for example, only 12% of the data files are accessible by other scientists. The standard being promoted by Mons is FAIR: findable, accessible, interoperable and reusable. Mons is chairman of an EU working group that focuses on standardisation and that operates under the name of the European Open Science Cloud.