Sifting through data the smart way

We produce more data than ever before, and researchers gather more and more information. That data contains a wealth of insights and new possibilities. But how do you extract them? In Leiden, statistics and information science come together in innovative multidisciplinary research. Read more in the Data Science research dossier.

Delving in data
With our public transport cards, our smartphones and our Google searches, we are continuously producing data. Research institutions generate more data than ever before and the amount of data in the world is growing exponentially. All this big data conceals a wealth of information that data scientists are able to extract. The resulting insights could mean, for example, that Alzheimer’s disease is diagnosed earlier, or that we learn more about the origin of the universe. Data science opens the door to opportunities that could previously only be imagined.

Intelligent computers
Data scientists design algorithms that allow computers to recognise patterns in large quantities of data. They are also working on self-learning computers that can detect even more patterns and carry out tasks using the insights they have gained. With this kind of software a computer can, for example, analyse data from previous legal cases and learn to use this data to issue judgements. Jaap van den Herik, Professor of Computer Science and Law, believes that in 15 years’ time computers will be able to handle simple legal cases without the need for a judge.

No statistics, no data science
To be able to develop software that can recognise patterns, you first need a solid basis in maths and statistics. It is only with statistics that you can show that a pattern you have identified is real and not a coincidence. For decades before data science became a hot topic, Leiden researchers were already studying methods for finding patterns and signals among the noise in large datasets. Leiden is also known for its mathematical, fundamental approach to data science, which combines statistics with information science.

Setting the standard
Data scientists in Leiden also study the conditions that have to be met before data science can be done at all. In organisational terms, for instance, it is something of a challenge for scientists to make their datasets accessible. Leiden is the birthplace of the ‘FAIR data principles’, which state that data should be findable, accessible, interoperable and reusable. It is only once scientists make their data compatible and are able to link it to the data of other researchers that smart computers can start to discover patterns. There are also ethical hurdles to overcome. How can we make sure that all that data and the resulting insights are handled responsibly?

