Tim van Erven
In this era of big data, self-learning algorithms can be of great help. But if they try to learn too fast, they can start hallucinating and find patterns in the data that are not really there. Mathematician Tim van Erven works on developing algorithms which adjust their learning strategies depending on the data to avoid such mistakes.
Making computers learn smarter and faster
More and more people use machine learning to plow through big piles of data, looking for new information. But where do they get the algorithms that run their machine learning software? And how do they know they can trust them? That’s where Van Erven comes in. ‘I develop new sub-algorithms and I work on the mathematical theory that describes how well the algorithms work under different circumstances.’
Machine learning is a branch of artificial intelligence (AI) which is used to analyse data using self-learning algorithms. Instead of writing down all the rules an algorithm needs to follow, you show the algorithm examples of what you want it to do and it learns the rules by itself. Van Erven: ‘You can, for example, show a self-learning algorithm pictures of cats and dogs, and tell it whether each one shows a cat or a dog. If you do that long and well enough, the algorithm will learn how to identify cats and dogs in pictures.’
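The idea of learning rules from labelled examples can be sketched in a few lines of code. This is a toy illustration only, not Van Erven's own algorithms: a nearest-centroid classifier on two made-up numerical features standing in for image pixels.

```python
# Toy illustration of learning from labelled examples: a nearest-centroid
# classifier. The features (e.g. "ear pointiness", "snout length") are
# invented for illustration; real systems learn from raw pixels.

def train(examples):
    """examples: list of (features, label); returns the average (centroid)
    of the features seen for each label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the label whose centroid is closest to the new example."""
    def dist2(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], features))
    return min(centroids, key=dist2)

# Labelled examples: the "show it pictures and tell it the answer" step.
data = [((0.9, 0.2), "cat"), ((0.8, 0.3), "cat"),
        ((0.3, 0.9), "dog"), ((0.2, 0.8), "dog")]
model = train(data)
print(predict(model, (0.85, 0.25)))  # prints "cat"
```

After training, the rules were never written down by hand: the centroids were derived entirely from the labelled examples.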
Van Erven has been working on self-learning algorithms that process data sequentially. ‘This means they work through a large data set one data point at a time’, he explains. ‘For example, if you have a set of 200,000 websites, each with 600,000 characteristics, and you want to teach an algorithm to recognise which are fraudulent, it has to process a few gigabytes of data. That is too much for a computer to do at once. It’s more efficient if the algorithm looks at a single website, learns from it and then moves on to the next one.’
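Sequential processing can be sketched with a classic online learner. The example below uses a simple perceptron (a standard textbook method, not necessarily the algorithms Van Erven studies): the model sees one example, updates its weights, and moves on, so the full data set never has to fit in memory at once.

```python
# Sketch of sequential (online) learning with a perceptron.
# Each example arrives one at a time, as in a stream of websites;
# label +1 stands for "fraudulent", -1 for "legitimate".

def perceptron_step(weights, features, label):
    """One online update: predict, and nudge the weights on a mistake."""
    score = sum(w * x for w, x in zip(weights, features))
    if label * score <= 0:  # wrong (or undecided): move toward this example
        weights = [w + label * x for w, x in zip(weights, features)]
    return weights

# A tiny stream of (features, label) pairs; real features would number
# in the hundreds of thousands per website.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([1.0, 0.2], 1)]

weights = [0.0, 0.0]
for features, label in stream:  # one data point at a time
    weights = perceptron_step(weights, features, label)
```

Only the current example and the weight vector are ever held in memory, which is what makes this approach practical for data sets of many gigabytes.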
However, data sets are never perfect. They always contain some noise, such as a cat picture which is accidentally labelled ‘dog’. Algorithms have to be able to cope with that. ‘If an algorithm draws its conclusions about noise-filled data too soon, it might find patterns that are not really there,’ says Van Erven. ‘Just like we sometimes see figures in the clouds. You do not want your algorithm to start hallucinating patterns.’
Earlier in 2019, Van Erven received a Vidi grant to work on so-called adaptive algorithms, which could solve this problem by adapting to the difficulty of the data. ‘If you have difficult data with a lot of noise, the algorithms should be more careful and learn a bit slower to avoid finding patterns that are not there,’ says Van Erven. When the data is easier, with little noise, the algorithm can learn more quickly. ‘I am working on adaptive algorithms that determine by themselves, on the basis of the data, whether it is difficult or easy and which learning strategy they should apply.’
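The idea of slowing down on difficult data can be illustrated with a much simpler, standard scheme than the adaptive algorithms described here: AdaGrad-style step-size scaling, where noisy data produces large accumulated gradients and the learning rate automatically shrinks.

```python
import math

# AdaGrad-style step-size adaptation (a standard textbook scheme used here
# purely as an illustration, not Van Erven's own algorithms). The noisier
# the gradient stream, the larger the accumulator, and the smaller and
# more careful each learning step becomes.

def adagrad_step(w, grad, accum, base_rate=0.5):
    accum += grad ** 2                   # record how large gradients have been
    rate = base_rate / math.sqrt(accum)  # more noise -> smaller steps
    return w - rate * grad, accum

w, accum = 0.0, 0.0
for g in [1.0, -0.8, 1.2]:  # a small, noisy stream of gradients
    w, accum = adagrad_step(w, g, accum)
```

Here the "difficulty" of the data is measured on the fly by the accumulated squared gradients, so the same code is cautious on noisy streams and takes larger steps when the gradients are small and consistent.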
Van Erven is also part of the statistical research group at Leiden. ‘I like to focus on the mathematical side of self-learning algorithms, proving their characteristics. Machine learning is used for more and more advanced applications, making the algorithms more and more complex. It’s important to understand the components of those complex systems and know how and when they work best. A very powerful way to study that is using mathematics.’
When he was 16 years old, Tim van Erven (Eindhoven, 1982) read about Artificial Intelligence being able to solve a maze. ‘I still think it’s magical that a computer program, which always follows strict rules, can perform complex and intelligent behaviour if you program it to learn.’ It’s no surprise he studied AI at the University of Amsterdam. He then did a PhD at Centrum Wiskunde & Informatica (CWI), followed by a postdoc at the Université Paris-Sud, and is now an assistant professor at Leiden University.