Smarter hypothesis testing with statistics: how e-values can improve scientific research

13 June 2025 | Text: Manon Boot

During his PhD research, mathematician Tyron Lardy worked on a new approach to hypothesis testing. Instead of the traditional p-value, he uses so-called e-values. These turn out to be more flexible – especially when you want to look at your results midway through the study.

Imagine this: you repeatedly bet an amount of your choice on a fair coin, which has an equal chance of landing heads or tails. If it lands on heads, you get back double your stake; if it lands on tails, you lose the stake. On average, you expect to break even – it’s a fair bet. You start with €1 and, each round, bet everything you have. If you happen to get heads eight times in a row, you’ll end up with €256 and might start to wonder: is this coin really fair? That is the concept behind e-values: they help you assess whether an assumption still holds.
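The arithmetic of the coin example can be checked with a few lines of Python. This is only a minimal sketch: the function name `betting_wealth` and its parameters are made up for illustration and do not come from Lardy's work.

```python
def betting_wealth(flips, stake_fraction=1.0, start=1.0):
    """Wealth after betting a fixed fraction of current wealth on heads.

    Heads ("H") pays back double the stake; tails ("T") loses the stake.
    All names here are hypothetical, chosen for this illustration.
    """
    wealth = start
    for flip in flips:
        stake = stake_fraction * wealth
        wealth += stake if flip == "H" else -stake
    return wealth


# Start with 1 euro and bet everything on eight heads in a row.
print(betting_wealth("HHHHHHHH"))  # 256.0
```

A single tails wipes out an all-in bettor, which is why betting only a fraction of your money each round (`stake_fraction` below 1) is the more careful strategy.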

The e-value (where ‘e’ stands for expected value) offers an alternative to the p-value (with ‘p’ standing for probability), which researchers traditionally use to test their hypotheses. The p-value comes with a major limitation: in principle, you’re only supposed to draw conclusions once you’ve collected all your data. If you later decide to add more measurements, your statistical analysis is no longer valid. ‘A lot of researchers still do it anyway, especially when their p-value is just not quite small enough,’ says Tyron Lardy. Adding data until the result looks significant increases the risk of a false positive: concluding that there is an effect when there is none. E-values, on the other hand, remain statistically sound even when you add extra data or adjust your analysis plan as you go along.
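A small simulation can illustrate the difference. The sketch below flips a genuinely fair coin and checks the result after every flip, once with a p-value and once with an e-value. Everything specific in it is an assumption made for illustration: the normal-approximation p-value, the fixed bet of 20% of current wealth, the two-sided e-value built by averaging a heads bettor and a tails bettor, the cap of 200 flips, and the significance level of 0.05 (so the e-value threshold is 1/0.05 = 20). It is not the method from Lardy's thesis.

```python
import math
import random


def two_sided_p_value(heads, n):
    """Normal-approximation p-value for the hypothesis 'the coin is fair'."""
    z = (heads - n / 2) / math.sqrt(n / 4)
    return math.erfc(abs(z) / math.sqrt(2))


def run_trial(rng, max_flips=200, first_look=10, alpha=0.05, bet=0.2):
    """Flip a fair coin and peek after every flip with both methods."""
    heads = 0
    wealth_heads = wealth_tails = 1.0  # two bettors, one per side
    p_rejected = e_rejected = False
    for n in range(1, max_flips + 1):
        is_heads = rng.random() < 0.5
        heads += is_heads
        # Each bettor stakes a fixed fraction of current wealth at even odds.
        wealth_heads *= 1 + bet if is_heads else 1 - bet
        wealth_tails *= 1 - bet if is_heads else 1 + bet
        e_value = (wealth_heads + wealth_tails) / 2
        if n >= first_look and two_sided_p_value(heads, n) < alpha:
            p_rejected = True  # "significant" at some peek
        if e_value >= 1 / alpha:
            e_rejected = True
    return p_rejected, e_rejected


def false_positive_rates(trials=2000, seed=0):
    """Share of fair-coin experiments that each method wrongly rejects."""
    rng = random.Random(seed)
    results = [run_trial(rng) for _ in range(trials)]
    p_rate = sum(p for p, _ in results) / trials
    e_rate = sum(e for _, e in results) / trials
    return p_rate, e_rate


p_rate, e_rate = false_positive_rates()
print(f"false positives, p-value with peeking: {p_rate:.3f}")
print(f"false positives, e-value with peeking: {e_rate:.3f}")
```

Because the coin is fair, every rejection is a false alarm. With repeated peeking the p-value approach should reject far more often than the nominal 5%, while the e-value approach should stay at or below it.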

Tyron Lardy’s supervisor, Peter Grünwald, has been studying e-values for years. Grünwald explains: ‘You can think of the e-value as the amount of money you would earn from bets like the one in the example.’ The higher the e-value, the stronger the evidence against your original assumption (‘The coin is fair’). That makes e-values especially useful in fields like medicine and psychology, where researchers often face complex situations and need flexibility in how they handle data.
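The betting picture can be made precise. As a sketch in standard notation (the symbols $E$, $H_0$ and $\alpha$ do not appear in the article and are introduced here for illustration): an e-value is a nonnegative statistic whose expected value is at most 1 when the original assumption $H_0$ is true. Markov's inequality then limits how often it can be large purely by chance.

```latex
E \ge 0, \qquad \mathbb{E}_{H_0}[E] \le 1
\quad\Longrightarrow\quad
P_{H_0}\!\left(E \ge \tfrac{1}{\alpha}\right) \;\le\; \alpha\,\mathbb{E}_{H_0}[E] \;\le\; \alpha .
```

With $\alpha = 0.05$ the threshold is $1/\alpha = 20$, so the €256 from eight heads in a row would count as evidence against a fair coin at that level. For wealth built up over successive fair bets, the same bound holds at every moment simultaneously (Ville's inequality), which is what makes it legitimate to look at the results midway.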

A general recipe for e-values can be very complex

By now, there’s a general method for calculating an optimal e-value. But in practice, that method isn’t always easy to apply. ‘That’s why I looked into how to design a good e-value for these kinds of complex problems,’ says Lardy. ‘What recipe should someone follow to end up with a meaningful number at the end of their experiment?’

One concrete example is testing whether a medicine works while taking into account factors like the patient’s age or gender. ‘In clinical trials, you usually know exactly how the treatment is assigned – one half of the patients receives the medicine, the other half a placebo. You can use that knowledge to build an optimal e-value,’ Lardy explains.

Netflix is already using it – now the universities need to catch up

For now, p-values are still the norm in most university programmes. Will we ever fully switch to e-values? According to Grünwald, there are still a few hurdles to overcome. ‘The theory is there, but we now need to develop the practical tools. We’ve got beautiful formulas, but we still need good software to go with them.’ There’s also the matter of catching up: p-values have been standard practice for decades. ‘A lot of people know about their limitations, but still stick to what they’re familiar with.’

Even so, Lardy sees signs of progress. Tech companies like Netflix are already using e-values, for instance to test whether users are more likely to click on a red button or a grey one. Lardy and Grünwald hope that one day, e-values will make their way into university textbooks – so that future students learn from the start that they might be better off using e-values to test their hypotheses.

PhD defence

Tyron Lardy will defend his thesis, titled ‘Optimal Test Statistics for Anytime-Valid Hypothesis Tests’, on 18 June in the Academy Building. His supervisors are Peter Grünwald and Wouter Koolen-Wijkstra.
