Lecture
Florence Nightingale Colloquium presents Peter Flach
- Date
- Friday 26 November 2021
- Time
- Explanation
- The seminar is targeted at a broad audience; in particular, we invite Master's students, PhD candidates, and supervisors interested or involved in the Data Science Research programme, as well as colleagues from LIACS and MI, to attend. The seminar is organized by the DSO, MI and LIACS.
- Location
- Zoom link will follow

The highs and lows of performance evaluation: Towards a measurement theory for machine learning
Abstract:
Our understanding of performance evaluation measures for machine-learned classifiers has improved considerably over the last few decades. However, there is a range of areas where this understanding is still lacking, leading to ill-advised practices in classifier evaluation. This is clearly problematic: if machine learning researchers are unclear about what exactly their experiments tell them about their algorithms, how can end-users trust systems that deploy those algorithms?
I suggest that in order to make further progress we need to develop a proper measurement theory of machine learning. Measurement theory studies the concepts of measurement and scale. If one has a way to measure, say, the length of individual rods or planks, this should also allow one to then calculate the combined length of concatenated rods or planks. What relevant concatenation operations are there in data science and AI, and what does that mean for the underlying measurement scale?
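For concreteness, here is a minimal Python sketch of the rod-concatenation analogy applied to classifier evaluation (an illustration of the idea, not material from the talk; the confusion counts are invented): accuracy on the concatenation of two test sets equals the size-weighted average of the per-set accuracies, whereas F1 computed on the pooled counts generally differs from the weighted average of per-set F1 scores, hinting that the two measures sit on different kinds of scales.

```python
# Hypothetical confusion counts for two disjoint test sets, given as
# (tp, fp, fn, tn). Accuracy "concatenates" like length; F1 does not.

def accuracy(tp, fp, fn, tn):
    """Fraction of correct predictions."""
    return (tp + tn) / (tp + fp + fn + tn)

def f1(tp, fp, fn, tn):
    """Harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)

set_a = (40, 10, 10, 40)   # 100 instances, balanced classes
set_b = (5, 1, 4, 190)     # 200 instances, heavily imbalanced
pooled = tuple(a + b for a, b in zip(set_a, set_b))

n_a, n_b = sum(set_a), sum(set_b)
w_acc = (n_a * accuracy(*set_a) + n_b * accuracy(*set_b)) / (n_a + n_b)
w_f1 = (n_a * f1(*set_a) + n_b * f1(*set_b)) / (n_a + n_b)

# Pooled accuracy equals the size-weighted average of per-set accuracies ...
print(f"accuracy: pooled {accuracy(*pooled):.4f}, weighted average {w_acc:.4f}")
# ... but pooled F1 does not equal the size-weighted average of per-set F1s.
print(f"F1:       pooled {f1(*pooled):.4f}, weighted average {w_f1:.4f}")
```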
I discuss by example what such a measurement theory might look like and what kinds of new results it would entail. I furthermore argue that key properties such as classification ability and data set difficulty are unlikely to be directly observable, suggesting the need for latent-variable models. Ultimately, machine learning experiments need to go beyond simple correlations and aim to make causal inferences of the form 'Algorithm A outperformed algorithm B because the classes were highly imbalanced', or counterfactually, 'if the classes were re-balanced, this performance difference between A and B would not have been observed'.
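As a small illustration of the kind of counterfactual reasoning the abstract describes (my own toy example, not the speaker's; the numbers are invented): a classifier that mostly predicts the majority class can "outperform" a rival on raw accuracy purely because the classes are imbalanced, and re-balancing the class priors, here via balanced accuracy, reverses the comparison.

```python
# Hypothetical per-class recall rates (tpr, tnr) for two classifiers evaluated
# on a highly imbalanced test set. A wins on raw accuracy only because
# negatives dominate; under re-balanced class priors, B wins.

def accuracy(tpr, tnr, pos, neg):
    """Accuracy on a test set with `pos` positives and `neg` negatives."""
    return (tpr * pos + tnr * neg) / (pos + neg)

def balanced_accuracy(tpr, tnr):
    """Accuracy under equal (re-balanced) class priors."""
    return (tpr + tnr) / 2

pos, neg = 100, 900                 # highly imbalanced test set
clf_a = dict(tpr=0.10, tnr=1.00)    # behaves like a majority-class predictor
clf_b = dict(tpr=0.80, tnr=0.85)    # more even performance across classes

print("Observed accuracy on the imbalanced test set:")
print(f"  A: {accuracy(**clf_a, pos=pos, neg=neg):.3f}")   # 0.910
print(f"  B: {accuracy(**clf_b, pos=pos, neg=neg):.3f}")   # 0.845

print("Counterfactual accuracy with re-balanced classes:")
print(f"  A: {balanced_accuracy(**clf_a):.3f}")            # 0.550
print(f"  B: {balanced_accuracy(**clf_b):.3f}")            # 0.825
```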
Short CV:
Peter Flach has been Professor of Artificial Intelligence at the University of Bristol since 2003. An internationally leading scholar in the evaluation and improvement of machine learning models using ROC analysis and calibration, he has also published on mining highly structured data and has an interest in human-centred AI. He is the author of Simply Logical: Intelligent Reasoning by Example (John Wiley, 1994) and Machine Learning: The Art and Science of Algorithms that Make Sense of Data (Cambridge University Press, 2012).
Prof Flach stepped down last year as the Editor-in-Chief of the Machine Learning journal, after being in post for 10 years. He was Programme Co-Chair of the 1999 International Conference on Inductive Logic Programming, the 2001 European Conference on Machine Learning, the 2009 ACM Conference on Knowledge Discovery and Data Mining, and the 2012 European Conference on Machine Learning and Knowledge Discovery in Databases in Bristol. He is President of the European Association for Data Science, and a Fellow of the Alan Turing Institute for Data Science and Artificial Intelligence.