Universiteit Leiden


When data compression and statistics disagree: two frequentist challenges for the minimum description length principle

Supervisor (promotor): P.D. Grünwald

Tim van Erven
23 November 2010
Thesis in Leiden Repository

According to the minimum description length (MDL) principle, data compression should be taken as the main goal of statistical inference. This stands in sharp contrast to making assumptions about an underlying "true" distribution generating the data, as is standard in the traditional frequentist approach to statistics. If the MDL premise of making data compression a fundamental notion can hold its ground, it promises a robust kind of statistics, one that does not break down when standard but hard-to-verify assumptions are not completely satisfied. This makes it worthwhile to put data compression to the test and see whether it really makes sense as a foundation for statistics. A natural starting point is the set of cases where standard MDL methods show suboptimal performance in a traditional frequentist analysis. This thesis analyses two such cases. In the first case it is found that, although the standard MDL method fails, data compression still makes sense and actually leads to the solution of the problem. In the second case we discuss a modification of the standard MDL estimator, proposed in the literature, that goes against the data compression principles of MDL. We also review the basic properties of Rényi's dissimilarity measure for probability distributions.