PhD Position: Statistical Performance Evaluation Tools for Classification and Machine Learning with Erroneous Data

Full Description

General Topic

Classical performance evaluation metrics used in classification and machine learning rely on the assumption that the reference data used for validation and comparison are error-free. Recent work has shown that this assumption almost never holds.
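To illustrate why the error-free assumption matters, here is a minimal sketch under a deliberately simple, assumed noise model: binary labels, with each reference label flipped independently with probability `noise_rate`, independently of the classifier's own errors. Under that assumption, the accuracy measured against the noisy reference is a(1 - ε) + (1 - a)ε rather than the true accuracy a. The function names `observed_accuracy` and `simulate` are hypothetical and serve only this illustration.

```python
import random


def observed_accuracy(true_acc: float, noise_rate: float) -> float:
    """Expected accuracy measured against noisy binary reference labels.

    Assumed model (illustrative only): each reference label is flipped with
    probability noise_rate, independently of whether the classifier is right.
    A correct prediction is scored as correct only if its label was not flipped;
    a wrong prediction is scored as correct only if its label was flipped.
    """
    return true_acc * (1 - noise_rate) + (1 - true_acc) * noise_rate


def simulate(true_acc: float, noise_rate: float, n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo check of observed_accuracy under the same assumed model."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        correct = rng.random() < true_acc    # prediction matches the true label
        flipped = rng.random() < noise_rate  # reference label is corrupted
        # Scored correct if right and not flipped, or wrong and flipped
        hits += correct != flipped
    return hits / n


print(observed_accuracy(0.9, 0.1))  # about 0.82
print(simulate(0.9, 0.1))           # close to 0.82
```

For example, a classifier with true accuracy 0.9 evaluated against references with 10% label noise would be measured at about 0.82. Under this particular model the distortion is monotone for noise rates below 0.5, so expected rankings of binary classifiers are preserved; finite-sample fluctuations and less idealized noise models can still change the observed ranking.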

To evaluate and compare different classification methods objectively and with verifiable levels of confidence, we have begun exploring promising empirical statistical techniques that remove the need for perfect reference data. These techniques now need to be formally and theoretically validated on the one hand, and extended to more generic classifiers on the other: our experiments validated the approach on binary classifiers, and it should be generalized to classifiers involving any number of classes.

This project aims to revisit the entire performance evaluation process by studying and developing statistical tools that express a level of ‘confidence’ in the classification measures obtained from evaluation or benchmarking campaigns.

The question to be answered is the following: given the responses of n algorithms on a set of reference data, what confidence can be attached to the resulting ranking, assuming that the error rate of the reference data is estimated to be less than ε? Conversely, above what error rate on the reference data can we conclude, with confidence level τ, that the obtained ranking can no longer be guaranteed? These questions admit several probabilistic formulations. For example, by considering the data as realizations of a random variable (whose distribution may belong to a given parametric model), we can study the responses of the n algorithms as functions of this random variable. This makes it possible to compute the probability of obtaining a given ranking of these responses and to test whether the ranking is reliable. If the error distribution is assumed to belong to a parametric model, Bayesian statistical tools can be used to study the posterior distribution of its parameters given the responses of the algorithms.
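As a rough illustration of the first formulation, the sketch below estimates by Monte Carlo how often the ranking of n algorithms is reproduced when the reference labels are perturbed. It is a simple perturbation-based sensitivity analysis, not the method developed in this project: it assumes binary 0/1 labels, ranks algorithms by accuracy, and models reference errors by flipping each label independently with probability `eps`. All function and variable names are hypothetical.

```python
from __future__ import annotations

import random
from typing import Sequence


def accuracy(pred: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that agree with the labels."""
    return sum(p == y for p, y in zip(pred, labels)) / len(labels)


def ranking(preds: Sequence[Sequence[int]], labels: Sequence[int]) -> tuple[int, ...]:
    """Algorithm indices sorted by decreasing accuracy (ties broken by index)."""
    scores = [accuracy(p, labels) for p in preds]
    return tuple(sorted(range(len(preds)), key=lambda i: (-scores[i], i)))


def ranking_stability(preds: Sequence[Sequence[int]], ref_labels: Sequence[int],
                      eps: float, n_draws: int = 1000, seed: int = 0) -> float:
    """Fraction of perturbed label sets that reproduce the reference ranking.

    Assumed error model: each binary (0/1) reference label is wrong
    independently with probability eps, so plausible label sets are drawn by
    flipping each reference label with that probability.
    """
    rng = random.Random(seed)
    observed = ranking(preds, ref_labels)
    agree = 0
    for _ in range(n_draws):
        perturbed = [1 - y if rng.random() < eps else y for y in ref_labels]
        agree += ranking(preds, perturbed) == observed
    return agree / n_draws


# Hypothetical toy data: a strong and a weak classifier on 200 items
labels = [i % 2 for i in range(200)]
strong = [1 - y if i % 20 == 0 else y for i, y in enumerate(labels)]  # 95% accurate
weak = [1 - y if i % 10 < 4 else y for i, y in enumerate(labels)]     # 60% accurate
print(ranking([strong, weak], labels))  # (0, 1)
print(ranking_stability([strong, weak], labels, eps=0.05))
```

The returned fraction could, for instance, be compared with a confidence level τ, or `eps` could be swept upward to find the error rate at which the ranking stops being reproduced reliably. A Bayesian treatment would instead place a prior on the error parameters and sample from their posterior given the algorithms' responses.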


Since this work is co-supervised by two research teams (one specializing in classification and learning, the other in statistics), candidates may have either a Computer Science profile or an Applied Mathematics profile, or both, depending on their skills; both are considered equally. Candidates are then expected to focus on the scientific field that suits them best, while remaining sufficiently open to, and interacting with, the other.

In general, candidates should have a strong background in Computer Science and Mathematics and show a high level of curiosity about classification and machine learning.


Please contact

Potential candidates should also send a resume and motivation letter and apply online through


CNRS, Inria, Université de Lorraine