Tatiana Makhalova will defend her thesis, entitled "Contributions to pattern set mining: from complex datasets to significant and useful pattern sets", on Wednesday, June 23, at 2 p.m. online (in English).
We discuss different aspects of pattern mining in binary and numerical tabular datasets. The objective of pattern mining is to discover a small set of non-redundant patterns that may entirely cover a given dataset and be interpreted as useful and significant knowledge units. We focus on such issues as (i) the formal definition of pattern interestingness, (ii) the mitigation of the pattern explosion problem, (iii) measures for evaluating the performance of pattern mining, and (iv) the discrepancy between interestingness and quality of the discovered pattern sets.
The first part of the talk is devoted to the so-called closure structure and the GDPM algorithm for computing it. The closure structure allows for estimating both the data and pattern complexity. Moreover, we discuss how the closure structure allows an analyst to understand the intrinsic data configuration before selecting an interestingness measure for pattern mining.
In the second part, we discuss the difference between interestingness and quality of pattern sets. We present the KeepItSimple algorithm that adopts the best practices of supervised learning in pattern mining and relates interestingness and the quality of pattern sets. We show that KeepItSimple allows for efficient mining of a set of interesting and good-quality patterns without any pattern explosion.
The third part of the talk is devoted to numerical pattern mining. We present an MDL-based algorithm called Mint for mining pattern sets in numerical data. The Mint algorithm relies on a strong theoretical foundation and at the same time pursues a practical objective: returning a small set of non-redundant and informative numerical patterns. Mint behaves well in practice and usually outperforms its competitors.
Keywords: Pattern Set Mining; Pattern interestingness; MDL; Minimum Description Length principle; Closed patterns; Equivalence classes; Data complexity; Closure structure; Pattern explosion; Pattern evaluation; Formal Concept Analysis; Interval Pattern Structures; Binary data; Numerical data
Composition of the jury
Arnaud Soulet, MCf HDR, Université de Tours, Tours
Jilles Vreeken, Pr. CISPA Helmholtz Center for Information Security, Saarbrücken
François Charoy, Pr. Université de Lorraine, Nancy
Antoine Cornuéjols, Pr. AgroParisTech, Paris
Elisa Fromont, Pr. Université de Rennes, Rennes
Esther Galbrun, CR Inria, University of Eastern Finland, Kuopio
Christel Vrain, Pr. Université d’Orléans, Orléans
Sergei O. Kuznetsov, Pr. NRU HSE, Moscow
Amedeo Napoli, DR CNRS LORIA, Nancy