MALINCA: the project that aims to bridge the linguistic gap between the mathematician and the machine

Philippe de Groote, Hugo Herbelin, ERC Synergy Grand 2024.

5 November 2024

The MALINCA (MAthematicae LINgua franCA) project, led by Philippe de Groote, Hugo Herbelin, Paul-André Melliès and Carlos Simpson, has won the ERC Synergy Grant 2024 call for projects. Explanation of the scientific stakes by two of the principal investigators: Hugo Herbelin, senior researcher in the PICUBE joint project-team (Université Paris Cité, CNRS, Inria) and Philippe de Groote, head of the SÉMAGRAMME joint project-team (Université de Lorraine, CNRS, Inria) at the Loria laboratory.

What is the scientific ambition of the MALINCA project?

We like to introduce our project by saying that it aims to bridge the linguistic gap between the mathematician and the machine. When we talk about machines here, we are thinking in particular of proof assistants like Coq, which are programs that allow mathematics to be developed and certified by computers. In recent years, these programs have demonstrated an astounding ability to tackle the complete formalization and certification of substantial and complex mathematical results (the Feit-Thompson theorem, the Kepler conjecture, and the liquid tensor challenge of Scholze).

This success leads us to believe that it’s time to equip demonstration assistants with the ability to understand mathematics in the flexible and semi-formal way that mathematicians use in their dayly work.  In other words, we want the machine to be able to analyze, understand and exploit mathematical texts written in that particular fragment of the language that the mathematicians use to communicate with each other and publish their results (a fragment that is rigorous but remains informal and has the characteristic of mixing natural language and mathematical notation).

If we had to sum up our project in the form of a slogan, we would say that the ultimate code for interacting with proof assistants must be natural language.  Achieving such a goal requires both applied and fundamental research in mathematics, computer science, logic and linguistics.

What will the funding be used for?

Our project does not require any major investment in equipment. It is primarily a scientific and human adventure. We intend to gather a team of researchers around our goal. In addition to the four “principal investigators”, our project includes three associate members: David Alfaya from the Pontifical University of Comillas in Madrid, a mathematician specializing in machine learning, Benoît Crabbé from the Laboratoire de Linguistique Formelle at Université Paris Cité, a specialist in automatic language processing, and Daniela Petrișan from the Institut de Recherche en Informatique Fondamentale at Université Paris Cité, a specialist in probabilistic semantics.

We also plan to fund twelve Ph.D.theses, twenty-four years of postdoctoral fellowships, and the hiring of two engineers. We also plan to federate a scientific community around our research theme by organizing conferences, including thematic residential conferences in the spirit of the thematic trimesters of Institut Henri Poincaré.

How did you come up with the idea of submitting a project to the ERC Synergy call for projects?

The project began with a discussion within the Inria Pi.r2 project-team about the formalization of mathematics, the development of proof assistants and the logical foundations of mathematics. Very quickly, this discussion led to a seemingly elementary central question: “What is a mathematical proof?”. There are several possible answers to this question. One could support the radically formal thesis that a proof is a term of a certain type within the calculus of constructions (the type theory behind proof assistants such as Coq and Lean).

A more semantic view is that a proof is a mathematical object that allows such a term to be given an interpretation. Another possible pragmatic answer is that a proof is a text, considered as such, that appears in a mathematics book or a scientific article. One can even defend the psychological and social point of view that would define a proof as a communicable mental representation, recognized and validated as such by the community of mathematicians. One then realizes that all these answers are only facets of the same reality, and that the fundamental question is that of the deep nature of a proof. This led naturally to the idea of combining expertise in theoretical computer science, proof theory, software development, formal linguistics and mathematics.

Over the course of the discussions, the project evolved in terms of application, incorporating the idea of using machine learning and natural language processing techniques to develop a new generation of proof assistants that would be driven by natural language while remaining rooted in solid mathematical foundations.

Are there any non-scientific issues involved in this project, such as societal issues?

In the medium term, our project should lead to educational applications. It should also make it easier for researchers to publish scientific results whose mathematical correctness has been formally certified. In the longer term, the continuation of our project could make the field of “computer-aided mathematics” accessible to everyone.

Some forty years ago, when studying to become an engineer, technician, designer or architect, a large part of one’s time was spent on so-called technical drawing. Since then, computer-aided design has relegated tracing paper, India ink, pens, and drafting tables to the status of antiques. Our goal, once achieved, could lead to a similar revolution in the practice of mathematics.

Source of the article: inria.fr