  • This event has passed.

PhD defense: Anastasia Shimorina (Synalp)

26 February 2021 @ 14:00 - 15:30

Anastasia Shimorina (Synalp) will defend her thesis on Friday, 26th February at 2pm.

Her thesis, entitled “Natural Language Generation: from Data Creation to Evaluation via Modelling”, was supervised by Claire Gardent and Yannick Parmentier.

Abstract:
Natural language generation is the process of producing natural language text from some input. This input can be texts, documents, images, tables, knowledge graphs, databases, dialogue acts, meaning representations, etc. Recent methods in natural language generation, mostly based on neural modelling, have yielded significant improvements in the field. Despite this recent success, numerous issues remain, such as faithfulness to the source, the development of multilingual models, and few-shot generation. This thesis explores several facets of natural language generation, from creating training datasets and developing models to evaluating proposed methods and model outputs.

In this thesis, we address the issue of multilinguality and propose possible strategies to semi-automatically translate corpora for data-to-text generation. Using the English-Russian language pair as an example, we show that named entities constitute a major stumbling block in translation. We then handle rare entities in data-to-text modelling by exploring two mechanisms: copying and delexicalisation. We demonstrate that rare entities strongly impact performance and that the impact of these two mechanisms varies greatly depending on how datasets are constructed.

Returning to multilinguality, we also develop a modular approach to shallow surface realisation in several languages. Our approach splits the surface realisation task into three submodules: word ordering, morphological inflection and contraction generation. We show, via delexicalisation, that the word ordering component mainly depends on syntactic information. Alongside the modelling, we also propose a framework for error analysis in the shallow surface realisation task, focused on word order. The framework provides linguistic insights into model performance at the sentence level and makes it possible to identify patterns where models underperform. Finally, we touch upon evaluation design while assessing automatic and human metrics, highlighting the difference between sentence-level and system-level evaluation.
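Delexicalisation, one of the two rare-entity mechanisms named in the abstract, can be illustrated with a toy example. This is a generic sketch, not the implementation used in the thesis. It assumes WebNLG-style (subject, property, object) triples whose entity strings appear verbatim in the reference text. The function names `delexicalise`/`relexicalise` and the `ENTITY_n` placeholder scheme are hypothetical. Entities are swapped for placeholders before training, and placeholders in the model output are mapped back afterwards.

```python
# Toy sketch of delexicalisation for data-to-text generation.
# Assumptions (hypothetical, not from the thesis): WebNLG-style triples whose
# entity strings occur verbatim in the text, and an ENTITY_n placeholder scheme.


def delexicalise(triples, text):
    """Replace entities in the triples and the text with placeholders."""
    mapping = {}  # entity string -> placeholder
    delex_triples = []
    for subj, prop, obj in triples:
        for ent in (subj, obj):
            if ent not in mapping:
                mapping[ent] = f"ENTITY_{len(mapping)}"
        delex_triples.append((mapping[subj], prop, mapping[obj]))
    # Replace longer entities first so a shorter entity contained in a longer
    # one does not break the longer match.
    for ent in sorted(mapping, key=len, reverse=True):
        text = text.replace(ent, mapping[ent])
    return delex_triples, text, mapping


def relexicalise(text, mapping):
    """Put the original entity strings back into generated text."""
    # Longest placeholders first, so ENTITY_1 never matches inside ENTITY_12.
    for ent, placeholder in sorted(mapping.items(), key=lambda kv: len(kv[1]), reverse=True):
        text = text.replace(placeholder, ent)
    return text


triples = [
    ("Alan Bean", "occupation", "test pilot"),
    ("Alan Bean", "birthPlace", "Wheeler, Texas"),
]
text = "Alan Bean, born in Wheeler, Texas, worked as a test pilot."

d_triples, d_text, mapping = delexicalise(triples, text)
print(d_triples)
print(d_text)  # ENTITY_0, born in ENTITY_2, worked as a ENTITY_1.
print(relexicalise(d_text, mapping))
```

Replacing longer strings first in both directions keeps an entity that is a substring of another, or a placeholder such as `ENTITY_1` inside `ENTITY_12`, from being partially overwritten.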

Keywords: natural language generation, data-to-text generation, surface realisation, evaluation, error analysis

Committee:

Reviewers:
– Emiel Krahmer, Full Professor, Tilburg University, the Netherlands
– Kees van Deemter, Full Professor, Utrecht University, the Netherlands

Examiner:
– Dimitra Gkatzia, Associate Professor, Edinburgh Napier University, UK

Supervisors:
– Claire Gardent, Research Director, CNRS, LORIA, France
– Yannick Parmentier, Associate Professor, Université de Lorraine, France

Details

Date:
26 February 2021
Time:
14:00 - 15:30

Venue

Online