Liam Cripwell (Synalp) will defend his thesis, entitled “Controllable and document-level text simplification“, on Friday, November 10th at 3 p.m. am in room A008.
Abstract :
Text simplification is a task that involves rewriting a text to make it easier to read and understand for a wider audience, while still expressing the same core meaning. This has potential benefits for disadvantaged end-users (e.g. non-native speakers, children, the reading impaired), while also showing promise as a preprocessing step for downstream NLP tasks. Recent advancements in neural generative models have led to the development of systems that are capable of producing highly fluent outputs. However, these end-to-end systems often rely on training corpora to implicitly learn how to perform the necessary rewrite operations. In the case of simplification, these datasets are lacking in both quantity and quality, with most corpora either being very small, automatically constructed, or subject to strict licensing agreements. As a result, many systems tend to be overly conservative, often making no changes to the original text or being limited to the paraphrasing of short word sequences without substantial structural modifications. Furthermore, most existing work on text simplification is limited to sentence-level inputs, with attempts to iteratively apply these approaches to document-level simplification failing to coherently preserve the discourse structure of the document. This is problematic, as most real-world applications of text simplification concern document-level texts.
In this thesis, we investigate strategies for mitigating the conservativity of simplification systems while promoting a more diverse range of transformation types. This involves the creation of new datasets containing instances of under-represented operations and the implementation of controllable systems capable of being tailored towards specific transformations and simplicity levels. We later extend these strategies to document-level simplification, proposing systems that are able to consider surrounding document context and use similar controllability techniques to plan which sentence-level operations to perform ahead of time, allowing for both high performance and scalability. Finally, we analyze current evaluation processes and propose new strategies that can be used to better evaluate both controllable and document-level simplification systems.
Jury
Reviewers:
- Benoît Sagot, Inria Paris
- Benoit Favre, Aix Marseille Université
Examiners:
- Wei Xu, Georgia Institute of Technology
- Liana Ermakova, Université de Bretagne Occidentale
Thesis Director and co-Director:
- Claire Gardent, Université de Lorraine
- Joël Legrand, Université de Lorraine