Huiyuan Kelvin Han (Synalp) will defend his thesis, entitled “Generating and answering questions across text and knowledge graphs”, on Monday, December 2nd at 2:30 p.m., in room A008.
Abstract:
Question generation (QG) is the task of automatically producing a question given some information source containing the answer. It is a subtask of natural language generation (NLG) and the counterpart of question answering (QA): while QG generates the linguistic expression of an information need, QA meets that need by automatically identifying the answer to a question in some information source. Both tasks have direct applicability in domains such as information retrieval, dialogue and conversation, and education. Recent research also indicates that QG and QA, when used jointly in QA-based evaluation, are helpful for factual verification, especially of NLG outputs such as summarisation and data-to-text generation. When used together to produce a discourse representation, they can also help reduce the propensity of large language models (LLMs) to produce text containing hallucinations and factual inaccuracies.
While QA has been studied since at least the 1960s, QG has only gained wider research attention in recent years. Most research addresses only one of the two tasks, and does so for a single modality. In QG, previous approaches typically rely on architectures that require heavy processing and generally consider neither the generation of questions across the entirety of the input information source nor the diversity of ways a question can be phrased. In QA, although work exists on answering questions over unstructured input (e.g. a piece of text) and over structured input (e.g. knowledge graphs (KGs) or tables), these methods typically do not transfer to other input modalities.
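As a concrete, if deliberately simplistic, illustration of how QG and QA pair up over structured input, the following toy Python sketch generates (question, answer) pairs from KG triples using hand-written templates, then answers a question by matching it against the generated pairs. All triples, templates, and function names here are hypothetical examples for illustration only; the systems studied in the thesis use learned neural generation, not templates.

```python
# Toy template-based QG/QA over knowledge-graph triples (illustration only).

# A KG triple is (subject, relation, object); the object serves as the answer.
TRIPLES = [
    ("Marie Curie", "birthPlace", "Warsaw"),
    ("Marie Curie", "award", "Nobel Prize in Physics"),
]

# One question template per relation; {s} is filled with the triple's subject.
TEMPLATES = {
    "birthPlace": "Where was {s} born?",
    "award": "Which award did {s} receive?",
}

def generate_question(triple):
    """QG: turn a single triple into a (question, answer) pair, if a
    template exists for its relation; otherwise return None."""
    subject, relation, obj = triple
    template = TEMPLATES.get(relation)
    if template is None:
        return None
    return template.format(s=subject), obj

def answer_question(question, triples):
    """Toy QA: return the answer of the triple whose generated question
    matches the input question exactly, or None if no triple matches."""
    for triple in triples:
        pair = generate_question(triple)
        if pair is not None and pair[0] == question:
            return pair[1]
    return None

if __name__ == "__main__":
    for t in TRIPLES:
        print(generate_question(t))
    print(answer_question("Where was Marie Curie born?", TRIPLES))
```

The sketch also hints at the limitations the abstract mentions: a fixed template per relation yields no phrasing diversity, and nothing here transfers to unstructured text input.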
In this thesis, we focus foremost on QG, aiming to identify ways to generate questions from both structured and unstructured information, namely KG and text inputs, in a manner that is controllable so as to increase the diversity, comprehensiveness, and coverage of the generated questions. We also study QG and QA in concert, with a model that can controllably generate both simple and complex questions from one modality and answer them on another, an ability relevant to improving QA-based evaluation. Finally, we examine doing so for lower-resourced languages other than English, with the view that this enables similar QA-based evaluation for those languages.
Keywords: question generation, question answering, natural language generation, knowledge graphs, multilingual
Thesis Committee:
Reviewers:
- Anne Vilnat, Université Paris-Saclay
- Frédéric Béchet, Aix-Marseille Université
Examiners:
- Sophie Rosset, Université Paris-Saclay
- Catherine Faron, Université Côte d’Azur
Supervisors:
- Claire Gardent, Université de Lorraine
- Thiago Castro Ferreira, Federal University of Minas Gerais