MALOTEC seminar by Joël Legrand
Date: Wednesday, 14 February 2018
TreeLSTM and cross-corpus training for extracting pharmacogenomic relationships from text
A key requirement for machine learning algorithms that extract relationships from textual data is the availability of sufficiently large training data. Manually annotated corpora are valuable resources for this task, but the time and expertise required to develop them explain why only a few annotated corpora are freely available. For tasks related to precision medicine, most of these corpora are rather small (i.e., hundreds of sentences) or focus only on specialized relationships (e.g., drug-drug interactions) that rarely match what one wants to extract. In this talk, I will present two lines of research that I pursued to overcome the lack of annotated data for pharmacogenomic relation extraction. I will first present a TreeLSTM-based transfer learning method that achieves high performance when extracting biomedical relationships from text, a task for which initial resources are scarce. I will then present PGxCorpus, a manually annotated training corpus designed for the supervised extraction of pharmacogenomic relationships from text.