BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//LORIA - ECPv6.15.18//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:LORIA
X-ORIGINAL-URL:https://www.loria.fr
X-WR-CALDESC:Events for LORIA
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:Europe/Paris
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:20210328T010000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:20211031T010000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:20220327T010000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:20221030T010000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:20230326T010000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:20231029T010000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=Europe/Paris:20220506T100000
DTEND;TZID=Europe/Paris:20220506T120000
DTSTAMP:20260405T152826Z
CREATED:20220506T073833Z
LAST-MODIFIED:20220506T073833Z
UID:16128-1651831200-1651838400@www.loria.fr
SUMMARY:PhD defense: Ashwin Geet D'Sa (Multispeech)
DESCRIPTION:Ashwin Geet D’Sa (Multispeech) will defend his thesis on May 6th\, 2022 at 10 am\, in Room A008. His presentation is entitled:\n“Expanding the training data for neural network based hate speech classification”\nAbstract:\nThe phenomenal increase in internet usage\, catering to the dissemination of knowledge and expression\, has also led to an increase in online hate speech. Online hate speech is an anti-social communicative behavior that leads to threats and violence towards an individual or a group. Deep-learning-based models have become the state-of-the-art solution for classifying hate speech. However\, the performance of these models depends on the amount of labeled training data. In this thesis\, we explore various ways to expand the training data in order to train a reliable model for hate speech classification.\n\nAs the first approach\, we use semi-supervised learning to combine the huge amount of unlabeled data easily available on the internet with a limited amount of labeled data to train the classifier. For this\, we use the label-propagation algorithm. The performance of this method depends on the representation space of the labeled and unlabeled data. We show that pre-trained sentence embeddings are label-agnostic and yield poor results. We propose a simple and effective neural-network-based approach for transforming these pre-trained representations into task-aware ones. This method achieves significant performance improvements in low-resource scenarios.\n\nIn our second approach\, we explore data augmentation\, a method for obtaining synthetic samples from the original training data. Our data augmentation technique is based on a single conditional GPT-2 language model fine-tuned on the original training data. Our approach uses a fine-tuned BERT model to select high-quality synthetic data.
 We study the effect of the quantity of augmented data and show that using a few thousand synthetic samples yields significant performance improvements in hate speech classification. Our qualitative evaluation shows the effectiveness of using BERT to filter the generated samples.\n\nFor our final approach\, we use multi-task learning to combine several available hate speech datasets and jointly train a single classification model. Our approach leverages the advantages of a pre-trained language model (BERT) as the shared layers of our multi-task architecture. We treat each hate speech corpus as one task\, thus adapting the paradigm of multi-task learning to multi-corpus learning. We show that training a multi-task model on several corpora achieves performance similar to training several corpus-specific models. Nevertheless\, fine-tuning the multi-task model on a specific corpus further improves the results. We demonstrate the effectiveness of our multi-task learning approach for domain adaptation on hate speech corpora.\n\nWe explore the three proposed approaches in low-resource scenarios and show that they achieve significant performance improvements in very low-resource setups.\n\nJury members:\nPhD Advisors:\n\nIrina ILLINA\, Associate Professor\, Université de Lorraine\nDominique FOHR\, Research Scientist\, CNRS\n\nReviewers:\nRichard DUFOUR\, Professor\, Laboratoire des Sciences du Numérique de Nantes (LS2N)\nPavel KRÁL\, Associate Professor\, University of West Bohemia\n\nExaminers:\nGeorges LINARÈS\, Professor\, Université d’Avignon\nFrançois PORTET\, Professor\, Laboratoire d’Informatique de Grenoble\nJosiane MOTHE\, Professor\, Université de Toulouse\nChristophe CERISARA\, Research Scientist\, CNRS\n\nInvited members:\nDietrich KLAKOW\, Professor\, Universität des Saarlandes\nAngeliki MONNIER\, Professor\, Université de Lorraine
URL:https://www.loria.fr/event/phd-defense-ashwin-geet-dsa-multispeech/
LOCATION:A008
CATEGORIES:Soutenance
END:VEVENT
END:VCALENDAR