Shakeel Ahmad Sheikh (Multispeech) will defend his thesis on Friday, February 24th at 10 am in room C005.
His presentation is entitled “Deep Learning for Stuttering Detection”.
Jury Members:
Reviewer: Corinne Fredouille, Professor, University of Avignon, LIA, France
Reviewer: Benjamin Lecouteux, Professor, University of Grenoble Alpes, LIG, France
Examiner: Armelle Brun, Professor, Université de Lorraine, LORIA, France
Invitee: Fabrice Hirsch, Professor, Université Paul-Valéry Montpellier, Praxiling, France
Invitee: Md Sahidullah, former Research Scientist, Inria, France
Thesis director: Slim Ouni, Associate Professor, Université de Lorraine, LORIA, France
Abstract:
Stuttering is one of the most common speech disorders and manifests itself in the form of core behaviours such as repetitions, prolongations, and blocks. The tedious and time-consuming task of detecting and analysing the speech patterns of persons who stutter (PWS), with the goal of rectifying them, is usually handled manually by speech therapists and is biased towards their subjective beliefs. Moreover, automatic speech recognition (ASR) systems also fail to recognize stuttered speech, which makes it impractical for PWS to access virtual digital assistants such as Siri and Alexa.
This thesis aims to develop audio-based stuttering detection (SD) systems that capture different sources of variability in stuttered utterances, such as speaking style, age, and accent, and that learn robust stuttering representations, with the goal of providing a fair, consistent, and unbiased assessment of stuttered speech.
While most existing SD systems use a separate binary classifier for each stutter type, we present a unified multi-class StutterNet capable of detecting multiple stutter types.
To address the class-imbalance problem in the stuttering domain, we investigated the impact of a weighted loss function and also presented a multi-contextual (MC), multi-branch (MB) StutterNet to improve the detection performance on minority classes.
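As a rough illustration of the class-weighting idea (not the thesis's exact recipe), the following PyTorch sketch up-weights minority stutter classes in a cross-entropy loss; the class taxonomy and counts are hypothetical placeholders.

    import torch
    import torch.nn as nn

    # Hypothetical per-class utterance counts for a 5-way stutter taxonomy,
    # e.g. fluent, repetition, prolongation, block, interjection.
    class_counts = torch.tensor([800.0, 150.0, 120.0, 60.0, 70.0])

    # Inverse-frequency weights: rarer classes get larger weights.
    weights = class_counts.sum() / (len(class_counts) * class_counts)

    # Weighted cross-entropy penalises mistakes on minority classes more.
    criterion = nn.CrossEntropyLoss(weight=weights)

    logits = torch.randn(8, 5)          # batch of 8 utterances, 5 classes
    labels = torch.randint(0, 5, (8,))  # ground-truth stutter labels
    loss = criterion(logits, labels)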
Exploiting speaker information, under the assumption that stuttering models should be invariant to metadata such as speaker identity, we present an adversarial multi-task learning (MTL) SD method that learns robust, stutter-discriminative, speaker-invariant representations.
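One common way to realise such adversarial speaker-invariance is a gradient reversal layer between the shared encoder and an auxiliary speaker classifier; the minimal PyTorch sketch below assumes this standard construction with invented layer sizes, and may differ from the thesis's exact architecture.

    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        """Identity in the forward pass; negates gradients in the backward pass."""
        @staticmethod
        def forward(ctx, x, lambd):
            ctx.lambd = lambd
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_out):
            return -ctx.lambd * grad_out, None

    class AdversarialSD(nn.Module):
        # Hypothetical sizes: 40-dim input features, 5 stutter classes,
        # 100 training speakers.
        def __init__(self, feat_dim=64, n_stutter=5, n_speakers=100, lambd=0.5):
            super().__init__()
            self.lambd = lambd
            self.encoder = nn.Sequential(nn.Linear(40, feat_dim), nn.ReLU())
            self.stutter_head = nn.Linear(feat_dim, n_stutter)
            self.speaker_head = nn.Linear(feat_dim, n_speakers)

        def forward(self, x):
            h = self.encoder(x)
            # The speaker branch receives reversed gradients, so minimising the
            # speaker loss pushes the encoder towards speaker-invariant features
            # while the stutter branch keeps them stutter-discriminative.
            return (self.stutter_head(h),
                    self.speaker_head(GradReverse.apply(h, self.lambd)))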
Because labelled stuttering data are scarce, the automated SD task cannot exploit large deep models to capture these different variabilities; to alleviate this, we introduced the first self-supervised learning (SSL) framework in the SD domain. The SSL framework first trains a feature extractor on a pre-text task using a large quantity of unlabelled non-stuttering audio data to capture the different variabilities, and then applies the learned feature extractor to the downstream SD task using limited labelled stuttering audio data.
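As an illustrative sketch only, the snippet below plugs a publicly available wav2vec 2.0 checkpoint from torchaudio in as the frozen pre-trained extractor and trains a small head on top; the checkpoint, pooling, and 5-class head are assumptions, not necessarily the thesis's pre-text task or architecture.

    import torch
    import torch.nn as nn
    import torchaudio

    # Pre-trained SSL feature extractor: the pre-text task has already been
    # solved on large amounts of unlabelled (non-stuttering) audio.
    bundle = torchaudio.pipelines.WAV2VEC2_BASE
    extractor = bundle.get_model().eval()
    for p in extractor.parameters():
        p.requires_grad = False  # freeze; only the downstream head is trained

    # Small downstream classifier fitted on limited labelled stuttering data.
    clf = nn.Sequential(nn.Linear(768, 128), nn.ReLU(), nn.Linear(128, 5))

    wav = torch.randn(1, bundle.sample_rate)  # 1 s of audio (placeholder)
    with torch.no_grad():
        feats, _ = extractor.extract_features(wav)
    embedding = feats[-1].mean(dim=1)  # average frames into an utterance vector
    logits = clf(embedding)            # downstream stutter-class scores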