PhD position: Identifying disfluencies in speakers who stutter, and supporting their rehabilitation, using DNNs

Advisor: Slim Ouni (

Location: LORIA (Multispeech Team), Nancy

Starting date: September 2019


This PhD proposal is part of the ANR project BENEPHIDIRE (The Stuttering: Neurology, Phonetics, Computer Science for its Diagnosis and Rehabilitation). Its objective is to help speech therapists diagnose and treat stuttering. Patients, whether children or adults, often find it difficult to find a therapist or to attend enough therapy sessions. The goal of this project is therefore to develop disfluency identification tools that patients will eventually be able to use under the supervision of a therapist.


As part of this PhD project, the disfluencies typical of stuttering will be identified automatically, with the aim of determining which acoustic and visual cues allow them to be detected. The work will rely on visual cues extracted with facial landmark detection algorithms (contours of the eyes, nose, mouth, eyebrows, chin, etc.). These visual cues and the acoustic features will be combined to train an identification system using deep learning (neural network) techniques that take into account the temporal evolution of both the articulatory gestures and the acoustic signal. The identification tools will be based on the analysis of dynamic articulatory data acquired within the ANR project BENEPHIDIRE from speakers who stutter and from normally fluent speakers. These articulatory data will be analyzed and exploited to develop methods that automatically identify cues robust enough to confirm and characterize the disfluencies typical of stuttering.
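To make the idea of combining visual and acoustic cues concrete, here is a minimal sketch of one common option, feature-level (early) fusion, under stated assumptions. It is not the project's pipeline. The landmark pair used for lip aperture, the frame rates (for example 100 acoustic frames per second and a slower video rate), and every function name below are hypothetical. Visual features are linearly interpolated onto the acoustic frame rate and then concatenated with each acoustic frame. Each fused frame is then stacked with its neighbours so that a feed-forward DNN sees some temporal context. A recurrent or convolutional network over the fused sequence would be an equally valid alternative.

```python
import math
from typing import List, Sequence, Tuple

Frame = List[float]
Point = Tuple[float, float]


def lip_aperture(upper_lip: Point, lower_lip: Point) -> float:
    """Mouth opening as the Euclidean distance between two (hypothetical) lip landmarks."""
    return math.dist(upper_lip, lower_lip)


def resample(frames: Sequence[Frame], src_rate: float, dst_rate: float, n_out: int) -> List[Frame]:
    """Linearly interpolate a feature sequence from src_rate to dst_rate (frames per second)."""
    last = len(frames) - 1
    out: List[Frame] = []
    for j in range(n_out):
        # Position of output frame j in source-frame units, clamped to the last source frame.
        t = min(j * src_rate / dst_rate, last)
        i0 = int(t)
        i1 = min(i0 + 1, last)
        w = t - i0
        out.append([(1 - w) * a + w * b for a, b in zip(frames[i0], frames[i1])])
    return out


def fuse(acoustic: Sequence[Frame], visual: Sequence[Frame],
         acoustic_rate: float, visual_rate: float) -> List[Frame]:
    """Early fusion: append the time-aligned visual features to each acoustic frame."""
    aligned = resample(visual, visual_rate, acoustic_rate, len(acoustic))
    return [list(a) + v for a, v in zip(acoustic, aligned)]


def stack_context(frames: Sequence[Frame], k: int) -> List[Frame]:
    """Concatenate each frame with its k previous and k next frames (edge frames repeated)."""
    n = len(frames)
    out: List[Frame] = []
    for j in range(n):
        window: Frame = []
        for d in range(-k, k + 1):
            window.extend(frames[min(max(j + d, 0), n - 1)])
        out.append(window)
    return out


if __name__ == "__main__":
    # Toy data: 4 acoustic frames (2 coefficients each) at 100 fps,
    # 2 video frames at 50 fps carrying a single lip-aperture value.
    acoustic = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
    visual = [[lip_aperture((0.0, 0.0), (0.0, 0.0))],
              [lip_aperture((0.0, 0.0), (0.0, 1.0))]]
    x = stack_context(fuse(acoustic, visual, 100.0, 50.0), k=1)
    print(len(x), len(x[0]))  # 4 frames, 9 values per frame
```

Each row of `x` would be one input vector for a frame-level disfluency classifier. The context width `k` trades temporal coverage against input size.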

In parallel, we will draw on the results obtained by the other ANR partners (phoneticians, speech therapists and neurologists) to better integrate the acoustic and articulatory characteristics of the disorders typical of stuttering. Deep neural network techniques that model the temporal evolution of articulatory gestures and of the acoustic signal will be considered. For automatic recognition, the learning will also build on models of the articulatory patterns specific to disfluency. The objective is to study the behavior of the mandible and the lips during phases of severe disfluency in order to determine what characterizes them most. Training will be based on dynamic articulatory data acquired with two techniques: electromagnetic articulography (EMA) and MRI. Finally, the algorithms developed in this PhD project will be evaluated in terms of accuracy, robustness and efficiency.
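The proposal does not specify the evaluation protocol. One common choice, assumed here, is to compare per-frame binary decisions (1 = disfluent) with reference annotations using precision, recall and F1 for accuracy, and to report efficiency as a real-time factor (processing time divided by signal duration). The function names below are hypothetical, and this is a sketch rather than the project's evaluation code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DetectionScores:
    precision: float
    recall: float
    f1: float


def frame_scores(reference: Sequence[int], predicted: Sequence[int]) -> DetectionScores:
    """Frame-level precision, recall and F1 for binary labels (1 = disfluent frame)."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # disfluent frames correctly detected
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # fluent frames flagged as disfluent
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # disfluent frames missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return DetectionScores(precision, recall, f1)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by signal duration; a value below 1 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    ref = [0, 1, 1, 1, 0, 0]
    hyp = [0, 1, 1, 0, 1, 0]
    print(frame_scores(ref, hyp))
    print(real_time_factor(2.0, 10.0))
```

Robustness could, for instance, be examined by computing the same scores separately per speaker or per recording condition and looking at their spread. This is again an assumption, not a protocol stated in the proposal.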


