[PhD thesis 2021 offer] Deep learning explainability: Application to Arab dialect identification in videos

Supervision: Kamel Smaïli (kamel.smaili@loria.fr), David Langlois (david.langlois@loria.fr)

Team: SMarT

Funding: doctoral contract awarded through a competitive selection

Scientific Context

Deep learning algorithms have proven efficient and relevant in many domains, such as computer vision, automatic speech recognition, machine translation, and biomedical engineering [3]. However, these models are extremely complex: they can use tens or hundreds of millions of parameters [6,2], which makes it hard to identify which subset of parameters is responsible for a model performing well or poorly. The architectural complexity of a deep learning model prevents a clear explanation of which part of the architecture leads to the best performance and which part makes the results suboptimal. In this research, among other issues, we would like to know which subset of weights is likely to play the most important role in the final prediction.

Many explainability or interpretability methods exist, such as LIME (Local Interpretable Model-agnostic Explanations) [7]. In that article, the authors propose an algorithm that can reliably explain the predictions of any classifier or regressor by approximating it locally with an interpretable model. Another algorithm, SHAP (SHapley Additive exPlanations) [4], connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions. The goal of SHAP is to explain the prediction for an instance by computing the contribution of each feature to that prediction.
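To make the idea of a per-feature contribution concrete, here is a minimal sketch of the exact Shapley value computation from game theory, on which SHAP builds. It is not the SHAP library or any of its approximations: the function shapley_values, the toy_model, and the all-zero baseline are hypothetical names and choices made for illustration only, and the exhaustive enumeration of feature subsets is tractable only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley value of each feature for predict(instance).

    Assumption: a feature that is "absent" from a coalition is replaced
    by its value in `baseline` (one common convention among several).
    """
    n = len(instance)

    def value(coalition):
        # Evaluate the model with only the coalition's features present.
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(x):
    # Hypothetical model: two linear terms plus one interaction term.
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


instance = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
```

For this toy model the contributions come out to about 2.25, 1.0 and 0.25: each linear term is credited to its own feature, and the interaction term is split equally between the two features involved. The contributions add up to the difference between the prediction for the instance and the prediction for the baseline, which is the additivity property the name SHAP refers to.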

Objective of the thesis

The PhD student will address this explainability problem and propose a new framework for interpreting the deep learning model in use.

The application in this proposal will be the identification of Arabic dialects in videos. Building on the experience SMarT gained in the Chist-Era project AMIS [8], which dealt with summarizing an original video into a target language, this PhD research proposes to identify a specific Arabic dialect among several others in a database of videos and to explain the results. In the Arab world, the official language, Modern Standard Arabic, coexists with several Arabic dialects that depend on the region. People use their dialects in daily conversation, while Modern Standard Arabic is used for formal communication. Dialects may differ strongly from one another, even within the same country. It is crucial to identify the dialect in order to select the corresponding models for speech recognition. It is equally crucial to know when a reliable decision is not possible, so that a generic model can be used instead. To that end, explainability and interpretability make it possible to understand the classification decision. Moreover, they highlight the parts of the data on which the decision is based, and therefore help to better understand the acoustic and lexical features of each dialect. For decisions based on neural networks, we would like to look at the layers and weights of the predictive model; the objective is to be able to “read” the parameters of the predictive model through the filter of the features human beings use to identify an Arabic dialect.
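The choice between a dialect-specific model and a generic one can be pictured as a simple confidence rule. The sketch below is only an illustration under stated assumptions: the function select_asr_model, the dialect labels, the scores, and the 0.6 threshold are all hypothetical and are not taken from the AMIS project or from any existing system.

```python
def select_asr_model(dialect_scores, threshold=0.6, fallback="generic"):
    """Return the top-scoring dialect label, or the fallback when unsure.

    `dialect_scores` maps a dialect label to the classifier's score for it
    (assumed here to behave like a probability between 0 and 1).
    """
    if not dialect_scores:
        return fallback
    best_label = max(dialect_scores, key=dialect_scores.get)
    # Trust the dialect-specific model only if the classifier is confident.
    if dialect_scores[best_label] >= threshold:
        return best_label
    return fallback


# Hypothetical classifier outputs for two video segments.
confident = select_asr_model({"algerian": 0.82, "tunisian": 0.11, "msa": 0.07})
uncertain = select_asr_model({"algerian": 0.40, "tunisian": 0.35, "msa": 0.25})
```

Here the first segment would be routed to the model for the top-scoring dialect, while the second falls back to the generic model. A raw score threshold says nothing about why the classifier is unsure, which is the kind of question explainability methods are meant to address.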

Expected Competencies

This research work requires skills in automatic speech recognition, video processing, and natural language processing.


The PhD student will be under the supervision of Kamel Smaïli, Professor, and the co-supervision of David Langlois, Associate Professor. He/she will be fully integrated into the SMarT team at Loria, which works on language modeling and the multilingual aspects of language, and has long experience with machine learning approaches.

Within the AMIS project, the team collected 100 hours of Arabic videos that could be used in this research. Each video may feature several speakers from different Arab regions. Moreover, SMarT has built dialectal corpora [5,1], which will be useful for this research.




[1] Karima Abidi, Mohamed Amine Menacer, and Kamel Smaïli. CALYOU: A comparable spoken Algerian corpus harvested from YouTube. In 18th Annual Conference of the International Speech Communication Association (Interspeech), 2017.
[2] Anurag Arnab, Ondrej Miksik, and Philip H S Torr. On the robustness of semantic segmentation models to adversarial attacks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 888–897, 2018.
[3] Andreas Holzinger, Georg Langs, Helmut Denk, Kurt Zatloukal, and Heimo Müller. Causability and explainability of artificial intelligence in medicine. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 9(4):e1312, 2019.
[4] Scott Lundberg and Su-In Lee. A unified approach to interpreting model predictions. arXiv preprint arXiv:1705.07874, 2017.
[5] Karima Meftouh, Salima Harrat, Salma Jamoussi, Mourad Abbas, and Kamel Smaïli. Machine translation experiments on PADIC: A parallel Arabic dialect corpus. In the 29th Pacific Asia Conference on Language, Information and Computation, 2015.
[6] Mohamed Menacer, Odile Mella, Dominique Fohr, Denis Jouvet, David Langlois, and Kamel Smaïli. An enhanced automatic speech recognition system for Arabic. In the Third Arabic Natural Language Processing Workshop (EACL 2017), 2017.
[7] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016.
[8] Kamel Smaïli, Dominique Fohr, Carlos-Emiliano Gonzalez-Gallardo, Michal Grega, Lucjan Janowski, Denis Jouvet, Arian Kozbial, David Langlois, Mikolaj Leszczuk, Odile Mella, et al. Summarizing videos into a target language: Methodology, architectures and evaluation. Journal of Intelligent & Fuzzy Systems, 37(6):7415–7426, 2019.
