[Engineer] Deep-learning-based speech enhancement with ad-hoc microphone arrays


Supervisors: Romain Serizel (Université de Lorraine, Loria), Nicolas Furnon (Université de Lorraine, Loria)
Starting date: September 2020
Duration: 18 months
Application deadline: June 30th 2020

Speech is everywhere in our daily life. It is one of the most intuitive means of communication, and chances are high that on a regular day you will have many spoken interactions. However, most computer applications based on speech communication rely on the assumption that a “clean” version of the speech is available, which is rarely true in real-life scenarios. One solution to this noise problem is to apply so-called speech enhancement techniques, which aim at extracting the speech component from a noisy speech mixture [1]. With the rapid spread of mobile devices that have two or more microphones, almost everyone now has access to many microphones at all times. However, exploiting multiple microphones from several devices (which together form a so-called heterogeneous microphone array) is far from trivial [2, 3].

Over the years, a large body of work has been devoted to multichannel speech enhancement algorithms, initially based on signal processing [4, 5] and more recently on deep learning [6]. Applying these algorithms to signals collected with an array composed of multiple devices requires signal-level calibration and synchronization between devices, which is quite challenging. In this project, instead of considering each device as part of a large array, we will consider the signals from each device as a different view of the same acoustic scene. The goal of this project is to design a demonstrator for algorithms that work in lab conditions [7] and eventually to turn these algorithms into solutions that work in real-world scenarios (alleviating problems such as device calibration, real-time computation…).


  • MSc in computer science, machine learning, signal processing
  • Experience with the Python programming language
  • Experience with deep learning toolkits is a plus

Please send your application (CV, motivation letter and cover letter) to Romain Serizel (https://members.loria.fr/RSerizel/)


[1] Loizou, P. C. “Speech enhancement: theory and practice.” CRC Press, 2013

[2] Kako, T., Niwa, K., Kobayashi, K., and Ohmuro, H. “Wiener filter design by estimating sensitivities between distributed asynchronous microphones and sound sources.” In Proc. of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (2015), pp. 1–5.

[3] Doclo, S., Spriet, A., Wouters, J., and Moonen, M. “Frequency-domain criterion for the speech distortion weighted multichannel Wiener filter for robust noise reduction.” Speech Communication 49, 7 (2007), 636–656.

[4] Serizel, R., Moonen, M., Van Dijk, B., and Wouters, J. “Low-rank Approximation Based Multichannel Wiener Filter Algorithms for Noise Reduction with Application in Cochlear Implants.” IEEE/ACM Transactions on Audio, Speech and Language Processing 22 (2014), 785–799.

[6] Nugraha, A. A., Liutkus, A., and Vincent, E. “Multichannel audio source separation with deep neural networks.” IEEE/ACM Transactions on Audio, Speech, and Language Processing 24, 9 (2016), 1652–1664.

[7] Furnon, N., Serizel, R., Illina, I., and Essid, S. “DNN-Based Distributed Multichannel Mask Estimation for Speech Enhancement in Microphone Arrays.” In Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2020).

