This event has passed.

PhD Defense: Prerak Srivastava (Multispeech)

Name: PhD Defense: Prerak Srivastava (Multispeech)
Start: 2023-11-13T10:00:00+01:00
End: 2023-11-13T12:00:00+01:00
Location: A008

November 13, 2023 @ 10:00 am - 12:00 pm

Prerak Srivastava (Multispeech) will defend his thesis, entitled "Realism in Virtually Supervised Learning for Acoustic Room Characterization and Sound Source Localization", on Monday, November 13th at 10 am in room A008.

Abstract:

Audio Augmented Reality aims to integrate virtual audio content into the user’s acoustic environment, creating an immersive audio experience. The commercial availability of augmented reality headsets such as Apple Vision Pro has further motivated interest in this research field. To synthesize binaural spatial audio that can recreate the perception of distance, direction, and acoustic cues, knowledge of specific acoustic parameters of the user’s environment is a prerequisite. Acoustic parameters can be divided into two categories: global parameters associated with the room’s geometry, reverberation time, and wall materials, and local parameters concerning the location of each sound source. With the help of room acoustic simulators, these parameters are used to simulate room impulse responses. These room impulse responses can then be convolved with dry speech signals to synthesize binaural spatial audio with a perception of realism. However, estimating these acoustic parameters is a challenge. Previous research has attempted to address this problem through cumbersome and time-consuming in-situ measurements, which are often impractical.

In this thesis, we tackle this challenge by leveraging supervised machine-learning techniques using speech recordings as input. Our primary focus is on cuboid rooms with static acoustic scenarios. In the initial part of our work, we develop a multi-task neural network for room parameter estimation. We then assess its robustness using real-world data. In the second part, we shift our focus towards virtually supervised learning. This approach involves training machine learning models exclusively on simulated data. The rationale behind this strategy is rooted in the limited availability of task-specific real datasets within this domain. To ensure generalization, the training dataset should closely resemble the scenarios encountered in the test datasets. To bridge this gap, we improve realism in the open-source room acoustics simulator Pyroomacoustics by implementing an extended image source method. Further, this improved room acoustics simulator is used to train neural networks for the tasks of room parameter estimation and sound source localization. We employ several real test datasets to assess the positive impact brought by training the systems using the improved simulator. Our experiments show that the generalization of the system is improved across both tasks when compared to the systems trained for the same task with less realistic training data. To the best of our knowledge, this is one of the first studies to explore the field of virtually supervised learning for the task of global and local room acoustic parameter estimation.

Jury

Reviewers:

Rainer Martin – Ruhr-Universität Bochum, Germany
Eric Bavu – CNAM Paris, France

Examiners:

Marie-Odile Berger – INRIA Nancy, France
Simon Leglaive – CentraleSupélec, Rennes, France

Thesis Director and co-Director:

Emmanuel Vincent – INRIA Nancy, France
Antoine Deleforge – INRIA Nancy, France

Details

Date: November 13, 2023
Time:
10:00 am - 12:00 pm
Event Category: Thesis defense
Event Tags: Audio Augmented Reality, Machine learning, Multispeech, PhD defense

Venue

A008
