Periodic/Aperiodic source decomposition of speech signals

This page briefly describes the method developed with Gilles Chardon, associate professor at CentraleSupélec, to separate the periodic contributions embedded in the speech signal from the aperiodic contributions. It summarizes the paper presented at ICA 2016, available here [1].

Method

The decomposition was specifically designed for fricatives. In the case of voiced fricatives, the acoustic signal may be seen as the sum of the contribution of the voiced source, assumed periodic, and the contribution of the frication noise source:
$$ s(t) = s_p(t)+s_n(t),$$
where $s(t)$ is the speech signal, and $s_p(t)$ and $s_n(t)$ are the periodic and aperiodic components, respectively.
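As a rough illustration of this additive model (not the authors' code), a synthetic voiced-fricative-like signal can be built by summing a few harmonics with white noise. The sampling rate, $f_0$, harmonic count, and noise level below are arbitrary choices for the example:

```python
import numpy as np

# Illustrative sketch of the additive model s(t) = s_p(t) + s_n(t).
# All parameters (fs, f0, number of harmonics, noise level) are
# arbitrary example values, not values from the paper.
fs = 16000                          # sampling rate (Hz)
t = np.arange(0, 0.05, 1 / fs)      # 50 ms segment
f0 = 120.0                          # fundamental frequency (Hz)

# Periodic component: first five harmonics of f0, decaying amplitudes
s_p = sum(np.cos(2 * np.pi * k * f0 * t) / k for k in range(1, 6))

# Aperiodic component: white Gaussian noise standing in for frication
rng = np.random.default_rng(0)
s_n = 0.3 * rng.standard_normal(t.size)

s = s_p + s_n                       # observed speech-like signal
```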

The fundamental frequency of the periodic component is first estimated within a short windowed segment of the speech signal, using a frequency-domain technique based on a whitened cumulative periodogram. A simple partial detector is used to avoid octave errors.
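A much simpler stand-in for this step can be sketched as periodogram peak picking in a plausible pitch range. This is not the whitened cumulative periodogram of the paper and includes no partial detector, so it remains prone to the octave errors the actual method avoids:

```python
import numpy as np

def estimate_f0(s, fs, fmin=60.0, fmax=400.0):
    """Crude f0 estimate: strongest periodogram peak in [fmin, fmax].

    Simplified stand-in for the whitened cumulative periodogram of the
    paper: no whitening, no partial detection, so octave errors are
    possible on real speech."""
    spec = np.abs(np.fft.rfft(s * np.hanning(s.size))) ** 2
    freqs = np.fft.rfftfreq(s.size, 1 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band][np.argmax(spec[band])]
```

On a clean harmonic signal this recovers $f_0$ to within the FFT bin width (fs / N for a segment of N samples).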
The periodic component $s_p$ is estimated by projecting the signal onto the subspace spanned by the detected harmonics.
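The projection step can be sketched as an ordinary least-squares fit onto cosines and sines at multiples of the detected $f_0$. This is an illustrative reimplementation of the projection idea, not the released Matlab code, which may differ in details (windowing, harmonic selection, tracking of $f_0$ over time):

```python
import numpy as np

def harmonic_projection(s, fs, f0, n_harm=10):
    """Split s into a periodic part (least-squares projection onto
    cosines/sines at multiples of f0) and the aperiodic residual.

    Illustrative sketch only; not the released hn_sep.m."""
    t = np.arange(s.size) / fs
    ks = [k for k in range(1, n_harm + 1) if k * f0 < fs / 2]  # below Nyquist
    A = np.column_stack(
        [np.cos(2 * np.pi * k * f0 * t) for k in ks]
        + [np.sin(2 * np.pi * k * f0 * t) for k in ks]
    )
    coef, *_ = np.linalg.lstsq(A, s, rcond=None)
    s_p = A @ coef                  # projection onto the harmonic subspace
    return s_p, s - s_p             # (periodic, aperiodic) components
```

On a purely harmonic input, the aperiodic residual is numerically zero; with additive noise, the noise energy stays almost entirely in the residual.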

The separation is used to compute a newly introduced voicing index, the voicing quotient ($VQ$), which quantifies the proportion of the energy of the speech signal carried by the periodic component:
$$ VQ\,(\%) = 100\times\frac{\Vert s_p \Vert_2^2}{\Vert s_p+s_n\Vert_2^2} $$
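Given the two separated components, computing the $VQ$ is a one-liner; a minimal sketch of the definition above, assuming `s_p` and `s_n` are time-aligned arrays:

```python
import numpy as np

def voicing_quotient(s_p, s_n):
    """VQ (%) = 100 * ||s_p||_2^2 / ||s_p + s_n||_2^2."""
    s = s_p + s_n
    return 100.0 * np.dot(s_p, s_p) / np.dot(s, s)
```

A purely periodic signal gives $VQ = 100\%$; when periodic and aperiodic parts are orthogonal with equal energy, $VQ = 50\%$.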

Validation



Example of separation performed on a simulated voiced alveolar fricative /z/.

The method was validated numerically on simulated fricative signals for each of the three places of articulation of French fricatives. The validation compares the estimated voicing quotient ($VQ$) with the theoretical $VQ$.


A few examples

Voiced fricatives in aFa context


Voiceless fricatives in aFa context



The Matlab code for the separation is available here.

In this archive, you will find a file paptest.m, which performs the separation on two test signals contained in the audio folder. Set voiced = 0 for an /aSa/ (asha) utterance, and voiced = 1 for an /aZa/ (azha) utterance. To process your own signals, use the function hn_sep.m directly.
Please do not hesitate to report any suggestions, malfunctions, or unexpected results to benjamin.elie(at)loria.fr.

[1] Elie, B. and Chardon, G., "Robust tonal and noise separation in presence of colored noise, and application to voiced fricatives", International Congress on Acoustics (ICA), Buenos Aires, 2016. [.pdf] [.bib]

Last modification: December 22, 2016