Speech enhancement based on perceptual loudness and statistical models of speech

Title: Speech enhancement based on perceptual loudness and statistical models of speech
Authors: Zhang, Wei
Date: 2009
Abstract: This dissertation is concerned with speech enhancement based on statistical and loudness models, with the objective of improving the quality of speech signals in noisy environments. First, speech enhancement based on the Laplacian model for speech signals is reviewed. Its performance is shown to be limited by the accuracy of the Laplacian parameter estimation in the noisy environment. A recursive version is proposed that estimates the Laplacian model parameters from the enhanced speech and then uses these estimates to re-enhance the original noisy speech. This approach achieves better parameter estimation and hence further improvement in speech quality. Next, loudness models for speech are reviewed. Because loudness describes the human hearing system better than the spectrum does, the fundamental approaches of spectral subtraction are extended to the loudness domain, yielding the proposed loudness subtraction approach. Tests are done for subtraction with different α values in the loudness model. Simulations show that the quality of the enhanced speech can be optimized by choosing the appropriate α for a given input SNR. Thus, an adaptive-α subtraction model is proposed, and simulations show that it can further improve the performance of spectral subtraction. The proposed loudness subtraction with fixed α is then shown to provide better results overall than classical spectral subtraction, even though noise residue and unpleasant artifacts remain high in the enhanced signal. Loudness over-subtraction is then proposed to further reduce these artifacts and the residual noise. Extensive simulation studies show clear improvement over other subtraction-type approaches. Finally, we propose a Maximum Likelihood (ML) speech enhancement algorithm in the loudness domain. It is optimal under the ML criterion in the loudness domain, given the loudness of the noisy speech and the noise estimate. The Laplacian model and the Gaussian model of speech are used separately for comparison. Both approaches show significant improvement in quality. It is shown that the Laplacian model leads to better preservation of the speech and the Gaussian model leads to better noise reduction.
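The subtraction idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loudness model, so this sketch assumes a simple power-law compression of the spectral magnitude as a stand-in. The function name `loudness_subtract` and the parameters `alpha` (compression exponent, standing in for the loudness-model parameter), `over` (over-subtraction factor) and `floor` (spectral floor) are hypothetical and are not taken from the dissertation.

```python
def loudness_subtract(noisy_mag, noise_mag, alpha=0.6, over=1.0, floor=0.01):
    """Subtract a noise estimate in a power-law compressed domain.

    ASSUMPTION: magnitude ** alpha is used as a crude stand-in for a
    loudness model; the dissertation's actual model may differ.

    noisy_mag, noise_mag: per-bin spectral magnitudes of one frame.
    alpha: compression exponent (hypothetical parameter).
    over:  over-subtraction factor (1.0 means plain subtraction).
    floor: fraction of the noisy compressed value kept as a lower bound,
           which limits isolated residual peaks after subtraction.
    """
    enhanced = []
    for y, n in zip(noisy_mag, noise_mag):
        y_l = y ** alpha                      # noisy speech, compressed domain
        n_l = n ** alpha                      # noise estimate, compressed domain
        s_l = max(y_l - over * n_l, floor * y_l)
        enhanced.append(s_l ** (1.0 / alpha)) # map back to a magnitude
    return enhanced


# One frame with three frequency bins and a flat noise estimate.
frame = [5.0, 1.0, 0.2]
noise = [3.0, 0.5, 0.5]
print(loudness_subtract(frame, noise, alpha=0.6, over=1.5))
```

With `alpha=1.0` this reduces to magnitude spectral subtraction and with `alpha=2.0` to power spectral subtraction, so sweeping `alpha` for a given input SNR mirrors, in simplified form, the kind of comparison the abstract describes.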
URL: http://hdl.handle.net/10393/29955
Collection: Thèses, 1910 - 2010 // Theses, 1910 - 2010
File: NR61400.PDF (3.85 MB, Adobe PDF)