Speech enhancement in real-world environments using state-space based algorithms

Publisher

University of Ottawa (Canada)

Abstract

The family of state-space based speech enhancement algorithms is taken as the central building block for this thesis; such algorithms include, for example, Kalman filters running on autoregressive models of speech. The goal of this thesis is to render this family of algorithms "real-world" ready, that is, to upgrade the existing solutions to make them more flexible and more robust to real-world situations where different types of nonstationary, non-Gaussian noise are to be expected. The chosen context is that of sensitive applications such as hearing aids, for which naturalness, intelligibility, and noise reduction are equally important. Most state-space based speech enhancement algorithms are currently unable to operate robustly in nonstationary colored-noise environments, and those found in the literature usually assume additive white Gaussian noise (AWGN) conditions. Since, in addition, they are in general more computationally demanding than other well-established algorithms, it is not surprising that they are often overlooked or omitted in respected and recent speech enhancement publications, as the introduction of this thesis will underline. Nevertheless, for AWGN conditions they have still been reported to yield high-quality and natural-sounding results, making them an appealing choice for the range of applications chosen. To achieve the goal of this thesis, it is first necessary to categorize, implement, and test many state-space based algorithms for speech enhancement; as it turns out, most of the existing algorithms revolve around the same principles and ideas. Then, several tools and extensions of state-space based algorithms are developed in order to handle real-world noise.
In parallel, since one of the main goals is to preserve naturalness and intelligibility, a variety of configurations based on these proposed tools and extensions are devised and thoroughly tested in order to determine the best solution in terms of both output quality and complexity requirements. As an additional constraint, most of the tools devised are meant to be "compatible" with any state-space based algorithm, rendering the work in this thesis homogeneous and unifying. To deal with real-world noise, novel fullband and subband solutions that take into account information from existing noise estimation algorithms are proposed. In the fullband domain, new all-pole modelling techniques are devised, and in the subband domain several alternatives for handling the noise are proposed as well. Throughout the thesis, several important accessory contributions are also developed, such as a new category of particle filtering algorithms for nonlinear speech models, low-cost post-processing techniques for further noise reduction, binaural extensions of monaural state-space models, and various improvements specifically targeted at state-space algorithms running on autoregressive speech models. As an example, a state-space based algorithm built on conclusions from several chapters of this thesis is proposed and shown to outperform three well-established state-of-the-art algorithms in adverse conditions. Rather than merely justifying the work in this thesis by a final arm-wrestling contest, these latter results show that state-space based algorithms deserve a place amongst highly regarded and viable algorithms for speech enhancement.

Citation

Source: Dissertation Abstracts International, Volume: 72-02, Section: B, page: 1076.
