Incorporating prior information in nonnegative matrix factorization for audio source separation

Grais Girgis, Emad Mounir (2013) Incorporating prior information in nonnegative matrix factorization for audio source separation. [Thesis]



In this work, we propose solutions to the problem of separating audio sources from a single recording. The source signals can be speech, music, or any other audio signals. We assume that training data are available for each individual source present in the mixed signal. The training data are used to build a representative model for each source. In most cases, these models are sets of basis vectors in the magnitude or power spectral domain. The proposed algorithms decompose the spectrogram of the mixed signal using the trained basis models of all sources observed in the mixture. Nonnegative matrix factorization (NMF) is used to train the basis models for the source signals. NMF is then used to decompose the mixed signal spectrogram as a weighted linear combination of the trained basis vectors of each source. After the decomposition, spectral masks are built and used to reconstruct the source signals. In this thesis, we improve the performance of NMF for source separation by incorporating additional constraints and prior information about the source signals into the NMF decomposition. The NMF decomposition weights are encouraged to satisfy prior information related to the nature of the source signals. The priors are modeled using Gaussian mixture models or hidden Markov models. These priors represent the valid weight combination sequences that the basis vectors can receive for a given type of source signal. The prior models are incorporated into the NMF cost function using either log-likelihood or minimum mean squared error (MMSE) estimation. We also incorporate prior information as a post-processing step: a smoothness prior is imposed on the NMF solutions through post-smoothing. We also introduce post-enhancement using MMSE estimation to obtain better separation of the source signals.
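The baseline pipeline described above (train one basis set per source, decompose the mixture over the concatenated bases, then apply spectral masks) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the generalized Kullback-Leibler cost with multiplicative updates, the Wiener-like ratio mask, and the names `nmf_kl` and `separate` are assumptions chosen for the example, and none of the proposed priors are included.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def nmf_kl(V, rank, n_iter=200, W=None, seed=0):
    """NMF with multiplicative updates for the generalized KL divergence.

    Assumed cost function; the abstract does not state which one is used.
    If W is supplied it is kept fixed and only the weights H are estimated
    (in that case `rank` is ignored and W.shape[1] is used).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    fixed_W = W is not None
    if not fixed_W:
        W = rng.random((n_freq, rank)) + EPS
    H = rng.random((W.shape[1], n_frames)) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Update the weights with the bases held fixed.
        R = V / (W @ H + EPS)
        H *= (W.T @ R) / (W.T @ ones + EPS)
        if not fixed_W:
            # Update the bases with the weights held fixed.
            R = V / (W @ H + EPS)
            W *= (R @ H.T) / (ones @ H.T + EPS)
            # Normalize each basis vector to unit sum and move the scale
            # into H so that the product W @ H is unchanged.
            scale = W.sum(axis=0) + EPS
            W /= scale
            H *= scale[:, None]
    return W, H


def separate(V_mix, W1, W2, n_iter=200):
    """Decompose a mixture spectrogram over two trained basis sets and
    reconstruct each source with a ratio (Wiener-like) spectral mask."""
    W = np.hstack([W1, W2])
    _, H = nmf_kl(V_mix, W.shape[1], n_iter=n_iter, W=W)
    k1 = W1.shape[1]
    S1 = W1 @ H[:k1]   # initial estimate of source 1
    S2 = W2 @ H[k1:]   # initial estimate of source 2
    mask1 = S1 / (S1 + S2 + EPS)
    return mask1 * V_mix, (1.0 - mask1) * V_mix
```

In use, `W1` and `W2` would be obtained by calling `nmf_kl` on the magnitude spectrograms of each source's training data, and the masked spectrograms returned by `separate` would be combined with the mixture phase and inverted to the time domain. Because the two masks sum to one, the two estimates always add up to the mixture spectrogram.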
In this thesis, we also improve the NMF training of the basis models. When not enough training data are available, we introduce two different adaptation methods that let the trained bases better fit the sources in the mixed signal. We also improve the training procedure by learning more discriminative dictionaries for the source signals. In addition, to capture a larger context in the models, we concatenate neighboring spectra and train basis sets from them instead of from single frames, which makes it possible to directly model the relation between consecutive spectral frames. Experimental results show that the proposed approaches improve the performance of NMF in source separation applications.
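The idea of concatenating neighboring spectra before training can be illustrated with a short sketch. The sliding window with a hop of one frame and the function name `stack_frames` are assumptions made for the example; the abstract does not specify how many frames are concatenated or how the windows overlap.

```python
import numpy as np


def stack_frames(V, context):
    """Stack `context` consecutive spectral frames into one tall column.

    V has shape (n_freq, n_frames). Column t of the result is the
    concatenation of V[:, t], V[:, t + 1], ..., V[:, t + context - 1],
    so the result has shape (context * n_freq, n_frames - context + 1).
    """
    n_freq, n_frames = V.shape
    if not 1 <= context <= n_frames:
        raise ValueError("context must be between 1 and the number of frames")
    n_out = n_frames - context + 1
    # Each shifted slice contributes one block of rows to the stacked matrix.
    return np.vstack([V[:, i:i + n_out] for i in range(context)])
```

Running NMF on the stacked matrix yields basis vectors that each span several frames, so a single basis vector can represent a short spectro-temporal pattern rather than one spectrum.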
Item Type: Thesis
Uncontrolled Keywords: Single channel source separation. -- Nonnegative matrix factorization. -- Hidden Markov model. -- Gaussian mixture model. -- Minimum mean squared error estimation. -- Model adaptation. -- Orthogonality constraints. -- Discriminative training. -- Dictionary learning. -- Wiener filter. -- Spectral masks. -- Digital signal processing. -- Speech signals. -- Voice signals. -- Sayısal İşaret İşleme. -- Konuşma işaretleri -- Ses işareti. -- Tek Kanal Kaynak ayrımı. -- Negatif Olmayan Matris Ayrıştırma (NOMA). -- Saklı Markov modeli. -- Gauss karışım modeli. -- Minimum ortalama karesel hata kestirimi (MOKH). -- Model uyarlama. -- Dikgenlik kısıtları. -- Ayırt edici eğitim. -- Sözlük öğrenme. -- Wiener filtresi. -- Spektral maskeler.
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7800-8360 Electronics
Divisions: Faculty of Engineering and Natural Sciences > Academic programs > Electronics
Faculty of Engineering and Natural Sciences
Depositing User: IC-Cataloging
Date Deposited: 27 Mar 2017 11:29
Last Modified: 26 Apr 2022 10:08
