Adaptation of speaker-specific bases in non-negative matrix factorization for single channel speech-music separation
Grais, Emad Mounir and Erdoğan, Hakan (2011) Adaptation of speaker-specific bases in non-negative matrix factorization for single channel speech-music separation. In: 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy
This paper introduces a speaker adaptation algorithm for non-negative matrix factorization (NMF) models. The proposed algorithm combines Bayesian adaptation with subspace model adaptation. The adapted model is used to separate a speech signal from background music in a single-channel recording. Training speech data from multiple speakers is used with NMF to train a set of basis vectors that serves as a general model for speech signals. The probabilistic interpretation of NMF is used to perform Bayesian adaptation, which adjusts the general model to the actual properties of the speech signal observed in the mixed signal. The Bayesian-adapted model is then adapted again by a linear transform, which changes the subspace that the model spans so that it better matches the speech signal in the mixed signal. The experimental results show that combining Bayesian adaptation with linear transform adaptation improves the separation results.
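The following is a minimal sketch of the general NMF separation setting the abstract builds on: basis vectors are trained on magnitude spectrograms, then held fixed while the mixture is decomposed. It does not implement the paper's Bayesian or linear-transform adaptation. The function names (`nmf_kl`, `separate`), the KL-divergence multiplicative updates, the separately trained music bases, and the soft-mask reconstruction are all assumptions made for illustration, and the random matrices stand in for real spectrograms.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def nmf_kl(V, n_bases, n_iter=200, B_fixed=None, seed=0):
    """Approximate V (freq x frames) as B @ G using KL multiplicative updates.

    If B_fixed is given, the bases stay fixed and only the gains G are updated.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    if B_fixed is None:
        B = rng.random((n_freq, n_bases)) + EPS
    else:
        B = B_fixed
    G = rng.random((B.shape[1], n_frames)) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Gain update for the generalized KL divergence.
        G *= (B.T @ (V / (B @ G + EPS))) / (B.T @ ones + EPS)
        if B_fixed is None:
            # Basis update, then rescale so each basis column sums to one
            # while leaving the product B @ G unchanged.
            B *= ((V / (B @ G + EPS)) @ G.T) / (ones @ G.T + EPS)
            scale = np.maximum(B.sum(axis=0), EPS)
            B /= scale
            G *= scale[:, None]
    return B, G


def separate(V_mix, B_speech, B_music, n_iter=200):
    """Decompose the mixture over fixed bases and apply a soft mask."""
    B = np.hstack([B_speech, B_music])
    _, G = nmf_kl(V_mix, B.shape[1], n_iter=n_iter, B_fixed=B)
    k = B_speech.shape[1]
    S = B_speech @ G[:k]   # speech part of the model
    M = B_music @ G[k:]    # music part of the model
    mask = S / (S + M + EPS)
    return mask * V_mix, (1.0 - mask) * V_mix


# Toy data standing in for magnitude spectrograms (assumption: 64 bins).
rng = np.random.default_rng(1)
V_speech_train = rng.random((64, 100))
V_music_train = rng.random((64, 100))
V_mix = rng.random((64, 40))

B_speech, _ = nmf_kl(V_speech_train, n_bases=8)
B_music, _ = nmf_kl(V_music_train, n_bases=8, seed=2)
speech_est, music_est = separate(V_mix, B_speech, B_music)
```

In this sketch the speech bases are never changed once trained. The adaptation described in the abstract addresses exactly that limitation by adjusting the general speech model toward the speaker present in the mixture.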