Weninger, Felix and Erdoğan, Hakan and Watanabe, Shinji and Vincent, Emmanuel and Le Roux, Jonathan and Hershey, John R. and Schuller, Bjoern (2015) Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR. In: 12th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), Liberec
Full text not available from this repository.
Official URL: http://dx.doi.org/10.1007/978-3-319-22482-4_11
Abstract
We evaluate some recent developments in recurrent neural network (RNN) based speech enhancement in the light of noise-robust automatic speech recognition (ASR). The proposed framework is based on Long Short-Term Memory (LSTM) RNNs that are discriminatively trained according to an optimal speech reconstruction objective. We demonstrate that LSTM speech enhancement, even when used 'naively' as front-end processing, delivers competitive results on the CHiME-2 speech recognition task. Furthermore, we propose simple extensions to the framework based on feature-level fusion to improve integration with the ASR back-end. These yield a best result of 13.76% average word error rate, which is, to our knowledge, the best score reported on this task to date.
| Field | Value |
|---|---|
| Item Type | Papers in Conference Proceedings |
| Subjects | T Technology > TK Electrical engineering. Electronics. Nuclear engineering |
| Divisions | Faculty of Engineering and Natural Sciences > Academic programs > Electronics; Faculty of Engineering and Natural Sciences |
| Depositing User | Hakan Erdoğan |
| Date Deposited | 24 Dec 2015 16:06 |
| Last Modified | 26 Apr 2022 09:21 |
| URI | https://research.sabanciuniv.edu/id/eprint/28861 |