Chen, Zhuo and Watanabe, Shinji and Erdoğan, Hakan and Hershey, John R. (2015) Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks. In: Interspeech 2015, Dresden, Germany.
Abstract
The Long Short-Term Memory (LSTM) recurrent neural network has proven effective in modeling speech and has achieved outstanding performance in both speech enhancement (SE) and automatic speech recognition (ASR). To further improve the performance of noise-robust speech recognition, a combination of speech enhancement and recognition was shown to be promising in earlier work. This paper explores options for a consistent integration of SE and ASR using LSTM networks. Since SE and ASR have different objective criteria, it is not clear which kind of integration ultimately leads to the best word error rate on noise-robust ASR tasks. In this work, several integration architectures are proposed and tested, including: (1) a pipeline architecture of LSTM-based SE and ASR with sequence training, (2) an alternating estimation architecture, and (3) a multi-task hybrid LSTM network architecture. The proposed models were evaluated on the 2nd CHiME speech separation and recognition challenge task and show significant improvements relative to prior results.
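The abstract's third option, the multi-task hybrid network, can be pictured as one shared LSTM encoder feeding two output heads trained jointly. The following is a minimal sketch of that idea, not the authors' implementation: the enhancement head regresses a time-frequency mask applied to the noisy features, the recognition head predicts frame-level HMM-state posteriors, and the two losses are combined with a weight. All layer sizes, the 0.5 loss weight, and the mask-based enhancement target are illustrative assumptions.

```python
# Hypothetical multi-task LSTM sketch (shared encoder, SE + ASR heads).
import torch
import torch.nn as nn

class MultiTaskLSTM(nn.Module):
    def __init__(self, feat_dim=40, hidden=256, layers=2, num_states=500):
        super().__init__()
        # Shared bidirectional LSTM encoder over noisy spectral features
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=layers,
                               batch_first=True, bidirectional=True)
        # SE head: per-frame mask in [0, 1] applied to the noisy input
        self.mask_head = nn.Sequential(nn.Linear(2 * hidden, feat_dim), nn.Sigmoid())
        # ASR head: per-frame posteriors over HMM states / senones
        self.asr_head = nn.Linear(2 * hidden, num_states)

    def forward(self, noisy):
        h, _ = self.encoder(noisy)            # (B, T, 2*hidden)
        enhanced = self.mask_head(h) * noisy  # masked (enhanced) features
        logits = self.asr_head(h)             # (B, T, num_states)
        return enhanced, logits

def multitask_loss(enhanced, clean, logits, states, se_weight=0.5):
    # Weighted sum of the enhancement (MSE) and recognition (cross-entropy)
    # criteria; the weight is a tunable hyperparameter, not a paper value.
    mse = nn.functional.mse_loss(enhanced, clean)
    ce = nn.functional.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                     states.reshape(-1))
    return se_weight * mse + (1.0 - se_weight) * ce

if __name__ == "__main__":
    B, T, F, S = 4, 100, 40, 500          # toy batch, frames, features, states
    model = MultiTaskLSTM(feat_dim=F, num_states=S)
    noisy, clean = torch.randn(B, T, F), torch.randn(B, T, F)
    states = torch.randint(0, S, (B, T))
    enhanced, logits = model(noisy)
    loss = multitask_loss(enhanced, clean, logits, states)
    loss.backward()
```

In such a setup the shared encoder is pushed toward representations useful for both denoising and recognition, which is the motivation the abstract gives for joint SE/ASR training; the pipeline and alternating-estimation variants would instead keep separate SE and ASR networks.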
| Item Type: | Papers in Conference Proceedings |
| --- | --- |
| Subjects: | T Technology > TK Electrical engineering. Electronics. Nuclear engineering |
| Divisions: | Faculty of Engineering and Natural Sciences > Academic programs > Electronics; Faculty of Engineering and Natural Sciences |
| Depositing User: | Hakan Erdoğan |
| Date Deposited: | 24 Dec 2015 15:49 |
| Last Modified: | 26 Apr 2022 09:21 |
| URI: | https://research.sabanciuniv.edu/id/eprint/28866 |