Topkaya, İbrahim Saygın and Erdoğan, Hakan (2011) Using multiple visual tandem streams in audio-visual speech recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2011), Prague, Czech Republic
PDF
saygin_icassp1.pdf
Download (1MB)
Official URL: http://dx.doi.org/10.1109/ICASSP.2011.5947476
Abstract
The "tandem approach" in speech recognition has been shown to increase performance by using classifier posterior probabilities as observations in a hidden Markov model. We study the effect of visual tandem features in audio-visual speech recognition with a novel setup that uses multiple classifiers to obtain multiple visual tandem features. We adopt multi-stream hidden Markov models in which visual tandem features from two different classifiers are treated as additional streams. Our experiments show that using multiple visual tandem features improves recognition accuracy in various noise conditions. In addition, to handle asynchrony between audio and visual observations, we employ coupled hidden Markov models and obtain improved performance compared to the synchronous model.
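As a rough illustration of the two ideas named in the abstract, the sketch below turns classifier posteriors into tandem features and combines per-stream state scores in the way multi-stream hidden Markov models are commonly formulated, as a weighted sum of per-stream log-likelihoods. It is not taken from the paper: the function names, the log transform with a floor, the stream weights, and the numeric scores are all assumptions made for illustration.

```python
import math


def tandem_features(posteriors, floor=1e-10):
    """Map one frame of classifier posteriors to log-domain tandem features.

    Taking logs with a small floor is a common choice in tandem systems;
    it is an assumption here, not a detail stated in this record.
    """
    return [math.log(max(p, floor)) for p in posteriors]


def multistream_log_likelihood(stream_log_likelihoods, stream_weights):
    """Combine per-stream state log-likelihoods using exponent weights.

    In the log domain, raising each stream likelihood to a weight and
    multiplying becomes a weighted sum of log-likelihoods.
    """
    if len(stream_log_likelihoods) != len(stream_weights):
        raise ValueError("one weight per stream is required")
    return sum(w * ll for w, ll in zip(stream_weights, stream_log_likelihoods))


# Hypothetical scores for one HMM state and one frame: an audio stream and
# two visual tandem streams (for example, one per classifier).
frame_features = tandem_features([0.7, 0.2, 0.1])
score = multistream_log_likelihood([-12.0, -8.0, -9.0], [0.6, 0.2, 0.2])
```

In a setup like this, the stream weights would typically be tuned per noise condition, giving the audio stream less weight as the acoustic signal degrades.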
Item Type: | Papers in Conference Proceedings |
---|---|
Uncontrolled Keywords: | Audio-Visual Speech Recognition, Coupled Hidden Markov Models, Hidden Markov Models, Neural Networks, Support Vector Machines, Tandem Approach |
Subjects: | T Technology > TK Electrical engineering. Electronics. Nuclear engineering |
Divisions: | Faculty of Engineering and Natural Sciences > Academic programs > Electronics; Faculty of Engineering and Natural Sciences |
Depositing User: | Hakan Erdoğan |
Date Deposited: | 25 Dec 2011 16:17 |
Last Modified: | 26 Apr 2022 09:05 |
URI: | https://research.sabanciuniv.edu/id/eprint/18469 |