Using multiple visual tandem streams in audio-visual speech recognition

Topkaya, İbrahim Saygın and Erdoğan, Hakan (2011) Using multiple visual tandem streams in audio-visual speech recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2011), Prague, Czech Republic

Abstract

The tandem approach in speech recognition has been shown to increase performance by using classifier posterior probabilities as observations in a hidden Markov model. We study the effect of visual tandem features in audio-visual speech recognition with a novel setup that uses multiple classifiers to obtain multiple visual tandem features. We adopt multi-stream hidden Markov models in which visual tandem features from two different classifiers are treated as additional streams. Our experiments show that using multiple visual tandem features improves recognition accuracy under various noise conditions. In addition, to handle asynchrony between audio and visual observations, we employ coupled hidden Markov models and obtain improved performance compared to the synchronous model.
Item Type: Papers in Conference Proceedings
Uncontrolled Keywords: Audio-Visual Speech Recognition, Coupled Hidden Markov Models, Hidden Markov Models, Neural Networks, Support Vector Machines, Tandem Approach
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Divisions: Faculty of Engineering and Natural Sciences > Academic programs > Electronics
Faculty of Engineering and Natural Sciences
Depositing User: Hakan Erdoğan
Date Deposited: 25 Dec 2011 16:17
Last Modified: 26 Apr 2022 09:05
URI: https://research.sabanciuniv.edu/id/eprint/18469
