Improved features and dynamic stream weight adaption for robust Audio-Visual Speech Recognition framework

Saudi, AS; Khalil, Mahmoud; Abbas, HM

Abstract

This paper investigates the enhancement of a speech recognition system that uses both audio and visual speech information in noisy environments by presenting contributions in two main system stages: front-end and back-end. The double use of Gabor filters is proposed as a feature extractor in the front-end stage of both modules to capture robust spectro-temporal features. The performance obtained from the resulting Gabor Audio Features (GAFs) and Gabor Visual Features (GVFs) is compared to that of conventional features such as MFCC, PLP, and RASTA-PLP audio features and DCT2 visual features. The experimental results show that a system utilizing GAFs and GVFs performs better, especially in low-SNR scenarios. To improve the back-end stage, a complete framework of synchronous Multi-Stream Hidden Markov Model (MSHMM) is used to solve the dynamic stream weight estimation problem for Audio-Visual Speech Recognition (AVSR). To demonstrate the usefulness of dynamic weighting for the overall performance of the AVSR system, we empirically show the advantage of Late Integration (LI) over Early Integration (EI), especially when one of the modalities is corrupted. Results confirm the superior recognition accuracy of the AVSR system with Late Integration at all SNR levels.

Other data

Title	Improved features and dynamic stream weight adaption for robust Audio-Visual Speech Recognition framework
Authors	Saudi, AS; Khalil, Mahmoud; Abbas, HM
Keywords	Audio-Visual Speech Recognition;Synchronous Multi-Stream Hidden Markov Model;Visual feature extraction;Audio-visual integration;Reliability measures;Audio-visual databases;FILTER BANK FEATURES;AUTOMATIC RECOGNITION;DISCRIMINANT-ANALYSIS;EXTRACTION;FUSION
Issue Date	Jun-2019
Publisher	ACADEMIC PRESS INC ELSEVIER SCIENCE
Journal	DIGITAL SIGNAL PROCESSING
Volume	89
Start page	17
End page	29
ISSN	1051-2004
DOI	10.1016/j.dsp.2019.02.016
Scopus ID	2-s2.0-85063463245
Web of science ID	WOS:000468711300003

Attached Files

File	Description	Size	Format
2019 Improved features and dynamic stream weight adaption for robust Audio-Visual Speech Recognition framework.pdf		2.99 MB	Adobe PDF

Citations 7 in scopus
