Joint spectral and temporal normalization of features for robust recognition of noisy and reverberated speech

Organization: Nanyang Technological University, Singapore

Details

Title : Joint spectral and temporal normalization of features for robust recognition of noisy and reverberated speech
Researchers : Xiao, Xiong; Chng, Eng Siong; Li, Haizhou
Keywords : DRNTU::Engineering::Computer science and engineering
Organization : Nanyang Technological University, Singapore
Collaborators : -
Year of publication : 2012 (B.E. 2555)
Reference : Xiao, X., Chng, E. S., & Li, H. (2012). Joint spectral and temporal normalization of features for robust recognition of noisy and reverberated speech. 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4325-4328. http://hdl.handle.net/10220/13398 , http://dx.doi.org/10.1109/ICASSP.2012.6288876
Source : -
Expertise : -
Relation : -
Coverage : -
Abstract/Description :

In this paper, we propose a framework for joint normalization of spectral and temporal statistics of speech features for robust speech recognition. Current feature normalization approaches normalize the spectral and temporal aspects of feature statistics separately to overcome noise and reverberation. As a result, the interaction between spectral normalization (e.g. mean and variance normalization, MVN) and temporal normalization (e.g. temporal structure normalization, TSN) is ignored. We propose a joint spectral and temporal normalization (JSTN) framework to normalize these two aspects of feature statistics simultaneously. In JSTN, feature trajectories are filtered by linear filters, and the filter coefficients are optimized by maximizing a likelihood-based objective function. Experimental results on the Aurora-5 benchmark task show that JSTN consistently outperforms the cascade of MVN and TSN on test data corrupted by both additive noise and reverberation, which validates our proposal. Specifically, JSTN reduces the average word error rate by a relative 8-9% over the cascade of MVN and TSN for both artificial and real noisy data.
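The interaction that the abstract says is ignored by separate normalization can be illustrated with a minimal sketch. It is not the authors' JSTN implementation: the function names `mvn` and `filter_trajectories`, the synthetic features, and the fixed three-tap smoothing filter are all assumptions made for illustration, and the filter is neither TSN's filter design nor a likelihood-optimized JSTN filter. The sketch only shows that filtering feature trajectories over time after MVN changes their variance, so the two steps are not independent.

```python
import numpy as np


def mvn(features, eps=1e-8):
    """Per-utterance mean and variance normalization.

    features: array of shape (num_frames, num_dims).
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)


def filter_trajectories(features, taps):
    """Filter the time trajectory of every feature dimension with one FIR filter."""
    columns = [
        np.convolve(features[:, d], taps, mode="same")
        for d in range(features.shape[1])
    ]
    return np.stack(columns, axis=1)


# Synthetic stand-in for cepstral features: 200 frames, 13 dimensions (assumption).
rng = np.random.default_rng(0)
features = rng.normal(loc=3.0, scale=2.0, size=(200, 13))

# Hypothetical fixed smoothing filter standing in for the temporal step.
taps = np.array([0.25, 0.5, 0.25])

normalized = mvn(features)                       # spectral step: zero mean, unit variance
smoothed = filter_trajectories(normalized, taps)  # temporal step applied afterwards

# After temporal filtering the per-dimension variance is no longer 1.
print(normalized.std(axis=0).round(3))
print(smoothed.std(axis=0).round(3))
```

In a cascade, the variance would have to be re-normalized after the temporal step; a joint scheme such as the one described in the abstract instead chooses the filter with both kinds of statistics in view.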

Bibliography :
Xiao, Xiong , Chng, Eng Siong , Li, Haizhou . (B.E. 2555). Joint spectral and temporal normalization of features for robust recognition of noisy and reverberated speech.
    Bangkok : Nanyang Technological University, Singapore.
Xiao, Xiong , Chng, Eng Siong , Li, Haizhou . B.E. 2555. "Joint spectral and temporal normalization of features for robust recognition of noisy and reverberated speech".
    Bangkok : Nanyang Technological University, Singapore.
Xiao, Xiong , Chng, Eng Siong , Li, Haizhou . "Joint spectral and temporal normalization of features for robust recognition of noisy and reverberated speech."
    Bangkok : Nanyang Technological University, Singapore, B.E. 2555. Print.
Xiao, Xiong , Chng, Eng Siong , Li, Haizhou . Joint spectral and temporal normalization of features for robust recognition of noisy and reverberated speech. Bangkok : Nanyang Technological University, Singapore; B.E. 2555.