
Modelling Speech Dynamics with Trajectory-HMMs

Organisation: Edinburgh Research Archive, United Kingdom

Details

Title : Modelling Speech Dynamics with Trajectory-HMMs
Researcher : Zhang, Le
Keywords : Informatics, Institute for Communicating and Collaborative Systems, Speech Technology Research
Organisation : Edinburgh Research Archive, United Kingdom
Contributor : Renals, Steve
Year of publication : 2552 BE (2009 CE)
Reference : http://hdl.handle.net/1842/3213
Source : -
Expertise : -
Relation : -
Coverage : -
Abstract/Description :

Institute for Communicating and Collaborative Systems

The conditional independence assumption imposed by hidden Markov models (HMMs) makes it difficult to model temporal correlation patterns in human speech. Traditionally, this limitation is circumvented by appending first- and second-order regression coefficients to the observation feature vectors. Although this leads to improved performance in recognition tasks, we argue that a straightforward use of dynamic features in HMMs will result in an inferior model, due to the incorrect handling of dynamic constraints. In this thesis I will show that an HMM can be transformed into a Trajectory-HMM capable of generating smoothed output mean trajectories, by performing a per-utterance normalisation. The resulting model can be trained by either maximising model log-likelihood or minimising mean generation errors on the training data. To combat the exponential growth of paths in searching, the idea of delayed path merging is proposed, and a new time-synchronous decoding algorithm built on the concept of token-passing is designed for use in the recognition task. The Trajectory-HMM brings a new way of sharing knowledge between speech recognition and synthesis components, by tackling both problems in a coherent statistical framework. I evaluated the Trajectory-HMM on two different speech tasks using the speaker-dependent MOCHA-TIMIT database. In the first task, it served as a generative model to recover articulatory features from the speech signal, where the Trajectory-HMM was used in a complementary way to conventional HMM modelling techniques, within a joint acoustic-articulatory framework. Experiments indicate that the jointly trained acoustic-articulatory models are more accurate (having a lower Root Mean Square (RMS) error) than the separately trained ones, and that Trajectory-HMM training results in greater accuracy than conventional Baum-Welch parameter updating. In addition, the RMS training objective proves to be consistently better than the Maximum Likelihood objective. However, experiments on the second task, phone recognition, show that the MLE-trained Trajectory-HMM, while retaining the attractive properties of a proper generative model, tends to favour over-smoothed trajectories among competing hypotheses, and does not perform better than a conventional HMM. We use this to build an argument that models giving a better fit on training data may suffer a reduction in discrimination by being too faithful to the training data. Finally, experiments using triphone models show that increasing modelling detail is an effective way to improve modelling performance with little added complexity in training.
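To make the idea of a smoothed output mean trajectory concrete, the sketch below uses the formulation commonly given for trajectory models in the literature; the abstract does not spell out the thesis's own equations, so this is an assumption. A window matrix W maps a static trajectory c to stacked static and delta observations, and the mean trajectory solves (WᵀPW)c = WᵀPμ, where μ holds the per-frame state means and P the diagonal precisions. The function names (`build_window`, `trajectory_mean`, `rms_error`), the one-dimensional feature, the first-order-only delta window, and the toy numbers are all hypothetical choices made for illustration.

```python
from math import sqrt

def build_window(T):
    """Return the 2T x T matrix W mapping a static trajectory c to
    [c_0, d_0, c_1, d_1, ...], with d_t = 0.5 * (c[t+1] - c[t-1]).
    Edges are clamped (an assumption of this sketch)."""
    W = [[0.0] * T for _ in range(2 * T)]
    for t in range(T):
        W[2 * t][t] = 1.0                          # static row
        W[2 * t + 1][min(t + 1, T - 1)] += 0.5     # delta row, right neighbour
        W[2 * t + 1][max(t - 1, 0)] -= 0.5         # delta row, left neighbour
    return W

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def trajectory_mean(mu, prec):
    """Solve (W^T P W) c = W^T P mu for the static mean trajectory c.
    mu and prec are length-2T lists ordered [static, delta] per frame."""
    T = len(mu) // 2
    W = build_window(T)
    R = [[sum(W[k][i] * prec[k] * W[k][j] for k in range(2 * T))
          for j in range(T)] for i in range(T)]
    r = [sum(W[k][i] * prec[k] * mu[k] for k in range(2 * T))
         for i in range(T)]
    return solve(R, r)

def rms_error(est, ref):
    """Root mean square error between two equal-length sequences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(est, ref)) / len(est))

# Toy state sequence: three frames with static mean 0, then three with
# static mean 1; every delta mean is 0 (hypothetical values).
static_means = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
mu = []
for m in static_means:
    mu += [m, 0.0]
prec = [1.0, 4.0] * len(static_means)

smooth = trajectory_mean(mu, prec)
print("step-wise means:  ", static_means)
print("trajectory means: ", [round(v, 3) for v in smooth])
print("RMS difference:   ", round(rms_error(smooth, static_means), 3))
```

In this toy setting the piecewise-constant state means play the role of the conventional HMM output, and the solved trajectory differs from them because the delta rows of W penalise the abrupt step. A dense solver is used only to keep the sketch dependency-free; since WᵀPW is banded, a practical implementation would more likely use a banded Cholesky factorisation.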

Bibliography :
Zhang, Le. (2552). Modelling Speech Dynamics with Trajectory-HMMs.
    Bangkok : Edinburgh Research Archive, United Kingdom.
Zhang, Le. 2552. "Modelling Speech Dynamics with Trajectory-HMMs".
    Bangkok : Edinburgh Research Archive, United Kingdom.
Zhang, Le. "Modelling Speech Dynamics with Trajectory-HMMs."
    Bangkok : Edinburgh Research Archive, United Kingdom, 2552. Print.
Zhang, Le. Modelling Speech Dynamics with Trajectory-HMMs. Bangkok : Edinburgh Research Archive, United Kingdom; 2552.