KMID : 0917520040110020193
Journal of Speech Sciences, 2004, Vol. 11, No. 2, pp. 193-210
Support Vector Machine Based Phoneme Segmentation for Lip Synch Application
Lee Kun-Young
Ko Han-Seok
Abstract
In this paper, we develop a real-time lip-synch system that animates a 2-D avatar's lip motion in synch with an incoming speech utterance. To achieve real-time operation, we limit the processing time by invoking merge and split procedures that perform coarse-to-fine phoneme classification. At each stage of phoneme classification, we apply the support vector machine (SVM) to reduce the computational load while retaining the desired accuracy. The coarse-to-fine phoneme classification is accomplished via two stages of feature extraction: first, each speech frame is acoustically analyzed for three classes of lip opening using Mel Frequency Cepstral Coefficients (MFCC) as the feature; second, the classification of each frame is refined to a detailed lip shape using formant information. We implemented the system with 2-D lip animation, which shows the effectiveness of the proposed two-stage procedure in accomplishing a real-time lip-synch task. We observed that the method using phoneme merging and SVM achieved about twice the recognition speed of a method employing the Hidden Markov Model (HMM). A typical per-frame latency observed for our method was on the order of 18.22 milliseconds, while an HMM method applied under identical conditions resulted in about 30.67 milliseconds.
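The two-stage, coarse-to-fine idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class labels, the two-dimensional feature vectors, and all weights and biases below are hypothetical placeholders, and a plain linear decision function stands in for whatever SVM kernel the paper actually uses. Stage 1 picks one of three lip-opening classes from an MFCC-like vector; stage 2 refines the result to a lip shape within that class from formant-like values.

```python
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]
# A linear decision function stored as (weights, bias).
# Every number below is a made-up placeholder, not a trained model.
LinearModel = Tuple[Vector, float]


def decision(model: LinearModel, x: Vector) -> float:
    """Return the linear SVM-style decision value w . x + b."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def one_vs_rest(models: Dict[str, LinearModel], x: Vector) -> str:
    """Pick the label whose decision value is largest."""
    return max(models, key=lambda label: decision(models[label], x))


# Stage 1 (coarse): three merged lip-opening classes, scored on MFCC-like features.
COARSE: Dict[str, LinearModel] = {
    "closed": ([-1.0, 0.0], 0.5),
    "half_open": ([0.0, 1.0], 0.0),
    "open": ([1.0, 0.0], -0.5),
}

# Stage 2 (fine): each coarse class is split into lip shapes,
# scored on formant-like features (hypothetical F1, F2 in kHz).
FINE: Dict[str, Dict[str, LinearModel]] = {
    "closed": {"closed": ([0.0, 0.0], 0.0)},
    "half_open": {"spread": ([0.0, 1.0], -1.5), "round": ([0.0, -1.0], 1.5)},
    "open": {"spread": ([0.0, 1.0], -1.5), "round": ([0.0, -1.0], 1.5)},
}


def classify_frame(mfcc: Vector, formants: Vector) -> Tuple[str, str]:
    """Classify one speech frame: coarse lip opening first, then fine lip shape."""
    coarse = one_vs_rest(COARSE, mfcc)
    # Only the models belonging to the chosen coarse class are evaluated,
    # which is where the coarse-to-fine scheme saves computation.
    fine = one_vs_rest(FINE[coarse], formants)
    return coarse, fine


if __name__ == "__main__":
    print(classify_frame([2.0, 0.1], [0.7, 2.0]))
```

Because the fine stage only scores the few shapes under the selected coarse class, the per-frame cost depends on the size of that subgroup rather than on the full phoneme inventory.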
Keywords
SVM, lip-synch