KMID : 0917520040110020193
Journal of Speech Sciences
2004 Volume.11 No. 2 p.193 ~ p.210
Support Vector Machine Based Phoneme Segmentation for Lip Synch Application
Lee Kun-Young

Ko Han-Seok
Abstract
In this paper, we develop a real-time lip-synch system that animates a 2-D avatar's lip motion in synch with an incoming speech utterance. To realize the 'real-time' operation of the system, we limit the processing time by invoking merge and split procedures that perform coarse-to-fine phoneme classification. At each stage of phoneme classification, we apply the support vector machine (SVM) to reduce the computational load while retaining the desired accuracy. The coarse-to-fine phoneme classification is accomplished via two stages of feature extraction: first, each speech frame is acoustically analyzed for 3 classes of lip opening using Mel Frequency Cepstral Coefficients (MFCC) as a feature; second, the classification of each frame is refined to a detailed lip shape using formant information. We implemented the system with 2-D lip animation, which shows the effectiveness of the proposed two-stage procedure in accomplishing a real-time lip-synch task. We observed that the method using phoneme merging and SVM was nearly twice as fast in recognition as the method employing the Hidden Markov Model (HMM). A typical latency per frame observed for our method was on the order of 18.22 milliseconds, while an HMM method applied under identical conditions resulted in about 30.67 milliseconds.
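The two-stage, coarse-to-fine flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the class labels, the lip-shape names, the toy weights, and the helper names (`linear_classifier`, `classify_frame`) are all hypothetical, and a plain linear decision rule stands in for a trained SVM.

```python
from typing import Callable, Dict, Sequence, Tuple

Vector = Sequence[float]
Classifier = Callable[[Vector], str]


def linear_classifier(weights: Dict[str, Vector],
                      biases: Dict[str, float]) -> Classifier:
    """Return a one-vs-rest linear decision rule (a stand-in for a trained SVM)."""
    def predict(x: Vector) -> str:
        def score(label: str) -> float:
            return sum(w * v for w, v in zip(weights[label], x)) + biases[label]
        # Pick the label with the highest decision score.
        return max(weights, key=score)
    return predict


def classify_frame(mfcc: Vector,
                   formants: Vector,
                   coarse: Classifier,
                   fine_by_class: Dict[str, Classifier]) -> Tuple[str, str]:
    """Stage 1: lip-opening class from MFCC. Stage 2: lip shape from formants."""
    opening = coarse(mfcc)
    # Only the fine classifier of the selected coarse class is evaluated,
    # which is where the computational saving comes from.
    shape = fine_by_class[opening](formants)
    return opening, shape


# Toy, hand-picked parameters (assumptions for illustration only).
coarse = linear_classifier(
    weights={"closed": [-1.0, 0.0], "half_open": [0.0, 0.0], "open": [1.0, 0.0]},
    biases={"closed": 0.0, "half_open": 0.5, "open": 0.0},
)
fine_by_class = {
    "closed": lambda formants: "m",  # a single shape: nothing to refine
    "half_open": linear_classifier(
        weights={"i": [-1.0, 1.0], "u": [1.0, -1.0]},
        biases={"i": 0.0, "u": 0.0},
    ),
    "open": linear_classifier(
        weights={"a": [1.0, -1.0], "e": [-1.0, 1.0]},
        biases={"a": 0.0, "e": 0.0},
    ),
}

if __name__ == "__main__":
    # A 2-dimensional "MFCC" vector and (F1, F2) formants in kHz, both toy values.
    print(classify_frame([2.0, 0.0], [0.8, 1.2], coarse, fine_by_class))
```

For reference, the two latencies reported above correspond to a ratio of 30.67 / 18.22 ≈ 1.68.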
KEYWORD
SVM, lip-synch