

Local copies of published papers

File Real-time Speaker Adapted Speech to Speech Translation System in Mobile Environment
Yong Guan, Lin Zheng, Jilei Tian. In ICSP2010 proceedings. A real-time speech-to-speech translation (S2ST) system for the mobile environment is designed and implemented with a client-server architecture. In particular, we apply cross-lingual speaker adaptation to adapt the synthesized speech to the enrolled speaker, so that the output is personalized. The system provides streaming, multi-threaded, speaker-adapted speech-to-speech translation for mobile users, giving them access to a personalized real-time S2ST service over a 3G/WiFi network.
Yong Guan, Jilei Tian, Yi-Jian Wu, Junichi Yamagishi and Jani Nurminen. In SSW7. Most studies on Mandarin HTS (HMM-based text-to-speech system) have taken the initial/final as the basic acoustic unit. It is, however, challenging to develop a multilingual HTS in a uniform and consistent way, since most other languages use the phoneme as the basic phonetic unit. It also becomes hard to apply cross-lingual adaptation, which needs to map phonemes between languages, particularly in a unified ASR and HTS system, because most ASR systems are phoneme based. In this paper, we propose a phoneme-based Mandarin HTS system, which has been systematically evaluated by comparing it with the initial/final system. The experimental results show that using the phoneme as the acoustic unit for Mandarin HTS is a promising unified approach, enabling better and more uniform development with other languages while significantly reducing the number of acoustic units. The flat-start training scheme is also evaluated, showing that the phoneme segmentation problem is solved without any performance degradation for the phoneme-based Mandarin HTS system. This yields an automatic approach that does not depend on a particular ASR system.
File Evaluation of Flat Start Labeling for Phoneme based Mandarin HTS System
Yong Guan, Jilei Tian. In OCOCOSDA2009. We propose a phoneme-based Mandarin HTS speech synthesis system trained with a flat-start scheme. Conventionally, HTS training requires full-context labels with phonetic time segmentation. The segmentation is generated by ASR forced alignment using pre-trained ASR models. This makes HTS development dependent on ASR and causes the HTS labels to differ between training and testing. Flat-start labeling, which uses uniform segmentation in the labels, is proposed and evaluated against a reference system whose segmentation comes from ASR models. The subjective listening test results show that the flat-start scheme performs as well as the reference system using ASR forced alignment, provided that realignment labeling with the trained HTS model is applied iteratively. This result is very promising for efficiently developing and porting an HTS system to a new language.
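The idea of uniform segmentation in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name flat_start_segments, the integer time unit and the example phoneme labels are all assumptions made for illustration. It simply divides an utterance's total duration evenly across its phoneme sequence, which is the kind of initial labeling that would later be refined by realignment with a trained model.

```python
def flat_start_segments(phonemes, total_duration):
    """Split total_duration evenly across the phoneme sequence.

    Hypothetical helper for illustration only. Durations are integers in
    whatever time unit the caller uses. Returns (start, end, label) tuples.
    """
    n = len(phonemes)
    if n == 0:
        return []
    segments = []
    for i, label in enumerate(phonemes):
        # Integer boundaries computed from the index, so segments are
        # contiguous and the last one ends exactly at total_duration.
        start = (i * total_duration) // n
        end = ((i + 1) * total_duration) // n
        segments.append((start, end, label))
    return segments


# Example with made-up phoneme labels and a duration of 1000 units.
for start, end, label in flat_start_segments(["n", "i", "h", "ao"], 1000):
    print(start, end, label)
```

Computing each boundary from the phoneme index, rather than accumulating a rounded step size, keeps the segments gap-free even when the duration is not divisible by the number of phonemes.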
Many speech and language related techniques employ models that are trained using text data. In this paper, we introduce a novel method for selecting optimized training sets from text databases. The coverage of the subset selected for training is optimized using hierarchical clustering and the generalized Levenshtein distance. The validity of the proposed subset optimization technique is verified in a data-driven syllabification task. The results clearly indicate that the proposed approach meaningfully optimizes the training set, which in turn improves the quality of the trained model. Compared to the existing state-of-the-art data selection technique, the proposed hierarchical clustering approach improves the compactness of data clusters, decreases the computational complexity and makes data set selection scalable. The presented idea can be used in a wide variety of language processing applications that require training with text data.
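As a rough illustration of the approach described in the abstract above, the sketch below clusters words by edit distance and keeps one representative per cluster. It is not the paper's algorithm: the average-linkage merge rule, the medoid selection, the cost parameters and the names levenshtein and select_subset are assumptions chosen to keep the example small.

```python
def levenshtein(a, b, ins_cost=1, del_cost=1, sub_cost=1):
    """Edit distance between strings a and b with configurable costs."""
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * del_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost,                          # delete ca
                curr[j - 1] + ins_cost,                      # insert cb
                prev[j - 1] + (0 if ca == cb else sub_cost), # substitute
            ))
        prev = curr
    return prev[-1]


def select_subset(words, k):
    """Pick k representative words via agglomerative clustering.

    Hypothetical sketch: average linkage, then the medoid of each cluster.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    words = list(dict.fromkeys(words))  # drop duplicates, keep order
    dist = {(x, y): levenshtein(x, y) for x in words for y in words}
    clusters = [[w] for w in words]

    def linkage(c1, c2):
        # Average pairwise distance between two clusters.
        return sum(dist[x, y] for x in c1 for y in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)

    # The medoid is the member with the smallest total distance to the rest.
    return [min(c, key=lambda w: sum(dist[w, o] for o in c)) for c in clusters]


print(select_subset(["cat", "cats", "bat", "dog", "dogs", "fog"], 2))
```

This version computes all pairwise distances up front and rescans every cluster pair on each merge, so it only suits small word lists; it is meant to show the idea, not the scalability the abstract claims for the proposed method.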