US English male speaker ("BDL") for Festival. This is an HMM-based Speech Synthesis System (HTS) voice from the Nagoya Institute of Technology, trained using the CMU ARCTIC database. The voice is based on 1132 utterances spoken by a US English male speaker who is experienced in building synthetic voices. The recordings were made at 16 bit, 32 kHz in a soundproof room, in stereo: one channel carried the speech waveform, the other the EGG (electroglottograph) signal. The database was labelled automatically with CMU Sphinx, using the FestVox labelling scripts. No hand correction has been made.
US English female speaker ("CLB") for Festival. This is an HMM-based Speech Synthesis System (HTS) voice from the Nagoya Institute of Technology, trained using the CMU ARCTIC database. The voice is based on 1132 utterances spoken by a US English female speaker who is experienced in building synthetic voices. The recordings were made at 16 bit, 32 kHz in a soundproof room, in stereo: one channel carried the speech waveform, the other the EGG (electroglottograph) signal. The database was labelled automatically with CMU Sphinx, using the FestVox labelling scripts. No hand correction has been made.
US English male speaker ("JMK") voice for Festival. JMK is a native Canadian English speaker, but the voice uses the US English front end. This is an HMM-based Speech Synthesis System (HTS) voice from the Nagoya Institute of Technology, trained using the CMU ARCTIC database. The voice is based on 1138 utterances spoken by this speaker, who is experienced in building synthetic voices. The recordings were made at 16 bit, 32 kHz in a soundproof room, in stereo: one channel carried the speech waveform, the other the EGG (electroglottograph) signal. The database was labelled automatically with CMU Sphinx, using the FestVox labelling scripts. No hand correction has been made.
American English male speaker ("Kevin") for Festival. This voice uses residual-excited LPC diphone synthesis, with pronunciations taken from the CMU Lexicon. Prosodic phrasing is provided by a statistically trained model based on part of speech and the local distribution of breaks. Intonation is provided by a CART tree that predicts ToBI accents, with the F0 contour generated by a model trained on natural speech. The duration model is also a CART tree trained from data.