Miscellaneous utilities from the Edinburgh Speech Tools. Unless you have a specific need for one of these programs, you probably don't need to install this.
US English male speaker ("AWB") for Festival. AWB is a native Scottish English speaker, but the voice uses the US English front end. This is an HMM-based Speech Synthesis System (HTS) voice from the Nagoya Institute of Technology, trained using the CMU ARCTIC database. This voice is based on 1138 utterances spoken by a Scottish English male speaker. The speaker is very experienced in building synthetic voices and matched the prompted US English pronunciations, though his vowels are very different from US English vowels. Scottish English speakers will probably find synthesizers based on this voice strange. Unlike the other CMU_ARCTIC databases, this one was recorded in 16-bit, 16 kHz mono without EGG, on a Dell laptop in a quiet office. The database was automatically labelled with CMU Sphinx using the FestVox labelling scripts. No hand correction has been made.
US English male speaker ("BDL") for Festival. This is an HMM-based Speech Synthesis System (HTS) voice from the Nagoya Institute of Technology, trained using the CMU ARCTIC database. This voice is based on 1132 utterances spoken by a US English male speaker. The speaker is experienced in building synthetic voices. The database was recorded in stereo at 16-bit, 32 kHz in a soundproof room; one channel carried the waveform, the other the EGG signal. The database was automatically labelled with CMU Sphinx using the FestVox labelling scripts. No hand correction has been made.
US English female speaker ("CLB") for Festival. This is an HMM-based Speech Synthesis System (HTS) voice from the Nagoya Institute of Technology, trained using the CMU ARCTIC database. This voice is based on 1132 utterances spoken by a US English female speaker. The speaker is experienced in building synthetic voices. The database was recorded in stereo at 16-bit, 32 kHz in a soundproof room; one channel carried the waveform, the other the EGG signal. The database was automatically labelled with CMU Sphinx using the FestVox labelling scripts. No hand correction has been made.
US English male speaker ("JMK") voice for Festival. JMK is a native Canadian English speaker, but the voice uses the US English front end. This is an HMM-based Speech Synthesis System (HTS) voice from the Nagoya Institute of Technology, trained using the CMU ARCTIC database. This voice is based on 1138 utterances spoken by a Canadian English male speaker. The speaker is experienced in building synthetic voices. The database was recorded in stereo at 16-bit, 32 kHz in a soundproof room; one channel carried the waveform, the other the EGG signal. The database was automatically labelled with CMU Sphinx using the FestVox labelling scripts. No hand correction has been made.
American English male speaker ("Kevin") for Festival. This voice provides an American English male voice using a residual-excited LPC diphone synthesis method. It uses the CMU Lexicon for pronunciations. Prosodic phrasing is provided by a statistically trained model using part-of-speech information and the local distribution of breaks. Intonation is provided by a CART tree predicting ToBI accents and an F0 contour generated from a model trained on natural speech. The duration model is also trained from data using a CART tree.
American English male speaker ("Kurt") for Festival. This voice provides an American English male voice using a residual-excited LPC diphone synthesis method. It uses the CMU Lexicon for pronunciations. Prosodic phrasing is provided by a statistically trained model using part-of-speech information and the local distribution of breaks. Intonation is provided by a CART tree predicting ToBI accents and an F0 contour generated from a model trained on natural speech. The duration model is also trained from data using a CART tree.
US English male speaker ("RMS") voice for Festival. This is an HMM-based Speech Synthesis System (HTS) voice from the Nagoya Institute of Technology, trained using the CMU ARCTIC database. This voice is based on 1132 utterances spoken by a US English male speaker. The speaker is experienced in building synthetic voices. The database was recorded in stereo at 16-bit, 32 kHz in a soundproof room; one channel carried the waveform, the other the EGG signal. The database was automatically labelled using EHMM, an HMM labeler included in the FestVox distribution. No hand correction has been made.
US English female speaker ("SLT") voice for Festival. This is an HMM-based Speech Synthesis System (HTS) voice from the Nagoya Institute of Technology, trained using the CMU ARCTIC database. This voice is based on 1132 utterances spoken by a US English female speaker. The speaker is experienced in building synthetic voices. The database was recorded in stereo at 16-bit, 32 kHz in a soundproof room; one channel carried the waveform, the other the EGG signal. The database was automatically labelled with CMU Sphinx using the FestVox labelling scripts. No hand correction has been made.
FFmpeg is a complete and free Internet live audio and video broadcasting solution for Linux/Unix. It also includes a digital VCR. It can encode in real time in many formats, including MPEG-1 audio and video, MPEG-4, H.263, AC-3, ASF, AVI, Real, MJPEG, and Flash.