Baidu's Artificial Intelligence Lab Unveils Synthetic Speech System
From: MIT Technology Review - 03/08/2017

The Chinese search giant's Deep Voice system learns to talk in just a few
hours with little or no human interference.

Baidu's artificial intelligence research lab has developed Deep Voice, a
text-to-speech system that can learn to talk with little to no human
interference in a matter of hours. Most text-to-speech tools involve
recording a large database of speech from one individual and then rearranging
the utterances into new phrases. The Baidu researchers say Deep Voice employs
deep-learning methods to transform text into phonemes, and then applies a
speech synthesis network to replicate the sounds. Each stage of the process
relies on deep learning, so once trained, Deep Voice has little need for human
modification. Deep Voice has control over the stresses on the phonemes, their
duration, and the fundamental frequency of the sound, enabling Baidu to
change the voice of the speaker and the emotion the word evokes. The
researchers say
real-time speech synthesis is possible with Deep Voice, which can be quickly
retrained on new datasets with no human involvement.
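
The staged design described above (text to phonemes, then per-phoneme
duration and fundamental frequency, then audio synthesis) can be sketched in
a few lines of Python. This is only a toy illustration of the data flow, not
Baidu's implementation: Deep Voice uses a trained neural network at each
stage, whereas here a hand-written lookup table, fixed values, and a sine
tone stand in for them. The lexicon, the function names, and the default
duration and pitch are all assumptions made for this example.

```python
import math

# Hypothetical toy lexicon standing in for a learned text-to-phoneme model.
TOY_LEXICON = {
    "deep": ["D", "IY", "P"],
    "voice": ["V", "OY", "S"],
}


def text_to_phonemes(text):
    """Stage 1: map each word to phonemes (table lookup, not a neural net)."""
    phonemes = []
    for word in text.lower().split():
        # Words missing from the toy lexicon raise KeyError.
        phonemes.extend(TOY_LEXICON[word])
    return phonemes


def annotate(phonemes, duration_s=0.08, f0_hz=120.0):
    """Stage 2: attach a duration and fundamental frequency to each phoneme.

    A real system would predict these per phoneme; fixed values are assumed.
    """
    return [
        {"phoneme": p, "duration_s": duration_s, "f0_hz": f0_hz}
        for p in phonemes
    ]


def synthesize(frames, sample_rate=16000):
    """Stage 3: emit audio samples (a sine tone per phoneme as a stand-in)."""
    samples = []
    for frame in frames:
        count = int(round(frame["duration_s"] * sample_rate))
        for i in range(count):
            angle = 2 * math.pi * frame["f0_hz"] * i / sample_rate
            samples.append(math.sin(angle))
    return samples


audio = synthesize(annotate(text_to_phonemes("Deep Voice")))
```

Because duration and fundamental frequency are explicit inputs to the last
stage, changing them (for example, calling annotate with f0_hz=220.0) alters
the pitch of the output without touching the text-to-phoneme step, which is
the kind of control the summary attributes to Deep Voice.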

Read the entire article at:
https://www.technologyreview.com/s/603811/baidus-artificial-intelligence-lab-unveils-synthetic-speech-system

Links:
Deep Voice: Real-time Neural Text-to-Speech
https://arxiv.org/abs/1702.07825

Baidu's Deep Voice can quickly synthesize realistic human speech
https://www.engadget.com/2017/03/09/baidu-deep-voice-natural-sounding-speec

Deep Voice: Real-Time Neural Text-to-Speech for Production
http://research.baidu.com/deep-voice-production-quality-text-speech-system-constructed-entirely-deep-neural-networks

Google's DeepMind learns to reproduce human speech, tricks us into starting
  robot apocalypse
https://www.neowin.net/news/googles-deepmind-learns-to-reproduce-human-speech-tricks-us-into-starting-robot-apocalypse