Speech Synthesis with Deep Learning
Deep Learning
Shallow learning
- Artificial neural networks as a powerful modelling tool
- Barriers to depth: difficulty of convergence and computational complexity
The critical path to depth
- Pre-training with auto-encoders or restricted Boltzmann machines
- Advancements in hardware: GPU
- Availability of big data
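As a rough illustration of the pre-training idea listed above, the sketch below trains one tied-weight auto-encoder layer on toy data with plain gradient descent, then trains a second layer on the hidden codes of the first (greedy layer-wise stacking). The learned weights would then initialize a deeper network before fine-tuning. The data, layer sizes, learning rate, and function names are assumptions chosen for illustration and are not taken from the references.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_autoencoder(X, n_hidden, lr=0.1, epochs=200, seed=0):
    """Train a one-hidden-layer auto-encoder with tied weights.

    Returns encoder weights, encoder bias, and the per-epoch reconstruction loss.
    """
    rng = np.random.default_rng(seed)
    n, n_in = X.shape
    W = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
    b_h = np.zeros(n_hidden)   # encoder bias
    b_o = np.zeros(n_in)       # decoder bias
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W + b_h)      # encode
        X_hat = H @ W.T + b_o         # decode (linear, tied weights)
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Back-propagate the squared reconstruction error.
        dZ = (err @ W) * H * (1.0 - H)
        grad_W = (X.T @ dZ + err.T @ H) / n   # encoder + decoder contributions
        W -= lr * grad_W
        b_h -= lr * dZ.mean(axis=0)
        b_o -= lr * err.mean(axis=0)
    return W, b_h, losses

# Greedy layer-wise stacking on toy data (hypothetical sizes).
X = np.random.default_rng(1).random((64, 8))
W1, b1, losses1 = pretrain_autoencoder(X, n_hidden=4)
H1 = sigmoid(X @ W1 + b1)                      # codes from layer 1
W2, b2, losses2 = pretrain_autoencoder(H1, n_hidden=2)
```

Each layer is trained only to reconstruct its own input, so no labels are needed at this stage.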
Introducing Deep Learning to Speech Synthesis
The deep learning revolution
Replacing components of existing systems with deep neural networks
- Replacing GMMs
- Replacing decision trees and GMMs
Neural TTS: A New Generation
WaveNet: The beginning of a new generation of speech synthesis systems
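A core building block of WaveNet is the dilated causal convolution: each output sample depends only on current and past input samples, and stacking layers with growing dilation widens the receptive field quickly. The NumPy sketch below shows only that mechanism, under simplifying assumptions (one channel, fixed weights, no gated activations, no residual or skip connections); the function names are hypothetical.

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """Compute y[t] = sum_k w[k] * x[t - k * dilation], treating x as zero before t = 0."""
    x = np.asarray(x, dtype=float)
    y = np.zeros(len(x))
    for k, wk in enumerate(w):
        shift = k * dilation
        if shift >= len(x):
            continue
        y[shift:] += wk * x[:len(x) - shift]
    return y

def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence one output sample of the stack."""
    return 1 + (kernel_size - 1) * sum(dilations)

# Feed an impulse through three layers with dilations 1, 2, 4.
x = np.zeros(16)
x[0] = 1.0
y = x
for d in (1, 2, 4):
    y = causal_dilated_conv(y, [1.0, 1.0], d)

print(np.count_nonzero(y))                              # 8 samples respond
print(receptive_field(2, [1, 2, 4]))                    # 8
print(receptive_field(2, [2 ** i for i in range(10)]))  # 1024
```

With kernel size 2 and dilations 1, 2, 4, ..., 512, a single stack of ten layers already covers 1024 samples.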
Neural vocoders
- WaveNet vocoder
- Parallel WaveNet vocoder
- Flow-based vocoders
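Flow-based vocoders such as WaveGlow and FloWaveNet are built from invertible transforms, which lets them generate all samples in parallel instead of one at a time. A minimal sketch of one such transform, the affine coupling layer, is given below. The `toy_net` function is a hypothetical stand-in for the conditioned neural network that real systems use to predict the scale and shift; only the coupling arithmetic and its exact inverse are the point here.

```python
import numpy as np

def toy_net(x_a, cond):
    """Hypothetical stand-in for the network predicting log-scale and shift."""
    log_s = np.tanh(0.5 * x_a + cond)
    t = 0.3 * x_a - cond
    return log_s, t

def coupling_forward(x, cond):
    """Keep the first half unchanged; affinely transform the second half."""
    x_a, x_b = np.split(x, 2)
    log_s, t = toy_net(x_a, cond)
    y_b = x_b * np.exp(log_s) + t
    log_det = np.sum(log_s)          # log |det Jacobian| of the transform
    return np.concatenate([x_a, y_b]), log_det

def coupling_inverse(y, cond):
    """Exact inverse: the untouched half lets us recompute log_s and t."""
    y_a, y_b = np.split(y, 2)
    log_s, t = toy_net(y_a, cond)
    x_b = (y_b - t) * np.exp(-log_s)
    return np.concatenate([y_a, x_b])

rng = np.random.default_rng(0)
x = rng.normal(size=8)       # toy "audio" vector
cond = rng.normal(size=4)    # toy conditioning, e.g. derived from a mel spectrogram
y, log_det = coupling_forward(x, cond)
x_rec = coupling_inverse(y, cond)
```

Because the Jacobian is triangular, its log-determinant is just the sum of the predicted log-scales, which keeps the exact likelihood cheap to compute during training.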
Neural acoustic models
- Tacotron 2
- Transformer TTS
- FastSpeech 2
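Non-autoregressive models in the FastSpeech family bridge the length mismatch between phonemes and mel-spectrogram frames with a length regulator: each phoneme's hidden vector is repeated for as many frames as its duration. The sketch below shows that expansion step only, assuming durations are already given as integers; the function name and toy values are illustrative.

```python
import numpy as np

def length_regulate(phoneme_hidden, durations):
    """Repeat each phoneme's hidden vector durations[i] times along the time axis."""
    phoneme_hidden = np.asarray(phoneme_hidden)
    durations = np.asarray(durations, dtype=int)
    if len(durations) != len(phoneme_hidden):
        raise ValueError("need exactly one duration per phoneme")
    if np.any(durations < 0):
        raise ValueError("durations must be non-negative")
    return np.repeat(phoneme_hidden, durations, axis=0)

# Three phonemes with 2-dimensional hidden states (toy values).
hidden = np.array([[0.0, 1.0],
                   [2.0, 3.0],
                   [4.0, 5.0]])
frames = length_regulate(hidden, [2, 0, 3])
print(frames.shape)   # (5, 2): 2 + 0 + 3 frames
```

Scaling the durations before expansion is one simple way to speed up or slow down the synthesized speech.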
Key References and Links
References
- Y. Bengio, "Learning Deep Architectures for AI," Foundations and Trends in Machine Learning, 2(1): 1-127, 2009.
- A. Fischer & C. Igel, "An Introduction to Restricted Boltzmann Machines," 2012.
- H. Zen, "Deep Learning in Speech Synthesis," Google, 2013.
- Z. Ling et al., "Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends," IEEE Signal Processing Magazine, 32(3): 35-52, May 2015.
- A. van den Oord et al., "WaveNet: A Generative Model for Raw Audio," Google, 2016.
- A. Tamamori et al., "Speaker-dependent WaveNet vocoder," Interspeech 2017.
- S. Kim et al., "FloWaveNet: A Generative Flow for Raw Audio," PMLR 2019.
- R. Prenger et al., "WaveGlow: A Flow-based Generative Network for Speech Synthesis," ICASSP 2019.
- J. Shen et al., "Natural TTS Synthesis By Conditioning WaveNet on Mel Spectrogram Predictions," ICASSP 2018.
- N. Li et al., "Neural Speech Synthesis with Transformer Network," AAAI 2019.
- Y. Ren et al., "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech," ICLR 2021.
- X. Tan et al., "A Survey on Neural Speech Synthesis," Microsoft Research Asia, 2021.
Links