Extended Experiments on Text-to-Speech Synthesis
All speech audio samples use HiFi-GAN as the vocoder.
Comparison with Other Models
This type was introduced into England by Wynkyn de Worde, Caxton's successor
| Glow-TTS | FastSpeech 2 | DiffSpeech |
| audio sample | audio sample | audio sample |
Most of Caxton's own types are of an earlier character, though they also much resemble Flemish or Cologne letter.
| Glow-TTS | FastSpeech 2 | DiffSpeech |
| audio sample | audio sample | audio sample |
the worst, which perhaps was the English, was a terrible falling off from the work of the earlier presses;
| Glow-TTS | FastSpeech 2 | DiffSpeech |
| audio sample | audio sample | audio sample |
Ablation Study
This type was introduced into England by Wynkyn de Worde, Caxton's successor
| DiffSpeech | DiffSpeech Naive |
| audio sample | audio sample |
Most of Caxton's own types are of an earlier character, though they also much resemble Flemish or Cologne letter.
| DiffSpeech | DiffSpeech Naive |
| audio sample | audio sample |
the worst, which perhaps was the English, was a terrible falling off from the work of the earlier presses;
| DiffSpeech | DiffSpeech Naive |
| audio sample | audio sample |