Singing Audio Samples

For all singing audio samples, we use Parallel WaveGAN (PWG) as the vocoder, adapted to fit singing voice synthesis.

Comparison with Baseline Model FFT-Singer (based on FastSpeech 2 + PWG)

ràng mèng héng jiǔ bǐ tiān cháng

让 梦 恒 久 比 天 长

GT | GT (PWG) | FFT-Singer | DiffSinger


wǒ zhōng yú áo xiáng

我 终 于 翱 翔

GT | GT (PWG) | FFT-Singer | DiffSinger


nǐ gòu bú gòu wǒ zhè yàng sǎ tuō

你 够 不 够 我 这 样 洒 脱

GT | GT (PWG) | FFT-Singer | DiffSinger


suǒ yǒu mèng xiǎng dōu kāi huā

所 有 梦 想 都 开 花

GT | GT (PWG) | FFT-Singer | DiffSinger


nǎ lǐ huì yǒu fēng

哪 里 会 有 风

GT | GT (PWG) | FFT-Singer | DiffSinger


Ablation Study

wǒ men bī bú dé yǐ yào xí guàn

我 们 逼 不 得 已 要 习 惯

DiffSinger | DiffSinger Naive | DiffSinger (k=25)

suǒ yǒu mèng xiǎng dōu kāi huā

所 有 梦 想 都 开 花

DiffSinger | DiffSinger Naive | DiffSinger (k=25)

nǎ lǐ huì yǒu fēng

哪 里 会 有 风

DiffSinger | DiffSinger Naive | DiffSinger (k=25)