Singing Audio Samples
All singing audio samples use Parallel WaveGAN (PWG) as the vocoder, adapted for singing voice synthesis.
Comparison with Baseline Model FFT-Singer (based on FastSpeech 2 + PWG)
ràng mèng héng jiǔ bǐ tiān cháng
让 梦 恒 久 比 天 长
| GT | GT (PWG) | FFT-Singer | DiffSinger |
|---|---|---|---|
wǒ zhōng yú áo xiáng
我 终 于 翱 翔
| GT | GT (PWG) | FFT-Singer | DiffSinger |
|---|---|---|---|
nǐ gòu bú gòu wǒ zhè yàng sǎ tuō
你 够 不 够 我 这 样 洒 脱
| GT | GT (PWG) | FFT-Singer | DiffSinger |
|---|---|---|---|
suǒ yǒu mèng xiǎng dōu kāi huā
所 有 梦 想 都 开 花
| GT | GT (PWG) | FFT-Singer | DiffSinger |
|---|---|---|---|
nǎ lǐ huì yǒu fēng
哪 里 会 有 风
| GT | GT (PWG) | FFT-Singer | DiffSinger |
|---|---|---|---|
Ablation Study
wǒ men bī bú dé yǐ yào xí guàn
我 们 逼 不 得 已 要 习 惯
| DiffSinger | DiffSinger Naive | DiffSinger (k=25) |
|---|---|---|
suǒ yǒu mèng xiǎng dōu kāi huā
所 有 梦 想 都 开 花
| DiffSinger | DiffSinger Naive | DiffSinger (k=25) |
|---|---|---|
nǎ lǐ huì yǒu fēng
哪 里 会 有 风
| DiffSinger | DiffSinger Naive | DiffSinger (k=25) |
|---|---|---|