Releases: openvpi/DiffSinger

Data augmentation and gender control (usage and pretrained model)

15 Feb 07:29
eecb002


Overview

In this release, we introduced data augmentation to DiffSinger in this forked repository.

See the dataset making pipeline for more details.

Random pitch shifting

Randomly shifts the pitch of training data and embeds the number of semitones shifted into the network. This broadens the pitch range and allows controlling the gender (like the GEN parameter in VOCALOID) at frame level.

To enable random pitch shifting for your existing dataset, add the following configuration to the config file:

augmentation_args:
  random_pitch_shifting:
    range: [-5., 5.]
    scale: 2.0
use_key_shift_embed: true
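Here `range` is the shift interval in semitones; `scale` is believed to control how much augmented data is added relative to the original (an assumption, not confirmed by this note). A minimal sketch of what one augmentation step might look like, with `augment_random_pitch_shift` as a hypothetical helper rather than the repository's actual implementation:

```python
import random

def augment_random_pitch_shift(item, shift_range=(-5.0, 5.0)):
    # Hypothetical sketch: draw a shift in semitones, transpose f0
    # accordingly, and keep the shift value so the network can embed
    # it (use_key_shift_embed). Not the repository's actual code.
    shift = random.uniform(*shift_range)
    # Transposing by `shift` semitones multiplies f0 by 2 ** (shift / 12).
    return {
        "f0": [f * 2 ** (shift / 12) for f in item["f0"]],
        "key_shift": shift,
    }
```

Because the shift is stored alongside the transposed data, the model can learn to separate pitch content from timbre, which is what later enables gender control at inference time.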

Fixed pitch shifting

Shifts the pitch of the training data by several fixed semitone offsets. All pitch-shifted data is treated as coming from speakers other than the original one, so speaker embedding is enabled and the number of speakers increases; the pitch range is broadened as well.

To enable fixed pitch shifting for your existing dataset, add the following configuration to the config file:

augmentation_args:
  fixed_pitch_shifting:
    targets: [-5., 5.]
    scale: 0.75
use_key_shift_embed: false
use_spk_id: true
num_spk: X # Set this value to at least (1 + T) * N, where T is the number of targets and N is the number of speakers before augmentation.
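The num_spk formula can be sketched as a quick sanity check (`min_num_spk` is a hypothetical helper, not part of the repository):

```python
def min_num_spk(num_targets, num_speakers):
    # Each fixed shift target turns every original speaker into one
    # additional virtual speaker, hence (1 + T) * N speaker slots.
    return (1 + num_targets) * num_speakers

# With the example config above: targets [-5., 5.] gives T = 2,
# so a single-speaker dataset needs num_spk >= min_num_spk(2, 1), i.e. 3.
```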

0211_opencpop_ds1000_keyshift

A pretrained model on the opencpop dataset, trained with random pitch shifting applied.

Control the gender value with CLI arguments of main.py:

python main.py xxx.ds --exp 0211_opencpop_ds1000_keyshift --gender GEN

where GEN is a float value between -1 and 1 (negative = male, positive = female).
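One plausible way the gender value could relate to the key-shift embedding is a linear interpolation into the training shift range. Everything below, including the sign convention (which direction sounds more male or female), is an illustrative assumption, not taken from the repository:

```python
def gender_to_key_shift(gender, shift_range=(-5.0, 5.0)):
    # Hypothetical linear mapping from a gender value in [-1, 1] to a
    # key shift in semitones inside the configured shift range.
    gender = max(-1.0, min(1.0, gender))  # clamp CLI input to [-1, 1]
    return gender * shift_range[1] if gender >= 0 else -gender * shift_range[0]
```

Under this sketch, gender 0 leaves the timbre unchanged, and the extremes -1 and 1 map to the edges of the configured shift range.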

Control the gender curve in *.ds files:

{
  "gender_timestep": "0.005", // timestep in seconds, like f0_timestep
  "gender": "-1.0 -0.9 -0.8 ... 0.8 0.9 1.0", // sequence of float values, like f0_seq
  ... // other attributes
}
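The gender curve above can be generated programmatically; `format_gender_seq` is a hypothetical helper that serializes values into the whitespace-separated string format shown, mirroring how f0_seq is written:

```python
def format_gender_seq(values):
    # Serialize gender values into the whitespace-separated string
    # format of the "gender" field (one value per gender_timestep).
    return " ".join(f"{v:.1f}" for v in values)

# A ramp from full male (-1.0) to full female (1.0) in 0.1 steps,
# matching the example sequence above:
gender_seq = format_gender_seq([i / 10 for i in range(-10, 11)])
```

With `gender_timestep` set to 0.005, these 21 values would cover roughly the first 0.1 seconds of the clip; a real curve would be as long as the audio requires.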

Export to ONNX format

python onnx/export/export_acoustic.py --exp 0211_opencpop_ds1000_keyshift --expose_gender

or

python onnx/export/export_acoustic.py --exp 0211_opencpop_ds1000_keyshift [--freeze_gender GEN]

where GEN is the gender value that you would like to freeze into the model (defaults to 0).

Pretrained multi-speaker model

26 Jan 16:27


This is a pretrained model with multiple embedded speakers, supporting the newest speaker-mix features of DiffSinger in this forked repository.
Demo: https://www.bilibili.com/video/BV1Yy4y1d7Cg

0116_female_triplet_ds1000

There are 3 female singers embedded in this model.

Any commercial usage of this model is prohibited. This notice must be attached to all redistributions of this model.
If you use speaker mix, you must follow the rules of each speaker included with a proportion larger than zero.

ONNX version of the duration predictor with FastSpeech2MIDI

09 Jan 15:20


This release contains ONNX models for phoneme duration prediction.
These models can serve as temporary tools for generating phoneme durations for MIDI-less acoustic models, which cannot predict durations themselves.

Pretrained model for MIDI-less mode

25 Dec 13:03


1215_opencpop_ds1000_fix_label_nomidi

MIDI-less mode, strict pinyin dictionary, 44.1 kHz sampling rate; fixed some phoneme label errors and trained for 320k steps.
Note: both ph_dur and f0_seq must be provided to run inference.
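Put concretely, a minimal *.ds input for this model might look like the following. ph_dur and f0_seq are the required fields named above and f0_timestep mirrors the gender example earlier in these notes; the ph_seq field name and all values are illustrative assumptions:

```json
{
  "ph_seq": "SP sh a SP",
  "ph_dur": "0.20 0.10 0.50 0.20",
  "f0_timestep": "0.005",
  "f0_seq": "220.0 220.5 221.0 221.5"
}
```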

Pretrained models for new 44.1 kHz vocoder

05 Dec 07:12


High-quality, high-performance pretrained acoustic models with 44.1 kHz full-band synthesis support.
To run inference with these models, a vocoder from DiffSinger Community Vocoder Project is required.

1117_opencpop_ds1000_strict_pinyin

MIDI-A mode, with 1k diffusion steps and 512x20 WaveNet, using the new strict pinyin dictionary.

1122_opencpop_ds1000_strict_pinyin_384x30

MIDI-A mode, same as above but with 384x30 WaveNet.

[Experimental] pretrained models

07 Nov 03:09


These are pretrained models from the OpenVPI team.
Note: these models are experimental. They are currently consistent with the original repository but may become incompatible in the future. Using these models with main.py is recommended. See the code for more details.

0814_opencpop_ds_rhythm_fix

MIDI-B mode, fixes rhythm errors described in this issue.

0823_opencpop_ds_enhancement

MIDI-B mode; improves performance in the high pitch range by shifting the pitch of training data with the WORLD vocoder, but may degrade sound quality.

0831_opencpop_ds1000

MIDI-B mode; trained with 1k diffusion steps for better sound quality, with PNDM and DPM-Solver acceleration.

0909_opencpop_ds100_pitchcontrol

MIDI-A mode; supports manual pitch editing. Specifying the pitch is highly recommended, because the automatically predicted pitch is currently poor; this is expected to be fixed in future updates.

0920_opencpop_ds1000

MIDI-A mode, trained with 1k diffusion steps and more training epochs.