<div align="center">

<h1>GPT-SoVITS-WebUI</h1>
A Powerful Few-shot Voice Conversion and Text-to-Speech WebUI.<br><br>
```bash
conda create -n GPTSoVits python=3.9
conda activate GPTSoVits
pip install -r extra-req.txt --no-deps
pip install -r requirements.txt
```
Install [Visual Studio 2017](https://aka.ms/vs/17/release/vc_redist.x86.exe) (Korean TTS Only)
##### MacOS Users

```bash
brew install ffmpeg
```
#### Install Dependencies

```bash
pip install -r extra-req.txt --no-deps
pip install -r requirements.txt
```
If you want to switch to V1, then

```bash
python webui.py v1 <language(optional)>
```

Or manually switch version in the WebUI.
### Finetune
Double-click `go-webui-v2.bat` or use `go-webui-v2.ps1`, then open the inference webui at `1-GPT-SoVITS-TTS/1C-inference`

```bash
python GPT_SoVITS/inference_webui.py <language(optional)>
```
OR
```bash
python webui.py
```

then open the inference webui at `1-GPT-SoVITS-TTS/1C-inference`
## V2 Release Notes
4. Improved synthesis quality for low-quality reference audio

[more details](<https://github.com/RVC-Boss/GPT-SoVITS/wiki/GPT%E2%80%90SoVITS%E2%80%90v2%E2%80%90features-(%E6%96%B0%E7%89%B9%E6%80%A7)>)

Use v2 from v1 environment:
2. GPT model is more stable, with fewer repetitions and omissions, and it is easier to generate speech with richer emotional expression.

[more details](<https://github.com/RVC-Boss/GPT-SoVITS/wiki/GPT%E2%80%90SoVITS%E2%80%90v3%E2%80%90features-(%E6%96%B0%E7%89%B9%E6%80%A7)>)

Use v3 from v2 environment:
Additional: for the Audio Super Resolution model, see [how to download](./tools/AP_BWE_main/24kto48k/readme.txt).
## Todo List

- [x] **High Priority:**
- [ ] model mix
## (Additional) Method for running from the command line

Use the command line to open the WebUI for UVR5
```
python tools/uvr5/webui.py "<infer_device>" <is_half> <webui_port_uvr5>
```
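The three positional arguments are the inference device, whether to use half precision, and the WebUI port. A minimal Python sketch of how one might assemble them when launching from a script (the `uvr5_args` helper and the port number are illustrative, not part of the repo):

```python
def uvr5_args(infer_device, webui_port=9873):
    """Assemble the positional args for tools/uvr5/webui.py shown above.

    Half precision only makes sense on a CUDA device; the default port
    here is illustrative, not a documented default.
    """
    is_half = infer_device.startswith("cuda")
    return [infer_device, str(is_half), str(webui_port)]

print(uvr5_args("cuda:0"))  # → ['cuda:0', 'True', '9873']
print(uvr5_args("cpu"))     # → ['cpu', 'False', '9873']
```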
<!-- If you can't open a browser, follow the format below for UVR processing. This uses mdxnet for audio processing
```
python mdxnet.py --model --input_root --output_vocal --output_ins --agg_level --format --device --is_half_precision
``` -->
This is how the audio segmentation of the dataset is done using the command line

```
python audio_slicer.py \
    --input_path "<path_to_original_audio_file_or_directory>" \
    --min_interval <shortest_time_gap_between_adjacent_subclips> \
    --hop_size <step_size_for_computing_volume_curve>
```
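When slicing many files, the invocation above can be built programmatically. A hedged Python sketch (`slicer_cmd` is a hypothetical helper; only the flags shown above are included, and the default values are illustrative, not the script's actual defaults):

```python
import shlex

def slicer_cmd(input_path, min_interval=300, hop_size=10):
    """Build the audio_slicer.py command line from the flags documented above."""
    args = [
        "python", "audio_slicer.py",
        "--input_path", input_path,
        "--min_interval", str(min_interval),
        "--hop_size", str(hop_size),
    ]
    # shlex.join quotes arguments only where the shell would need it.
    return shlex.join(args)

print(slicer_cmd("raw/speaker1.wav"))
```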
This is how dataset ASR processing is done using the command line (Chinese only)

```
python tools/asr/funasr_asr.py -i <input> -o <output>
```
ASR processing is performed through Faster-Whisper (ASR annotation for languages other than Chinese)

(No progress bars; GPU performance may cause delays)

```
python ./tools/asr/fasterwhisper_asr.py -i <input> -o <output> -l <language> -p <precision>
```
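Putting the two ASR paths together: FunASR for Chinese, Faster-Whisper for everything else. A minimal sketch (the `asr_command` dispatch helper is hypothetical; the flags mirror the commands above, and the example directories and `float16` precision are illustrative):

```python
def asr_command(input_dir, output_dir, language="zh", precision="float16"):
    """Choose the ASR script as described above: FunASR for Chinese,
    Faster-Whisper for other languages."""
    if language == "zh":
        return ["python", "tools/asr/funasr_asr.py", "-i", input_dir, "-o", output_dir]
    return ["python", "./tools/asr/fasterwhisper_asr.py",
            "-i", input_dir, "-o", output_dir, "-l", language, "-p", precision]

print(asr_command("output/slicer_opt", "output/asr_opt"))
print(asr_command("output/slicer_opt", "output/asr_opt", language="en"))
```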
A custom list save path is enabled
## Credits

Special thanks to the following projects and contributors:
### Theoretical Research

- [ar-vits](https://github.com/innnky/ar-vits)
- [SoundStorm](https://github.com/yangdongchao/SoundStorm/tree/master/soundstorm/s1/AR)
- [vits](https://github.com/jaywalnut310/vits)
- [fish-speech](https://github.com/fishaudio/fish-speech/blob/main/tools/llama/generate.py#L41)
- [f5-TTS](https://github.com/SWivid/F5-TTS/blob/main/src/f5_tts/model/backbones/dit.py)
- [shortcut flow matching](https://github.com/kvfrans/shortcut-models/blob/main/targets_shortcut.py)
### Pretrained Models

- [Chinese Speech Pretrain](https://github.com/TencentGameMate/chinese_speech_pretrain)
- [Chinese-Roberta-WWM-Ext-Large](https://huggingface.co/hfl/chinese-roberta-wwm-ext-large)
- [BigVGAN](https://github.com/NVIDIA/BigVGAN)
### Text Frontend for Inference

- [paddlespeech zh_normalization](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/paddlespeech/t2s/frontend/zh_normalization)
- [split-lang](https://github.com/DoodleBears/split-lang)
- [g2pW](https://github.com/GitYCC/g2pW)
- [pypinyin-g2pW](https://github.com/mozillazg/pypinyin-g2pW)
- [paddlespeech g2pw](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/paddlespeech/t2s/frontend/g2pw)
### WebUI Tools

- [ultimatevocalremovergui](https://github.com/Anjok07/ultimatevocalremovergui)
- [audio-slicer](https://github.com/openvpi/audio-slicer)
- [SubFix](https://github.com/cronrpc/SubFix)