0. Regarding image tags: Due to rapid updates in the codebase and the slow process of packaging and testing images, please check [Docker Hub](https://hub.docker.com/r/breakstring/gpt-sovits) for the currently packaged latest images and select as per your situation, or alternatively, build locally using a Dockerfile according to your own needs.
1. Environment Variables:
- is_half: Controls half-precision/double-precision. This is typically the cause if the content under the directories 4-cnhubert/5-wav32k is not generated correctly during the "SSL extracting" step. Adjust to True or False based on your actual situation.
- is_half: Controls half-precision/double-precision. This is typically the cause if the content under the directories 4-cnhubert/5-wav32k is not generated correctly during the "SSL extracting" step. Adjust to True or False based on your actual situation.
2. Volumes Configuration,The application's root directory inside the container is set to /workspace. The default docker-compose.yaml lists some practical examples for uploading/downloading content.
3. shm_size: The default available memory for Docker Desktop on Windows is too small, which can cause abnormal operations. Adjust according to your own situation.
4. Under the deploy section, GPU-related settings should be adjusted cautiously according to your system and actual circumstances.
4. For Chinese ASR (additionally), download models from [Damo ASR Model](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/files), [Damo VAD Model](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/files), and [Damo Punc Model](https://modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/files) and place them in `tools/asr/models`.
5. For English or Japanese ASR (additionally), download models from [Faster Whisper Large V3](https://huggingface.co/Systran/faster-whisper-large-v3) and place them in `tools/asr/models`. Also, [other models](https://huggingface.co/Systran) may have the similar effect with smaller disk footprint.
5. For English or Japanese ASR (additionally), download models from [Faster Whisper Large V3](https://huggingface.co/Systran/faster-whisper-large-v3) and place them in `tools/asr/models`. Also, [other models](https://huggingface.co/Systran) may have the similar effect with smaller disk footprint.
## Dataset Format
@ -175,7 +173,7 @@ Language dictionary:
- 'en': English
- 'ko': Korean
- 'yue': Cantonese
Example:
```
@ -184,61 +182,56 @@ D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
## Finetune and inference
### Open WebUI
### Open WebUI
#### Integrated Package Users
#### Integrated Package Users
Double-click `go-webui.bat`or use `go-webui.ps1`
if you want to switch to V1,then double-click`go-webui-v1.bat` or use `go-webui-v1.ps1`
Double-click `go-webui.bat`or use `go-webui.ps1`
if you want to switch to V1,then double-click`go-webui-v1.bat` or use `go-webui-v1.ps1`
#### Others
#### Others
```bash
python webui.py <language(optional)>
```
```bash
python webui.py <language(optional)>
```
if you want to switch to V1,then
if you want to switch to V1,then
```bash
python webui.py v1 <language(optional)>
```
```bash
python webui.py v1 <language(optional)>
```
Or maunally switch version in WebUI
### Finetune
#### Path Auto-filling is now supported
### Finetune
1.Fill in the audio path
#### Path Auto-filling is now supported
2.Slice the audio into small chunks
1. Fill in the audio path
2. Slice the audio into small chunks
3. Denoise(optinal)
4. ASR
5. Proofreading ASR transcriptions
6. Go to the next Tab, then finetune the model
3.Denoise(optinal)
### Open Inference WebUI
4.ASR
#### Integrated Package Users
5.Proofreading ASR transcriptions
Double-click `go-webui-v2.bat` or use `go-webui-v2.ps1` ,then open the inference webui at `1-GPT-SoVITS-TTS/1C-inference`
6.Go to the next Tab, then finetune the model
#### Others
### Open Inference WebUI
#### Integrated Package Users
Double-click `go-webui-v2.bat` or use `go-webui-v2.ps1` ,then open the inference webui at `1-GPT-SoVITS-TTS/1C-inference`
1. `pip install -r requirements.txt` to update some packages
@ -262,7 +255,7 @@ Use v2 from v1 environment:
Chinese v2 additional: [G2PWModel_1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/g2p/G2PWModel_1.1.zip)(Download G2PW models, unzip and rename to `G2PWModel`, and then place them in `GPT_SoVITS/text`.
## V3 Release Notes
## V3 Release Notes
New Features:
@ -270,9 +263,9 @@ New Features:
2. GPT model is more stable, with fewer repetitions and omissions, and it is easier to generate speech with richer emotional expression.