From d5e479dad6342222eb4887df627e69c048d2338c Mon Sep 17 00:00:00 2001
From: XXXXRT666 <157766680+XXXXRT666@users.noreply.github.com>
Date: Mon, 26 May 2025 05:45:14 +0300
Subject: [PATCH] Introduce Docker and Windows CI Workflow, Pre-commit
Formatting, and Language Resource Auto-Download (#2351)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
* Docker Auto-Build Workflow
* Rename
* Update
* Fix Bugs
* Disable Progress Bar When Workflows Are Triggered
* Fix Wget
* Fix Bugs
* Fix Bugs
* Update Wget
* Update Workflows
* Accelerate Docker Image Building
* Fix Install.sh
* Add Skip-Check For Action Runner
* Fix Dockerfile
* .
* .
* .
* .
* Delete File in Runner
* Add Sort
* Delete More Files
* Delete More
* .
* .
* .
* Add Pre-Commit Hook
Update Docker
* Add Code Spell Check
* [pre-commit.ci] trigger
* [pre-commit.ci] trigger
* [pre-commit.ci] trigger
* Fix Bugs
* .
* Disable Progress Bar and Logs while using GitHub Actions
* .
* .
* Fix Bugs
* update conda
* fix bugs
* Fix Bugs
* fix bugs
* .
* .
* Quiet Installation
* fix bugs
* .
* fix bug
* .
* Fix pre-commit.ci and Docker
* fix bugs
* .
* Update Docker & Pre-Commit
* fix bugs
* Update Req
* Update Req
* Update OpenCC
* update precommit
* .
* Update .pre-commit-config.yaml
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update Docs and fix bugs
* Fix \
* Fix MacOS
* .
* test
* .
* Add Tag Alias
* .
* fix bugs
* fix bugs
* make image smaller
* update pre-commit config
* .
* .
* fix bugs
* use miniconda
* Fix Wrong Path
* .
* debug
* debug
* revert
* Fix Bugs
* Update Docs, Add Dict Auto Download in install.sh
* update docker_build
* Update Docs for Install.sh
* update docker docs about architecture
* Add Xcode-Commandline-Tool Installation
* Update Docs
1. Add Missing VC17
2. Modified the Order of FFmpeg Installation and Requirements Installation
3. Remove Duplicate FFmpeg
* Fix Wrong Cuda Version
* Update TESTED ENV
* Add PYTHONNOUSERSITE(-s)
* Fix Wrapper
* Update install.sh For Robustness
* Ignore .git
* Preload CUDNN For Ctranslate2
* Remove Gradio Warnings
* Update Colab
* Fix OpenCC Problems
* Update Win DLL Strategy
* Fix Onnxruntime-gpu NVRTC Error
* Fix Path Problems
* Add Windows Packages Workflow
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* .
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* Fix Path
* Fix Path
* Enable Logging
* Set 7-Zip compression level to maximum (-mx=9)
* Use Multithread in ONNX Session
* Fix Tag Bugs
* Add Time
* Add Time
* Add Time
* Compress More
* Copy DLL to Solve VC Runtime DLL Missing Issues
* Expose FFmpeg Errors, Copy Only Part of Visual C++ Runtime
* Update build_windows_packages.ps1
* Update build_windows_packages.ps1
* Update build_windows_packages.ps1
* Update build_windows_packages.ps1
* WIP
* WIP
* WIP
* Update build_windows_packages.ps1
* Update install.sh
* Update build_windows_packages.ps1
* Update docker-publish.yaml
* Update install.sh
* Update Dockerfile
* Update docker_build.sh
* Update miniconda_install.sh
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update Colab-WebUI.ipynb
* Update Colab-Inference.ipynb
* Update docker-compose.yaml
* Update build_windows_packages.ps1
* Update install.sh
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
---
.dockerignore | 202 ++++++++++++-
.github/build_windows_packages.ps1 | 194 ++++++++++++
.github/workflows/build_windows_packages.yaml | 38 +++
.github/workflows/docker-publish.yaml | 276 ++++++++++++++++++
.gitignore | 9 +-
.pre-commit-config.yaml | 15 +
Colab-Inference.ipynb | 13 +-
colab_webui.ipynb => Colab-WebUI.ipynb | 4 +-
Docker/damo.sha256 | 3 -
Docker/download.py | 8 -
Docker/download.sh | 11 -
Docker/install_wrapper.sh | 33 +++
Docker/links.sha256 | 12 -
Docker/links.txt | 34 ---
Docker/miniconda_install.sh | 70 +++++
Dockerfile | 80 +++--
GPT_SoVITS/TTS_infer_pack/TTS.py | 40 ++-
GPT_SoVITS/inference_webui.py | 140 +++++----
GPT_SoVITS/inference_webui_fast.py | 20 +-
GPT_SoVITS/module/data_utils.py | 5 +-
GPT_SoVITS/module/mel_processing.py | 66 +++--
GPT_SoVITS/module/models.py | 11 +-
GPT_SoVITS/process_ckpt.py | 8 +-
GPT_SoVITS/s2_train_v3_lora.py | 8 +-
GPT_SoVITS/text/g2pw/onnx_api.py | 15 +-
GPT_SoVITS/utils.py | 6 +-
README.md | 130 +++++----
docker-compose.yaml | 103 +++++--
docker_build.sh | 82 ++++++
dockerbuild.sh | 21 --
docs/cn/README.md | 149 ++++++----
docs/ja/README.md | 127 ++++----
docs/ko/README.md | 118 +++++---
docs/tr/README.md | 123 +++++---
go-webui.bat | 4 +
go-webui.ps1 | 5 +-
install.sh | 255 ++++++++++------
requirements.txt | 8 +-
tools/asr/fasterwhisper_asr.py | 3 +
tools/my_utils.py | 112 ++++++-
tools/subfix_webui.py | 6 +-
tools/uvr5/lib/lib_v5/dataset.py | 28 +-
tools/uvr5/lib/lib_v5/layers.py | 24 +-
tools/uvr5/lib/lib_v5/layers_123812KB.py | 24 +-
tools/uvr5/lib/lib_v5/layers_123821KB.py | 24 +-
tools/uvr5/lib/lib_v5/layers_33966KB.py | 32 +-
tools/uvr5/lib/lib_v5/layers_537227KB.py | 32 +-
tools/uvr5/lib/lib_v5/layers_537238KB.py | 32 +-
tools/uvr5/lib/lib_v5/layers_new.py | 28 +-
tools/uvr5/lib/lib_v5/model_param_init.py | 7 +-
tools/uvr5/lib/lib_v5/nets.py | 2 -
tools/uvr5/lib/lib_v5/nets_537227KB.py | 1 -
tools/uvr5/lib/lib_v5/nets_537238KB.py | 1 -
tools/uvr5/lib/lib_v5/nets_new.py | 16 +-
tools/uvr5/lib/lib_v5/spec_utils.py | 119 +++-----
tools/uvr5/lib/utils.py | 16 +-
tools/uvr5/webui.py | 18 +-
webui.py | 78 +++--
58 files changed, 2079 insertions(+), 970 deletions(-)
create mode 100644 .github/build_windows_packages.ps1
create mode 100644 .github/workflows/build_windows_packages.yaml
create mode 100644 .github/workflows/docker-publish.yaml
create mode 100644 .pre-commit-config.yaml
rename colab_webui.ipynb => Colab-WebUI.ipynb (95%)
delete mode 100644 Docker/damo.sha256
delete mode 100644 Docker/download.py
delete mode 100644 Docker/download.sh
create mode 100644 Docker/install_wrapper.sh
delete mode 100644 Docker/links.sha256
delete mode 100644 Docker/links.txt
create mode 100644 Docker/miniconda_install.sh
create mode 100644 docker_build.sh
delete mode 100755 dockerbuild.sh
diff --git a/.dockerignore b/.dockerignore
index 4eca27b..bf36b88 100644
--- a/.dockerignore
+++ b/.dockerignore
@@ -1,8 +1,198 @@
-docs
-logs
+GPT_SoVITS/pretrained_models/*
+tools/asr/models/*
+tools/uvr5/uvr5_weights/*
+
+.git
+.DS_Store
+.vscode
+*.pyc
+env
+runtime
+.idea
output
-reference
-SoVITS_weights
-GPT_weights
+logs
+SoVITS_weights*/
+GPT_weights*/
TEMP
-.git
+weight.json
+ffmpeg*
+ffprobe*
+cfg.json
+speakers.json
+ref_audios
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+**/__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+# For a library or package, you might want to ignore these files since the code is
+# intended to run in multiple environments; otherwise, check them in:
+.python-version
+
+# pipenv
+# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+# However, in case of collaboration, if having platform-specific dependencies or dependencies
+# having no cross-platform support, pipenv may install dependencies that don't work, or not
+# install all needed dependencies.
+Pipfile.lock
+
+# UV
+# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
+# This is especially recommended for binary packages to ensure reproducibility, and is more
+# commonly ignored for libraries.
+uv.lock
+
+# poetry
+# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+# This is especially recommended for binary packages to ensure reproducibility, and is more
+# commonly ignored for libraries.
+# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+poetry.lock
+
+# pdm
+# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#pdm.lock
+# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+# in version control.
+# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
+.pdm.toml
+.pdm-python
+.pdm-build/
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+# and can be added to the global gitignore or merged into this file. For a more nuclear
+# option (not recommended) you can uncomment the following to ignore the entire idea folder.
+.idea/
+
+# Ruff stuff:
+.ruff_cache/
+
+# PyPI configuration file
+.pypirc
diff --git a/.github/build_windows_packages.ps1 b/.github/build_windows_packages.ps1
new file mode 100644
index 0000000..2e4acb2
--- /dev/null
+++ b/.github/build_windows_packages.ps1
@@ -0,0 +1,194 @@
+$ErrorActionPreference = "Stop"
+
+Write-Host "Current location: $(Get-Location)"
+
+$cuda = $env:TORCH_CUDA
+if (-not $cuda) {
+ Write-Error "Missing TORCH_CUDA env (cu124 or cu128)"
+ exit 1
+}
+
+$date = $env:DATE_SUFFIX
+if ([string]::IsNullOrWhiteSpace($date)) {
+ $date = Get-Date -Format "MMdd"
+}
+
+$pkgName = "GPT-SoVITS-$date"
+$tmpDir = "tmp"
+$srcDir = $PWD
+
+$suffix = $env:PKG_SUFFIX
+if (-not [string]::IsNullOrWhiteSpace($suffix)) {
+ $pkgName = "$pkgName$suffix"
+}
+
+$pkgName = "$pkgName-$cuda"
+
+$baseHF = "https://huggingface.co/XXXXRT/GPT-SoVITS-Pretrained/resolve/main"
+$PRETRAINED_URL = "$baseHF/pretrained_models.zip"
+$G2PW_URL = "$baseHF/G2PWModel.zip"
+$UVR5_URL = "$baseHF/uvr5_weights.zip"
+$NLTK_URL = "$baseHF/nltk_data.zip"
+$JTALK_URL = "$baseHF/open_jtalk_dic_utf_8-1.11.tar.gz"
+
+$PYTHON_VERSION = "3.11.12"
+$PY_RELEASE_VERSION = "20250409"
+
+Write-Host "[INFO] Cleaning .git..."
+Remove-Item "$srcDir\.git" -Recurse -Force -ErrorAction SilentlyContinue
+
+Write-Host "[INFO] Creating tmp dir..."
+New-Item -ItemType Directory -Force -Path $tmpDir
+
+Write-Host "[INFO] System Python version:"
+python --version
+python -m site
+
+Write-Host "[INFO] Downloading Python $PYTHON_VERSION..."
+$zst = "$tmpDir\python.tar.zst"
+Invoke-WebRequest "https://github.com/astral-sh/python-build-standalone/releases/download/$PY_RELEASE_VERSION/cpython-$PYTHON_VERSION+$PY_RELEASE_VERSION-x86_64-pc-windows-msvc-pgo-full.tar.zst" -OutFile $zst
+& "C:\Program Files\7-Zip\7z.exe" e $zst -o"$tmpDir" -aoa
+$tar = Get-ChildItem "$tmpDir" -Filter "*.tar" | Select-Object -First 1
+& "C:\Program Files\7-Zip\7z.exe" x $tar.FullName -o"$tmpDir\extracted" -aoa
+Move-Item "$tmpDir\extracted\python\install" "$srcDir\runtime"
+
+Write-Host "[INFO] Copying Redistributing Visual C++ Runtime..."
+$vswhere = "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe"
+$vsPath = & $vswhere -latest -products * -requires Microsoft.VisualStudio.Component.VC.Tools.x86.x64 -property installationPath
+$redistRoot = Join-Path $vsPath "VC\Redist\MSVC"
+$targetVer = Get-ChildItem -Path $redistRoot -Directory |
+ Where-Object { $_.Name -match "^14\." } |
+ Sort-Object Name -Descending |
+ Select-Object -First 1
+$x64Path = Join-Path $targetVer.FullName "x64"
+Get-ChildItem -Path $x64Path -Directory | Where-Object {
+ $_.Name -match '^Microsoft\..*\.(CRT|OpenMP)$'
+} | ForEach-Object {
+ Get-ChildItem -Path $_.FullName -Filter "*.dll" | ForEach-Object {
+ Copy-Item -Path $_.FullName -Destination "$srcDir\runtime" -Force
+ }
+}
+
+function DownloadAndUnzip($url, $targetRelPath) {
+ $filename = Split-Path $url -Leaf
+ $tmpZip = "$tmpDir\$filename"
+ Invoke-WebRequest $url -OutFile $tmpZip
+ Expand-Archive -Path $tmpZip -DestinationPath $tmpDir -Force
+ $subdirName = $filename -replace '\.zip$', ''
+ $sourcePath = Join-Path $tmpDir $subdirName
+ $destRoot = Join-Path $srcDir $targetRelPath
+ $destPath = Join-Path $destRoot $subdirName
+ if (Test-Path $destPath) {
+ Remove-Item $destPath -Recurse -Force
+ }
+ Move-Item $sourcePath $destRoot
+ Remove-Item $tmpZip
+}
+
+Write-Host "[INFO] Download pretrained_models..."
+DownloadAndUnzip $PRETRAINED_URL "GPT_SoVITS"
+
+Write-Host "[INFO] Download G2PWModel..."
+DownloadAndUnzip $G2PW_URL "GPT_SoVITS\text"
+
+Write-Host "[INFO] Download UVR5 model..."
+DownloadAndUnzip $UVR5_URL "tools\uvr5"
+
+Write-Host "[INFO] Downloading funasr..."
+$funasrUrl = "https://huggingface.co/XXXXRT/GPT-SoVITS-Pretrained/resolve/main/funasr.zip"
+$funasrZip = "$tmpDir\funasr.zip"
+Invoke-WebRequest -Uri $funasrUrl -OutFile $funasrZip
+Expand-Archive -Path $funasrZip -DestinationPath "$srcDir\tools\asr\models" -Force
+Remove-Item $funasrZip
+
+Write-Host "[INFO] Download ffmpeg..."
+$ffUrl = "https://www.gyan.dev/ffmpeg/builds/ffmpeg-release-essentials.zip"
+$ffZip = "$tmpDir\ffmpeg.zip"
+Invoke-WebRequest -Uri $ffUrl -OutFile $ffZip
+Expand-Archive $ffZip -DestinationPath $tmpDir -Force
+$ffDir = Get-ChildItem -Directory "$tmpDir" | Where-Object { $_.Name -like "ffmpeg*" } | Select-Object -First 1
+Move-Item "$($ffDir.FullName)\bin\ffmpeg.exe" "$srcDir\runtime"
+Move-Item "$($ffDir.FullName)\bin\ffprobe.exe" "$srcDir\runtime"
+Remove-Item $ffZip
+Remove-Item $ffDir.FullName -Recurse -Force
+
+Write-Host "[INFO] Installing PyTorch..."
+& ".\runtime\python.exe" -m ensurepip
+& ".\runtime\python.exe" -m pip install --upgrade pip --no-warn-script-location
+switch ($cuda) {
+ "cu124" {
+ & ".\runtime\python.exe" -m pip install torch==2.6 torchaudio --index-url https://download.pytorch.org/whl/cu124 --no-warn-script-location
+ }
+ "cu128" {
+ & ".\runtime\python.exe" -m pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu128 --no-warn-script-location
+ }
+ default {
+ Write-Error "Unsupported CUDA version: $cuda"
+ exit 1
+ }
+}
+
+Write-Host "[INFO] Installing dependencies..."
+& ".\runtime\python.exe" -m pip install -r extra-req.txt --no-deps --no-warn-script-location
+& ".\runtime\python.exe" -m pip install -r requirements.txt --no-warn-script-location
+
+Write-Host "[INFO] Downloading NLTK and pyopenjtalk dictionary..."
+$PYTHON = ".\runtime\python.exe"
+$prefix = & $PYTHON -c "import sys; print(sys.prefix)"
+$jtalkPath = & $PYTHON -c "import os, pyopenjtalk; print(os.path.dirname(pyopenjtalk.__file__))"
+$nltkZip = "$tmpDir\nltk_data.zip"
+$jtalkTar = "$tmpDir\open_jtalk_dic_utf_8-1.11.tar.gz"
+
+Invoke-WebRequest -Uri $NLTK_URL -OutFile $nltkZip
+Expand-Archive -Path $nltkZip -DestinationPath $prefix -Force
+Remove-Item $nltkZip
+
+Invoke-WebRequest -Uri $JTALK_URL -OutFile $jtalkTar
+& "C:\Program Files\7-Zip\7z.exe" e $jtalkTar -o"$tmpDir" -aoa
+$innerTar = Get-ChildItem "$tmpDir" -Filter "*.tar" | Select-Object -First 1
+& "C:\Program Files\7-Zip\7z.exe" x $innerTar.FullName -o"$jtalkPath" -aoa
+Remove-Item $jtalkTar
+Remove-Item $innerTar.FullName
+
+Write-Host "[INFO] Preparing final directory $pkgName ..."
+$items = @(Get-ChildItem -Filter "*.sh") +
+ @(Get-ChildItem -Filter "*.ipynb") +
+ @("$tmpDir", ".github", "Docker", "docs", ".gitignore", ".dockerignore", "README.md")
+Remove-Item $items -Force -Recurse -ErrorAction SilentlyContinue
+$curr = Get-Location
+Set-Location ../
+Get-ChildItem .
+Copy-Item -Path $curr -Destination $pkgName -Recurse
+$7zPath = "$pkgName.7z"
+$start = Get-Date
+Write-Host "Compress Starting at $start"
+& "C:\Program Files\7-Zip\7z.exe" a -t7z "$7zPath" "$pkgName" -m0=lzma2 -mx=9 -md=1g -ms=1g -mmc=500 -mfb=273 -mlc=0 -mlp=4 -mpb=4 -mc=8g -mmt=on -bsp1
+$end = Get-Date
+Write-Host "Elapsed time: $($end - $start)"
+Get-ChildItem .
+
+python -m pip install --upgrade pip
+python -m pip install "modelscope" "huggingface_hub[hf_transfer]" --no-warn-script-location
+
+Write-Host "[INFO] Uploading to ModelScope..."
+$msUser = $env:MODELSCOPE_USERNAME
+$msToken = $env:MODELSCOPE_TOKEN
+if (-not $msUser -or -not $msToken) {
+ Write-Error "Missing MODELSCOPE_USERNAME or MODELSCOPE_TOKEN"
+ exit 1
+}
+modelscope upload "$msUser/GPT-SoVITS-Packages" "$7zPath" "$7zPath" --repo-type model --token $msToken
+
+Write-Host "[SUCCESS] Uploaded: $7zPath to ModelScope"
+
+Write-Host "[INFO] Uploading to HuggingFace..."
+$hfUser = $env:HUGGINGFACE_USERNAME
+$hfToken = $env:HUGGINGFACE_TOKEN
+if (-not $hfUser -or -not $hfToken) {
+ Write-Error "Missing HUGGINGFACE_USERNAME or HUGGINGFACE_TOKEN"
+ exit 1
+}
+$env:HF_HUB_ENABLE_HF_TRANSFER = "1"
+huggingface-cli upload "$hfUser/GPT-SoVITS-Packages" "$7zPath" "$7zPath" --repo-type model --token $hfToken
+
+Write-Host "[SUCCESS] Uploaded: $7zPath to HuggingFace"
diff --git a/.github/workflows/build_windows_packages.yaml b/.github/workflows/build_windows_packages.yaml
new file mode 100644
index 0000000..3286146
--- /dev/null
+++ b/.github/workflows/build_windows_packages.yaml
@@ -0,0 +1,38 @@
+name: Build and Upload Windows Package
+
+on:
+ workflow_dispatch:
+ inputs:
+ date:
+ description: "Date suffix (optional)"
+ required: false
+ default: ""
+ suffix:
+ description: "Package name suffix (optional)"
+ required: false
+ default: ""
+
+jobs:
+ build:
+ runs-on: windows-latest
+ strategy:
+ matrix:
+ torch_cuda: [cu124, cu128]
+ env:
+ TORCH_CUDA: ${{ matrix.torch_cuda }}
+ MODELSCOPE_USERNAME: ${{ secrets.MODELSCOPE_USERNAME }}
+ MODELSCOPE_TOKEN: ${{ secrets.MODELSCOPE_TOKEN }}
+ HUGGINGFACE_USERNAME: ${{ secrets.HUGGINGFACE_USERNAME }}
+ HUGGINGFACE_TOKEN: ${{ secrets.HUGGINGFACE_TOKEN }}
+ DATE_SUFFIX: ${{ github.event.inputs.date }}
+ PKG_SUFFIX: ${{ github.event.inputs.suffix }}
+
+ steps:
+ - name: Checkout
+ uses: actions/checkout@v4
+
+ - name: Run Build and Upload Script
+ shell: pwsh
+ run: |
+ Move-Item .github/build_windows_packages.ps1 ../build_windows_packages.ps1
+ ../build_windows_packages.ps1
\ No newline at end of file
diff --git a/.github/workflows/docker-publish.yaml b/.github/workflows/docker-publish.yaml
new file mode 100644
index 0000000..a00a0a7
--- /dev/null
+++ b/.github/workflows/docker-publish.yaml
@@ -0,0 +1,276 @@
+name: Build and Publish Docker Image
+
+on:
+ workflow_dispatch:
+
+jobs:
+ generate-meta:
+ runs-on: ubuntu-22.04
+ outputs:
+ tag: ${{ steps.meta.outputs.tag }}
+ steps:
+ - name: Checkout Code
+ uses: actions/checkout@v4
+
+ - name: Generate Tag
+ id: meta
+ run: |
+ DATE=$(date +'%Y%m%d')
+ COMMIT=$(git rev-parse --short=6 HEAD)
+ echo "tag=${DATE}-${COMMIT}" >> $GITHUB_OUTPUT
+ build-amd64:
+ needs: generate-meta
+ runs-on: ubuntu-22.04
+ strategy:
+ matrix:
+ include:
+ - cuda_version: 12.6
+ lite: true
+ torch_base: lite
+ tag_prefix: cu126-lite
+ - cuda_version: 12.6
+ lite: false
+ torch_base: full
+ tag_prefix: cu126
+ - cuda_version: 12.8
+ lite: true
+ torch_base: lite
+ tag_prefix: cu128-lite
+ - cuda_version: 12.8
+ lite: false
+ torch_base: full
+ tag_prefix: cu128
+
+ steps:
+ - name: Checkout Code
+ uses: actions/checkout@v4
+
+ - name: Free up disk space
+ run: |
+ echo "Before cleanup:"
+ df -h
+
+ sudo rm -rf /opt/ghc
+ sudo rm -rf /opt/hostedtoolcache/CodeQL
+ sudo rm -rf /opt/hostedtoolcache/PyPy
+ sudo rm -rf /opt/hostedtoolcache/go
+ sudo rm -rf /opt/hostedtoolcache/node
+ sudo rm -rf /opt/hostedtoolcache/Ruby
+ sudo rm -rf /opt/microsoft
+ sudo rm -rf /opt/pipx
+ sudo rm -rf /opt/az
+ sudo rm -rf /opt/google
+
+
+ sudo rm -rf /usr/lib/jvm
+ sudo rm -rf /usr/lib/google-cloud-sdk
+ sudo rm -rf /usr/lib/dotnet
+
+ sudo rm -rf /usr/local/lib/android
+ sudo rm -rf /usr/local/.ghcup
+ sudo rm -rf /usr/local/julia1.11.5
+ sudo rm -rf /usr/local/share/powershell
+ sudo rm -rf /usr/local/share/chromium
+
+ sudo rm -rf /usr/share/swift
+ sudo rm -rf /usr/share/miniconda
+ sudo rm -rf /usr/share/az_12.1.0
+ sudo rm -rf /usr/share/dotnet
+
+ echo "After cleanup:"
+ df -h
+
+ - name: Set up Docker Buildx
+ uses: docker/setup-buildx-action@v3
+
+ - name: Log in to Docker Hub
+ uses: docker/login-action@v3
+ with:
+ username: ${{ secrets.DOCKER_HUB_USERNAME }}
+ password: ${{ secrets.DOCKER_HUB_PASSWORD }}
+
+ - name: Build and Push Docker Image (amd64)
+ uses: docker/build-push-action@v5
+ with:
+ context: .
+ file: ./Dockerfile
+ push: true
+ platforms: linux/amd64
+ build-args: |
+ LITE=${{ matrix.lite }}
+ TORCH_BASE=${{ matrix.torch_base }}
+ CUDA_VERSION=${{ matrix.cuda_version }}
+ WORKFLOW=true
+ tags: |
+ xxxxrt666/gpt-sovits:${{ matrix.tag_prefix }}-${{ needs.generate-meta.outputs.tag }}-amd64
+ xxxxrt666/gpt-sovits:latest-${{ matrix.tag_prefix }}-amd64
+
+ build-arm64:
+ needs: generate-meta
+ runs-on: ubuntu-22.04-arm
+ strategy:
+ matrix:
+ include:
+ - cuda_version: 12.6
+ lite: true
+ torch_base: lite
+ tag_prefix: cu126-lite
+ - cuda_version: 12.6
+ lite: false
+ torch_base: full
+ tag_prefix: cu126
+ - cuda_version: 12.8
+ lite: true
+ torch_base: lite
+ tag_prefix: cu128-lite
+ - cuda_version: 12.8
+ lite: false
+ torch_base: full
+ tag_prefix: cu128
+
+ steps:
+ - name: Checkout Code
+ uses: actions/checkout@v4
+
+ - name: Free up disk space
+ run: |
+ echo "Before cleanup:"
+ df -h
+
+ sudo rm -rf /opt/ghc
+ sudo rm -rf /opt/hostedtoolcache/CodeQL
+ sudo rm -rf /opt/hostedtoolcache/PyPy
+ sudo rm -rf /opt/hostedtoolcache/go
+ sudo rm -rf /opt/hostedtoolcache/node
+ sudo rm -rf /opt/hostedtoolcache/Ruby
+ sudo rm -rf /opt/microsoft
+ sudo rm -rf /opt/pipx
+ sudo rm -rf /opt/az
+ sudo rm -rf /opt/google
+
+
+ sudo rm -rf /usr/lib/jvm
+ sudo rm -rf /usr/lib/google-cloud-sdk
+ sudo rm -rf /usr/lib/dotnet
+
+ sudo rm -rf /usr/local/lib/android
+ sudo rm -rf /usr/local/.ghcup
+ sudo rm -rf /usr/local/julia1.11.5
+ sudo rm -rf /usr/local/share/powershell
+ sudo rm -rf /usr/local/share/chromium
+
+ sudo rm -rf /usr/share/swift
+ sudo rm -rf /usr/share/miniconda
+ sudo rm -rf /usr/share/az_12.1.0
+ sudo rm -rf /usr/share/dotnet
+
+ echo "After cleanup:"
+ df -h
+
+ - name: Set up Docker Buildx
+ uses: docker/setup-buildx-action@v3
+
+ - name: Log in to Docker Hub
+ uses: docker/login-action@v3
+ with:
+ username: ${{ secrets.DOCKER_HUB_USERNAME }}
+ password: ${{ secrets.DOCKER_HUB_PASSWORD }}
+
+ - name: Build and Push Docker Image (arm64)
+ uses: docker/build-push-action@v5
+ with:
+ context: .
+ file: ./Dockerfile
+ push: true
+ platforms: linux/arm64
+ build-args: |
+ LITE=${{ matrix.lite }}
+ TORCH_BASE=${{ matrix.torch_base }}
+ CUDA_VERSION=${{ matrix.cuda_version }}
+ WORKFLOW=true
+ tags: |
+ xxxxrt666/gpt-sovits:${{ matrix.tag_prefix }}-${{ needs.generate-meta.outputs.tag }}-arm64
+ xxxxrt666/gpt-sovits:latest-${{ matrix.tag_prefix }}-arm64
+
+
+ merge-and-clean:
+ needs:
+ - build-amd64
+ - build-arm64
+ - generate-meta
+ runs-on: ubuntu-latest
+ strategy:
+ matrix:
+ include:
+ - tag_prefix: cu126-lite
+ - tag_prefix: cu126
+ - tag_prefix: cu128-lite
+ - tag_prefix: cu128
+
+ steps:
+ - name: Set up Docker Buildx
+ uses: docker/setup-buildx-action@v3
+
+ - name: Log in to Docker Hub
+ uses: docker/login-action@v3
+ with:
+ username: ${{ secrets.DOCKER_HUB_USERNAME }}
+ password: ${{ secrets.DOCKER_HUB_PASSWORD }}
+
+ - name: Merge amd64 and arm64 into multi-arch image
+ run: |
+ DATE_TAG=${{ needs.generate-meta.outputs.tag }}
+ TAG_PREFIX=${{ matrix.tag_prefix }}
+
+ docker buildx imagetools create \
+ --tag ${{ secrets.DOCKER_HUB_USERNAME }}/gpt-sovits:${TAG_PREFIX}-${DATE_TAG} \
+ ${{ secrets.DOCKER_HUB_USERNAME }}/gpt-sovits:${TAG_PREFIX}-${DATE_TAG}-amd64 \
+ ${{ secrets.DOCKER_HUB_USERNAME }}/gpt-sovits:${TAG_PREFIX}-${DATE_TAG}-arm64
+
+ docker buildx imagetools create \
+ --tag ${{ secrets.DOCKER_HUB_USERNAME }}/gpt-sovits:latest-${TAG_PREFIX} \
+ ${{ secrets.DOCKER_HUB_USERNAME }}/gpt-sovits:latest-${TAG_PREFIX}-amd64 \
+ ${{ secrets.DOCKER_HUB_USERNAME }}/gpt-sovits:latest-${TAG_PREFIX}-arm64
+ - name: Delete old platform-specific tags via Docker Hub API
+ env:
+ DOCKER_HUB_USERNAME: ${{ secrets.DOCKER_HUB_USERNAME }}
+ DOCKER_HUB_TOKEN: ${{ secrets.DOCKER_HUB_PASSWORD }}
+ TAG_PREFIX: ${{ matrix.tag_prefix }}
+ DATE_TAG: ${{ needs.generate-meta.outputs.tag }}
+ run: |
+ sudo apt-get update && sudo apt-get install -y jq
+
+ TOKEN=$(curl -s -u $DOCKER_HUB_USERNAME:$DOCKER_HUB_TOKEN \
+ "https://auth.docker.io/token?service=registry.docker.io&scope=repository:$DOCKER_HUB_USERNAME/gpt-sovits:pull,push,delete" \
+ | jq -r .token)
+
+ for PLATFORM in amd64 arm64; do
+ SAFE_PLATFORM=$(echo $PLATFORM | sed 's/\//-/g')
+ TAG="${TAG_PREFIX}-${DATE_TAG}-${SAFE_PLATFORM}"
+ LATEST_TAG="latest-${TAG_PREFIX}-${SAFE_PLATFORM}"
+
+ for DEL_TAG in "$TAG" "$LATEST_TAG"; do
+ echo "Deleting tag: $DEL_TAG"
+ curl -X DELETE -H "Authorization: Bearer $TOKEN" https://registry-1.docker.io/v2/$DOCKER_HUB_USERNAME/gpt-sovits/manifests/$DEL_TAG
+ done
+ done
+ create-default:
+ runs-on: ubuntu-latest
+ needs:
+ - merge-and-clean
+ steps:
+ - name: Set up Docker Buildx
+ uses: docker/setup-buildx-action@v3
+
+ - name: Log in to Docker Hub
+ uses: docker/login-action@v3
+ with:
+ username: ${{ secrets.DOCKER_HUB_USERNAME }}
+ password: ${{ secrets.DOCKER_HUB_PASSWORD }}
+
+ - name: Create Default Tag
+ run: |
+ docker buildx imagetools create \
+ --tag ${{ secrets.DOCKER_HUB_USERNAME }}/gpt-sovits:latest \
+ ${{ secrets.DOCKER_HUB_USERNAME }}/gpt-sovits:latest-cu126-lite
+
\ No newline at end of file
diff --git a/.gitignore b/.gitignore
index 0bb4e0b..d280e45 100644
--- a/.gitignore
+++ b/.gitignore
@@ -7,13 +7,8 @@ runtime
.idea
output
logs
-reference
-GPT_weights
-SoVITS_weights
-GPT_weights_v2
-SoVITS_weights_v2
-GPT_weights_v3
-SoVITS_weights_v3
+SoVITS_weights*/
+GPT_weights*/
TEMP
weight.json
ffmpeg*
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
new file mode 100644
index 0000000..2434e74
--- /dev/null
+++ b/.pre-commit-config.yaml
@@ -0,0 +1,15 @@
+ci:
+ autoupdate_schedule: monthly
+
+repos:
+- repo: https://github.com/astral-sh/ruff-pre-commit
+ rev: v0.11.7
+ hooks:
+ # Run the linter.
+ - id: ruff
+ types_or: [ python, pyi ]
+ args: [ --fix ]
+ # Run the formatter.
+ - id: ruff-format
+ types_or: [ python, pyi ]
+ args: [ --line-length, "120", --target-version, "py310" ]
diff --git a/Colab-Inference.ipynb b/Colab-Inference.ipynb
index 8a31701..b962c9b 100644
--- a/Colab-Inference.ipynb
+++ b/Colab-Inference.ipynb
@@ -1,5 +1,12 @@
{
"cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+        "[Open In Colab badge]"
+ ]
+ },
{
"cell_type": "markdown",
"metadata": {},
@@ -39,9 +46,9 @@
"\n",
"cd GPT-SoVITS\n",
"\n",
- "mkdir GPT_weights\n",
+ "mkdir -p GPT_weights\n",
"\n",
- "mkdir SoVITS_weights\n",
+ "mkdir -p SoVITS_weights\n",
"\n",
"if conda env list | awk '{print $1}' | grep -Fxq \"GPTSoVITS\"; then\n",
" :\n",
@@ -53,7 +60,7 @@
"\n",
"pip install ipykernel\n",
"\n",
- "bash install.sh --source HF"
+ "bash install.sh --device CU126 --source HF"
]
},
{
diff --git a/colab_webui.ipynb b/Colab-WebUI.ipynb
similarity index 95%
rename from colab_webui.ipynb
rename to Colab-WebUI.ipynb
index b410775..b1403f3 100644
--- a/colab_webui.ipynb
+++ b/Colab-WebUI.ipynb
@@ -7,7 +7,7 @@
"id": "view-in-github"
},
"source": [
-        "[Open In Colab badge, linking to colab_webui.ipynb]"
+        "[Open In Colab badge, linking to Colab-WebUI.ipynb]"
]
},
{
@@ -59,7 +59,7 @@
"\n",
"pip install ipykernel\n",
"\n",
- "bash install.sh --source HF --download-uvr5"
+ "bash install.sh --device CU126 --source HF --download-uvr5"
]
},
{
diff --git a/Docker/damo.sha256 b/Docker/damo.sha256
deleted file mode 100644
index 6e9804d..0000000
--- a/Docker/damo.sha256
+++ /dev/null
@@ -1,3 +0,0 @@
-5bba782a5e9196166233b9ab12ba04cadff9ef9212b4ff6153ed9290ff679025 /workspace/tools/damo_asr/models/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/model.pb
-b3be75be477f0780277f3bae0fe489f48718f585f3a6e45d7dd1fbb1a4255fc5 /workspace/tools/damo_asr/models/speech_fsmn_vad_zh-cn-16k-common-pytorch/model.pb
-a5818bb9d933805a916eebe41eb41648f7f9caad30b4bd59d56f3ca135421916 /workspace/tools/damo_asr/models/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/model.pb
\ No newline at end of file
diff --git a/Docker/download.py b/Docker/download.py
deleted file mode 100644
index 952423d..0000000
--- a/Docker/download.py
+++ /dev/null
@@ -1,8 +0,0 @@
-# Download moda ASR related models
-from modelscope import snapshot_download
-
-model_dir = snapshot_download(
- "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch", revision="v2.0.4"
-)
-model_dir = snapshot_download("damo/speech_fsmn_vad_zh-cn-16k-common-pytorch", revision="v2.0.4")
-model_dir = snapshot_download("damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch", revision="v2.0.4")
diff --git a/Docker/download.sh b/Docker/download.sh
deleted file mode 100644
index 447e018..0000000
--- a/Docker/download.sh
+++ /dev/null
@@ -1,11 +0,0 @@
-#!/usr/bin/env bash
-
-set -Eeuo pipefail
-
-echo "Downloading models..."
-
-aria2c --disable-ipv6 --input-file /workspace/Docker/links.txt --dir /workspace --continue
-
-echo "Checking SHA256..."
-
-parallel --will-cite -a /workspace/Docker/links.sha256 "echo -n {} | sha256sum -c"
diff --git a/Docker/install_wrapper.sh b/Docker/install_wrapper.sh
new file mode 100644
index 0000000..6dd93e5
--- /dev/null
+++ b/Docker/install_wrapper.sh
@@ -0,0 +1,33 @@
+#!/bin/bash
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
+
+cd "$SCRIPT_DIR" || exit 1
+
+cd .. || exit 1
+
+set -e
+
+source "$HOME/miniconda3/etc/profile.d/conda.sh"
+
+mkdir -p GPT_SoVITS
+
+mkdir -p GPT_SoVITS/text
+
+ln -s /workspace/models/pretrained_models /workspace/GPT-SoVITS/GPT_SoVITS/pretrained_models
+
+ln -s /workspace/models/G2PWModel /workspace/GPT-SoVITS/GPT_SoVITS/text/G2PWModel
+
+bash install.sh --device "CU${CUDA_VERSION//./}" --source HF
+
+pip cache purge
+
+pip show torch
+
+rm -rf /tmp/* /var/tmp/*
+
+rm -rf "$HOME/miniconda3/pkgs"
+
+mkdir -p "$HOME/miniconda3/pkgs"
+
+rm -rf /root/.conda /root/.cache
diff --git a/Docker/links.sha256 b/Docker/links.sha256
deleted file mode 100644
index cda6dc1..0000000
--- a/Docker/links.sha256
+++ /dev/null
@@ -1,12 +0,0 @@
-b1c1e17e9c99547a89388f72048cd6e1b41b5a18b170e86a46dfde0324d63eb1 /workspace/GPT_SoVITS/pretrained_models/s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt
-fc579c1db3c1e21b721001cf99d7a584214280df19b002e200b630a34fa06eb8 /workspace/GPT_SoVITS/pretrained_models/s2D488k.pth
-020a014e1e01e550e510f2f61fae5e5f5b6aab40f15c22f1f12f724df507e835 /workspace/GPT_SoVITS/pretrained_models/s2G488k.pth
-24164f129c66499d1346e2aa55f183250c223161ec2770c0da3d3b08cf432d3c /workspace/GPT_SoVITS/pretrained_models/chinese-hubert-base/pytorch_model.bin
-e53a693acc59ace251d143d068096ae0d7b79e4b1b503fa84c9dcf576448c1d8 /workspace/GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large/pytorch_model.bin
-39796caa5db18d7f9382d8ac997ac967bfd85f7761014bb807d2543cc844ef05 /workspace/tools/uvr5/uvr5_weights/HP2_all_vocals.pth
-45e6b65199e781b4a6542002699be9f19cd3d1cb7d1558bc2bfbcd84674dfe28 /workspace/tools/uvr5/uvr5_weights/HP3_all_vocals.pth
-5908891829634926119720241e8573d97cbeb8277110a7512bdb0bd7563258ee /workspace/tools/uvr5/uvr5_weights/HP5_only_main_vocal.pth
-8c8fd1582f9aabc363e47af62ddb88df6cae7e064cae75bbf041a067a5e0aee2 /workspace/tools/uvr5/uvr5_weights/VR-DeEchoAggressive.pth
-01376dd2a571bf3cb9cced680732726d2d732609d09216a610b0d110f133febe /workspace/tools/uvr5/uvr5_weights/VR-DeEchoDeReverb.pth
-56aba59db3bcdd14a14464e62f3129698ecdea62eee0f003b9360923eb3ac79e /workspace/tools/uvr5/uvr5_weights/VR-DeEchoNormal.pth
-233bb5c6aaa365e568659a0a81211746fa881f8f47f82d9e864fce1f7692db80 /workspace/tools/uvr5/uvr5_weights/onnx_dereverb_By_FoxJoy/vocals.onnx
\ No newline at end of file
diff --git a/Docker/links.txt b/Docker/links.txt
deleted file mode 100644
index e6603db..0000000
--- a/Docker/links.txt
+++ /dev/null
@@ -1,34 +0,0 @@
-# GPT-SoVITS models
-https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/s1bert25hz-2kh-longer-epoch%3D68e-step%3D50232.ckpt
- out=GPT_SoVITS/pretrained_models/s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt
-https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/s2D488k.pth
- out=GPT_SoVITS/pretrained_models/s2D488k.pth
-https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/s2G488k.pth
- out=GPT_SoVITS/pretrained_models/s2G488k.pth
-https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/chinese-hubert-base/config.json
- out=GPT_SoVITS/pretrained_models/chinese-hubert-base/config.json
-https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/chinese-hubert-base/preprocessor_config.json
- out=GPT_SoVITS/pretrained_models/chinese-hubert-base/preprocessor_config.json
-https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/chinese-hubert-base/pytorch_model.bin
- out=GPT_SoVITS/pretrained_models/chinese-hubert-base/pytorch_model.bin
-https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/chinese-roberta-wwm-ext-large/config.json
- out=GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large/config.json
-https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/chinese-roberta-wwm-ext-large/pytorch_model.bin
- out=GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large/pytorch_model.bin
-https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/chinese-roberta-wwm-ext-large/tokenizer.json
- out=GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large/tokenizer.json
-# UVR5
-https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/HP2_all_vocals.pth
- out=tools/uvr5/uvr5_weights/HP2_all_vocals.pth
-https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/HP3_all_vocals.pth
- out=tools/uvr5/uvr5_weights/HP3_all_vocals.pth
-https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/HP5_only_main_vocal.pth
- out=tools/uvr5/uvr5_weights/HP5_only_main_vocal.pth
-https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/VR-DeEchoAggressive.pth
- out=tools/uvr5/uvr5_weights/VR-DeEchoAggressive.pth
-https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/VR-DeEchoDeReverb.pth
- out=tools/uvr5/uvr5_weights/VR-DeEchoDeReverb.pth
-https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/VR-DeEchoNormal.pth
- out=tools/uvr5/uvr5_weights/VR-DeEchoNormal.pth
-https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/uvr5_weights/onnx_dereverb_By_FoxJoy/vocals.onnx
- out=tools/uvr5/uvr5_weights/onnx_dereverb_By_FoxJoy/vocals.onnx
\ No newline at end of file
diff --git a/Docker/miniconda_install.sh b/Docker/miniconda_install.sh
new file mode 100644
index 0000000..001a2a4
--- /dev/null
+++ b/Docker/miniconda_install.sh
@@ -0,0 +1,70 @@
+#!/bin/bash
+
+set -e
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
+
+cd "$SCRIPT_DIR" || exit 1
+
+cd .. || exit 1
+
+if [ -d "$HOME/miniconda3" ]; then
+ exit 0
+fi
+
+WORKFLOW=${WORKFLOW:-"false"}
+TARGETPLATFORM=${TARGETPLATFORM:-"linux/amd64"}
+
+if [ "$WORKFLOW" = "true" ]; then
+ WGET_CMD=(wget -nv --tries=25 --wait=5 --read-timeout=40 --retry-on-http-error=404)
+else
+ WGET_CMD=(wget --tries=25 --wait=5 --read-timeout=40 --retry-on-http-error=404)
+fi
+
+if [ "$TARGETPLATFORM" = "linux/amd64" ]; then
+ "${WGET_CMD[@]}" -O miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-py311_25.3.1-1-Linux-x86_64.sh
+elif [ "$TARGETPLATFORM" = "linux/arm64" ]; then
+ "${WGET_CMD[@]}" -O miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-py311_25.3.1-1-Linux-aarch64.sh
+else
+ exit 1
+fi
+
+LOG_PATH="/tmp/miniconda-install.log"
+
+# Run the installer inside the `if` condition: with `set -e`, a bare failing
+# command would abort the script before `$?` could be checked.
+if bash miniconda.sh -b -p "$HOME/miniconda3" >"$LOG_PATH" 2>&1; then
+ echo "== Miniconda Installed =="
+else
+ echo "Failed to Install miniconda"
+ tail -n 50 "$LOG_PATH"
+ exit 1
+fi
+
+rm miniconda.sh
+
+source "$HOME/miniconda3/etc/profile.d/conda.sh"
+
+"$HOME/miniconda3/bin/conda" config --add channels conda-forge
+
+"$HOME/miniconda3/bin/conda" update -q --all -y 1>/dev/null
+
+"$HOME/miniconda3/bin/conda" install python=3.11 -q -y
+
+"$HOME/miniconda3/bin/conda" install gcc=14 gxx ffmpeg cmake make unzip -q -y
+
+if [ "$CUDA_VERSION" = "12.8" ]; then
+ "$HOME/miniconda3/bin/pip" install torch torchaudio --no-cache-dir --index-url https://download.pytorch.org/whl/cu128
+elif [ "$CUDA_VERSION" = "12.6" ]; then
+ "$HOME/miniconda3/bin/pip" install torch==2.6 torchaudio --no-cache-dir --index-url https://download.pytorch.org/whl/cu126
+fi
+
+"$HOME/miniconda3/bin/pip" cache purge
+
+rm $LOG_PATH
+
+rm -rf "$HOME/miniconda3/pkgs"
+
+mkdir -p "$HOME/miniconda3/pkgs"
+
+rm -rf "$HOME/.conda" "$HOME/.cache"
diff --git a/Dockerfile b/Dockerfile
index 80cd9f3..71bf6fa 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,42 +1,62 @@
-# Base CUDA image
-FROM cnstark/pytorch:2.0.1-py3.9.17-cuda11.8.0-ubuntu20.04
+ARG CUDA_VERSION=12.6
+ARG TORCH_BASE=full
-LABEL maintainer="breakstring@hotmail.com"
-LABEL version="dev-20240209"
+FROM xxxxrt666/torch-base:cu${CUDA_VERSION}-${TORCH_BASE}
+
+LABEL maintainer="XXXXRT"
+LABEL version="V4"
LABEL description="Docker image for GPT-SoVITS"
+ARG CUDA_VERSION=12.6
-# Install 3rd party apps
-ENV DEBIAN_FRONTEND=noninteractive
-ENV TZ=Etc/UTC
-RUN apt-get update && \
- apt-get install -y --no-install-recommends tzdata ffmpeg libsox-dev parallel aria2 git git-lfs && \
- git lfs install && \
- rm -rf /var/lib/apt/lists/*
+ENV CUDA_VERSION=${CUDA_VERSION}
-# Copy only requirements.txt initially to leverage Docker cache
-WORKDIR /workspace
-COPY requirements.txt /workspace/
-RUN pip install --no-cache-dir -r requirements.txt
+SHELL ["/bin/bash", "-c"]
+
+WORKDIR /workspace/GPT-SoVITS
+
+COPY Docker /workspace/GPT-SoVITS/Docker/
+
+ARG LITE=false
+ENV LITE=${LITE}
+
+ARG WORKFLOW=false
+ENV WORKFLOW=${WORKFLOW}
+
+ARG TARGETPLATFORM
+ENV TARGETPLATFORM=${TARGETPLATFORM}
+
+RUN bash Docker/miniconda_install.sh
-# Define a build-time argument for image type
-ARG IMAGE_TYPE=full
+COPY extra-req.txt /workspace/GPT-SoVITS/
-# Conditional logic based on the IMAGE_TYPE argument
-# Always copy the Docker directory, but only use it if IMAGE_TYPE is not "elite"
-COPY ./Docker /workspace/Docker
-# elite 类型的镜像里面不包含额外的模型
-RUN if [ "$IMAGE_TYPE" != "elite" ]; then \
- chmod +x /workspace/Docker/download.sh && \
- /workspace/Docker/download.sh && \
- python /workspace/Docker/download.py && \
- python -m nltk.downloader averaged_perceptron_tagger cmudict; \
- fi
+COPY requirements.txt /workspace/GPT-SoVITS/
+COPY install.sh /workspace/GPT-SoVITS/
-# Copy the rest of the application
-COPY . /workspace
+RUN bash Docker/install_wrapper.sh
EXPOSE 9871 9872 9873 9874 9880
-CMD ["python", "webui.py"]
+ENV PYTHONPATH="/workspace/GPT-SoVITS"
+
+RUN conda init bash && echo "conda activate base" >> ~/.bashrc
+
+WORKDIR /workspace
+
+RUN rm -rf /workspace/GPT-SoVITS
+
+WORKDIR /workspace/GPT-SoVITS
+
+COPY . /workspace/GPT-SoVITS
+
+CMD ["/bin/bash", "-c", "\
+ rm -rf /workspace/GPT-SoVITS/GPT_SoVITS/pretrained_models && \
+ rm -rf /workspace/GPT-SoVITS/GPT_SoVITS/text/G2PWModel && \
+ rm -rf /workspace/GPT-SoVITS/tools/asr/models && \
+ rm -rf /workspace/GPT-SoVITS/tools/uvr5/uvr5_weights && \
+ ln -s /workspace/models/pretrained_models /workspace/GPT-SoVITS/GPT_SoVITS/pretrained_models && \
+ ln -s /workspace/models/G2PWModel /workspace/GPT-SoVITS/GPT_SoVITS/text/G2PWModel && \
+ ln -s /workspace/models/asr_models /workspace/GPT-SoVITS/tools/asr/models && \
+ ln -s /workspace/models/uvr5_weights /workspace/GPT-SoVITS/tools/uvr5/uvr5_weights && \
+ exec bash"]
\ No newline at end of file
diff --git a/GPT_SoVITS/TTS_infer_pack/TTS.py b/GPT_SoVITS/TTS_infer_pack/TTS.py
index d20daee..6ef46eb 100644
--- a/GPT_SoVITS/TTS_infer_pack/TTS.py
+++ b/GPT_SoVITS/TTS_infer_pack/TTS.py
@@ -108,7 +108,7 @@ resample_transform_dict = {}
def resample(audio_tensor, sr0, sr1, device):
global resample_transform_dict
- key="%s-%s"%(sr0,sr1)
+ key = "%s-%s" % (sr0, sr1)
if key not in resample_transform_dict:
resample_transform_dict[key] = torchaudio.transforms.Resample(sr0, sr1).to(device)
return resample_transform_dict[key](audio_tensor)
@@ -252,7 +252,6 @@ class TTS_Config:
"cnhuhbert_base_path": "GPT_SoVITS/pretrained_models/chinese-hubert-base",
"bert_base_path": "GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large",
},
-
}
configs: dict = None
v1_languages: list = ["auto", "en", "zh", "ja", "all_zh", "all_ja"]
@@ -432,7 +431,6 @@ class TTS:
"aux_ref_audio_paths": [],
}
-
self.stop_flag: bool = False
self.precision: torch.dtype = torch.float16 if self.configs.is_half else torch.float32
@@ -468,7 +466,7 @@ class TTS:
path_sovits = self.configs.default_configs[model_version]["vits_weights_path"]
if if_lora_v3 == True and os.path.exists(path_sovits) == False:
- info = path_sovits + i18n("SoVITS %s 底模缺失,无法加载相应 LoRA 权重"%model_version)
+ info = path_sovits + i18n("SoVITS %s 底模缺失,无法加载相应 LoRA 权重" % model_version)
raise FileExistsError(info)
# dict_s2 = torch.load(weights_path, map_location=self.configs.device,weights_only=False)
@@ -507,7 +505,7 @@ class TTS:
)
self.configs.use_vocoder = False
else:
- kwargs["version"]=model_version
+ kwargs["version"] = model_version
vits_model = SynthesizerTrnV3(
self.configs.filter_length // 2 + 1,
self.configs.segment_size // self.configs.hop_length,
@@ -572,7 +570,7 @@ class TTS:
self.vocoder.cpu()
del self.vocoder
self.empty_cache()
-
+
self.vocoder = BigVGAN.from_pretrained(
"%s/GPT_SoVITS/pretrained_models/models--nvidia--bigvgan_v2_24khz_100band_256x" % (now_dir,),
use_cuda_kernel=False,
@@ -595,18 +593,21 @@ class TTS:
self.empty_cache()
self.vocoder = Generator(
- initial_channel=100,
- resblock="1",
- resblock_kernel_sizes=[3, 7, 11],
- resblock_dilation_sizes=[[1, 3, 5], [1, 3, 5], [1, 3, 5]],
- upsample_rates=[10, 6, 2, 2, 2],
- upsample_initial_channel=512,
- upsample_kernel_sizes=[20, 12, 4, 4, 4],
- gin_channels=0, is_bias=True
- )
+ initial_channel=100,
+ resblock="1",
+ resblock_kernel_sizes=[3, 7, 11],
+ resblock_dilation_sizes=[[1, 3, 5], [1, 3, 5], [1, 3, 5]],
+ upsample_rates=[10, 6, 2, 2, 2],
+ upsample_initial_channel=512,
+ upsample_kernel_sizes=[20, 12, 4, 4, 4],
+ gin_channels=0,
+ is_bias=True,
+ )
self.vocoder.remove_weight_norm()
- state_dict_g = torch.load("%s/GPT_SoVITS/pretrained_models/gsv-v4-pretrained/vocoder.pth" % (now_dir,), map_location="cpu")
- print("loading vocoder",self.vocoder.load_state_dict(state_dict_g))
+ state_dict_g = torch.load(
+ "%s/GPT_SoVITS/pretrained_models/gsv-v4-pretrained/vocoder.pth" % (now_dir,), map_location="cpu"
+ )
+ print("loading vocoder", self.vocoder.load_state_dict(state_dict_g))
self.vocoder_configs["sr"] = 48000
self.vocoder_configs["T_ref"] = 500
@@ -614,9 +615,6 @@ class TTS:
self.vocoder_configs["upsample_rate"] = 480
self.vocoder_configs["overlapped_len"] = 12
-
-
-
self.vocoder = self.vocoder.eval()
if self.configs.is_half == True:
self.vocoder = self.vocoder.half().to(self.configs.device)
@@ -1439,7 +1437,7 @@ class TTS:
ref_audio = ref_audio.to(self.configs.device).float()
if ref_audio.shape[0] == 2:
ref_audio = ref_audio.mean(0).unsqueeze(0)
-
+
# tgt_sr = self.vocoder_configs["sr"]
tgt_sr = 24000 if self.configs.version == "v3" else 32000
if ref_sr != tgt_sr:
diff --git a/GPT_SoVITS/inference_webui.py b/GPT_SoVITS/inference_webui.py
index 4bee27c..4682014 100644
--- a/GPT_SoVITS/inference_webui.py
+++ b/GPT_SoVITS/inference_webui.py
@@ -7,11 +7,17 @@
全部按日文识别
"""
+import json
import logging
+import os
+import re
+import sys
import traceback
import warnings
+import torch
import torchaudio
+from text.LangSegmenter import LangSegmenter
logging.getLogger("markdown_it").setLevel(logging.ERROR)
logging.getLogger("urllib3").setLevel(logging.ERROR)
@@ -23,20 +29,6 @@ logging.getLogger("torchaudio._extension").setLevel(logging.ERROR)
logging.getLogger("multipart.multipart").setLevel(logging.ERROR)
warnings.simplefilter(action="ignore", category=FutureWarning)
-import json
-import os
-import re
-import sys
-
-import torch
-from text.LangSegmenter import LangSegmenter
-
-try:
- import gradio.analytics as analytics
-
- analytics.version_check = lambda: None
-except:
- ...
version = model_version = os.environ.get("version", "v2")
path_sovits_v3 = "GPT_SoVITS/pretrained_models/s2Gv3.pth"
path_sovits_v4 = "GPT_SoVITS/pretrained_models/gsv-v4-pretrained/s2Gv4.pth"
@@ -106,7 +98,7 @@ cnhubert.cnhubert_base_path = cnhubert_base_path
import random
-from GPT_SoVITS.module.models import SynthesizerTrn, SynthesizerTrnV3,Generator
+from GPT_SoVITS.module.models import Generator, SynthesizerTrn, SynthesizerTrnV3
def set_seed(seed):
@@ -226,9 +218,9 @@ else:
resample_transform_dict = {}
-def resample(audio_tensor, sr0,sr1):
+def resample(audio_tensor, sr0, sr1):
global resample_transform_dict
- key="%s-%s"%(sr0,sr1)
+ key = "%s-%s" % (sr0, sr1)
if key not in resample_transform_dict:
resample_transform_dict[key] = torchaudio.transforms.Resample(sr0, sr1).to(device)
return resample_transform_dict[key](audio_tensor)
@@ -238,14 +230,18 @@ def resample(audio_tensor, sr0,sr1):
# symbol_version-model_version-if_lora_v3
from process_ckpt import get_sovits_version_from_path_fast, load_sovits_new
-v3v4set={"v3","v4"}
+v3v4set = {"v3", "v4"}
+
+
def change_sovits_weights(sovits_path, prompt_language=None, text_language=None):
global vq_model, hps, version, model_version, dict_language, if_lora_v3
version, model_version, if_lora_v3 = get_sovits_version_from_path_fast(sovits_path)
- print(sovits_path,version, model_version, if_lora_v3)
- is_exist=is_exist_s2gv3 if model_version=="v3"else is_exist_s2gv4
+ print(sovits_path, version, model_version, if_lora_v3)
+ is_exist = is_exist_s2gv3 if model_version == "v3" else is_exist_s2gv4
if if_lora_v3 == True and is_exist == False:
- info = "GPT_SoVITS/pretrained_models/s2Gv3.pth" + i18n("SoVITS %s 底模缺失,无法加载相应 LoRA 权重"%model_version)
+ info = "GPT_SoVITS/pretrained_models/s2Gv3.pth" + i18n(
+ "SoVITS %s 底模缺失,无法加载相应 LoRA 权重" % model_version
+ )
gr.Warning(info)
raise FileExistsError(info)
dict_language = dict_language_v1 if version == "v1" else dict_language_v2
@@ -276,10 +272,15 @@ def change_sovits_weights(sovits_path, prompt_language=None, text_language=None)
prompt_language_update,
text_update,
text_language_update,
- {"__type__": "update", "visible": visible_sample_steps, "value": 32 if model_version=="v3"else 8,"choices":[4, 8, 16, 32,64,128]if model_version=="v3"else [4, 8, 16, 32]},
+ {
+ "__type__": "update",
+ "visible": visible_sample_steps,
+ "value": 32 if model_version == "v3" else 8,
+ "choices": [4, 8, 16, 32, 64, 128] if model_version == "v3" else [4, 8, 16, 32],
+ },
{"__type__": "update", "visible": visible_inp_refs},
{"__type__": "update", "value": False, "interactive": True if model_version not in v3v4set else False},
- {"__type__": "update", "visible": True if model_version =="v3" else False},
+ {"__type__": "update", "visible": True if model_version == "v3" else False},
{"__type__": "update", "value": i18n("模型加载中,请等待"), "interactive": False},
)
@@ -304,7 +305,7 @@ def change_sovits_weights(sovits_path, prompt_language=None, text_language=None)
)
model_version = version
else:
- hps.model.version=model_version
+ hps.model.version = model_version
vq_model = SynthesizerTrnV3(
hps.data.filter_length // 2 + 1,
hps.train.segment_size // hps.data.hop_length,
@@ -326,7 +327,7 @@ def change_sovits_weights(sovits_path, prompt_language=None, text_language=None)
else:
path_sovits = path_sovits_v3 if model_version == "v3" else path_sovits_v4
print(
- "loading sovits_%spretrained_G"%model_version,
+ "loading sovits_%spretrained_G" % model_version,
vq_model.load_state_dict(load_sovits_new(path_sovits)["weight"], strict=False),
)
lora_rank = dict_s2["lora_rank"]
@@ -337,7 +338,7 @@ def change_sovits_weights(sovits_path, prompt_language=None, text_language=None)
init_lora_weights=True,
)
vq_model.cfm = get_peft_model(vq_model.cfm, lora_config)
- print("loading sovits_%s_lora%s" % (model_version,lora_rank))
+ print("loading sovits_%s_lora%s" % (model_version, lora_rank))
vq_model.load_state_dict(dict_s2["weight"], strict=False)
vq_model.cfm = vq_model.cfm.merge_and_unload()
# torch.save(vq_model.state_dict(),"merge_win.pth")
@@ -350,10 +351,15 @@ def change_sovits_weights(sovits_path, prompt_language=None, text_language=None)
prompt_language_update,
text_update,
text_language_update,
- {"__type__": "update", "visible": visible_sample_steps, "value":32 if model_version=="v3"else 8,"choices":[4, 8, 16, 32,64,128]if model_version=="v3"else [4, 8, 16, 32]},
+ {
+ "__type__": "update",
+ "visible": visible_sample_steps,
+ "value": 32 if model_version == "v3" else 8,
+ "choices": [4, 8, 16, 32, 64, 128] if model_version == "v3" else [4, 8, 16, 32],
+ },
{"__type__": "update", "visible": visible_inp_refs},
{"__type__": "update", "value": False, "interactive": True if model_version not in v3v4set else False},
- {"__type__": "update", "visible": True if model_version =="v3" else False},
+ {"__type__": "update", "visible": True if model_version == "v3" else False},
{"__type__": "update", "value": i18n("合成语音"), "interactive": True},
)
with open("./weight.json") as f:
@@ -400,7 +406,7 @@ now_dir = os.getcwd()
def init_bigvgan():
- global bigvgan_model,hifigan_model
+ global bigvgan_model, hifigan_model
from BigVGAN import bigvgan
bigvgan_model = bigvgan.BigVGAN.from_pretrained(
@@ -411,17 +417,20 @@ def init_bigvgan():
bigvgan_model.remove_weight_norm()
bigvgan_model = bigvgan_model.eval()
if hifigan_model:
- hifigan_model=hifigan_model.cpu()
- hifigan_model=None
- try:torch.cuda.empty_cache()
- except:pass
+ hifigan_model = hifigan_model.cpu()
+ hifigan_model = None
+ try:
+ torch.cuda.empty_cache()
+ except:
+ pass
if is_half == True:
bigvgan_model = bigvgan_model.half().to(device)
else:
bigvgan_model = bigvgan_model.to(device)
+
def init_hifigan():
- global hifigan_model,bigvgan_model
+ global hifigan_model, bigvgan_model
hifigan_model = Generator(
initial_channel=100,
resblock="1",
@@ -430,26 +439,32 @@ def init_hifigan():
upsample_rates=[10, 6, 2, 2, 2],
upsample_initial_channel=512,
upsample_kernel_sizes=[20, 12, 4, 4, 4],
- gin_channels=0, is_bias=True
+ gin_channels=0,
+ is_bias=True,
)
hifigan_model.eval()
hifigan_model.remove_weight_norm()
- state_dict_g = torch.load("%s/GPT_SoVITS/pretrained_models/gsv-v4-pretrained/vocoder.pth" % (now_dir,), map_location="cpu")
- print("loading vocoder",hifigan_model.load_state_dict(state_dict_g))
+ state_dict_g = torch.load(
+ "%s/GPT_SoVITS/pretrained_models/gsv-v4-pretrained/vocoder.pth" % (now_dir,), map_location="cpu"
+ )
+ print("loading vocoder", hifigan_model.load_state_dict(state_dict_g))
if bigvgan_model:
- bigvgan_model=bigvgan_model.cpu()
- bigvgan_model=None
- try:torch.cuda.empty_cache()
- except:pass
+ bigvgan_model = bigvgan_model.cpu()
+ bigvgan_model = None
+ try:
+ torch.cuda.empty_cache()
+ except:
+ pass
if is_half == True:
hifigan_model = hifigan_model.half().to(device)
else:
hifigan_model = hifigan_model.to(device)
-bigvgan_model=hifigan_model=None
-if model_version=="v3":
+
+bigvgan_model = hifigan_model = None
+if model_version == "v3":
init_bigvgan()
-if model_version=="v4":
+if model_version == "v4":
init_hifigan()
@@ -831,17 +846,17 @@ def get_tts_wav(
ref_audio = ref_audio.to(device).float()
if ref_audio.shape[0] == 2:
ref_audio = ref_audio.mean(0).unsqueeze(0)
- tgt_sr=24000 if model_version=="v3"else 32000
+ tgt_sr = 24000 if model_version == "v3" else 32000
if sr != tgt_sr:
- ref_audio = resample(ref_audio, sr,tgt_sr)
+ ref_audio = resample(ref_audio, sr, tgt_sr)
# print("ref_audio",ref_audio.abs().mean())
- mel2 = mel_fn(ref_audio)if model_version=="v3"else mel_fn_v4(ref_audio)
+ mel2 = mel_fn(ref_audio) if model_version == "v3" else mel_fn_v4(ref_audio)
mel2 = norm_spec(mel2)
T_min = min(mel2.shape[2], fea_ref.shape[2])
mel2 = mel2[:, :, :T_min]
fea_ref = fea_ref[:, :, :T_min]
- Tref=468 if model_version=="v3"else 500
- Tchunk=934 if model_version=="v3"else 1000
+ Tref = 468 if model_version == "v3" else 500
+ Tchunk = 934 if model_version == "v3" else 1000
if T_min > Tref:
mel2 = mel2[:, :, -Tref:]
fea_ref = fea_ref[:, :, -Tref:]
@@ -866,13 +881,13 @@ def get_tts_wav(
cfm_resss.append(cfm_res)
cfm_res = torch.cat(cfm_resss, 2)
cfm_res = denorm_spec(cfm_res)
- if model_version=="v3":
+ if model_version == "v3":
if bigvgan_model == None:
init_bigvgan()
- else:#v4
+ else: # v4
if hifigan_model == None:
init_hifigan()
- vocoder_model=bigvgan_model if model_version=="v3"else hifigan_model
+ vocoder_model = bigvgan_model if model_version == "v3" else hifigan_model
with torch.inference_mode():
wav_gen = vocoder_model(cfm_res)
audio = wav_gen[0][0] # .cpu().detach().numpy()
@@ -886,9 +901,12 @@ def get_tts_wav(
t1 = ttime()
print("%.3f\t%.3f\t%.3f\t%.3f" % (t[0], sum(t[1::3]), sum(t[2::3]), sum(t[3::3])))
audio_opt = torch.cat(audio_opt, 0) # np.concatenate
- if model_version in {"v1","v2"}:opt_sr=32000
- elif model_version=="v3":opt_sr=24000
- else:opt_sr=48000#v4
+ if model_version in {"v1", "v2"}:
+ opt_sr = 32000
+ elif model_version == "v3":
+ opt_sr = 24000
+ else:
+ opt_sr = 48000 # v4
if if_sr == True and opt_sr == 24000:
print(i18n("音频超分中"))
audio_opt, opt_sr = audio_sr(audio_opt.unsqueeze(0), opt_sr)
@@ -1061,7 +1079,7 @@ def html_left(text, label="p"):
"""
-with gr.Blocks(title="GPT-SoVITS WebUI") as app:
+with gr.Blocks(title="GPT-SoVITS WebUI", analytics_enabled=False) as app:
gr.Markdown(
value=i18n("本软件以MIT协议开源, 作者不对软件具备任何控制力, 使用软件者、传播软件导出的声音者自负全责.")
+ "
"
@@ -1131,16 +1149,16 @@ with gr.Blocks(title="GPT-SoVITS WebUI") as app:
sample_steps = (
gr.Radio(
label=i18n("采样步数,如果觉得电,提高试试,如果觉得慢,降低试试"),
- value=32 if model_version=="v3"else 8,
- choices=[4, 8, 16, 32,64,128]if model_version=="v3"else [4, 8, 16, 32],
+ value=32 if model_version == "v3" else 8,
+ choices=[4, 8, 16, 32, 64, 128] if model_version == "v3" else [4, 8, 16, 32],
visible=True,
)
if model_version in v3v4set
else gr.Radio(
label=i18n("采样步数,如果觉得电,提高试试,如果觉得慢,降低试试"),
- choices=[4, 8, 16, 32,64,128]if model_version=="v3"else [4, 8, 16, 32],
+ choices=[4, 8, 16, 32, 64, 128] if model_version == "v3" else [4, 8, 16, 32],
visible=False,
- value=32 if model_version=="v3"else 8,
+ value=32 if model_version == "v3" else 8,
)
)
if_sr_Checkbox = gr.Checkbox(
@@ -1148,7 +1166,7 @@ with gr.Blocks(title="GPT-SoVITS WebUI") as app:
value=False,
interactive=True,
show_label=True,
- visible=False if model_version !="v3" else True,
+ visible=False if model_version != "v3" else True,
)
gr.Markdown(html_center(i18n("*请填写需要合成的目标文本和语种模式"), "h3"))
with gr.Row():
diff --git a/GPT_SoVITS/inference_webui_fast.py b/GPT_SoVITS/inference_webui_fast.py
index 311994b..0b9525e 100644
--- a/GPT_SoVITS/inference_webui_fast.py
+++ b/GPT_SoVITS/inference_webui_fast.py
@@ -14,6 +14,8 @@ import random
import re
import sys
+import torch
+
now_dir = os.getcwd()
sys.path.append(now_dir)
sys.path.append("%s/GPT_SoVITS" % (now_dir))
@@ -25,14 +27,6 @@ logging.getLogger("httpx").setLevel(logging.ERROR)
logging.getLogger("asyncio").setLevel(logging.ERROR)
logging.getLogger("charset_normalizer").setLevel(logging.ERROR)
logging.getLogger("torchaudio._extension").setLevel(logging.ERROR)
-import torch
-
-try:
- import gradio.analytics as analytics
-
- analytics.version_check = lambda: None
-except:
- ...
infer_ttswebui = os.environ.get("infer_ttswebui", 9872)
@@ -262,15 +256,17 @@ SoVITS_names, GPT_names = get_weights_names(GPT_weight_root, SoVITS_weight_root)
from process_ckpt import get_sovits_version_from_path_fast
-v3v4set={"v3","v4"}
+v3v4set = {"v3", "v4"}
+
+
def change_sovits_weights(sovits_path, prompt_language=None, text_language=None):
global version, model_version, dict_language, if_lora_v3
version, model_version, if_lora_v3 = get_sovits_version_from_path_fast(sovits_path)
# print(sovits_path,version, model_version, if_lora_v3)
- is_exist=is_exist_s2gv3 if model_version=="v3"else is_exist_s2gv4
+ is_exist = is_exist_s2gv3 if model_version == "v3" else is_exist_s2gv4
path_sovits = path_sovits_v3 if model_version == "v3" else path_sovits_v4
if if_lora_v3 == True and is_exist == False:
- info = path_sovits + i18n("SoVITS %s 底模缺失,无法加载相应 LoRA 权重"%model_version)
+ info = path_sovits + i18n("SoVITS %s 底模缺失,无法加载相应 LoRA 权重" % model_version)
gr.Warning(info)
raise FileExistsError(info)
dict_language = dict_language_v1 if version == "v1" else dict_language_v2
@@ -328,7 +324,7 @@ def change_sovits_weights(sovits_path, prompt_language=None, text_language=None)
f.write(json.dumps(data))
-with gr.Blocks(title="GPT-SoVITS WebUI") as app:
+with gr.Blocks(title="GPT-SoVITS WebUI", analytics_enabled=False) as app:
gr.Markdown(
value=i18n("本软件以MIT协议开源, 作者不对软件具备任何控制力, 使用软件者、传播软件导出的声音者自负全责.")
+ "
"
diff --git a/GPT_SoVITS/module/data_utils.py b/GPT_SoVITS/module/data_utils.py
index 1bda2b3..11f6b09 100644
--- a/GPT_SoVITS/module/data_utils.py
+++ b/GPT_SoVITS/module/data_utils.py
@@ -470,6 +470,7 @@ class TextAudioSpeakerCollateV3:
# return ssl_padded, spec_padded,mel_padded, ssl_lengths, spec_lengths, text_padded, text_lengths, wav_padded, wav_lengths,mel_lengths
return ssl_padded, spec_padded, mel_padded, ssl_lengths, spec_lengths, text_padded, text_lengths, mel_lengths
+
class TextAudioSpeakerLoaderV4(torch.utils.data.Dataset):
"""
1) loads audio, speaker_id, text pairs
@@ -596,7 +597,7 @@ class TextAudioSpeakerLoaderV4(torch.utils.data.Dataset):
audio_norm, self.filter_length, self.sampling_rate, self.hop_length, self.win_length, center=False
)
spec = torch.squeeze(spec, 0)
- spec1 = spectrogram_torch(audio_norm, 1280,32000, 320, 1280,center=False)
+ spec1 = spectrogram_torch(audio_norm, 1280, 32000, 320, 1280, center=False)
mel = spec_to_mel_torch(spec1, 1280, 100, 32000, 0, None)
mel = self.norm_spec(torch.squeeze(mel, 0))
return spec, mel
@@ -643,7 +644,7 @@ class TextAudioSpeakerCollateV4:
mel_lengths = torch.LongTensor(len(batch))
spec_padded = torch.FloatTensor(len(batch), batch[0][1].size(0), max_spec_len)
- mel_padded = torch.FloatTensor(len(batch), batch[0][2].size(0), max_spec_len*2)
+ mel_padded = torch.FloatTensor(len(batch), batch[0][2].size(0), max_spec_len * 2)
ssl_padded = torch.FloatTensor(len(batch), batch[0][0].size(1), max_ssl_len)
text_padded = torch.LongTensor(len(batch), max_text_len)
# wav_padded = torch.FloatTensor(len(batch), 1, max_wav_len)
diff --git a/GPT_SoVITS/module/mel_processing.py b/GPT_SoVITS/module/mel_processing.py
index 7a17c54..62c7b40 100644
--- a/GPT_SoVITS/module/mel_processing.py
+++ b/GPT_SoVITS/module/mel_processing.py
@@ -39,24 +39,36 @@ hann_window = {}
def spectrogram_torch(y, n_fft, sampling_rate, hop_size, win_size, center=False):
if torch.min(y) < -1.2:
- print('min value is ', torch.min(y))
+ print("min value is ", torch.min(y))
if torch.max(y) > 1.2:
- print('max value is ', torch.max(y))
+ print("max value is ", torch.max(y))
global hann_window
- dtype_device = str(y.dtype) + '_' + str(y.device)
+ dtype_device = str(y.dtype) + "_" + str(y.device)
# wnsize_dtype_device = str(win_size) + '_' + dtype_device
- key = "%s-%s-%s-%s-%s" %(dtype_device,n_fft, sampling_rate, hop_size, win_size)
+ key = "%s-%s-%s-%s-%s" % (dtype_device, n_fft, sampling_rate, hop_size, win_size)
# if wnsize_dtype_device not in hann_window:
if key not in hann_window:
# hann_window[wnsize_dtype_device] = torch.hann_window(win_size).to(dtype=y.dtype, device=y.device)
hann_window[key] = torch.hann_window(win_size).to(dtype=y.dtype, device=y.device)
- y = torch.nn.functional.pad(y.unsqueeze(1), (int((n_fft-hop_size)/2), int((n_fft-hop_size)/2)), mode='reflect')
+ y = torch.nn.functional.pad(
+ y.unsqueeze(1), (int((n_fft - hop_size) / 2), int((n_fft - hop_size) / 2)), mode="reflect"
+ )
y = y.squeeze(1)
# spec = torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size, window=hann_window[wnsize_dtype_device],
- spec = torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size, window=hann_window[key],
- center=center, pad_mode='reflect', normalized=False, onesided=True, return_complex=False)
+ spec = torch.stft(
+ y,
+ n_fft,
+ hop_length=hop_size,
+ win_length=win_size,
+ window=hann_window[key],
+ center=center,
+ pad_mode="reflect",
+ normalized=False,
+ onesided=True,
+ return_complex=False,
+ )
spec = torch.sqrt(spec.pow(2).sum(-1) + 1e-8)
return spec
@@ -64,9 +76,9 @@ def spectrogram_torch(y, n_fft, sampling_rate, hop_size, win_size, center=False)
def spec_to_mel_torch(spec, n_fft, num_mels, sampling_rate, fmin, fmax):
global mel_basis
- dtype_device = str(spec.dtype) + '_' + str(spec.device)
+ dtype_device = str(spec.dtype) + "_" + str(spec.device)
# fmax_dtype_device = str(fmax) + '_' + dtype_device
- key = "%s-%s-%s-%s-%s-%s"%(dtype_device,n_fft, num_mels, sampling_rate, fmin, fmax)
+ key = "%s-%s-%s-%s-%s-%s" % (dtype_device, n_fft, num_mels, sampling_rate, fmin, fmax)
# if fmax_dtype_device not in mel_basis:
if key not in mel_basis:
mel = librosa_mel_fn(sr=sampling_rate, n_fft=n_fft, n_mels=num_mels, fmin=fmin, fmax=fmax)
@@ -78,17 +90,25 @@ def spec_to_mel_torch(spec, n_fft, num_mels, sampling_rate, fmin, fmax):
return spec
-
def mel_spectrogram_torch(y, n_fft, num_mels, sampling_rate, hop_size, win_size, fmin, fmax, center=False):
if torch.min(y) < -1.2:
- print('min value is ', torch.min(y))
+ print("min value is ", torch.min(y))
if torch.max(y) > 1.2:
- print('max value is ', torch.max(y))
+ print("max value is ", torch.max(y))
global mel_basis, hann_window
- dtype_device = str(y.dtype) + '_' + str(y.device)
+ dtype_device = str(y.dtype) + "_" + str(y.device)
# fmax_dtype_device = str(fmax) + '_' + dtype_device
- fmax_dtype_device = "%s-%s-%s-%s-%s-%s-%s-%s"%(dtype_device,n_fft, num_mels, sampling_rate, hop_size, win_size, fmin, fmax)
+ fmax_dtype_device = "%s-%s-%s-%s-%s-%s-%s-%s" % (
+ dtype_device,
+ n_fft,
+ num_mels,
+ sampling_rate,
+ hop_size,
+ win_size,
+ fmin,
+ fmax,
+ )
# wnsize_dtype_device = str(win_size) + '_' + dtype_device
wnsize_dtype_device = fmax_dtype_device
if fmax_dtype_device not in mel_basis:
@@ -97,11 +117,23 @@ def mel_spectrogram_torch(y, n_fft, num_mels, sampling_rate, hop_size, win_size,
if wnsize_dtype_device not in hann_window:
hann_window[wnsize_dtype_device] = torch.hann_window(win_size).to(dtype=y.dtype, device=y.device)
- y = torch.nn.functional.pad(y.unsqueeze(1), (int((n_fft-hop_size)/2), int((n_fft-hop_size)/2)), mode='reflect')
+ y = torch.nn.functional.pad(
+ y.unsqueeze(1), (int((n_fft - hop_size) / 2), int((n_fft - hop_size) / 2)), mode="reflect"
+ )
y = y.squeeze(1)
- spec = torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size, window=hann_window[wnsize_dtype_device],
- center=center, pad_mode='reflect', normalized=False, onesided=True, return_complex=False)
+ spec = torch.stft(
+ y,
+ n_fft,
+ hop_length=hop_size,
+ win_length=win_size,
+ window=hann_window[wnsize_dtype_device],
+ center=center,
+ pad_mode="reflect",
+ normalized=False,
+ onesided=True,
+ return_complex=False,
+ )
spec = torch.sqrt(spec.pow(2).sum(-1) + 1e-8)
diff --git a/GPT_SoVITS/module/models.py b/GPT_SoVITS/module/models.py
index 21f60d9..3e37f0f 100644
--- a/GPT_SoVITS/module/models.py
+++ b/GPT_SoVITS/module/models.py
@@ -414,7 +414,8 @@ class Generator(torch.nn.Module):
upsample_rates,
upsample_initial_channel,
upsample_kernel_sizes,
- gin_channels=0,is_bias=False,
+ gin_channels=0,
+ is_bias=False,
):
super(Generator, self).__init__()
self.num_kernels = len(resblock_kernel_sizes)
@@ -1173,7 +1174,7 @@ class SynthesizerTrnV3(nn.Module):
quantized = F.interpolate(quantized, scale_factor=2, mode="nearest") ##BCT
x, m_p, logs_p, y_mask = self.enc_p(quantized, y_lengths, text, text_lengths, ge)
fea = self.bridge(x)
- fea = F.interpolate(fea, scale_factor=(1.875 if self.version=="v3"else 2), mode="nearest") ##BCT
+ fea = F.interpolate(fea, scale_factor=(1.875 if self.version == "v3" else 2), mode="nearest") ##BCT
fea, y_mask_ = self.wns1(
fea, mel_lengths, ge
) ##If the 1-minute fine-tuning works fine, no need to manually adjust the learning rate.
@@ -1196,9 +1197,9 @@ class SynthesizerTrnV3(nn.Module):
ge = self.ref_enc(refer[:, :704] * refer_mask, refer_mask)
y_lengths = torch.LongTensor([int(codes.size(2) * 2)]).to(codes.device)
if speed == 1:
- sizee = int(codes.size(2) * (3.875 if self.version=="v3"else 4))
+ sizee = int(codes.size(2) * (3.875 if self.version == "v3" else 4))
else:
- sizee = int(codes.size(2) * (3.875 if self.version=="v3"else 4) / speed) + 1
+ sizee = int(codes.size(2) * (3.875 if self.version == "v3" else 4) / speed) + 1
y_lengths1 = torch.LongTensor([sizee]).to(codes.device)
text_lengths = torch.LongTensor([text.size(-1)]).to(text.device)
@@ -1207,7 +1208,7 @@ class SynthesizerTrnV3(nn.Module):
quantized = F.interpolate(quantized, scale_factor=2, mode="nearest") ##BCT
x, m_p, logs_p, y_mask = self.enc_p(quantized, y_lengths, text, text_lengths, ge, speed)
fea = self.bridge(x)
- fea = F.interpolate(fea, scale_factor=(1.875 if self.version=="v3"else 2), mode="nearest") ##BCT
+ fea = F.interpolate(fea, scale_factor=(1.875 if self.version == "v3" else 2), mode="nearest") ##BCT
####more wn paramter to learn mel
fea, y_mask_ = self.wns1(fea, y_lengths1, ge)
return fea, ge
diff --git a/GPT_SoVITS/process_ckpt.py b/GPT_SoVITS/process_ckpt.py
index 4a2a1ba..1c458a4 100644
--- a/GPT_SoVITS/process_ckpt.py
+++ b/GPT_SoVITS/process_ckpt.py
@@ -28,18 +28,18 @@ def my_save(fea, path): #####fix issue: torch.save doesn't support chinese path
from io import BytesIO
-def my_save2(fea, path,cfm_version):
+def my_save2(fea, path, cfm_version):
bio = BytesIO()
torch.save(fea, bio)
bio.seek(0)
data = bio.getvalue()
- byte=b"03" if cfm_version=="v3"else b"04"
+ byte = b"03" if cfm_version == "v3" else b"04"
data = byte + data[2:]
with open(path, "wb") as f:
f.write(data)
-def savee(ckpt, name, epoch, steps, hps, cfm_version=None,lora_rank=None):
+def savee(ckpt, name, epoch, steps, hps, cfm_version=None, lora_rank=None):
try:
opt = OrderedDict()
opt["weight"] = {}
@@ -51,7 +51,7 @@ def savee(ckpt, name, epoch, steps, hps, cfm_version=None,lora_rank=None):
opt["info"] = "%sepoch_%siteration" % (epoch, steps)
if lora_rank:
opt["lora_rank"] = lora_rank
- my_save2(opt, "%s/%s.pth" % (hps.save_weight_dir, name),cfm_version)
+ my_save2(opt, "%s/%s.pth" % (hps.save_weight_dir, name), cfm_version)
else:
my_save(opt, "%s/%s.pth" % (hps.save_weight_dir, name))
return "Success."
diff --git a/GPT_SoVITS/s2_train_v3_lora.py b/GPT_SoVITS/s2_train_v3_lora.py
index ddeec4f..4d8d23d 100644
--- a/GPT_SoVITS/s2_train_v3_lora.py
+++ b/GPT_SoVITS/s2_train_v3_lora.py
@@ -31,7 +31,6 @@ from module.data_utils import (
TextAudioSpeakerLoaderV3,
TextAudioSpeakerCollateV4,
TextAudioSpeakerLoaderV4,
-
)
from module.models import (
SynthesizerTrnV3 as SynthesizerTrn,
@@ -88,8 +87,8 @@ def run(rank, n_gpus, hps):
if torch.cuda.is_available():
torch.cuda.set_device(rank)
- TextAudioSpeakerLoader=TextAudioSpeakerLoaderV3 if hps.model.version=="v3"else TextAudioSpeakerLoaderV4
- TextAudioSpeakerCollate=TextAudioSpeakerCollateV3 if hps.model.version=="v3"else TextAudioSpeakerCollateV4
+ TextAudioSpeakerLoader = TextAudioSpeakerLoaderV3 if hps.model.version == "v3" else TextAudioSpeakerLoaderV4
+ TextAudioSpeakerCollate = TextAudioSpeakerCollateV3 if hps.model.version == "v3" else TextAudioSpeakerCollateV4
train_dataset = TextAudioSpeakerLoader(hps.data) ########
train_sampler = DistributedBucketSampler(
train_dataset,
@@ -365,7 +364,8 @@ def train_and_evaluate(rank, epoch, hps, nets, optims, schedulers, scaler, loade
hps.name + "_e%s_s%s_l%s" % (epoch, global_step, lora_rank),
epoch,
global_step,
- hps,cfm_version=hps.model.version,
+ hps,
+ cfm_version=hps.model.version,
lora_rank=lora_rank,
),
)
diff --git a/GPT_SoVITS/text/g2pw/onnx_api.py b/GPT_SoVITS/text/g2pw/onnx_api.py
index bf3109e..9282739 100644
--- a/GPT_SoVITS/text/g2pw/onnx_api.py
+++ b/GPT_SoVITS/text/g2pw/onnx_api.py
@@ -1,27 +1,28 @@
# This code is modified from https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/paddlespeech/t2s/frontend/g2pw
# This code is modified from https://github.com/GitYCC/g2pW
-import warnings
-
-warnings.filterwarnings("ignore")
import json
import os
+import warnings
import zipfile
from typing import Any, Dict, List, Tuple
import numpy as np
import onnxruntime
import requests
-
-onnxruntime.set_default_logger_severity(3)
+import torch
from opencc import OpenCC
from pypinyin import Style, pinyin
-from transformers import AutoTokenizer
+from transformers.models.auto.tokenization_auto import AutoTokenizer
from ..zh_normalization.char_convert import tranditional_to_simplified
from .dataset import get_char_phoneme_labels, get_phoneme_labels, prepare_onnx_input
from .utils import load_config
+onnxruntime.set_default_logger_severity(3)
+onnxruntime.preload_dlls()
+warnings.filterwarnings("ignore")
+
model_version = "1.1"
@@ -87,7 +88,7 @@ class G2PWOnnxConverter:
sess_options = onnxruntime.SessionOptions()
sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
sess_options.execution_mode = onnxruntime.ExecutionMode.ORT_SEQUENTIAL
- sess_options.intra_op_num_threads = 2
+ sess_options.intra_op_num_threads = 2 if torch.cuda.is_available() else 0
try:
self.session_g2pW = onnxruntime.InferenceSession(
os.path.join(uncompress_path, "g2pW.onnx"),
diff --git a/GPT_SoVITS/utils.py b/GPT_SoVITS/utils.py
index 1cc2d97..f6f388a 100644
--- a/GPT_SoVITS/utils.py
+++ b/GPT_SoVITS/utils.py
@@ -16,7 +16,7 @@ logging.getLogger("matplotlib").setLevel(logging.ERROR)
MATPLOTLIB_FLAG = False
-logging.basicConfig(stream=sys.stdout, level=logging.ERROR)
+logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logger = logging
@@ -309,13 +309,13 @@ def check_git_hash(model_dir):
def get_logger(model_dir, filename="train.log"):
global logger
logger = logging.getLogger(os.path.basename(model_dir))
- logger.setLevel(logging.ERROR)
+ logger.setLevel(logging.INFO)
formatter = logging.Formatter("%(asctime)s\t%(name)s\t%(levelname)s\t%(message)s")
if not os.path.exists(model_dir):
os.makedirs(model_dir)
h = logging.FileHandler(os.path.join(model_dir, filename))
- h.setLevel(logging.ERROR)
+ h.setLevel(logging.INFO)
h.setFormatter(formatter)
logger.addHandler(h)
return logger
diff --git a/README.md b/README.md
index 463649a..b32d2fd 100644
--- a/README.md
+++ b/README.md
@@ -44,15 +44,15 @@ For users in China, you can [click here](https://www.codewithgpu.com/i/RVC-Boss/
### Tested Environments
-| Python Version | PyTorch Version | Device |
-|----------------|------------------|-----------------|
-| Python 3.9 | PyTorch 2.0.1 | CUDA 11.8 |
-| Python 3.10.13 | PyTorch 2.1.2 | CUDA 12.3 |
-| Python 3.10.17 | PyTorch 2.5.1 | CUDA 12.4 |
-| Python 3.9 | PyTorch 2.5.1 | Apple silicon |
-| Python 3.11 | PyTorch 2.6.0 | Apple silicon |
-| Python 3.9 | PyTorch 2.2.2 | CPU |
-| Python 3.9 | PyTorch 2.8.0dev | CUDA12.8(for Nvidia50x0) |
+| Python Version | PyTorch Version | Device |
+| -------------- | ---------------- | ------------- |
+| Python 3.10 | PyTorch 2.5.1 | CUDA 12.4 |
+| Python 3.11 | PyTorch 2.5.1 | CUDA 12.4 |
+| Python 3.11 | PyTorch 2.7.0 | CUDA 12.8 |
+| Python 3.9 | PyTorch 2.8.0dev | CUDA 12.8 |
+| Python 3.9 | PyTorch 2.5.1 | Apple silicon |
+| Python 3.11 | PyTorch 2.7.0 | Apple silicon |
+| Python 3.9 | PyTorch 2.2.2 | CPU |
### Windows
@@ -63,31 +63,41 @@ If you are a Windows user (tested with win>=10), you can [download the integrate
### Linux
```bash
-conda create -n GPTSoVits python=3.9
+conda create -n GPTSoVits python=3.10
conda activate GPTSoVits
-bash install.sh --source [--download-uvr5]
+bash install.sh --device --source [--download-uvr5]
```
### macOS
**Note: The models trained with GPUs on Macs result in significantly lower quality compared to those trained on other devices, so we are temporarily using CPUs instead.**
-1. Install Xcode command-line tools by running `xcode-select --install`.
-2. Install the program by running the following commands:
+Install the program by running the following commands:
```bash
-conda create -n GPTSoVits python=3.9
+conda create -n GPTSoVits python=3.10
conda activate GPTSoVits
-bash install.sh --source [--download-uvr5]
+bash install.sh --device --source [--download-uvr5]
```
### Install Manually
+#### Install Dependences
+
+```bash
+conda create -n GPTSoVits python=3.10
+conda activate GPTSoVits
+
+pip install -r extra-req.txt --no-deps
+pip install -r requirements.txt
+```
+
#### Install FFmpeg
##### Conda Users
```bash
+conda activate GPTSoVits
conda install ffmpeg
```
@@ -96,14 +106,13 @@ conda install ffmpeg
```bash
sudo apt install ffmpeg
sudo apt install libsox-dev
-conda install -c conda-forge 'ffmpeg<7'
```
##### Windows Users
-Download and place [ffmpeg.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffmpeg.exe) and [ffprobe.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffprobe.exe) in the GPT-SoVITS root.
+Download and place [ffmpeg.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffmpeg.exe) and [ffprobe.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffprobe.exe) in the GPT-SoVITS root
-Install [Visual Studio 2017](https://aka.ms/vs/17/release/vc_redist.x86.exe) (Korean TTS Only)
+Install [Visual Studio 2017](https://aka.ms/vs/17/release/vc_redist.x86.exe)
##### MacOS Users
@@ -111,36 +120,53 @@ Install [Visual Studio 2017](https://aka.ms/vs/17/release/vc_redist.x86.exe) (Ko
brew install ffmpeg
```
-#### Install Dependences
+### Running GPT-SoVITS with Docker
-```bash
-pip install -r extra-req.txt --no-deps
-pip install -r requirements.txt
-```
+#### Docker Image Selection
-### Using Docker
+Due to rapid development in the codebase and a slower Docker image release cycle, please:
-#### docker-compose.yaml configuration
+- Check [Docker Hub](https://hub.docker.com/r/xxxxrt666/gpt-sovits) for the latest available image tags
+- Choose an appropriate image tag for your environment
+- `Lite` means the Docker image does not include ASR models and UVR5 models. You can manually download the UVR5 models, while the program will automatically download the ASR models as needed
+- The appropriate architecture image (amd64/arm64) will be automatically pulled during Docker Compose
+- Optionally, build the image locally using the provided Dockerfile for the most up-to-date changes
-0. Regarding image tags: Due to rapid updates in the codebase and the slow process of packaging and testing images, please check [Docker Hub](https://hub.docker.com/r/breakstring/gpt-sovits)(outdated) for the currently packaged latest images and select as per your situation, or alternatively, build locally using a Dockerfile according to your own needs.
-1. Environment Variables:
- - is_half: Controls half-precision/double-precision. This is typically the cause if the content under the directories 4-cnhubert/5-wav32k is not generated correctly during the "SSL extracting" step. Adjust to True or False based on your actual situation.
-2. Volumes Configuration, The application's root directory inside the container is set to /workspace. The default docker-compose.yaml lists some practical examples for uploading/downloading content.
-3. shm_size: The default available memory for Docker Desktop on Windows is too small, which can cause abnormal operations. Adjust according to your own situation.
-4. Under the deploy section, GPU-related settings should be adjusted cautiously according to your system and actual circumstances.
+#### Environment Variables
-#### Running with docker compose
+- `is_half`: Controls whether half-precision (fp16) is enabled. Set to `true` if your GPU supports it to reduce memory usage.
-```
-docker compose -f "docker-compose.yaml" up -d
+#### Shared Memory Configuration
+
+On Windows (Docker Desktop), the default shared memory size is small and may cause unexpected behavior. Increase `shm_size` (e.g., to `16g`) in your Docker Compose file based on your available system memory.
+
+#### Choosing a Service
+
+The `docker-compose.yaml` defines two services:
+
+- `GPT-SoVITS-CU126` & `GPT-SoVITS-CU128`: Full version with all features.
+- `GPT-SoVITS-CU126-Lite` & `GPT-SoVITS-CU128-Lite`: Lightweight version with reduced dependencies and functionality.
+
+To run a specific service with Docker Compose, use:
+
+```bash
+docker compose run --service-ports
```
-#### Running with docker command
+#### Building the Docker Image Locally
-As above, modify the corresponding parameters based on your actual situation, then run the following command:
+If you want to build the image yourself, use:
+```bash
+bash docker_build.sh --cuda <12.6|12.8> [--lite]
```
-docker run --rm -it --gpus=all --env=is_half=False --volume=G:\GPT-SoVITS-DockerTest\output:/workspace/output --volume=G:\GPT-SoVITS-DockerTest\logs:/workspace/logs --volume=G:\GPT-SoVITS-DockerTest\SoVITS_weights:/workspace/SoVITS_weights --workdir=/workspace -p 9880:9880 -p 9871:9871 -p 9872:9872 -p 9873:9873 -p 9874:9874 --shm-size="16G" -d breakstring/gpt-sovits:xxxxx
+
+#### Accessing the Running Container (Bash Shell)
+
+Once the container is running in the background, you can access it using:
+
+```bash
+docker exec -it bash
```
## Pretrained Models
@@ -168,7 +194,9 @@ docker run --rm -it --gpus=all --env=is_half=False --volume=G:\GPT-SoVITS-Docker
The TTS annotation .list file format:
```
+
vocal_path|speaker_name|language|text
+
```
Language dictionary:
@@ -182,7 +210,9 @@ Language dictionary:
Example:
```
+
D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
+
```
## Finetune and inference
@@ -212,12 +242,12 @@ Or maunally switch version in WebUI
#### Path Auto-filling is now supported
- 1. Fill in the audio path
- 2. Slice the audio into small chunks
- 3. Denoise(optinal)
- 4. ASR
- 5. Proofreading ASR transcriptions
- 6. Go to the next Tab, then finetune the model
+1. Fill in the audio path
+2. Slice the audio into small chunks
+3. Denoise(optinal)
+4. ASR
+5. Proofreading ASR transcriptions
+6. Go to the next Tab, then finetune the model
### Open Inference WebUI
@@ -259,7 +289,7 @@ Use v2 from v1 environment:
2. Clone the latest codes from github.
-3. Download v2 pretrained models from [huggingface](https://huggingface.co/lj1995/GPT-SoVITS/tree/main/gsv-v2final-pretrained) and put them into `GPT_SoVITS\pretrained_models\gsv-v2final-pretrained`.
+3. Download v2 pretrained models from [huggingface](https://huggingface.co/lj1995/GPT-SoVITS/tree/main/gsv-v2final-pretrained) and put them into `GPT_SoVITS/pretrained_models/gsv-v2final-pretrained`.
Chinese v2 additional: [G2PWModel.zip(HF)](https://huggingface.co/XXXXRT/GPT-SoVITS-Pretrained/resolve/main/G2PWModel.zip)| [G2PWModel.zip(ModelScope)](https://www.modelscope.cn/models/XXXXRT/GPT-SoVITS-Pretrained/resolve/master/G2PWModel.zip)(Download G2PW models, unzip and rename to `G2PWModel`, and then place them in `GPT_SoVITS/text`.)
@@ -279,7 +309,7 @@ Use v3 from v2 environment:
2. Clone the latest codes from github.
-3. Download v3 pretrained models (s1v3.ckpt, s2Gv3.pth and models--nvidia--bigvgan_v2_24khz_100band_256x folder) from [huggingface](https://huggingface.co/lj1995/GPT-SoVITS/tree/main) and put them into `GPT_SoVITS\pretrained_models`.
+3. Download v3 pretrained models (s1v3.ckpt, s2Gv3.pth and models--nvidia--bigvgan_v2_24khz_100band_256x folder) from [huggingface](https://huggingface.co/lj1995/GPT-SoVITS/tree/main) and put them into `GPT_SoVITS/pretrained_models`.
additional: for Audio Super Resolution model, you can read [how to download](./tools/AP_BWE_main/24kto48k/readme.txt)
@@ -296,7 +326,7 @@ Use v4 from v1/v2/v3 environment:
2. Clone the latest codes from github.
-3. Download v4 pretrained models (gsv-v4-pretrained/s2v4.ckpt, and gsv-v4-pretrained/vocoder.pth) from [huggingface](https://huggingface.co/lj1995/GPT-SoVITS/tree/main) and put them into `GPT_SoVITS\pretrained_models`.
+3. Download v4 pretrained models (gsv-v4-pretrained/s2v4.ckpt, and gsv-v4-pretrained/vocoder.pth) from [huggingface](https://huggingface.co/lj1995/GPT-SoVITS/tree/main) and put them into `GPT_SoVITS/pretrained_models`.
## Todo List
@@ -322,7 +352,7 @@ Use v4 from v1/v2/v3 environment:
Use the command line to open the WebUI for UVR5
-```
+```bash
python tools/uvr5/webui.py ""
```
@@ -333,7 +363,7 @@ python mdxnet.py --model --input_root --output_vocal --output_ins --agg_level --
This is how the audio segmentation of the dataset is done using the command line
-```
+```bash
python audio_slicer.py \
--input_path "" \
--output_root "" \
@@ -345,7 +375,7 @@ python audio_slicer.py \
This is how dataset ASR processing is done using the command line(Only Chinese)
-```
+```bash
python tools/asr/funasr_asr.py -i -o