A real-time interactive streaming digital human that enables synchronous audio and video dialogue. It can essentially achieve commercial-grade results.
Demo videos: effect of wav2lip | effect of ernerf | effect of musetalk
News
- December 8, 2024: Improved multi-concurrency; GPU memory no longer grows with the number of concurrent connections.
- December 21, 2024: Added model warm-up for wav2lip and musetalk to eliminate stuttering on the first inference. Thanks to @heimaojinzhangyz
- December 28, 2024: Added the digital human model Ultralight-Digital-Human. Thanks to @lijihua2017
- February 7, 2025: Added fish-speech TTS
- February 21, 2025: Added the open-source wav2lip256 model. Thanks to @不蠢不蠢
- March 2, 2025: Added Tencent's speech synthesis service
- March 16, 2025: Supports macOS GPU inference. Thanks to @GcsSloop
Features
- Supports multiple digital human models: ernerf, musetalk, wav2lip, Ultralight-Digital-Human
- Supports voice cloning
- Supports interrupting the digital human while it is speaking
- Supports full-body video stitching
- Supports RTMP and WebRTC
- Supports video orchestration: plays custom videos when the avatar is not speaking
- Supports multi-concurrency
1. Installation
Tested on Ubuntu 20.04, Python 3.10, PyTorch 1.12, and CUDA 11.3
1.1 Install dependencies
conda create -n nerfstream python=3.10
conda activate nerfstream
# If your CUDA version is not 11.3 (check it with nvidia-smi), install the matching PyTorch build from https://pytorch.org/get-started/previous-versions/
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
# If you need to train the ernerf model, install the following libraries
# pip install "git+https://github.com/facebookresearch/pytorch3d.git"
# pip install tensorflow-gpu==2.8.0
# pip install --upgrade "protobuf<=3.20.1"
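A quick sanity check can confirm that the installed PyTorch build sees the GPU. This is an optional sketch, assuming the nerfstream environment created above is active:

```bash
# Verify the PyTorch version and CUDA availability
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# Expected on a matching setup: 1.12.1 True
```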
For common installation issues, see the FAQ.
For setting up the CUDA environment on Linux, you can refer to this article: https://zhuanlan.zhihu.com/p/674972886
2. Quick Start
- Download the models

  Quark Cloud Disk: https://pan.quark.cn/s/83a750323ef0
  Google Drive: https://drive.google.com/drive/folders/1FOC_MD6wdogyyX_7V1d4NDIO7P9NlSAJ?usp=sharing

  Copy wav2lip256.pth to this project's models folder and rename it to wav2lip.pth. Extract wav2lip256_avatar1.tar.gz and copy the entire folder to this project's data/avatars folder (see the shell sketch below this list).
- Run

  python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar1

  Open http://serverip:8010/webrtcapi.html in a browser. First click 'start' to play the digital human video; then enter any text in the text box and submit it, and the digital human will read the text aloud.

  The server needs to open ports tcp:8010 and udp:1-65535 (see the firewall sketch below this list).

  If you need a high-definition wav2lip model for commercial use, you can contact me to purchase it.
- Quick experience

  Create an instance with this image to run it: https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_GitHub_livetalking1.3
If you cannot access Hugging Face, set the following before running:

export HF_ENDPOINT=https://hf-mirror.com
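As referenced in the download step above, here is a minimal sketch of the file placement, assuming the downloaded files sit in the current directory and you are at the project root:

```bash
# Copy the wav2lip256 checkpoint into models/ under the expected name
cp wav2lip256.pth models/wav2lip.pth

# Unpack the avatar archive into data/avatars/
# (assumes the wav2lip256_avatar1 folder is at the archive's top level)
tar -xzf wav2lip256_avatar1.tar.gz -C data/avatars/
```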
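For the port requirements in the run step, a sketch using ufw (an assumption; adapt to whatever firewall or security-group tooling your server uses):

```bash
# Open the HTTP/signaling port and the UDP range used for WebRTC media
sudo ufw allow 8010/tcp
sudo ufw allow 1:65535/udp
sudo ufw reload
```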
3. More Usage
Usage instructions: https://livetalking-doc.readthedocs.io/en/latest
4. Docker Run
The previous installation steps are not needed; just run it directly:
docker run --gpus all -it --network=host --rm registry.cn-beijing.aliyuncs.com/codewithgpu2/lipku-metahuman-stream:2K9qaMBu8v
The code is in /root/metahuman-stream. First run git pull to get the latest code, then execute the commands as in sections 2 and 3 (a sketch follows below).
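A sketch of the typical flow inside the container, reusing the run command from section 2 (assumes the wav2lip model and avatar files are already in place in the image):

```bash
# Inside the container: update the code, then start the server as in section 2
cd /root/metahuman-stream
git pull
python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar1
```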
The following images are provided:

- autodl image: https://www.codewithgpu.com/i/lipku/metahuman-stream/base (autodl tutorial)
- ucloud image: https://www.compshare.cn/images-detail?ImageID=compshareImage-18tpjhhxoq3j&referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_livetalking1.3 (ucloud tutorial). Any port can be opened, and there is no need to deploy an additional srs service.
5. TODO
- Added ChatGPT to enable digital human dialogue
- Voice cloning
- Replace the digital human with a video when it is silent
- MuseTalk
- Wav2Lip
- Ultralight-Digital-Human
If this project is helpful to you, please give it a star. Anyone interested is also welcome to join in and improve this project together.
- Knowledge Planet (知识星球): https://t.zsxq.com/7NMyO, which accumulates high-quality answers to common problems, best-practice experience, and solutions.
- WeChat Official Account: Digital Human Technology