[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb)       [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/vinthony/SadTalker)
Wenxuan Zhang <sup>\*,1,2</sup> &emsp; Xiaodong Cun <sup>\*,2</sup> &emsp; Xuan Wang <sup>3</sup> &emsp; Yong Zhang <sup>2</sup> &emsp; Xi Shen <sup>2</sup>

Yu Guo <sup>1</sup> &emsp; Ying Shan <sup>2</sup> &emsp; Fei Wang <sup>1</sup>

<sup>1</sup> Xi'an Jiaotong University &emsp; <sup>2</sup> Tencent AI Lab &emsp; <sup>3</sup> Ant Group

CVPR 2023

![sadtalker](https://user-images.githubusercontent.com/4397546/222490039-b1f6156b-bf00-405b-9fda-0c9a9156f991.gif)

TL;DR: single portrait image 🙎‍♂️ + audio 🎤 = talking head video 🎞.
## 🔥 Highlights

- 🔥 The extension for [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) is online. Just install it via `extensions -> install from URL -> https://github.com/Winfredy/SadTalker`; check out more details [here](#sd-webui-extension).

  https://user-images.githubusercontent.com/4397546/222513483-89161f58-83d0-40e4-8e41-96c32b47bd4e.mp4

- 🔥 `full image mode` is online! Check out [here](https://github.com/Winfredy/SadTalker#beta-full-bodyimage-generation) for more details.

  | still + enhancer in v0.0.1 | still + enhancer in v0.0.2 | [input image @bagbag1815](https://twitter.com/bagbag1815/status/1642754319094108161) |
  |:--------------------: |:--------------------: | :----: |
  | | | |

- 🔥 Several new modes, e.g., `still mode`, `reference mode`, and `resize mode`, are online for better and more customizable applications.

- 🔥 Happy to see our method used in various talking and singing avatars; check out these wonderful demos on [bilibili](https://search.bilibili.com/all?keyword=sadtalker&from_source=webtop_search&spm_id_from=333.1007&search_source=3) and [twitter #sadtalker](https://twitter.com/search?q=%23sadtalker&src=typed_query).

## 📋 Changelog

(Previous changelog can be found [here](docs/changlelog.md))

- __[2023.04.08]__: In v0.0.2, we add a logo watermark to the generated video to prevent abuse, since the results are very realistic.
- __[2023.04.08]__: v0.0.2, full image animation, adding a Baidu drive link for downloading checkpoints, and optimizing the enhancer logic.
- __[2023.04.06]__: stable-diffusion-webui extension is released.
- __[2023.04.03]__: Enable TTS in Hugging Face and the local gradio demo.
- __[2023.03.30]__: Launch beta version of the full body mode.
- __[2023.03.30]__: Launch new feature: using reference videos, our algorithm can generate videos with more natural eye blinking and some eyebrow movement.
- __[2023.03.29]__: `resize mode` is online via `python inference.py --preprocess resize`, which produces a larger crop of the image as discussed in https://github.com/Winfredy/SadTalker/issues/35.
- __[2023.03.29]__: local gradio demo is online! Run `python app.py` to start the demo. The new `requirements.txt` avoids the bugs in `librosa`.
- __[2023.03.28]__: Online demo is launched on [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/vinthony/SadTalker), thanks AK!

## 🎼 Pipeline

![main_of_sadtalker](https://user-images.githubusercontent.com/4397546/222490596-4c8a2115-49a7-42ad-a2c3-3bb3288a5f36.png)

> Our method uses 3DMM coefficients as the intermediate motion representation. To this end, we first generate realistic 3D motion coefficients (facial expression β, head pose ρ) from audio; these coefficients are then used to implicitly modulate the 3D-aware face render for final video generation.

## 🚧 TODO
Previous TODOs:

- [x] Generating 2D face from a single image.
- [x] Generating 3D face from audio.
- [x] Generating 4D free-view talking examples from audio and a single image.
- [x] Gradio/Colab demo.
- [x] Full body/image generation.
- [ ] Training code of each component.
- [ ] Audio-driven anime avatar.
- [ ] Integrate ChatGPT for a conversation demo 🤔
- [x] Integrate with stable-diffusion-webui. (stay tuned!)

## ⚙️ Installation ([中文教程 / Chinese tutorial](https://www.bilibili.com/video/BV17N411P7m7/?vd_source=653f1e6e187ffc29a9b677b6ed23169a))

#### Installing SadTalker on Linux:

```bash
git clone https://github.com/Winfredy/SadTalker.git

cd SadTalker

conda create -n sadtalker python=3.8

conda activate sadtalker

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

conda install ffmpeg

pip install -r requirements.txt

### tts is optional for the gradio demo.
### pip install TTS
```

More tips about installation on Windows and the Docker file can be found [here](docs/install.md).

#### Sd-Webui-Extension:
Install the latest version of [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui), then install SadTalker via `extensions`.

Then, restart stable-diffusion-webui and set some command-line args. The models will be downloaded automatically to the right place. Alternatively, you can add the path of pre-downloaded SadTalker checkpoints to `SADTALKER_CHECKPOINTS` in `webui_user.sh` (Linux) or `webui_user.bat` (Windows):

```bash
# windows (webui_user.bat)
set COMMANDLINE_ARGS=--no-gradio-queue --disable-safe-unpickle
set SADTALKER_CHECKPOINTS=D:\SadTalker\checkpoints

# linux (webui_user.sh)
export COMMANDLINE_ARGS=--no-gradio-queue --disable-safe-unpickle
export SADTALKER_CHECKPOINTS=/path/to/SadTalker/checkpoints
```

After installation, SadTalker can be used in stable-diffusion-webui directly.
#### Download Trained Models
You can run the following script to put all the models in the right place:

```bash
bash scripts/download_models.sh
```

Or download our pre-trained models from [google drive](https://drive.google.com/drive/folders/1Wd88VDoLhVzYsQ30_qDVluQr_Xm46yHT?usp=sharing) or our [github release page](https://github.com/Winfredy/SadTalker/releases/tag/v0.0.1), and then put them in `./checkpoints`.

Or use the models we provide on [百度云盘 (Baidu drive)](https://pan.baidu.com/s/1nXuVNd0exUl37ISwWqbFGA?pwd=sadt), extraction code: sadt.

| Model | Description |
| :--- | :---------- |
| checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in SadTalker. |
| checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in SadTalker. |
| checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from [the reappearance of face-vid2vid](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis). |
| checkpoints/epoch_20.pth | Pre-trained 3DMM extractor from [Deep3DFaceReconstruction](https://github.com/microsoft/Deep3DFaceReconstruction). |
| checkpoints/wav2lip.pth | Highly accurate lip-sync model from [Wav2lip](https://github.com/Rudrabha/Wav2Lip). |
| checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model used in [dlib](http://dlib.net/). |
| checkpoints/BFM | 3DMM library files. |
| checkpoints/hub | Face detection models used in [face alignment](https://github.com/1adrianb/face-alignment). |
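Before running inference, it can help to confirm that everything landed under `./checkpoints`. The snippet below is only a minimal sanity-check sketch (not an official script): it lists the file names taken verbatim from the table above and reports anything missing, assuming you run it from the SadTalker repository root.

```bash
# Minimal sanity check: verify the expected checkpoints exist.
# File names are taken from the model table above; adjust if you store them elsewhere.
for f in \
  auido2exp_00300-model.pth \
  auido2pose_00140-model.pth \
  mapping_00229-model.pth.tar \
  mapping_00109-model.pth.tar \
  facevid2vid_00189-model.pth.tar \
  epoch_20.pth \
  wav2lip.pth \
  shape_predictor_68_face_landmarks.dat \
  BFM \
  hub
do
  if [ -e "checkpoints/$f" ]; then
    echo "OK      checkpoints/$f"
  else
    echo "MISSING checkpoints/$f"
  fi
done
```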
## 🔮 Quick Start

#### Generating 2D face from a single image with the default config.

```bash
python inference.py --driven_audio <audio.wav> --source_image <image.png>
```

The results will be saved in `results/$SOME_TIMESTAMP/*.mp4`.

Or a local gradio demo similar to our [hugging-face demo](https://huggingface.co/spaces/vinthony/SadTalker) can be run by:

```bash
## you need to manually install TTS (https://github.com/coqui-ai/TTS) via `pip install TTS` in advance.
python app.py
```

#### Advanced Configuration
| Name | Configuration | Default | Explanation |
|:------------- |:------------- |:----- | :------------- |
| Enhance Mode | `--enhancer` | None | Use `gfpgan` or `RestoreFormer` to enhance the generated face via a face restoration network. |
| Background Enhancer | `--background_enhancer` | None | Use `realesrgan` to enhance the full video. |
| Still Mode | `--still` | False | Use the same pose parameters as the original image, with less head motion. |
| Expressive Mode | `--expression_scale` | 1.0 | A larger value makes the expression motion stronger. |
| Save Path | `--result_dir` | `./results` | The location where the results will be saved. |
| Preprocess | `--preprocess` | `crop` | Run and produce the results on the cropped input image. Other choices: `resize`, where the image is resized to a specific resolution, and `full`, which runs the full image animation; use `full` with `--still` to get better results. |
| Ref Mode (eye) | `--ref_eyeblink` | None | A video path; we borrow the eye blinks from this reference video to provide more natural eye and eyebrow movement. |
| Ref Mode (pose) | `--ref_pose` | None | A video path; we borrow the head pose from this reference video. |
| 3D Mode | `--face3dvis` | False | Needs additional installation. More details on generating the 3D face can be found [here](docs/face3d.md). |
| Free-view Mode | `--input_yaw`, `--input_pitch`, `--input_roll` | None | Generating novel-view or free-view 4D talking head from a single image. More details can be found [here](https://github.com/Winfredy/SadTalker#generating-4d-free-view-talking-examples-from-audio-and-a-single-image). |
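To illustrate how the options in the table above compose, here are two example invocations. They are only sketches: `<audio.wav>`, `<image.png>`, `<eye_ref.mp4>`, and `<pose_ref.mp4>` are placeholder paths you should replace with your own files, and the flag combinations simply mirror what the table documents.

```bash
# Cropped portrait with face restoration (gfpgan) and slightly stronger expressions.
python inference.py --driven_audio <audio.wav> --source_image <image.png> \
    --enhancer gfpgan --expression_scale 1.2

# Borrow eye blinks and head pose from reference videos for more natural motion.
python inference.py --driven_audio <audio.wav> --source_image <image.png> \
    --ref_eyeblink <eye_ref.mp4> --ref_pose <pose_ref.mp4>
```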
#### Examples

| basic | w/ still mode | w/ exp_scale 1.3 | w/ gfpgan |
|:-------------: |:-------------: |:-------------: |:-------------: |
| | | | |

> Kindly ensure to enable the audio; videos are muted by default on GitHub.

| Input, w/ reference video, reference video |
|:-------------: |
| ![free_view](docs/using_ref_video.gif) |

If the reference video is shorter than the input audio, we loop the reference video.

#### Generating 3D face from Audio

| Input | Animated 3d face |
|:-------------: | :-------------: |
| | |

> Kindly ensure to enable the audio; videos are muted by default on GitHub.

#### Generating 4D free-view talking examples from audio and a single image

We use `--input_yaw`, `--input_pitch`, and `--input_roll` to control the head pose. For example, `--input_yaw -20 30 10` means the head yaw changes from -20 to 30 degrees and then from 30 to 10 degrees.

```bash
python inference.py --driven_audio <audio.wav> \
                    --source_image <image.png> \
                    --result_dir <result_dir> \
                    --input_yaw -20 30 10
```

| Results, Free-view results, Novel view results |
|:-------------: |
| ![free_view](docs/free_view_result.gif) |

#### [Beta Application] Full body/image Generation

Now you can use `--still` to generate a natural full body video. You can add `--enhancer` or `--background_enhancer` to improve the quality of the generated video. However, if you add other modes, such as `--ref_eyeblink` or `--ref_pose`, the results will be bad. We are still trying to fix this problem.

```bash
python inference.py --driven_audio <audio.wav> \
                    --source_image <image.png> \
                    --result_dir <result_dir> \
                    --still \
                    --preprocess full \
                    --enhancer gfpgan
```

## 🛎 Citation

If you find our work useful in your research, please consider citing:

```bibtex
@article{zhang2022sadtalker,
  title={SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation},
  author={Zhang, Wenxuan and Cun, Xiaodong and Wang, Xuan and Zhang, Yong and Shen, Xi and Guo, Yu and Shan, Ying and Wang, Fei},
  journal={arXiv preprint arXiv:2211.12194},
  year={2022}
}
```

## 💗 Acknowledgements

Facerender code borrows heavily from [zhanglonghao's reproduction of face-vid2vid](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis) and [PIRender](https://github.com/RenYurui/PIRender). We thank the authors for sharing their wonderful code. In the training process, we also use the models from [Deep3DFaceReconstruction](https://github.com/microsoft/Deep3DFaceReconstruction) and [Wav2lip](https://github.com/Rudrabha/Wav2Lip). We thank them for their wonderful work.

## 🥂 Related Works

- [StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN (ECCV 2022)](https://github.com/FeiiYin/StyleHEAT)
- [CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior (CVPR 2023)](https://github.com/Doubiiu/CodeTalker)
- [VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild (SIGGRAPH Asia 2022)](https://github.com/vinthony/video-retalking)
- [DPE: Disentanglement of Pose and Expression for General Video Portrait Editing (CVPR 2023)](https://github.com/Carlyx/DPE)
- [3D GAN Inversion with Facial Symmetry Prior (CVPR 2023)](https://github.com/FeiiYin/SPI/)
- [T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations (CVPR 2023)](https://github.com/Mael-zys/T2M-GPT)

## 📢 Disclaimer

This is not an official product of Tencent. This repository can only be used for personal/research/non-commercial purposes.
LOGO: color and font suggestion: [ChatGPT](ai.com); logo font: [Montserrat Alternates](https://fonts.google.com/specimen/Montserrat+Alternates?preview.text=SadTalker&preview.text_type=custom&query=mont). All the copyrighted demo images are from community users or generated by stable diffusion. Feel free to contact us if you feel uncomfortable.