[Guide] How to make it work in (Fedora 42) Linux #1005

thedarkbird · 2025-08-17T11:35:23Z


This repo does not seem to be very actively maintained anymore, so I had a pretty hard time getting it working on Linux. (Surprisingly, I got it working quite easily on Windows, including the built-in webui; not so on Linux.) So let's go...

The repo's instructions say to install Python 3.8, but that causes dependency issues, so we'll use 3.10 (learned by trial and error):
conda create -n sadtalker python=3.10 -y
conda activate sadtalker

Clone the repo
git clone https://github.com/OpenTalker/SadTalker
cd SadTalker

Make the shell script that downloads the models executable
chmod +x ./scripts/download_models.sh

Download models
./scripts/download_models.sh

Install the requirements with uv+pip. In my (modest) experience with uv, it's faster and better at resolving package versions than plain pip. If you don't have uv, install it with a regular 'sudo dnf install uv' (or the apt equivalent on Debian-based distros, if it is packaged there)
uv pip install -r requirements.txt

Install ipykernel to be able to run stuff as a jupyter notebook in VSCode
uv pip install ipykernel

Then install ffmpeg via conda:
conda install ffmpeg -y
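
Before moving on, it can help to confirm that the active environment really is Python 3.10 and that ffmpeg is on the PATH. This is just a sanity-check sketch using the standard library; the function name check_environment is mine, not something from SadTalker:

```python
import shutil
import sys


def check_environment(required=(3, 10)):
    """Return a dict describing whether the active environment matches the guide."""
    return {
        "python_ok": sys.version_info[:2] == required,
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        # shutil.which returns None when the executable is not on PATH
        "ffmpeg_path": shutil.which("ffmpeg"),
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it with the sadtalker environment activated; if python_ok is False or ffmpeg_path is None, the environment is probably not the one created above.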

Now there is one dependency that won't work, but instead of downgrading packages we'll just change the code; it's dirty, but quick and avoids a headache. Open this file in a text editor (the path assumes a default miniconda3 install, so adjust it if your conda lives elsewhere):
~/miniconda3/envs/sadtalker/lib/python3.10/site-packages/basicsr/data/degradations.py

Find:
from torchvision.transforms.functional_tensor import rgb_to_grayscale

Replace with:
from torchvision.transforms import functional as F
rgb_to_grayscale = F.rgb_to_grayscale
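
If you'd rather not edit the file by hand, the same find-and-replace can be scripted. This is a rough sketch, not something from the SadTalker or basicsr repos: patch_source and patch_file are names I made up, and the target path is the assumed miniconda3 location from above. It keeps a .bak copy next to the original.

```python
from pathlib import Path

OLD_IMPORT = "from torchvision.transforms.functional_tensor import rgb_to_grayscale"
NEW_IMPORT = (
    "from torchvision.transforms import functional as F\n"
    "rgb_to_grayscale = F.rgb_to_grayscale"
)


def patch_source(text):
    """Swap the broken import for the replacement; leave other text untouched."""
    return text.replace(OLD_IMPORT, NEW_IMPORT)


def patch_file(path):
    """Patch the file in place, keeping a .bak copy. Returns True if changed."""
    path = Path(path)
    original = path.read_text()
    patched = patch_source(original)
    if patched == original:
        return False
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(patched)
    return True


if __name__ == "__main__":
    # Assumed location, taken from the path above; adjust for your conda install.
    target = (Path.home() / "miniconda3/envs/sadtalker/lib/python3.10"
              / "site-packages/basicsr/data/degradations.py")
    if target.exists():
        print("patched" if patch_file(target) else "nothing to patch")
    else:
        print(f"not found: {target}")
```

Running it a second time does nothing, since the old import line is no longer in the file.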

Now we have a working SadTalker install.

I haven't bothered to get the built-in webui running as it is a sure way to get into another dependency mess. So let's skip that and make it work with a very basic VSCode Jupyter Notebook, or just a regular .py file.

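Make a new project directory where you will store your image, audio and Jupyter notebook or .py file. Launch VSCode, open the directory, and make a new Jupyter Notebook with the following code. (The line starting with '!' is notebook shell syntax; in a plain .py file you need to launch the same command another way, e.g. with subprocess.)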

from pathlib import Path
app_path = Path('/path/to/SadTalker')
img = 'yourimage.jpg'
audio = 'youraudio.wav'
!python3.10 {app_path}/inference.py \
            --driven_audio {audio} \
            --source_image {img} \
            --result_dir ./results \
            --checkpoint_dir {app_path}/checkpoints \
            --preprocess crop \
            --still \
            --enhancer gfpgan \
            --size 512

Stating the obvious here, but don't forget to change the path and image/audio filenames to YOUR particular case :)
Check the different CLI options in the official docs in the repo.
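
For the plain .py route, the notebook cell above could be translated roughly like this. It's a sketch under a couple of assumptions: the script is run from inside the activated sadtalker environment (so sys.executable is that environment's Python), and build_command is just a helper name I picked. The flags are the same ones as in the notebook cell.

```python
import subprocess
import sys
from pathlib import Path


def build_command(app_path, img, audio, result_dir="./results"):
    """Build the same inference.py call as the notebook cell, as an argument list."""
    app_path = Path(app_path)
    return [
        sys.executable,  # assumes this script runs inside the sadtalker environment
        str(app_path / "inference.py"),
        "--driven_audio", str(audio),
        "--source_image", str(img),
        "--result_dir", str(result_dir),
        "--checkpoint_dir", str(app_path / "checkpoints"),
        "--preprocess", "crop",
        "--still",
        "--enhancer", "gfpgan",
        "--size", "512",
    ]


if __name__ == "__main__":
    app_path = Path("/path/to/SadTalker")  # placeholder, as in the notebook cell
    cmd = build_command(app_path, "yourimage.jpg", "youraudio.wav")
    if (app_path / "inference.py").exists():
        subprocess.run(cmd, check=True)
    else:
        print("Set app_path first. Would run:", " ".join(cmd))
```

Passing the arguments as a list avoids shell quoting trouble when filenames contain spaces.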

I used SadTalker in combination with Tortoise-TTS and PyTubeFix to make a video that wishes someone a happy birthday:

Get a conda environment running with tortoise-tts (quite good text-to-speech module)
Install pytubefix to be able to download YT-video/audio
Fetch a YT-video containing the voice you want to clone (or you might be able to fetch an MP3/WAV with the voice you want)
Extract the audio and convert it to a mono, 22050 Hz, 16-bit PCM WAV (easy to do with Audacity, or manually via ffmpeg CLI)
Use Audacity to cut and extract a few audio clips (minimum 3) of 6 to 30 seconds each, containing clear speech (no background noise or other voices)
Use tortoise-tts' load_voice() function to load the audio clips you extracted; this conditions the model on that voice rather than really training it, which basically gives you voice cloning
Use tortoise-tts to generate speech from a typed text and output to WAV
Then switch to the SadTalker environment and use this generated WAV as input
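
For the audio conversion step in the list above, here is a rough sketch of the ffmpeg route plus a check that the result really is mono, 22050 Hz, 16-bit PCM. The filenames and function names are made up for illustration; the ffmpeg flags (-ac, -ar, -c:a pcm_s16le) are standard ones, and the check uses Python's built-in wave module.

```python
import os
import shutil
import subprocess
import wave


def ffmpeg_convert_command(src, dst):
    """ffmpeg arguments for mono (-ac 1), 22050 Hz (-ar), 16-bit PCM (pcm_s16le)."""
    return ["ffmpeg", "-y", "-i", str(src),
            "-ac", "1", "-ar", "22050", "-c:a", "pcm_s16le", str(dst)]


def is_target_format(wav_path):
    """True if the WAV is mono, 22050 Hz, 16-bit (2 bytes per sample)."""
    with wave.open(str(wav_path), "rb") as wav:
        found = (wav.getnchannels(), wav.getframerate(), wav.getsampwidth())
    return found == (1, 22050, 2)


if __name__ == "__main__":
    # Hypothetical filenames; replace with your own.
    src, dst = "downloaded_audio.m4a", "voice_source.wav"
    if shutil.which("ffmpeg") and os.path.exists(src):
        subprocess.run(ffmpeg_convert_command(src, dst), check=True)
        print("format ok:", is_target_format(dst))
    else:
        print("Would run:", " ".join(ffmpeg_convert_command(src, dst)))
```

The same check is handy on the clips you cut in Audacity, since it's easy to export them with the wrong sample rate by accident.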

Worked out quite well for me, but it's a bit involved to get it all up and running. ChatGPT helps :)
