3 changes: 2 additions & 1 deletion .gitignore
@@ -15,4 +15,5 @@ ffmprobe*
ffplay*
debug
exp_out
.gradio
.gradio
.claude/
118 changes: 114 additions & 4 deletions README.md
@@ -222,25 +222,28 @@ You can also download the weights manually from the following links:
1. Download our trained [weights](https://huggingface.co/TMElyralab/MuseTalk/tree/main)
2. Download the weights of other components:
- [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse/tree/main)
- [whisper](https://huggingface.co/openai/whisper-tiny/tree/main)
- [whisper](https://huggingface.co/openai/whisper-tiny/tree/main) (make sure to include `preprocessor_config.json`)
- [dwpose](https://huggingface.co/yzd-v/DWPose/tree/main)
- [syncnet](https://huggingface.co/ByteDance/LatentSync/tree/main)
- [face-parse-bisent](https://drive.google.com/file/d/154JgKpzCPW82qINcVieuPH3fZ2e0P812/view?pli=1)
- [resnet18](https://download.pytorch.org/models/resnet18-5c106cde.pth)
- [s3fd](https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth) (rename to `s3fd.pth` and place in `models/s3fd/`)

Finally, these weights should be organized in `models` as follows:
```
./models/
├── musetalk
│ ├── musetalk.json
│ └── pytorch_model.bin
├── musetalkV15
│ ├── musetalk.json
│ └── unet.pth
├── syncnet
│ └── latentsync_syncnet.pt
├── dwpose
│ └── dw-ll_ucoco_384.pth
├── s3fd
│ └── s3fd.pth
├── face-parse-bisent
│ ├── 79999_iter.pth
│ └── resnet18-5c106cde.pth
@@ -251,8 +254,13 @@ Finally, these weights should be organized in `models` as follows:
├── config.json
├── pytorch_model.bin
└── preprocessor_config.json

```

> **Note:**
> - `s3fd/s3fd.pth` is the SFD face detection model weight. Download it from [s3fd-619a316812.pth](https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth) and rename it to `s3fd.pth`.
> - `whisper/preprocessor_config.json` is required by `AutoFeatureExtractor`. If it is missing from the downloaded whisper weights, you can download it from [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny/blob/main/preprocessor_config.json).
> - If you encounter `ImportError: cannot import name 'split_torch_state_dict_into_shards' from 'huggingface_hub'`, upgrade `huggingface_hub` to fix the version incompatibility: `pip install --upgrade huggingface_hub`.
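
Before running inference, it can help to confirm that the files listed above are in place. The following is a minimal sketch and not part of the repository: `find_missing_weights` and `EXPECTED_FILES` are hypothetical names, and the list only covers files that are visible in the tree and note above (entries in the collapsed part of the tree are left out).

```python
from pathlib import Path

# Paths relative to ./models, copied from the tree and note above.
# Files hidden in the collapsed part of the tree are intentionally omitted.
EXPECTED_FILES = [
    "musetalk/musetalk.json",
    "musetalk/pytorch_model.bin",
    "musetalkV15/musetalk.json",
    "musetalkV15/unet.pth",
    "syncnet/latentsync_syncnet.pt",
    "dwpose/dw-ll_ucoco_384.pth",
    "s3fd/s3fd.pth",
    "face-parse-bisent/79999_iter.pth",
    "face-parse-bisent/resnet18-5c106cde.pth",
    "whisper/preprocessor_config.json",
]


def find_missing_weights(models_dir="./models", expected=EXPECTED_FILES):
    """Return the expected files that do not exist under models_dir."""
    root = Path(models_dir)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = find_missing_weights()
    if missing:
        print("Missing weight files:")
        for rel in missing:
            print(f"  models/{rel}")
    else:
        print("All listed weight files are present.")
```

Run it from the project root so that `./models` resolves to the directory shown in the tree.
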
## Quickstart

### Inference
@@ -512,6 +520,108 @@ python -m scripts.inference --inference_config configs/inference/test.yaml --bbo

As a complete solution to virtual human generation, we suggest first applying [MuseV](https://github.com/TMElyralab/MuseV) to generate a video (text-to-video, image-to-video or pose-to-video) by following [this guide](https://github.com/TMElyralab/MuseV?tab=readme-ov-file#text2video). Frame interpolation is suggested to increase the frame rate. Then you can use `MuseTalk` to generate a lip-sync video by following [these instructions](https://github.com/TMElyralab/MuseTalk?tab=readme-ov-file#inference).

# Troubleshooting

## New GPU Architecture (Blackwell sm_120)

If you are using an **NVIDIA RTX PRO 6000 Blackwell** or another new GPU with the `sm_120` architecture, you will encounter:

```
CUDA error: no kernel image is available for execution on the device
The current PyTorch install supports CUDA capabilities sm_37 sm_50 ... sm_90.
```

This is because the default PyTorch 2.0.1 does not support the Blackwell architecture. Follow these steps:

### Step 1: Upgrade PyTorch

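Install **PyTorch 2.6+ with CUDA 12.6/12.8**, which supports `sm_120`. The CUDA 12.8 build is the tested combination; if the CUDA 12.6 build still reports the error above, switch to `cu128`: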

```bash
# For CUDA 12.8 (tested with PyTorch 2.9.0)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

# Or for CUDA 12.6
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
```
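
After upgrading, a quick check can confirm that the installed build actually contains kernels for your GPU. This is a minimal sketch: `is_capability_supported` and `report_gpu_support` are hypothetical helper names, and the check only looks for native `sm_XY` kernels, so it ignores any PTX forward compatibility a build might offer.

```python
def is_capability_supported(capability, arch_list):
    """Return True if arch_list has native kernels for the (major, minor) capability."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def report_gpu_support():
    """Print the installed PyTorch build and whether it targets GPU 0."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
        return
    if not torch.cuda.is_available():
        print("CUDA is not available to PyTorch.")
        return
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
    print("device capability:", capability)
    print("compiled architectures:", arch_list)
    print("native kernels available:", is_capability_supported(capability, arch_list))


if __name__ == "__main__":
    report_gpu_support()
```

On a Blackwell card the capability is reported as `(12, 0)`, so the build needs `sm_120` in its architecture list.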

### Step 2: Reinstall mmcv for the New CUDA Version

After upgrading PyTorch, the existing mmcv (compiled for older CUDA) will fail with:

```
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory
```

**Important version constraint:** `mmdet 3.1.0` and `mmpose 1.1.0` require **mmcv 2.0.1 ~ 2.1.x**. Do **not** use mmcv 2.2.0+: both packages run strict version checks at import time and will raise `AssertionError`.

If `mim install mmcv` finds a pre-built wheel for your CUDA/PyTorch combo:
```bash
pip uninstall mmcv -y
mim install "mmcv==2.1.0"
```

If there is **no pre-built wheel** (common for new CUDA/PyTorch combinations), install from source:
```bash
pip install setuptools
pip install mmcv==2.1.0 --no-binary mmcv --no-build-isolation
```

The `--no-build-isolation` flag is required so that pip uses the `setuptools` already in your environment instead of creating an isolated build environment (which may lack `pkg_resources`).

### Step 3: Verify Installation

```bash
python -c "import mmcv; import mmdet; import mmpose; print('mmcv:', mmcv.__version__); print('mmdet:', mmdet.__version__); print('mmpose:', mmpose.__version__)"
```

Expected output:
```
mmcv: 2.1.0
mmdet: 3.1.0
mmpose: 1.1.0
```
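
The version constraint from Step 2 can also be checked programmatically. The sketch below is an illustration only: `parse_version` and `mmcv_in_quoted_range` are hypothetical helpers, and the bounds are copied from the constraint quoted in this section rather than read from mmdet or mmpose.

```python
import re


def parse_version(version):
    """Turn '2.1.0' (or '2.0.0rc4') into a tuple of its first three integers."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def mmcv_in_quoted_range(version, low="2.0.1", high_exclusive="2.2.0"):
    """Check low <= version < high_exclusive using the bounds quoted in this README."""
    return parse_version(low) <= parse_version(version) < parse_version(high_exclusive)


for candidate in ("2.0.1", "2.1.0", "2.2.0"):
    print(candidate, mmcv_in_quoted_range(candidate))
```

To check the installed package, pass `mmcv.__version__` to `mmcv_in_quoted_range`.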

## PyTorch 2.6+ Checkpoint Loading Error

If you see `_pickle.UnpicklingError: Weights only load failed` when loading the dwpose checkpoint, with a message such as:

```
WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray._reconstruct
```

This happens because PyTorch 2.6+ changed the default of `torch.load` to `weights_only=True`. The code has been patched to handle this automatically; if you still encounter the error, make sure you have the latest version of `musetalk/utils/preprocessing.py`.
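
The patch amounts to wrapping `torch.load` so that `weights_only=False` becomes the default unless the caller passes the argument explicitly. The sketch below shows that wrapping pattern with a stand-in function so it runs without PyTorch; `with_default_kwarg` and `fake_load` are hypothetical names used only for illustration.

```python
import functools


def with_default_kwarg(func, name, value):
    """Wrap func so that keyword `name` defaults to `value` when the caller omits it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        kwargs.setdefault(name, value)
        return func(*args, **kwargs)
    return wrapper


def fake_load(path, weights_only=True):
    """Stand-in for torch.load; it only echoes the arguments it received."""
    return {"path": path, "weights_only": weights_only}


patched_load = with_default_kwarg(fake_load, "weights_only", False)

print(patched_load("dw-ll_ucoco_384.pth"))
# {'path': 'dw-ll_ucoco_384.pth', 'weights_only': False}
print(patched_load("dw-ll_ucoco_384.pth", weights_only=True))
# {'path': 'dw-ll_ucoco_384.pth', 'weights_only': True}
```

Keep in mind that loading with `weights_only=False` unpickles arbitrary objects, so it should only be used with checkpoints from a source you trust.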

## FFmpeg Missing PNG Encoder

```
[vost#0:0] Automatic encoder selection failed Default encoder for format image2 (codec png) is probably disabled.
Error opening output files: Encoder not found
```

This usually happens when FFmpeg is compiled without certain encoders (e.g., Windows static builds with `--disable-x86asm`).

**Solution 1**: Use system FFmpeg:
```bash
sudo apt-get install ffmpeg # Ubuntu/Debian
```

**Solution 2**: Specify FFmpeg path when running inference:
```bash
python -m scripts.inference --ffmpeg_path /usr/bin ...
```

**Solution 3**: Add system FFmpeg to PATH with higher priority:
```bash
export PATH=/usr/bin:$PATH
```

**Verify** that your FFmpeg has the PNG encoder:
```bash
ffmpeg -encoders | grep png
# Should show: VF...D png PNG (Portable Network Graphics) image
```
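
The same check can be done from Python, which is convenient when the inference script runs in an environment whose `PATH` differs from your shell. This is a minimal sketch with hypothetical helper names (`has_encoder`, `ffmpeg_has_encoder`); it assumes the usual `ffmpeg -encoders` layout in which the encoder name is the second column.

```python
import shutil
import subprocess


def has_encoder(encoders_output, name):
    """Check the text printed by `ffmpeg -encoders` for an encoder called `name`."""
    for line in encoders_output.splitlines():
        fields = line.split()
        # Encoder rows look like: "VF...D png  PNG (Portable Network Graphics) image"
        if len(fields) >= 2 and fields[1] == name:
            return True
    return False


def ffmpeg_has_encoder(name="png", ffmpeg="ffmpeg"):
    """Run ffmpeg and report whether the encoder is listed; None if ffmpeg is not found."""
    if shutil.which(ffmpeg) is None:
        return None
    result = subprocess.run(
        [ffmpeg, "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=False,
    )
    return has_encoder(result.stdout, name)


if __name__ == "__main__":
    print("PNG encoder available:", ffmpeg_has_encoder("png"))
```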

# Acknowledgement
1. We thank open-source components like [whisper](https://github.com/openai/whisper), [dwpose](https://github.com/IDEA-Research/DWPose), [face-alignment](https://github.com/1adrianb/face-alignment), [face-parsing](https://github.com/zllrunning/face-parsing.PyTorch), [S3FD](https://github.com/yxlijun/S3FD.pytorch) and [LatentSync](https://huggingface.co/ByteDance/LatentSync/tree/main).
1. MuseTalk has referred much to [diffusers](https://github.com/huggingface/diffusers) and [isaacOnline/whisper](https://github.com/isaacOnline/whisper/tree/extract-embeddings).
69 changes: 69 additions & 0 deletions create_symlinks.sh
@@ -0,0 +1,69 @@
#!/bin/bash

# Source directory
SOURCE_DIR="/mnt/f/Models_Download/MuseTalk"

# Target directory (the project's models directory)
TARGET_DIR="/home/smy/LM_projects/agentscope/MuseTalk/models"

echo "========================================="
echo "Creating MuseTalk model symlinks"
echo "========================================="
echo "Source directory: $SOURCE_DIR"
echo "Target directory: $TARGET_DIR"
echo "========================================="

# Check that the source directory exists
if [ ! -d "$SOURCE_DIR" ]; then
    echo "Error: source directory does not exist: $SOURCE_DIR"
    exit 1
fi

# Create the target directory if it does not exist
if [ ! -d "$TARGET_DIR" ]; then
    echo "Creating target directory: $TARGET_DIR"
    mkdir -p "$TARGET_DIR"
fi

# Iterate over every folder in the source directory
count=0
for folder in "$SOURCE_DIR"/*/; do
    # Check that it is a valid directory
    if [ -d "$folder" ]; then
        # Get the folder name
        folder_name=$(basename "$folder")

        # Skip hidden folders (names starting with .)
        if [[ "$folder_name" == .* ]]; then
            echo "Skipping hidden folder: $folder_name"
            continue
        fi

        # Check whether the target link already exists
        if [ -e "$TARGET_DIR/$folder_name" ] || [ -L "$TARGET_DIR/$folder_name" ]; then
            echo "Warning: target already exists, skipping: $folder_name"
            continue
        fi

        # Resolve the absolute path
        source_absolute_path=$(cd "$folder" && pwd)

        # Create the symlink
        ln -s "$source_absolute_path" "$TARGET_DIR/$folder_name"
        if [ $? -eq 0 ]; then
            echo "✓ Created symlink: $folder_name -> $source_absolute_path"
            count=$((count + 1))
        else
            echo "✗ Failed to create symlink: $folder_name"
        fi
    fi
done

echo "========================================="
echo "Done! Created $count symlinks"
echo "========================================="

# Show the created symlinks
echo ""
echo "Contents of the target directory:"
ls -la "$TARGET_DIR"
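
The script hard-codes both directories, so it only works on the author's machine as written. For comparison, here is a Python sketch of the same loop that takes the paths as arguments. `link_model_folders` is a hypothetical function, not part of this PR, and it differs from the script in that it resolves the source directory to a real path first.

```python
from pathlib import Path


def link_model_folders(source_dir, target_dir):
    """Symlink every non-hidden subfolder of source_dir into target_dir.

    Returns the names of the links that were created.
    """
    source = Path(source_dir).resolve()
    target = Path(target_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"source directory does not exist: {source}")
    target.mkdir(parents=True, exist_ok=True)

    created = []
    for folder in sorted(source.iterdir()):
        # Only folders are linked; hidden folders are skipped.
        if not folder.is_dir() or folder.name.startswith("."):
            continue
        link = target / folder.name
        # Skip anything already present, including dangling symlinks.
        if link.exists() or link.is_symlink():
            continue
        link.symlink_to(folder, target_is_directory=True)
        created.append(folder.name)
    return created
```

A call such as `link_model_folders("/path/to/downloads", "./models")` returns the list of folder names that were linked, and a second call returns an empty list because existing targets are skipped.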
Empty file modified download_weights.sh
100644 → 100755
Empty file.
Empty file modified entrypoint.sh
100644 → 100755
Empty file.
Empty file modified inference.sh
100644 → 100755
Empty file.
5 changes: 4 additions & 1 deletion musetalk/utils/face_detection/detection/sfd/sfd_detector.py
@@ -8,13 +8,16 @@
from .bbox import *
from .detect import *

_PROJECT_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..', '..', '..', '..'))
_SFD_MODEL_PATH = os.path.join(_PROJECT_ROOT, 'models', 's3fd', 's3fd.pth')

models_urls = {
's3fd': 'https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth',
}


class SFDDetector(FaceDetector):
    def __init__(self, device, path_to_detector=os.path.join(os.path.dirname(os.path.abspath(__file__)), 's3fd.pth'), verbose=False):
    def __init__(self, device, path_to_detector=_SFD_MODEL_PATH, verbose=False):
        super(SFDDetector, self).__init__(device, verbose)

        # Initialise the face detector
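
The new default walks five directories up from the `sfd` package to reach the project root. The following sketch works through that arithmetic for a hypothetical checkout at `/repo` (an assumed location chosen only for illustration); it uses `posixpath` so the result is the same on any platform.

```python
import posixpath

# Hypothetical checkout location, used only to illustrate the path arithmetic.
module_file = "/repo/musetalk/utils/face_detection/detection/sfd/sfd_detector.py"

module_dir = posixpath.dirname(module_file)
# Five levels up: sfd -> detection -> face_detection -> utils -> musetalk -> project root
project_root = posixpath.normpath(
    posixpath.join(module_dir, "..", "..", "..", "..", "..")
)
model_path = posixpath.join(project_root, "models", "s3fd", "s3fd.pth")

print(project_root)  # /repo
print(model_path)    # /repo/models/s3fd/s3fd.pth
```

This resolution assumes the package is imported from a source checkout; if `musetalk` were installed elsewhere, the computed root would not contain `models/`.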
11 changes: 11 additions & 0 deletions musetalk/utils/preprocessing.py
@@ -12,6 +12,17 @@
import torch
from tqdm import tqdm

# Monkey-patch mmengine's torch.load to disable weights_only for PyTorch 2.6+
import mmengine.runner.checkpoint as _ckpt_module
_orig_torch_load = torch.load
def _patched_torch_load(*args, **kwargs):
    if 'weights_only' not in kwargs:
        kwargs['weights_only'] = False
    return _orig_torch_load(*args, **kwargs)
# Patch at module level so mmengine uses it
torch.load = _patched_torch_load
_ckpt_module.torch.load = _patched_torch_load

# initialize the mmpose model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
config_file = './musetalk/utils/dwpose/rtmpose-l_8xb32-270e_coco-ubody-wholebody-384x288.py'
10 changes: 7 additions & 3 deletions scripts/inference.py
@@ -29,8 +29,12 @@ def fast_check_ffmpeg():

@torch.no_grad()
def main(args):
    # Configure ffmpeg path
    if not fast_check_ffmpeg():
    # Configure ffmpeg path - always use provided ffmpeg_path if specified
    if args.ffmpeg_path:
        print(f"Using ffmpeg from: {args.ffmpeg_path}")
        path_separator = ';' if sys.platform == 'win32' else ':'
        os.environ["PATH"] = f"{args.ffmpeg_path}{path_separator}{os.environ['PATH']}"
    elif not fast_check_ffmpeg():
        print("Adding ffmpeg to PATH")
        # Choose path separator based on operating system
        path_separator = ';' if sys.platform == 'win32' else ':'
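
The separator logic appears in both branches of this hunk. As a possible simplification, `os.pathsep` already holds the platform's `PATH` separator, so the prepend step could live in one helper. This is a sketch only: `prepend_to_path` is a hypothetical function and is not part of this PR.

```python
import os


def prepend_to_path(directory, env=None):
    """Put `directory` at the front of PATH in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    current = env.get("PATH", "")
    # os.pathsep is ';' on Windows and ':' on POSIX systems.
    env["PATH"] = f"{directory}{os.pathsep}{current}" if current else directory
    return env["PATH"]


# Demonstrated on a plain dict so the real environment is left untouched.
demo_env = {"PATH": "/usr/local/bin"}
print(prepend_to_path("/usr/bin", demo_env))
```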
@@ -250,7 +254,7 @@ def main(args):

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--ffmpeg_path", type=str, default="./ffmpeg-4.4-amd64-static/", help="Path to ffmpeg executable")
    parser.add_argument("--ffmpeg_path", type=str, default="/usr/bin", help="Path to ffmpeg executable")
    parser.add_argument("--gpu_id", type=int, default=0, help="GPU ID to use")
    parser.add_argument("--vae_type", type=str, default="sd-vae", help="Type of VAE model")
    parser.add_argument("--unet_config", type=str, default="./models/musetalk/config.json", help="Path to UNet configuration file")
Empty file modified train.sh
100644 → 100755
Empty file.