
Add Windows + RTX 5090 (Blackwell) support#70

Open
HurtzDonutStudios wants to merge 1 commit into VAST-AI-Research:main from HurtzDonutStudios:windows-rtx5090-support

Conversation

@HurtzDonutStudios

Summary

  • Enables UniRig to run end-to-end on Windows with modern NVIDIA GPUs (tested RTX 5090 + CUDA 12.8 + PyTorch 2.11)
  • Adds flash_attn SDPA shim so no CUDA flash-attention compilation is needed
  • Fixes spconv/cumm DLL loading, checkpoint unpickling, torch_scatter/torch_cluster CPU fallbacks, and shell script cross-platform issues

Changes

| Fix | File(s) |
| --- | --- |
| CUDA DLL path for spconv/cumm | run.py |
| PyTorch 2.11 weights_only=False | run.py |
| torch_scatter.segment_csr CPU fallback | run.py |
| fps() CPU fallback | sal_perceiver.py |
| flash_attn SDPA shim (PyTorch native) | flash_attn_shim/, install_flash_attn_shim.py |
| enable_flash: False for PTv3 | unirig_skin.yaml |
| eager attention for OPT LLM | unirig_ar_350m_*.yaml |
| NPZ export in user_mode | ar.py |
| Venv auto-activation + backslash fix | launch/inference/*.sh |
| Better error logging | skin.py |
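For context on the first fix: since Python 3.8, extension modules on Windows no longer resolve dependent DLLs via `PATH`, which is why spconv/cumm fail to find the CUDA runtime. A minimal sketch of the kind of guard run.py needs (the `CUDA_PATH` environment variable and `bin` subdirectory are the usual CUDA Toolkit conventions, not verified against this PR):

```python
import os
import sys

def register_cuda_dll_dirs(extra_dirs=()):
    """Make CUDA runtime DLLs findable for spconv/cumm on Windows.

    Since Python 3.8, Windows extension modules no longer search PATH
    for dependent DLLs; directories must be registered explicitly via
    os.add_dll_directory(). No-op on other platforms.
    """
    if sys.platform != "win32":
        return []
    handles = []
    cuda_bin = os.path.join(os.environ.get("CUDA_PATH", ""), "bin")
    for d in (cuda_bin, *extra_dirs):
        if d and os.path.isdir(d):
            handles.append(os.add_dll_directory(d))
    return handles  # keep handles alive for the life of the process
```

Calling this before `import spconv` lets the extension's loader locate cudart and friends without modifying the system `PATH`.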

Test Environment

  • Windows 11 Pro, RTX 5090, CUDA 12.8, PyTorch 2.11+cu128, spconv-cu124 v2.3.8
  • Tested on 884K vertex / 1.5M face mesh — full pipeline in ~46 seconds
  • All 4 stages verified: extract → skeleton → skin → merge

Test plan

  • Extract mesh (bpy) — produces raw_data.npz
  • Skeleton inference — produces skeleton FBX + predict_skeleton.npz
  • Skinning inference — produces skin FBX
  • Merge — produces final rigged GLB
  • Verify on Linux (changes are additive/guarded, should not affect existing Linux paths)

🤖 Generated with Claude Code

Enables UniRig to run end-to-end on Windows with modern NVIDIA GPUs
(tested RTX 5090 + CUDA 12.8 + PyTorch 2.11).

Key changes:
- CUDA DLL path registration for spconv/cumm on Windows
- PyTorch 2.11 weights_only=False patch for checkpoint loading
- torch_scatter/torch_cluster CPU fallback for missing CUDA kernels
- flash_attn SDPA shim (PyTorch native attention, no CUDA compilation needed)
- PTv3 non-flash attention path (enable_flash: False)
- Shell script fixes: venv auto-activation, backslash path conversion
- NPZ export fix for user_mode pipeline (skeleton → skin data chain)
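The torch_scatter CPU fallback reduces rows of a tensor within CSR-style segment boundaries. The real `torch_scatter.segment_csr` kernel is vectorized; this loop-based sketch only mirrors its semantics and is not the patch itself:

```python
import torch

def segment_csr_cpu(src, indptr, reduce="sum"):
    """Pure-PyTorch fallback for torch_scatter.segment_csr.

    Segment i covers src[indptr[i]:indptr[i+1]]; each segment is
    reduced along dim 0 with the requested reduction.
    """
    ops = {
        "sum": torch.sum,
        "mean": torch.mean,
        "max": lambda t, dim: t.max(dim).values,
        "min": lambda t, dim: t.min(dim).values,
    }
    op = ops[reduce]
    segments = [op(src[s:e], dim=0)
                for s, e in zip(indptr[:-1].tolist(), indptr[1:].tolist())]
    return torch.stack(segments, dim=0)
```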

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
