
feat: Use SDPA as default attention for broader compatibility#149

Open
xander1421 wants to merge 1 commit into multimodal-art-projection:main from xander1421:sdpa-attention-fallback

Conversation

@xander1421

Summary

  • Replace flash_attention_2 with sdpa (Scaled Dot Product Attention) as the default attention implementation in inference/infer.py
  • Improves compatibility across environments at little to no performance cost, since SDPA dispatches to optimized kernels where available

Motivation

The current default requires flash-attn to be installed, which:

  • Has complex build requirements (CUDA, specific compilers)
  • Doesn't support Python 3.14+ yet
  • Fails with cryptic errors when not installed

SDPA (Scaled Dot Product Attention) is:

  • Native to PyTorch (no extra dependencies)
  • Compatible with Python 3.14+
  • Still highly optimized (uses cuDNN/Flash under the hood when available)
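The "native to PyTorch" point can be checked with a minimal, dependency-free sketch: torch.nn.functional.scaled_dot_product_attention ships with PyTorch itself and produces the same result as the textbook softmax(QKᵀ/√d)V computation (the shapes below are arbitrary illustration values):

```python
import math
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim) -- arbitrary example shapes
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)

# Native SDPA: no extra packages; PyTorch picks an optimized backend
# (flash / memory-efficient / math) based on hardware and dtype.
out = F.scaled_dot_product_attention(q, k, v)

# Reference: the same attention computation written out by hand.
scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
ref = scores.softmax(dim=-1) @ v

print(torch.allclose(out, ref, atol=1e-5))  # True
```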

Changes

  • Changed attn_implementation="flash_attention_2" to attn_implementation="sdpa" in both the stage1 and stage2 model-loading calls
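For illustration, a sketch of what the change looks like at a loading call site. The model class, checkpoint path, and other keyword arguments here are placeholders, not the actual code in inference/infer.py; only the attn_implementation kwarg (a real transformers.from_pretrained parameter) reflects this PR:

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical loading call; the real stage1/stage2 calls in
# inference/infer.py may use different classes and arguments.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/stage1-checkpoint",   # placeholder path
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",    # was: "flash_attention_2"
)
```

This fragment is configuration-only and requires downloading model weights, so it is shown as a sketch rather than a runnable example.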

Testing

Tested inference successfully with:

  • Python 3.14
  • PyTorch 2.7
  • transformers 5.0

Users who prefer flash_attention_2 can still modify this locally for potentially better performance on supported systems.

🤖 Generated with Claude Code

feat: Use SDPA as default attention for broader compatibility

Replace flash_attention_2 with sdpa (Scaled Dot Product Attention) as the
default attention implementation. This provides better compatibility across
different environments:

- Works on systems without flash-attn installed
- Compatible with Python 3.14+
- No additional dependencies required (uses PyTorch native SDPA)

Users who have flash-attn installed can still modify this to use
flash_attention_2 for potentially better performance.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
