TritonParse v0.4.3 Release 🎉

FindHao released this 08 Apr 23:24
  • Date range: 2026-04-01 to 2026-04-08
  • Scope: Bug-fix release covering an OVERRIDE_TTIR reproducer rewrite with stub kernel generation, a num_warps fix for warp-specialized kernel reproducers, Manifold upload scoped to fbcode MAST environments, an OSS atexit cleanup fix, and _json_compat extensions.

Highlights

  • 🔧 OVERRIDE_TTIR Reproducer Rewrite (#376): Complete rewrite of the OVERRIDE_TTIR reproducer mode. The previous implementation was broken: it skipped defining the kernel function (causing a NameError), only worked for autotuned kernels, and discarded constexpr values. The new approach generates a stub triton.jit function (same name and parameters, with a pass body) wrapped in triton.autotune, whose config carries the captured constexpr values, compile params, and an ir_override pointing to the captured TTGIR. The reproducer therefore no longer needs to copy the kernel source code and its transitive dependencies.

  • 🐛 Warp-Specialized Kernel Reproducer Fix: Fixed a ptxas "Insufficient registers" failure when reproducing warp-specialized kernels. The Triton compiler overwrites metadata["num_warps"] with the post-expansion count (ttg.total-num-warps), so the reproducer inflated the warp count a second time. The fix extracts the original ttg.num-warps from the TTGIR module attributes instead.

  • 🔒 Manifold Upload Scoping: Manifold upload is now enabled by default only in fbcode MAST environments (detected via torch.version.git_version and the MAST_HPC_JOB_NAME environment variable), which prevents a ModuleNotFoundError during atexit cleanup in OSS environments.
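
The warp-specialized fix described above can be sketched as follows. This is an illustrative stand-in, not TritonParse's code: the helper names extract_warp_counts and _find_int are hypothetical, the regexes assume the TTGIR module header prints the counts as quoted MLIR attributes ("ttg.num-warps" = 4 : i32), and the sample header values are made up for the example.

```python
import re
from typing import Optional

# Hypothetical helper sketch. Assumes the TTGIR module header prints warp
# counts as quoted MLIR attributes, e.g. "ttg.num-warps" = 4 : i32.
_NUM_WARPS_RE = re.compile(r'"ttg\.num-warps"\s*=\s*(\d+)\s*:\s*i32')
_TOTAL_NUM_WARPS_RE = re.compile(r'"ttg\.total-num-warps"\s*=\s*(\d+)\s*:\s*i32')


def _find_int(pattern: re.Pattern, text: str) -> Optional[int]:
    match = pattern.search(text)
    return int(match.group(1)) if match else None


def extract_warp_counts(ttgir: str, metadata: dict) -> dict:
    """Pick the warp count a reproducer should launch with.

    metadata["num_warps"] may already hold the post-expansion total for
    warp-specialized kernels, so it is only a fallback when TTGIR lacks
    the attribute.
    """
    num_warps = _find_int(_NUM_WARPS_RE, ttgir)
    total = _find_int(_TOTAL_NUM_WARPS_RE, ttgir)
    fallback = metadata.get("num_warps")
    return {
        "num_warps": num_warps if num_warps is not None else fallback,
        # Informational only; not passed back to the compiler.
        "total_num_warps": total if total is not None else fallback,
    }


# Illustrative header with made-up values (4 warps expanded to 12).
SAMPLE_TTGIR_HEADER = (
    'module attributes {"ttg.num-ctas" = 1 : i32, "ttg.num-warps" = 4 : i32, '
    '"ttg.threads-per-warp" = 32 : i32, "ttg.total-num-warps" = 12 : i32} {'
)

print(extract_warp_counts(SAMPLE_TTGIR_HEADER, {"num_warps": 12}))
# {'num_warps': 4, 'total_num_warps': 12}
```

Reading the pre-expansion value from the IR, rather than from metadata the compiler has already rewritten, is what keeps a reproducer from requesting the expanded count a second time.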

Changes by Area

🔧 Reproducer Enhancements

  • OVERRIDE_TTIR Stub Generation (#376): The new stub_generator.py (~137 lines) generates stub Triton functions and extracts constexpr values. The rewritten _replace_kernel_import generates the stub plus autotune config for OVERRIDE_TTIR, and _replace_kernel_invocation filters out constexpr and compile params because autotune provides them. Captured IR files are saved from the compilation event's file_content to captured_irs/. extract_params_from_source is wrapped in lru_cache to avoid redundant AST parses.
  • Warp-Specialized num_warps Fix: At reproducer generation time, extracts original ttg.num-warps from TTGIR module attributes instead of the inflated metadata["num_warps"]. The post-expansion count is preserved as total_num_warps for informational purposes.
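
The stub-generation flow described above can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the contents of stub_generator.py: generate_stub, the return shape of extract_params_from_source, the captured_irs/add_kernel.ttgir path, and the exact autotune layout are all hypothetical, and it assumes triton.Config accepts an ir_override keyword, as the note implies. It only emits source text, so it runs without Triton installed (Python 3.9+ for ast.unparse).

```python
import ast
import functools
import textwrap
from typing import Dict, Tuple


@functools.lru_cache(maxsize=None)
def extract_params_from_source(source: str) -> Tuple[str, Tuple[str, ...], Tuple[str, ...]]:
    """Return (kernel name, annotated params, constexpr param names).

    Cached on the source string, so repeated calls skip re-parsing the AST.
    """
    tree = ast.parse(textwrap.dedent(source))
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            params, constexprs = [], []
            for arg in node.args.args:
                if arg.annotation is None:
                    params.append(arg.arg)
                    continue
                ann = ast.unparse(arg.annotation)  # e.g. "tl.constexpr"
                params.append(f"{arg.arg}: {ann}")
                if ann.endswith("constexpr"):
                    constexprs.append(arg.arg)
            return node.name, tuple(params), tuple(constexprs)
    raise ValueError("no function definition found in kernel source")


def generate_stub(source: str, captured_constexprs: Dict[str, object],
                  num_warps: int, num_stages: int, ttgir_path: str) -> str:
    """Emit a same-signature triton.jit stub wrapped in triton.autotune (hypothetical)."""
    name, params, constexprs = extract_params_from_source(source)
    # Only constexpr values travel in the Config; the launch site omits them.
    kwargs = {k: v for k, v in captured_constexprs.items() if k in constexprs}
    return (
        "import triton\n"
        "import triton.language as tl\n\n\n"
        "@triton.autotune(\n"
        f"    configs=[triton.Config({kwargs!r}, num_warps={num_warps}, "
        f"num_stages={num_stages}, ir_override={ttgir_path!r})],\n"
        "    key=[],\n"
        ")\n"
        "@triton.jit\n"
        f"def {name}({', '.join(params)}):\n"
        "    pass\n"
    )


KERNEL_SRC = """
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
"""

stub = generate_stub(KERNEL_SRC, {"BLOCK": 128}, num_warps=4, num_stages=3,
                     ttgir_path="captured_irs/add_kernel.ttgir")
print(stub)
```

Because the stub body is just pass and the compiled code comes from the captured IR, nothing from the original kernel body (or its helper imports) has to be copied into the reproducer.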

⚑ JSON Compatibility Layer

  • _json_compat.py Extensions: Added file-based convenience wrappers load(f) and dump(obj, f), which wrap file I/O around the existing loads()/dumps().
  • CUTracer Migration: All 14 CUTracer production Python files migrated from stdlib json to tritonparse._json_compat, gaining a 3-10x JSON speedup via orjson, with graceful degradation to stdlib json when orjson is unavailable.
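
The load/dump wrapper shape described above can be sketched as follows. This is a stand-in, not tritonparse._json_compat itself: it assumes loads()/dumps() use orjson when it is importable and stdlib json otherwise, that dumps() returns str, and that files are opened in text mode; the real module's options (indentation, key sorting, binary files) may differ.

```python
import io
import json
from typing import IO, Any

try:
    import orjson  # optional fast path
except ImportError:
    orjson = None  # degrade gracefully to the standard library


def loads(s: Any) -> Any:
    # Both orjson.loads and json.loads accept str or bytes input.
    return orjson.loads(s) if orjson is not None else json.loads(s)


def dumps(obj: Any) -> str:
    if orjson is not None:
        return orjson.dumps(obj).decode("utf-8")  # orjson returns bytes
    return json.dumps(obj)


def load(f: IO) -> Any:
    """File-based wrapper: read everything, then delegate to loads()."""
    return loads(f.read())


def dump(obj: Any, f: IO[str]) -> None:
    """File-based wrapper: serialize with dumps(), then write as text."""
    f.write(dumps(obj))


buf = io.StringIO()
dump({"kernel": "add_kernel", "num_warps": 4}, buf)
buf.seek(0)
print(load(buf))
```

Keeping the file wrappers as thin delegates means the orjson/stdlib choice lives in one place, so call sites that move from json.load(f) and json.dump(obj, f) pick up the fast path without further changes.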

πŸ”’ Manifold Upload & OSS Fixes

  • Scoped Default (#374, 3337a0c): TRITONPARSE_TRACE_MANIFOLD now defaults to "0" (OFF) and is only auto-enabled when running in fbcode and in a MAST environment. The env var override still works in all environments.
  • OSS atexit Fix (#374): Gated the Manifold upload path in _cleanup() behind is_fbcode() to prevent ModuleNotFoundError: No module named 'tritonparse.fb' during atexit in OSS environments.

🏗️ Infrastructure & CI

  • Packaging Workaround (#370): Added an explicit pip install packaging step in CI setup to work around PyTorch nightly builds (2.12.0.dev20260405 and later) that are missing their dependency on the packaging module.
  • Pin Node.js in CI: Pinned Node.js version in GitHub Actions CI workflows for reproducible builds.
  • Website Dependencies: Upgraded website dependencies and fixed Vite 8 / ESLint compatibility. Bumped vite from 8.0.3 to 8.0.5 (security fix).
  • Internal Repo Re-sync (#375): Cleaned up Claude Code configuration files that were incorrectly synced to the OSS repository.

Compatibility Notes

  • No breaking API changes: This is a bug-fix release with no API changes; the only behavior changes for existing users are the two noted below.
  • Manifold upload default changed: TRITONPARSE_TRACE_MANIFOLD now defaults to OFF everywhere except fbcode MAST environments. OSS users who relied on the previous default of ON should set TRITONPARSE_TRACE_MANIFOLD=1 explicitly.
  • OVERRIDE_TTIR reproducer: The reproducer output format for OVERRIDE_TTIR mode has changed (stub kernel + autotune wrapper instead of source copy), but the generated reproducers are functionally equivalent and more reliable.

Upgrade Guidance

  1. Standard upgrade:

    pip install --upgrade tritonparse
  2. Warp-specialized kernel reproducers: Previously failing reproducers for warp-specialized kernels should now work correctly without manual intervention.