- Date range: 2026-04-01 to 2026-04-08
- Scope: Bug-fix release - OVERRIDE_TTIR reproducer rewrite with stub kernel generation, warp-specialized kernel num_warps fix, Manifold upload scoping to fbcode MAST environments, OSS atexit cleanup fix, and _json_compat extensions.
Highlights
- OVERRIDE_TTIR Reproducer Rewrite (#376): Complete rewrite of the OVERRIDE_TTIR reproducer mode. The previous implementation was broken: it skipped defining the kernel function (causing a `NameError`), only worked for autotuned kernels, and discarded constexpr values. The new approach generates a stub `triton.jit` function (same name and parameters, `pass` body) wrapped with `triton.autotune`, carrying the captured constexpr values, compile params, and an `ir_override` pointing to the captured TTGIR. This eliminates the need to copy the kernel source code and its transitive dependencies.
- Warp-Specialized Kernel Reproducer Fix: Fixed a `ptxas` "Insufficient registers" failure when reproducing warp-specialized kernels. The Triton compiler overwrites `metadata["num_warps"]` with the post-expansion count (`ttg.total-num-warps`), so the reproducer passed the already-expanded count back in and the warp count was inflated a second time. The fix extracts the original `ttg.num-warps` from the TTGIR module attributes instead.
- Manifold Upload Scoping: Manifold upload is now enabled by default only in fbcode MAST environments (detected via `torch.version.git_version` and `MAST_HPC_JOB_NAME`), preventing a `ModuleNotFoundError` in OSS environments during atexit cleanup.
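To make the warp-count fix concrete, here is a minimal sketch of reading the pre-expansion count from a TTGIR module header. It assumes the header is a single `module attributes {...}` line whose integer attributes are spelled like `"ttg.num-warps" = 4 : i32`. The helper names `extract_warp_counts` and `reproducer_num_warps` are hypothetical and are not tritonparse's actual API.

```python
import re
from typing import Mapping, Optional, Tuple

# Matches integer module attributes such as "ttg.num-warps" = 4 : i32.
# Assumption: this spelling reflects typical TTGIR output; real headers may vary.
_WARP_ATTR_RE = re.compile(r'"?(ttg\.(?:total-)?num-warps)"?\s*=\s*(\d+)\s*:\s*i32')


def extract_warp_counts(ttgir: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (num_warps, total_num_warps) parsed from the `module attributes` line."""
    header = next(
        (line for line in ttgir.splitlines() if line.lstrip().startswith("module attributes")),
        "",
    )
    attrs = {name: int(value) for name, value in _WARP_ATTR_RE.findall(header)}
    return attrs.get("ttg.num-warps"), attrs.get("ttg.total-num-warps")


def reproducer_num_warps(ttgir: str, metadata: Mapping[str, int]) -> int:
    """Prefer the pre-expansion ttg.num-warps; fall back to metadata if it is absent."""
    num_warps, _total = extract_warp_counts(ttgir)
    return num_warps if num_warps is not None else metadata["num_warps"]
```

For a header containing `"ttg.num-warps" = 4 : i32` and `"ttg.total-num-warps" = 12 : i32`, the sketch passes 4 to the reproducer even when `metadata["num_warps"]` reports 12. The total is still returned, so it can be kept for informational purposes.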
Changes by Area
Reproducer Enhancements
- OVERRIDE_TTIR Stub Generation (#376): The new `stub_generator.py` (~137 lines) generates stub Triton functions and extracts constexpr values. `_replace_kernel_import` was rewritten so that OVERRIDE_TTIR mode generates the stub plus an autotune config. `_replace_kernel_invocation` now filters out constexpr and compile params, because the autotune config provides them. Captured IR files are saved from the compilation event's `file_content` to `captured_irs/`. `extract_params_from_source` uses `lru_cache` to avoid redundant AST parses.
- Warp-Specialized num_warps Fix: At reproducer generation time, the original `ttg.num-warps` is extracted from the TTGIR module attributes instead of using the inflated `metadata["num_warps"]`. The post-expansion count is preserved as `total_num_warps` for informational purposes.
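Here is a minimal sketch of the stub-plus-autotune idea. It assumes the kernel source is available as a string and that constexpr parameters are annotated with `tl.constexpr`. `extract_params_from_source` is named in these notes, but its real signature may differ. `generate_stub_source` and `filter_invocation_kwargs` are hypothetical, and the emitted text only approximates what `stub_generator.py` produces. The sketch only builds source text, so it runs without Triton installed.

```python
import ast
import textwrap
from functools import lru_cache
from typing import Dict, Mapping, Tuple


@lru_cache(maxsize=None)
def extract_params_from_source(source: str) -> Tuple[str, Tuple[str, ...], Tuple[str, ...]]:
    """Parse kernel source once; return (name, parameter declarations, constexpr names)."""
    tree = ast.parse(textwrap.dedent(source))
    func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))
    decls, constexprs = [], []
    for arg in func.args.args:
        # ast.unparse keeps the annotation, e.g. "BLOCK: tl.constexpr".
        decls.append(ast.unparse(arg))
        if arg.annotation is not None and ast.unparse(arg.annotation).endswith("constexpr"):
            constexprs.append(arg.arg)
    return func.name, tuple(decls), tuple(constexprs)


def generate_stub_source(source: str, constexpr_values: Mapping[str, object],
                         compile_params: Mapping[str, object], ttgir_path: str) -> str:
    """Emit a pass-bodied triton.jit stub wrapped in a single-config triton.autotune."""
    name, decls, _ = extract_params_from_source(source)
    config_kwargs = ", ".join(f"{k!r}: {v!r}" for k, v in constexpr_values.items())
    compile_kwargs = "".join(f", {k}={v!r}" for k, v in compile_params.items())
    return (
        "import triton\n"
        "import triton.language as tl\n\n"
        f"@triton.autotune(configs=[triton.Config({{{config_kwargs}}}{compile_kwargs}, "
        f"ir_override={ttgir_path!r})], key=[])\n"
        "@triton.jit\n"
        f"def {name}({', '.join(decls)}):\n"
        "    pass\n"
    )


def filter_invocation_kwargs(call_kwargs: Mapping[str, object], source: str,
                             compile_params: Mapping[str, object]) -> Dict[str, object]:
    """Drop constexpr and compile params from a launch; the autotune config supplies them."""
    _, _, constexprs = extract_params_from_source(source)
    drop = set(constexprs) | set(compile_params)
    return {k: v for k, v in call_kwargs.items() if k not in drop}
```

Because the cache is keyed on the source string, generating the stub and filtering the launch for the same kernel share a single AST parse.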
JSON Compatibility Layer
- `_json_compat.py` Extensions: Added `load(f)` and `dump(obj, f)` file-based convenience wrappers that delegate to the existing `loads()`/`dumps()` and handle the file I/O.
- CUTracer Migration: All 14 CUTracer production Python files migrated from stdlib `json` to `tritonparse._json_compat`, gaining a 3-10x JSON performance improvement via orjson, with graceful fallback when orjson is unavailable.
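The file-based wrappers can be sketched as thin shims over an orjson-or-stdlib `loads`/`dumps` pair. This is a hypothetical stand-in for `tritonparse._json_compat`, not its actual source, and the real module's signatures and options may differ. `orjson.dumps` returns `bytes`, so the sketch decodes it. The stdlib fallback uses compact separators so that both paths produce the same text.

```python
import io
import json
from typing import Any

try:
    import orjson  # optional fast path
except ImportError:  # fall back to the stdlib when orjson is not installed
    orjson = None


def loads(s: Any) -> Any:
    """Parse JSON from str or bytes, preferring orjson."""
    if orjson is not None:
        return orjson.loads(s)
    return json.loads(s)


def dumps(obj: Any) -> str:
    """Serialize to a compact JSON string, preferring orjson."""
    if orjson is not None:
        return orjson.dumps(obj).decode("utf-8")
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)


def load(f: io.TextIOBase) -> Any:
    """File-based wrapper: read the whole stream, then delegate to loads()."""
    return loads(f.read())


def dump(obj: Any, f: io.TextIOBase) -> None:
    """File-based wrapper: delegate to dumps(), then write the text to the stream."""
    f.write(dumps(obj))
```

Callers that previously used `json.load(f)` / `json.dump(obj, f)` can switch imports without restructuring their file handling, which is what makes a bulk migration like the CUTracer one mechanical.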
Manifold Upload & OSS Fixes
- Scoped Default (#374, 3337a0c): `TRITONPARSE_TRACE_MANIFOLD` now defaults to `"0"` (OFF) and is auto-enabled only when running in fbcode and in a MAST environment. Setting the env var explicitly still works in all environments.
- OSS atexit Fix (#374): Gated the Manifold upload path in `_cleanup()` behind `is_fbcode()` to prevent `ModuleNotFoundError: No module named 'tritonparse.fb'` during atexit in OSS environments.
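The scoping rule and the atexit gate can be expressed as two small predicates. This is a sketch under stated assumptions. Fbcode detection, which the notes attribute to `torch.version.git_version`, is passed in as a boolean rather than reimplemented. `manifold_upload_enabled` and `cleanup` are hypothetical names and are not tritonparse's `_cleanup()`.

```python
import importlib
from typing import Mapping


def manifold_upload_enabled(env: Mapping[str, str], in_fbcode: bool) -> bool:
    """Explicit TRITONPARSE_TRACE_MANIFOLD wins; otherwise ON only in fbcode + MAST."""
    default = "1" if in_fbcode and "MAST_HPC_JOB_NAME" in env else "0"
    return env.get("TRITONPARSE_TRACE_MANIFOLD", default) == "1"


def cleanup(upload_enabled: bool, in_fbcode: bool) -> bool:
    """atexit-style hook; returns True if the Manifold path was entered."""
    if not (upload_enabled and in_fbcode):
        # OSS (or upload disabled): never touch the internal-only package.
        return False
    # Only reached in fbcode, where 'tritonparse.fb' is importable.
    importlib.import_module("tritonparse.fb")
    # The internal upload call would follow here; its API is not described in these notes.
    return True
```

In a real hook, the gate would sit inside the function registered with `atexit.register`. Because the fbcode check comes before the import, the `ModuleNotFoundError` cannot occur in OSS, even when the env var forces upload on.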
Infrastructure & CI
- Packaging Workaround (#370): Added an explicit `pip install packaging` to the CI setup to work around PyTorch nightly builds (2.12.0.dev20260405+) that are missing their dependency on the `packaging` module.
- Pin Node.js in CI: Pinned the Node.js version in GitHub Actions CI workflows for reproducible builds.
- Website Dependencies: Upgraded website dependencies and fixed Vite 8 / ESLint compatibility. Bumped vite from 8.0.3 to 8.0.5 (security fix).
- Internal Repo Re-sync (#375): Cleaned up Claude Code configuration files that were incorrectly synced to the OSS repository.
Compatibility Notes
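- No breaking API changes: This is a bug-fix release with no API changes. The only behavior changes for existing users are the Manifold upload default and the OVERRIDE_TTIR reproducer output format, described below.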
- Manifold upload default changed: `TRITONPARSE_TRACE_MANIFOLD` now defaults to OFF everywhere except fbcode MAST environments. Users who relied on the previous default of ON in OSS should explicitly set `TRITONPARSE_TRACE_MANIFOLD=1`.
- OVERRIDE_TTIR reproducer: The reproducer output format for OVERRIDE_TTIR mode has changed (stub kernel + autotune wrapper instead of a source copy), but the generated reproducers are functionally equivalent and more reliable.
Upgrade Guidance
- Standard upgrade: `pip install --upgrade tritonparse`
- Warp-specialized kernel reproducers: Reproducers for warp-specialized kernels that previously failed with `ptxas` "Insufficient registers" should work once regenerated with this release, without manual intervention.