Add free-threading (no-GIL) support for Python 3.13t+ (#175)
Conversation
Hi! I’ll try to give this a once-over next week sometime.

That would be amazing, thanks @ngoldbaum !
(force-pushed from 6b60bc3 to 81de062)
I looked over the C code (with the help of Claude) and I think there are two thread safety issues that aren't addressed in this PR:
- Uses of `PyList_GetItem` and `PyDict_GetItemString` on possibly shared containers. You can fix this by using `PyList_GetItemRef` and `PyDict_GetItemRef` instead. Both are available in `pythoncapi-compat`, which it looks like you already have vendored. You'll also need to adjust the reference counting, since the new functions I'm suggesting return owned references.
- Time-of-check vs. time-of-use (TOCTOU) races around the workspace pointer. The `(!self->work)` check on line 832 of `scsobject.h`, in `SCS_update`, happens without holding the lock. Another thread could concurrently trigger `SCS_finish` or `SCS_init`, which both set `self->work = NULL` with the lock held. If that happens, you could have a null pointer dereference. I think both `SCS_solve` and `SCS_update` should move those checks into code blocks where they have the lock held. That said, I'm pretty sure both of these scenarios are mostly theoretical: I don't think it's possible to trigger `SCS_finish` unless the object's refcount goes to zero. It is possible to manually re-trigger `__init__`, but that's a super weird thing to do. Either way, it's probably better to move the checks for whether the workspace exists into code where the lock is held, if only to make the code clearer.
- `PyThread_acquire_lock` can fail and you're supposed to check for failure. Almost no code actually does that, though, so this is more of a minor nit.
Oh also, it looks like there isn't a test that shares an SCS object between threads and simultaneously calls `update` and `solve`. That shouldn't race, assuming the locking is correct. You might even consider adding a test run that uses a TSan-instrumented build of CPython and scs-python if you want to be very sure there aren't any data races.
Thanks for the very thorough review @ngoldbaum ! I am also working with Claude and I think we have addressed your comments.
Awesome, I’ll try to give this another once-over next week. |
ngoldbaum left a comment
Overall looks good. I had a few minor comments inline. I used Claude to review this and didn't read all the new tests in detail. I did read the changes to scsobject.h and the CI/sanitizer config in detail.
| 'Programming Language :: Python :: 3.13',
| 'Programming Language :: Python :: 3.14',
| 'Programming Language :: Python :: 3 :: Only',
| 'Programming Language :: Python :: Implementation :: Free Threading :: 1 - Unstable',
Good catch, changed to `3 - Stable`. Also fixed the classifier string: the previous one had an extraneous `Implementation ::` segment that isn't in the trove-classifiers list, so PyPI would have rejected it on upload.
| [tool.cibuildwheel]
| enable = [
|     "cpython-freethreading",  # Build free-threaded (3.13t+) wheels
It turns out cibuildwheel is going to remove this option in the next release, so you might as well never enable it in the first place: pypa/cibuildwheel#2787
Dropped. Verified that on cibuildwheel 3.4.1 (which we pin), 3.14t wheels build without the flag — see the v3.4.1 release notes. Side effect: we no longer ship 3.13t wheels, which matches the migration direction in your earlier comment anyway.
| "thread_unsafe: mark test as unsafe to run in parallel threads (pytest-run-parallel)",
| ]
| # pytest-run-parallel: fixtures and functions that are not thread-safe
| thread_unsafe_fixtures = ["capsys", "capfd"]
pytest-run-parallel marks these as unsafe automatically: https://github.com/Quansight-Labs/pytest-run-parallel/blob/52486283381cf881f8fd229ab4a77528466748ef/src/pytest_run_parallel/thread_unsafe_detection.py#L52-L60
| endif
|
| _deps = [blas_deps]
| _deps += dependency('threads')
It's for the upstream thread-safe ctrl-c fix (cvxgrp/scs#375). `scs_source/src/ctrlc.c` now uses a `pthread_mutex_t` to protect the ref-counted signal-handler registration, so the extension needs to link against the pthread library on platforms where libc doesn't pull it in by default. `dependency('threads')` is the meson-idiomatic, portable way to ask for that (it resolves to `-lpthread` on glibc, nothing on musl/macOS, the right Windows shim, etc.).
| try:
|     # Re-initialize with same data -- SCS_init checks
|     # if self->work is set and errors out
|     solver.__init__(data, cone, verbose=False)
Generally I've been operating under the assumption that objects are only ever initialized once and anyone calling __init__ directly like this is doing something that shouldn't be made thread-safe. It makes it a lot simpler to reason about code if you make the assumption that the initializer always runs on an unshared instance.
But if you do care and want to protect against this, I think that means SCS_init should acquire the lock when it checks for the workspace pointer.
Agreed — went with the first option. SCS_init stays unlocked, the re-init concurrency test is gone (39c1579), and SCS.__init__ now has a docstring note making the thread-local-construction assumption explicit. Matches the pattern in NumPy/SciPy/h5py/etc.
Enables scs to run with the GIL disabled in free-threaded CPython builds.

C extension changes:
- Add PyMutex per-instance lock (only in Py_GIL_DISABLED builds) to serialize concurrent access to the SCS workspace from multiple threads
- Lock around scs_solve, scs_update, and scs_finish (pure C calls safe to hold across Py_BEGIN_ALLOW_THREADS)
- No locking in SCS_init (object is thread-local during construction)
- Declare module as GIL-not-used via PyUnstable_Module_SetGIL

Build/packaging:
- Enable cpython-freethreading in cibuildwheel to ship 3.13t+ wheels
- Add free-threading PyPI classifier

CI:
- Add free-threading test workflow (3.13t + 3.14t) alongside existing build.yml

Tests:
- 15 new concurrency tests covering: independent instances, shared instances, solve+update sequences, warm starts, legacy API, stress tests, result isolation

Closes #130

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
uv pip install requires a virtual environment or --system flag. Use uv venv to create a proper venv for the free-threaded Python. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
meson-python and meson must be in the venv when using --no-build-isolation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ation Let the build system handle its own dependencies (meson-python, numpy) via pyproject.toml build-system.requires, rather than manually installing them and using --no-build-isolation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Correctness fixes from @ngoldbaum's review:
- PyDict_GetItemString -> PyDict_GetItemStringRef (strong refs)
- PyList_GetItem -> PyList_GetItemRef (strong refs)
- Proper Py_DECREF and scs_free on all error paths
- Move self->work NULL check under lock (TOCTOU fix) in SCS_solve/SCS_update
- Check PyThread_acquire_lock return value (PY_LOCK_ACQUIRED)
- Add comment to SCS_finish explaining unchecked lock acquire in dealloc path
- Fix SCS_update lock release ordering with comment explaining why it differs from SCS_solve (avoid holding instance lock while waiting for GIL)

Thread-safe RNG in all test files:
- Replace all np.random.seed() + global RNG with local RandomState instances
- Each test file gets unique seeds so tests exercise different random data
- Backend-variant tests (direct/dense/mkl/cudss) share seeds intentionally since they test the same problem with different linear system solvers

Testing infrastructure:
- 27 new threading tests covering borrowed refs, TOCTOU races, concurrent solve+update, re-init races, error path lock release, stress tests
- GIL re-enable detection in conftest.py (modeled after NumPy)
- pytest-run-parallel in CI (--parallel-threads=4 --iterations=3)
- TSan CI job with cached CPython 3.14t build
- faulthandler_timeout and thread_unsafe markers in pyproject.toml
- tsan-suppressions.txt for known CPython-internal races

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update scs_source submodule to thread-safe-ctrlc branch which adds mutex-protected reference counting for signal handler registration, fixing TSan-detected data races when multiple threads solve concurrently. Add pthreads dependency to meson.build for the ctrlc mutex. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
OpenBLAS spawns internal threads that are not instrumented with TSan annotations, causing false positive data race reports (e.g. in dsyrk worker threads racing with the caller after the BLAS call returns). Forcing single-threaded BLAS under TSan is standard practice. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Refactor the TSan CI job into a matrix over {tsan, asan}. Both build
CPython 3.14t from source with the corresponding sanitizer flag. Runtime
options (TSAN_OPTIONS, ASAN_OPTIONS, OPENBLAS_NUM_THREADS) are set via
GITHUB_ENV in a shared setup step.
ASan uses detect_leaks=0 to avoid false positives from CPython's
internal memory allocator.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ASan's LeakSanitizer was killing the build because ASAN_OPTIONS was only set after the build. Move the env setup step before pip install so detect_leaks=0 applies during compilation too. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move tsan-suppressions.txt to test/ and add lsan-suppressions.txt for CPython-internal leaks. Use LSan suppressions instead of detect_leaks=0 so that real SCS leaks are still caught. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Set faulthandler_exit_on_timeout = true in pyproject.toml so hanging tests actually terminate in CI instead of just dumping tracebacks. Remove redundant -p no:faulthandler -o faulthandler_timeout=600 flags from the free-threading CI since the config is now in pyproject.toml. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix Free Threading trove classifier: drop extraneous "Implementation ::" prefix, bump "1 - Unstable" to "3 - Stable" per PEP 779 and reviewer feedback (free-threading is no longer experimental in 3.14).
- Drop `enable = ["cpython-freethreading"]`: deprecated in cibuildwheel 3.4.1, 3.14t wheels build without it. Side effect: we no longer ship 3.13t wheels, which aligns with upstream's migration push to 3.14t.
- Drop `thread_unsafe_fixtures = ["capsys", "capfd"]`: pytest-run-parallel marks these automatically.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per review discussion: the `SCS_init` guard against concurrent re-init of a live instance is not standard practice for CPython extensions — NumPy, SciPy, and similar libraries assume object construction is thread-local and do not lock `__init__`. The `test_reinit_while_solving` test also swallowed exceptions broadly, which hid rather than detected any real race on the `self->work` field. Remove the test and add a docstring note to `SCS.__init__` making the thread-local-construction assumption explicit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Missed during the rebase onto master: this test still passed the old `use_indirect=` boolean kwarg, which was removed in #189 in favor of the `linear_solver=scs.LinearSolver.*` enum. This broke the full CI matrix (wheel builds, accelerate builds, free-threading tests, sanitizers) on a single shared failure. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
These tests patch scs.sys and scs._load_module — module-level state that leaks to other tests running in parallel threads under pytest-run-parallel, causing MagicMock to be substituted for the real scs.SCS class in unrelated tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Thanks again for all your help @ngoldbaum !
Summary
Enables `scs-python` to run correctly on free-threaded CPython builds (3.13t+), building on the earlier work in #131 and incorporating follow-up fixes from review and CI.

The main user-facing outcome is that:
- `SCS` instances can be used safely from multiple threads on free-threaded Python
- A single `SCS` instance is also safe to call from multiple threads, but access is serialized with a per-instance lock

What Changed
Python extension changes
- Declare the module GIL-not-used via `PyUnstable_Module_SetGIL(..., Py_MOD_GIL_NOT_USED)` in `scs/scsmodule.h`.
- Add a per-instance `PyThread_type_lock` protecting `work`/`sol` in `scs/scsobject.h`.
- Hold the lock around `SCS_solve`, `SCS_update`, and `SCS_finish`.
- Locking is only compiled in for `Py_GIL_DISABLED` builds.
- `PyDict_GetItemString` -> `PyDict_GetItemStringRef`
- `PyList_GetItem` -> `PyList_GetItemRef`
- Fix the `self->work` TOCTOU race by moving the null check under the instance lock in `SCS_solve` and `SCS_update`.
- Comment the differing lock release ordering in `SCS_solve` vs `SCS_update`.

`scs_source` update

- Update the `scs_source` submodule to current upstream `master`.
- This includes the thread-safe ctrl-c fix (cvxgrp/scs#375) and the later upstream changes that landed after that point.

Build / packaging / CI
- Ship free-threaded wheels for 3.13t+.
- Free-threading CI jobs on 3.13t and 3.14t.
- Run the test suite under pytest-run-parallel in CI.
- Set OPENBLAS_NUM_THREADS=1 in TSan runs to avoid noise from non-instrumented OpenBLAS worker threads.
- Sanitizer suppression files live under test/.

Tests
- Add `test/test_free_threading.py` with concurrency and stress coverage for shared `solve()`/`update()` usage.
- Add `test/test_thread_safety.py` and mark thread-creating tests appropriately for pytest-run-parallel.
- Add `test/conftest.py` to fail the test run if the GIL gets silently re-enabled during free-threaded testing.
- Move tests to local `RandomState(...)` usage so they are deterministic and thread-safe.

Design Notes
- Independent `SCS` instances should be able to solve concurrently once the interpreter no longer serializes threads with the GIL.
- A shared `SCS` instance is made safe by serialization, not by making the underlying workspace re-entrant.
- `SCS_init` is still left unlocked; the intended model is that object construction is thread-local.

Closes #130.
Test Coverage
This branch adds or updates coverage for:
- Free-threaded CPython builds (3.13t/3.14t)