Skip to content

Add free-threading (no-GIL) support for Python 3.13t+#175

Merged
bodono merged 18 commits intomasterfrom
free-threading
Apr 17, 2026
Merged

Add free-threading (no-GIL) support for Python 3.13t+#175
bodono merged 18 commits intomasterfrom
free-threading

Conversation

@bodono
Copy link
Copy Markdown
Owner

@bodono bodono commented Apr 4, 2026

Summary

Enables scs-python to run correctly on free-threaded CPython builds (3.13t+), building on the earlier work in #131 and incorporating follow-up fixes from review and CI.

The main user-facing outcome is that:

  • separate SCS instances can be used safely from multiple threads on free-threaded Python
  • the same SCS instance is also safe to call from multiple threads, but access is serialized with a per-instance lock

What Changed

Python extension changes

  • Declare the extension modules as not requiring the GIL via PyUnstable_Module_SetGIL(..., Py_MOD_GIL_NOT_USED) in scs/scsmodule.h.
  • Add a per-instance PyThread_type_lock protecting work / sol in scs/scsobject.h.
    • The lock wraps SCS_solve, SCS_update, and SCS_finish.
    • This makes same-instance access safe by serializing it.
    • This lock is currently present on all builds, not just Py_GIL_DISABLED builds.
  • Fix free-threading reference-safety issues while parsing Python objects:
    • PyDict_GetItemString -> PyDict_GetItemStringRef
    • PyList_GetItem -> PyList_GetItemRef
  • Fix the self->work TOCTOU race by moving the null check under the instance lock in SCS_solve and SCS_update.
  • Document the different lock-release orderings used in SCS_solve vs SCS_update.

scs_source update

  • Update the scs_source submodule to current upstream master.
  • This includes the upstream thread-safe ctrl-c / signal-handling fix (cvxgrp/scs#375) and the later upstream changes that landed after that point.
  • Link thread support in the build where needed for the upstream ctrl-c fix.

Build / packaging / CI

  • Enable free-threaded wheel builds in cibuildwheel for CPython 3.13t+.
  • Add a dedicated free-threading CI workflow covering:
    • free-threaded Python 3.13t
    • free-threaded Python 3.14t
    • parallel threaded test execution with pytest-run-parallel
    • ThreadSanitizer and AddressSanitizer jobs
  • Set OPENBLAS_NUM_THREADS=1 in TSan runs to avoid noise from non-instrumented OpenBLAS worker threads.
  • Move sanitizer suppression files into test/.

Tests

  • Add test/test_free_threading.py with concurrency and stress coverage for:
    • independent solver instances
    • shared solver instances
    • concurrent solve() / update() usage
    • warm starts
    • result isolation
    • error-path lock release behavior
    • rapid create / solve / destroy cycles
  • Keep test/test_thread_safety.py and mark thread-creating tests appropriately for pytest-run-parallel.
  • Add test/conftest.py to fail the test run if the GIL gets silently re-enabled during free-threaded testing.
  • Convert random/problem-generation tests away from global RNG state to explicit per-test RandomState(...) usage so they are deterministic and thread-safe.

Design Notes

  • Different SCS instances should be able to solve concurrently once the interpreter no longer serializes threads with the GIL.
  • Shared use of a single SCS instance is made safe by serialization, not by making the underlying workspace re-entrant.
  • SCS_init is still left unlocked; the intended model is that object construction is thread-local.
  • On ordinary GIL builds, this branch still adds the instance lock path, so the overhead is small but not literally zero.

Closes #130.

Test Coverage

This branch adds or updates coverage for:

  • existing build/test matrix on regular CPython
  • free-threaded CPython 3.13t / 3.14t
  • threaded parallel test execution
  • sanitizer-based race checking
  • deterministic random-problem backend tests

@ngoldbaum
Copy link
Copy Markdown

Hi! I’ll try to give this a once-over next week sometime.

@bodono
Copy link
Copy Markdown
Owner Author

bodono commented Apr 4, 2026

That would be amazing, thanks @ngoldbaum !

@bodono bodono force-pushed the free-threading branch 2 times, most recently from 6b60bc3 to 81de062 Compare April 6, 2026 18:53
Copy link
Copy Markdown

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked over the C code (with the help of Claude) and I think there are two thread safety issues that aren't addressed in this PR:

  • Uses of PyList_GetItem and PyDict_GetItemString on possibly shared containers.

    • You can fix this by using PyList_GetItemRef and PyDict_GetItemRef instead. Both are available in pythoncapi-compat, which it looks like you already have vendored. You'll also need to adjust the reference counting, since the new functions I'm suggesting return owned references.
  • Time-of-use vs Time-of-update races around the workspace pointer.

    • The (!self->work) check on line 832 of scsobject.h, in SCS_update, happens without holding the lock. Another thread could concurrently trigger SCS_finish or SCS_init, which both set self->work = NULL with the lock held. If that happens, you could have a null pointer dereference. I think both SCS_solve and SCS_update should move those checks into code blocks where they have the lock held.
    • That said, I'm pretty sure either of these scenarios are mostly theoretical, I don't think it's possible to trigger SCS_finish unless the object's refcount goes to zero. It is possible to manually re-trigger __init__, but that's a super weird thing to do. Either way, probably better to move the checks for whether the workspace exists into code where the lock is held, if only to make the code clearer.
  • PyThread_acquire_lock can fail and you're supposed to check for failure. Almost no code actually does that though so this is more of a minor nit.

Comment thread pyproject.toml Outdated
Comment thread .github/workflows/freethreading_tests.yml
Comment thread .github/workflows/freethreading_tests.yml Outdated
Comment thread .github/workflows/freethreading_tests.yml Outdated
@ngoldbaum
Copy link
Copy Markdown

Oh also, it looks like there isn't a test that shares an SCS objects between threads and simultaneously calls update and solve. That shouldn't race, assuming the locking is correct. You might even consider adding a test run that uses a TSan-instrumented build of CPython and scs-python if you want to be very sure there aren't any data races.

@bodono
Copy link
Copy Markdown
Owner Author

bodono commented Apr 10, 2026

Thanks for the very thorough review @ngoldbaum ! I am also working with Claude and I think we have addressed your comments.

#: 1
Comment: Use PyList_GetItemRef / PyDict_GetItemRef
Addressed?: Yes
Notes: All list/dict access uses owned-reference variants
────────────────────────────────────────
#: 2
Comment: TOCTOU races — check workspace under lock
Addressed?: Yes
Notes: Both SCS_solve and SCS_update check !self->work inside the lock
────────────────────────────────────────
#: 3
Comment: Check PyThread_acquire_lock return
Addressed?: Yes
Notes: Checked in solve/update; dealloc path intentionally unchecked (documented)
────────────────────────────────────────
#: 4
Comment: Drop 3.13t, focus on 3.14t
Addressed?: No
Notes: We test both 3.13t and 3.14t; happy to drop 3.13t if you feel strongly
────────────────────────────────────────
#: 5
Comment: Integrate into pixi-based CI
Addressed?: No
Notes: Kept as separate workflow — the sanitizer jobs build CPython from source with --disable-gil, and uv python
install 3.14t is simpler than wiring free-threaded Python through pixi. Different test invocations (sequential +
--parallel-threads) also make it a poor fit for the existing matrix.
────────────────────────────────────────
#: 6
Comment: faulthandler_timeout in pytest config
Addressed?: Yes
Notes: Added faulthandler_exit_on_timeout = true as well
────────────────────────────────────────
#: 7
Comment: conftest.py GIL check
Addressed?: Yes
Notes: NumPy-style pytest_terminal_summary hook detects GIL re-enablement
────────────────────────────────────────
#: 8
Comment: Concurrent solve/update test + TSan
Addressed?: Yes
Notes: test_shared_instance_concurrent_solve_and_update uses barrier-synchronized threads; TSan and ASan CI jobs run against these tests

@ngoldbaum
Copy link
Copy Markdown

Awesome, I’ll try to give this another once-over next week.

Copy link
Copy Markdown

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. I had a few minor comments inline. I used Claude to review this and didn't read all the new tests in detail. I did read the changes to scsobject.h and the CI/sanitizer config in detail.

Comment thread pyproject.toml Outdated
'Programming Language :: Python :: 3.13',
'Programming Language :: Python :: 3.14',
'Programming Language :: Python :: 3 :: Only',
'Programming Language :: Python :: Implementation :: Free Threading :: 1 - Unstable',
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point I'd call this stable!

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — changed to 3 - Stable. Also fixed the classifier string: the previous one had an extraneous Implementation :: segment that isn't in the trove-classifiers list, so PyPI would have rejected it on upload.

Comment thread pyproject.toml Outdated

[tool.cibuildwheel]
enable = [
"cpython-freethreading", # Build free-threaded (3.13t+) wheels
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out cibuildwheel is going to remove this option in the next release, so you might as well never enable it in the first place: pypa/cibuildwheel#2787

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped. Verified that on cibuildwheel 3.4.1 (which we pin), 3.14t wheels build without the flag — see the v3.4.1 release notes. Side effect: we no longer ship 3.13t wheels, which matches the migration direction in your earlier comment anyway.

Comment thread pyproject.toml Outdated
"thread_unsafe: mark test as unsafe to run in parallel threads (pytest-run-parallel)",
]
# pytest-run-parallel: fixtures and functions that are not thread-safe
thread_unsafe_fixtures = ["capsys", "capfd"]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped, thanks.

Comment thread meson.build
endif

_deps = [blas_deps]
_deps += dependency('threads')
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the pthreads dependency?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for the upstream thread-safe ctrl-c fix (cvxgrp/scs#375). scs_source/src/ctrlc.c now uses a pthread_mutex_t to protect the ref-counted signal-handler registration, so the extension needs to link against the pthread library on platforms where libc doesn't pull it in by default. dependency('threads') is the meson-idiomatic, portable way to ask for that (it resolves to -lpthread on glibc, nothing on musl/macOS, the right Windows shim, etc.).

Comment thread test/test_free_threading.py Outdated
try:
# Re-initialize with same data -- SCS_init checks
# if self->work is set and errors out
solver.__init__(data, cone, verbose=False)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I've been operating under the assumption that objects are only ever initialized once and anyone calling __init__ directly like this is doing something that shouldn't be made thread-safe. It makes it a lot simpler to reason about code if you make the assumption that the initializer always runs on an unshared instance.

But if you do care and want to protect against this, I think that means SCS_init should acquire the lock when it checks for the workspace pointer.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — went with the first option. SCS_init stays unlocked, the re-init concurrency test is gone (39c1579), and SCS.__init__ now has a docstring note making the thread-local-construction assumption explicit. Matches the pattern in NumPy/SciPy/h5py/etc.

bodono and others added 16 commits April 17, 2026 17:24
Enables scs to run with the GIL disabled in free-threaded CPython builds.

C extension changes:
- Add PyMutex per-instance lock (only in Py_GIL_DISABLED builds) to serialize
  concurrent access to the SCS workspace from multiple threads
- Lock around scs_solve, scs_update, and scs_finish (pure C calls safe to
  hold across Py_BEGIN_ALLOW_THREADS)
- No locking in SCS_init (object is thread-local during construction)
- Declare module as GIL-not-used via PyUnstable_Module_SetGIL

Build/packaging:
- Enable cpython-freethreading in cibuildwheel to ship 3.13t+ wheels
- Add free-threading PyPI classifier

CI:
- Add free-threading test workflow (3.13t + 3.14t) alongside existing build.yml

Tests:
- 15 new concurrency tests covering: independent instances, shared instances,
  solve+update sequences, warm starts, legacy API, stress tests, result isolation

Closes #130

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
uv pip install requires a virtual environment or --system flag.
Use uv venv to create a proper venv for the free-threaded Python.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
meson-python and meson must be in the venv when using --no-build-isolation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ation

Let the build system handle its own dependencies (meson-python, numpy)
via pyproject.toml build-system.requires, rather than manually installing
them and using --no-build-isolation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Correctness fixes from @ngoldbaum's review:
- PyDict_GetItemString -> PyDict_GetItemStringRef (strong refs)
- PyList_GetItem -> PyList_GetItemRef (strong refs)
- Proper Py_DECREF and scs_free on all error paths
- Move self->work NULL check under lock (TOCTOU fix) in SCS_solve/SCS_update
- Check PyThread_acquire_lock return value (PY_LOCK_ACQUIRED)
- Add comment to SCS_finish explaining unchecked lock acquire in dealloc path
- Fix SCS_update lock release ordering with comment explaining why it
  differs from SCS_solve (avoid holding instance lock while waiting for GIL)

Thread-safe RNG in all test files:
- Replace all np.random.seed() + global RNG with local RandomState instances
- Each test file gets unique seeds so tests exercise different random data
- Backend-variant tests (direct/dense/mkl/cudss) share seeds intentionally
  since they test the same problem with different linear system solvers

Testing infrastructure:
- 27 new threading tests covering borrowed refs, TOCTOU races, concurrent
  solve+update, re-init races, error path lock release, stress tests
- GIL re-enable detection in conftest.py (modeled after NumPy)
- pytest-run-parallel in CI (--parallel-threads=4 --iterations=3)
- TSan CI job with cached CPython 3.14t build
- faulthandler_timeout and thread_unsafe markers in pyproject.toml
- tsan-suppressions.txt for known CPython-internal races

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update scs_source submodule to thread-safe-ctrlc branch which adds
mutex-protected reference counting for signal handler registration,
fixing TSan-detected data races when multiple threads solve concurrently.

Add pthreads dependency to meson.build for the ctrlc mutex.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
OpenBLAS spawns internal threads that are not instrumented with TSan
annotations, causing false positive data race reports (e.g. in dsyrk
worker threads racing with the caller after the BLAS call returns).
Forcing single-threaded BLAS under TSan is standard practice.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Refactor the TSan CI job into a matrix over {tsan, asan}. Both build
CPython 3.14t from source with the corresponding sanitizer flag. Runtime
options (TSAN_OPTIONS, ASAN_OPTIONS, OPENBLAS_NUM_THREADS) are set via
GITHUB_ENV in a shared setup step.

ASan uses detect_leaks=0 to avoid false positives from CPython's
internal memory allocator.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ASan's LeakSanitizer was killing the build because ASAN_OPTIONS was only
set after the build. Move the env setup step before pip install so
detect_leaks=0 applies during compilation too.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move tsan-suppressions.txt to test/ and add lsan-suppressions.txt for
CPython-internal leaks. Use LSan suppressions instead of detect_leaks=0
so that real SCS leaks are still caught.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Set faulthandler_exit_on_timeout = true in pyproject.toml so hanging
tests actually terminate in CI instead of just dumping tracebacks.
Remove redundant -p no:faulthandler -o faulthandler_timeout=600 flags
from the free-threading CI since the config is now in pyproject.toml.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix Free Threading trove classifier: drop extraneous "Implementation ::"
  prefix, bump "1 - Unstable" to "3 - Stable" per PEP 779 and reviewer
  feedback (free-threading is no longer experimental in 3.14).
- Drop `enable = ["cpython-freethreading"]`: deprecated in cibuildwheel
  3.4.1, 3.14t wheels build without it. Side effect: we no longer ship
  3.13t wheels, which aligns with upstream's migration push to 3.14t.
- Drop `thread_unsafe_fixtures = ["capsys", "capfd"]`: pytest-run-parallel
  marks these automatically.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per review discussion: the `SCS_init` guard against concurrent re-init of
a live instance is not standard practice for CPython extensions — NumPy,
SciPy, and similar libraries assume object construction is thread-local
and do not lock `__init__`. The `test_reinit_while_solving` test also
swallowed exceptions broadly, which hid rather than detected any real
race on the `self->work` field.

Remove the test and add a docstring note to `SCS.__init__` making the
thread-local-construction assumption explicit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Missed during the rebase onto master: this test still passed the old
`use_indirect=` boolean kwarg, which was removed in #189 in favor of
the `linear_solver=scs.LinearSolver.*` enum. This broke the full CI
matrix (wheel builds, accelerate builds, free-threading tests,
sanitizers) on a single shared failure.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
These tests patch scs.sys and scs._load_module — module-level state
that leaks to other tests running in parallel threads under
pytest-run-parallel, causing MagicMock to be substituted for the real
scs.SCS class in unrelated tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@bodono
Copy link
Copy Markdown
Owner Author

bodono commented Apr 17, 2026

Thanks again for all your help @ngoldbaum !

@bodono bodono merged commit 3c84c8e into master Apr 17, 2026
65 of 66 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ENH: Free-Threading (No-GIL) Support for Python 3.13+

2 participants