feat: Add json+cuda_ipc array encoding for GPU-direct tensor transfer#588

Draft
dionhaefner wants to merge 3 commits into main from dion/gpu-go-brr

@dionhaefner

Contributor

Summary

  • New cuda_ipc array encoding that passes CUDA IPC memory handles instead of serialized tensor bytes
  • Framework-agnostic: works with any GPU array that implements __cuda_array_interface__ (PyTorch, CuPy, JAX, Numba)
  • Uses ctypes calls to libcudart directly, so the encode path has no PyTorch or CuPy dependency
  • Decode path requires CuPy (returns cupy.ndarray); consumers convert via torch.as_tensor() or DLPack as needed
  • Containers launched with json+cuda_ipc automatically get --ipc=host
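
The encode path described in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `describe_gpu_array` and `get_ipc_handle` are hypothetical names, only C-contiguous arrays with simple numeric dtypes are handled, and the caller passes in an already-loaded `libcudart`. The `cudaIpcGetMemHandle` call and the 64-byte `cudaIpcMemHandle_t` are part of the CUDA runtime API.

```python
import ctypes
import math

CUDA_IPC_HANDLE_SIZE = 64  # size of cudaIpcMemHandle_t in the CUDA runtime headers


class CudaIpcMemHandle(ctypes.Structure):
    """Mirror of cudaIpcMemHandle_t: an opaque 64-byte blob."""

    _fields_ = [("reserved", ctypes.c_ubyte * CUDA_IPC_HANDLE_SIZE)]


def describe_gpu_array(arr):
    """Read shape, dtype string, and device pointer from __cuda_array_interface__."""
    iface = arr.__cuda_array_interface__
    if iface.get("strides") is not None:
        # strides=None means C-contiguous; other layouts are out of scope for this sketch
        raise ValueError("this sketch only handles C-contiguous arrays")
    shape = tuple(iface["shape"])
    typestr = iface["typestr"]  # e.g. "<f4" for little-endian float32
    ptr, _read_only = iface["data"]
    itemsize = int(typestr[2:])  # assumes a simple numeric typestr
    return {
        "shape": shape,
        "typestr": typestr,
        "ptr": ptr,
        "nbytes": itemsize * math.prod(shape),
    }


def get_ipc_handle(libcudart, ptr):
    """Ask the CUDA runtime for an IPC handle to the allocation behind ptr."""
    handle = CudaIpcMemHandle()
    status = libcudart.cudaIpcGetMemHandle(ctypes.byref(handle), ctypes.c_void_p(ptr))
    if status != 0:  # cudaSuccess == 0
        raise RuntimeError(f"cudaIpcGetMemHandle failed with error code {status}")
    return bytes(handle.reserved)
```

On a machine with a GPU, `libcudart` would typically come from `ctypes.CDLL("libcudart.so")`. The handle identifies the underlying allocation, so a framework that sub-allocates from a memory pool may also need to transmit an offset; the sketch ignores that case.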

What problem does this solve?

Tesseract's data path is CPU-bound. Array encoding serializes via JSON/base64/binref and copies to CPU at every boundary. For tight composition loops (optimization, MCMC), the GPU→CPU→serialize→network→CPU→GPU round trip dominates wall time.

With cuda_ipc, tensors stay on the GPU and the CPU handles only metadata such as shape and dtype. This is essentially binref for GPU memory.
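
To make the "metadata only" point concrete, the sketch below packs an IPC handle together with shape and dtype into a small JSON object and unpacks it again. The field names (`encoding`, `handle`, `shape`, `dtype`) and both function names are assumptions chosen for illustration; the actual wire format used by this PR is not shown in the description.

```python
import base64
import json


def pack_cuda_ipc_payload(handle, shape, dtype):
    """Serialize a 64-byte IPC handle plus array metadata to a JSON string."""
    if len(handle) != 64:
        raise ValueError("a CUDA IPC memory handle is expected to be 64 bytes")
    return json.dumps(
        {
            "encoding": "cuda_ipc",  # hypothetical tag
            "handle": base64.b64encode(handle).decode("ascii"),
            "shape": list(shape),
            "dtype": dtype,
        }
    )


def unpack_cuda_ipc_payload(text):
    """Recover (handle bytes, shape tuple, dtype string) from the JSON string."""
    obj = json.loads(text)
    return base64.b64decode(obj["handle"]), tuple(obj["shape"]), obj["dtype"]
```

The payload size depends only on the number of dimensions, not on the number of elements, which is why the CPU-side cost stays flat as tensors grow.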

Usage

from tesseract_core import Tesseract

# Container path
t = Tesseract.from_image("my_gpu_tesseract", gpus=["0"], output_format="json+cuda_ipc")
t.serve()
# cupy_array: an existing CuPy array, torch tensor, or any __cuda_array_interface__ object
result = t.apply({"x": cupy_array})

# Local path
t = Tesseract.from_tesseract_api("tesseract_api.py", output_format="json+cuda_ipc")
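
For the container path, the Summary states that containers launched with json+cuda_ipc automatically get --ipc=host. A rough sketch of that rule is shown below; `with_cuda_ipc_flags` is a hypothetical helper, not the function in engine.py, and it assumes the list holds only `docker run` options (everything before the image name).

```python
def with_cuda_ipc_flags(run_options, output_format):
    """Return a copy of the docker run options, adding --ipc=host when needed."""
    options = list(run_options)
    if output_format == "json+cuda_ipc" and "--ipc=host" not in options:
        # Share the host IPC namespace so CUDA IPC handles resolve across containers
        options.append("--ipc=host")
    return options
```

Other output formats leave the options untouched, so containers that do not use CUDA IPC keep their default, isolated IPC namespace.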

Requirements

  • CUDA runtime (libcudart.so) on both producer and consumer
  • CuPy for decoding (pip install cupy-cuda12x)
  • --ipc=host for cross-container IPC (handled automatically by engine.py)
  • Both processes must see the same physical GPU
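
The first two requirements can be checked before attempting a transfer. The sketch below is an assumption-laden convenience, not part of this PR: `check_cuda_ipc_requirements` is a hypothetical name, and `ctypes.util.find_library` may miss a libcudart that is installed in a non-standard location. It cannot verify --ipc=host or that both processes see the same physical GPU.

```python
import ctypes.util
import importlib.util


def check_cuda_ipc_requirements():
    """Report whether libcudart and CuPy appear to be available in this process."""
    return {
        # True if the dynamic linker can locate a CUDA runtime library
        "libcudart": ctypes.util.find_library("cudart") is not None,
        # True if CuPy is importable (needed only on the decoding side)
        "cupy": importlib.util.find_spec("cupy") is not None,
    }
```

A producer that only encodes needs just the `libcudart` entry to be true; a consumer that decodes needs both.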

@codecov

codecov Bot commented May 11, 2026


Codecov Report

❌ Patch coverage is 21.77419% with 97 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.39%. Comparing base (54d5c74) to head (1889fd0).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| tesseract_core/runtime/array_encoding.py | 21.35% | 80 Missing and 1 partial ⚠️ |
| tesseract_core/sdk/tesseract.py | 26.66% | 10 Missing and 1 partial ⚠️ |
| tesseract_core/sdk/engine.py | 0.00% | 2 Missing and 1 partial ⚠️ |
| tesseract_core/runtime/file_interactions.py | 33.33% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #588      +/-   ##
==========================================
+ Coverage   67.22%   75.39%   +8.17%     
==========================================
  Files          32       32              
  Lines        4519     4638     +119     
  Branches      743      765      +22     
==========================================
+ Hits         3038     3497     +459     
+ Misses       1237      831     -406     
- Partials      244      310      +66     


@PasteurBot

Contributor

Benchmark Results

Benchmarks use a no-op Tesseract to measure pure framework overhead.

🚀 0 faster, ⚠️ 0 slower, ✅ 36 unchanged

✅ No significant performance changes detected.

Full results
| Benchmark | Baseline | Current | Change | Status |
|---|---|---|---|---|
| api/apply_1,000 | 0.402ms | 0.393ms | -2.2% | |
| api/apply_100,000 | 0.408ms | 0.400ms | -1.9% | |
| api/apply_10,000,000 | 0.407ms | 0.397ms | -2.5% | |
| cli/apply_1,000 | 1672.317ms | 1648.496ms | -1.4% | |
| cli/apply_100,000 | 1702.285ms | 1680.354ms | -1.3% | |
| cli/apply_10,000,000 | 1766.081ms | 1730.622ms | -2.0% | |
| decoding/base64_1,000 | 0.027ms | 0.027ms | -1.5% | |
| decoding/base64_100,000 | 0.751ms | 0.758ms | +0.9% | |
| decoding/base64_10,000,000 | 139.384ms | 139.584ms | +0.1% | |
| decoding/binref_1,000 | 0.170ms | 0.172ms | +1.3% | |
| decoding/binref_100,000 | 0.262ms | 0.264ms | +0.8% | |
| decoding/binref_10,000,000 | 27.732ms | 28.095ms | +1.3% | |
| decoding/json_1,000 | 0.089ms | 0.089ms | -0.0% | |
| decoding/json_100,000 | 8.255ms | 8.280ms | +0.3% | |
| decoding/json_10,000,000 | 1116.804ms | 1121.226ms | +0.4% | |
| encoding/base64_1,000 | 0.034ms | 0.034ms | -0.7% | |
| encoding/base64_100,000 | 0.204ms | 0.206ms | +0.7% | |
| encoding/base64_10,000,000 | 66.182ms | 65.200ms | -1.5% | |
| encoding/binref_1,000 | 0.236ms | 0.242ms | +2.2% | |
| encoding/binref_100,000 | 0.410ms | 0.412ms | +0.5% | |
| encoding/binref_10,000,000 | 30.667ms | 30.439ms | -0.7% | |
| encoding/json_1,000 | 0.117ms | 0.119ms | +1.2% | |
| encoding/json_100,000 | 10.918ms | 11.576ms | +6.0% | |
| encoding/json_10,000,000 | 1296.752ms | 1317.191ms | +1.6% | |
| http/apply_1,000 | 2.856ms | 2.910ms | +1.9% | |
| http/apply_100,000 | 8.743ms | 9.234ms | +5.6% | |
| http/apply_10,000,000 | 930.312ms | 941.730ms | +1.2% | |
| roundtrip/base64_1,000 | 0.071ms | 0.070ms | -1.0% | |
| roundtrip/base64_100,000 | 1.122ms | 1.127ms | +0.4% | |
| roundtrip/base64_10,000,000 | 206.272ms | 208.074ms | +0.9% | |
| roundtrip/binref_1,000 | 0.421ms | 0.418ms | -0.7% | |
| roundtrip/binref_100,000 | 0.661ms | 0.662ms | +0.2% | |
| roundtrip/binref_10,000,000 | 59.083ms | 59.370ms | +0.5% | |
| roundtrip/json_1,000 | 0.220ms | 0.217ms | -1.1% | |
| roundtrip/json_100,000 | 18.543ms | 17.888ms | -3.5% | |
| roundtrip/json_10,000,000 | 2413.511ms | 2415.169ms | +0.1% | |

  • Runner: Linux 6.17.0-1010-azure x86_64

jpbrodrick89 and others added 2 commits May 29, 2026 16:30
The 'Merge branch main into dion/gpu-go-brr' conflict resolution left
output_format/timeout params at 7-space indent and an over-length
is_leaf lambda. ruff-format clean now so the pre-commit/CI gate passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
