[Redis] Add RedisLazyStackedTensorDict for lazy stack storage #1573

Closed
vmoens wants to merge 7 commits into main from redis-lazy-stack



Collaborator

@vmoens vmoens commented Feb 16, 2026

Summary

  • Adds RedisLazyStackedTensorDict, a TensorDictBase subclass that stores LazyStackedTensorDict data in Redis as concatenated blobs, using only O(K) Redis keys for K leaf keys regardless of the number of stack elements N (e.g. 95 keys for 30 leaves and 1M elements, vs 30M with per-element storage).
  • Supports both homogeneous (same-shape, arithmetic offsets) and heterogeneous (variable-shape, packed int64 offset tables) storage modes, auto-detected during upload.
  • Adds to_redis() convenience method on TensorDictBase, following the same pattern as to_h5().
  • Streaming upload processes elements in chunks of 10K to avoid materializing the full stack in memory.
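
The two storage modes can be illustrated with a minimal sketch. It is not the PR's implementation: the function names, the little-endian layout, and the choice of storing N + 1 cumulative offsets are assumptions made for illustration. The PR only states that homogeneous data uses arithmetic offsets and heterogeneous data uses packed int64 offset tables.

```python
import struct
from typing import List, Tuple


def homogeneous_range(index: int, elem_nbytes: int) -> Tuple[int, int]:
    """Byte range [start, stop) of one element when every element has the same size."""
    start = index * elem_nbytes
    return start, start + elem_nbytes


def pack_offsets(sizes: List[int]) -> bytes:
    """Pack cumulative byte offsets (N + 1 entries) as little-endian int64.

    Hypothetical layout: entry i is where element i starts in the blob,
    and entry N is the total blob length.
    """
    offsets = [0]
    for nbytes in sizes:
        offsets.append(offsets[-1] + nbytes)
    return struct.pack(f"<{len(offsets)}q", *offsets)


def heterogeneous_range(table: bytes, index: int) -> Tuple[int, int]:
    """Read two consecutive int64 entries to get [start, stop) for one element."""
    start, stop = struct.unpack_from("<2q", table, index * 8)
    return start, stop


# Three variable-size elements of 8, 24 and 4 bytes.
table = pack_offsets([8, 24, 4])
second = heterogeneous_range(table, 1)
fourth_fixed = homogeneous_range(3, 16)
```

In this sketch the homogeneous path needs no stored table at all, while the heterogeneous path costs one extra 8-byte entry per element but keeps the number of Redis keys independent of N.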

Test plan

  • 23 new tests in TestRedisLazyStackedTensorDict covering:
    • Construction from LazyStackedTensorDict (homogeneous and heterogeneous)
    • Read access: td[int], td[key], td[int][key], td[slice], td[::step], td[tensor_idx]
    • Write access: td[int] = subtd, td[slice] = val, td[tensor_idx] = val
    • Materialization: to_tensordict(), to_local(), td[idx].to_tensordict()
    • Persistence: pickle roundtrip, from_redis() reconnect
    • Nested keys
  • All 82 existing + new redis tests pass locally
  • Black and flake8 clean
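
The read-access cases listed above all reduce to a set of element positions, which in homogeneous mode map to byte ranges. The sketch below shows one way to do that mapping; the helper names are hypothetical and boolean masks and tensor indices are left out. The only Redis fact relied on is that `GETRANGE key start end` treats `end` as inclusive.

```python
from typing import List, Sequence, Tuple, Union

Index = Union[int, slice, Sequence[int]]


def element_indices(index: Index, n: int) -> List[int]:
    """Normalize an int, a slice, or a sequence of ints into element positions.

    Indexing into range(n) resolves negative values and raises IndexError
    when a position is out of bounds.
    """
    positions = range(n)
    if isinstance(index, int):
        return [positions[index]]
    if isinstance(index, slice):
        return list(positions[index])
    return [positions[i] for i in index]


def getrange_args(indices: Sequence[int], elem_nbytes: int) -> List[Tuple[int, int]]:
    """Inclusive (start, end) byte offsets, one pair per requested element."""
    return [(i * elem_nbytes, (i + 1) * elem_nbytes - 1) for i in indices]


# td[::2] on a 5-element stack whose elements are 8 bytes each.
ranges = getrange_args(element_indices(slice(None, None, 2), 5), 8)
```

Each pair could then be issued as one command in a pipeline, so a strided or fancy index costs a single round trip rather than one per element.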

Made with Cursor

vmoens and others added 5 commits February 16, 2026 10:58
- Apply ufmt formatting to bench_redis.py and redis.py
- Ignore PytestUnraisableExceptionWarning in test_tensorclass.py to
  prevent __del__ GC exceptions from failing unrelated tests
- Install torchvision nightly in docs workflow to fix torchvision
  compatibility errors in tutorial builds

Co-authored-by: Cursor <cursoragent@cursor.com>
- Ignore PytestUnraisableExceptionWarning in test_tensordict.py (fixes
  test_to_memory_leak failure on 3.14 and silicon 3.14)
- Make h5py install optional for Python 3.14t since it fails to build
  from source on free-threaded Python

Co-authored-by: Cursor <cursoragent@cursor.com>
orjson does not support free-threaded Python yet, similar to h5py.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Add skipif for test_auto_batch_size when h5py is not installed
- Skip h5=True parametrizations in test_index_with_generator when h5py
  is missing
- Skip no-build-isolation install tests on free-threaded Python

Co-authored-by: Cursor <cursoragent@cursor.com>
Implement RedisLazyStackedTensorDict, a TensorDictBase subclass that
stores LazyStackedTensorDict data in Redis as concatenated blobs with
O(K) Redis keys for K leaf keys, regardless of the number of stack
elements N.

Key features:
- Homogeneous mode: same-shape elements use arithmetic offsets, reusing
  existing GETRANGE/Lua byte-range infrastructure
- Heterogeneous mode: variable-shape elements use packed int64 offset
  tables per key, with per-element shapes in metadata
- Streaming upload: data processed in chunks of 10K elements to avoid
  materializing the full stack in memory
- Full indexing: td[int], td[slice], td[::step], td[tensor], td[bool]
  all work via _index_tensordict override and pipelined GETRANGE/Lua
- to_redis() convenience method on TensorDictBase
- Pickle support and from_redis() reconnection

Co-authored-by: Cursor <cursoragent@cursor.com>
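
A minimal sketch of the streaming-upload idea described in this commit message follows. It assumes a plain `dict` of `bytearray` values as a stand-in for Redis, and `stream_upload` and `chunked` are hypothetical names; only the 10K chunk size and the auto-detection of the storage mode during upload come from the PR text.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

CHUNK_SIZE = 10_000  # chunk size quoted in the PR description


def chunked(items: Iterable[bytes], size: int) -> Iterator[List[bytes]]:
    """Yield lists of at most `size` items without materializing the whole input."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def stream_upload(
    elements: Iterable[bytes],
    store: Dict[str, bytearray],
    key: str,
    chunk_size: int = CHUNK_SIZE,
) -> Tuple[List[int], bool]:
    """Append serialized elements to one blob, chunk by chunk.

    Returns the per-element byte sizes and whether they are all equal,
    which is how this sketch decides between the two storage modes.
    """
    store[key] = bytearray()
    sizes: List[int] = []
    for chunk in chunked(elements, chunk_size):
        store[key] += b"".join(chunk)  # one append per chunk (stand-in for a Redis write)
        sizes.extend(len(element) for element in chunk)
    homogeneous = len(set(sizes)) <= 1
    return sizes, homogeneous


store: Dict[str, bytearray] = {}
sizes, homogeneous = stream_upload(
    (bytes([i]) * 4 for i in range(5)), store, "obs", chunk_size=2
)
```

Because the input is consumed through a generator, peak memory in this sketch is bounded by one chunk plus the list of sizes, not by the full stack.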
@meta-cla meta-cla Bot added the "CLA Signed" label Feb 16, 2026
Contributor

github-actions Bot commented Feb 16, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 243. Improved: $\large\color{#35bf28}22$. Worsened: $\large\color{#d91a1a}4$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 38.8020μs 14.2163μs 70.3420 KOps/s 70.8511 KOps/s $\color{#d91a1a}-0.72\%$
test_plain_set_stack_nested 45.9930μs 14.4827μs 69.0480 KOps/s 69.8096 KOps/s $\color{#d91a1a}-1.09\%$
test_plain_set_nested_inplace 41.4030μs 15.9697μs 62.6187 KOps/s 64.1995 KOps/s $\color{#d91a1a}-2.46\%$
test_plain_set_stack_nested_inplace 53.2230μs 15.7162μs 63.6286 KOps/s 64.2951 KOps/s $\color{#d91a1a}-1.04\%$
test_items 42.8730μs 5.5304μs 180.8195 KOps/s 179.8177 KOps/s $\color{#35bf28}+0.56\%$
test_items_nested 0.5094ms 0.4398ms 2.2740 KOps/s 2.3125 KOps/s $\color{#d91a1a}-1.67\%$
test_items_nested_locked 0.5560ms 0.4438ms 2.2531 KOps/s 2.2823 KOps/s $\color{#d91a1a}-1.28\%$
test_items_nested_leaf 0.1460ms 90.9302μs 10.9974 KOps/s 10.8319 KOps/s $\color{#35bf28}+1.53\%$
test_items_stack_nested 0.5553ms 0.4415ms 2.2650 KOps/s 2.3029 KOps/s $\color{#d91a1a}-1.65\%$
test_items_stack_nested_leaf 0.1509ms 91.0638μs 10.9813 KOps/s 10.8687 KOps/s $\color{#35bf28}+1.04\%$
test_items_stack_nested_locked 0.4981ms 0.4433ms 2.2559 KOps/s 2.2769 KOps/s $\color{#d91a1a}-0.92\%$
test_keys 23.2620μs 4.1391μs 241.5969 KOps/s 242.1664 KOps/s $\color{#d91a1a}-0.24\%$
test_keys_nested 0.1665ms 0.1171ms 8.5424 KOps/s 8.3874 KOps/s $\color{#35bf28}+1.85\%$
test_keys_nested_locked 0.6483ms 0.1272ms 7.8638 KOps/s 7.8337 KOps/s $\color{#35bf28}+0.38\%$
test_keys_nested_leaf 0.1857ms 0.1082ms 9.2463 KOps/s 9.2268 KOps/s $\color{#35bf28}+0.21\%$
test_keys_stack_nested 0.1846ms 0.1187ms 8.4258 KOps/s 8.4332 KOps/s $\color{#d91a1a}-0.09\%$
test_keys_stack_nested_leaf 0.1701ms 0.1078ms 9.2751 KOps/s 9.2751 KOps/s $-0.00\%$
test_keys_stack_nested_locked 0.1991ms 0.1261ms 7.9292 KOps/s 7.8282 KOps/s $\color{#35bf28}+1.29\%$
test_values 5.6962μs 0.9960μs 1.0041 MOps/s 1.0012 MOps/s $\color{#35bf28}+0.29\%$
test_values_nested 82.1150μs 46.4159μs 21.5443 KOps/s 21.6294 KOps/s $\color{#d91a1a}-0.39\%$
test_values_nested_locked 94.4460μs 49.2198μs 20.3170 KOps/s 20.1921 KOps/s $\color{#35bf28}+0.62\%$
test_values_nested_leaf 0.1167ms 52.6465μs 18.9946 KOps/s 18.9836 KOps/s $\color{#35bf28}+0.06\%$
test_values_stack_nested 78.0640μs 46.5439μs 21.4851 KOps/s 21.5466 KOps/s $\color{#d91a1a}-0.29\%$
test_values_stack_nested_leaf 93.1060μs 53.1245μs 18.8237 KOps/s 18.9468 KOps/s $\color{#d91a1a}-0.65\%$
test_values_stack_nested_locked 91.8750μs 49.6338μs 20.1476 KOps/s 20.1246 KOps/s $\color{#35bf28}+0.11\%$
test_membership 4.6287μs 0.8170μs 1.2239 MOps/s 1.2313 MOps/s $\color{#d91a1a}-0.60\%$
test_membership_nested 22.9610μs 2.7524μs 363.3237 KOps/s 361.7447 KOps/s $\color{#35bf28}+0.44\%$
test_membership_nested_leaf 27.7510μs 2.7727μs 360.6562 KOps/s 362.9467 KOps/s $\color{#d91a1a}-0.63\%$
test_membership_stacked_nested 38.5020μs 2.7709μs 360.8883 KOps/s 362.2366 KOps/s $\color{#d91a1a}-0.37\%$
test_membership_stacked_nested_leaf 25.5010μs 2.7628μs 361.9503 KOps/s 364.7556 KOps/s $\color{#d91a1a}-0.77\%$
test_membership_nested_last 39.1820μs 4.1182μs 242.8268 KOps/s 242.8961 KOps/s $\color{#d91a1a}-0.03\%$
test_membership_nested_leaf_last 26.5910μs 4.1813μs 239.1605 KOps/s 241.4803 KOps/s $\color{#d91a1a}-0.96\%$
test_membership_stacked_nested_last 21.5020μs 4.1255μs 242.3978 KOps/s 243.1646 KOps/s $\color{#d91a1a}-0.32\%$
test_membership_stacked_nested_leaf_last 38.7020μs 4.1514μs 240.8822 KOps/s 240.2708 KOps/s $\color{#35bf28}+0.25\%$
test_nested_getleaf 51.0730μs 20.6875μs 48.3384 KOps/s 48.1444 KOps/s $\color{#35bf28}+0.40\%$
test_nested_get 60.2540μs 18.9413μs 52.7947 KOps/s 51.2556 KOps/s $\color{#35bf28}+3.00\%$
test_stacked_getleaf 77.8550μs 20.3355μs 49.1751 KOps/s 47.8505 KOps/s $\color{#35bf28}+2.77\%$
test_stacked_get 51.9230μs 19.5242μs 51.2186 KOps/s 50.6685 KOps/s $\color{#35bf28}+1.09\%$
test_nested_getitemleaf 45.8630μs 21.4158μs 46.6945 KOps/s 47.2139 KOps/s $\color{#d91a1a}-1.10\%$
test_nested_getitem 56.4330μs 19.7680μs 50.5868 KOps/s 50.5356 KOps/s $\color{#35bf28}+0.10\%$
test_stacked_getitemleaf 55.2730μs 20.7678μs 48.1515 KOps/s 46.9094 KOps/s $\color{#35bf28}+2.65\%$
test_stacked_getitem 49.0430μs 19.8087μs 50.4830 KOps/s 49.8940 KOps/s $\color{#35bf28}+1.18\%$
test_lock_nested 8.3284ms 0.4560ms 2.1932 KOps/s 2.1917 KOps/s $\color{#35bf28}+0.07\%$
test_lock_stack_nested 0.5303ms 0.4548ms 2.1986 KOps/s 2.1729 KOps/s $\color{#35bf28}+1.18\%$
test_unlock_nested 0.6785ms 0.3628ms 2.7566 KOps/s 2.7233 KOps/s $\color{#35bf28}+1.22\%$
test_unlock_stack_nested 0.4198ms 0.3648ms 2.7414 KOps/s 2.6831 KOps/s $\color{#35bf28}+2.17\%$
test_flatten_speed 0.1502ms 0.1168ms 8.5634 KOps/s 8.5749 KOps/s $\color{#d91a1a}-0.13\%$
test_unflatten_speed 0.5805ms 0.5423ms 1.8441 KOps/s 1.8305 KOps/s $\color{#35bf28}+0.74\%$
test_common_ops 0.8378ms 0.6800ms 1.4705 KOps/s 1.4852 KOps/s $\color{#d91a1a}-0.99\%$
test_creation 0.1202ms 2.6840μs 372.5740 KOps/s 365.6714 KOps/s $\color{#35bf28}+1.89\%$
test_creation_empty 42.3020μs 5.7681μs 173.3666 KOps/s 175.2517 KOps/s $\color{#d91a1a}-1.08\%$
test_creation_nested_1 40.6720μs 10.0118μs 99.8824 KOps/s 101.7073 KOps/s $\color{#d91a1a}-1.79\%$
test_creation_nested_2 42.7020μs 11.2887μs 88.5843 KOps/s 90.0344 KOps/s $\color{#d91a1a}-1.61\%$
test_creation_many_keys[10] 52.6630μs 16.9297μs 59.0679 KOps/s 59.5467 KOps/s $\color{#d91a1a}-0.80\%$
test_creation_many_keys[50] 99.9260μs 73.7166μs 13.5655 KOps/s 13.8992 KOps/s $\color{#d91a1a}-2.40\%$
test_creation_many_keys[100] 0.1953ms 0.1435ms 6.9673 KOps/s 7.0862 KOps/s $\color{#d91a1a}-1.68\%$
test_creation_nested_many_keys[10] 73.0640μs 36.3733μs 27.4927 KOps/s 27.5029 KOps/s $\color{#d91a1a}-0.04\%$
test_creation_nested_many_keys[50] 0.2064ms 0.1495ms 6.6910 KOps/s 6.7331 KOps/s $\color{#d91a1a}-0.62\%$
test_clone 63.5140μs 12.8777μs 77.6538 KOps/s 76.1892 KOps/s $\color{#35bf28}+1.92\%$
test_getitem[int] 1.7832ms 13.8875μs 72.0071 KOps/s 56.9723 KOps/s $\textbf{\color{#35bf28}+26.39\%}$
test_getitem[slice_int] 0.1375ms 23.4029μs 42.7297 KOps/s 42.5933 KOps/s $\color{#35bf28}+0.32\%$
test_getitem[range] 0.1697ms 61.1100μs 16.3639 KOps/s 16.0797 KOps/s $\color{#35bf28}+1.77\%$
test_getitem[tuple] 0.1383ms 22.8928μs 43.6819 KOps/s 43.3946 KOps/s $\color{#35bf28}+0.66\%$
test_getitem[list] 0.1833ms 56.0327μs 17.8467 KOps/s 17.6063 KOps/s $\color{#35bf28}+1.37\%$
test_setitem_dim[int] 46.7130μs 24.6376μs 40.5883 KOps/s 36.1009 KOps/s $\textbf{\color{#35bf28}+12.43\%}$
test_setitem_dim[slice_int] 69.1740μs 42.0388μs 23.7876 KOps/s 22.0354 KOps/s $\textbf{\color{#35bf28}+7.95\%}$
test_setitem_dim[range] 0.1324ms 94.4035μs 10.5928 KOps/s 10.9182 KOps/s $\color{#d91a1a}-2.98\%$
test_setitem_dim[tuple] 65.1840μs 40.0937μs 24.9416 KOps/s 25.0901 KOps/s $\color{#d91a1a}-0.59\%$
test_setitem 55.6230μs 16.7729μs 59.6201 KOps/s 57.3516 KOps/s $\color{#35bf28}+3.96\%$
test_set 65.3340μs 16.0938μs 62.1356 KOps/s 61.0051 KOps/s $\color{#35bf28}+1.85\%$
test_set_shared 0.5693ms 0.2027ms 4.9336 KOps/s 4.7778 KOps/s $\color{#35bf28}+3.26\%$
test_update 0.3593ms 20.4493μs 48.9014 KOps/s 44.2006 KOps/s $\textbf{\color{#35bf28}+10.64\%}$
test_update_nested 77.3940μs 32.4246μs 30.8408 KOps/s 30.5882 KOps/s $\color{#35bf28}+0.83\%$
test_update__nested 0.4857ms 33.4145μs 29.9271 KOps/s 29.7093 KOps/s $\color{#35bf28}+0.73\%$
test_set_nested 60.5440μs 18.1177μs 55.1947 KOps/s 54.2875 KOps/s $\color{#35bf28}+1.67\%$
test_set_nested_new 68.5240μs 22.9175μs 43.6348 KOps/s 42.1093 KOps/s $\color{#35bf28}+3.62\%$
test_select 69.4140μs 38.8749μs 25.7235 KOps/s 24.6930 KOps/s $\color{#35bf28}+4.17\%$
test_select_nested 0.1173ms 70.8523μs 14.1139 KOps/s 14.1591 KOps/s $\color{#d91a1a}-0.32\%$
test_exclude_nested 0.1444ms 86.3379μs 11.5824 KOps/s 11.5399 KOps/s $\color{#35bf28}+0.37\%$
test_empty[True] 0.4363ms 0.3755ms 2.6632 KOps/s 2.6575 KOps/s $\color{#35bf28}+0.21\%$
test_empty[False] 12.2882μs 1.2650μs 790.5211 KOps/s 787.5069 KOps/s $\color{#35bf28}+0.38\%$
test_to 0.1031ms 72.4030μs 13.8116 KOps/s 14.1971 KOps/s $\color{#d91a1a}-2.72\%$
test_to_nonblocking 0.1041ms 66.2112μs 15.1032 KOps/s 15.6977 KOps/s $\color{#d91a1a}-3.79\%$
test_unbind_speed 0.3701ms 0.3113ms 3.2125 KOps/s 3.2003 KOps/s $\color{#35bf28}+0.38\%$
test_unbind_speed_stack0 0.3617ms 0.3080ms 3.2468 KOps/s 3.2066 KOps/s $\color{#35bf28}+1.25\%$
test_unbind_speed_stack1 0.1032s 0.8790ms 1.1377 KOps/s 1.1128 KOps/s $\color{#35bf28}+2.23\%$
test_split 1.1780ms 1.0845ms 922.0795 Ops/s 918.9973 Ops/s $\color{#35bf28}+0.34\%$
test_chunk 0.1033s 1.1520ms 868.0179 Ops/s 951.9561 Ops/s $\textbf{\color{#d91a1a}-8.82\%}$
test_to_cpu_blocking 19.1713ms 18.7071ms 53.4557 Ops/s 48.3416 Ops/s $\textbf{\color{#35bf28}+10.58\%}$
test_to_cpu_global_sync 11.0135ms 10.8917ms 91.8134 Ops/s 91.4407 Ops/s $\color{#35bf28}+0.41\%$
test_to_cpu_event_sync 0.1149s 12.9870ms 77.0000 Ops/s 84.9671 Ops/s $\textbf{\color{#d91a1a}-9.38\%}$
test_to_cpu_default 12.1815ms 11.8062ms 84.7015 Ops/s 84.7642 Ops/s $\color{#d91a1a}-0.07\%$
test_consolidate[False-None] 4.1354ms 3.9451ms 253.4774 Ops/s 226.0395 Ops/s $\textbf{\color{#35bf28}+12.14\%}$
test_consolidate[default-None] 2.0577ms 1.9417ms 515.0231 Ops/s 505.9531 Ops/s $\color{#35bf28}+1.79\%$
test_consolidate[reduce-overhead-None] 2.0499ms 1.8658ms 535.9653 Ops/s 522.1745 Ops/s $\color{#35bf28}+2.64\%$
test_consolidate_njt[False-None] 8.8069ms 8.1169ms 123.1995 Ops/s 121.3843 Ops/s $\color{#35bf28}+1.50\%$
test_to[False-False-None] 2.4097ms 1.9973ms 500.6642 Ops/s 490.9072 Ops/s $\color{#35bf28}+1.99\%$
test_to[True-False-None] 2.1041ms 1.8632ms 536.6986 Ops/s 535.7286 Ops/s $\color{#35bf28}+0.18\%$
test_to[within-False-None] 6.1998ms 5.8891ms 169.8047 Ops/s 169.5192 Ops/s $\color{#35bf28}+0.17\%$
test_to[True-default-None] 7.6206ms 7.3736ms 135.6195 Ops/s 131.4642 Ops/s $\color{#35bf28}+3.16\%$
test_to_njt[False-False-None] 8.3518ms 8.2638ms 121.0091 Ops/s 120.2883 Ops/s $\color{#35bf28}+0.60\%$
test_to_njt[True-False-None] 7.1218ms 6.7742ms 147.6182 Ops/s 147.5583 Ops/s $\color{#35bf28}+0.04\%$
test_to_njt[within-False-None] 15.6212ms 14.8593ms 67.2977 Ops/s 65.5190 Ops/s $\color{#35bf28}+2.71\%$
test_creation[device0] 0.4474ms 0.1151ms 8.6890 KOps/s 8.6966 KOps/s $\color{#d91a1a}-0.09\%$
test_creation_from_tensor 0.4506ms 0.1127ms 8.8751 KOps/s 8.8807 KOps/s $\color{#d91a1a}-0.06\%$
test_add_one[memmap_tensor0] 0.3430ms 6.3843μs 156.6346 KOps/s 154.7175 KOps/s $\color{#35bf28}+1.24\%$
test_contiguous[memmap_tensor0] 30.0210μs 0.6199μs 1.6130 MOps/s 2.2632 MOps/s $\textbf{\color{#d91a1a}-28.73\%}$
test_stack[memmap_tensor0] 30.6210μs 4.5595μs 219.3235 KOps/s 220.0725 KOps/s $\color{#d91a1a}-0.34\%$
test_memmaptd_index 1.1633ms 0.2549ms 3.9225 KOps/s 3.9897 KOps/s $\color{#d91a1a}-1.68\%$
test_memmaptd_index_astensor 0.4989ms 0.3425ms 2.9193 KOps/s 2.9351 KOps/s $\color{#d91a1a}-0.54\%$
test_memmaptd_index_op 0.8564ms 0.5800ms 1.7241 KOps/s 1.7025 KOps/s $\color{#35bf28}+1.27\%$
test_serialize_model 0.1399s 0.1374s 7.2804 Ops/s 7.3728 Ops/s $\color{#d91a1a}-1.25\%$
test_serialize_model_pickle 1.3491s 1.2162s 0.8222 Ops/s 0.8233 Ops/s $\color{#d91a1a}-0.13\%$
test_serialize_weights 0.1408s 0.1365s 7.3246 Ops/s 7.3880 Ops/s $\color{#d91a1a}-0.86\%$
test_serialize_weights_returnearly 0.2918s 81.2918ms 12.3014 Ops/s 11.4219 Ops/s $\textbf{\color{#35bf28}+7.70\%}$
test_serialize_weights_pickle 1.3472s 1.2099s 0.8265 Ops/s 0.8228 Ops/s $\color{#35bf28}+0.45\%$
test_reshape_pytree 0.2057ms 31.8226μs 31.4242 KOps/s 31.1159 KOps/s $\color{#35bf28}+0.99\%$
test_reshape_td 84.3450μs 41.7576μs 23.9477 KOps/s 23.0599 KOps/s $\color{#35bf28}+3.85\%$
test_view_pytree 0.2178ms 31.4952μs 31.7509 KOps/s 31.4810 KOps/s $\color{#35bf28}+0.86\%$
test_view_td 0.2162ms 52.4068μs 19.0815 KOps/s 19.9509 KOps/s $\color{#d91a1a}-4.36\%$
test_unbind_pytree 0.2362ms 35.2055μs 28.4047 KOps/s 28.0683 KOps/s $\color{#35bf28}+1.20\%$
test_unbind_td 0.1626ms 46.4393μs 21.5335 KOps/s 21.5637 KOps/s $\color{#d91a1a}-0.14\%$
test_split_pytree 0.2460ms 40.7771μs 24.5236 KOps/s 24.6722 KOps/s $\color{#d91a1a}-0.60\%$
test_split_td 0.1852ms 61.5353μs 16.2508 KOps/s 15.9034 KOps/s $\color{#35bf28}+2.18\%$
test_add_pytree 0.1911ms 42.6910μs 23.4242 KOps/s 24.3692 KOps/s $\color{#d91a1a}-3.88\%$
test_add_td 0.1075ms 55.8257μs 17.9129 KOps/s 19.5253 KOps/s $\textbf{\color{#d91a1a}-8.26\%}$
test_compile_add_one_nested[tensordict-compile] 0.1965ms 0.1363ms 7.3358 KOps/s 7.1104 KOps/s $\color{#35bf28}+3.17\%$
test_compile_add_one_nested[tensordict-eager] 0.6244ms 0.1888ms 5.2960 KOps/s 5.3708 KOps/s $\color{#d91a1a}-1.39\%$
test_compile_add_one_nested[pytree-compile] 0.6133ms 0.1070ms 9.3496 KOps/s 9.1915 KOps/s $\color{#35bf28}+1.72\%$
test_compile_add_one_nested[pytree-eager] 0.6004ms 0.1753ms 5.7036 KOps/s 5.6842 KOps/s $\color{#35bf28}+0.34\%$
test_compile_copy_nested[tensordict-compile] 0.1071ms 26.9186μs 37.1490 KOps/s 32.4276 KOps/s $\textbf{\color{#35bf28}+14.56\%}$
test_compile_copy_nested[tensordict-eager] 81.5550μs 49.3067μs 20.2812 KOps/s 20.0443 KOps/s $\color{#35bf28}+1.18\%$
test_compile_copy_nested[pytree-compile] 37.3320μs 9.3439μs 107.0218 KOps/s 103.5485 KOps/s $\color{#35bf28}+3.35\%$
test_compile_copy_nested[pytree-eager] 0.4606ms 66.0763μs 15.1340 KOps/s 15.1210 KOps/s $\color{#35bf28}+0.09\%$
test_compile_add_one_flat[tensordict-compile] 0.2133ms 0.1742ms 5.7416 KOps/s 5.3773 KOps/s $\textbf{\color{#35bf28}+6.77\%}$
test_compile_add_one_flat[tensordict-eager] 0.3252ms 0.2505ms 3.9926 KOps/s 3.9419 KOps/s $\color{#35bf28}+1.29\%$
test_compile_add_one_flat[tensorclass-compile] 0.1587ms 0.1149ms 8.7068 KOps/s 8.3708 KOps/s $\color{#35bf28}+4.01\%$
test_compile_add_one_flat[tensorclass-eager] 0.1156ms 68.5670μs 14.5843 KOps/s 14.1236 KOps/s $\color{#35bf28}+3.26\%$
test_compile_add_one_flat[pytree-compile] 0.2076ms 0.1555ms 6.4313 KOps/s 6.0737 KOps/s $\textbf{\color{#35bf28}+5.89\%}$
test_compile_add_one_flat[pytree-eager] 0.7890ms 0.5194ms 1.9253 KOps/s 1.8689 KOps/s $\color{#35bf28}+3.01\%$
test_compile_add_self_flat[tensordict-eager] 0.3490ms 0.3027ms 3.3039 KOps/s 3.2571 KOps/s $\color{#35bf28}+1.44\%$
test_compile_add_self_flat[tensordict-compile] 0.2422ms 0.1768ms 5.6545 KOps/s 5.3892 KOps/s $\color{#35bf28}+4.92\%$
test_compile_add_self_flat[tensorclass-eager] 0.1281ms 84.0233μs 11.9015 KOps/s 11.8348 KOps/s $\color{#35bf28}+0.56\%$
test_compile_add_self_flat[tensorclass-compile] 0.1635ms 0.1175ms 8.5087 KOps/s 8.2985 KOps/s $\color{#35bf28}+2.53\%$
test_compile_add_self_flat[pytree-eager] 0.6507ms 0.4388ms 2.2790 KOps/s 2.3061 KOps/s $\color{#d91a1a}-1.18\%$
test_compile_add_self_flat[pytree-compile] 0.1966ms 0.1554ms 6.4364 KOps/s 6.3256 KOps/s $\color{#35bf28}+1.75\%$
test_compile_copy_flat[tensordict-compile] 50.2930μs 22.2471μs 44.9498 KOps/s 40.9568 KOps/s $\textbf{\color{#35bf28}+9.75\%}$
test_compile_copy_flat[tensordict-eager] 66.4340μs 40.0775μs 24.9516 KOps/s 25.4097 KOps/s $\color{#d91a1a}-1.80\%$
test_compile_copy_flat[pytree-compile] 46.9820μs 10.2531μs 97.5314 KOps/s 95.1442 KOps/s $\color{#35bf28}+2.51\%$
test_compile_copy_flat[pytree-eager] 0.3909ms 50.8320μs 19.6727 KOps/s 19.7468 KOps/s $\color{#d91a1a}-0.38\%$
test_compile_assign_and_add[tensordict-compile] 1.9244ms 0.1688ms 5.9225 KOps/s 5.7534 KOps/s $\color{#35bf28}+2.94\%$
test_compile_assign_and_add[tensordict-eager] 3.3052ms 3.2381ms 308.8233 Ops/s 310.3973 Ops/s $\color{#d91a1a}-0.51\%$
test_compile_assign_and_add[pytree-compile] 1.8839ms 0.1572ms 6.3611 KOps/s 6.1903 KOps/s $\color{#35bf28}+2.76\%$
test_compile_assign_and_add[pytree-eager] 2.8819ms 2.7512ms 363.4764 Ops/s 340.6210 Ops/s $\textbf{\color{#35bf28}+6.71\%}$
test_compile_indexing[tensor-tensordict-compile] 0.1488ms 0.1076ms 9.2919 KOps/s 8.9735 KOps/s $\color{#35bf28}+3.55\%$
test_compile_indexing[tensor-tensordict-eager] 0.3109ms 71.3070μs 14.0239 KOps/s 13.9946 KOps/s $\color{#35bf28}+0.21\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1387ms 93.9590μs 10.6429 KOps/s 10.2392 KOps/s $\color{#35bf28}+3.94\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2496ms 44.5448μs 22.4493 KOps/s 22.1174 KOps/s $\color{#35bf28}+1.50\%$
test_compile_indexing[tensor-pytree-compile] 0.1345ms 95.0368μs 10.5222 KOps/s 10.4696 KOps/s $\color{#35bf28}+0.50\%$
test_compile_indexing[tensor-pytree-eager] 0.2736ms 44.5678μs 22.4377 KOps/s 21.3667 KOps/s $\textbf{\color{#35bf28}+5.01\%}$
test_compile_indexing[slice-tensordict-compile] 92.7250μs 54.6991μs 18.2818 KOps/s 17.4196 KOps/s $\color{#35bf28}+4.95\%$
test_compile_indexing[slice-tensordict-eager] 0.2262ms 26.7034μs 37.4484 KOps/s 37.3752 KOps/s $\color{#35bf28}+0.20\%$
test_compile_indexing[slice-tensorclass-compile] 84.0650μs 44.1688μs 22.6404 KOps/s 21.8045 KOps/s $\color{#35bf28}+3.83\%$
test_compile_indexing[slice-tensorclass-eager] 0.2560ms 21.5948μs 46.3074 KOps/s 45.8312 KOps/s $\color{#35bf28}+1.04\%$
test_compile_indexing[slice-pytree-compile] 81.8540μs 44.2772μs 22.5850 KOps/s 22.1603 KOps/s $\color{#35bf28}+1.92\%$
test_compile_indexing[slice-pytree-eager] 0.2552ms 21.4938μs 46.5251 KOps/s 45.8079 KOps/s $\color{#35bf28}+1.57\%$
test_compile_indexing[int-tensordict-compile] 94.7450μs 55.5385μs 18.0055 KOps/s 17.2799 KOps/s $\color{#35bf28}+4.20\%$
test_compile_indexing[int-tensordict-eager] 0.2554ms 26.4233μs 37.8454 KOps/s 38.4350 KOps/s $\color{#d91a1a}-1.53\%$
test_compile_indexing[int-tensorclass-compile] 80.6450μs 44.1220μs 22.6644 KOps/s 22.3507 KOps/s $\color{#35bf28}+1.40\%$
test_compile_indexing[int-tensorclass-eager] 0.2635ms 21.7073μs 46.0675 KOps/s 45.8761 KOps/s $\color{#35bf28}+0.42\%$
test_compile_indexing[int-pytree-compile] 82.0650μs 44.3379μs 22.5541 KOps/s 21.8061 KOps/s $\color{#35bf28}+3.43\%$
test_compile_indexing[int-pytree-eager] 0.2760ms 21.5017μs 46.5078 KOps/s 46.0939 KOps/s $\color{#35bf28}+0.90\%$
test_mod_add[eager] 0.1069ms 48.5403μs 20.6014 KOps/s 19.5326 KOps/s $\textbf{\color{#35bf28}+5.47\%}$
test_mod_add[compile] 0.1496ms 0.1019ms 9.8113 KOps/s 9.3356 KOps/s $\textbf{\color{#35bf28}+5.10\%}$
test_mod_add[compile-overhead] 0.2280ms 0.1450ms 6.8981 KOps/s 6.7452 KOps/s $\color{#35bf28}+2.27\%$
test_mod_wrap[eager] 0.3931ms 0.2984ms 3.3513 KOps/s 3.3444 KOps/s $\color{#35bf28}+0.21\%$
test_mod_wrap[compile] 0.4744ms 0.3400ms 2.9411 KOps/s 2.9223 KOps/s $\color{#35bf28}+0.64\%$
test_mod_wrap[compile-overhead] 7.3953ms 4.0360ms 247.7685 Ops/s 250.2542 Ops/s $\color{#d91a1a}-0.99\%$
test_mod_wrap_and_backward[eager] 2.3350ms 1.5527ms 644.0543 Ops/s 677.0766 Ops/s $\color{#d91a1a}-4.88\%$
test_mod_wrap_and_backward[compile] 1.7531ms 1.4193ms 704.5646 Ops/s 700.7280 Ops/s $\color{#35bf28}+0.55\%$
test_mod_wrap_and_backward[compile-overhead] 1.2097ms 0.8646ms 1.1566 KOps/s 1.1390 KOps/s $\color{#35bf28}+1.54\%$
test_seq_add[eager] 0.2295ms 0.1615ms 6.1916 KOps/s 6.3711 KOps/s $\color{#d91a1a}-2.82\%$
test_seq_add[compile] 0.1879ms 0.1153ms 8.6730 KOps/s 8.2262 KOps/s $\textbf{\color{#35bf28}+5.43\%}$
test_seq_add[compile-overhead] 0.1981ms 0.1513ms 6.6095 KOps/s 6.4419 KOps/s $\color{#35bf28}+2.60\%$
test_seq_wrap[eager] 0.5784ms 0.5117ms 1.9541 KOps/s 1.9894 KOps/s $\color{#d91a1a}-1.78\%$
test_seq_wrap[compile] 0.3970ms 0.3592ms 2.7841 KOps/s 2.7823 KOps/s $\color{#35bf28}+0.06\%$
test_seq_wrap[compile-overhead] 0.3260ms 0.2613ms 3.8273 KOps/s 3.8114 KOps/s $\color{#35bf28}+0.42\%$
test_func_call_runtime[False-eager] 0.9353ms 0.8290ms 1.2063 KOps/s 1.2170 KOps/s $\color{#d91a1a}-0.88\%$
test_func_call_runtime[False-compile] 0.9319ms 0.8866ms 1.1279 KOps/s 1.0629 KOps/s $\textbf{\color{#35bf28}+6.11\%}$
test_func_call_runtime[False-compile-overhead] 0.6591ms 0.4487ms 2.2287 KOps/s 2.2420 KOps/s $\color{#d91a1a}-0.59\%$
test_func_call_runtime[True-eager] 1.3419ms 1.0804ms 925.6152 Ops/s 949.9384 Ops/s $\color{#d91a1a}-2.56\%$
test_func_call_runtime[True-compile] 0.9923ms 0.8982ms 1.1134 KOps/s 1.1121 KOps/s $\color{#35bf28}+0.11\%$
test_func_call_runtime[True-compile-overhead] 0.5198ms 0.4593ms 2.1773 KOps/s 2.1774 KOps/s $-0.01\%$
test_func_call_cm_runtime[False-eager] 0.8868ms 0.8311ms 1.2032 KOps/s 1.1372 KOps/s $\textbf{\color{#35bf28}+5.81\%}$
test_func_call_cm_runtime[False-compile] 0.9869ms 0.8945ms 1.1179 KOps/s 1.1222 KOps/s $\color{#d91a1a}-0.38\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5114ms 0.4447ms 2.2485 KOps/s 2.2288 KOps/s $\color{#35bf28}+0.89\%$
test_func_call_cm_runtime[True-eager] 1.3058ms 1.2038ms 830.7028 Ops/s 831.7958 Ops/s $\color{#d91a1a}-0.13\%$
test_func_call_cm_runtime[True-compile] 1.1236ms 0.9518ms 1.0507 KOps/s 1.0691 KOps/s $\color{#d91a1a}-1.72\%$
test_func_call_cm_runtime[True-compile-overhead] 0.6095ms 0.4888ms 2.0457 KOps/s 2.0221 KOps/s $\color{#35bf28}+1.17\%$
test_vmap_func_call_cm_runtime[eager] 3.1016ms 2.3702ms 421.9128 Ops/s 429.0025 Ops/s $\color{#d91a1a}-1.65\%$
test_vmap_func_call_cm_runtime[compile] 1.0224ms 0.9498ms 1.0528 KOps/s 1.0556 KOps/s $\color{#d91a1a}-0.26\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5745ms 0.4945ms 2.0223 KOps/s 2.0005 KOps/s $\color{#35bf28}+1.09\%$
test_distributed 0.7112ms 0.1515ms 6.6019 KOps/s 6.6239 KOps/s $\color{#d91a1a}-0.33\%$
test_tdmodule 0.3337ms 27.5319μs 36.3216 KOps/s 35.8630 KOps/s $\color{#35bf28}+1.28\%$
test_tdmodule_dispatch 75.1050μs 44.5478μs 22.4478 KOps/s 22.4168 KOps/s $\color{#35bf28}+0.14\%$
test_tdseq 47.9430μs 27.0499μs 36.9687 KOps/s 37.6567 KOps/s $\color{#d91a1a}-1.83\%$
test_tdseq_dispatch 67.0040μs 46.7413μs 21.3944 KOps/s 21.2810 KOps/s $\color{#35bf28}+0.53\%$
test_instantiation_functorch 2.2118ms 1.9791ms 505.2786 Ops/s 510.5523 Ops/s $\color{#d91a1a}-1.03\%$
test_exec_functorch 0.2199ms 0.1745ms 5.7306 KOps/s 5.6423 KOps/s $\color{#35bf28}+1.57\%$
test_exec_functional_call 0.2058ms 0.1583ms 6.3168 KOps/s 6.2913 KOps/s $\color{#35bf28}+0.41\%$
test_exec_td_decorator 0.4451ms 0.2290ms 4.3660 KOps/s 4.3790 KOps/s $\color{#d91a1a}-0.30\%$
test_vmap_mlp_speed_decorator[True-True] 1.0039ms 0.8155ms 1.2262 KOps/s 1.2410 KOps/s $\color{#d91a1a}-1.19\%$
test_vmap_mlp_speed_decorator[True-False] 0.9887ms 0.8127ms 1.2305 KOps/s 1.2071 KOps/s $\color{#35bf28}+1.93\%$
test_vmap_mlp_speed_decorator[False-True] 0.8898ms 0.7211ms 1.3867 KOps/s 1.4166 KOps/s $\color{#d91a1a}-2.11\%$
test_vmap_mlp_speed_decorator[False-False] 0.9162ms 0.7108ms 1.4069 KOps/s 1.4083 KOps/s $\color{#d91a1a}-0.10\%$
test_vmap_transformer_speed_decorator[True-True] 21.2442ms 20.3329ms 49.1813 Ops/s 49.4106 Ops/s $\color{#d91a1a}-0.46\%$
test_vmap_transformer_speed_decorator[True-False] 21.0559ms 20.3297ms 49.1892 Ops/s 49.2606 Ops/s $\color{#d91a1a}-0.14\%$
test_vmap_transformer_speed_decorator[False-True] 20.2393ms 20.1439ms 49.6429 Ops/s 49.7491 Ops/s $\color{#d91a1a}-0.21\%$
test_vmap_transformer_speed_decorator[False-False] 20.4260ms 20.1054ms 49.7380 Ops/s 49.7222 Ops/s $\color{#35bf28}+0.03\%$
test_to_module_speed[True] 1.4969ms 1.4198ms 704.3187 Ops/s 710.0716 Ops/s $\color{#d91a1a}-0.81\%$
test_to_module_speed[False] 1.4938ms 1.3942ms 717.2412 Ops/s 729.6974 Ops/s $\color{#d91a1a}-1.71\%$
test_tc_init 96.2950μs 44.0550μs 22.6989 KOps/s 22.6334 KOps/s $\color{#35bf28}+0.29\%$
test_tc_init_tensor_only 35.5720μs 9.4144μs 106.2204 KOps/s 107.4214 KOps/s $\color{#d91a1a}-1.12\%$
test_tc_init_nested 0.1327ms 86.4906μs 11.5620 KOps/s 11.4163 KOps/s $\color{#35bf28}+1.28\%$
test_tc_init_many_fields 61.5940μs 15.6338μs 63.9638 KOps/s 63.8747 KOps/s $\color{#35bf28}+0.14\%$
test_tc_first_layer_tensor 20.4910μs 1.7314μs 577.5658 KOps/s 567.7724 KOps/s $\color{#35bf28}+1.72\%$
test_tc_first_layer_tensor_only 5.1603μs 0.7019μs 1.4248 MOps/s 1.3930 MOps/s $\color{#35bf28}+2.28\%$
test_tc_first_layer_tensor_set 29.3020μs 3.7248μs 268.4693 KOps/s 267.3582 KOps/s $\color{#35bf28}+0.42\%$
test_tc_first_layer_tensor_only_set 25.0020μs 2.9996μs 333.3797 KOps/s 331.7403 KOps/s $\color{#35bf28}+0.49\%$
test_tc_first_layer_nontensor 31.1020μs 5.8800μs 170.0667 KOps/s 172.7371 KOps/s $\color{#d91a1a}-1.55\%$
test_tc_second_layer_tensor 33.0020μs 4.2019μs 237.9882 KOps/s 240.5104 KOps/s $\color{#d91a1a}-1.05\%$
test_tc_second_layer_nontensor 37.8820μs 8.2966μs 120.5318 KOps/s 123.0094 KOps/s $\color{#d91a1a}-2.01\%$
test_unbind 0.2430s 17.1602ms 58.2745 Ops/s 57.0724 Ops/s $\color{#35bf28}+2.11\%$
test_full_like 4.8257ms 4.3821ms 228.1985 Ops/s 73.5949 Ops/s $\textbf{\color{#35bf28}+210.07\%}$
test_zeros_like 7.6051ms 4.3880ms 227.8950 Ops/s 73.8555 Ops/s $\textbf{\color{#35bf28}+208.57\%}$
test_ones_like 4.4952ms 4.3870ms 227.9481 Ops/s 73.7064 Ops/s $\textbf{\color{#35bf28}+209.26\%}$
test_clone 6.9797ms 6.6413ms 150.5732 Ops/s 66.4622 Ops/s $\textbf{\color{#35bf28}+126.55\%}$
test_squeeze 0.2048ms 14.4443μs 69.2315 KOps/s 72.7994 KOps/s $\color{#d91a1a}-4.90\%$
test_unsqueeze 0.1717ms 0.1134ms 8.8195 KOps/s 9.1978 KOps/s $\color{#d91a1a}-4.11\%$
test_split 0.2602ms 0.1857ms 5.3854 KOps/s 5.4502 KOps/s $\color{#d91a1a}-1.19\%$
test_permute 0.2847ms 0.2076ms 4.8175 KOps/s 5.0115 KOps/s $\color{#d91a1a}-3.87\%$
test_stack 52.0777ms 51.5276ms 19.4071 Ops/s 19.3701 Ops/s $\color{#35bf28}+0.19\%$
test_cat 52.0952ms 51.7178ms 19.3357 Ops/s 19.3748 Ops/s $\color{#d91a1a}-0.20\%$
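
The Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column. That formula is inferred from the rows above rather than stated by the bot, so the sketch below is an assumption; it also requires both values to be in the same unit (some rows mix KOps/s and MOps/s).

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percentage change of this branch's throughput relative to repo HEAD."""
    return (ops - ops_head) / ops_head * 100.0


# Values copied from the test_getitem[int] and test_chunk rows.
getitem_int = relative_change(72.0071, 56.9723)
chunk = relative_change(868.0179, 951.9561)
print(f"{getitem_int:+.2f}%  {chunk:+.2f}%")
```

A positive value means more operations per second than HEAD, which the bot colors green.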

Contributor

github-actions Bot commented Feb 16, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 243. Improved: $\large\color{#35bf28}24$. Worsened: $\large\color{#d91a1a}6$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 39.0600μs 14.2166μs 70.3403 KOps/s 70.7641 KOps/s $\color{#d91a1a}-0.60\%$
test_plain_set_stack_nested 35.9000μs 14.7375μs 67.8542 KOps/s 70.0937 KOps/s $\color{#d91a1a}-3.20\%$
test_plain_set_nested_inplace 45.9710μs 16.0765μs 62.2024 KOps/s 63.2244 KOps/s $\color{#d91a1a}-1.62\%$
test_plain_set_stack_nested_inplace 47.4410μs 15.8831μs 62.9600 KOps/s 63.9427 KOps/s $\color{#d91a1a}-1.54\%$
test_items 36.6410μs 5.5635μs 179.7430 KOps/s 184.0275 KOps/s $\color{#d91a1a}-2.33\%$
test_items_nested 0.5310ms 0.4454ms 2.2453 KOps/s 2.2974 KOps/s $\color{#d91a1a}-2.27\%$
test_items_nested_locked 0.5197ms 0.4471ms 2.2366 KOps/s 2.2678 KOps/s $\color{#d91a1a}-1.38\%$
test_items_nested_leaf 0.1334ms 91.9320μs 10.8776 KOps/s 10.8973 KOps/s $\color{#d91a1a}-0.18\%$
test_items_stack_nested 0.5125ms 0.4462ms 2.2410 KOps/s 2.2885 KOps/s $\color{#d91a1a}-2.07\%$
test_items_stack_nested_leaf 0.1307ms 92.7252μs 10.7846 KOps/s 10.8924 KOps/s $\color{#d91a1a}-0.99\%$
test_items_stack_nested_locked 0.5215ms 0.4498ms 2.2230 KOps/s 2.2709 KOps/s $\color{#d91a1a}-2.11\%$
test_keys 29.9400μs 4.1416μs 241.4531 KOps/s 241.3472 KOps/s $\color{#35bf28}+0.04\%$
test_keys_nested 0.1653ms 0.1200ms 8.3348 KOps/s 8.6154 KOps/s $\color{#d91a1a}-3.26\%$
test_keys_nested_locked 0.6222ms 0.1282ms 7.7976 KOps/s 7.9738 KOps/s $\color{#d91a1a}-2.21\%$
test_keys_nested_leaf 0.1539ms 0.1100ms 9.0915 KOps/s 9.3478 KOps/s $\color{#d91a1a}-2.74\%$
test_keys_stack_nested 0.1723ms 0.1202ms 8.3211 KOps/s 8.5863 KOps/s $\color{#d91a1a}-3.09\%$
test_keys_stack_nested_leaf 0.1994ms 0.1092ms 9.1556 KOps/s 9.3639 KOps/s $\color{#d91a1a}-2.22\%$
test_keys_stack_nested_locked 0.1759ms 0.1271ms 7.8689 KOps/s 7.9958 KOps/s $\color{#d91a1a}-1.59\%$
test_values 6.6662μs 1.0025μs 997.4911 KOps/s 1.0024 MOps/s $\color{#d91a1a}-0.49\%$
test_values_nested 96.1720μs 47.0412μs 21.2580 KOps/s 21.6882 KOps/s $\color{#d91a1a}-1.98\%$
test_values_nested_locked 94.9320μs 50.0234μs 19.9906 KOps/s 20.0559 KOps/s $\color{#d91a1a}-0.33\%$
test_values_nested_leaf 0.1309ms 52.9994μs 18.8682 KOps/s 19.1661 KOps/s $\color{#d91a1a}-1.55\%$
test_values_stack_nested 81.1220μs 46.7152μs 21.4063 KOps/s 21.5998 KOps/s $\color{#d91a1a}-0.90\%$
test_values_stack_nested_leaf 0.1107ms 53.7329μs 18.6106 KOps/s 19.2423 KOps/s $\color{#d91a1a}-3.28\%$
test_values_stack_nested_locked 0.1039ms 50.3239μs 19.8713 KOps/s 20.1424 KOps/s $\color{#d91a1a}-1.35\%$
test_membership 18.0910μs 0.9306μs 1.0746 MOps/s 1.2240 MOps/s $\textbf{\color{#d91a1a}-12.21\%}$
test_membership_nested 38.2910μs 2.7216μs 367.4351 KOps/s 362.1986 KOps/s $\color{#35bf28}+1.45\%$
test_membership_nested_leaf 37.4910μs 2.7455μs 364.2276 KOps/s 372.2336 KOps/s $\color{#d91a1a}-2.15\%$
test_membership_stacked_nested 31.6910μs 2.7619μs 362.0674 KOps/s 360.5059 KOps/s $\color{#35bf28}+0.43\%$
test_membership_stacked_nested_leaf 34.6310μs 2.7645μs 361.7265 KOps/s 362.5720 KOps/s $\color{#d91a1a}-0.23\%$
test_membership_nested_last 40.1310μs 4.1431μs 241.3660 KOps/s 241.8741 KOps/s $\color{#d91a1a}-0.21\%$
test_membership_nested_leaf_last 67.4310μs 4.0790μs 245.1577 KOps/s 243.3582 KOps/s $\color{#35bf28}+0.74\%$
test_membership_stacked_nested_last 33.9600μs 4.1495μs 240.9921 KOps/s 243.5863 KOps/s $\color{#d91a1a}-1.06\%$
test_membership_stacked_nested_leaf_last 32.3600μs 4.0713μs 245.6206 KOps/s 241.4491 KOps/s $\color{#35bf28}+1.73\%$
test_nested_getleaf 56.6910μs 20.5394μs 48.6869 KOps/s 47.9955 KOps/s $\color{#35bf28}+1.44\%$
test_nested_get 53.7610μs 18.9257μs 52.8382 KOps/s 50.7886 KOps/s $\color{#35bf28}+4.04\%$
test_stacked_getleaf 56.2610μs 20.4595μs 48.8770 KOps/s 48.6143 KOps/s $\color{#35bf28}+0.54\%$
test_stacked_get 48.1510μs 19.3513μs 51.6760 KOps/s 51.0617 KOps/s $\color{#35bf28}+1.20\%$
test_nested_getitemleaf 54.9510μs 20.9285μs 47.7817 KOps/s 47.5349 KOps/s $\color{#35bf28}+0.52\%$
test_nested_getitem 67.4310μs 19.4991μs 51.2844 KOps/s 49.6580 KOps/s $\color{#35bf28}+3.28\%$
test_stacked_getitemleaf 55.8310μs 21.0248μs 47.5628 KOps/s 47.5217 KOps/s $\color{#35bf28}+0.09\%$
test_stacked_getitem 68.4710μs 19.9166μs 50.2093 KOps/s 49.5984 KOps/s $\color{#35bf28}+1.23\%$
test_lock_nested 8.2480ms 0.4585ms 2.1811 KOps/s 2.1614 KOps/s $\color{#35bf28}+0.91\%$
test_lock_stack_nested 0.5518ms 0.4591ms 2.1782 KOps/s 2.1566 KOps/s $\color{#35bf28}+1.00\%$
test_unlock_nested 0.7058ms 0.3668ms 2.7261 KOps/s 2.7175 KOps/s $\color{#35bf28}+0.32\%$
test_unlock_stack_nested 0.5048ms 0.3710ms 2.6952 KOps/s 2.6760 KOps/s $\color{#35bf28}+0.72\%$
test_flatten_speed 0.2695ms 0.1153ms 8.6715 KOps/s 8.5136 KOps/s $\color{#35bf28}+1.85\%$
test_unflatten_speed 0.6109ms 0.5468ms 1.8287 KOps/s 1.8281 KOps/s $\color{#35bf28}+0.03\%$
test_common_ops 0.7928ms 0.6589ms 1.5176 KOps/s 1.4231 KOps/s $\textbf{\color{#35bf28}+6.64\%}$
test_creation 71.6910μs 2.7421μs 364.6899 KOps/s 364.7682 KOps/s $\color{#d91a1a}-0.02\%$
test_creation_empty 33.7810μs 5.7643μs 173.4829 KOps/s 174.4172 KOps/s $\color{#d91a1a}-0.54\%$
test_creation_nested_1 44.7510μs 9.9507μs 100.4953 KOps/s 100.8636 KOps/s $\color{#d91a1a}-0.37\%$
test_creation_nested_2 46.2110μs 11.1050μs 90.0499 KOps/s 89.8189 KOps/s $\color{#35bf28}+0.26\%$
test_creation_many_keys[10] 58.8010μs 16.9923μs 58.8503 KOps/s 59.2977 KOps/s $\color{#d91a1a}-0.75\%$
test_creation_many_keys[50] 0.1056ms 72.8090μs 13.7346 KOps/s 13.9732 KOps/s $\color{#d91a1a}-1.71\%$
test_creation_many_keys[100] 0.2010ms 0.1369ms 7.3026 KOps/s 7.1235 KOps/s $\color{#35bf28}+2.51\%$
test_creation_nested_many_keys[10] 78.5910μs 36.8536μs 27.1344 KOps/s 27.4960 KOps/s $\color{#d91a1a}-1.32\%$
test_creation_nested_many_keys[50] 0.2035ms 0.1490ms 6.7135 KOps/s 6.7784 KOps/s $\color{#d91a1a}-0.96\%$
test_clone 63.5410μs 12.8129μs 78.0462 KOps/s 77.3519 KOps/s $\color{#35bf28}+0.90\%$
test_getitem[int] 1.7111ms 14.1395μs 70.7239 KOps/s 55.8188 KOps/s $\textbf{\color{#35bf28}+26.70\%}$
test_getitem[slice_int] 0.1938ms 23.8482μs 41.9318 KOps/s 42.5531 KOps/s $\color{#d91a1a}-1.46\%$
test_getitem[range] 0.2037ms 59.5374μs 16.7962 KOps/s 16.4190 KOps/s $\color{#35bf28}+2.30\%$
test_getitem[tuple] 0.2455ms 22.7430μs 43.9696 KOps/s 42.7556 KOps/s $\color{#35bf28}+2.84\%$
test_getitem[list] 0.2071ms 55.5175μs 18.0123 KOps/s 16.6047 KOps/s $\textbf{\color{#35bf28}+8.48\%}$
test_setitem_dim[int] 43.8110μs 24.8536μs 40.2356 KOps/s 36.8024 KOps/s $\textbf{\color{#35bf28}+9.33\%}$
test_setitem_dim[slice_int] 65.3610μs 42.4331μs 23.5665 KOps/s 22.4107 KOps/s $\textbf{\color{#35bf28}+5.16\%}$
test_setitem_dim[range] 0.1313ms 90.0593μs 11.1038 KOps/s 10.9109 KOps/s $\color{#35bf28}+1.77\%$
test_setitem_dim[tuple] 58.4010μs 38.6158μs 25.8961 KOps/s 24.2947 KOps/s $\textbf{\color{#35bf28}+6.59\%}$
test_setitem 59.4510μs 16.5192μs 60.5358 KOps/s 52.2149 KOps/s $\textbf{\color{#35bf28}+15.94\%}$
test_set 55.1610μs 15.9037μs 62.8783 KOps/s 56.7109 KOps/s $\textbf{\color{#35bf28}+10.88\%}$
test_set_shared 0.5527ms 0.2026ms 4.9366 KOps/s 4.6995 KOps/s $\textbf{\color{#35bf28}+5.05\%}$
test_update 0.3547ms 20.7309μs 48.2371 KOps/s 43.7645 KOps/s $\textbf{\color{#35bf28}+10.22\%}$
test_update_nested 78.9120μs 31.9585μs 31.2906 KOps/s 28.8728 KOps/s $\textbf{\color{#35bf28}+8.37\%}$
test_update__nested 0.5284ms 33.3361μs 29.9975 KOps/s 29.8044 KOps/s $\color{#35bf28}+0.65\%$
test_set_nested 85.3710μs 17.7638μs 56.2942 KOps/s 49.5968 KOps/s $\textbf{\color{#35bf28}+13.50\%}$
test_set_nested_new 68.0310μs 23.2031μs 43.0976 KOps/s 39.2797 KOps/s $\textbf{\color{#35bf28}+9.72\%}$
test_select 0.1088ms 39.8834μs 25.0731 KOps/s 23.3423 KOps/s $\textbf{\color{#35bf28}+7.41\%}$
test_select_nested 0.1089ms 70.3482μs 14.2150 KOps/s 14.1073 KOps/s $\color{#35bf28}+0.76\%$
test_exclude_nested 0.1217ms 86.8919μs 11.5085 KOps/s 11.8183 KOps/s $\color{#d91a1a}-2.62\%$
test_empty[True] 0.7841ms 0.3770ms 2.6524 KOps/s 2.7035 KOps/s $\color{#d91a1a}-1.89\%$
test_empty[False] 9.7603μs 1.2693μs 787.8492 KOps/s 800.1885 KOps/s $\color{#d91a1a}-1.54\%$
test_to 0.1010ms 69.1413μs 14.4631 KOps/s 13.8873 KOps/s $\color{#35bf28}+4.15\%$
test_to_nonblocking 0.2182ms 64.1616μs 15.5856 KOps/s 16.0661 KOps/s $\color{#d91a1a}-2.99\%$
test_unbind_speed 0.4373ms 0.3110ms 3.2150 KOps/s 3.1957 KOps/s $\color{#35bf28}+0.60\%$
test_unbind_speed_stack0 0.3707ms 0.3097ms 3.2292 KOps/s 3.2401 KOps/s $\color{#d91a1a}-0.34\%$
test_unbind_speed_stack1 0.1025s 0.8865ms 1.1280 KOps/s 1.1273 KOps/s $\color{#35bf28}+0.07\%$
test_split 1.1797ms 1.0986ms 910.2843 Ops/s 923.1877 Ops/s $\color{#d91a1a}-1.40\%$
test_chunk 0.1029s 1.1723ms 853.0100 Ops/s 783.1420 Ops/s $\textbf{\color{#35bf28}+8.92\%}$
test_to_cpu_blocking 28.2570ms 28.0575ms 35.6411 Ops/s 35.4599 Ops/s $\color{#35bf28}+0.51\%$
test_to_cpu_global_sync 11.0551ms 10.9448ms 91.3674 Ops/s 91.7974 Ops/s $\color{#d91a1a}-0.47\%$
test_to_cpu_event_sync 0.1144s 13.1801ms 75.8717 Ops/s 83.9700 Ops/s $\textbf{\color{#d91a1a}-9.64\%}$
test_to_cpu_default 12.2633ms 11.9533ms 83.6586 Ops/s 75.6447 Ops/s $\textbf{\color{#35bf28}+10.59\%}$
test_consolidate[False-None] 4.0979ms 3.9319ms 254.3296 Ops/s 251.4071 Ops/s $\color{#35bf28}+1.16\%$
test_consolidate[default-None] 2.3312ms 1.9253ms 519.3936 Ops/s 503.1363 Ops/s $\color{#35bf28}+3.23\%$
test_consolidate[reduce-overhead-None] 1.9841ms 1.8556ms 538.9126 Ops/s 522.7708 Ops/s $\color{#35bf28}+3.09\%$
test_consolidate_njt[False-None] 8.3678ms 8.1812ms 122.2310 Ops/s 121.1437 Ops/s $\color{#35bf28}+0.90\%$
test_to[False-False-None] 2.1855ms 2.0023ms 499.4241 Ops/s 488.1182 Ops/s $\color{#35bf28}+2.32\%$
test_to[True-False-None] 2.0597ms 1.8794ms 532.0744 Ops/s 525.2560 Ops/s $\color{#35bf28}+1.30\%$
test_to[within-False-None] 6.0681ms 5.9280ms 168.6901 Ops/s 167.1774 Ops/s $\color{#35bf28}+0.90\%$
test_to[True-default-None] 0.1760s 8.5753ms 116.6136 Ops/s 131.0533 Ops/s $\textbf{\color{#d91a1a}-11.02\%}$
test_to_njt[False-False-None] 8.4461ms 8.3291ms 120.0606 Ops/s 119.7455 Ops/s $\color{#35bf28}+0.26\%$
test_to_njt[True-False-None] 6.9066ms 6.7809ms 147.4722 Ops/s 147.3904 Ops/s $\color{#35bf28}+0.06\%$
test_to_njt[within-False-None] 15.3354ms 15.0876ms 66.2797 Ops/s 65.8415 Ops/s $\color{#35bf28}+0.67\%$
test_creation[device0] 0.2864ms 0.1140ms 8.7720 KOps/s 8.6597 KOps/s $\color{#35bf28}+1.30\%$
test_creation_from_tensor 0.4007ms 0.1121ms 8.9208 KOps/s 8.7670 KOps/s $\color{#35bf28}+1.75\%$
test_add_one[memmap_tensor0] 0.2191ms 6.3129μs 158.4058 KOps/s 153.8127 KOps/s $\color{#35bf28}+2.99\%$
test_contiguous[memmap_tensor0] 13.8700μs 0.5958μs 1.6785 MOps/s 2.2997 MOps/s $\textbf{\color{#d91a1a}-27.01\%}$
test_stack[memmap_tensor0] 87.6110μs 4.3416μs 230.3324 KOps/s 216.1956 KOps/s $\textbf{\color{#35bf28}+6.54\%}$
test_memmaptd_index 1.0963ms 0.2517ms 3.9724 KOps/s 3.9188 KOps/s $\color{#35bf28}+1.37\%$
test_memmaptd_index_astensor 0.7672ms 0.3418ms 2.9260 KOps/s 2.9117 KOps/s $\color{#35bf28}+0.49\%$
test_memmaptd_index_op 1.0082ms 0.5743ms 1.7412 KOps/s 1.6836 KOps/s $\color{#35bf28}+3.42\%$
test_serialize_model 0.1389s 0.1362s 7.3428 Ops/s 7.3033 Ops/s $\color{#35bf28}+0.54\%$
test_serialize_model_pickle 1.8803s 1.3754s 0.7270 Ops/s 0.8265 Ops/s $\textbf{\color{#d91a1a}-12.04\%}$
test_serialize_weights 0.1374s 0.1357s 7.3675 Ops/s 7.4459 Ops/s $\color{#d91a1a}-1.05\%$
test_serialize_weights_returnearly 0.3927s 89.3600ms 11.1907 Ops/s 10.8933 Ops/s $\color{#35bf28}+2.73\%$
test_serialize_weights_pickle 1.3675s 1.1980s 0.8347 Ops/s 0.8228 Ops/s $\color{#35bf28}+1.44\%$
test_reshape_pytree 0.2150ms 32.1711μs 31.0838 KOps/s 31.6166 KOps/s $\color{#d91a1a}-1.69\%$
test_reshape_td 0.1620ms 43.6420μs 22.9137 KOps/s 23.4757 KOps/s $\color{#d91a1a}-2.39\%$
test_view_pytree 0.2293ms 31.2462μs 32.0039 KOps/s 31.8227 KOps/s $\color{#35bf28}+0.57\%$
test_view_td 94.8410μs 49.5472μs 20.1828 KOps/s 19.9872 KOps/s $\color{#35bf28}+0.98\%$
test_unbind_pytree 0.2388ms 35.5454μs 28.1331 KOps/s 28.6575 KOps/s $\color{#d91a1a}-1.83\%$
test_unbind_td 0.1083ms 46.7531μs 21.3890 KOps/s 21.6590 KOps/s $\color{#d91a1a}-1.25\%$
test_split_pytree 0.2185ms 40.3446μs 24.7865 KOps/s 25.0005 KOps/s $\color{#d91a1a}-0.86\%$
test_split_td 0.1835ms 62.4153μs 16.0217 KOps/s 16.2064 KOps/s $\color{#d91a1a}-1.14\%$
test_add_pytree 0.1926ms 40.5479μs 24.6622 KOps/s 24.7956 KOps/s $\color{#d91a1a}-0.54\%$
test_add_td 0.2041ms 50.0554μs 19.9779 KOps/s 19.2763 KOps/s $\color{#35bf28}+3.64\%$
test_compile_add_one_nested[tensordict-compile] 0.1878ms 0.1353ms 7.3916 KOps/s 7.1890 KOps/s $\color{#35bf28}+2.82\%$
test_compile_add_one_nested[tensordict-eager] 0.4883ms 0.1888ms 5.2969 KOps/s 5.4047 KOps/s $\color{#d91a1a}-2.00\%$
test_compile_add_one_nested[pytree-compile] 0.1596ms 0.1063ms 9.4094 KOps/s 9.2349 KOps/s $\color{#35bf28}+1.89\%$
test_compile_add_one_nested[pytree-eager] 0.4332ms 0.1748ms 5.7221 KOps/s 5.6905 KOps/s $\color{#35bf28}+0.55\%$
test_compile_copy_nested[tensordict-compile] 89.1820μs 30.8034μs 32.4639 KOps/s 33.2705 KOps/s $\color{#d91a1a}-2.42\%$
test_compile_copy_nested[tensordict-eager] 82.5820μs 49.6635μs 20.1355 KOps/s 20.1165 KOps/s $\color{#35bf28}+0.09\%$
test_compile_copy_nested[pytree-compile] 44.9710μs 9.4944μs 105.3251 KOps/s 104.6554 KOps/s $\color{#35bf28}+0.64\%$
test_compile_copy_nested[pytree-eager] 0.4612ms 65.8688μs 15.1817 KOps/s 15.3888 KOps/s $\color{#d91a1a}-1.35\%$
test_compile_add_one_flat[tensordict-compile] 0.2424ms 0.1727ms 5.7897 KOps/s 3.7195 KOps/s $\textbf{\color{#35bf28}+55.66\%}$
test_compile_add_one_flat[tensordict-eager] 0.3456ms 0.2505ms 3.9917 KOps/s 3.9066 KOps/s $\color{#35bf28}+2.18\%$
test_compile_add_one_flat[tensorclass-compile] 0.1915ms 0.1127ms 8.8753 KOps/s 8.4071 KOps/s $\textbf{\color{#35bf28}+5.57\%}$
test_compile_add_one_flat[tensorclass-eager] 0.1637ms 67.6665μs 14.7784 KOps/s 14.6184 KOps/s $\color{#35bf28}+1.09\%$
test_compile_add_one_flat[pytree-compile] 0.1997ms 0.1559ms 6.4151 KOps/s 6.1705 KOps/s $\color{#35bf28}+3.96\%$
test_compile_add_one_flat[pytree-eager] 0.8061ms 0.5160ms 1.9379 KOps/s 1.8732 KOps/s $\color{#35bf28}+3.46\%$
test_compile_add_self_flat[tensordict-eager] 0.4669ms 0.3036ms 3.2937 KOps/s 3.2313 KOps/s $\color{#35bf28}+1.93\%$
test_compile_add_self_flat[tensordict-compile] 0.3231ms 0.1777ms 5.6288 KOps/s 5.3611 KOps/s $\color{#35bf28}+4.99\%$
test_compile_add_self_flat[tensorclass-eager] 0.1270ms 83.0115μs 12.0465 KOps/s 11.9420 KOps/s $\color{#35bf28}+0.88\%$
test_compile_add_self_flat[tensorclass-compile] 0.1614ms 0.1159ms 8.6271 KOps/s 8.3257 KOps/s $\color{#35bf28}+3.62\%$
test_compile_add_self_flat[pytree-eager] 0.6390ms 0.4238ms 2.3594 KOps/s 2.2469 KOps/s $\textbf{\color{#35bf28}+5.01\%}$
test_compile_add_self_flat[pytree-compile] 0.3078ms 0.1575ms 6.3493 KOps/s 6.3362 KOps/s $\color{#35bf28}+0.21\%$
test_compile_copy_flat[tensordict-compile] 0.1446ms 24.1745μs 41.3659 KOps/s 40.8725 KOps/s $\color{#35bf28}+1.21\%$
test_compile_copy_flat[tensordict-eager] 83.0620μs 40.4502μs 24.7218 KOps/s 25.1112 KOps/s $\color{#d91a1a}-1.55\%$
test_compile_copy_flat[pytree-compile] 44.6810μs 10.4272μs 95.9030 KOps/s 95.6917 KOps/s $\color{#35bf28}+0.22\%$
test_compile_copy_flat[pytree-eager] 0.4132ms 51.3126μs 19.4884 KOps/s 19.4970 KOps/s $\color{#d91a1a}-0.04\%$
test_compile_assign_and_add[tensordict-compile] 1.9465ms 0.1686ms 5.9325 KOps/s 5.7717 KOps/s $\color{#35bf28}+2.79\%$
test_compile_assign_and_add[tensordict-eager] 3.3166ms 3.2268ms 309.9053 Ops/s 307.7682 Ops/s $\color{#35bf28}+0.69\%$
test_compile_assign_and_add[pytree-compile] 1.9317ms 0.1569ms 6.3748 KOps/s 6.2523 KOps/s $\color{#35bf28}+1.96\%$
test_compile_assign_and_add[pytree-eager] 2.8836ms 2.7538ms 363.1330 Ops/s 343.1867 Ops/s $\textbf{\color{#35bf28}+5.81\%}$
test_compile_indexing[tensor-tensordict-compile] 0.1427ms 0.1049ms 9.5359 KOps/s 9.2796 KOps/s $\color{#35bf28}+2.76\%$
test_compile_indexing[tensor-tensordict-eager] 0.3068ms 73.8455μs 13.5418 KOps/s 13.2200 KOps/s $\color{#35bf28}+2.43\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1435ms 93.1412μs 10.7364 KOps/s 10.5678 KOps/s $\color{#35bf28}+1.60\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2649ms 43.7299μs 22.8676 KOps/s 22.6228 KOps/s $\color{#35bf28}+1.08\%$
test_compile_indexing[tensor-pytree-compile] 0.1294ms 93.3928μs 10.7075 KOps/s 10.4530 KOps/s $\color{#35bf28}+2.43\%$
test_compile_indexing[tensor-pytree-eager] 0.2488ms 45.8396μs 21.8152 KOps/s 22.5805 KOps/s $\color{#d91a1a}-3.39\%$
test_compile_indexing[slice-tensordict-compile] 0.1551ms 56.6657μs 17.6474 KOps/s 17.8835 KOps/s $\color{#d91a1a}-1.32\%$
test_compile_indexing[slice-tensordict-eager] 0.2110ms 26.9079μs 37.1638 KOps/s 36.4170 KOps/s $\color{#35bf28}+2.05\%$
test_compile_indexing[slice-tensorclass-compile] 0.1760ms 44.7038μs 22.3695 KOps/s 22.1174 KOps/s $\color{#35bf28}+1.14\%$
test_compile_indexing[slice-tensorclass-eager] 0.2583ms 21.8515μs 45.7633 KOps/s 46.0050 KOps/s $\color{#d91a1a}-0.53\%$
test_compile_indexing[slice-pytree-compile] 85.2420μs 44.3579μs 22.5439 KOps/s 22.2674 KOps/s $\color{#35bf28}+1.24\%$
test_compile_indexing[slice-pytree-eager] 0.2603ms 21.5609μs 46.3803 KOps/s 45.9606 KOps/s $\color{#35bf28}+0.91\%$
test_compile_indexing[int-tensordict-compile] 0.1701ms 55.9667μs 17.8678 KOps/s 18.0007 KOps/s $\color{#d91a1a}-0.74\%$
test_compile_indexing[int-tensordict-eager] 0.2503ms 26.5312μs 37.6915 KOps/s 37.3566 KOps/s $\color{#35bf28}+0.90\%$
test_compile_indexing[int-tensorclass-compile] 84.7820μs 44.0286μs 22.7125 KOps/s 21.7658 KOps/s $\color{#35bf28}+4.35\%$
test_compile_indexing[int-tensorclass-eager] 0.2586ms 21.7088μs 46.0643 KOps/s 45.8554 KOps/s $\color{#35bf28}+0.46\%$
test_compile_indexing[int-pytree-compile] 0.1173ms 44.7329μs 22.3549 KOps/s 22.0467 KOps/s $\color{#35bf28}+1.40\%$
test_compile_indexing[int-pytree-eager] 0.2629ms 21.5707μs 46.3591 KOps/s 45.6702 KOps/s $\color{#35bf28}+1.51\%$
test_mod_add[eager] 0.1966ms 49.2430μs 20.3074 KOps/s 20.3599 KOps/s $\color{#d91a1a}-0.26\%$
test_mod_add[compile] 0.2219ms 0.1009ms 9.9132 KOps/s 9.5363 KOps/s $\color{#35bf28}+3.95\%$
test_mod_add[compile-overhead] 0.2301ms 0.1444ms 6.9249 KOps/s 6.8508 KOps/s $\color{#35bf28}+1.08\%$
test_mod_wrap[eager] 0.4396ms 0.2939ms 3.4022 KOps/s 3.4457 KOps/s $\color{#d91a1a}-1.26\%$
test_mod_wrap[compile] 0.4224ms 0.3376ms 2.9623 KOps/s 2.8449 KOps/s $\color{#35bf28}+4.13\%$
test_mod_wrap[compile-overhead] 6.7072ms 3.6869ms 271.2316 Ops/s 253.0873 Ops/s $\textbf{\color{#35bf28}+7.17\%}$
test_mod_wrap_and_backward[eager] 1.6892ms 1.4875ms 672.2851 Ops/s 670.6033 Ops/s $\color{#35bf28}+0.25\%$
test_mod_wrap_and_backward[compile] 1.6164ms 1.4187ms 704.8871 Ops/s 695.3365 Ops/s $\color{#35bf28}+1.37\%$
test_mod_wrap_and_backward[compile-overhead] 1.6950ms 0.8720ms 1.1468 KOps/s 1.1240 KOps/s $\color{#35bf28}+2.03\%$
test_seq_add[eager] 0.2136ms 0.1511ms 6.6160 KOps/s 6.5076 KOps/s $\color{#35bf28}+1.67\%$
test_seq_add[compile] 0.2630ms 0.1161ms 8.6123 KOps/s 8.6970 KOps/s $\color{#d91a1a}-0.97\%$
test_seq_add[compile-overhead] 0.2830ms 0.1511ms 6.6191 KOps/s 6.4851 KOps/s $\color{#35bf28}+2.07\%$
test_seq_wrap[eager] 0.8063ms 0.5196ms 1.9246 KOps/s 1.8787 KOps/s $\color{#35bf28}+2.44\%$
test_seq_wrap[compile] 0.4420ms 0.3708ms 2.6968 KOps/s 2.6714 KOps/s $\color{#35bf28}+0.95\%$
test_seq_wrap[compile-overhead] 0.3759ms 0.2582ms 3.8732 KOps/s 3.8191 KOps/s $\color{#35bf28}+1.42\%$
test_func_call_runtime[False-eager] 0.8819ms 0.8181ms 1.2223 KOps/s 1.1991 KOps/s $\color{#35bf28}+1.94\%$
test_func_call_runtime[False-compile] 1.0857ms 0.8817ms 1.1342 KOps/s 1.1100 KOps/s $\color{#35bf28}+2.18\%$
test_func_call_runtime[False-compile-overhead] 0.4972ms 0.4408ms 2.2684 KOps/s 2.2454 KOps/s $\color{#35bf28}+1.02\%$
test_func_call_runtime[True-eager] 1.2692ms 1.0726ms 932.3468 Ops/s 931.7976 Ops/s $\color{#35bf28}+0.06\%$
test_func_call_runtime[True-compile] 0.9639ms 0.8934ms 1.1193 KOps/s 1.0999 KOps/s $\color{#35bf28}+1.76\%$
test_func_call_runtime[True-compile-overhead] 0.5088ms 0.4561ms 2.1924 KOps/s 2.1704 KOps/s $\color{#35bf28}+1.01\%$
test_func_call_cm_runtime[False-eager] 0.8837ms 0.8188ms 1.2213 KOps/s 1.1526 KOps/s $\textbf{\color{#35bf28}+5.96\%}$
test_func_call_cm_runtime[False-compile] 1.1224ms 0.8858ms 1.1290 KOps/s 1.0904 KOps/s $\color{#35bf28}+3.54\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5481ms 0.4449ms 2.2476 KOps/s 2.2259 KOps/s $\color{#35bf28}+0.98\%$
test_func_call_cm_runtime[True-eager] 1.3124ms 1.1950ms 836.8100 Ops/s 821.3976 Ops/s $\color{#35bf28}+1.88\%$
test_func_call_cm_runtime[True-compile] 1.0991ms 0.9254ms 1.0806 KOps/s 1.0306 KOps/s $\color{#35bf28}+4.85\%$
test_func_call_cm_runtime[True-compile-overhead] 0.6345ms 0.4873ms 2.0523 KOps/s 2.0317 KOps/s $\color{#35bf28}+1.02\%$
test_vmap_func_call_cm_runtime[eager] 2.8246ms 2.3020ms 434.3993 Ops/s 424.3272 Ops/s $\color{#35bf28}+2.37\%$
test_vmap_func_call_cm_runtime[compile] 1.0089ms 0.9418ms 1.0618 KOps/s 1.0412 KOps/s $\color{#35bf28}+1.98\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5705ms 0.4934ms 2.0268 KOps/s 1.9985 KOps/s $\color{#35bf28}+1.42\%$
test_distributed 0.5652ms 0.1516ms 6.5970 KOps/s 6.3745 KOps/s $\color{#35bf28}+3.49\%$
test_tdmodule 53.7210μs 27.4960μs 36.3689 KOps/s 35.6232 KOps/s $\color{#35bf28}+2.09\%$
test_tdmodule_dispatch 75.2520μs 44.2864μs 22.5803 KOps/s 22.3335 KOps/s $\color{#35bf28}+1.11\%$
test_tdseq 0.1441ms 27.5425μs 36.3075 KOps/s 38.0364 KOps/s $\color{#d91a1a}-4.55\%$
test_tdseq_dispatch 67.3410μs 47.4045μs 21.0950 KOps/s 21.4530 KOps/s $\color{#d91a1a}-1.67\%$
test_instantiation_functorch 2.0650ms 1.9476ms 513.4548 Ops/s 499.1947 Ops/s $\color{#35bf28}+2.86\%$
test_exec_functorch 0.2269ms 0.1727ms 5.7891 KOps/s 5.6000 KOps/s $\color{#35bf28}+3.38\%$
test_exec_functional_call 0.2075ms 0.1546ms 6.4673 KOps/s 6.2661 KOps/s $\color{#35bf28}+3.21\%$
test_exec_td_decorator 0.4331ms 0.2267ms 4.4109 KOps/s 4.2955 KOps/s $\color{#35bf28}+2.69\%$
test_vmap_mlp_speed_decorator[True-True] 0.9998ms 0.8042ms 1.2434 KOps/s 1.2246 KOps/s $\color{#35bf28}+1.54\%$
test_vmap_mlp_speed_decorator[True-False] 0.9856ms 0.8080ms 1.2377 KOps/s 1.2146 KOps/s $\color{#35bf28}+1.90\%$
test_vmap_mlp_speed_decorator[False-True] 0.8688ms 0.6972ms 1.4344 KOps/s 1.3989 KOps/s $\color{#35bf28}+2.53\%$
test_vmap_mlp_speed_decorator[False-False] 0.8746ms 0.6981ms 1.4325 KOps/s 1.3924 KOps/s $\color{#35bf28}+2.88\%$
test_vmap_transformer_speed_decorator[True-True] 20.3068ms 20.1310ms 49.6745 Ops/s 48.4874 Ops/s $\color{#35bf28}+2.45\%$
test_vmap_transformer_speed_decorator[True-False] 20.8677ms 20.1686ms 49.5819 Ops/s 48.7305 Ops/s $\color{#35bf28}+1.75\%$
test_vmap_transformer_speed_decorator[False-True] 20.3886ms 19.9515ms 50.1216 Ops/s 49.1416 Ops/s $\color{#35bf28}+1.99\%$
test_vmap_transformer_speed_decorator[False-False] 20.6232ms 19.9686ms 50.0787 Ops/s 49.2046 Ops/s $\color{#35bf28}+1.78\%$
test_to_module_speed[True] 2.0026ms 1.4114ms 708.5092 Ops/s 710.1866 Ops/s $\color{#d91a1a}-0.24\%$
test_to_module_speed[False] 1.9015ms 1.3784ms 725.4721 Ops/s 708.2282 Ops/s $\color{#35bf28}+2.43\%$
test_tc_init 87.6910μs 43.8329μs 22.8139 KOps/s 22.7025 KOps/s $\color{#35bf28}+0.49\%$
test_tc_init_tensor_only 31.4300μs 9.3070μs 107.4456 KOps/s 107.0924 KOps/s $\color{#35bf28}+0.33\%$
test_tc_init_nested 0.1280ms 87.5069μs 11.4277 KOps/s 11.3360 KOps/s $\color{#35bf28}+0.81\%$
test_tc_init_many_fields 61.3410μs 15.4993μs 64.5191 KOps/s 62.6369 KOps/s $\color{#35bf28}+3.00\%$
test_tc_first_layer_tensor 22.5200μs 1.7320μs 577.3576 KOps/s 574.8230 KOps/s $\color{#35bf28}+0.44\%$
test_tc_first_layer_tensor_only 4.4101μs 0.6968μs 1.4352 MOps/s 1.3920 MOps/s $\color{#35bf28}+3.10\%$
test_tc_first_layer_tensor_set 23.7510μs 3.7153μs 269.1589 KOps/s 268.3424 KOps/s $\color{#35bf28}+0.30\%$
test_tc_first_layer_tensor_only_set 28.5300μs 2.9776μs 335.8398 KOps/s 334.2563 KOps/s $\color{#35bf28}+0.47\%$
test_tc_first_layer_nontensor 46.0010μs 5.7485μs 173.9590 KOps/s 171.9706 KOps/s $\color{#35bf28}+1.16\%$
test_tc_second_layer_tensor 20.7500μs 4.1198μs 242.7316 KOps/s 240.6608 KOps/s $\color{#35bf28}+0.86\%$
test_tc_second_layer_nontensor 29.3300μs 8.2446μs 121.2921 KOps/s 120.5393 KOps/s $\color{#35bf28}+0.62\%$
test_unbind 0.2681s 16.0338ms 62.3682 Ops/s 57.1747 Ops/s $\textbf{\color{#35bf28}+9.08\%}$
test_full_like 4.4953ms 4.3613ms 229.2898 Ops/s 234.9400 Ops/s $\color{#d91a1a}-2.40\%$
test_zeros_like 6.2779ms 4.3680ms 228.9398 Ops/s 237.0518 Ops/s $\color{#d91a1a}-3.42\%$
test_ones_like 4.8173ms 4.3537ms 229.6883 Ops/s 229.6175 Ops/s $\color{#35bf28}+0.03\%$
test_clone 11.5841ms 9.1876ms 108.8421 Ops/s 157.2682 Ops/s $\textbf{\color{#d91a1a}-30.79\%}$
test_squeeze 0.1591ms 13.4189μs 74.5217 KOps/s 72.2869 KOps/s $\color{#35bf28}+3.09\%$
test_unsqueeze 0.2223ms 0.1062ms 9.4155 KOps/s 9.1380 KOps/s $\color{#35bf28}+3.04\%$
test_split 0.3502ms 0.1781ms 5.6144 KOps/s 5.5747 KOps/s $\color{#35bf28}+0.71\%$
test_permute 0.2479ms 0.1988ms 5.0290 KOps/s 4.9965 KOps/s $\color{#35bf28}+0.65\%$
test_stack 51.6337ms 51.2214ms 19.5231 Ops/s 19.6919 Ops/s $\color{#d91a1a}-0.86\%$
test_cat 51.4338ms 51.1567ms 19.5478 Ops/s 19.5975 Ops/s $\color{#d91a1a}-0.25\%$

vmoens and others added 2 commits February 17, 2026 09:20
…ixes

- RedisLazyStackedTensorDict[int] now returns a _RedisStackElementView
  that propagates reads/writes directly through to Redis instead of
  returning a detached TensorDict copy
- Fix redundant await-in-loop in _abatch_get_element, _abatch_get_element_keys
  and _aset_element (reuse already-fetched metadata instead of re-fetching)
- Add 5 new tests for view write-through behavior
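
A minimal sketch of the write-through behaviour described in the first bullet, contrasted with a detached copy. It uses a plain in-memory dict as a stand-in for Redis, and every name in it (`ToyStackStore`, `ToyElementView`, `detached_copy`) is hypothetical, chosen for illustration only. It is not the actual `_RedisStackElementView` implementation.

```python
class ToyStackStore:
    """Hypothetical stand-in for a remote store: one list per leaf key,
    one slot per stack element. A real backend would issue Redis commands."""

    def __init__(self, data):
        self._data = {key: list(values) for key, values in data.items()}

    def read(self, key, index):
        return self._data[key][index]

    def write(self, key, index, value):
        self._data[key][index] = value

    def __getitem__(self, index):
        # Integer indexing returns a view bound to the store, not a copy.
        return ToyElementView(self, index)

    def detached_copy(self, index):
        # The alternative behaviour: a snapshot that no longer tracks the store.
        return {key: values[index] for key, values in self._data.items()}


class ToyElementView:
    """Hypothetical element view: every read and write goes to the store."""

    def __init__(self, store, index):
        self._store = store
        self._index = index

    def __getitem__(self, key):
        return self._store.read(key, self._index)

    def __setitem__(self, key, value):
        self._store.write(key, self._index, value)


store = ToyStackStore({"obs": [1.0, 2.0, 3.0], "reward": [0.0, 0.5, 1.0]})

view = store[1]
view["reward"] = 9.0           # propagates to the store

snapshot = store.detached_copy(1)
snapshot["reward"] = -1.0      # changes only the local snapshot
```

With a view, an assignment on the indexed element is visible to every later read from the store. With a detached copy, the same assignment would be lost silently unless the caller wrote the element back.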

Co-authored-by: Cursor <cursoragent@cursor.com>
@vmoens
Collaborator Author

vmoens commented Feb 17, 2026

Closing in favour of ghstack submission

@vmoens vmoens closed this Feb 17, 2026

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
