
[Feature] Per-leaf device/dtype casting via an attrs tensordict#1678

Merged: vmoens merged 1 commit into main from td-update-device on Apr 22, 2026

vmoens (Collaborator) commented Apr 22, 2026

Summary

  • Adds TensorAttrs, a TensorClass recording per-leaf tgt_device / tgt_dtype / tgt_shape (names prefixed with tgt_ to avoid shadowing the tensordict's own device/dtype/shape). Matches PyTorch's "Tensor Attributes" vocabulary.
  • Adds TensorDictBase.attrs(fields=("device", "dtype", "shape"), num_threads=None) — returns a deviceless tensordict whose leaves are TensorAttrs. Scales cleanly: new attributes can be appended to fields without proliferating boolean kwargs.
  • Extends TensorDictBase.to() so a positional tensordict argument is interpreted as a per-leaf spec and dispatched via a new _to_per_leaf() walker. Missing keys pass through unchanged.
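The attrs() design in the first two bullets works as a walk over the leaves. Each leaf is replaced by a record that holds only the requested attributes, stored under tgt_-prefixed names. The pure-Python sketch below models that idea with nested dicts. FakeLeaf, LeafAttrs and this attrs() function are hypothetical stand-ins, not tensordict's implementation; the real method returns a deviceless tensordict whose leaves are TensorAttrs.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Tuple


@dataclass(frozen=True)
class FakeLeaf:
    """Hypothetical stand-in for a tensor leaf; only the recorded attributes exist."""

    device: str
    dtype: str
    shape: Tuple[int, ...]


@dataclass(frozen=True)
class LeafAttrs:
    """Hypothetical mirror of TensorAttrs: the tgt_ prefix keeps these fields from
    clashing with a container's own device/dtype/shape attributes."""

    tgt_device: Optional[str] = None
    tgt_dtype: Optional[str] = None
    tgt_shape: Optional[Tuple[int, ...]] = None


def attrs(tree: Mapping[str, Any], fields: Tuple[str, ...] = ("device", "dtype", "shape")) -> dict:
    """Mirror `tree`, replacing each leaf by a LeafAttrs that holds only `fields`.

    Unrequested attributes stay None. Supporting a new attribute means
    extending `fields` (and LeafAttrs), not adding another boolean keyword.
    """
    out = {}
    for key, value in tree.items():
        if isinstance(value, Mapping):
            out[key] = attrs(value, fields)  # recurse into nested sub-trees
        else:
            out[key] = LeafAttrs(**{f"tgt_{name}": getattr(value, name) for name in fields})
    return out


tree = {
    "a": FakeLeaf("cuda:0", "float32", (3,)),
    "nested": {"b": FakeLeaf("cpu", "int64", (2, 2))},
}
# Field subset: only devices are recorded; dtype/shape stay None.
print(attrs(tree, fields=("device",)))
```

A single `fields` tuple keeps the signature stable as new attributes are added, which is the scaling argument the summary makes against boolean kwargs.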

Motivation: a deviceless tensordict may hold leaves on heterogeneous devices. Before this PR, td.to(other_td) picked a single device, and there was no way to say "move each leaf to the device its counterpart lives on." Now:

td0.device                # None
td0["a"].device           # cuda:0
td0["b"].device           # cpu
td2["a"].device           # cpu
td2["b"].device           # cuda:1
td3 = td0.to(td2.attrs())      # td3["a"] on cpu, td3["b"] on cuda:1
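The snippet above relies on a per-leaf walker. It looks up each key of td0 in the spec, casts that leaf when a target is present, and passes it through unchanged when the key is missing. The stdlib sketch below models those semantics with plain dicts. FakeLeaf, LeafAttrs and to_per_leaf are hypothetical names, not the actual _to_per_leaf. It also adds an extra key "c" that is absent from the spec, to show the passthrough.

```python
from dataclasses import dataclass, replace
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class FakeLeaf:
    """Hypothetical stand-in for a tensor: just a device and a dtype."""

    device: str
    dtype: str


@dataclass(frozen=True)
class LeafAttrs:
    """Hypothetical per-leaf target; None means "leave this attribute alone"."""

    tgt_device: Optional[str] = None
    tgt_dtype: Optional[str] = None


def to_per_leaf(tree: Mapping[str, Any], spec: Mapping[str, Any]) -> dict:
    """Move/cast every leaf of `tree` to the target stored under the same key in `spec`."""
    out = {}
    for key, value in tree.items():
        target = spec.get(key)
        if isinstance(value, Mapping):
            # Nested level: recurse with the matching sub-spec (empty if absent).
            sub_spec = target if isinstance(target, Mapping) else {}
            out[key] = to_per_leaf(value, sub_spec)
        elif isinstance(target, LeafAttrs):
            out[key] = replace(
                value,
                device=value.device if target.tgt_device is None else target.tgt_device,
                dtype=value.dtype if target.tgt_dtype is None else target.tgt_dtype,
            )
        else:
            out[key] = value  # key missing from the spec: pass through unchanged
    return out


# Mirrors the scenario above: td0 is mixed-device, the spec records where each leaf should go.
td0 = {"a": FakeLeaf("cuda:0", "float32"), "b": FakeLeaf("cpu", "float32"), "c": FakeLeaf("cpu", "int64")}
spec = {"a": LeafAttrs(tgt_device="cpu"), "b": LeafAttrs(tgt_device="cuda:1")}
td3 = to_per_leaf(td0, spec)
print(td3)
```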

Async / sync semantics

  • Default (non_blocking=None): per-leaf copies are issued asynchronously, followed by a single _sync_all() at the end if any leaf actually crossed devices. This is stricter than the core to(device) path, whose sync guard only covers device-to-host (D2H) copies. The extra strictness is needed because a single per-leaf call can mix directions (cuda:0 → cpu, cpu → cuda:1, cuda:0 → cuda:1) that PyTorch won't coordinate for you.
  • non_blocking=True: fully async, with no trailing sync; synchronizing is the caller's responsibility.
  • non_blocking=False: blocking copies.
  • Dtype-only specs never sync (no device window to close).
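The four rules above reduce to a small decision table. The sketch below encodes it as a hypothetical helper, plan_copies, that only decides whether copies are async and whether a trailing sync is needed. It performs no transfers and is not the PR's code. A dtype-only cast is modeled as every leaf keeping its source device.

```python
from typing import Iterable, Optional, Tuple


def plan_copies(moves: Iterable[Tuple[str, str]], non_blocking: Optional[bool] = None) -> Tuple[bool, bool]:
    """Return (copies_are_async, trailing_sync) for one per-leaf cast.

    `moves` holds one (source_device, target_device) pair per leaf; a
    dtype-only cast has source == target for every leaf.
    """
    crossed_devices = any(src != dst for src, dst in moves)
    if non_blocking is None:
        # Default: issue all copies async, then close the window with one sync
        # only if some leaf actually changed device.
        return True, crossed_devices
    if non_blocking:
        return True, False  # fully async; synchronizing is the caller's job
    return False, False  # blocking copies are already complete on return


# Mixed directions under the default mode: async copies plus one trailing sync.
print(plan_copies([("cuda:0", "cpu"), ("cpu", "cuda:1")]))
```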

Threading

num_threads= is now accepted on both attrs() and the per-leaf to() path and is forwarded to _fast_apply. It defaults to single-threaded execution because the CPU-only path is GIL-bound: local microbenchmarks showed num_threads=4 running 3–4× slower than single-threaded on CPU. Callers should opt in only when there are real device transfers (D2H, H2D, or cross-device D2D) to overlap, which is where _fast_apply's thread pool pays off.
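The opt-in threading pattern described above can be sketched with the standard library. apply_to_leaves is a hypothetical helper, not _fast_apply. It runs sequentially when num_threads is None, and uses a ThreadPoolExecutor only when the caller asks for one, which pays off only if the per-leaf work releases the GIL (for example while waiting on device copies).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def apply_to_leaves(
    fn: Callable[[T], R], leaves: Dict[str, T], num_threads: Optional[int] = None
) -> Dict[str, R]:
    """Apply `fn` to every leaf, sequentially by default or on a thread pool.

    Sequential execution avoids pool overhead when `fn` holds the GIL; a pool
    only helps when `fn` spends its time waiting (e.g. on device transfers).
    """
    if num_threads is None or num_threads <= 1:
        return {key: fn(value) for key, value in leaves.items()}
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = {key: pool.submit(fn, value) for key, value in leaves.items()}
        # Collect in insertion order so the result mirrors the input structure.
        return {key: future.result() for key, future in futures.items()}


print(apply_to_leaves(lambda x: x * 10, {"a": 1, "b": 2}, num_threads=2))
```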

Deferred

  • non_blocking_pin=True on the per-leaf path — still raises NotImplementedError pending a follow-up that mirrors the multithreaded pin-memory walker from _to_cuda_with_pin_mem.
  • Consolidated-tensordict fast path for attrs-TD casts — falls back to the per-leaf walker for now.
  • TensorAttrs construction is ~17 µs/leaf (dominated by three NonTensorData wraps). For very wide tensordicts this adds up on attrs(); a lightweight storage path is a reasonable follow-up.
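To put the ~17 µs/leaf figure in perspective, consider a hypothetical tensordict with 10,000 leaves. This is an illustrative size, not one benchmarked in this PR:

$$
17\ \mu\text{s/leaf} \times 10^{4}\ \text{leaves} = 1.7 \times 10^{5}\ \mu\text{s} \approx 0.17\ \text{s per attrs() call}
$$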

Test plan

  • New tests in test/test_tensordict.py: attrs() basics, field-subset, nested structures, per-leaf dtype cast, missing-key passthrough, roundtrip, extra-positional rejection, non_blocking_pin rejection, sync-toggle monkeypatch, and a CUDA-gated heterogeneous-device scenario.
  • pytest test/test_tensordict.py — 7611 passed, 856 skipped.
  • pytest test/test_tensorclass.py — 143 passed, 1 skipped (streaming not installed).
  • CUDA-specific heterogeneous-device and threaded-transfer tests require a GPU runner.

🤖 Generated with Claude Code

meta-cla bot added the CLA Signed label Apr 22, 2026
github-actions bot added the documentation, CI, Test, tensorclass, Feature, and Compile labels Apr 22, 2026
github-actions bot (Contributor) commented Apr 22, 2026

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 261. Improved: 19. Worsened: 16.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 30.4000μs | 14.5112μs | 68.9124 KOps/s | 67.4541 KOps/s | +2.16% |
| test_plain_set_stack_nested | 46.1010μs | 14.5986μs | 68.4996 KOps/s | 68.5048 KOps/s | -0.01% |
| test_plain_set_nested_inplace | 50.9910μs | 16.2171μs | 61.6635 KOps/s | 61.2384 KOps/s | +0.69% |
| test_plain_set_stack_nested_inplace | 77.2310μs | 15.5032μs | 64.5027 KOps/s | 62.2711 KOps/s | +3.58% |
| test_items | 22.8610μs | 5.4583μs | 183.2075 KOps/s | 183.8782 KOps/s | -0.36% |
| test_items_nested | 0.5333ms | 0.4481ms | 2.2318 KOps/s | 2.2301 KOps/s | +0.07% |
| test_items_nested_locked | 0.5391ms | 0.4538ms | 2.2037 KOps/s | 2.2258 KOps/s | -1.00% |
| test_items_nested_leaf | 0.1255ms | 92.1751μs | 10.8489 KOps/s | 10.7168 KOps/s | +1.23% |
| test_items_stack_nested | 0.5512ms | 0.4498ms | 2.2230 KOps/s | 2.2589 KOps/s | -1.59% |
| test_items_stack_nested_leaf | 0.1324ms | 92.1550μs | 10.8513 KOps/s | 10.7914 KOps/s | +0.55% |
| test_items_stack_nested_locked | 0.5396ms | 0.4486ms | 2.2292 KOps/s | 2.2327 KOps/s | -0.16% |
| test_keys | 30.3410μs | 4.1318μs | 242.0240 KOps/s | 243.8970 KOps/s | -0.77% |
| test_keys_nested | 0.2120ms | 0.1263ms | 7.9158 KOps/s | 7.8350 KOps/s | +1.03% |
| test_keys_nested_locked | 2.3325ms | 0.1348ms | 7.4202 KOps/s | 7.3012 KOps/s | +1.63% |
| test_keys_nested_leaf | 0.1534ms | 0.1171ms | 8.5429 KOps/s | 8.4337 KOps/s | +1.30% |
| test_keys_stack_nested | 0.1771ms | 0.1274ms | 7.8500 KOps/s | 7.8144 KOps/s | +0.46% |
| test_keys_stack_nested_leaf | 0.1765ms | 0.1170ms | 8.5449 KOps/s | 8.4406 KOps/s | +1.24% |
| test_keys_stack_nested_locked | 0.2105ms | 0.1348ms | 7.4170 KOps/s | 7.3660 KOps/s | +0.69% |
| test_values | 6.7542μs | 1.0027μs | 997.2598 KOps/s | 998.7781 KOps/s | -0.15% |
| test_values_nested | 85.2610μs | 50.9512μs | 19.6266 KOps/s | 19.3964 KOps/s | +1.19% |
| test_values_nested_locked | 82.2510μs | 53.3267μs | 18.7523 KOps/s | 18.4737 KOps/s | +1.51% |
| test_values_nested_leaf | 84.6410μs | 57.8152μs | 17.2965 KOps/s | 16.9285 KOps/s | +2.17% |
| test_values_stack_nested | 83.3110μs | 50.3705μs | 19.8529 KOps/s | 19.5025 KOps/s | +1.80% |
| test_values_stack_nested_leaf | 95.3720μs | 58.0873μs | 17.2155 KOps/s | 16.9291 KOps/s | +1.69% |
| test_values_stack_nested_locked | 84.1110μs | 54.1461μs | 18.4686 KOps/s | 18.3939 KOps/s | +0.41% |
| test_membership | 5.1852μs | 0.7913μs | 1.2637 MOps/s | 1.0819 MOps/s | **+16.81%** |
| test_membership_nested | 23.9100μs | 2.6824μs | 372.7996 KOps/s | 368.0186 KOps/s | +1.30% |
| test_membership_nested_leaf | 15.5800μs | 2.5960μs | 385.2077 KOps/s | 382.7969 KOps/s | +0.63% |
| test_membership_stacked_nested | 24.6400μs | 2.6631μs | 375.4960 KOps/s | 368.5481 KOps/s | +1.89% |
| test_membership_stacked_nested_leaf | 25.4710μs | 2.6973μs | 370.7438 KOps/s | 370.9850 KOps/s | -0.07% |
| test_membership_nested_last | 35.0500μs | 4.0753μs | 245.3779 KOps/s | 244.2707 KOps/s | +0.45% |
| test_membership_nested_leaf_last | 31.7110μs | 4.0567μs | 246.5059 KOps/s | 243.6923 KOps/s | +1.15% |
| test_membership_stacked_nested_last | 26.3910μs | 4.1021μs | 243.7796 KOps/s | 244.2593 KOps/s | -0.20% |
| test_membership_stacked_nested_leaf_last | 38.4610μs | 4.0298μs | 248.1531 KOps/s | 245.9264 KOps/s | +0.91% |
| test_nested_getleaf | 50.4600μs | 20.4245μs | 48.9607 KOps/s | 48.7673 KOps/s | +0.40% |
| test_nested_get | 48.5810μs | 19.2582μs | 51.9259 KOps/s | 50.8732 KOps/s | +2.07% |
| test_stacked_getleaf | 51.4310μs | 20.2222μs | 49.4505 KOps/s | 48.4592 KOps/s | +2.05% |
| test_stacked_get | 50.6110μs | 19.4233μs | 51.4845 KOps/s | 51.3102 KOps/s | +0.34% |
| test_nested_getitemleaf | 55.1610μs | 20.9052μs | 47.8350 KOps/s | 47.7028 KOps/s | +0.28% |
| test_nested_getitem | 50.4000μs | 19.9503μs | 50.1245 KOps/s | 50.4153 KOps/s | -0.58% |
| test_stacked_getitemleaf | 44.8110μs | 21.2437μs | 47.0727 KOps/s | 47.6541 KOps/s | -1.22% |
| test_stacked_getitem | 47.4410μs | 20.0763μs | 49.8099 KOps/s | 49.4999 KOps/s | +0.63% |
| test_lock_nested | 4.6325ms | 0.4644ms | 2.1532 KOps/s | 2.1914 KOps/s | -1.75% |
| test_lock_stack_nested | 0.5415ms | 0.4645ms | 2.1529 KOps/s | 2.1572 KOps/s | -0.20% |
| test_unlock_nested | 0.4742ms | 0.3786ms | 2.6415 KOps/s | 2.6650 KOps/s | -0.88% |
| test_unlock_stack_nested | 0.4620ms | 0.3784ms | 2.6425 KOps/s | 2.6280 KOps/s | +0.55% |
| test_flatten_speed | 0.1686ms | 0.1140ms | 8.7687 KOps/s | 8.6428 KOps/s | +1.46% |
| test_unflatten_speed | 0.6387ms | 0.5432ms | 1.8411 KOps/s | 1.8208 KOps/s | +1.11% |
| test_common_ops | 0.8049ms | 0.6733ms | 1.4852 KOps/s | 1.4675 KOps/s | +1.21% |
| test_creation | 0.1127ms | 2.9404μs | 340.0892 KOps/s | 339.5890 KOps/s | +0.15% |
| test_creation_empty | 37.9510μs | 6.5908μs | 151.7257 KOps/s | 150.8327 KOps/s | +0.59% |
| test_creation_nested_1 | 38.6810μs | 10.9258μs | 91.5269 KOps/s | 90.8728 KOps/s | +0.72% |
| test_creation_nested_2 | 35.9400μs | 12.6478μs | 79.0652 KOps/s | 78.4327 KOps/s | +0.81% |
| test_creation_many_keys[10] | 64.5010μs | 19.3088μs | 51.7900 KOps/s | 50.4365 KOps/s | +2.68% |
| test_creation_many_keys[50] | 0.1411ms | 84.4460μs | 11.8419 KOps/s | 11.8174 KOps/s | +0.21% |
| test_creation_many_keys[100] | 0.2273ms | 0.1663ms | 6.0143 KOps/s | 6.0275 KOps/s | -0.22% |
| test_creation_nested_many_keys[10] | 77.1910μs | 42.1310μs | 23.7355 KOps/s | 23.5625 KOps/s | +0.73% |
| test_creation_nested_many_keys[50] | 0.2398ms | 0.1744ms | 5.7356 KOps/s | 5.8252 KOps/s | -1.54% |
| test_clone | 38.1500μs | 12.5745μs | 79.5263 KOps/s | 78.1974 KOps/s | +1.70% |
| test_getitem[int] | 1.6827ms | 14.6516μs | 68.2517 KOps/s | 60.9363 KOps/s | **+12.01%** |
| test_getitem[slice_int] | 0.1384ms | 23.5285μs | 42.5016 KOps/s | 41.5304 KOps/s | +2.34% |
| test_getitem[range] | 0.1873ms | 61.8198μs | 16.1761 KOps/s | 16.3050 KOps/s | -0.79% |
| test_getitem[tuple] | 0.1417ms | 22.9836μs | 43.5092 KOps/s | 42.6359 KOps/s | +2.05% |
| test_getitem[list] | 0.1847ms | 55.5228μs | 18.0106 KOps/s | 17.9001 KOps/s | +0.62% |
| test_setitem_dim[int] | 50.2710μs | 24.1599μs | 41.3909 KOps/s | 40.9322 KOps/s | +1.12% |
| test_setitem_dim[slice_int] | 63.2110μs | 40.0733μs | 24.9543 KOps/s | 24.0646 KOps/s | +3.70% |
| test_setitem_dim[range] | 0.1147ms | 90.5079μs | 11.0488 KOps/s | 10.9145 KOps/s | +1.23% |
| test_setitem_dim[tuple] | 56.9910μs | 37.1483μs | 26.9191 KOps/s | 25.6424 KOps/s | +4.98% |
| test_setitem | 44.3100μs | 16.9626μs | 58.9532 KOps/s | 57.7920 KOps/s | +2.01% |
| test_set | 44.3300μs | 16.1415μs | 61.9520 KOps/s | 61.0281 KOps/s | +1.51% |
| test_set_shared | 0.5440ms | 0.2031ms | 4.9229 KOps/s | 4.9666 KOps/s | -0.88% |
| test_update | 0.3264ms | 20.7837μs | 48.1146 KOps/s | 46.3705 KOps/s | +3.76% |
| test_update_nested | 83.4910μs | 31.1896μs | 32.0620 KOps/s | 31.4487 KOps/s | +1.95% |
| test_update__nested | 0.4451ms | 32.8299μs | 30.4601 KOps/s | 29.9235 KOps/s | +1.79% |
| test_set_nested | 49.6810μs | 17.7954μs | 56.1944 KOps/s | 53.9971 KOps/s | +4.07% |
| test_set_nested_new | 55.2810μs | 22.8680μs | 43.7292 KOps/s | 43.4621 KOps/s | +0.61% |
| test_select | 0.1187ms | 38.6357μs | 25.8828 KOps/s | 25.3004 KOps/s | +2.30% |
| test_select_nested | 0.1158ms | 69.9454μs | 14.2969 KOps/s | 14.2486 KOps/s | +0.34% |
| test_exclude_nested | 0.1183ms | 86.9815μs | 11.4967 KOps/s | 11.5211 KOps/s | -0.21% |
| test_empty[True] | 0.4235ms | 0.3805ms | 2.6282 KOps/s | 2.5875 KOps/s | +1.57% |
| test_empty[False] | 7.5303μs | 1.2357μs | 809.2525 KOps/s | 805.8969 KOps/s | +0.42% |
| test_to | 0.1065ms | 74.2178μs | 13.4739 KOps/s | 13.4636 KOps/s | +0.08% |
| test_to_nonblocking | 98.7420μs | 67.3282μs | 14.8526 KOps/s | 14.5727 KOps/s | +1.92% |
| test_unbind_speed | 0.3764ms | 0.3222ms | 3.1039 KOps/s | 3.1076 KOps/s | -0.12% |
| test_unbind_speed_stack0 | 0.3666ms | 0.3176ms | 3.1482 KOps/s | 3.1463 KOps/s | +0.06% |
| test_unbind_speed_stack1 | 0.1069s | 0.8945ms | 1.1179 KOps/s | 1.2302 KOps/s | **-9.13%** |
| test_split | 1.1542ms | 1.0819ms | 924.2730 Ops/s | 821.4981 Ops/s | **+12.51%** |
| test_chunk | 0.1071s | 1.1572ms | 864.1354 Ops/s | 951.6262 Ops/s | **-9.19%** |
| test_to_cpu_blocking | 19.7954ms | 19.3902ms | 51.5723 Ops/s | 46.3241 Ops/s | **+11.33%** |
| test_to_cpu_global_sync | 12.0771ms | 11.7870ms | 84.8394 Ops/s | 86.0982 Ops/s | -1.46% |
| test_to_cpu_event_sync | 0.1193s | 14.1151ms | 70.8461 Ops/s | 79.3138 Ops/s | **-10.68%** |
| test_to_cpu_default | 13.0684ms | 12.7545ms | 78.4039 Ops/s | 79.2042 Ops/s | -1.01% |
| test_consolidate[False-None] | 4.0883ms | 3.9802ms | 251.2434 Ops/s | 220.5610 Ops/s | **+13.91%** |
| test_consolidate[default-None] | 2.0245ms | 1.9406ms | 515.2932 Ops/s | 502.5758 Ops/s | +2.53% |
| test_consolidate[reduce-overhead-None] | 1.9703ms | 1.8801ms | 531.8810 Ops/s | 519.8041 Ops/s | +2.32% |
| test_consolidate_njt[False-None] | 8.7094ms | 8.3425ms | 119.8685 Ops/s | 119.3283 Ops/s | +0.45% |
| test_to[False-False-None] | 2.3899ms | 2.1610ms | 462.7424 Ops/s | 467.1988 Ops/s | -0.95% |
| test_to[True-False-None] | 1.9775ms | 1.8547ms | 539.1669 Ops/s | 534.2118 Ops/s | +0.93% |
| test_to[within-False-None] | 6.2678ms | 5.9550ms | 167.9254 Ops/s | 167.5918 Ops/s | +0.20% |
| test_to[True-default-None] | 0.1879s | 10.7288ms | 93.2073 Ops/s | 105.9481 Ops/s | **-12.03%** |
| test_to_njt[False-False-None] | 8.6923ms | 8.3236ms | 120.1405 Ops/s | 119.0733 Ops/s | +0.90% |
| test_to_njt[True-False-None] | 6.9745ms | 6.7731ms | 147.6432 Ops/s | 146.5535 Ops/s | +0.74% |
| test_to_njt[within-False-None] | 15.8959ms | 15.0980ms | 66.2339 Ops/s | 65.8759 Ops/s | +0.54% |
| test_creation[device0] | 0.5256ms | 0.1134ms | 8.8165 KOps/s | 8.9576 KOps/s | -1.57% |
| test_creation_from_tensor | 0.4501ms | 0.1109ms | 9.0201 KOps/s | 8.9370 KOps/s | +0.93% |
| test_add_one[memmap_tensor0] | 0.3462ms | 6.5414μs | 152.8720 KOps/s | 155.6245 KOps/s | -1.77% |
| test_contiguous[memmap_tensor0] | 10.6900μs | 0.5927μs | 1.6873 MOps/s | 2.3925 MOps/s | **-29.47%** |
| test_stack[memmap_tensor0] | 68.8010μs | 4.4982μs | 222.3097 KOps/s | 230.0748 KOps/s | -3.38% |
| test_memmaptd_index | 1.0246ms | 0.2628ms | 3.8049 KOps/s | 3.8880 KOps/s | -2.14% |
| test_memmaptd_index_astensor | 0.5050ms | 0.3565ms | 2.8053 KOps/s | 2.8303 KOps/s | -0.88% |
| test_memmaptd_index_op | 0.9956ms | 0.6049ms | 1.6531 KOps/s | 1.6607 KOps/s | -0.46% |
| test_serialize_model | 0.1365s | 0.1345s | 7.4335 Ops/s | 7.4775 Ops/s | -0.59% |
| test_serialize_model_pickle | 1.3478s | 1.1927s | 0.8385 Ops/s | 0.8257 Ops/s | +1.54% |
| test_serialize_weights | 0.1363s | 0.1344s | 7.4411 Ops/s | 6.1504 Ops/s | **+20.99%** |
| test_serialize_weights_returnearly | 0.4554s | 86.7181ms | 11.5316 Ops/s | 15.9951 Ops/s | **-27.91%** |
| test_serialize_weights_pickle | 1.3514s | 1.2119s | 0.8251 Ops/s | 0.8228 Ops/s | +0.28% |
| test_reshape_pytree | 0.2045ms | 30.9364μs | 32.3244 KOps/s | 32.4721 KOps/s | -0.46% |
| test_reshape_td | 82.1320μs | 42.8793μs | 23.3213 KOps/s | 22.9904 KOps/s | +1.44% |
| test_view_pytree | 0.2144ms | 30.4150μs | 32.8785 KOps/s | 32.5443 KOps/s | +1.03% |
| test_view_td | 85.7810μs | 50.7659μs | 19.6982 KOps/s | 19.3235 KOps/s | +1.94% |
| test_unbind_pytree | 0.2261ms | 34.2319μs | 29.2125 KOps/s | 28.8789 KOps/s | +1.16% |
| test_unbind_td | 0.1776ms | 47.5736μs | 21.0200 KOps/s | 21.1010 KOps/s | -0.38% |
| test_split_pytree | 0.2501ms | 40.4660μs | 24.7121 KOps/s | 24.9444 KOps/s | -0.93% |
| test_split_td | 0.2151ms | 61.5954μs | 16.2350 KOps/s | 15.9687 KOps/s | +1.67% |
| test_add_pytree | 0.2252ms | 39.7204μs | 25.1760 KOps/s | 24.7558 KOps/s | +1.70% |
| test_add_td | 0.1104ms | 54.0950μs | 18.4860 KOps/s | 18.5304 KOps/s | -0.24% |
| test_compile_add_one_nested[tensordict-compile] | 0.2370ms | 0.1533ms | 6.5224 KOps/s | 6.0591 KOps/s | **+7.65%** |
| test_compile_add_one_nested[tensordict-eager] | 0.3007ms | 0.1981ms | 5.0486 KOps/s | 5.1436 KOps/s | -1.85% |
| test_compile_add_one_nested[pytree-compile] | 0.1842ms | 0.1225ms | 8.1642 KOps/s | 7.8099 KOps/s | +4.54% |
| test_compile_add_one_nested[pytree-eager] | 0.4258ms | 0.1763ms | 5.6719 KOps/s | 5.7822 KOps/s | -1.91% |
| test_compile_copy_nested[tensordict-compile] | 0.3293ms | 16.6724μs | 59.9795 KOps/s | 65.5208 KOps/s | **-8.46%** |
| test_compile_copy_nested[tensordict-eager] | 88.0610μs | 51.0293μs | 19.5966 KOps/s | 19.9790 KOps/s | -1.91% |
| test_compile_copy_nested[pytree-compile] | 0.1229ms | 15.4819μs | 64.5916 KOps/s | 63.9608 KOps/s | +0.99% |
| test_compile_copy_nested[pytree-eager] | 0.3585ms | 63.3280μs | 15.7908 KOps/s | 15.7972 KOps/s | -0.04% |
| test_compile_add_one_flat[tensordict-compile] | 0.3317ms | 0.1959ms | 5.1035 KOps/s | 4.8545 KOps/s | **+5.13%** |
| test_compile_add_one_flat[tensordict-eager] | 0.3592ms | 0.2719ms | 3.6783 KOps/s | 3.6702 KOps/s | +0.22% |
| test_compile_add_one_flat[tensorclass-compile] | 0.1755ms | 0.1330ms | 7.5208 KOps/s | 7.2756 KOps/s | +3.37% |
| test_compile_add_one_flat[tensorclass-eager] | 0.1203ms | 75.1498μs | 13.3068 KOps/s | 13.3816 KOps/s | -0.56% |
| test_compile_add_one_flat[pytree-compile] | 0.5055ms | 0.1807ms | 5.5339 KOps/s | 5.5515 KOps/s | -0.32% |
| test_compile_add_one_flat[pytree-eager] | 0.8193ms | 0.5406ms | 1.8497 KOps/s | 1.9326 KOps/s | -4.29% |
| test_compile_add_self_flat[tensordict-eager] | 0.5422ms | 0.3254ms | 3.0735 KOps/s | 3.0818 KOps/s | -0.27% |
| test_compile_add_self_flat[tensordict-compile] | 0.3129ms | 0.1948ms | 5.1342 KOps/s | 4.6654 KOps/s | **+10.05%** |
| test_compile_add_self_flat[tensorclass-eager] | 0.1340ms | 86.6981μs | 11.5343 KOps/s | 10.9339 KOps/s | **+5.49%** |
| test_compile_add_self_flat[tensorclass-compile] | 0.2752ms | 0.1326ms | 7.5440 KOps/s | 6.9025 KOps/s | **+9.29%** |
| test_compile_add_self_flat[pytree-eager] | 0.1829s | 0.5216ms | 1.9173 KOps/s | 2.3015 KOps/s | **-16.69%** |
| test_compile_add_self_flat[pytree-compile] | 0.2476ms | 0.1747ms | 5.7250 KOps/s | 5.5675 KOps/s | +2.83% |
| test_compile_copy_flat[tensordict-compile] | 50.2110μs | 18.5152μs | 54.0097 KOps/s | 52.0928 KOps/s | +3.68% |
| test_compile_copy_flat[tensordict-eager] | 81.1110μs | 40.1300μs | 24.9190 KOps/s | 25.1598 KOps/s | -0.96% |
| test_compile_copy_flat[pytree-compile] | 0.2224ms | 15.9525μs | 62.6860 KOps/s | 61.0192 KOps/s | +2.73% |
| test_compile_copy_flat[pytree-eager] | 0.3422ms | 51.7444μs | 19.3258 KOps/s | 19.6201 KOps/s | -1.50% |
| test_compile_assign_and_add[tensordict-compile] | 1.9879ms | 0.1838ms | 5.4401 KOps/s | 5.0605 KOps/s | **+7.50%** |
| test_compile_assign_and_add[tensordict-eager] | 3.6125ms | 3.2692ms | 305.8806 Ops/s | 301.7261 Ops/s | +1.38% |
| test_compile_assign_and_add[pytree-compile] | 2.0007ms | 0.1718ms | 5.8204 KOps/s | 5.6547 KOps/s | +2.93% |
| test_compile_assign_and_add[pytree-eager] | 2.9712ms | 2.8291ms | 353.4670 Ops/s | 358.5485 Ops/s | -1.42% |
| test_compile_indexing[tensor-tensordict-compile] | 0.2363ms | 0.1274ms | 7.8467 KOps/s | 7.3902 KOps/s | **+6.18%** |
| test_compile_indexing[tensor-tensordict-eager] | 0.3111ms | 74.5267μs | 13.4180 KOps/s | 13.3187 KOps/s | +0.75% |
| test_compile_indexing[tensor-tensorclass-compile] | 0.2021ms | 0.1114ms | 8.9741 KOps/s | 8.5329 KOps/s | **+5.17%** |
| test_compile_indexing[tensor-tensorclass-eager] | 0.2659ms | 46.6773μs | 21.4237 KOps/s | 21.9843 KOps/s | -2.55% |
| test_compile_indexing[tensor-pytree-compile] | 0.1908ms | 0.1168ms | 8.5616 KOps/s | 8.8885 KOps/s | -3.68% |
| test_compile_indexing[tensor-pytree-eager] | 0.2953ms | 46.6260μs | 21.4473 KOps/s | 22.9035 KOps/s | **-6.36%** |
| test_compile_indexing[slice-tensordict-compile] | 0.1305ms | 70.1250μs | 14.2602 KOps/s | 14.2311 KOps/s | +0.20% |
| test_compile_indexing[slice-tensordict-eager] | 0.2159ms | 26.6305μs | 37.5509 KOps/s | 37.3907 KOps/s | +0.43% |
| test_compile_indexing[slice-tensorclass-compile] | 0.1654ms | 57.6589μs | 17.3434 KOps/s | 17.2774 KOps/s | +0.38% |
| test_compile_indexing[slice-tensorclass-eager] | 0.2444ms | 21.2437μs | 47.0728 KOps/s | 47.2413 KOps/s | -0.36% |
| test_compile_indexing[slice-pytree-compile] | 0.1898ms | 56.8451μs | 17.5917 KOps/s | 17.2458 KOps/s | +2.01% |
| test_compile_indexing[slice-pytree-eager] | 0.2374ms | 21.0320μs | 47.5466 KOps/s | 47.7389 KOps/s | -0.40% |
| test_compile_indexing[int-tensordict-compile] | 0.1226ms | 71.3029μs | 14.0247 KOps/s | 14.0397 KOps/s | -0.11% |
| test_compile_indexing[int-tensordict-eager] | 0.2407ms | 26.7172μs | 37.4291 KOps/s | 37.7320 KOps/s | -0.80% |
| test_compile_indexing[int-tensorclass-compile] | 0.1810ms | 56.0899μs | 17.8285 KOps/s | 17.3817 KOps/s | +2.57% |
| test_compile_indexing[int-tensorclass-eager] | 0.2608ms | 21.0219μs | 47.5695 KOps/s | 47.6077 KOps/s | -0.08% |
| test_compile_indexing[int-pytree-compile] | 0.2300ms | 56.3323μs | 17.7518 KOps/s | 16.4449 KOps/s | **+7.95%** |
| test_compile_indexing[int-pytree-eager] | 0.2422ms | 20.9077μs | 47.8294 KOps/s | 47.8565 KOps/s | -0.06% |
| test_compile_replace[single-eager] | 94.7110μs | 47.1859μs | 21.1928 KOps/s | 20.0728 KOps/s | **+5.58%** |
| test_compile_replace[single-compile] | 0.2089ms | 0.1181ms | 8.4676 KOps/s | 8.2475 KOps/s | +2.67% |
| test_compile_replace[multi-eager] | 0.7053ms | 0.5856ms | 1.7077 KOps/s | 1.8121 KOps/s | **-5.76%** |
| test_compile_replace[multi-compile] | 0.1734ms | 0.1242ms | 8.0506 KOps/s | 7.8789 KOps/s | +2.18% |
| test_compile_tc_getattr_20[eager] | 0.2337ms | 0.1814ms | 5.5122 KOps/s | 5.9384 KOps/s | **-7.18%** |
| test_compile_tc_getattr_20[compile] | 0.3962ms | 0.1355ms | 7.3799 KOps/s | 7.3732 KOps/s | +0.09% |
| test_compile_clone_shallow[20-eager] | 49.5010μs | 18.4746μs | 54.1283 KOps/s | 54.2949 KOps/s | -0.31% |
| test_compile_clone_shallow[20-compile] | 81.5610μs | 17.7636μs | 56.2949 KOps/s | 59.2950 KOps/s | **-5.06%** |
| test_compile_clone_shallow[40-eager] | 58.6610μs | 32.1741μs | 31.0809 KOps/s | 30.9690 KOps/s | +0.36% |
| test_compile_clone_shallow[40-compile] | 73.8010μs | 17.7230μs | 56.4238 KOps/s | 57.2539 KOps/s | -1.45% |
| test_compile_clone_shallow[80-eager] | 94.6220μs | 60.8818μs | 16.4253 KOps/s | 16.6330 KOps/s | -1.25% |
| test_compile_clone_shallow[80-compile] | 0.1617ms | 20.9437μs | 47.7471 KOps/s | 48.7331 KOps/s | -2.02% |
| test_compile_update_inplace[eager] | 0.1898ms | 57.5098μs | 17.3883 KOps/s | 16.8904 KOps/s | +2.95% |
| test_compile_update_inplace[compile] | 0.4062ms | 0.1483ms | 6.7412 KOps/s | 6.7172 KOps/s | +0.36% |
| test_mod_add[eager] | 0.1922ms | 48.0719μs | 20.8022 KOps/s | 19.7888 KOps/s | **+5.12%** |
| test_mod_add[compile] | 0.6913ms | 0.1199ms | 8.3374 KOps/s | 8.2814 KOps/s | +0.68% |
| test_mod_add[compile-overhead] | 0.2484ms | 0.1616ms | 6.1891 KOps/s | 5.9931 KOps/s | +3.27% |
| test_mod_wrap[eager] | 0.4313ms | 0.3025ms | 3.3054 KOps/s | 3.4162 KOps/s | -3.25% |
| test_mod_wrap[compile] | 0.5278ms | 0.3605ms | 2.7736 KOps/s | 2.6623 KOps/s | +4.18% |
| test_mod_wrap[compile-overhead] | 9.1856ms | 5.0160ms | 199.3606 Ops/s | 202.8642 Ops/s | -1.73% |
| test_mod_wrap_and_backward[eager] | 1.6728ms | 1.5054ms | 664.2757 Ops/s | 664.5805 Ops/s | -0.05% |
| test_mod_wrap_and_backward[compile] | 1.6361ms | 1.5486ms | 645.7457 Ops/s | 686.9211 Ops/s | **-5.99%** |
| test_mod_wrap_and_backward[compile-overhead] | 1.4728ms | 1.0072ms | 992.8159 Ops/s | 1.0918 KOps/s | **-9.07%** |
| test_seq_add[eager] | 0.2200ms | 0.1537ms | 6.5059 KOps/s | 6.4060 KOps/s | +1.56% |
| test_seq_add[compile] | 0.6424ms | 0.1265ms | 7.9077 KOps/s | 7.5913 KOps/s | +4.17% |
| test_seq_add[compile-overhead] | 0.4295ms | 0.1671ms | 5.9848 KOps/s | 5.7515 KOps/s | +4.06% |
| test_seq_wrap[eager] | 0.5758ms | 0.5082ms | 1.9678 KOps/s | 1.9442 KOps/s | +1.22% |
| test_seq_wrap[compile] | 0.4441ms | 0.3770ms | 2.6526 KOps/s | 2.5802 KOps/s | +2.81% |
| test_seq_wrap[compile-overhead] | 0.3732ms | 0.2826ms | 3.5383 KOps/s | 3.5022 KOps/s | +1.03% |
| test_func_call_runtime[False-eager] | 0.9641ms | 0.8721ms | 1.1467 KOps/s | 1.1376 KOps/s | +0.80% |
| test_func_call_runtime[False-compile] | 1.0290ms | 0.9431ms | 1.0603 KOps/s | 1.1009 KOps/s | -3.68% |
| test_func_call_runtime[False-compile-overhead] | 0.5784ms | 0.4713ms | 2.1217 KOps/s | 2.0774 KOps/s | +2.13% |
| test_func_call_runtime[True-eager] | 1.1289ms | 1.0544ms | 948.3958 Ops/s | 942.6925 Ops/s | +0.60% |
| test_func_call_runtime[True-compile] | 1.0243ms | 0.9155ms | 1.0923 KOps/s | 1.0617 KOps/s | +2.88% |
| test_func_call_runtime[True-compile-overhead] | 0.5325ms | 0.4857ms | 2.0590 KOps/s | 2.0301 KOps/s | +1.42% |
| test_func_call_cm_runtime[False-eager] | 0.9462ms | 0.8613ms | 1.1611 KOps/s | 1.1496 KOps/s | +1.00% |
| test_func_call_cm_runtime[False-compile] | 0.9746ms | 0.8901ms | 1.1235 KOps/s | 1.0922 KOps/s | +2.86% |
| test_func_call_cm_runtime[False-compile-overhead] | 0.5203ms | 0.4518ms | 2.2133 KOps/s | 2.1884 KOps/s | +1.14% |
| test_func_call_cm_runtime[True-eager] | 1.3867ms | 1.2159ms | 822.4596 Ops/s | 813.1497 Ops/s | +1.14% |
| test_func_call_cm_runtime[True-compile] | 1.1876ms | 0.9657ms | 1.0355 KOps/s | 1.0448 KOps/s | -0.89% |
| test_func_call_cm_runtime[True-compile-overhead] | 0.5552ms | 0.4960ms | 2.0162 KOps/s | 1.9898 KOps/s | +1.33% |
| test_vmap_func_call_cm_runtime[eager] | 2.8795ms | 2.3466ms | 426.1408 Ops/s | 417.3437 Ops/s | +2.11% |
| test_vmap_func_call_cm_runtime[compile] | 1.0844ms | 0.9803ms | 1.0201 KOps/s | 1.0121 KOps/s | +0.80% |
| test_vmap_func_call_cm_runtime[compile-overhead] | 0.5606ms | 0.5010ms | 1.9962 KOps/s | 1.9853 KOps/s | +0.55% |
| test_distributed | 0.5675ms | 0.1524ms | 6.5635 KOps/s | 6.6315 KOps/s | -1.03% |
| test_tdmodule | 0.3561ms | 26.9099μs | 37.1610 KOps/s | 36.8895 KOps/s | +0.74% |
| test_tdmodule_dispatch | 72.1810μs | 42.7526μs | 23.3904 KOps/s | 22.3448 KOps/s | +4.68% |
| test_tdseq | 48.9510μs | 25.8041μs | 38.7535 KOps/s | 37.7227 KOps/s | +2.73% |
| test_tdseq_dispatch | 64.3510μs | 45.5787μs | 21.9401 KOps/s | 21.2988 KOps/s | +3.01% |
| test_instantiation_functorch | 2.0693ms | 1.9651ms | 508.8906 Ops/s | 501.8981 Ops/s | +1.39% |
| test_exec_functorch | 0.2285ms | 0.1712ms | 5.8427 KOps/s | 5.8163 KOps/s | +0.45% |
| test_exec_functional_call | 0.2159ms | 0.1524ms | 6.5635 KOps/s | 6.3698 KOps/s | +3.04% |
| test_exec_td_decorator | 0.4531ms | 0.2275ms | 4.3959 KOps/s | 4.4047 KOps/s | -0.20% |
| test_vmap_mlp_speed_decorator[True-True] | 1.0059ms | 0.8190ms | 1.2210 KOps/s | 1.1965 KOps/s | +2.05% |
| test_vmap_mlp_speed_decorator[True-False] | 1.0387ms | 0.8333ms | 1.2001 KOps/s | 1.2187 KOps/s | -1.52% |
| test_vmap_mlp_speed_decorator[False-True] | 0.9104ms | 0.7054ms | 1.4177 KOps/s | 1.4268 KOps/s | -0.64% |
| test_vmap_mlp_speed_decorator[False-False] | 0.8855ms | 0.6982ms | 1.4322 KOps/s | 1.4302 KOps/s | +0.14% |
| test_vmap_transformer_speed_decorator[True-True] | 21.1271ms | 20.5350ms | 48.6973 Ops/s | 48.7562 Ops/s | -0.12% |
| test_vmap_transformer_speed_decorator[True-False] | 21.5278ms | 20.6181ms | 48.5010 Ops/s | 48.9811 Ops/s | -0.98% |
| test_vmap_transformer_speed_decorator[False-True] | 21.5111ms | 20.5105ms | 48.7555 Ops/s | 49.5219 Ops/s | -1.55% |
| test_vmap_transformer_speed_decorator[False-False] | 20.9965ms | 20.5789ms | 48.5935 Ops/s | 49.4009 Ops/s | -1.63% |
| test_to_module_speed[True] | 1.4989ms | 1.4093ms | 709.5899 Ops/s | 709.7196 Ops/s | -0.02% |
| test_to_module_speed[False] | 1.9523ms | 1.3799ms | 724.7085 Ops/s | 718.4850 Ops/s | +0.87% |
| test_tc_init | 80.8910μs | 43.1361μs | 23.1824 KOps/s | 23.1456 KOps/s | +0.16% |
| test_tc_init_tensor_only | 36.7710μs | 9.0420μs | 110.5952 KOps/s | 108.1161 KOps/s | +2.29% |
| test_tc_init_nested | 0.1758ms | 84.8680μs | 11.7830 KOps/s | 11.7178 KOps/s | +0.56% |
| test_tc_init_many_fields | 40.6910μs | 15.5145μs | 64.4559 KOps/s | 64.3098 KOps/s | +0.23% |
| test_tc_first_layer_tensor | 28.9210μs | 1.7061μs | 586.1357 KOps/s | 594.6621 KOps/s | -1.43% |
| test_tc_first_layer_tensor_only | 3.0076μs | 0.3838μs | 2.6059 MOps/s | 2.6328 MOps/s | -1.02% |
| test_tc_first_layer_tensor_set | 29.1410μs | 3.7036μs | 270.0093 KOps/s | 274.9580 KOps/s | -1.80% |
| test_tc_first_layer_tensor_only_set | 32.6900μs | 3.3514μs | 298.3795 KOps/s | 329.1446 KOps/s | **-9.35%** |
| test_tc_first_layer_nontensor | 45.6200μs | 5.6879μs | 175.8122 KOps/s | 173.7308 KOps/s | +1.20% |
| test_tc_second_layer_tensor | 48.0300μs | 4.1205μs | 242.6868 KOps/s | 240.8788 KOps/s | +0.75% |
| test_tc_second_layer_nontensor | 36.2400μs | 8.1930μs | 122.0559 KOps/s | 123.1175 KOps/s | -0.86% |
| test_unbind | 0.2840s | 16.0626ms | 62.2565 Ops/s | 71.8551 Ops/s | **-13.36%** |
| test_full_like | 13.5915ms | 4.4127ms | 226.6183 Ops/s | 226.8728 Ops/s | -0.11% |
| test_zeros_like | 4.9395ms | 4.3468ms | 230.0526 Ops/s | 228.8276 Ops/s | +0.54% |
| test_ones_like | 4.4663ms | 4.2171ms | 237.1315 Ops/s | 236.1548 Ops/s | +0.41% |
| test_clone | 6.8025ms | 6.4423ms | 155.2244 Ops/s | 155.6817 Ops/s | -0.29% |
| test_squeeze | 64.7110μs | 13.4812μs | 74.1775 KOps/s | 73.6810 KOps/s | +0.67% |
| test_unsqueeze | 0.2773ms | 0.1082ms | 9.2430 KOps/s | 8.7936 KOps/s | **+5.11%** |
| test_split | 0.2299ms | 0.1741ms | 5.7445 KOps/s | 5.5012 KOps/s | +4.42% |
| test_permute | 0.2513ms | 0.2067ms | 4.8381 KOps/s | 4.8115 KOps/s | +0.55% |
| test_stack | 51.4225ms | 51.0774ms | 19.5781 Ops/s | 19.8420 Ops/s | -1.33% |
| test_cat | 51.2335ms | 50.6720ms | 19.7348 Ops/s | 19.5576 Ops/s | +0.91% |
| test_sequential_tensordict | 0.6085ms | 0.2068ms | 4.8365 KOps/s | 4.6101 KOps/s | +4.91% |
| test_sequential_graph_module | 0.1643ms | 0.1127ms | 8.8756 KOps/s | 8.4968 KOps/s | +4.46% |
| test_nested_tensordict | 0.6730ms | 0.2675ms | 3.7379 KOps/s | 3.5170 KOps/s | **+6.28%** |
| test_nested_graph_module | 0.1803ms | 0.1266ms | 7.8963 KOps/s | 8.0389 KOps/s | -1.77% |
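
The Change column appears to be the relative throughput difference, Ops / (Ops on Repo HEAD) - 1. For example, test_plain_set_nested gives 68.9124 / 67.4541 - 1 ≈ +2.16%. The Improved/Worsened totals in the summary (19 and 16) match the number of bolded rows, and every bolded row exceeds 5% in magnitude. The sketch below reproduces that reading. The 5% threshold is inferred from the table, not documented by the bot, and the helper names are hypothetical. Normalizing units matters for rows such as test_mod_wrap_and_backward[compile-overhead], which compare Ops/s against KOps/s.

```python
# Throughput units used in the table, normalized to operations per second.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Parse a throughput cell such as '68.9124 KOps/s' into ops per second."""
    value, unit = text.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops: str, ops_head: str) -> float:
    """Relative throughput change versus the repo HEAD (0.05 means +5%)."""
    return parse_ops(ops) / parse_ops(ops_head) - 1.0


def classify(change: float, threshold: float = 0.05) -> str:
    """Bucket a change as the summary appears to (threshold inferred, not documented)."""
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "unchanged"


# Mixed-unit row: 992.8159 Ops/s now versus 1.0918 KOps/s on HEAD.
change = relative_change("992.8159 Ops/s", "1.0918 KOps/s")
print(f"{change:+.2%}", classify(change))  # prints: -9.07% worsened
```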

github-actions bot (Contributor) commented Apr 22, 2026

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 261. Improved: 18. Worsened: 7.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 28.6620μs | 15.0079μs | 66.6314 KOps/s | 67.2457 KOps/s | -0.91% |
| test_plain_set_stack_nested | 43.7620μs | 15.3491μs | 65.1506 KOps/s | 66.4815 KOps/s | -2.00% |
| test_plain_set_nested_inplace | 41.0520μs | 16.8772μs | 59.2517 KOps/s | 59.4970 KOps/s | -0.41% |
| test_plain_set_stack_nested_inplace | 45.6720μs | 16.7925μs | 59.5505 KOps/s | 60.0916 KOps/s | -0.90% |
| test_items | 39.8620μs | 6.0160μs | 166.2232 KOps/s | 165.3600 KOps/s | +0.52% |
| test_items_nested | 0.5653ms | 0.4689ms | 2.1324 KOps/s | 2.1395 KOps/s | -0.33% |
| test_items_nested_locked | 0.5372ms | 0.4692ms | 2.1314 KOps/s | 2.1281 KOps/s | +0.15% |
| test_items_nested_leaf | 0.2127ms | 97.5954μs | 10.2464 KOps/s | 10.1443 KOps/s | +1.01% |
| test_items_stack_nested | 0.6012ms | 0.4656ms | 2.1479 KOps/s | 2.1448 KOps/s | +0.15% |
| test_items_stack_nested_leaf | 0.1439ms | 97.7263μs | 10.2327 KOps/s | 10.1389 KOps/s | +0.92% |
| test_items_stack_nested_locked | 0.5530ms | 0.4690ms | 2.1322 KOps/s | 2.1319 KOps/s | +0.02% |
| test_keys | 28.5420μs | 4.2292μs | 236.4530 KOps/s | 236.7801 KOps/s | -0.14% |
| test_keys_nested | 0.1696ms | 0.1290ms | 7.7490 KOps/s | 7.6521 KOps/s | +1.27% |
| test_keys_nested_locked | 2.2435ms | 0.1378ms | 7.2574 KOps/s | 7.1118 KOps/s | +2.05% |
| test_keys_nested_leaf | 0.1654ms | 0.1203ms | 8.3125 KOps/s | 8.2179 KOps/s | +1.15% |
| test_keys_stack_nested | 0.1675ms | 0.1301ms | 7.6840 KOps/s | 7.6564 KOps/s | +0.36% |
| test_keys_stack_nested_leaf | 0.1526ms | 0.1206ms | 8.2908 KOps/s | 8.2313 KOps/s | +0.72% |
| test_keys_stack_nested_locked | 0.1797ms | 0.1386ms | 7.2164 KOps/s | 7.1324 KOps/s | +1.18% |
| test_values | 6.2664μs | 1.0097μs | 990.4105 KOps/s | 980.1075 KOps/s | +1.05% |
| test_values_nested | 0.1090ms | 52.2247μs | 19.1480 KOps/s | 18.9226 KOps/s | +1.19% |
| test_values_nested_locked | 81.0640μs | 55.4270μs | 18.0417 KOps/s | 17.6459 KOps/s | +2.24% |
| test_values_nested_leaf | 89.3850μs | 59.5935μs | 16.7804 KOps/s | 16.5065 KOps/s | +1.66% |
| test_values_stack_nested | 87.2940μs | 52.4257μs | 19.0746 KOps/s | 18.7635 KOps/s | +1.66% |
| test_values_stack_nested_leaf | 0.1103ms | 59.7417μs | 16.7387 KOps/s | 16.4649 KOps/s | +1.66% |
| test_values_stack_nested_locked | 0.1438ms | 54.6926μs | 18.2840 KOps/s | 17.6462 KOps/s | +3.61% |
| test_membership | 4.5302μs | 0.8424μs | 1.1871 MOps/s | 1.1823 MOps/s | +0.41% |
test_membership_nested 93.5050μs 2.8762μs 347.6855 KOps/s 348.2645 KOps/s $\color{#d91a1a}-0.17\%$
test_membership_nested_leaf 41.5520μs 2.9160μs 342.9354 KOps/s 351.1082 KOps/s $\color{#d91a1a}-2.33\%$
test_membership_stacked_nested 23.4920μs 2.9277μs 341.5688 KOps/s 353.4211 KOps/s $\color{#d91a1a}-3.35\%$
test_membership_stacked_nested_leaf 57.9240μs 2.9171μs 342.8015 KOps/s 352.7103 KOps/s $\color{#d91a1a}-2.81\%$
test_membership_nested_last 23.6410μs 4.3352μs 230.6698 KOps/s 231.6368 KOps/s $\color{#d91a1a}-0.42\%$
test_membership_nested_leaf_last 65.0730μs 4.3214μs 231.4077 KOps/s 231.5529 KOps/s $\color{#d91a1a}-0.06\%$
test_membership_stacked_nested_last 30.6620μs 4.3736μs 228.6466 KOps/s 229.4449 KOps/s $\color{#d91a1a}-0.35\%$
test_membership_stacked_nested_leaf_last 52.6730μs 4.3456μs 230.1170 KOps/s 231.3949 KOps/s $\color{#d91a1a}-0.55\%$
test_nested_getleaf 63.4530μs 21.8087μs 45.8533 KOps/s 47.3263 KOps/s $\color{#d91a1a}-3.11\%$
test_nested_get 61.8340μs 20.5782μs 48.5952 KOps/s 49.5123 KOps/s $\color{#d91a1a}-1.85\%$
test_stacked_getleaf 53.7830μs 21.5482μs 46.4075 KOps/s 46.8159 KOps/s $\color{#d91a1a}-0.87\%$
test_stacked_get 95.2950μs 20.5283μs 48.7132 KOps/s 49.5268 KOps/s $\color{#d91a1a}-1.64\%$
test_nested_getitemleaf 68.0330μs 22.2213μs 45.0019 KOps/s 46.1816 KOps/s $\color{#d91a1a}-2.55\%$
test_nested_getitem 56.6730μs 21.1663μs 47.2449 KOps/s 48.5145 KOps/s $\color{#d91a1a}-2.62\%$
test_stacked_getitemleaf 52.4830μs 22.0017μs 45.4511 KOps/s 45.6281 KOps/s $\color{#d91a1a}-0.39\%$
test_stacked_getitem 60.1030μs 21.2810μs 46.9902 KOps/s 47.9602 KOps/s $\color{#d91a1a}-2.02\%$
test_lock_nested 4.7036ms 0.4822ms 2.0737 KOps/s 2.0800 KOps/s $\color{#d91a1a}-0.30\%$
test_lock_stack_nested 0.5642ms 0.4848ms 2.0627 KOps/s 2.0455 KOps/s $\color{#35bf28}+0.85\%$
test_unlock_nested 0.5111ms 0.3937ms 2.5397 KOps/s 2.5522 KOps/s $\color{#d91a1a}-0.49\%$
test_unlock_stack_nested 0.4345ms 0.3923ms 2.5490 KOps/s 2.5085 KOps/s $\color{#35bf28}+1.61\%$
test_flatten_speed 0.1679ms 0.1227ms 8.1499 KOps/s 8.2686 KOps/s $\color{#d91a1a}-1.44\%$
test_unflatten_speed 0.6487ms 0.5723ms 1.7473 KOps/s 1.7533 KOps/s $\color{#d91a1a}-0.34\%$
test_common_ops 0.8470ms 0.6990ms 1.4307 KOps/s 1.4123 KOps/s $\color{#35bf28}+1.30\%$
test_creation 0.1170ms 3.1382μs 318.6552 KOps/s 316.5544 KOps/s $\color{#35bf28}+0.66\%$
test_creation_empty 38.1820μs 6.9770μs 143.3273 KOps/s 142.6226 KOps/s $\color{#35bf28}+0.49\%$
test_creation_nested_1 43.9630μs 11.5680μs 86.4455 KOps/s 86.1063 KOps/s $\color{#35bf28}+0.39\%$
test_creation_nested_2 41.4630μs 13.3702μs 74.7934 KOps/s 74.6531 KOps/s $\color{#35bf28}+0.19\%$
test_creation_many_keys[10] 89.8550μs 21.2111μs 47.1452 KOps/s 47.4183 KOps/s $\color{#d91a1a}-0.58\%$
test_creation_many_keys[50] 0.1563ms 91.3589μs 10.9458 KOps/s 10.9599 KOps/s $\color{#d91a1a}-0.13\%$
test_creation_many_keys[100] 0.2313ms 0.1795ms 5.5724 KOps/s 5.6095 KOps/s $\color{#d91a1a}-0.66\%$
test_creation_nested_many_keys[10] 95.6750μs 45.5023μs 21.9769 KOps/s 22.0846 KOps/s $\color{#d91a1a}-0.49\%$
test_creation_nested_many_keys[50] 0.2342ms 0.1866ms 5.3595 KOps/s 5.3886 KOps/s $\color{#d91a1a}-0.54\%$
test_clone 45.7830μs 13.3607μs 74.8465 KOps/s 73.8179 KOps/s $\color{#35bf28}+1.39\%$
test_getitem[int] 1.5124ms 15.6594μs 63.8594 KOps/s 58.6511 KOps/s $\textbf{\color{#35bf28}+8.88\%}$
test_getitem[slice_int] 0.1406ms 25.3419μs 39.4604 KOps/s 39.8257 KOps/s $\color{#d91a1a}-0.92\%$
test_getitem[range] 0.1805ms 63.2595μs 15.8079 KOps/s 15.4643 KOps/s $\color{#35bf28}+2.22\%$
test_getitem[tuple] 0.1444ms 24.6653μs 40.5428 KOps/s 40.9876 KOps/s $\color{#d91a1a}-1.09\%$
test_getitem[list] 0.1823ms 58.1102μs 17.2087 KOps/s 16.9342 KOps/s $\color{#35bf28}+1.62\%$
test_setitem_dim[int] 66.4540μs 25.6563μs 38.9768 KOps/s 38.0770 KOps/s $\color{#35bf28}+2.36\%$
test_setitem_dim[slice_int] 63.4430μs 42.4423μs 23.5614 KOps/s 22.9330 KOps/s $\color{#35bf28}+2.74\%$
test_setitem_dim[range] 0.1211ms 95.2883μs 10.4945 KOps/s 10.3925 KOps/s $\color{#35bf28}+0.98\%$
test_setitem_dim[tuple] 59.9230μs 39.3041μs 25.4426 KOps/s 24.3676 KOps/s $\color{#35bf28}+4.41\%$
test_setitem 73.5640μs 17.8743μs 55.9462 KOps/s 54.5760 KOps/s $\color{#35bf28}+2.51\%$
test_set 90.4350μs 17.1295μs 58.3788 KOps/s 56.9805 KOps/s $\color{#35bf28}+2.45\%$
test_set_shared 0.5106ms 0.2023ms 4.9426 KOps/s 4.9123 KOps/s $\color{#35bf28}+0.62\%$
test_update 0.2101ms 22.2393μs 44.9655 KOps/s 44.6049 KOps/s $\color{#35bf28}+0.81\%$
test_update_nested 0.2160ms 33.2463μs 30.0785 KOps/s 29.1100 KOps/s $\color{#35bf28}+3.33\%$
test_update__nested 0.4440ms 34.4980μs 28.9872 KOps/s 28.1590 KOps/s $\color{#35bf28}+2.94\%$
test_set_nested 80.9150μs 18.9800μs 52.6871 KOps/s 51.5156 KOps/s $\color{#35bf28}+2.27\%$
test_set_nested_new 63.5340μs 24.0261μs 41.6215 KOps/s 40.7438 KOps/s $\color{#35bf28}+2.15\%$
test_select 88.7150μs 39.8775μs 25.0768 KOps/s 24.6134 KOps/s $\color{#35bf28}+1.88\%$
test_select_nested 0.1412ms 74.5650μs 13.4111 KOps/s 13.5488 KOps/s $\color{#d91a1a}-1.02\%$
test_exclude_nested 0.1375ms 90.7191μs 11.0230 KOps/s 10.9387 KOps/s $\color{#35bf28}+0.77\%$
test_empty[True] 0.4841ms 0.4007ms 2.4959 KOps/s 2.5055 KOps/s $\color{#d91a1a}-0.38\%$
test_empty[False] 10.3405μs 1.3115μs 762.4760 KOps/s 764.1446 KOps/s $\color{#d91a1a}-0.22\%$
test_to 0.1106ms 76.4915μs 13.0733 KOps/s 13.5497 KOps/s $\color{#d91a1a}-3.52\%$
test_to_nonblocking 0.1228ms 68.8130μs 14.5321 KOps/s 14.9935 KOps/s $\color{#d91a1a}-3.08\%$
test_unbind_speed 0.3700ms 0.3396ms 2.9445 KOps/s 2.9597 KOps/s $\color{#d91a1a}-0.51\%$
test_unbind_speed_stack0 0.3942ms 0.3347ms 2.9875 KOps/s 3.0026 KOps/s $\color{#d91a1a}-0.51\%$
test_unbind_speed_stack1 0.1066s 0.9261ms 1.0798 KOps/s 1.1735 KOps/s $\textbf{\color{#d91a1a}-7.98\%}$
test_split 1.2216ms 1.1445ms 873.7703 Ops/s 788.0144 Ops/s $\textbf{\color{#35bf28}+10.88\%}$
test_chunk 0.1067s 1.2164ms 822.0704 Ops/s 925.7265 Ops/s $\textbf{\color{#d91a1a}-11.20\%}$
test_to_cpu_blocking 19.8704ms 19.6994ms 50.7630 Ops/s 44.8447 Ops/s $\textbf{\color{#35bf28}+13.20\%}$
test_to_cpu_global_sync 11.9033ms 11.7327ms 85.2320 Ops/s 83.2873 Ops/s $\color{#35bf28}+2.33\%$
test_to_cpu_event_sync 13.0166ms 12.6895ms 78.8056 Ops/s 77.0840 Ops/s $\color{#35bf28}+2.23\%$
test_to_cpu_default 0.1190s 14.0679ms 71.0840 Ops/s 76.9804 Ops/s $\textbf{\color{#d91a1a}-7.66\%}$
test_consolidate[False-None] 4.2714ms 4.1304ms 242.1047 Ops/s 237.8536 Ops/s $\color{#35bf28}+1.79\%$
test_consolidate[default-None] 2.4932ms 2.0598ms 485.4735 Ops/s 468.3001 Ops/s $\color{#35bf28}+3.67\%$
test_consolidate[reduce-overhead-None] 2.1575ms 1.9837ms 504.0987 Ops/s 490.9068 Ops/s $\color{#35bf28}+2.69\%$
test_consolidate_njt[False-None] 8.7302ms 8.4938ms 117.7333 Ops/s 115.5250 Ops/s $\color{#35bf28}+1.91\%$
test_to[False-False-None] 2.3235ms 2.1638ms 462.1505 Ops/s 453.5922 Ops/s $\color{#35bf28}+1.89\%$
test_to[True-False-None] 2.1783ms 1.9107ms 523.3589 Ops/s 515.9296 Ops/s $\color{#35bf28}+1.44\%$
test_to[within-False-None] 6.3092ms 6.1165ms 163.4913 Ops/s 159.0473 Ops/s $\color{#35bf28}+2.79\%$
test_to[True-default-None] 9.9023ms 9.2415ms 108.2074 Ops/s 104.8987 Ops/s $\color{#35bf28}+3.15\%$
test_to_njt[False-False-None] 9.0763ms 8.5713ms 116.6681 Ops/s 114.0979 Ops/s $\color{#35bf28}+2.25\%$
test_to_njt[True-False-None] 7.5678ms 7.0084ms 142.6855 Ops/s 140.9423 Ops/s $\color{#35bf28}+1.24\%$
test_to_njt[within-False-None] 15.8535ms 15.5817ms 64.1778 Ops/s 62.7229 Ops/s $\color{#35bf28}+2.32\%$
test_creation[device0] 0.3968ms 0.1144ms 8.7411 KOps/s 8.4379 KOps/s $\color{#35bf28}+3.59\%$
test_creation_from_tensor 0.5486ms 0.1126ms 8.8826 KOps/s 8.5570 KOps/s $\color{#35bf28}+3.80\%$
test_add_one[memmap_tensor0] 0.2212ms 6.6069μs 151.3560 KOps/s 145.3775 KOps/s $\color{#35bf28}+4.11\%$
test_contiguous[memmap_tensor0] 13.1310μs 0.6470μs 1.5457 MOps/s 2.2425 MOps/s $\textbf{\color{#d91a1a}-31.07\%}$
test_stack[memmap_tensor0] 0.1358ms 4.5939μs 217.6821 KOps/s 217.7411 KOps/s $\color{#d91a1a}-0.03\%$
test_memmaptd_index 1.0723ms 0.2743ms 3.6460 KOps/s 3.6765 KOps/s $\color{#d91a1a}-0.83\%$
test_memmaptd_index_astensor 0.5351ms 0.3736ms 2.6769 KOps/s 2.6607 KOps/s $\color{#35bf28}+0.61\%$
test_memmaptd_index_op 0.8031ms 0.6279ms 1.5927 KOps/s 1.5615 KOps/s $\color{#35bf28}+2.00\%$
test_serialize_model 0.1374s 0.1357s 7.3681 Ops/s 7.3457 Ops/s $\color{#35bf28}+0.30\%$
test_serialize_model_pickle 1.3676s 1.2132s 0.8243 Ops/s 0.8386 Ops/s $\color{#d91a1a}-1.70\%$
test_serialize_weights 0.1362s 0.1332s 7.5049 Ops/s 7.4691 Ops/s $\color{#35bf28}+0.48\%$
test_serialize_weights_returnearly 0.4502s 88.2853ms 11.3269 Ops/s 15.7536 Ops/s $\textbf{\color{#d91a1a}-28.10\%}$
test_serialize_weights_pickle 1.3790s 1.1896s 0.8406 Ops/s 0.8233 Ops/s $\color{#35bf28}+2.10\%$
test_reshape_pytree 0.2032ms 32.3455μs 30.9162 KOps/s 30.7388 KOps/s $\color{#35bf28}+0.58\%$
test_reshape_td 88.8140μs 46.9619μs 21.2938 KOps/s 22.1930 KOps/s $\color{#d91a1a}-4.05\%$
test_view_pytree 0.2150ms 31.8311μs 31.4158 KOps/s 31.4261 KOps/s $\color{#d91a1a}-0.03\%$
test_view_td 98.1560μs 53.9334μs 18.5414 KOps/s 18.6262 KOps/s $\color{#d91a1a}-0.46\%$
test_unbind_pytree 0.2351ms 36.3402μs 27.5177 KOps/s 27.6095 KOps/s $\color{#d91a1a}-0.33\%$
test_unbind_td 0.1094ms 49.8098μs 20.0764 KOps/s 19.8057 KOps/s $\color{#35bf28}+1.37\%$
test_split_pytree 0.1958ms 42.0649μs 23.7728 KOps/s 23.8420 KOps/s $\color{#d91a1a}-0.29\%$
test_split_td 0.1470ms 64.4886μs 15.5066 KOps/s 15.3002 KOps/s $\color{#35bf28}+1.35\%$
test_add_pytree 0.2319ms 41.9933μs 23.8133 KOps/s 24.1340 KOps/s $\color{#d91a1a}-1.33\%$
test_add_td 0.1018ms 57.4730μs 17.3995 KOps/s 17.5071 KOps/s $\color{#d91a1a}-0.61\%$
test_compile_add_one_nested[tensordict-compile] 0.2502ms 0.1609ms 6.2157 KOps/s 5.7277 KOps/s $\textbf{\color{#35bf28}+8.52\%}$
test_compile_add_one_nested[tensordict-eager] 0.2806ms 0.2032ms 4.9218 KOps/s 5.0099 KOps/s $\color{#d91a1a}-1.76\%$
test_compile_add_one_nested[pytree-compile] 0.2417ms 0.1278ms 7.8252 KOps/s 7.5428 KOps/s $\color{#35bf28}+3.74\%$
test_compile_add_one_nested[pytree-eager] 0.4330ms 0.1805ms 5.5392 KOps/s 5.4799 KOps/s $\color{#35bf28}+1.08\%$
test_compile_copy_nested[tensordict-compile] 0.2383ms 22.0998μs 45.2492 KOps/s 61.7339 KOps/s $\textbf{\color{#d91a1a}-26.70\%}$
test_compile_copy_nested[tensordict-eager] 0.1023ms 54.3595μs 18.3961 KOps/s 18.3401 KOps/s $\color{#35bf28}+0.31\%$
test_compile_copy_nested[pytree-compile] 0.1719ms 16.1412μs 61.9531 KOps/s 61.2389 KOps/s $\color{#35bf28}+1.17\%$
test_compile_copy_nested[pytree-eager] 0.3717ms 68.2317μs 14.6559 KOps/s 14.8233 KOps/s $\color{#d91a1a}-1.13\%$
test_compile_add_one_flat[tensordict-compile] 0.2917ms 0.2007ms 4.9817 KOps/s 4.8413 KOps/s $\color{#35bf28}+2.90\%$
test_compile_add_one_flat[tensordict-eager] 0.3926ms 0.2767ms 3.6145 KOps/s 3.5877 KOps/s $\color{#35bf28}+0.75\%$
test_compile_add_one_flat[tensorclass-compile] 0.5747ms 0.1359ms 7.3609 KOps/s 7.0332 KOps/s $\color{#35bf28}+4.66\%$
test_compile_add_one_flat[tensorclass-eager] 0.4977ms 73.9351μs 13.5254 KOps/s 13.3774 KOps/s $\color{#35bf28}+1.11\%$
test_compile_add_one_flat[pytree-compile] 0.2302ms 0.1800ms 5.5541 KOps/s 5.4285 KOps/s $\color{#35bf28}+2.31\%$
test_compile_add_one_flat[pytree-eager] 0.8348ms 0.5352ms 1.8685 KOps/s 1.8666 KOps/s $\color{#35bf28}+0.10\%$
test_compile_add_self_flat[tensordict-eager] 0.4539ms 0.3300ms 3.0302 KOps/s 3.0144 KOps/s $\color{#35bf28}+0.52\%$
test_compile_add_self_flat[tensordict-compile] 0.2667ms 0.1992ms 5.0203 KOps/s 2.9971 KOps/s $\textbf{\color{#35bf28}+67.50\%}$
test_compile_add_self_flat[tensorclass-eager] 0.1435ms 89.9344μs 11.1192 KOps/s 11.0288 KOps/s $\color{#35bf28}+0.82\%$
test_compile_add_self_flat[tensorclass-compile] 0.2048ms 0.1381ms 7.2436 KOps/s 6.7764 KOps/s $\textbf{\color{#35bf28}+6.90\%}$
test_compile_add_self_flat[pytree-eager] 0.6429ms 0.4374ms 2.2860 KOps/s 2.2357 KOps/s $\color{#35bf28}+2.25\%$
test_compile_add_self_flat[pytree-compile] 0.5227ms 0.1804ms 5.5433 KOps/s 5.4090 KOps/s $\color{#35bf28}+2.48\%$
test_compile_copy_flat[tensordict-compile] 0.1227ms 19.6328μs 50.9351 KOps/s 52.2070 KOps/s $\color{#d91a1a}-2.44\%$
test_compile_copy_flat[tensordict-eager] 76.1040μs 41.8574μs 23.8907 KOps/s 24.6092 KOps/s $\color{#d91a1a}-2.92\%$
test_compile_copy_flat[pytree-compile] 94.0450μs 16.7314μs 59.7678 KOps/s 59.4184 KOps/s $\color{#35bf28}+0.59\%$
test_compile_copy_flat[pytree-eager] 0.3537ms 53.3347μs 18.7495 KOps/s 18.9840 KOps/s $\color{#d91a1a}-1.23\%$
test_compile_assign_and_add[tensordict-compile] 2.1307ms 0.1905ms 5.2482 KOps/s 4.8278 KOps/s $\textbf{\color{#35bf28}+8.71\%}$
test_compile_assign_and_add[tensordict-eager] 3.6233ms 3.4288ms 291.6461 Ops/s 293.1837 Ops/s $\color{#d91a1a}-0.52\%$
test_compile_assign_and_add[pytree-compile] 2.0896ms 0.1786ms 5.6002 KOps/s 5.4263 KOps/s $\color{#35bf28}+3.20\%$
test_compile_assign_and_add[pytree-eager] 2.9890ms 2.8426ms 351.7910 Ops/s 349.3808 Ops/s $\color{#35bf28}+0.69\%$
test_compile_indexing[tensor-tensordict-compile] 0.1907ms 0.1266ms 7.8972 KOps/s 7.5163 KOps/s $\textbf{\color{#35bf28}+5.07\%}$
test_compile_indexing[tensor-tensordict-eager] 0.2912ms 75.7462μs 13.2020 KOps/s 13.4288 KOps/s $\color{#d91a1a}-1.69\%$
test_compile_indexing[tensor-tensorclass-compile] 0.2390ms 0.1144ms 8.7380 KOps/s 8.5780 KOps/s $\color{#35bf28}+1.86\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2514ms 46.2413μs 21.6257 KOps/s 22.2741 KOps/s $\color{#d91a1a}-2.91\%$
test_compile_indexing[tensor-pytree-compile] 0.1644ms 0.1150ms 8.6974 KOps/s 8.3198 KOps/s $\color{#35bf28}+4.54\%$
test_compile_indexing[tensor-pytree-eager] 0.2697ms 47.3221μs 21.1318 KOps/s 22.3780 KOps/s $\textbf{\color{#d91a1a}-5.57\%}$
test_compile_indexing[slice-tensordict-compile] 0.1524ms 69.8571μs 14.3149 KOps/s 13.6554 KOps/s $\color{#35bf28}+4.83\%$
test_compile_indexing[slice-tensordict-eager] 0.2011ms 29.3589μs 34.0612 KOps/s 34.9385 KOps/s $\color{#d91a1a}-2.51\%$
test_compile_indexing[slice-tensorclass-compile] 99.2960μs 59.2702μs 16.8719 KOps/s 16.7119 KOps/s $\color{#35bf28}+0.96\%$
test_compile_indexing[slice-tensorclass-eager] 0.2454ms 22.5140μs 44.4168 KOps/s 43.9578 KOps/s $\color{#35bf28}+1.04\%$
test_compile_indexing[slice-pytree-compile] 0.1032ms 62.5002μs 15.9999 KOps/s 16.5026 KOps/s $\color{#d91a1a}-3.05\%$
test_compile_indexing[slice-pytree-eager] 0.3107ms 22.4263μs 44.5905 KOps/s 44.4053 KOps/s $\color{#35bf28}+0.42\%$
test_compile_indexing[int-tensordict-compile] 0.1140ms 71.1889μs 14.0471 KOps/s 13.1944 KOps/s $\textbf{\color{#35bf28}+6.46\%}$
test_compile_indexing[int-tensordict-eager] 0.2520ms 27.9739μs 35.7476 KOps/s 34.7374 KOps/s $\color{#35bf28}+2.91\%$
test_compile_indexing[int-tensorclass-compile] 0.1330ms 58.4026μs 17.1225 KOps/s 16.6371 KOps/s $\color{#35bf28}+2.92\%$
test_compile_indexing[int-tensorclass-eager] 0.2540ms 22.5184μs 44.4081 KOps/s 44.8660 KOps/s $\color{#d91a1a}-1.02\%$
test_compile_indexing[int-pytree-compile] 0.1122ms 61.8639μs 16.1645 KOps/s 16.5932 KOps/s $\color{#d91a1a}-2.58\%$
test_compile_indexing[int-pytree-eager] 0.2388ms 22.3679μs 44.7070 KOps/s 44.4272 KOps/s $\color{#35bf28}+0.63\%$
test_compile_replace[single-eager] 0.1084ms 48.7066μs 20.5311 KOps/s 20.8137 KOps/s $\color{#d91a1a}-1.36\%$
test_compile_replace[single-compile] 0.2218ms 0.1227ms 8.1492 KOps/s 7.9066 KOps/s $\color{#35bf28}+3.07\%$
test_compile_replace[multi-eager] 0.6775ms 0.5704ms 1.7531 KOps/s 1.7874 KOps/s $\color{#d91a1a}-1.92\%$
test_compile_replace[multi-compile] 0.2672ms 0.1291ms 7.7454 KOps/s 7.5586 KOps/s $\color{#35bf28}+2.47\%$
test_compile_tc_getattr_20[eager] 0.2573ms 0.1725ms 5.7987 KOps/s 5.8919 KOps/s $\color{#d91a1a}-1.58\%$
test_compile_tc_getattr_20[compile] 0.2944ms 0.1380ms 7.2439 KOps/s 7.1428 KOps/s $\color{#35bf28}+1.41\%$
test_compile_clone_shallow[20-eager] 52.8830μs 19.4867μs 51.3171 KOps/s 51.5268 KOps/s $\color{#d91a1a}-0.41\%$
test_compile_clone_shallow[20-compile] 0.1029ms 17.7586μs 56.3108 KOps/s 49.3903 KOps/s $\textbf{\color{#35bf28}+14.01\%}$
test_compile_clone_shallow[40-eager] 64.2630μs 34.3949μs 29.0741 KOps/s 29.6729 KOps/s $\color{#d91a1a}-2.02\%$
test_compile_clone_shallow[40-compile] 92.7950μs 18.4560μs 54.1828 KOps/s 53.2331 KOps/s $\color{#35bf28}+1.78\%$
test_compile_clone_shallow[80-eager] 0.1032ms 63.4669μs 15.7563 KOps/s 15.7657 KOps/s $\color{#d91a1a}-0.06\%$
test_compile_clone_shallow[80-compile] 57.7330μs 20.8229μs 48.0239 KOps/s 46.1219 KOps/s $\color{#35bf28}+4.12\%$
test_compile_update_inplace[eager] 0.1248ms 58.8896μs 16.9809 KOps/s 16.6198 KOps/s $\color{#35bf28}+2.17\%$
test_compile_update_inplace[compile] 0.2337ms 0.1521ms 6.5728 KOps/s 6.4109 KOps/s $\color{#35bf28}+2.53\%$
test_mod_add[eager] 0.1382ms 49.1709μs 20.3372 KOps/s 20.4579 KOps/s $\color{#d91a1a}-0.59\%$
test_mod_add[compile] 0.1608ms 0.1190ms 8.4014 KOps/s 8.3534 KOps/s $\color{#35bf28}+0.58\%$
test_mod_add[compile-overhead] 0.2537ms 0.1672ms 5.9802 KOps/s 5.8106 KOps/s $\color{#35bf28}+2.92\%$
test_mod_wrap[eager] 0.7558ms 0.2987ms 3.3474 KOps/s 3.3344 KOps/s $\color{#35bf28}+0.39\%$
test_mod_wrap[compile] 0.4847ms 0.3799ms 2.6324 KOps/s 2.5490 KOps/s $\color{#35bf28}+3.27\%$
test_mod_wrap[compile-overhead] 9.0248ms 4.9392ms 202.4628 Ops/s 200.6953 Ops/s $\color{#35bf28}+0.88\%$
test_mod_wrap_and_backward[eager] 1.6916ms 1.5108ms 661.8867 Ops/s 644.0789 Ops/s $\color{#35bf28}+2.76\%$
test_mod_wrap_and_backward[compile] 1.7930ms 1.4830ms 674.3053 Ops/s 671.6148 Ops/s $\color{#35bf28}+0.40\%$
test_mod_wrap_and_backward[compile-overhead] 1.3155ms 0.9190ms 1.0882 KOps/s 958.5758 Ops/s $\textbf{\color{#35bf28}+13.52\%}$
test_seq_add[eager] 0.2321ms 0.1537ms 6.5051 KOps/s 6.3818 KOps/s $\color{#35bf28}+1.93\%$
test_seq_add[compile] 0.2768ms 0.1284ms 7.7909 KOps/s 7.4154 KOps/s $\textbf{\color{#35bf28}+5.06\%}$
test_seq_add[compile-overhead] 0.3254ms 0.1764ms 5.6703 KOps/s 5.6234 KOps/s $\color{#35bf28}+0.83\%$
test_seq_wrap[eager] 0.7239ms 0.5490ms 1.8215 KOps/s 1.9046 KOps/s $\color{#d91a1a}-4.36\%$
test_seq_wrap[compile] 0.5448ms 0.3985ms 2.5093 KOps/s 2.4881 KOps/s $\color{#35bf28}+0.85\%$
test_seq_wrap[compile-overhead] 0.4365ms 0.2898ms 3.4505 KOps/s 3.3759 KOps/s $\color{#35bf28}+2.21\%$
test_func_call_runtime[False-eager] 0.9864ms 0.8774ms 1.1397 KOps/s 1.1374 KOps/s $\color{#35bf28}+0.21\%$
test_func_call_runtime[False-compile] 1.1451ms 0.9480ms 1.0548 KOps/s 1.0571 KOps/s $\color{#d91a1a}-0.21\%$
test_func_call_runtime[False-compile-overhead] 0.5707ms 0.4964ms 2.0146 KOps/s 1.9974 KOps/s $\color{#35bf28}+0.86\%$
test_func_call_runtime[True-eager] 1.2399ms 1.0804ms 925.6162 Ops/s 919.1248 Ops/s $\color{#35bf28}+0.71\%$
test_func_call_runtime[True-compile] 1.0282ms 0.9552ms 1.0469 KOps/s 1.0457 KOps/s $\color{#35bf28}+0.12\%$
test_func_call_runtime[True-compile-overhead] 0.6116ms 0.5095ms 1.9626 KOps/s 1.9341 KOps/s $\color{#35bf28}+1.48\%$
test_func_call_cm_runtime[False-eager] 0.9599ms 0.8424ms 1.1871 KOps/s 1.1526 KOps/s $\color{#35bf28}+2.99\%$
test_func_call_cm_runtime[False-compile] 1.1576ms 0.9161ms 1.0916 KOps/s 1.0809 KOps/s $\color{#35bf28}+0.99\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5191ms 0.4713ms 2.1219 KOps/s 2.0853 KOps/s $\color{#35bf28}+1.76\%$
test_func_call_cm_runtime[True-eager] 1.3366ms 1.2353ms 809.5423 Ops/s 808.0787 Ops/s $\color{#35bf28}+0.18\%$
test_func_call_cm_runtime[True-compile] 1.0423ms 0.9630ms 1.0385 KOps/s 1.0198 KOps/s $\color{#35bf28}+1.83\%$
test_func_call_cm_runtime[True-compile-overhead] 0.6494ms 0.5163ms 1.9368 KOps/s 1.9077 KOps/s $\color{#35bf28}+1.53\%$
test_vmap_func_call_cm_runtime[eager] 2.9186ms 2.3976ms 417.0896 Ops/s 412.6215 Ops/s $\color{#35bf28}+1.08\%$
test_vmap_func_call_cm_runtime[compile] 1.1403ms 0.9901ms 1.0100 KOps/s 1.0092 KOps/s $\color{#35bf28}+0.08\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5750ms 0.5202ms 1.9225 KOps/s 1.8714 KOps/s $\color{#35bf28}+2.73\%$
test_distributed 0.6216ms 0.1531ms 6.5310 KOps/s 6.3878 KOps/s $\color{#35bf28}+2.24\%$
test_tdmodule 0.5286ms 28.1339μs 35.5443 KOps/s 35.7473 KOps/s $\color{#d91a1a}-0.57\%$
test_tdmodule_dispatch 75.7740μs 44.5142μs 22.4648 KOps/s 21.1258 KOps/s $\textbf{\color{#35bf28}+6.34\%}$
test_tdseq 47.2520μs 26.4356μs 37.8277 KOps/s 35.7814 KOps/s $\textbf{\color{#35bf28}+5.72\%}$
test_tdseq_dispatch 68.7230μs 47.1289μs 21.2184 KOps/s 20.0126 KOps/s $\textbf{\color{#35bf28}+6.03\%}$
test_instantiation_functorch 2.2022ms 2.0835ms 479.9671 Ops/s 476.3557 Ops/s $\color{#35bf28}+0.76\%$
test_exec_functorch 0.2252ms 0.1806ms 5.5375 KOps/s 5.3918 KOps/s $\color{#35bf28}+2.70\%$
test_exec_functional_call 0.2337ms 0.1614ms 6.1955 KOps/s 5.9701 KOps/s $\color{#35bf28}+3.78\%$
test_exec_td_decorator 0.4606ms 0.2394ms 4.1771 KOps/s 4.0690 KOps/s $\color{#35bf28}+2.66\%$
test_vmap_mlp_speed_decorator[True-True] 1.0373ms 0.8300ms 1.2048 KOps/s 1.1768 KOps/s $\color{#35bf28}+2.38\%$
test_vmap_mlp_speed_decorator[True-False] 1.0097ms 0.8226ms 1.2156 KOps/s 1.1818 KOps/s $\color{#35bf28}+2.87\%$
test_vmap_mlp_speed_decorator[False-True] 0.9824ms 0.7129ms 1.4028 KOps/s 1.3545 KOps/s $\color{#35bf28}+3.56\%$
test_vmap_mlp_speed_decorator[False-False] 0.9489ms 0.7244ms 1.3804 KOps/s 1.3428 KOps/s $\color{#35bf28}+2.80\%$
test_vmap_transformer_speed_decorator[True-True] 21.7486ms 20.7768ms 48.1306 Ops/s 46.9235 Ops/s $\color{#35bf28}+2.57\%$
test_vmap_transformer_speed_decorator[True-False] 21.4125ms 20.6817ms 48.3519 Ops/s 46.7767 Ops/s $\color{#35bf28}+3.37\%$
test_vmap_transformer_speed_decorator[False-True] 20.7225ms 20.5406ms 48.6841 Ops/s 47.3338 Ops/s $\color{#35bf28}+2.85\%$
test_vmap_transformer_speed_decorator[False-False] 20.6878ms 20.5053ms 48.7678 Ops/s 47.4696 Ops/s $\color{#35bf28}+2.73\%$
test_to_module_speed[True] 1.5960ms 1.4720ms 679.3482 Ops/s 678.3769 Ops/s $\color{#35bf28}+0.14\%$
test_to_module_speed[False] 1.5752ms 1.4627ms 683.6769 Ops/s 693.7501 Ops/s $\color{#d91a1a}-1.45\%$
test_tc_init 87.3450μs 45.8763μs 21.7978 KOps/s 22.3988 KOps/s $\color{#d91a1a}-2.68\%$
test_tc_init_tensor_only 40.7020μs 9.7666μs 102.3893 KOps/s 103.6967 KOps/s $\color{#d91a1a}-1.26\%$
test_tc_init_nested 0.1446ms 89.8270μs 11.1325 KOps/s 11.3617 KOps/s $\color{#d91a1a}-2.02\%$
test_tc_init_many_fields 50.1430μs 16.3915μs 61.0071 KOps/s 61.3388 KOps/s $\color{#d91a1a}-0.54\%$
test_tc_first_layer_tensor 25.4210μs 1.8204μs 549.3283 KOps/s 559.9207 KOps/s $\color{#d91a1a}-1.89\%$
test_tc_first_layer_tensor_only 1.8946μs 0.3950μs 2.5313 MOps/s 2.5403 MOps/s $\color{#d91a1a}-0.35\%$
test_tc_first_layer_tensor_set 56.2230μs 3.7476μs 266.8367 KOps/s 255.3973 KOps/s $\color{#35bf28}+4.48\%$
test_tc_first_layer_tensor_only_set 24.5610μs 3.4371μs 290.9401 KOps/s 296.4539 KOps/s $\color{#d91a1a}-1.86\%$
test_tc_first_layer_nontensor 38.3920μs 6.1470μs 162.6813 KOps/s 163.0628 KOps/s $\color{#d91a1a}-0.23\%$
test_tc_second_layer_tensor 21.5110μs 4.3786μs 228.3827 KOps/s 231.8405 KOps/s $\color{#d91a1a}-1.49\%$
test_tc_second_layer_nontensor 27.6710μs 8.6889μs 115.0888 KOps/s 116.2332 KOps/s $\color{#d91a1a}-0.98\%$
test_unbind 0.2716s 14.2049ms 70.3982 Ops/s 68.4909 Ops/s $\color{#35bf28}+2.78\%$
test_full_like 4.8684ms 4.3977ms 227.3910 Ops/s 227.7108 Ops/s $\color{#d91a1a}-0.14\%$
test_zeros_like 4.9023ms 4.3748ms 228.5811 Ops/s 59.9954 Ops/s $\textbf{\color{#35bf28}+281.00\%}$
test_ones_like 4.6147ms 4.3917ms 227.7048 Ops/s 59.8608 Ops/s $\textbf{\color{#35bf28}+280.39\%}$
test_clone 6.8277ms 6.4713ms 154.5293 Ops/s 56.6585 Ops/s $\textbf{\color{#35bf28}+172.74\%}$
test_squeeze 0.1592ms 14.1383μs 70.7297 KOps/s 71.9278 KOps/s $\color{#d91a1a}-1.67\%$
test_unsqueeze 0.2447ms 0.1104ms 9.0607 KOps/s 8.8629 KOps/s $\color{#35bf28}+2.23\%$
test_split 0.3961ms 0.1826ms 5.4761 KOps/s 5.4434 KOps/s $\color{#35bf28}+0.60\%$
test_permute 0.3496ms 0.2108ms 4.7438 KOps/s 4.6625 KOps/s $\color{#35bf28}+1.74\%$
test_stack 51.7023ms 51.3559ms 19.4720 Ops/s 19.4419 Ops/s $\color{#35bf28}+0.15\%$
test_cat 52.0170ms 51.3793ms 19.4631 Ops/s 19.4614 Ops/s $+0.01\%$
test_sequential_tensordict 0.6023ms 0.2249ms 4.4456 KOps/s 4.3367 KOps/s $\color{#35bf28}+2.51\%$
test_sequential_graph_module 0.1660ms 0.1208ms 8.2786 KOps/s 8.0499 KOps/s $\color{#35bf28}+2.84\%$
test_nested_tensordict 0.6413ms 0.3017ms 3.3144 KOps/s 3.3904 KOps/s $\color{#d91a1a}-2.24\%$
test_nested_graph_module 0.2078ms 0.1285ms 7.7800 KOps/s 7.6585 KOps/s $\color{#35bf28}+1.59\%$
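The Change column compares this PR's throughput with the repository HEAD as (Ops / Ops on Repo HEAD - 1) × 100, once both values are in the same unit. For example, test_zeros_like gives 228.5811 / 59.9954, which is +281.00%. The summary counts (Improved: 18, Worsened: 7) match the rows printed in bold, all of which change by more than 5% in magnitude, while rows such as test_seq_wrap[eager] at -4.36% are not bold. The sketch below therefore assumes a 5% significance threshold. That value is inferred from the table and is not taken from the benchmark action's configuration; the helper names are likewise hypothetical.

```python
RATE_UNITS = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}
THRESHOLD_PCT = 5.0  # assumed: inferred from which rows are bolded in the table


def to_ops_per_s(value: float, unit: str) -> float:
    """Scale a printed throughput to plain operations per second."""
    return value * RATE_UNITS[unit]


def pct_change(ops: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    return (ops / ops_head - 1.0) * 100.0


def classify(change: float, threshold: float = THRESHOLD_PCT) -> str:
    """Label a change the way the summary line appears to count it."""
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "within noise"


# (name, Ops, unit, Ops on Repo HEAD, unit), copied from the table above.
ROWS = [
    ("test_zeros_like", 228.5811, "Ops/s", 59.9954, "Ops/s"),
    ("test_contiguous[memmap_tensor0]", 1.5457, "MOps/s", 2.2425, "MOps/s"),
    ("test_mod_wrap_and_backward[compile-overhead]", 1.0882, "KOps/s", 958.5758, "Ops/s"),
    ("test_seq_wrap[eager]", 1.8215, "KOps/s", 1.9046, "KOps/s"),
]

for name, ops, unit, head, head_unit in ROWS:
    change = pct_change(to_ops_per_s(ops, unit), to_ops_per_s(head, head_unit))
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Note the third row: the two throughputs are printed in different units (KOps/s versus Ops/s), so they must be normalized before the ratio is taken.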

vmoens force-pushed the td-update-device branch from e74aa4e to 90874fe on April 22, 2026 14:46
vmoens changed the title from "[Feature] Per-leaf device/dtype casting via metadata tensordict" to "[Feature] Per-leaf device/dtype casting via an attrs tensordict" on Apr 22, 2026
vmoens force-pushed the td-update-device branch 3 times, most recently from f2f6c68 to 094ac90 on April 22, 2026 16:09
Adds `TensorMetaData`, `TensorDictBase.metadata()`, and extends `to()` so a
positional tensordict argument is interpreted as a per-leaf device/dtype spec.
This lets a deviceless tensordict be aligned to the heterogeneous placement
of another tensordict:

    td3 = td0.to(td2.metadata())  # each leaf cast to its counterpart's device

Copies are issued asynchronously by default; a single `_sync_all()` runs at
the end when any leaf actually crossed devices, unless `non_blocking=True` is
passed. `non_blocking_pin` / `num_threads` raise `NotImplementedError` on the
per-leaf path for now.
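
The commit message above describes a per-leaf walk followed by a deferred synchronization rule. The plain-Python model below sketches that behaviour and is not tensordict code. `Leaf`, `Spec` and `to_per_leaf` are hypothetical names, devices and dtypes are plain strings, no data is copied, and the returned `synced` flag stands in for the single trailing `_sync_all()` call. The pass-through of keys missing from the spec and the `non_blocking=False` case follow the PR summary. The `NotImplementedError` for `non_blocking_pin` / `num_threads` follows the commit message.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Leaf:
    """Hypothetical stand-in for a tensor: only device and dtype are modelled."""
    device: str
    dtype: str


@dataclass(frozen=True)
class Spec:
    """Hypothetical per-leaf target; None keeps the leaf's current value."""
    device: Optional[str] = None
    dtype: Optional[str] = None


def to_per_leaf(data: dict, spec: dict, non_blocking: Optional[bool] = None,
                **kwargs) -> Tuple[dict, bool]:
    """Model of a per-leaf .to(): returns (new_tree, trailing_sync_issued)."""
    for name in ("non_blocking_pin", "num_threads"):
        if kwargs.get(name):  # any truthy value for these options is rejected
            raise NotImplementedError(f"{name} is not supported on the per-leaf path")
    crossed = False  # did any leaf change device?

    def walk(node: dict, target: dict) -> dict:
        nonlocal crossed
        out = {}
        for key, value in node.items():
            tgt = target.get(key)
            if tgt is None:
                out[key] = value  # key missing from the spec: pass through unchanged
            elif isinstance(value, dict):
                out[key] = walk(value, tgt)  # recurse into nested sub-trees
            else:
                device = tgt.device or value.device
                dtype = tgt.dtype or value.dtype
                crossed |= device != value.device
                out[key] = replace(value, device=device, dtype=dtype)
        return out

    result = walk(data, spec)
    # non_blocking=None: copies were async, so sync once only if a device changed.
    # non_blocking=True: fully async, the caller syncs. False: copies already blocked.
    synced = non_blocking is None and crossed
    return result, synced


# Mirrors the PR example: "a" lives on cuda:0 and "b" on cpu; move each leaf
# to the device of its counterpart.
td0 = {"a": Leaf("cuda:0", "float32"), "b": Leaf("cpu", "float32")}
attrs = {"a": Spec(device="cpu"), "b": Spec(device="cuda:1")}
td3, synced = to_per_leaf(td0, attrs)
print(td3["a"].device, td3["b"].device, synced)  # cpu cuda:1 True
```

Computing the sync decision as one flag after the walk mirrors the stated design. A mixed-direction cast pays for at most one synchronization, and a dtype-only spec, which never changes a device, pays for none.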
vmoens force-pushed the td-update-device branch from 094ac90 to 72d4c87 on April 22, 2026 17:49
vmoens merged commit e68c2f6 into main on Apr 22, 2026
70 of 73 checks passed
vmoens deleted the td-update-device branch on April 22, 2026 20:39

Labels: CI, CLA Signed, Compile, documentation, Feature, tensorclass, Test

1 participant