[Feature] Per-leaf device/dtype casting via an attrs tensordict#1678
Merged
Contributor
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 30.4000μs | 14.5112μs | 68.9124 KOps/s | 67.4541 KOps/s | |
| test_plain_set_stack_nested | 46.1010μs | 14.5986μs | 68.4996 KOps/s | 68.5048 KOps/s | |
| test_plain_set_nested_inplace | 50.9910μs | 16.2171μs | 61.6635 KOps/s | 61.2384 KOps/s | |
| test_plain_set_stack_nested_inplace | 77.2310μs | 15.5032μs | 64.5027 KOps/s | 62.2711 KOps/s | |
| test_items | 22.8610μs | 5.4583μs | 183.2075 KOps/s | 183.8782 KOps/s | |
| test_items_nested | 0.5333ms | 0.4481ms | 2.2318 KOps/s | 2.2301 KOps/s | |
| test_items_nested_locked | 0.5391ms | 0.4538ms | 2.2037 KOps/s | 2.2258 KOps/s | |
| test_items_nested_leaf | 0.1255ms | 92.1751μs | 10.8489 KOps/s | 10.7168 KOps/s | |
| test_items_stack_nested | 0.5512ms | 0.4498ms | 2.2230 KOps/s | 2.2589 KOps/s | |
| test_items_stack_nested_leaf | 0.1324ms | 92.1550μs | 10.8513 KOps/s | 10.7914 KOps/s | |
| test_items_stack_nested_locked | 0.5396ms | 0.4486ms | 2.2292 KOps/s | 2.2327 KOps/s | |
| test_keys | 30.3410μs | 4.1318μs | 242.0240 KOps/s | 243.8970 KOps/s | |
| test_keys_nested | 0.2120ms | 0.1263ms | 7.9158 KOps/s | 7.8350 KOps/s | |
| test_keys_nested_locked | 2.3325ms | 0.1348ms | 7.4202 KOps/s | 7.3012 KOps/s | |
| test_keys_nested_leaf | 0.1534ms | 0.1171ms | 8.5429 KOps/s | 8.4337 KOps/s | |
| test_keys_stack_nested | 0.1771ms | 0.1274ms | 7.8500 KOps/s | 7.8144 KOps/s | |
| test_keys_stack_nested_leaf | 0.1765ms | 0.1170ms | 8.5449 KOps/s | 8.4406 KOps/s | |
| test_keys_stack_nested_locked | 0.2105ms | 0.1348ms | 7.4170 KOps/s | 7.3660 KOps/s | |
| test_values | 6.7542μs | 1.0027μs | 997.2598 KOps/s | 998.7781 KOps/s | |
| test_values_nested | 85.2610μs | 50.9512μs | 19.6266 KOps/s | 19.3964 KOps/s | |
| test_values_nested_locked | 82.2510μs | 53.3267μs | 18.7523 KOps/s | 18.4737 KOps/s | |
| test_values_nested_leaf | 84.6410μs | 57.8152μs | 17.2965 KOps/s | 16.9285 KOps/s | |
| test_values_stack_nested | 83.3110μs | 50.3705μs | 19.8529 KOps/s | 19.5025 KOps/s | |
| test_values_stack_nested_leaf | 95.3720μs | 58.0873μs | 17.2155 KOps/s | 16.9291 KOps/s | |
| test_values_stack_nested_locked | 84.1110μs | 54.1461μs | 18.4686 KOps/s | 18.3939 KOps/s | |
| test_membership | 5.1852μs | 0.7913μs | 1.2637 MOps/s | 1.0819 MOps/s | |
| test_membership_nested | 23.9100μs | 2.6824μs | 372.7996 KOps/s | 368.0186 KOps/s | |
| test_membership_nested_leaf | 15.5800μs | 2.5960μs | 385.2077 KOps/s | 382.7969 KOps/s | |
| test_membership_stacked_nested | 24.6400μs | 2.6631μs | 375.4960 KOps/s | 368.5481 KOps/s | |
| test_membership_stacked_nested_leaf | 25.4710μs | 2.6973μs | 370.7438 KOps/s | 370.9850 KOps/s | |
| test_membership_nested_last | 35.0500μs | 4.0753μs | 245.3779 KOps/s | 244.2707 KOps/s | |
| test_membership_nested_leaf_last | 31.7110μs | 4.0567μs | 246.5059 KOps/s | 243.6923 KOps/s | |
| test_membership_stacked_nested_last | 26.3910μs | 4.1021μs | 243.7796 KOps/s | 244.2593 KOps/s | |
| test_membership_stacked_nested_leaf_last | 38.4610μs | 4.0298μs | 248.1531 KOps/s | 245.9264 KOps/s | |
| test_nested_getleaf | 50.4600μs | 20.4245μs | 48.9607 KOps/s | 48.7673 KOps/s | |
| test_nested_get | 48.5810μs | 19.2582μs | 51.9259 KOps/s | 50.8732 KOps/s | |
| test_stacked_getleaf | 51.4310μs | 20.2222μs | 49.4505 KOps/s | 48.4592 KOps/s | |
| test_stacked_get | 50.6110μs | 19.4233μs | 51.4845 KOps/s | 51.3102 KOps/s | |
| test_nested_getitemleaf | 55.1610μs | 20.9052μs | 47.8350 KOps/s | 47.7028 KOps/s | |
| test_nested_getitem | 50.4000μs | 19.9503μs | 50.1245 KOps/s | 50.4153 KOps/s | |
| test_stacked_getitemleaf | 44.8110μs | 21.2437μs | 47.0727 KOps/s | 47.6541 KOps/s | |
| test_stacked_getitem | 47.4410μs | 20.0763μs | 49.8099 KOps/s | 49.4999 KOps/s | |
| test_lock_nested | 4.6325ms | 0.4644ms | 2.1532 KOps/s | 2.1914 KOps/s | |
| test_lock_stack_nested | 0.5415ms | 0.4645ms | 2.1529 KOps/s | 2.1572 KOps/s | |
| test_unlock_nested | 0.4742ms | 0.3786ms | 2.6415 KOps/s | 2.6650 KOps/s | |
| test_unlock_stack_nested | 0.4620ms | 0.3784ms | 2.6425 KOps/s | 2.6280 KOps/s | |
| test_flatten_speed | 0.1686ms | 0.1140ms | 8.7687 KOps/s | 8.6428 KOps/s | |
| test_unflatten_speed | 0.6387ms | 0.5432ms | 1.8411 KOps/s | 1.8208 KOps/s | |
| test_common_ops | 0.8049ms | 0.6733ms | 1.4852 KOps/s | 1.4675 KOps/s | |
| test_creation | 0.1127ms | 2.9404μs | 340.0892 KOps/s | 339.5890 KOps/s | |
| test_creation_empty | 37.9510μs | 6.5908μs | 151.7257 KOps/s | 150.8327 KOps/s | |
| test_creation_nested_1 | 38.6810μs | 10.9258μs | 91.5269 KOps/s | 90.8728 KOps/s | |
| test_creation_nested_2 | 35.9400μs | 12.6478μs | 79.0652 KOps/s | 78.4327 KOps/s | |
| test_creation_many_keys[10] | 64.5010μs | 19.3088μs | 51.7900 KOps/s | 50.4365 KOps/s | |
| test_creation_many_keys[50] | 0.1411ms | 84.4460μs | 11.8419 KOps/s | 11.8174 KOps/s | |
| test_creation_many_keys[100] | 0.2273ms | 0.1663ms | 6.0143 KOps/s | 6.0275 KOps/s | |
| test_creation_nested_many_keys[10] | 77.1910μs | 42.1310μs | 23.7355 KOps/s | 23.5625 KOps/s | |
| test_creation_nested_many_keys[50] | 0.2398ms | 0.1744ms | 5.7356 KOps/s | 5.8252 KOps/s | |
| test_clone | 38.1500μs | 12.5745μs | 79.5263 KOps/s | 78.1974 KOps/s | |
| test_getitem[int] | 1.6827ms | 14.6516μs | 68.2517 KOps/s | 60.9363 KOps/s | |
| test_getitem[slice_int] | 0.1384ms | 23.5285μs | 42.5016 KOps/s | 41.5304 KOps/s | |
| test_getitem[range] | 0.1873ms | 61.8198μs | 16.1761 KOps/s | 16.3050 KOps/s | |
| test_getitem[tuple] | 0.1417ms | 22.9836μs | 43.5092 KOps/s | 42.6359 KOps/s | |
| test_getitem[list] | 0.1847ms | 55.5228μs | 18.0106 KOps/s | 17.9001 KOps/s | |
| test_setitem_dim[int] | 50.2710μs | 24.1599μs | 41.3909 KOps/s | 40.9322 KOps/s | |
| test_setitem_dim[slice_int] | 63.2110μs | 40.0733μs | 24.9543 KOps/s | 24.0646 KOps/s | |
| test_setitem_dim[range] | 0.1147ms | 90.5079μs | 11.0488 KOps/s | 10.9145 KOps/s | |
| test_setitem_dim[tuple] | 56.9910μs | 37.1483μs | 26.9191 KOps/s | 25.6424 KOps/s | |
| test_setitem | 44.3100μs | 16.9626μs | 58.9532 KOps/s | 57.7920 KOps/s | |
| test_set | 44.3300μs | 16.1415μs | 61.9520 KOps/s | 61.0281 KOps/s | |
| test_set_shared | 0.5440ms | 0.2031ms | 4.9229 KOps/s | 4.9666 KOps/s | |
| test_update | 0.3264ms | 20.7837μs | 48.1146 KOps/s | 46.3705 KOps/s | |
| test_update_nested | 83.4910μs | 31.1896μs | 32.0620 KOps/s | 31.4487 KOps/s | |
| test_update__nested | 0.4451ms | 32.8299μs | 30.4601 KOps/s | 29.9235 KOps/s | |
| test_set_nested | 49.6810μs | 17.7954μs | 56.1944 KOps/s | 53.9971 KOps/s | |
| test_set_nested_new | 55.2810μs | 22.8680μs | 43.7292 KOps/s | 43.4621 KOps/s | |
| test_select | 0.1187ms | 38.6357μs | 25.8828 KOps/s | 25.3004 KOps/s | |
| test_select_nested | 0.1158ms | 69.9454μs | 14.2969 KOps/s | 14.2486 KOps/s | |
| test_exclude_nested | 0.1183ms | 86.9815μs | 11.4967 KOps/s | 11.5211 KOps/s | |
| test_empty[True] | 0.4235ms | 0.3805ms | 2.6282 KOps/s | 2.5875 KOps/s | |
| test_empty[False] | 7.5303μs | 1.2357μs | 809.2525 KOps/s | 805.8969 KOps/s | |
| test_to | 0.1065ms | 74.2178μs | 13.4739 KOps/s | 13.4636 KOps/s | |
| test_to_nonblocking | 98.7420μs | 67.3282μs | 14.8526 KOps/s | 14.5727 KOps/s | |
| test_unbind_speed | 0.3764ms | 0.3222ms | 3.1039 KOps/s | 3.1076 KOps/s | |
| test_unbind_speed_stack0 | 0.3666ms | 0.3176ms | 3.1482 KOps/s | 3.1463 KOps/s | |
| test_unbind_speed_stack1 | 0.1069s | 0.8945ms | 1.1179 KOps/s | 1.2302 KOps/s | |
| test_split | 1.1542ms | 1.0819ms | 924.2730 Ops/s | 821.4981 Ops/s | |
| test_chunk | 0.1071s | 1.1572ms | 864.1354 Ops/s | 951.6262 Ops/s | |
| test_to_cpu_blocking | 19.7954ms | 19.3902ms | 51.5723 Ops/s | 46.3241 Ops/s | |
| test_to_cpu_global_sync | 12.0771ms | 11.7870ms | 84.8394 Ops/s | 86.0982 Ops/s | |
| test_to_cpu_event_sync | 0.1193s | 14.1151ms | 70.8461 Ops/s | 79.3138 Ops/s | |
| test_to_cpu_default | 13.0684ms | 12.7545ms | 78.4039 Ops/s | 79.2042 Ops/s | |
| test_consolidate[False-None] | 4.0883ms | 3.9802ms | 251.2434 Ops/s | 220.5610 Ops/s | |
| test_consolidate[default-None] | 2.0245ms | 1.9406ms | 515.2932 Ops/s | 502.5758 Ops/s | |
| test_consolidate[reduce-overhead-None] | 1.9703ms | 1.8801ms | 531.8810 Ops/s | 519.8041 Ops/s | |
| test_consolidate_njt[False-None] | 8.7094ms | 8.3425ms | 119.8685 Ops/s | 119.3283 Ops/s | |
| test_to[False-False-None] | 2.3899ms | 2.1610ms | 462.7424 Ops/s | 467.1988 Ops/s | |
| test_to[True-False-None] | 1.9775ms | 1.8547ms | 539.1669 Ops/s | 534.2118 Ops/s | |
| test_to[within-False-None] | 6.2678ms | 5.9550ms | 167.9254 Ops/s | 167.5918 Ops/s | |
| test_to[True-default-None] | 0.1879s | 10.7288ms | 93.2073 Ops/s | 105.9481 Ops/s | |
| test_to_njt[False-False-None] | 8.6923ms | 8.3236ms | 120.1405 Ops/s | 119.0733 Ops/s | |
| test_to_njt[True-False-None] | 6.9745ms | 6.7731ms | 147.6432 Ops/s | 146.5535 Ops/s | |
| test_to_njt[within-False-None] | 15.8959ms | 15.0980ms | 66.2339 Ops/s | 65.8759 Ops/s | |
| test_creation[device0] | 0.5256ms | 0.1134ms | 8.8165 KOps/s | 8.9576 KOps/s | |
| test_creation_from_tensor | 0.4501ms | 0.1109ms | 9.0201 KOps/s | 8.9370 KOps/s | |
| test_add_one[memmap_tensor0] | 0.3462ms | 6.5414μs | 152.8720 KOps/s | 155.6245 KOps/s | |
| test_contiguous[memmap_tensor0] | 10.6900μs | 0.5927μs | 1.6873 MOps/s | 2.3925 MOps/s | |
| test_stack[memmap_tensor0] | 68.8010μs | 4.4982μs | 222.3097 KOps/s | 230.0748 KOps/s | |
| test_memmaptd_index | 1.0246ms | 0.2628ms | 3.8049 KOps/s | 3.8880 KOps/s | |
| test_memmaptd_index_astensor | 0.5050ms | 0.3565ms | 2.8053 KOps/s | 2.8303 KOps/s | |
| test_memmaptd_index_op | 0.9956ms | 0.6049ms | 1.6531 KOps/s | 1.6607 KOps/s | |
| test_serialize_model | 0.1365s | 0.1345s | 7.4335 Ops/s | 7.4775 Ops/s | |
| test_serialize_model_pickle | 1.3478s | 1.1927s | 0.8385 Ops/s | 0.8257 Ops/s | |
| test_serialize_weights | 0.1363s | 0.1344s | 7.4411 Ops/s | 6.1504 Ops/s | |
| test_serialize_weights_returnearly | 0.4554s | 86.7181ms | 11.5316 Ops/s | 15.9951 Ops/s | |
| test_serialize_weights_pickle | 1.3514s | 1.2119s | 0.8251 Ops/s | 0.8228 Ops/s | |
| test_reshape_pytree | 0.2045ms | 30.9364μs | 32.3244 KOps/s | 32.4721 KOps/s | |
| test_reshape_td | 82.1320μs | 42.8793μs | 23.3213 KOps/s | 22.9904 KOps/s | |
| test_view_pytree | 0.2144ms | 30.4150μs | 32.8785 KOps/s | 32.5443 KOps/s | |
| test_view_td | 85.7810μs | 50.7659μs | 19.6982 KOps/s | 19.3235 KOps/s | |
| test_unbind_pytree | 0.2261ms | 34.2319μs | 29.2125 KOps/s | 28.8789 KOps/s | |
| test_unbind_td | 0.1776ms | 47.5736μs | 21.0200 KOps/s | 21.1010 KOps/s | |
| test_split_pytree | 0.2501ms | 40.4660μs | 24.7121 KOps/s | 24.9444 KOps/s | |
| test_split_td | 0.2151ms | 61.5954μs | 16.2350 KOps/s | 15.9687 KOps/s | |
| test_add_pytree | 0.2252ms | 39.7204μs | 25.1760 KOps/s | 24.7558 KOps/s | |
| test_add_td | 0.1104ms | 54.0950μs | 18.4860 KOps/s | 18.5304 KOps/s | |
| test_compile_add_one_nested[tensordict-compile] | 0.2370ms | 0.1533ms | 6.5224 KOps/s | 6.0591 KOps/s | |
| test_compile_add_one_nested[tensordict-eager] | 0.3007ms | 0.1981ms | 5.0486 KOps/s | 5.1436 KOps/s | |
| test_compile_add_one_nested[pytree-compile] | 0.1842ms | 0.1225ms | 8.1642 KOps/s | 7.8099 KOps/s | |
| test_compile_add_one_nested[pytree-eager] | 0.4258ms | 0.1763ms | 5.6719 KOps/s | 5.7822 KOps/s | |
| test_compile_copy_nested[tensordict-compile] | 0.3293ms | 16.6724μs | 59.9795 KOps/s | 65.5208 KOps/s | |
| test_compile_copy_nested[tensordict-eager] | 88.0610μs | 51.0293μs | 19.5966 KOps/s | 19.9790 KOps/s | |
| test_compile_copy_nested[pytree-compile] | 0.1229ms | 15.4819μs | 64.5916 KOps/s | 63.9608 KOps/s | |
| test_compile_copy_nested[pytree-eager] | 0.3585ms | 63.3280μs | 15.7908 KOps/s | 15.7972 KOps/s | |
| test_compile_add_one_flat[tensordict-compile] | 0.3317ms | 0.1959ms | 5.1035 KOps/s | 4.8545 KOps/s | |
| test_compile_add_one_flat[tensordict-eager] | 0.3592ms | 0.2719ms | 3.6783 KOps/s | 3.6702 KOps/s | |
| test_compile_add_one_flat[tensorclass-compile] | 0.1755ms | 0.1330ms | 7.5208 KOps/s | 7.2756 KOps/s | |
| test_compile_add_one_flat[tensorclass-eager] | 0.1203ms | 75.1498μs | 13.3068 KOps/s | 13.3816 KOps/s | |
| test_compile_add_one_flat[pytree-compile] | 0.5055ms | 0.1807ms | 5.5339 KOps/s | 5.5515 KOps/s | |
| test_compile_add_one_flat[pytree-eager] | 0.8193ms | 0.5406ms | 1.8497 KOps/s | 1.9326 KOps/s | |
| test_compile_add_self_flat[tensordict-eager] | 0.5422ms | 0.3254ms | 3.0735 KOps/s | 3.0818 KOps/s | |
| test_compile_add_self_flat[tensordict-compile] | 0.3129ms | 0.1948ms | 5.1342 KOps/s | 4.6654 KOps/s | |
| test_compile_add_self_flat[tensorclass-eager] | 0.1340ms | 86.6981μs | 11.5343 KOps/s | 10.9339 KOps/s | |
| test_compile_add_self_flat[tensorclass-compile] | 0.2752ms | 0.1326ms | 7.5440 KOps/s | 6.9025 KOps/s | |
| test_compile_add_self_flat[pytree-eager] | 0.1829s | 0.5216ms | 1.9173 KOps/s | 2.3015 KOps/s | |
| test_compile_add_self_flat[pytree-compile] | 0.2476ms | 0.1747ms | 5.7250 KOps/s | 5.5675 KOps/s | |
| test_compile_copy_flat[tensordict-compile] | 50.2110μs | 18.5152μs | 54.0097 KOps/s | 52.0928 KOps/s | |
| test_compile_copy_flat[tensordict-eager] | 81.1110μs | 40.1300μs | 24.9190 KOps/s | 25.1598 KOps/s | |
| test_compile_copy_flat[pytree-compile] | 0.2224ms | 15.9525μs | 62.6860 KOps/s | 61.0192 KOps/s | |
| test_compile_copy_flat[pytree-eager] | 0.3422ms | 51.7444μs | 19.3258 KOps/s | 19.6201 KOps/s | |
| test_compile_assign_and_add[tensordict-compile] | 1.9879ms | 0.1838ms | 5.4401 KOps/s | 5.0605 KOps/s | |
| test_compile_assign_and_add[tensordict-eager] | 3.6125ms | 3.2692ms | 305.8806 Ops/s | 301.7261 Ops/s | |
| test_compile_assign_and_add[pytree-compile] | 2.0007ms | 0.1718ms | 5.8204 KOps/s | 5.6547 KOps/s | |
| test_compile_assign_and_add[pytree-eager] | 2.9712ms | 2.8291ms | 353.4670 Ops/s | 358.5485 Ops/s | |
| test_compile_indexing[tensor-tensordict-compile] | 0.2363ms | 0.1274ms | 7.8467 KOps/s | 7.3902 KOps/s | |
| test_compile_indexing[tensor-tensordict-eager] | 0.3111ms | 74.5267μs | 13.4180 KOps/s | 13.3187 KOps/s | |
| test_compile_indexing[tensor-tensorclass-compile] | 0.2021ms | 0.1114ms | 8.9741 KOps/s | 8.5329 KOps/s | |
| test_compile_indexing[tensor-tensorclass-eager] | 0.2659ms | 46.6773μs | 21.4237 KOps/s | 21.9843 KOps/s | |
| test_compile_indexing[tensor-pytree-compile] | 0.1908ms | 0.1168ms | 8.5616 KOps/s | 8.8885 KOps/s | |
| test_compile_indexing[tensor-pytree-eager] | 0.2953ms | 46.6260μs | 21.4473 KOps/s | 22.9035 KOps/s | |
| test_compile_indexing[slice-tensordict-compile] | 0.1305ms | 70.1250μs | 14.2602 KOps/s | 14.2311 KOps/s | |
| test_compile_indexing[slice-tensordict-eager] | 0.2159ms | 26.6305μs | 37.5509 KOps/s | 37.3907 KOps/s | |
| test_compile_indexing[slice-tensorclass-compile] | 0.1654ms | 57.6589μs | 17.3434 KOps/s | 17.2774 KOps/s | |
| test_compile_indexing[slice-tensorclass-eager] | 0.2444ms | 21.2437μs | 47.0728 KOps/s | 47.2413 KOps/s | |
| test_compile_indexing[slice-pytree-compile] | 0.1898ms | 56.8451μs | 17.5917 KOps/s | 17.2458 KOps/s | |
| test_compile_indexing[slice-pytree-eager] | 0.2374ms | 21.0320μs | 47.5466 KOps/s | 47.7389 KOps/s | |
| test_compile_indexing[int-tensordict-compile] | 0.1226ms | 71.3029μs | 14.0247 KOps/s | 14.0397 KOps/s | |
| test_compile_indexing[int-tensordict-eager] | 0.2407ms | 26.7172μs | 37.4291 KOps/s | 37.7320 KOps/s | |
| test_compile_indexing[int-tensorclass-compile] | 0.1810ms | 56.0899μs | 17.8285 KOps/s | 17.3817 KOps/s | |
| test_compile_indexing[int-tensorclass-eager] | 0.2608ms | 21.0219μs | 47.5695 KOps/s | 47.6077 KOps/s | |
| test_compile_indexing[int-pytree-compile] | 0.2300ms | 56.3323μs | 17.7518 KOps/s | 16.4449 KOps/s | |
| test_compile_indexing[int-pytree-eager] | 0.2422ms | 20.9077μs | 47.8294 KOps/s | 47.8565 KOps/s | |
| test_compile_replace[single-eager] | 94.7110μs | 47.1859μs | 21.1928 KOps/s | 20.0728 KOps/s | |
| test_compile_replace[single-compile] | 0.2089ms | 0.1181ms | 8.4676 KOps/s | 8.2475 KOps/s | |
| test_compile_replace[multi-eager] | 0.7053ms | 0.5856ms | 1.7077 KOps/s | 1.8121 KOps/s | |
| test_compile_replace[multi-compile] | 0.1734ms | 0.1242ms | 8.0506 KOps/s | 7.8789 KOps/s | |
| test_compile_tc_getattr_20[eager] | 0.2337ms | 0.1814ms | 5.5122 KOps/s | 5.9384 KOps/s | |
| test_compile_tc_getattr_20[compile] | 0.3962ms | 0.1355ms | 7.3799 KOps/s | 7.3732 KOps/s | |
| test_compile_clone_shallow[20-eager] | 49.5010μs | 18.4746μs | 54.1283 KOps/s | 54.2949 KOps/s | |
| test_compile_clone_shallow[20-compile] | 81.5610μs | 17.7636μs | 56.2949 KOps/s | 59.2950 KOps/s | |
| test_compile_clone_shallow[40-eager] | 58.6610μs | 32.1741μs | 31.0809 KOps/s | 30.9690 KOps/s | |
| test_compile_clone_shallow[40-compile] | 73.8010μs | 17.7230μs | 56.4238 KOps/s | 57.2539 KOps/s | |
| test_compile_clone_shallow[80-eager] | 94.6220μs | 60.8818μs | 16.4253 KOps/s | 16.6330 KOps/s | |
| test_compile_clone_shallow[80-compile] | 0.1617ms | 20.9437μs | 47.7471 KOps/s | 48.7331 KOps/s | |
| test_compile_update_inplace[eager] | 0.1898ms | 57.5098μs | 17.3883 KOps/s | 16.8904 KOps/s | |
| test_compile_update_inplace[compile] | 0.4062ms | 0.1483ms | 6.7412 KOps/s | 6.7172 KOps/s | |
| test_mod_add[eager] | 0.1922ms | 48.0719μs | 20.8022 KOps/s | 19.7888 KOps/s | |
| test_mod_add[compile] | 0.6913ms | 0.1199ms | 8.3374 KOps/s | 8.2814 KOps/s | |
| test_mod_add[compile-overhead] | 0.2484ms | 0.1616ms | 6.1891 KOps/s | 5.9931 KOps/s | |
| test_mod_wrap[eager] | 0.4313ms | 0.3025ms | 3.3054 KOps/s | 3.4162 KOps/s | |
| test_mod_wrap[compile] | 0.5278ms | 0.3605ms | 2.7736 KOps/s | 2.6623 KOps/s | |
| test_mod_wrap[compile-overhead] | 9.1856ms | 5.0160ms | 199.3606 Ops/s | 202.8642 Ops/s | |
| test_mod_wrap_and_backward[eager] | 1.6728ms | 1.5054ms | 664.2757 Ops/s | 664.5805 Ops/s | |
| test_mod_wrap_and_backward[compile] | 1.6361ms | 1.5486ms | 645.7457 Ops/s | 686.9211 Ops/s | |
| test_mod_wrap_and_backward[compile-overhead] | 1.4728ms | 1.0072ms | 992.8159 Ops/s | 1.0918 KOps/s | |
| test_seq_add[eager] | 0.2200ms | 0.1537ms | 6.5059 KOps/s | 6.4060 KOps/s | |
| test_seq_add[compile] | 0.6424ms | 0.1265ms | 7.9077 KOps/s | 7.5913 KOps/s | |
| test_seq_add[compile-overhead] | 0.4295ms | 0.1671ms | 5.9848 KOps/s | 5.7515 KOps/s | |
| test_seq_wrap[eager] | 0.5758ms | 0.5082ms | 1.9678 KOps/s | 1.9442 KOps/s | |
| test_seq_wrap[compile] | 0.4441ms | 0.3770ms | 2.6526 KOps/s | 2.5802 KOps/s | |
| test_seq_wrap[compile-overhead] | 0.3732ms | 0.2826ms | 3.5383 KOps/s | 3.5022 KOps/s | |
| test_func_call_runtime[False-eager] | 0.9641ms | 0.8721ms | 1.1467 KOps/s | 1.1376 KOps/s | |
| test_func_call_runtime[False-compile] | 1.0290ms | 0.9431ms | 1.0603 KOps/s | 1.1009 KOps/s | |
| test_func_call_runtime[False-compile-overhead] | 0.5784ms | 0.4713ms | 2.1217 KOps/s | 2.0774 KOps/s | |
| test_func_call_runtime[True-eager] | 1.1289ms | 1.0544ms | 948.3958 Ops/s | 942.6925 Ops/s | |
| test_func_call_runtime[True-compile] | 1.0243ms | 0.9155ms | 1.0923 KOps/s | 1.0617 KOps/s | |
| test_func_call_runtime[True-compile-overhead] | 0.5325ms | 0.4857ms | 2.0590 KOps/s | 2.0301 KOps/s | |
| test_func_call_cm_runtime[False-eager] | 0.9462ms | 0.8613ms | 1.1611 KOps/s | 1.1496 KOps/s | |
| test_func_call_cm_runtime[False-compile] | 0.9746ms | 0.8901ms | 1.1235 KOps/s | 1.0922 KOps/s | |
| test_func_call_cm_runtime[False-compile-overhead] | 0.5203ms | 0.4518ms | 2.2133 KOps/s | 2.1884 KOps/s | |
| test_func_call_cm_runtime[True-eager] | 1.3867ms | 1.2159ms | 822.4596 Ops/s | 813.1497 Ops/s | |
| test_func_call_cm_runtime[True-compile] | 1.1876ms | 0.9657ms | 1.0355 KOps/s | 1.0448 KOps/s | |
| test_func_call_cm_runtime[True-compile-overhead] | 0.5552ms | 0.4960ms | 2.0162 KOps/s | 1.9898 KOps/s | |
| test_vmap_func_call_cm_runtime[eager] | 2.8795ms | 2.3466ms | 426.1408 Ops/s | 417.3437 Ops/s | |
| test_vmap_func_call_cm_runtime[compile] | 1.0844ms | 0.9803ms | 1.0201 KOps/s | 1.0121 KOps/s | |
| test_vmap_func_call_cm_runtime[compile-overhead] | 0.5606ms | 0.5010ms | 1.9962 KOps/s | 1.9853 KOps/s | |
| test_distributed | 0.5675ms | 0.1524ms | 6.5635 KOps/s | 6.6315 KOps/s | |
| test_tdmodule | 0.3561ms | 26.9099μs | 37.1610 KOps/s | 36.8895 KOps/s | |
| test_tdmodule_dispatch | 72.1810μs | 42.7526μs | 23.3904 KOps/s | 22.3448 KOps/s | |
| test_tdseq | 48.9510μs | 25.8041μs | 38.7535 KOps/s | 37.7227 KOps/s | |
| test_tdseq_dispatch | 64.3510μs | 45.5787μs | 21.9401 KOps/s | 21.2988 KOps/s | |
| test_instantiation_functorch | 2.0693ms | 1.9651ms | 508.8906 Ops/s | 501.8981 Ops/s | |
| test_exec_functorch | 0.2285ms | 0.1712ms | 5.8427 KOps/s | 5.8163 KOps/s | |
| test_exec_functional_call | 0.2159ms | 0.1524ms | 6.5635 KOps/s | 6.3698 KOps/s | |
| test_exec_td_decorator | 0.4531ms | 0.2275ms | 4.3959 KOps/s | 4.4047 KOps/s | |
| test_vmap_mlp_speed_decorator[True-True] | 1.0059ms | 0.8190ms | 1.2210 KOps/s | 1.1965 KOps/s | |
| test_vmap_mlp_speed_decorator[True-False] | 1.0387ms | 0.8333ms | 1.2001 KOps/s | 1.2187 KOps/s | |
| test_vmap_mlp_speed_decorator[False-True] | 0.9104ms | 0.7054ms | 1.4177 KOps/s | 1.4268 KOps/s | |
| test_vmap_mlp_speed_decorator[False-False] | 0.8855ms | 0.6982ms | 1.4322 KOps/s | 1.4302 KOps/s | |
| test_vmap_transformer_speed_decorator[True-True] | 21.1271ms | 20.5350ms | 48.6973 Ops/s | 48.7562 Ops/s | |
| test_vmap_transformer_speed_decorator[True-False] | 21.5278ms | 20.6181ms | 48.5010 Ops/s | 48.9811 Ops/s | |
| test_vmap_transformer_speed_decorator[False-True] | 21.5111ms | 20.5105ms | 48.7555 Ops/s | 49.5219 Ops/s | |
| test_vmap_transformer_speed_decorator[False-False] | 20.9965ms | 20.5789ms | 48.5935 Ops/s | 49.4009 Ops/s | |
| test_to_module_speed[True] | 1.4989ms | 1.4093ms | 709.5899 Ops/s | 709.7196 Ops/s | |
| test_to_module_speed[False] | 1.9523ms | 1.3799ms | 724.7085 Ops/s | 718.4850 Ops/s | |
| test_tc_init | 80.8910μs | 43.1361μs | 23.1824 KOps/s | 23.1456 KOps/s | |
| test_tc_init_tensor_only | 36.7710μs | 9.0420μs | 110.5952 KOps/s | 108.1161 KOps/s | |
| test_tc_init_nested | 0.1758ms | 84.8680μs | 11.7830 KOps/s | 11.7178 KOps/s | |
| test_tc_init_many_fields | 40.6910μs | 15.5145μs | 64.4559 KOps/s | 64.3098 KOps/s | |
| test_tc_first_layer_tensor | 28.9210μs | 1.7061μs | 586.1357 KOps/s | 594.6621 KOps/s | |
| test_tc_first_layer_tensor_only | 3.0076μs | 0.3838μs | 2.6059 MOps/s | 2.6328 MOps/s | |
| test_tc_first_layer_tensor_set | 29.1410μs | 3.7036μs | 270.0093 KOps/s | 274.9580 KOps/s | |
| test_tc_first_layer_tensor_only_set | 32.6900μs | 3.3514μs | 298.3795 KOps/s | 329.1446 KOps/s | |
| test_tc_first_layer_nontensor | 45.6200μs | 5.6879μs | 175.8122 KOps/s | 173.7308 KOps/s | |
| test_tc_second_layer_tensor | 48.0300μs | 4.1205μs | 242.6868 KOps/s | 240.8788 KOps/s | |
| test_tc_second_layer_nontensor | 36.2400μs | 8.1930μs | 122.0559 KOps/s | 123.1175 KOps/s | |
| test_unbind | 0.2840s | 16.0626ms | 62.2565 Ops/s | 71.8551 Ops/s | |
| test_full_like | 13.5915ms | 4.4127ms | 226.6183 Ops/s | 226.8728 Ops/s | |
| test_zeros_like | 4.9395ms | 4.3468ms | 230.0526 Ops/s | 228.8276 Ops/s | |
| test_ones_like | 4.4663ms | 4.2171ms | 237.1315 Ops/s | 236.1548 Ops/s | |
| test_clone | 6.8025ms | 6.4423ms | 155.2244 Ops/s | 155.6817 Ops/s | |
| test_squeeze | 64.7110μs | 13.4812μs | 74.1775 KOps/s | 73.6810 KOps/s | |
| test_unsqueeze | 0.2773ms | 0.1082ms | 9.2430 KOps/s | 8.7936 KOps/s | |
| test_split | 0.2299ms | 0.1741ms | 5.7445 KOps/s | 5.5012 KOps/s | |
| test_permute | 0.2513ms | 0.2067ms | 4.8381 KOps/s | 4.8115 KOps/s | |
| test_stack | 51.4225ms | 51.0774ms | 19.5781 Ops/s | 19.8420 Ops/s | |
| test_cat | 51.2335ms | 50.6720ms | 19.7348 Ops/s | 19.5576 Ops/s | |
| test_sequential_tensordict | 0.6085ms | 0.2068ms | 4.8365 KOps/s | 4.6101 KOps/s | |
| test_sequential_graph_module | 0.1643ms | 0.1127ms | 8.8756 KOps/s | 8.4968 KOps/s | |
| test_nested_tensordict | 0.6730ms | 0.2675ms | 3.7379 KOps/s | 3.5170 KOps/s | |
| test_nested_graph_module | 0.1803ms | 0.1266ms | 7.8963 KOps/s | 8.0389 KOps/s |
Contributor
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 28.6620μs | 15.0079μs | 66.6314 KOps/s | 67.2457 KOps/s | |
| test_plain_set_stack_nested | 43.7620μs | 15.3491μs | 65.1506 KOps/s | 66.4815 KOps/s | |
| test_plain_set_nested_inplace | 41.0520μs | 16.8772μs | 59.2517 KOps/s | 59.4970 KOps/s | |
| test_plain_set_stack_nested_inplace | 45.6720μs | 16.7925μs | 59.5505 KOps/s | 60.0916 KOps/s | |
| test_items | 39.8620μs | 6.0160μs | 166.2232 KOps/s | 165.3600 KOps/s | |
| test_items_nested | 0.5653ms | 0.4689ms | 2.1324 KOps/s | 2.1395 KOps/s | |
| test_items_nested_locked | 0.5372ms | 0.4692ms | 2.1314 KOps/s | 2.1281 KOps/s | |
| test_items_nested_leaf | 0.2127ms | 97.5954μs | 10.2464 KOps/s | 10.1443 KOps/s | |
| test_items_stack_nested | 0.6012ms | 0.4656ms | 2.1479 KOps/s | 2.1448 KOps/s | |
| test_items_stack_nested_leaf | 0.1439ms | 97.7263μs | 10.2327 KOps/s | 10.1389 KOps/s | |
| test_items_stack_nested_locked | 0.5530ms | 0.4690ms | 2.1322 KOps/s | 2.1319 KOps/s | |
| test_keys | 28.5420μs | 4.2292μs | 236.4530 KOps/s | 236.7801 KOps/s | |
| test_keys_nested | 0.1696ms | 0.1290ms | 7.7490 KOps/s | 7.6521 KOps/s | |
| test_keys_nested_locked | 2.2435ms | 0.1378ms | 7.2574 KOps/s | 7.1118 KOps/s | |
| test_keys_nested_leaf | 0.1654ms | 0.1203ms | 8.3125 KOps/s | 8.2179 KOps/s | |
| test_keys_stack_nested | 0.1675ms | 0.1301ms | 7.6840 KOps/s | 7.6564 KOps/s | |
| test_keys_stack_nested_leaf | 0.1526ms | 0.1206ms | 8.2908 KOps/s | 8.2313 KOps/s | |
| test_keys_stack_nested_locked | 0.1797ms | 0.1386ms | 7.2164 KOps/s | 7.1324 KOps/s | |
| test_values | 6.2664μs | 1.0097μs | 990.4105 KOps/s | 980.1075 KOps/s | |
| test_values_nested | 0.1090ms | 52.2247μs | 19.1480 KOps/s | 18.9226 KOps/s | |
| test_values_nested_locked | 81.0640μs | 55.4270μs | 18.0417 KOps/s | 17.6459 KOps/s | |
| test_values_nested_leaf | 89.3850μs | 59.5935μs | 16.7804 KOps/s | 16.5065 KOps/s | |
| test_values_stack_nested | 87.2940μs | 52.4257μs | 19.0746 KOps/s | 18.7635 KOps/s | |
| test_values_stack_nested_leaf | 0.1103ms | 59.7417μs | 16.7387 KOps/s | 16.4649 KOps/s | |
| test_values_stack_nested_locked | 0.1438ms | 54.6926μs | 18.2840 KOps/s | 17.6462 KOps/s | |
| test_membership | 4.5302μs | 0.8424μs | 1.1871 MOps/s | 1.1823 MOps/s | |
| test_membership_nested | 93.5050μs | 2.8762μs | 347.6855 KOps/s | 348.2645 KOps/s | |
| test_membership_nested_leaf | 41.5520μs | 2.9160μs | 342.9354 KOps/s | 351.1082 KOps/s | |
| test_membership_stacked_nested | 23.4920μs | 2.9277μs | 341.5688 KOps/s | 353.4211 KOps/s | |
| test_membership_stacked_nested_leaf | 57.9240μs | 2.9171μs | 342.8015 KOps/s | 352.7103 KOps/s | |
| test_membership_nested_last | 23.6410μs | 4.3352μs | 230.6698 KOps/s | 231.6368 KOps/s | |
| test_membership_nested_leaf_last | 65.0730μs | 4.3214μs | 231.4077 KOps/s | 231.5529 KOps/s | |
| test_membership_stacked_nested_last | 30.6620μs | 4.3736μs | 228.6466 KOps/s | 229.4449 KOps/s | |
| test_membership_stacked_nested_leaf_last | 52.6730μs | 4.3456μs | 230.1170 KOps/s | 231.3949 KOps/s | |
| test_nested_getleaf | 63.4530μs | 21.8087μs | 45.8533 KOps/s | 47.3263 KOps/s | |
| test_nested_get | 61.8340μs | 20.5782μs | 48.5952 KOps/s | 49.5123 KOps/s | |
| test_stacked_getleaf | 53.7830μs | 21.5482μs | 46.4075 KOps/s | 46.8159 KOps/s | |
| test_stacked_get | 95.2950μs | 20.5283μs | 48.7132 KOps/s | 49.5268 KOps/s | |
| test_nested_getitemleaf | 68.0330μs | 22.2213μs | 45.0019 KOps/s | 46.1816 KOps/s | |
| test_nested_getitem | 56.6730μs | 21.1663μs | 47.2449 KOps/s | 48.5145 KOps/s | |
| test_stacked_getitemleaf | 52.4830μs | 22.0017μs | 45.4511 KOps/s | 45.6281 KOps/s | |
| test_stacked_getitem | 60.1030μs | 21.2810μs | 46.9902 KOps/s | 47.9602 KOps/s | |
| test_lock_nested | 4.7036ms | 0.4822ms | 2.0737 KOps/s | 2.0800 KOps/s | |
| test_lock_stack_nested | 0.5642ms | 0.4848ms | 2.0627 KOps/s | 2.0455 KOps/s | |
| test_unlock_nested | 0.5111ms | 0.3937ms | 2.5397 KOps/s | 2.5522 KOps/s | |
| test_unlock_stack_nested | 0.4345ms | 0.3923ms | 2.5490 KOps/s | 2.5085 KOps/s | |
| test_flatten_speed | 0.1679ms | 0.1227ms | 8.1499 KOps/s | 8.2686 KOps/s | |
| test_unflatten_speed | 0.6487ms | 0.5723ms | 1.7473 KOps/s | 1.7533 KOps/s | |
| test_common_ops | 0.8470ms | 0.6990ms | 1.4307 KOps/s | 1.4123 KOps/s | |
| test_creation | 0.1170ms | 3.1382μs | 318.6552 KOps/s | 316.5544 KOps/s | |
| test_creation_empty | 38.1820μs | 6.9770μs | 143.3273 KOps/s | 142.6226 KOps/s | |
| test_creation_nested_1 | 43.9630μs | 11.5680μs | 86.4455 KOps/s | 86.1063 KOps/s | |
| test_creation_nested_2 | 41.4630μs | 13.3702μs | 74.7934 KOps/s | 74.6531 KOps/s | |
| test_creation_many_keys[10] | 89.8550μs | 21.2111μs | 47.1452 KOps/s | 47.4183 KOps/s | |
| test_creation_many_keys[50] | 0.1563ms | 91.3589μs | 10.9458 KOps/s | 10.9599 KOps/s | |
| test_creation_many_keys[100] | 0.2313ms | 0.1795ms | 5.5724 KOps/s | 5.6095 KOps/s | |
| test_creation_nested_many_keys[10] | 95.6750μs | 45.5023μs | 21.9769 KOps/s | 22.0846 KOps/s | |
| test_creation_nested_many_keys[50] | 0.2342ms | 0.1866ms | 5.3595 KOps/s | 5.3886 KOps/s | |
| test_clone | 45.7830μs | 13.3607μs | 74.8465 KOps/s | 73.8179 KOps/s | |
| test_getitem[int] | 1.5124ms | 15.6594μs | 63.8594 KOps/s | 58.6511 KOps/s | |
| test_getitem[slice_int] | 0.1406ms | 25.3419μs | 39.4604 KOps/s | 39.8257 KOps/s | |
| test_getitem[range] | 0.1805ms | 63.2595μs | 15.8079 KOps/s | 15.4643 KOps/s | |
| test_getitem[tuple] | 0.1444ms | 24.6653μs | 40.5428 KOps/s | 40.9876 KOps/s | |
| test_getitem[list] | 0.1823ms | 58.1102μs | 17.2087 KOps/s | 16.9342 KOps/s | |
| test_setitem_dim[int] | 66.4540μs | 25.6563μs | 38.9768 KOps/s | 38.0770 KOps/s | |
| test_setitem_dim[slice_int] | 63.4430μs | 42.4423μs | 23.5614 KOps/s | 22.9330 KOps/s | |
| test_setitem_dim[range] | 0.1211ms | 95.2883μs | 10.4945 KOps/s | 10.3925 KOps/s | |
| test_setitem_dim[tuple] | 59.9230μs | 39.3041μs | 25.4426 KOps/s | 24.3676 KOps/s | |
| test_setitem | 73.5640μs | 17.8743μs | 55.9462 KOps/s | 54.5760 KOps/s | |
| test_set | 90.4350μs | 17.1295μs | 58.3788 KOps/s | 56.9805 KOps/s | |
| test_set_shared | 0.5106ms | 0.2023ms | 4.9426 KOps/s | 4.9123 KOps/s | |
| test_update | 0.2101ms | 22.2393μs | 44.9655 KOps/s | 44.6049 KOps/s | |
| test_update_nested | 0.2160ms | 33.2463μs | 30.0785 KOps/s | 29.1100 KOps/s | |
| test_update__nested | 0.4440ms | 34.4980μs | 28.9872 KOps/s | 28.1590 KOps/s | |
| test_set_nested | 80.9150μs | 18.9800μs | 52.6871 KOps/s | 51.5156 KOps/s | |
| test_set_nested_new | 63.5340μs | 24.0261μs | 41.6215 KOps/s | 40.7438 KOps/s | |
| test_select | 88.7150μs | 39.8775μs | 25.0768 KOps/s | 24.6134 KOps/s | |
| test_select_nested | 0.1412ms | 74.5650μs | 13.4111 KOps/s | 13.5488 KOps/s | |
| test_exclude_nested | 0.1375ms | 90.7191μs | 11.0230 KOps/s | 10.9387 KOps/s | |
| test_empty[True] | 0.4841ms | 0.4007ms | 2.4959 KOps/s | 2.5055 KOps/s | |
| test_empty[False] | 10.3405μs | 1.3115μs | 762.4760 KOps/s | 764.1446 KOps/s | |
| test_to | 0.1106ms | 76.4915μs | 13.0733 KOps/s | 13.5497 KOps/s | |
| test_to_nonblocking | 0.1228ms | 68.8130μs | 14.5321 KOps/s | 14.9935 KOps/s | |
| test_unbind_speed | 0.3700ms | 0.3396ms | 2.9445 KOps/s | 2.9597 KOps/s | |
| test_unbind_speed_stack0 | 0.3942ms | 0.3347ms | 2.9875 KOps/s | 3.0026 KOps/s | |
| test_unbind_speed_stack1 | 0.1066s | 0.9261ms | 1.0798 KOps/s | 1.1735 KOps/s | |
| test_split | 1.2216ms | 1.1445ms | 873.7703 Ops/s | 788.0144 Ops/s | |
| test_chunk | 0.1067s | 1.2164ms | 822.0704 Ops/s | 925.7265 Ops/s | |
| test_to_cpu_blocking | 19.8704ms | 19.6994ms | 50.7630 Ops/s | 44.8447 Ops/s | |
| test_to_cpu_global_sync | 11.9033ms | 11.7327ms | 85.2320 Ops/s | 83.2873 Ops/s | |
| test_to_cpu_event_sync | 13.0166ms | 12.6895ms | 78.8056 Ops/s | 77.0840 Ops/s | |
| test_to_cpu_default | 0.1190s | 14.0679ms | 71.0840 Ops/s | 76.9804 Ops/s | |
| test_consolidate[False-None] | 4.2714ms | 4.1304ms | 242.1047 Ops/s | 237.8536 Ops/s | |
| test_consolidate[default-None] | 2.4932ms | 2.0598ms | 485.4735 Ops/s | 468.3001 Ops/s | |
| test_consolidate[reduce-overhead-None] | 2.1575ms | 1.9837ms | 504.0987 Ops/s | 490.9068 Ops/s | |
| test_consolidate_njt[False-None] | 8.7302ms | 8.4938ms | 117.7333 Ops/s | 115.5250 Ops/s | |
| test_to[False-False-None] | 2.3235ms | 2.1638ms | 462.1505 Ops/s | 453.5922 Ops/s | |
| test_to[True-False-None] | 2.1783ms | 1.9107ms | 523.3589 Ops/s | 515.9296 Ops/s | |
| test_to[within-False-None] | 6.3092ms | 6.1165ms | 163.4913 Ops/s | 159.0473 Ops/s | |
| test_to[True-default-None] | 9.9023ms | 9.2415ms | 108.2074 Ops/s | 104.8987 Ops/s | |
| test_to_njt[False-False-None] | 9.0763ms | 8.5713ms | 116.6681 Ops/s | 114.0979 Ops/s | |
| test_to_njt[True-False-None] | 7.5678ms | 7.0084ms | 142.6855 Ops/s | 140.9423 Ops/s | |
| test_to_njt[within-False-None] | 15.8535ms | 15.5817ms | 64.1778 Ops/s | 62.7229 Ops/s | |
| test_creation[device0] | 0.3968ms | 0.1144ms | 8.7411 KOps/s | 8.4379 KOps/s | |
| test_creation_from_tensor | 0.5486ms | 0.1126ms | 8.8826 KOps/s | 8.5570 KOps/s | |
| test_add_one[memmap_tensor0] | 0.2212ms | 6.6069μs | 151.3560 KOps/s | 145.3775 KOps/s | |
| test_contiguous[memmap_tensor0] | 13.1310μs | 0.6470μs | 1.5457 MOps/s | 2.2425 MOps/s | |
| test_stack[memmap_tensor0] | 0.1358ms | 4.5939μs | 217.6821 KOps/s | 217.7411 KOps/s | |
| test_memmaptd_index | 1.0723ms | 0.2743ms | 3.6460 KOps/s | 3.6765 KOps/s | |
| test_memmaptd_index_astensor | 0.5351ms | 0.3736ms | 2.6769 KOps/s | 2.6607 KOps/s | |
| test_memmaptd_index_op | 0.8031ms | 0.6279ms | 1.5927 KOps/s | 1.5615 KOps/s | |
| test_serialize_model | 0.1374s | 0.1357s | 7.3681 Ops/s | 7.3457 Ops/s | |
| test_serialize_model_pickle | 1.3676s | 1.2132s | 0.8243 Ops/s | 0.8386 Ops/s | |
| test_serialize_weights | 0.1362s | 0.1332s | 7.5049 Ops/s | 7.4691 Ops/s | |
| test_serialize_weights_returnearly | 0.4502s | 88.2853ms | 11.3269 Ops/s | 15.7536 Ops/s | |
| test_serialize_weights_pickle | 1.3790s | 1.1896s | 0.8406 Ops/s | 0.8233 Ops/s | |
| test_reshape_pytree | 0.2032ms | 32.3455μs | 30.9162 KOps/s | 30.7388 KOps/s | |
| test_reshape_td | 88.8140μs | 46.9619μs | 21.2938 KOps/s | 22.1930 KOps/s | |
| test_view_pytree | 0.2150ms | 31.8311μs | 31.4158 KOps/s | 31.4261 KOps/s | |
| test_view_td | 98.1560μs | 53.9334μs | 18.5414 KOps/s | 18.6262 KOps/s | |
| test_unbind_pytree | 0.2351ms | 36.3402μs | 27.5177 KOps/s | 27.6095 KOps/s | |
| test_unbind_td | 0.1094ms | 49.8098μs | 20.0764 KOps/s | 19.8057 KOps/s | |
| test_split_pytree | 0.1958ms | 42.0649μs | 23.7728 KOps/s | 23.8420 KOps/s | |
| test_split_td | 0.1470ms | 64.4886μs | 15.5066 KOps/s | 15.3002 KOps/s | |
| test_add_pytree | 0.2319ms | 41.9933μs | 23.8133 KOps/s | 24.1340 KOps/s | |
| test_add_td | 0.1018ms | 57.4730μs | 17.3995 KOps/s | 17.5071 KOps/s | |
| test_compile_add_one_nested[tensordict-compile] | 0.2502ms | 0.1609ms | 6.2157 KOps/s | 5.7277 KOps/s | |
| test_compile_add_one_nested[tensordict-eager] | 0.2806ms | 0.2032ms | 4.9218 KOps/s | 5.0099 KOps/s | |
| test_compile_add_one_nested[pytree-compile] | 0.2417ms | 0.1278ms | 7.8252 KOps/s | 7.5428 KOps/s | |
| test_compile_add_one_nested[pytree-eager] | 0.4330ms | 0.1805ms | 5.5392 KOps/s | 5.4799 KOps/s | |
| test_compile_copy_nested[tensordict-compile] | 0.2383ms | 22.0998μs | 45.2492 KOps/s | 61.7339 KOps/s | |
| test_compile_copy_nested[tensordict-eager] | 0.1023ms | 54.3595μs | 18.3961 KOps/s | 18.3401 KOps/s | |
| test_compile_copy_nested[pytree-compile] | 0.1719ms | 16.1412μs | 61.9531 KOps/s | 61.2389 KOps/s | |
| test_compile_copy_nested[pytree-eager] | 0.3717ms | 68.2317μs | 14.6559 KOps/s | 14.8233 KOps/s | |
| test_compile_add_one_flat[tensordict-compile] | 0.2917ms | 0.2007ms | 4.9817 KOps/s | 4.8413 KOps/s | |
| test_compile_add_one_flat[tensordict-eager] | 0.3926ms | 0.2767ms | 3.6145 KOps/s | 3.5877 KOps/s | |
| test_compile_add_one_flat[tensorclass-compile] | 0.5747ms | 0.1359ms | 7.3609 KOps/s | 7.0332 KOps/s | |
| test_compile_add_one_flat[tensorclass-eager] | 0.4977ms | 73.9351μs | 13.5254 KOps/s | 13.3774 KOps/s | |
| test_compile_add_one_flat[pytree-compile] | 0.2302ms | 0.1800ms | 5.5541 KOps/s | 5.4285 KOps/s | |
| test_compile_add_one_flat[pytree-eager] | 0.8348ms | 0.5352ms | 1.8685 KOps/s | 1.8666 KOps/s | |
| test_compile_add_self_flat[tensordict-eager] | 0.4539ms | 0.3300ms | 3.0302 KOps/s | 3.0144 KOps/s | |
| test_compile_add_self_flat[tensordict-compile] | 0.2667ms | 0.1992ms | 5.0203 KOps/s | 2.9971 KOps/s | |
| test_compile_add_self_flat[tensorclass-eager] | 0.1435ms | 89.9344μs | 11.1192 KOps/s | 11.0288 KOps/s | |
| test_compile_add_self_flat[tensorclass-compile] | 0.2048ms | 0.1381ms | 7.2436 KOps/s | 6.7764 KOps/s | |
| test_compile_add_self_flat[pytree-eager] | 0.6429ms | 0.4374ms | 2.2860 KOps/s | 2.2357 KOps/s | |
| test_compile_add_self_flat[pytree-compile] | 0.5227ms | 0.1804ms | 5.5433 KOps/s | 5.4090 KOps/s | |
| test_compile_copy_flat[tensordict-compile] | 0.1227ms | 19.6328μs | 50.9351 KOps/s | 52.2070 KOps/s | |
| test_compile_copy_flat[tensordict-eager] | 76.1040μs | 41.8574μs | 23.8907 KOps/s | 24.6092 KOps/s | |
| test_compile_copy_flat[pytree-compile] | 94.0450μs | 16.7314μs | 59.7678 KOps/s | 59.4184 KOps/s | |
| test_compile_copy_flat[pytree-eager] | 0.3537ms | 53.3347μs | 18.7495 KOps/s | 18.9840 KOps/s | |
| test_compile_assign_and_add[tensordict-compile] | 2.1307ms | 0.1905ms | 5.2482 KOps/s | 4.8278 KOps/s | |
| test_compile_assign_and_add[tensordict-eager] | 3.6233ms | 3.4288ms | 291.6461 Ops/s | 293.1837 Ops/s | |
| test_compile_assign_and_add[pytree-compile] | 2.0896ms | 0.1786ms | 5.6002 KOps/s | 5.4263 KOps/s | |
| test_compile_assign_and_add[pytree-eager] | 2.9890ms | 2.8426ms | 351.7910 Ops/s | 349.3808 Ops/s | |
| test_compile_indexing[tensor-tensordict-compile] | 0.1907ms | 0.1266ms | 7.8972 KOps/s | 7.5163 KOps/s | |
| test_compile_indexing[tensor-tensordict-eager] | 0.2912ms | 75.7462μs | 13.2020 KOps/s | 13.4288 KOps/s | |
| test_compile_indexing[tensor-tensorclass-compile] | 0.2390ms | 0.1144ms | 8.7380 KOps/s | 8.5780 KOps/s | |
| test_compile_indexing[tensor-tensorclass-eager] | 0.2514ms | 46.2413μs | 21.6257 KOps/s | 22.2741 KOps/s | |
| test_compile_indexing[tensor-pytree-compile] | 0.1644ms | 0.1150ms | 8.6974 KOps/s | 8.3198 KOps/s | |
| test_compile_indexing[tensor-pytree-eager] | 0.2697ms | 47.3221μs | 21.1318 KOps/s | 22.3780 KOps/s | |
| test_compile_indexing[slice-tensordict-compile] | 0.1524ms | 69.8571μs | 14.3149 KOps/s | 13.6554 KOps/s | |
| test_compile_indexing[slice-tensordict-eager] | 0.2011ms | 29.3589μs | 34.0612 KOps/s | 34.9385 KOps/s | |
| test_compile_indexing[slice-tensorclass-compile] | 99.2960μs | 59.2702μs | 16.8719 KOps/s | 16.7119 KOps/s | |
| test_compile_indexing[slice-tensorclass-eager] | 0.2454ms | 22.5140μs | 44.4168 KOps/s | 43.9578 KOps/s | |
| test_compile_indexing[slice-pytree-compile] | 0.1032ms | 62.5002μs | 15.9999 KOps/s | 16.5026 KOps/s | |
| test_compile_indexing[slice-pytree-eager] | 0.3107ms | 22.4263μs | 44.5905 KOps/s | 44.4053 KOps/s | |
| test_compile_indexing[int-tensordict-compile] | 0.1140ms | 71.1889μs | 14.0471 KOps/s | 13.1944 KOps/s | |
| test_compile_indexing[int-tensordict-eager] | 0.2520ms | 27.9739μs | 35.7476 KOps/s | 34.7374 KOps/s | |
| test_compile_indexing[int-tensorclass-compile] | 0.1330ms | 58.4026μs | 17.1225 KOps/s | 16.6371 KOps/s | |
| test_compile_indexing[int-tensorclass-eager] | 0.2540ms | 22.5184μs | 44.4081 KOps/s | 44.8660 KOps/s | |
| test_compile_indexing[int-pytree-compile] | 0.1122ms | 61.8639μs | 16.1645 KOps/s | 16.5932 KOps/s | |
| test_compile_indexing[int-pytree-eager] | 0.2388ms | 22.3679μs | 44.7070 KOps/s | 44.4272 KOps/s | |
| test_compile_replace[single-eager] | 0.1084ms | 48.7066μs | 20.5311 KOps/s | 20.8137 KOps/s | |
| test_compile_replace[single-compile] | 0.2218ms | 0.1227ms | 8.1492 KOps/s | 7.9066 KOps/s | |
| test_compile_replace[multi-eager] | 0.6775ms | 0.5704ms | 1.7531 KOps/s | 1.7874 KOps/s | |
| test_compile_replace[multi-compile] | 0.2672ms | 0.1291ms | 7.7454 KOps/s | 7.5586 KOps/s | |
| test_compile_tc_getattr_20[eager] | 0.2573ms | 0.1725ms | 5.7987 KOps/s | 5.8919 KOps/s | |
| test_compile_tc_getattr_20[compile] | 0.2944ms | 0.1380ms | 7.2439 KOps/s | 7.1428 KOps/s | |
| test_compile_clone_shallow[20-eager] | 52.8830μs | 19.4867μs | 51.3171 KOps/s | 51.5268 KOps/s | |
| test_compile_clone_shallow[20-compile] | 0.1029ms | 17.7586μs | 56.3108 KOps/s | 49.3903 KOps/s | |
| test_compile_clone_shallow[40-eager] | 64.2630μs | 34.3949μs | 29.0741 KOps/s | 29.6729 KOps/s | |
| test_compile_clone_shallow[40-compile] | 92.7950μs | 18.4560μs | 54.1828 KOps/s | 53.2331 KOps/s | |
| test_compile_clone_shallow[80-eager] | 0.1032ms | 63.4669μs | 15.7563 KOps/s | 15.7657 KOps/s | |
| test_compile_clone_shallow[80-compile] | 57.7330μs | 20.8229μs | 48.0239 KOps/s | 46.1219 KOps/s | |
| test_compile_update_inplace[eager] | 0.1248ms | 58.8896μs | 16.9809 KOps/s | 16.6198 KOps/s | |
| test_compile_update_inplace[compile] | 0.2337ms | 0.1521ms | 6.5728 KOps/s | 6.4109 KOps/s | |
| test_mod_add[eager] | 0.1382ms | 49.1709μs | 20.3372 KOps/s | 20.4579 KOps/s | |
| test_mod_add[compile] | 0.1608ms | 0.1190ms | 8.4014 KOps/s | 8.3534 KOps/s | |
| test_mod_add[compile-overhead] | 0.2537ms | 0.1672ms | 5.9802 KOps/s | 5.8106 KOps/s | |
| test_mod_wrap[eager] | 0.7558ms | 0.2987ms | 3.3474 KOps/s | 3.3344 KOps/s | |
| test_mod_wrap[compile] | 0.4847ms | 0.3799ms | 2.6324 KOps/s | 2.5490 KOps/s | |
| test_mod_wrap[compile-overhead] | 9.0248ms | 4.9392ms | 202.4628 Ops/s | 200.6953 Ops/s | |
| test_mod_wrap_and_backward[eager] | 1.6916ms | 1.5108ms | 661.8867 Ops/s | 644.0789 Ops/s | |
| test_mod_wrap_and_backward[compile] | 1.7930ms | 1.4830ms | 674.3053 Ops/s | 671.6148 Ops/s | |
| test_mod_wrap_and_backward[compile-overhead] | 1.3155ms | 0.9190ms | 1.0882 KOps/s | 958.5758 Ops/s | |
| test_seq_add[eager] | 0.2321ms | 0.1537ms | 6.5051 KOps/s | 6.3818 KOps/s | |
| test_seq_add[compile] | 0.2768ms | 0.1284ms | 7.7909 KOps/s | 7.4154 KOps/s | |
| test_seq_add[compile-overhead] | 0.3254ms | 0.1764ms | 5.6703 KOps/s | 5.6234 KOps/s | |
| test_seq_wrap[eager] | 0.7239ms | 0.5490ms | 1.8215 KOps/s | 1.9046 KOps/s | |
| test_seq_wrap[compile] | 0.5448ms | 0.3985ms | 2.5093 KOps/s | 2.4881 KOps/s | |
| test_seq_wrap[compile-overhead] | 0.4365ms | 0.2898ms | 3.4505 KOps/s | 3.3759 KOps/s | |
| test_func_call_runtime[False-eager] | 0.9864ms | 0.8774ms | 1.1397 KOps/s | 1.1374 KOps/s | |
| test_func_call_runtime[False-compile] | 1.1451ms | 0.9480ms | 1.0548 KOps/s | 1.0571 KOps/s | |
| test_func_call_runtime[False-compile-overhead] | 0.5707ms | 0.4964ms | 2.0146 KOps/s | 1.9974 KOps/s | |
| test_func_call_runtime[True-eager] | 1.2399ms | 1.0804ms | 925.6162 Ops/s | 919.1248 Ops/s | |
| test_func_call_runtime[True-compile] | 1.0282ms | 0.9552ms | 1.0469 KOps/s | 1.0457 KOps/s | |
| test_func_call_runtime[True-compile-overhead] | 0.6116ms | 0.5095ms | 1.9626 KOps/s | 1.9341 KOps/s | |
| test_func_call_cm_runtime[False-eager] | 0.9599ms | 0.8424ms | 1.1871 KOps/s | 1.1526 KOps/s | |
| test_func_call_cm_runtime[False-compile] | 1.1576ms | 0.9161ms | 1.0916 KOps/s | 1.0809 KOps/s | |
| test_func_call_cm_runtime[False-compile-overhead] | 0.5191ms | 0.4713ms | 2.1219 KOps/s | 2.0853 KOps/s | |
| test_func_call_cm_runtime[True-eager] | 1.3366ms | 1.2353ms | 809.5423 Ops/s | 808.0787 Ops/s | |
| test_func_call_cm_runtime[True-compile] | 1.0423ms | 0.9630ms | 1.0385 KOps/s | 1.0198 KOps/s | |
| test_func_call_cm_runtime[True-compile-overhead] | 0.6494ms | 0.5163ms | 1.9368 KOps/s | 1.9077 KOps/s | |
| test_vmap_func_call_cm_runtime[eager] | 2.9186ms | 2.3976ms | 417.0896 Ops/s | 412.6215 Ops/s | |
| test_vmap_func_call_cm_runtime[compile] | 1.1403ms | 0.9901ms | 1.0100 KOps/s | 1.0092 KOps/s | |
| test_vmap_func_call_cm_runtime[compile-overhead] | 0.5750ms | 0.5202ms | 1.9225 KOps/s | 1.8714 KOps/s | |
| test_distributed | 0.6216ms | 0.1531ms | 6.5310 KOps/s | 6.3878 KOps/s | |
| test_tdmodule | 0.5286ms | 28.1339μs | 35.5443 KOps/s | 35.7473 KOps/s | |
| test_tdmodule_dispatch | 75.7740μs | 44.5142μs | 22.4648 KOps/s | 21.1258 KOps/s | |
| test_tdseq | 47.2520μs | 26.4356μs | 37.8277 KOps/s | 35.7814 KOps/s | |
| test_tdseq_dispatch | 68.7230μs | 47.1289μs | 21.2184 KOps/s | 20.0126 KOps/s | |
| test_instantiation_functorch | 2.2022ms | 2.0835ms | 479.9671 Ops/s | 476.3557 Ops/s | |
| test_exec_functorch | 0.2252ms | 0.1806ms | 5.5375 KOps/s | 5.3918 KOps/s | |
| test_exec_functional_call | 0.2337ms | 0.1614ms | 6.1955 KOps/s | 5.9701 KOps/s | |
| test_exec_td_decorator | 0.4606ms | 0.2394ms | 4.1771 KOps/s | 4.0690 KOps/s | |
| test_vmap_mlp_speed_decorator[True-True] | 1.0373ms | 0.8300ms | 1.2048 KOps/s | 1.1768 KOps/s | |
| test_vmap_mlp_speed_decorator[True-False] | 1.0097ms | 0.8226ms | 1.2156 KOps/s | 1.1818 KOps/s | |
| test_vmap_mlp_speed_decorator[False-True] | 0.9824ms | 0.7129ms | 1.4028 KOps/s | 1.3545 KOps/s | |
| test_vmap_mlp_speed_decorator[False-False] | 0.9489ms | 0.7244ms | 1.3804 KOps/s | 1.3428 KOps/s | |
| test_vmap_transformer_speed_decorator[True-True] | 21.7486ms | 20.7768ms | 48.1306 Ops/s | 46.9235 Ops/s | |
| test_vmap_transformer_speed_decorator[True-False] | 21.4125ms | 20.6817ms | 48.3519 Ops/s | 46.7767 Ops/s | |
| test_vmap_transformer_speed_decorator[False-True] | 20.7225ms | 20.5406ms | 48.6841 Ops/s | 47.3338 Ops/s | |
| test_vmap_transformer_speed_decorator[False-False] | 20.6878ms | 20.5053ms | 48.7678 Ops/s | 47.4696 Ops/s | |
| test_to_module_speed[True] | 1.5960ms | 1.4720ms | 679.3482 Ops/s | 678.3769 Ops/s | |
| test_to_module_speed[False] | 1.5752ms | 1.4627ms | 683.6769 Ops/s | 693.7501 Ops/s | |
| test_tc_init | 87.3450μs | 45.8763μs | 21.7978 KOps/s | 22.3988 KOps/s | |
| test_tc_init_tensor_only | 40.7020μs | 9.7666μs | 102.3893 KOps/s | 103.6967 KOps/s | |
| test_tc_init_nested | 0.1446ms | 89.8270μs | 11.1325 KOps/s | 11.3617 KOps/s | |
| test_tc_init_many_fields | 50.1430μs | 16.3915μs | 61.0071 KOps/s | 61.3388 KOps/s | |
| test_tc_first_layer_tensor | 25.4210μs | 1.8204μs | 549.3283 KOps/s | 559.9207 KOps/s | |
| test_tc_first_layer_tensor_only | 1.8946μs | 0.3950μs | 2.5313 MOps/s | 2.5403 MOps/s | |
| test_tc_first_layer_tensor_set | 56.2230μs | 3.7476μs | 266.8367 KOps/s | 255.3973 KOps/s | |
| test_tc_first_layer_tensor_only_set | 24.5610μs | 3.4371μs | 290.9401 KOps/s | 296.4539 KOps/s | |
| test_tc_first_layer_nontensor | 38.3920μs | 6.1470μs | 162.6813 KOps/s | 163.0628 KOps/s | |
| test_tc_second_layer_tensor | 21.5110μs | 4.3786μs | 228.3827 KOps/s | 231.8405 KOps/s | |
| test_tc_second_layer_nontensor | 27.6710μs | 8.6889μs | 115.0888 KOps/s | 116.2332 KOps/s | |
| test_unbind | 0.2716s | 14.2049ms | 70.3982 Ops/s | 68.4909 Ops/s | |
| test_full_like | 4.8684ms | 4.3977ms | 227.3910 Ops/s | 227.7108 Ops/s | |
| test_zeros_like | 4.9023ms | 4.3748ms | 228.5811 Ops/s | 59.9954 Ops/s | |
| test_ones_like | 4.6147ms | 4.3917ms | 227.7048 Ops/s | 59.8608 Ops/s | |
| test_clone | 6.8277ms | 6.4713ms | 154.5293 Ops/s | 56.6585 Ops/s | |
| test_squeeze | 0.1592ms | 14.1383μs | 70.7297 KOps/s | 71.9278 KOps/s | |
| test_unsqueeze | 0.2447ms | 0.1104ms | 9.0607 KOps/s | 8.8629 KOps/s | |
| test_split | 0.3961ms | 0.1826ms | 5.4761 KOps/s | 5.4434 KOps/s | |
| test_permute | 0.3496ms | 0.2108ms | 4.7438 KOps/s | 4.6625 KOps/s | |
| test_stack | 51.7023ms | 51.3559ms | 19.4720 Ops/s | 19.4419 Ops/s | |
| test_cat | 52.0170ms | 51.3793ms | 19.4631 Ops/s | 19.4614 Ops/s | |
| test_sequential_tensordict | 0.6023ms | 0.2249ms | 4.4456 KOps/s | 4.3367 KOps/s | |
| test_sequential_graph_module | 0.1660ms | 0.1208ms | 8.2786 KOps/s | 8.0499 KOps/s | |
| test_nested_tensordict | 0.6413ms | 0.3017ms | 3.3144 KOps/s | 3.3904 KOps/s | |
| test_nested_graph_module | 0.2078ms | 0.1285ms | 7.7800 KOps/s | 7.6585 KOps/s |
Adds `TensorMetaData`, `TensorDictBase.metadata()`, and extends `to()` so a
positional tensordict argument is interpreted as a per-leaf device/dtype spec.
This lets a deviceless tensordict be aligned to the heterogeneous placement
of another tensordict:

    td3 = td0.to(td2.metadata())  # each leaf cast to its counterpart's device

Copies are issued asynchronously by default; a single `_sync_all()` runs at
the end when any leaf actually crossed devices, unless `non_blocking=True` is
passed. `non_blocking_pin` / `num_threads` raise `NotImplementedError` on the
per-leaf path for now.
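
To make the per-leaf idea concrete, here is a minimal pure-Python model of it. It is only a sketch and not the tensordict implementation: `Leaf` and `cast_per_leaf` are hypothetical names, plain nested dicts stand in for tensordicts, and devices and dtypes are plain strings. Each leaf takes the device and dtype of its counterpart in the spec, and a key with no counterpart is left as it is.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Leaf:
    """Hypothetical stand-in for a tensor: records only device and dtype."""
    device: str
    dtype: str


def cast_per_leaf(data: dict, spec: dict) -> dict:
    """Return a copy of `data` whose leaves follow the matching leaves of `spec`.

    Keys that are absent from `spec` are passed through unchanged.
    """
    out = {}
    for key, value in data.items():
        target = spec.get(key)
        if isinstance(value, dict):
            # Recurse into nested structures; a missing sub-spec means "no change".
            out[key] = cast_per_leaf(value, target if isinstance(target, dict) else {})
        elif isinstance(target, Leaf):
            out[key] = replace(value, device=target.device, dtype=target.dtype)
        else:
            out[key] = value
    return out


td0 = {
    "a": Leaf("cpu", "float32"),
    "nested": {"b": Leaf("cpu", "float32")},
    "c": Leaf("cpu", "int64"),
}
td2 = {
    "a": Leaf("cuda:0", "float32"),
    "nested": {"b": Leaf("cuda:1", "float16")},
}
td3 = cast_per_leaf(td0, td2)
```

In this toy run `td3["a"]` ends up on `cuda:0`, `td3["nested"]["b"]` on `cuda:1` as `float16`, and `td3["c"]` is untouched because the spec has no entry for it.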
Summary

- Adds `TensorAttrs`, a `TensorClass` recording per-leaf `tgt_device` / `tgt_dtype` / `tgt_shape` (names prefixed with `tgt_` to avoid shadowing the tensordict's own `device` / `dtype` / `shape`). Matches PyTorch's "Tensor Attributes" vocabulary.
- Adds `TensorDictBase.attrs(fields=("device", "dtype", "shape"), num_threads=None)`, which returns a deviceless tensordict whose leaves are `TensorAttrs`. Scales cleanly: new attributes can be appended to `fields` without proliferating boolean kwargs.
- Extends `TensorDictBase.to()` so a positional tensordict argument is interpreted as a per-leaf spec and dispatched via a new `_to_per_leaf()` walker. Missing keys pass through unchanged.

Motivation: a deviceless tensordict may hold leaves on heterogeneous devices. Before this change, `td.to(other_td)` picked a single device; there was no way to say "move each leaf to the device its counterpart lives on." The per-leaf form of `to()` now covers that case.

Async / sync semantics

- Default (`non_blocking=None`): per-leaf copies are issued asynchronously, with one `_sync_all()` at the end if any leaf actually crossed devices. This is stricter than core `to(device)`'s D2H-only sync guard because a single per-leaf call can span mixed directions (`cuda:0 → cpu`, `cpu → cuda:1`, `cuda:0 → cuda:1`) that Torch won't coordinate for you.
- `non_blocking=True`: fully async, no trailing sync. Synchronization is the caller's responsibility.
- `non_blocking=False`: blocking copies.

Threading

`num_threads=` is now accepted on both `attrs()` and the per-leaf `to()` path; it plumbs through to `_fast_apply`. It defaults to single-threaded because the CPU-only path is GIL-bound: local microbenchmarks showed `num_threads=4` is 3–4× slower than single-threaded on CPU. Callers opt in when they know they have real device transfers (D2H/H2D/D2D cross-device) to overlap, which is where `_fast_apply`'s thread pool earns its keep.

Deferred

- `non_blocking_pin=True` on the per-leaf path still raises `NotImplementedError`, pending a follow-up that mirrors the multithreaded pin-memory walker from `_to_cuda_with_pin_mem`.
- `TensorAttrs` construction is ~17 µs/leaf (dominated by three `NonTensorData` wraps). For very wide tensordicts this adds up on `attrs()`; a lightweight storage path is a reasonable follow-up.

Test plan

- `test/test_tensordict.py`: `attrs()` basics, field-subset, nested structures, per-leaf dtype cast, missing-key passthrough, roundtrip, extra-positional rejection, `non_blocking_pin` rejection, sync-toggle monkeypatch, and a CUDA-gated heterogeneous-device scenario.
- `pytest test/test_tensordict.py`: 7611 passed, 856 skipped.
- `pytest test/test_tensorclass.py`: 143 passed, 1 skipped (streaming not installed).

🤖 Generated with Claude Code
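
For readers who want the three `non_blocking` modes spelled out, the sync policy can be modelled in a few lines of plain Python. This is a sketch under stated assumptions, not the code in this PR: `plan_per_leaf_copies` and its `sync` callback are hypothetical, devices are plain strings, and no real copies are issued. It only shows when a single trailing sync would fire.

```python
def plan_per_leaf_copies(placement, targets, non_blocking=None, sync=lambda: None):
    """Model of the per-leaf sync policy (hypothetical helper, no real transfers).

    placement: dict mapping leaf name -> current device string.
    targets:   dict mapping leaf name -> requested device string.
    sync:      callback standing in for a single end-of-call synchronization.
    """
    crossed = False
    result = {}
    for name, device in placement.items():
        # A leaf with no entry in `targets` stays where it is.
        target = targets.get(name, device)
        if target != device:
            crossed = True
        result[name] = target
    # Default mode only: one trailing sync, and only if something actually moved.
    # non_blocking=True leaves syncing to the caller; non_blocking=False models
    # copies that already block individually, so no trailing sync is needed.
    if non_blocking is None and crossed:
        sync()
    return result


sync_calls = []
moved = plan_per_leaf_copies(
    {"a": "cuda:0", "b": "cpu", "c": "cpu"},
    {"a": "cpu", "b": "cuda:1"},
    sync=lambda: sync_calls.append("sync"),
)
```

Here two leaves change device in opposite directions, so the default mode triggers exactly one sync for the whole call; with `non_blocking=True`, or when no leaf moves, the callback is never invoked.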