
[BugFix] num_samples=None fix #1292

Merged
vmoens merged 2 commits into gh/vmoens/52/base from gh/vmoens/52/head
Apr 23, 2025

Collaborator

@vmoens vmoens commented Apr 23, 2025

[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Apr 23, 2025
ghstack-source-id: bbcf573
Pull Request resolved: #1292
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 23, 2025
Contributor

github-actions Bot commented Apr 23, 2025

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 233. Improved: $\large\color{#35bf28}9$. Worsened: $\large\color{#d91a1a}18$.

Expand to view detailed results
Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 47.7610μs 11.5496μs 86.5828 KOps/s 87.7811 KOps/s $\color{#d91a1a}-1.37\%$
test_plain_set_stack_nested 40.8910μs 11.5753μs 86.3906 KOps/s 86.8288 KOps/s $\color{#d91a1a}-0.50\%$
test_plain_set_nested_inplace 39.9000μs 12.7155μs 78.6442 KOps/s 78.9726 KOps/s $\color{#d91a1a}-0.42\%$
test_plain_set_stack_nested_inplace 40.3710μs 12.6541μs 79.0258 KOps/s 79.4279 KOps/s $\color{#d91a1a}-0.51\%$
test_items 40.5810μs 2.9226μs 342.1583 KOps/s 336.7217 KOps/s $\color{#35bf28}+1.61\%$
test_items_nested 1.4547ms 0.3624ms 2.7595 KOps/s 2.7437 KOps/s $\color{#35bf28}+0.57\%$
test_items_nested_locked 0.5335ms 0.3635ms 2.7513 KOps/s 2.7265 KOps/s $\color{#35bf28}+0.91\%$
test_items_nested_leaf 0.1022ms 60.8742μs 16.4273 KOps/s 16.5184 KOps/s $\color{#d91a1a}-0.55\%$
test_items_stack_nested 0.5065ms 0.3655ms 2.7361 KOps/s 2.7333 KOps/s $\color{#35bf28}+0.10\%$
test_items_stack_nested_leaf 0.1116ms 60.9966μs 16.3944 KOps/s 16.4799 KOps/s $\color{#d91a1a}-0.52\%$
test_items_stack_nested_locked 0.5429ms 0.3662ms 2.7311 KOps/s 2.7565 KOps/s $\color{#d91a1a}-0.92\%$
test_keys 32.0110μs 3.4563μs 289.3266 KOps/s 290.2443 KOps/s $\color{#d91a1a}-0.32\%$
test_keys_nested 0.1406ms 89.1274μs 11.2199 KOps/s 11.1173 KOps/s $\color{#35bf28}+0.92\%$
test_keys_nested_locked 2.4687ms 94.5542μs 10.5759 KOps/s 10.4296 KOps/s $\color{#35bf28}+1.40\%$
test_keys_nested_leaf 0.1305ms 79.5424μs 12.5719 KOps/s 12.4162 KOps/s $\color{#35bf28}+1.25\%$
test_keys_stack_nested 0.1342ms 89.5541μs 11.1664 KOps/s 11.2169 KOps/s $\color{#d91a1a}-0.45\%$
test_keys_stack_nested_leaf 0.1045ms 79.5347μs 12.5731 KOps/s 12.5035 KOps/s $\color{#35bf28}+0.56\%$
test_keys_stack_nested_locked 0.1400ms 94.1055μs 10.6264 KOps/s 10.4537 KOps/s $\color{#35bf28}+1.65\%$
test_values 11.9618μs 0.8545μs 1.1702 MOps/s 1.1770 MOps/s $\color{#d91a1a}-0.57\%$
test_values_nested 82.0810μs 37.5859μs 26.6057 KOps/s 26.3430 KOps/s $\color{#35bf28}+1.00\%$
test_values_nested_locked 87.0320μs 39.8801μs 25.0752 KOps/s 25.1304 KOps/s $\color{#d91a1a}-0.22\%$
test_values_nested_leaf 86.5710μs 43.2671μs 23.1122 KOps/s 23.3172 KOps/s $\color{#d91a1a}-0.88\%$
test_values_stack_nested 99.4120μs 37.6951μs 26.5286 KOps/s 26.2307 KOps/s $\color{#35bf28}+1.14\%$
test_values_stack_nested_leaf 0.5311ms 43.1929μs 23.1520 KOps/s 23.0891 KOps/s $\color{#35bf28}+0.27\%$
test_values_stack_nested_locked 70.1820μs 39.9303μs 25.0437 KOps/s 25.0205 KOps/s $\color{#35bf28}+0.09\%$
test_membership 1.7480μs 0.5041μs 1.9839 MOps/s 2.0086 MOps/s $\color{#d91a1a}-1.23\%$
test_membership_nested 14.1555μs 2.0359μs 491.1881 KOps/s 493.7813 KOps/s $\color{#d91a1a}-0.53\%$
test_membership_nested_leaf 34.6360μs 2.0274μs 493.2473 KOps/s 494.1348 KOps/s $\color{#d91a1a}-0.18\%$
test_membership_stacked_nested 29.5410μs 2.0761μs 481.6789 KOps/s 475.0638 KOps/s $\color{#35bf28}+1.39\%$
test_membership_stacked_nested_leaf 32.4400μs 2.1092μs 474.1171 KOps/s 475.9177 KOps/s $\color{#d91a1a}-0.38\%$
test_membership_nested_last 33.5300μs 3.0819μs 324.4721 KOps/s 325.6003 KOps/s $\color{#d91a1a}-0.35\%$
test_membership_nested_leaf_last 26.7710μs 3.0734μs 325.3748 KOps/s 326.3891 KOps/s $\color{#d91a1a}-0.31\%$
test_membership_stacked_nested_last 32.6600μs 3.0748μs 325.2211 KOps/s 327.9335 KOps/s $\color{#d91a1a}-0.83\%$
test_membership_stacked_nested_leaf_last 22.8200μs 3.0497μs 327.8970 KOps/s 326.1569 KOps/s $\color{#35bf28}+0.53\%$
test_nested_getleaf 38.8500μs 13.0655μs 76.5374 KOps/s 76.7504 KOps/s $\color{#d91a1a}-0.28\%$
test_nested_get 46.8510μs 12.4854μs 80.0936 KOps/s 80.9789 KOps/s $\color{#d91a1a}-1.09\%$
test_stacked_getleaf 43.9610μs 13.0476μs 76.6425 KOps/s 76.7543 KOps/s $\color{#d91a1a}-0.15\%$
test_stacked_get 42.6010μs 12.4750μs 80.1606 KOps/s 81.0133 KOps/s $\color{#d91a1a}-1.05\%$
test_nested_getitemleaf 34.3500μs 13.4647μs 74.2682 KOps/s 74.4949 KOps/s $\color{#d91a1a}-0.30\%$
test_nested_getitem 48.0210μs 12.7807μs 78.2429 KOps/s 78.8093 KOps/s $\color{#d91a1a}-0.72\%$
test_stacked_getitemleaf 38.8310μs 13.4479μs 74.3608 KOps/s 74.7511 KOps/s $\color{#d91a1a}-0.52\%$
test_stacked_getitem 56.0800μs 12.6745μs 78.8986 KOps/s 79.2070 KOps/s $\color{#d91a1a}-0.39\%$
test_lock_nested 1.6981ms 0.3574ms 2.7981 KOps/s 2.7899 KOps/s $\color{#35bf28}+0.29\%$
test_lock_stack_nested 0.3805ms 0.3463ms 2.8874 KOps/s 2.9067 KOps/s $\color{#d91a1a}-0.66\%$
test_unlock_nested 0.5011ms 0.2989ms 3.3452 KOps/s 3.4489 KOps/s $\color{#d91a1a}-3.01\%$
test_unlock_stack_nested 0.3191ms 0.2828ms 3.5364 KOps/s 3.5647 KOps/s $\color{#d91a1a}-0.79\%$
test_flatten_speed 0.1126ms 77.2772μs 12.9404 KOps/s 12.7643 KOps/s $\color{#35bf28}+1.38\%$
test_unflatten_speed 0.7804ms 0.4035ms 2.4783 KOps/s 2.4997 KOps/s $\color{#d91a1a}-0.85\%$
test_common_ops 1.0054ms 0.6289ms 1.5902 KOps/s 1.5945 KOps/s $\color{#d91a1a}-0.27\%$
test_creation 0.1329ms 1.7595μs 568.3329 KOps/s 569.9571 KOps/s $\color{#d91a1a}-0.28\%$
test_creation_empty 0.5453ms 7.3317μs 136.3943 KOps/s 139.1530 KOps/s $\color{#d91a1a}-1.98\%$
test_creation_nested_1 99.6110μs 10.2008μs 98.0318 KOps/s 99.6897 KOps/s $\color{#d91a1a}-1.66\%$
test_creation_nested_2 0.4037ms 13.0738μs 76.4887 KOps/s 76.6725 KOps/s $\color{#d91a1a}-0.24\%$
test_clone 97.2920μs 10.3921μs 96.2265 KOps/s 99.6344 KOps/s $\color{#d91a1a}-3.42\%$
test_getitem[int] 0.1784ms 10.7554μs 92.9764 KOps/s 64.0398 KOps/s $\textbf{\color{#35bf28}+45.19\%}$
test_getitem[slice_int] 0.1143ms 20.4492μs 48.9016 KOps/s 48.8877 KOps/s $\color{#35bf28}+0.03\%$
test_getitem[range] 0.4436ms 40.6750μs 24.5851 KOps/s 26.4554 KOps/s $\textbf{\color{#d91a1a}-7.07\%}$
test_getitem[tuple] 0.1109ms 18.2691μs 54.7371 KOps/s 55.4791 KOps/s $\color{#d91a1a}-1.34\%$
test_getitem[list] 0.1326ms 32.4884μs 30.7802 KOps/s 30.5367 KOps/s $\color{#35bf28}+0.80\%$
test_setitem_dim[int] 41.0600μs 20.2770μs 49.3170 KOps/s 53.0512 KOps/s $\textbf{\color{#d91a1a}-7.04\%}$
test_setitem_dim[slice_int] 62.6210μs 39.2092μs 25.5042 KOps/s 26.6466 KOps/s $\color{#d91a1a}-4.29\%$
test_setitem_dim[range] 80.6210μs 55.8509μs 17.9048 KOps/s 19.5174 KOps/s $\textbf{\color{#d91a1a}-8.26\%}$
test_setitem_dim[tuple] 53.5110μs 32.0970μs 31.1556 KOps/s 31.5702 KOps/s $\color{#d91a1a}-1.31\%$
test_setitem 0.2147ms 15.0249μs 66.5560 KOps/s 66.1879 KOps/s $\color{#35bf28}+0.56\%$
test_set 0.4133ms 14.4936μs 68.9958 KOps/s 70.0589 KOps/s $\color{#d91a1a}-1.52\%$
test_set_shared 0.5329ms 0.1654ms 6.0453 KOps/s 6.3116 KOps/s $\color{#d91a1a}-4.22\%$
test_update 0.4092ms 18.2306μs 54.8529 KOps/s 55.7038 KOps/s $\color{#d91a1a}-1.53\%$
test_update_nested 0.4282ms 30.0238μs 33.3069 KOps/s 34.9584 KOps/s $\color{#d91a1a}-4.72\%$
test_update__nested 67.9610μs 24.3491μs 41.0693 KOps/s 42.6222 KOps/s $\color{#d91a1a}-3.64\%$
test_set_nested 0.1342ms 15.9347μs 62.7562 KOps/s 64.4414 KOps/s $\color{#d91a1a}-2.62\%$
test_set_nested_new 0.4203ms 18.9133μs 52.8727 KOps/s 52.9856 KOps/s $\color{#d91a1a}-0.21\%$
test_select 0.1166ms 30.2301μs 33.0797 KOps/s 33.3268 KOps/s $\color{#d91a1a}-0.74\%$
test_select_nested 0.4308ms 43.3735μs 23.0555 KOps/s 22.7858 KOps/s $\color{#35bf28}+1.18\%$
test_exclude_nested 0.4618ms 63.4549μs 15.7592 KOps/s 15.6954 KOps/s $\color{#35bf28}+0.41\%$
test_empty[True] 0.3303ms 0.2968ms 3.3692 KOps/s 3.3709 KOps/s $\color{#d91a1a}-0.05\%$
test_empty[False] 39.2847μs 0.8243μs 1.2131 MOps/s 1.2073 MOps/s $\color{#35bf28}+0.48\%$
test_to 89.3720μs 58.3387μs 17.1413 KOps/s 17.4112 KOps/s $\color{#d91a1a}-1.55\%$
test_to_nonblocking 0.4520ms 49.6750μs 20.1308 KOps/s 19.9454 KOps/s $\color{#35bf28}+0.93\%$
test_unbind_speed 0.8423ms 0.2410ms 4.1494 KOps/s 4.1320 KOps/s $\color{#35bf28}+0.42\%$
test_unbind_speed_stack0 0.2850ms 0.2354ms 4.2488 KOps/s 4.1366 KOps/s $\color{#35bf28}+2.71\%$
test_unbind_speed_stack1 93.4563ms 0.7347ms 1.3610 KOps/s 1.4888 KOps/s $\textbf{\color{#d91a1a}-8.58\%}$
test_split 94.0070ms 1.5885ms 629.5416 Ops/s 574.3775 Ops/s $\textbf{\color{#35bf28}+9.60\%}$
test_chunk 94.7254ms 1.5990ms 625.4101 Ops/s 686.8355 Ops/s $\textbf{\color{#d91a1a}-8.94\%}$
test_consolidate[False-None] 96.9260ms 3.1003ms 322.5536 Ops/s 325.5700 Ops/s $\color{#d91a1a}-0.93\%$
test_consolidate[default-None] 1.8065ms 1.7379ms 575.4175 Ops/s 576.1416 Ops/s $\color{#d91a1a}-0.13\%$
test_consolidate[reduce-overhead-None] 1.8234ms 1.7602ms 568.1196 Ops/s 569.4821 Ops/s $\color{#d91a1a}-0.24\%$
test_consolidate_njt[False-None] 7.2790ms 6.8173ms 146.6847 Ops/s 112.8152 Ops/s $\textbf{\color{#35bf28}+30.02\%}$
test_to[False-False-None] 1.8940ms 1.8001ms 555.5294 Ops/s 565.0895 Ops/s $\color{#d91a1a}-1.69\%$
test_to[True-False-None] 1.9577ms 1.4243ms 702.0892 Ops/s 696.3913 Ops/s $\color{#35bf28}+0.82\%$
test_to[within-False-None] 4.7220ms 4.3384ms 230.4975 Ops/s 226.6597 Ops/s $\color{#35bf28}+1.69\%$
test_to[True-default-None] 5.4819ms 5.1586ms 193.8509 Ops/s 185.6798 Ops/s $\color{#35bf28}+4.40\%$
test_to_njt[False-False-None] 7.4044ms 6.9663ms 143.5479 Ops/s 142.7365 Ops/s $\color{#35bf28}+0.57\%$
test_to_njt[True-False-None] 6.1158ms 5.5511ms 180.1439 Ops/s 176.9641 Ops/s $\color{#35bf28}+1.80\%$
test_to_njt[within-False-None] 13.5401ms 12.8326ms 77.9263 Ops/s 81.0409 Ops/s $\color{#d91a1a}-3.84\%$
test_creation[device0] 0.3001ms 79.5750μs 12.5668 KOps/s 12.5616 KOps/s $\color{#35bf28}+0.04\%$
test_creation_from_tensor 0.5474ms 88.5773μs 11.2896 KOps/s 11.7600 KOps/s $\color{#d91a1a}-4.00\%$
test_add_one[memmap_tensor0] 0.4335ms 6.4807μs 154.3046 KOps/s 158.8140 KOps/s $\color{#d91a1a}-2.84\%$
test_contiguous[memmap_tensor0] 3.4940μs 0.4277μs 2.3382 MOps/s 2.3961 MOps/s $\color{#d91a1a}-2.42\%$
test_stack[memmap_tensor0] 35.5800μs 4.6875μs 213.3314 KOps/s 228.9571 KOps/s $\textbf{\color{#d91a1a}-6.82\%}$
test_memmaptd_index 1.6602ms 0.2424ms 4.1258 KOps/s 4.2036 KOps/s $\color{#d91a1a}-1.85\%$
test_memmaptd_index_astensor 0.4364ms 0.3032ms 3.2987 KOps/s 3.3022 KOps/s $\color{#d91a1a}-0.11\%$
test_memmaptd_index_op 0.9380ms 0.5418ms 1.8456 KOps/s 1.8306 KOps/s $\color{#35bf28}+0.82\%$
test_serialize_model 0.1350s 0.1326s 7.5439 Ops/s 7.5421 Ops/s $\color{#35bf28}+0.02\%$
test_serialize_model_pickle 1.3470s 1.2145s 0.8234 Ops/s 0.8254 Ops/s $\color{#d91a1a}-0.24\%$
test_serialize_weights 0.2861s 0.1535s 6.5140 Ops/s 7.5781 Ops/s $\textbf{\color{#d91a1a}-14.04\%}$
test_serialize_weights_returnearly 0.3343s 53.3288ms 18.7516 Ops/s 12.9389 Ops/s $\textbf{\color{#35bf28}+44.92\%}$
test_serialize_weights_pickle 1.3760s 1.2158s 0.8225 Ops/s 0.8138 Ops/s $\color{#35bf28}+1.07\%$
test_reshape_pytree 50.6410μs 22.0732μs 45.3039 KOps/s 44.7185 KOps/s $\color{#35bf28}+1.31\%$
test_reshape_td 49.5710μs 27.2394μs 36.7116 KOps/s 36.7874 KOps/s $\color{#d91a1a}-0.21\%$
test_view_pytree 51.6710μs 21.6160μs 46.2620 KOps/s 45.4449 KOps/s $\color{#35bf28}+1.80\%$
test_view_td 77.3820μs 33.4129μs 29.9286 KOps/s 30.6693 KOps/s $\color{#d91a1a}-2.42\%$
test_unbind_pytree 83.8510μs 28.1633μs 35.5072 KOps/s 35.5664 KOps/s $\color{#d91a1a}-0.17\%$
test_unbind_td 0.6087ms 38.4347μs 26.0182 KOps/s 27.0597 KOps/s $\color{#d91a1a}-3.85\%$
test_split_pytree 61.2410μs 30.2643μs 33.0422 KOps/s 33.2739 KOps/s $\color{#d91a1a}-0.70\%$
test_split_td 0.7371ms 40.4795μs 24.7039 KOps/s 24.8689 KOps/s $\color{#d91a1a}-0.66\%$
test_add_pytree 73.0820μs 32.9384μs 30.3597 KOps/s 30.5818 KOps/s $\color{#d91a1a}-0.73\%$
test_add_td 0.2785ms 47.2912μs 21.1456 KOps/s 20.9945 KOps/s $\color{#35bf28}+0.72\%$
test_compile_add_one_nested[tensordict-compile] 0.1778ms 0.1241ms 8.0556 KOps/s 7.7412 KOps/s $\color{#35bf28}+4.06\%$
test_compile_add_one_nested[tensordict-eager] 0.2336ms 0.1411ms 7.0874 KOps/s 6.9513 KOps/s $\color{#35bf28}+1.96\%$
test_compile_add_one_nested[pytree-compile] 0.1436ms 96.5759μs 10.3545 KOps/s 10.1791 KOps/s $\color{#35bf28}+1.72\%$
test_compile_add_one_nested[pytree-eager] 1.0096ms 0.1501ms 6.6608 KOps/s 6.5381 KOps/s $\color{#35bf28}+1.88\%$
test_compile_copy_nested[tensordict-compile] 65.7110μs 23.9714μs 41.7163 KOps/s 43.9636 KOps/s $\textbf{\color{#d91a1a}-5.11\%}$
test_compile_copy_nested[tensordict-eager] 0.1061ms 36.2615μs 27.5774 KOps/s 28.6101 KOps/s $\color{#d91a1a}-3.61\%$
test_compile_copy_nested[pytree-compile] 0.4457ms 64.6528μs 15.4672 KOps/s 15.3272 KOps/s $\color{#35bf28}+0.91\%$
test_compile_copy_nested[pytree-eager] 78.4010μs 48.6425μs 20.5582 KOps/s 20.1086 KOps/s $\color{#35bf28}+2.24\%$
test_compile_add_one_flat[tensordict-compile] 0.1985ms 0.1501ms 6.6608 KOps/s 7.0345 KOps/s $\textbf{\color{#d91a1a}-5.31\%}$
test_compile_add_one_flat[tensordict-eager] 0.3277ms 0.2215ms 4.5148 KOps/s 4.4722 KOps/s $\color{#35bf28}+0.95\%$
test_compile_add_one_flat[tensorclass-compile] 0.1496ms 0.1009ms 9.9067 KOps/s 10.3362 KOps/s $\color{#d91a1a}-4.16\%$
test_compile_add_one_flat[tensorclass-eager] 0.1291ms 59.3846μs 16.8394 KOps/s 16.1446 KOps/s $\color{#35bf28}+4.30\%$
test_compile_add_one_flat[pytree-compile] 0.2106ms 0.1381ms 7.2432 KOps/s 7.3519 KOps/s $\color{#d91a1a}-1.48\%$
test_compile_add_one_flat[pytree-eager] 0.5660ms 0.4861ms 2.0572 KOps/s 2.0212 KOps/s $\color{#35bf28}+1.78\%$
test_compile_add_self_flat[tensordict-eager] 0.4315ms 0.2690ms 3.7170 KOps/s 3.7106 KOps/s $\color{#35bf28}+0.17\%$
test_compile_add_self_flat[tensordict-compile] 0.1849ms 0.1467ms 6.8184 KOps/s 6.9554 KOps/s $\color{#d91a1a}-1.97\%$
test_compile_add_self_flat[tensorclass-eager] 0.1663ms 72.2010μs 13.8502 KOps/s 13.5710 KOps/s $\color{#35bf28}+2.06\%$
test_compile_add_self_flat[tensorclass-compile] 0.1382ms 0.1007ms 9.9264 KOps/s 10.1578 KOps/s $\color{#d91a1a}-2.28\%$
test_compile_add_self_flat[pytree-eager] 0.4746ms 0.4187ms 2.3885 KOps/s 2.4154 KOps/s $\color{#d91a1a}-1.11\%$
test_compile_add_self_flat[pytree-compile] 0.1935ms 0.1393ms 7.1801 KOps/s 7.4245 KOps/s $\color{#d91a1a}-3.29\%$
test_compile_copy_flat[tensordict-compile] 94.2210μs 20.2219μs 49.4514 KOps/s 54.6818 KOps/s $\textbf{\color{#d91a1a}-9.57\%}$
test_compile_copy_flat[tensordict-eager] 64.7010μs 32.0017μs 31.2483 KOps/s 30.9516 KOps/s $\color{#35bf28}+0.96\%$
test_compile_copy_flat[pytree-compile] 0.1042ms 69.1073μs 14.4703 KOps/s 14.3602 KOps/s $\color{#35bf28}+0.77\%$
test_compile_copy_flat[pytree-eager] 86.0910μs 51.9341μs 19.2552 KOps/s 19.3714 KOps/s $\color{#d91a1a}-0.60\%$
test_compile_assign_and_add[tensordict-compile] 1.6588ms 0.3983ms 2.5110 KOps/s 2.2157 KOps/s $\textbf{\color{#35bf28}+13.33\%}$
test_compile_assign_and_add[tensordict-eager] 3.0078ms 2.8005ms 357.0814 Ops/s 362.4276 Ops/s $\color{#d91a1a}-1.48\%$
test_compile_assign_and_add[pytree-compile] 1.5917ms 0.4327ms 2.3109 KOps/s 2.2753 KOps/s $\color{#35bf28}+1.57\%$
test_compile_assign_and_add[pytree-eager] 2.8000ms 2.6893ms 371.8416 Ops/s 379.1233 Ops/s $\color{#d91a1a}-1.92\%$
test_compile_indexing[tensor-tensordict-compile] 0.1644ms 0.1150ms 8.6950 KOps/s 9.0320 KOps/s $\color{#d91a1a}-3.73\%$
test_compile_indexing[tensor-tensordict-eager] 0.5534ms 82.9029μs 12.0623 KOps/s 12.0560 KOps/s $\color{#35bf28}+0.05\%$
test_compile_indexing[tensor-tensorclass-compile] 0.2304ms 0.1096ms 9.1244 KOps/s 9.4553 KOps/s $\color{#d91a1a}-3.50\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1139ms 68.1058μs 14.6830 KOps/s 14.5688 KOps/s $\color{#35bf28}+0.78\%$
test_compile_indexing[tensor-pytree-compile] 0.1561ms 0.1121ms 8.9239 KOps/s 9.4490 KOps/s $\textbf{\color{#d91a1a}-5.56\%}$
test_compile_indexing[tensor-pytree-eager] 0.1170ms 71.4899μs 13.9880 KOps/s 14.6908 KOps/s $\color{#d91a1a}-4.78\%$
test_compile_indexing[slice-tensordict-compile] 0.1419ms 99.2252μs 10.0781 KOps/s 9.9965 KOps/s $\color{#35bf28}+0.82\%$
test_compile_indexing[slice-tensordict-eager] 0.1486ms 19.3848μs 51.5868 KOps/s 53.0194 KOps/s $\color{#d91a1a}-2.70\%$
test_compile_indexing[slice-tensorclass-compile] 0.1475ms 96.2101μs 10.3939 KOps/s 9.9302 KOps/s $\color{#35bf28}+4.67\%$
test_compile_indexing[slice-tensorclass-eager] 46.4110μs 15.7557μs 63.4691 KOps/s 64.1581 KOps/s $\color{#d91a1a}-1.07\%$
test_compile_indexing[slice-pytree-compile] 0.1485ms 0.1003ms 9.9720 KOps/s 10.1925 KOps/s $\color{#d91a1a}-2.16\%$
test_compile_indexing[slice-pytree-eager] 53.3310μs 15.9765μs 62.5921 KOps/s 64.0749 KOps/s $\color{#d91a1a}-2.31\%$
test_compile_indexing[int-tensordict-compile] 0.1492ms 0.1049ms 9.5343 KOps/s 9.7120 KOps/s $\color{#d91a1a}-1.83\%$
test_compile_indexing[int-tensordict-eager] 0.6465ms 19.4544μs 51.4024 KOps/s 53.4599 KOps/s $\color{#d91a1a}-3.85\%$
test_compile_indexing[int-tensorclass-compile] 0.1574ms 0.1006ms 9.9355 KOps/s 10.1418 KOps/s $\color{#d91a1a}-2.03\%$
test_compile_indexing[int-tensorclass-eager] 80.9510μs 15.7413μs 63.5270 KOps/s 64.4231 KOps/s $\color{#d91a1a}-1.39\%$
test_compile_indexing[int-pytree-compile] 0.1239ms 96.6006μs 10.3519 KOps/s 10.2359 KOps/s $\color{#35bf28}+1.13\%$
test_compile_indexing[int-pytree-eager] 62.0610μs 15.7655μs 63.4296 KOps/s 64.8981 KOps/s $\color{#d91a1a}-2.26\%$
test_mod_add[eager] 81.8910μs 38.4468μs 26.0100 KOps/s 26.1044 KOps/s $\color{#d91a1a}-0.36\%$
test_mod_add[compile] 0.1251ms 83.7849μs 11.9353 KOps/s 12.1770 KOps/s $\color{#d91a1a}-1.98\%$
test_mod_add[compile-overhead] 0.3307ms 0.1702ms 5.8755 KOps/s 5.5977 KOps/s $\color{#35bf28}+4.96\%$
test_mod_wrap[eager] 0.3325ms 0.2515ms 3.9767 KOps/s 3.7696 KOps/s $\textbf{\color{#35bf28}+5.49\%}$
test_mod_wrap[compile] 0.3513ms 0.3024ms 3.3063 KOps/s 3.4430 KOps/s $\color{#d91a1a}-3.97\%$
test_mod_wrap[compile-overhead] 7.5182ms 3.9370ms 253.9999 Ops/s 266.7229 Ops/s $\color{#d91a1a}-4.77\%$
test_mod_wrap_and_backward[eager] 1.4844ms 1.3581ms 736.3406 Ops/s 698.9184 Ops/s $\textbf{\color{#35bf28}+5.35\%}$
test_mod_wrap_and_backward[compile] 1.4049ms 1.2920ms 773.9865 Ops/s 714.4871 Ops/s $\textbf{\color{#35bf28}+8.33\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3870ms 0.9250ms 1.0811 KOps/s 967.2420 Ops/s $\textbf{\color{#35bf28}+11.77\%}$
test_seq_add[eager] 0.3264ms 0.1358ms 7.3636 KOps/s 7.7509 KOps/s $\color{#d91a1a}-5.00\%$
test_seq_add[compile] 0.1926ms 95.1267μs 10.5123 KOps/s 10.6248 KOps/s $\color{#d91a1a}-1.06\%$
test_seq_add[compile-overhead] 0.1814ms 0.1354ms 7.3832 KOps/s 7.5256 KOps/s $\color{#d91a1a}-1.89\%$
test_seq_wrap[eager] 1.0425ms 0.4466ms 2.2391 KOps/s 2.2435 KOps/s $\color{#d91a1a}-0.19\%$
test_seq_wrap[compile] 1.2368ms 0.3317ms 3.0151 KOps/s 3.1033 KOps/s $\color{#d91a1a}-2.84\%$
test_seq_wrap[compile-overhead] 0.2780ms 0.2344ms 4.2661 KOps/s 4.3821 KOps/s $\color{#d91a1a}-2.65\%$
test_func_call_runtime[False-eager] 0.8585ms 0.7848ms 1.2742 KOps/s 1.3025 KOps/s $\color{#d91a1a}-2.17\%$
test_func_call_runtime[False-compile] 1.0919ms 0.7886ms 1.2680 KOps/s 1.3157 KOps/s $\color{#d91a1a}-3.62\%$
test_func_call_runtime[False-compile-overhead] 0.4128ms 0.3681ms 2.7165 KOps/s 2.7195 KOps/s $\color{#d91a1a}-0.11\%$
test_func_call_runtime[True-eager] 1.0773ms 0.9376ms 1.0666 KOps/s 1.1184 KOps/s $\color{#d91a1a}-4.63\%$
test_func_call_runtime[True-compile] 0.8911ms 0.8306ms 1.2039 KOps/s 1.2905 KOps/s $\textbf{\color{#d91a1a}-6.71\%}$
test_func_call_runtime[True-compile-overhead] 0.4413ms 0.3923ms 2.5489 KOps/s 2.5702 KOps/s $\color{#d91a1a}-0.83\%$
test_func_call_cm_runtime[False-eager] 0.7884ms 0.7225ms 1.3841 KOps/s 1.3961 KOps/s $\color{#d91a1a}-0.86\%$
test_func_call_cm_runtime[False-compile] 0.8757ms 0.7626ms 1.3113 KOps/s 1.2622 KOps/s $\color{#35bf28}+3.89\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4207ms 0.3707ms 2.6977 KOps/s 2.7160 KOps/s $\color{#d91a1a}-0.67\%$
test_func_call_cm_runtime[True-eager] 1.0934ms 1.0020ms 997.9672 Ops/s 1.0013 KOps/s $\color{#d91a1a}-0.33\%$
test_func_call_cm_runtime[True-compile] 1.0387ms 0.9799ms 1.0205 KOps/s 998.9312 Ops/s $\color{#35bf28}+2.16\%$
test_func_call_cm_runtime[True-compile-overhead] 1.0620ms 0.9880ms 1.0122 KOps/s 1.0134 KOps/s $\color{#d91a1a}-0.12\%$
test_vmap_func_call_cm_runtime[eager] 2.5235ms 2.0528ms 487.1333 Ops/s 482.8268 Ops/s $\color{#35bf28}+0.89\%$
test_vmap_func_call_cm_runtime[compile] 0.8881ms 0.8232ms 1.2147 KOps/s 1.2012 KOps/s $\color{#35bf28}+1.12\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.4785ms 0.4276ms 2.3389 KOps/s 2.3756 KOps/s $\color{#d91a1a}-1.54\%$
test_distributed 2.8613ms 0.2832ms 3.5307 KOps/s 8.5829 KOps/s $\textbf{\color{#d91a1a}-58.86\%}$
test_tdmodule 57.3710μs 21.4393μs 46.6434 KOps/s 47.3013 KOps/s $\color{#d91a1a}-1.39\%$
test_tdmodule_dispatch 63.9410μs 39.8472μs 25.0959 KOps/s 25.8330 KOps/s $\color{#d91a1a}-2.85\%$
test_tdseq 33.1410μs 20.8928μs 47.8635 KOps/s 48.1893 KOps/s $\color{#d91a1a}-0.68\%$
test_tdseq_dispatch 62.0120μs 42.3797μs 23.5962 KOps/s 24.9057 KOps/s $\textbf{\color{#d91a1a}-5.26\%}$
test_instantiation_functorch 1.7129ms 1.5447ms 647.3754 Ops/s 653.4166 Ops/s $\color{#d91a1a}-0.92\%$
test_exec_functorch 0.1711ms 0.1379ms 7.2517 KOps/s 7.1747 KOps/s $\color{#35bf28}+1.07\%$
test_exec_functional_call 0.1856ms 0.1286ms 7.7735 KOps/s 7.6009 KOps/s $\color{#35bf28}+2.27\%$
test_exec_td_decorator 0.3810ms 0.1809ms 5.5270 KOps/s 5.5011 KOps/s $\color{#35bf28}+0.47\%$
test_vmap_mlp_speed_decorator[True-True] 0.8923ms 0.6818ms 1.4668 KOps/s 1.4685 KOps/s $\color{#d91a1a}-0.11\%$
test_vmap_mlp_speed_decorator[True-False] 1.0533ms 0.6773ms 1.4764 KOps/s 1.4776 KOps/s $\color{#d91a1a}-0.08\%$
test_vmap_mlp_speed_decorator[False-True] 0.9688ms 0.5863ms 1.7057 KOps/s 1.7072 KOps/s $\color{#d91a1a}-0.09\%$
test_vmap_mlp_speed_decorator[False-False] 0.9931ms 0.5892ms 1.6971 KOps/s 1.7046 KOps/s $\color{#d91a1a}-0.44\%$
test_vmap_transformer_speed_decorator[True-True] 19.3876ms 19.0154ms 52.5889 Ops/s 53.0085 Ops/s $\color{#d91a1a}-0.79\%$
test_vmap_transformer_speed_decorator[True-False] 19.3833ms 19.0063ms 52.6142 Ops/s 53.1396 Ops/s $\color{#d91a1a}-0.99\%$
test_vmap_transformer_speed_decorator[False-True] 19.2050ms 18.9335ms 52.8164 Ops/s 53.5632 Ops/s $\color{#d91a1a}-1.39\%$
test_vmap_transformer_speed_decorator[False-False] 19.6056ms 18.9687ms 52.7183 Ops/s 53.3088 Ops/s $\color{#d91a1a}-1.11\%$
test_to_module_speed[True] 1.4958ms 0.9781ms 1.0224 KOps/s 1.0171 KOps/s $\color{#35bf28}+0.52\%$
test_to_module_speed[False] 1.4224ms 0.9789ms 1.0216 KOps/s 1.0272 KOps/s $\color{#d91a1a}-0.54\%$
test_tc_init 0.4349ms 34.8168μs 28.7218 KOps/s 28.6653 KOps/s $\color{#35bf28}+0.20\%$
test_tc_init_tensor_only 0.1017ms 10.9352μs 91.4479 KOps/s 92.8246 KOps/s $\color{#d91a1a}-1.48\%$
test_tc_init_nested 0.4573ms 68.8419μs 14.5260 KOps/s 14.6132 KOps/s $\color{#d91a1a}-0.60\%$
test_tc_first_layer_tensor 23.2800μs 0.8990μs 1.1124 MOps/s 1.2218 MOps/s $\textbf{\color{#d91a1a}-8.96\%}$
test_tc_first_layer_tensor_only 39.2007μs 0.4235μs 2.3615 MOps/s 2.3660 MOps/s $\color{#d91a1a}-0.19\%$
test_tc_first_layer_tensor_set 22.5400μs 2.9252μs 341.8533 KOps/s 335.3999 KOps/s $\color{#35bf28}+1.92\%$
test_tc_first_layer_tensor_only_set 0.1321ms 1.7951μs 557.0614 KOps/s 550.4090 KOps/s $\color{#35bf28}+1.21\%$
test_tc_first_layer_nontensor 22.6610μs 2.3475μs 425.9809 KOps/s 428.9554 KOps/s $\color{#d91a1a}-0.69\%$
test_tc_second_layer_tensor 0.4005ms 1.7277μs 578.8184 KOps/s 568.3438 KOps/s $\color{#35bf28}+1.84\%$
test_tc_second_layer_nontensor 69.9310μs 3.1748μs 314.9839 KOps/s 315.6774 KOps/s $\color{#d91a1a}-0.22\%$
test_unbind 0.2459s 9.9631ms 100.3708 Ops/s 144.4284 Ops/s $\textbf{\color{#d91a1a}-30.50\%}$
test_full_like 9.3148ms 7.4574ms 134.0943 Ops/s 135.1390 Ops/s $\color{#d91a1a}-0.77\%$
test_zeros_like 5.9907ms 4.3669ms 228.9938 Ops/s 230.4555 Ops/s $\color{#d91a1a}-0.63\%$
test_ones_like 11.9161ms 8.7076ms 114.8417 Ops/s 230.1480 Ops/s $\textbf{\color{#d91a1a}-50.10\%}$
test_clone 6.9736ms 6.5971ms 151.5809 Ops/s 151.8811 Ops/s $\color{#d91a1a}-0.20\%$
test_squeeze 62.4610μs 10.2029μs 98.0114 KOps/s 102.4090 KOps/s $\color{#d91a1a}-4.29\%$
test_unsqueeze 0.5079ms 78.7398μs 12.7000 KOps/s 13.5776 KOps/s $\textbf{\color{#d91a1a}-6.46\%}$
test_split 0.2726ms 0.1634ms 6.1213 KOps/s 6.0335 KOps/s $\color{#35bf28}+1.45\%$
test_permute 0.5995ms 0.1868ms 5.3523 KOps/s 5.3889 KOps/s $\color{#d91a1a}-0.68\%$
test_stack 51.8193ms 51.1232ms 19.5606 Ops/s 20.0443 Ops/s $\color{#d91a1a}-2.41\%$
test_cat 51.6585ms 51.0157ms 19.6018 Ops/s 19.1026 Ops/s $\color{#35bf28}+2.61\%$
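
For readers checking the table by hand, the `Change` column appears to be the relative difference between `Ops` (this PR) and `Ops on Repo HEAD`. The sketch below reproduces a few rows under that assumption. The 5% cutoff for counting a benchmark as improved or worsened is inferred from which rows are bolded in the table and is not stated by the bot; the function names are hypothetical and are not part of the benchmark tooling.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption inferred from the table."""
    if change_pct > threshold_pct:
        return "improved"
    if change_pct < -threshold_pct:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) pairs copied from the table above, both in KOps/s.
rows = {
    "test_getitem[int]": (92.9764, 64.0398),
    "test_distributed": (3.5307, 8.5829),
    "test_clone": (96.2265, 99.6344),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Both values in a pair must use the same unit before comparing; rows such as `test_mod_wrap_and_backward[compile-overhead]` mix KOps/s and Ops/s and need converting first.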

[ghstack-poisoned]
@vmoens vmoens mentioned this pull request Apr 23, 2025
@vmoens vmoens merged commit 769aa89 into gh/vmoens/52/base Apr 23, 2025
45 of 48 checks passed
vmoens pushed a commit that referenced this pull request Apr 23, 2025
ghstack-source-id: af6dacf
Pull Request resolved: #1292
@vmoens vmoens deleted the gh/vmoens/52/head branch April 23, 2025 11:53

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

2 participants