Hi all, thank you very much for your work on StableToolBench. Do you provide the evaluation JSON files anywhere? I am referring to the JSON files produced by running run_pass_rate.sh, which contain labels for which trajectories were determined to "pass" (you have an example in data_example/pass_rate_results/virtual_chatgpt_dfs/G1_instruction_virtual_chatgpt_dfs.json). I found the reproduction data zip file you provide, but I would also like to see the results of evaluating that data. Thank you!