Missing reproduction data and evaluation results #30

Hi all, thank you very much for your work on StableToolBench. Do you provide the evaluation JSON files anywhere? I am referring to the JSON files produced by running `run_pass_rate.sh`, which label the trajectories that were determined to "pass" (you have an example in `data_example/pass_rate_results/virtual_chatgpt_dfs/G1_instruction_virtual_chatgpt_dfs.json`). I found the reproduction data zip file you provide, but I would also like to see the results of evaluating this data. Thank you!
