PAL-1217 healthbench original #10
Conversation
Force-pushed from 7f7b20a to 9e679ce
MiguelAFH left a comment:
Left a comment to create a separate metric for health_bench_professional. Everything else looks good.
```python
MetricSpec(
    class_name="helm.benchmark.metrics.llm_jury_metrics.LLMJuryMetric",
    args={
        "metric_name": "health_bench_score",
```
I would suggest changing the name of the metric to health_bench_professional_score. The annotators differ in that health_bench uses only one prompt to evaluate, while health_bench_professional uses one or two prompts, so it's not the same metric. For clarity, it would be best to keep them separate and include the new metric in the schema as well.
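A sketch of what the suggested separate metric could look like, mirroring the `MetricSpec` hunk quoted above; only `metric_name` differs, and the exact surrounding file is an assumption, not confirmed by the diff:

```python
# Sketch only: a separate MetricSpec for the professional variant,
# so health_bench_score and health_bench_professional_score stay distinct.
MetricSpec(
    class_name="helm.benchmark.metrics.llm_jury_metrics.LLMJuryMetric",
    args={
        "metric_name": "health_bench_professional_score",
    },
)
```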
```yaml
- name: health_bench
```
In order for health_bench and health_bench_professional to appear in the leaderboard, they need to be placed in one of the groups for the medhelm categories (look at how we have benchmarks under patient_communication in the schema). Also, it seems health_bench_professional has not been added to the schema.
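A sketch of the schema placement being asked for, modeled on how patient_communication groups its benchmarks; the group name, keys, and placement here are assumptions based on HELM's schema conventions, not lines from this PR:

```yaml
# Hypothetical run_groups entry: list both runs as subgroups of an
# existing MedHELM category group so they surface on the leaderboard.
- name: patient_communication
  subgroups:
    - health_bench
    - health_bench_professional
```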
MiguelAFH left a comment:
Did final edits on the schema. Looks good to me now.
This can be run by defining the model deployments in model_deployments.yaml in the config and then running the benchmark.
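A minimal invocation sketch, assuming the standard crfm-helm CLI; the run-entries file, suite name, and instance count are placeholders, not values from this PR:

```shell
# Sketch only: assumes crfm-helm is installed and model_deployments.yaml
# is defined in the local config directory. File/suite names are hypothetical.
helm-run \
  --conf-paths run_entries.conf \
  --suite health-bench \
  --max-eval-instances 10
```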