
PAL-1217 healthbench original #10

Merged
pratacosmin merged 8 commits into main from PAL-1217 on May 7, 2026



@pratacosmin

This can be run as follows: define the model in model_deployments.yaml under the config directory (the one passed to helm-run as --local-path), then run:

    helm-run --run-entries "health_bench:model=jsl/medical-visual-lm-30b,jury_config_path=<PATH>/judges.yaml" --max-eval-instances 4 --suite my-suite --local-path ./config
    helm-summarize --suite my-suite -o ./benchmark_output --schema-path <PATH>/schema_medhelm.yaml
    helm-server --suite my-suite -o ./benchmark_output --port 8000
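The three commands above can also be assembled from a script, for example to loop over several models. The sketch below uses a hypothetical helper, build_helm_commands (not part of HELM), that only reproduces the flags shown above; the model name, judges.yaml and schema paths are placeholders. It builds argv lists and does not run anything itself.

```python
import shlex
from typing import List


def build_helm_commands(
    model: str,
    jury_config_path: str,
    schema_path: str,
    suite: str = "my-suite",
    max_eval_instances: int = 4,
    local_path: str = "./config",
    output_path: str = "./benchmark_output",
    port: int = 8000,
) -> List[List[str]]:
    """Hypothetical helper: argv lists for helm-run, helm-summarize, helm-server."""
    # Run entry format as used in the PR: <scenario>:<arg>=<value>,<arg>=<value>
    run_entry = f"health_bench:model={model},jury_config_path={jury_config_path}"
    return [
        ["helm-run", "--run-entries", run_entry,
         "--max-eval-instances", str(max_eval_instances),
         "--suite", suite, "--local-path", local_path],
        ["helm-summarize", "--suite", suite, "-o", output_path,
         "--schema-path", schema_path],
        ["helm-server", "--suite", suite, "-o", output_path,
         "--port", str(port)],
    ]


if __name__ == "__main__":
    commands = build_helm_commands(
        model="jsl/medical-visual-lm-30b",
        jury_config_path="/path/to/judges.yaml",      # placeholder path
        schema_path="/path/to/schema_medhelm.yaml",   # placeholder path
    )
    for argv in commands:
        # shlex.join quotes each argument so the line can be pasted into a shell
        print(shlex.join(argv))
```

Each argv list can also be passed to subprocess.run(argv, check=True) to execute the steps in order; keeping arguments as lists avoids shell-quoting problems with paths.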


@chakravarthik27 left a comment


LGTM

pratacosmin force-pushed the PAL-1217 branch 4 times, most recently from 7f7b20a to 9e679ce on April 29, 2026 13:35

@blidiselalin left a comment


Looks good

@MiguelAFH (Collaborator) left a comment


Left a comment suggesting a separate metric for health_bench_professional. Everything else looks good.

MetricSpec(
    class_name="helm.benchmark.metrics.llm_jury_metrics.LLMJuryMetric",
    args={
        "metric_name": "health_bench_score",

I would suggest renaming the metric to health_bench_professional_score. The annotators differ: health_bench evaluates with a single prompt, while health_bench_professional uses one or two prompts, so it is not the same metric. For clarity, it would be best to keep them separate and add the new metric to the schema as well.
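A minimal sketch of the suggested split, assuming the run spec builds one metric spec per benchmark variant. health_bench_metric_spec is a hypothetical helper, and the MetricSpec dataclass below is a local stand-in that mirrors only the class_name/args shape from the diff, so the example runs without HELM installed; any other LLMJuryMetric arguments in the real code are omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

LLM_JURY_METRIC = "helm.benchmark.metrics.llm_jury_metrics.LLMJuryMetric"


@dataclass
class MetricSpec:
    """Local stand-in for HELM's MetricSpec (class_name + args only)."""

    class_name: str
    args: Dict[str, Any] = field(default_factory=dict)


def health_bench_metric_spec(professional: bool) -> MetricSpec:
    """Hypothetical helper returning a distinct metric per benchmark variant.

    health_bench grades with one annotator prompt and
    health_bench_professional with one or two, so the scores are kept
    under separate metric names instead of sharing health_bench_score.
    """
    metric_name = (
        "health_bench_professional_score" if professional else "health_bench_score"
    )
    # Any other LLMJuryMetric args used in the real run spec are omitted here.
    return MetricSpec(class_name=LLM_JURY_METRIC, args={"metric_name": metric_name})


if __name__ == "__main__":
    for professional in (False, True):
        print(health_bench_metric_spec(professional).args["metric_name"])
```

With distinct names, results from the two annotator setups are reported separately; as the comment notes, the schema then also needs an entry for health_bench_professional_score.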




- name: health_bench

For health_bench and health_bench_professional to appear in the leaderboard, they need to be placed in one of the MedHELM category groups (see how benchmarks are listed under patient_communication in the schema). Also, health_bench_professional does not appear to have been added to the schema yet.
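Both schema issues raised here (a benchmark with no schema entry, and a benchmark not listed under any MedHELM category) can be checked mechanically. This is a sketch under stated assumptions: it takes the schema's run groups as already-parsed dicts, assumes category groups list their member benchmarks under a "subgroups" key, and uses a hypothetical check_schema helper. The placement of health_bench under patient_communication in the example is illustrative only, not where the PR actually put it.

```python
from typing import Dict, Iterable, List


def check_schema(run_groups: List[Dict], benchmarks: Iterable[str]) -> Dict[str, List[str]]:
    """Report benchmarks that lack a run group, or that no category lists.

    Hypothetical helper; assumes each run group is a dict with a "name"
    and, for category groups, a "subgroups" list of member benchmarks.
    """
    benchmarks = list(benchmarks)  # allow a generator to be iterated twice
    names = {group["name"] for group in run_groups}
    grouped = set()
    for group in run_groups:
        grouped.update(group.get("subgroups", []))
    return {
        # Benchmarks with no run-group entry of their own in the schema
        "missing": [b for b in benchmarks if b not in names],
        # Benchmarks not placed under any category group
        "ungrouped": [b for b in benchmarks if b not in grouped],
    }


if __name__ == "__main__":
    # Toy stand-in for the parsed run_groups of schema_medhelm.yaml;
    # the placement under patient_communication is illustrative only.
    run_groups = [
        {"name": "patient_communication", "subgroups": ["health_bench"]},
        {"name": "health_bench"},
    ]
    print(check_schema(run_groups, ["health_bench", "health_bench_professional"]))
    # prints {'missing': ['health_bench_professional'], 'ungrouped': ['health_bench_professional']}
```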

@MiguelAFH (Collaborator) left a comment


Made final edits to the schema. Looks good to me now.

pratacosmin merged commit 77e37fd into main on May 7, 2026
6 checks passed
pratacosmin deleted the PAL-1217 branch on May 7, 2026 06:12
5 participants