PAL-1217 heatlhbench original by pratacosmin · Pull Request #10 · PacificAI/medhelm

pratacosmin · 2026-04-29T08:37:28Z

This can be run as follows. Define the model deployment in model_deployments.yaml inside the config directory (the one passed to --local-path), then run:

helm-run --run-entries "health_bench:model=jsl/medical-visual-lm-30b,jury_config_path=<PATH>/judges.yaml" --max-eval-instances 4 --suite my-suite --local-path ./config

helm-summarize --suite my-suite -o ./benchmark_output --schema-path <PATH>/schema_medhelm.yaml

helm-server --suite my-suite -o ./benchmark_output --port 8000
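
For anyone who wants to script these three steps, here is a minimal sketch that only assembles the argument lists and prints them. It does not invoke HELM. The helper name `build_commands` and its parameters are hypothetical and not part of this PR; the flags are copied from the commands above, and the `<PATH>` placeholders are left as they are.

```python
import shlex
from typing import List


def build_commands(
    model: str,
    judges_yaml: str,
    schema_yaml: str,
    suite: str = "my-suite",
    local_path: str = "./config",
    output_path: str = "./benchmark_output",
    max_eval_instances: int = 4,
    port: int = 8000,
) -> List[List[str]]:
    """Return the run, summarize and serve commands as argument lists.

    Hypothetical helper: it mirrors the commands in the PR description
    and uses only the flags that appear there.
    """
    run_entry = f"health_bench:model={model},jury_config_path={judges_yaml}"
    return [
        [
            "helm-run",
            "--run-entries", run_entry,
            "--max-eval-instances", str(max_eval_instances),
            "--suite", suite,
            "--local-path", local_path,
        ],
        ["helm-summarize", "--suite", suite, "-o", output_path, "--schema-path", schema_yaml],
        ["helm-server", "--suite", suite, "-o", output_path, "--port", str(port)],
    ]


if __name__ == "__main__":
    commands = build_commands(
        model="jsl/medical-visual-lm-30b",
        judges_yaml="<PATH>/judges.yaml",  # placeholder, as in the description
        schema_yaml="<PATH>/schema_medhelm.yaml",  # placeholder, as in the description
    )
    for command in commands:
        # shlex.join quotes each argument so the line can be pasted into a shell
        print(shlex.join(command))
```

Each list could be passed to `subprocess.run(command, check=True)` in order, assuming the HELM command-line tools are installed in the active environment.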

chakravarthik27

LGTM

blidiselalin

Looks good

MiguelAFH

Left a comment asking for a separate metric for health_bench_professional. Everything else looks good.

MiguelAFH · 2026-05-05T16:58:06Z

+        MetricSpec(
+            class_name="helm.benchmark.metrics.llm_jury_metrics.LLMJuryMetric",
+            args={
+                "metric_name": "health_bench_score",


I would suggest changing the name of the metric to health_bench_professional_score. The annotators differ: health_bench uses a single prompt to evaluate, while health_bench_professional uses one or two prompts, so it is not the same metric. For clarity, it would be best to keep them separate and to include the new metric in the schema as well.
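
A rough sketch of what the split could look like, so each scenario reports under its own metric name. The `MetricSpec` dataclass below is a local stand-in that mirrors only the two fields visible in the diff (`class_name` and `args`), so the snippet runs without HELM installed. The helper names are hypothetical, and any arguments other than `metric_name` are omitted because they are not shown in the diff.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class MetricSpec:
    """Local stand-in for HELM's MetricSpec (assumption: same two fields)."""

    class_name: str
    args: Dict[str, Any] = field(default_factory=dict)


# Class path copied from the diff under review.
JURY_METRIC_CLASS = "helm.benchmark.metrics.llm_jury_metrics.LLMJuryMetric"


def jury_metric_spec(metric_name: str) -> MetricSpec:
    # Only "metric_name" appears in the diff; the remaining args are left out here.
    return MetricSpec(class_name=JURY_METRIC_CLASS, args={"metric_name": metric_name})


def health_bench_metric_specs() -> List[MetricSpec]:
    # Single-prompt annotator keeps the existing metric name.
    return [jury_metric_spec("health_bench_score")]


def health_bench_professional_metric_specs() -> List[MetricSpec]:
    # One-or-two-prompt annotator reports under its own metric name.
    return [jury_metric_spec("health_bench_professional_score")]


if __name__ == "__main__":
    for spec in health_bench_metric_specs() + health_bench_professional_metric_specs():
        print(spec.args["metric_name"])
```

With separate names, the two scores cannot be merged into one column by accident, and each name can get its own entry in the schema.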

MiguelAFH · 2026-05-05T17:44:11Z


+
+
+  - name: health_bench


For health_bench and health_bench_professional to appear in the leaderboard, they need to be placed in one of the groups for the MedHELM categories (see how benchmarks are listed under patient_communication in the schema). Also, it seems health_bench_professional has not been added to the schema.
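
A quick way to catch this before review is a small check over the parsed schema. The sketch below assumes the schema, once loaded into a dict, has a top-level `run_groups` list whose category entries name their members in a `subgroups` list. Both key names are assumptions about the layout of schema_medhelm.yaml, and the function name is hypothetical. The sample data is illustrative only; placing health_bench under patient_communication there is not a statement about where it belongs.

```python
from typing import Any, Dict, List, Set


def unplaced_groups(schema: Dict[str, Any], names: List[str]) -> List[str]:
    """Return the names that no run group lists among its subgroups.

    Assumes a "run_groups" list of dicts, each with an optional
    "subgroups" list (an assumption about the schema layout).
    """
    referenced: Set[str] = set()
    for group in schema.get("run_groups", []):
        referenced.update(group.get("subgroups", []))
    return [name for name in names if name not in referenced]


if __name__ == "__main__":
    # Illustrative data only, not the real schema contents.
    sample_schema = {
        "run_groups": [
            {"name": "patient_communication", "subgroups": ["health_bench"]},
            {"name": "health_bench"},
            {"name": "health_bench_professional"},
        ]
    }
    missing = unplaced_groups(sample_schema, ["health_bench", "health_bench_professional"])
    print(missing)  # ['health_bench_professional']
```

To run it against the real file, the dict could be loaded with a YAML parser such as PyYAML (`yaml.safe_load`), which is an extra dependency not shown here.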

MiguelAFH

Did final edits on the schema. Looks good to me now

pratacosmin requested review from blidiselalin, chakravarthik27 and iulianigas April 29, 2026 08:37

pratacosmin self-assigned this Apr 29, 2026

chakravarthik27 approved these changes Apr 29, 2026

pratacosmin force-pushed the PAL-1217 branch 4 times, most recently from 7f7b20a to 9e679ce Compare April 29, 2026 13:35

iulianigas approved these changes May 5, 2026

blidiselalin approved these changes May 5, 2026

pratacosmin force-pushed the PAL-1217 branch from a5e356e to 84bf4f7 Compare May 5, 2026 08:05

pratacosmin requested a review from MiguelAFH May 5, 2026 08:10

MiguelAFH requested changes May 5, 2026

MiguelAFH reviewed May 5, 2026

pratacosmin and others added 6 commits May 6, 2026 11:42

PAL-1217 heatlhbench original

71ed2e1

PAL-1217 heatlhbench original

97c3ce1

fix: format long strings for better readability

c855bff

fix: improved the static typing hints and codebase

0484584

fix: remove unused imports in health_bench_annotator and health_bench_scenario

0b0d1a2

PAL-1217 fix: update schema and score

0f23eaa

pratacosmin force-pushed the PAL-1217 branch from 84bf4f7 to 0f23eaa Compare May 6, 2026 08:42

pratacosmin requested a review from MiguelAFH May 6, 2026 09:57

MiguelAFH added 2 commits May 6, 2026 09:09

Update schema_medhelm.yaml

bd707fb

Update schema_medhelm.yaml

d51762d

MiguelAFH approved these changes May 6, 2026

pratacosmin merged commit 77e37fd into main May 7, 2026
6 checks passed

pratacosmin deleted the PAL-1217 branch May 7, 2026 06:12

pratacosmin merged 8 commits into main from PAL-1217

5 participants
