Fix reversed gold index in boolq:contrastset prompt by vineethsaivs · Pull Request #1277 · huggingface/lighteval

vineethsaivs · 2026-06-29T19:56:07Z

What

boolq_contrastset_prompt (the prompt function for the boolq:contrastset task) builds its choices as ["Yes", "No"] but derives the gold index from a reversed ["No", "Yes"] lookup:

choices=["Yes", "No"],
gold_index=["No", "Yes"].index(line["answer"]),

So the gold index points at the opposite choice: an answer of "Yes" resolves to gold_index=1 (choices[1] == "No"), and "No" resolves to gold_index=0 (choices[0] == "Yes"). Every contrast-set sample is graded against the wrong answer.
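
The mismatch can be reproduced outside lighteval with the two lists quoted above. This is a minimal sketch, not the library code: `buggy_gold_index` is a hypothetical helper name that only mirrors the reversed lookup.

```python
# Choices exactly as the prompt function builds them.
choices = ["Yes", "No"]


def buggy_gold_index(answer: str) -> int:
    """Mirror the reversed lookup quoted in the PR description (hypothetical helper)."""
    return ["No", "Yes"].index(answer)


for answer in ("Yes", "No"):
    idx = buggy_gold_index(answer)
    # The choice selected by the gold index is the opposite of the answer.
    print(f"answer={answer!r} gold_index={idx} choices[gold_index]={choices[idx]!r}")
```

For both answers the selected choice differs from the dataset answer, which is why every sample is scored against the wrong label.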

Root cause

The ["No", "Yes"] lookup table is reversed relative to the function's own choices. The sibling boolq_prompt and both record_to_sample helpers in the same file consistently use the ["Yes", "No"] order that matches their choices, so this is an isolated typo.

Fix

Use the same ["Yes", "No"] lookup so the gold index lines up with choices:

gold_index=["Yes", "No"].index(line["answer"]),

Test

Added tests/unit/tasks/test_boolq.py asserting doc.choices[doc.gold_index] == answer for both "Yes" and "No". It fails before the change (gold resolves to the opposite choice) and passes after.
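
The invariant the test checks can be sketched without importing lighteval. Everything below is an assumption made for illustration: `Doc` is a stand-in dataclass, `contrastset_prompt_sketch` is a hypothetical reduction of the prompt function to the two fields that matter, and the `passage` and `question` keys are placeholders rather than the dataset's real schema. The actual test file in the PR may be organized differently.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for the document object the prompt function returns."""
    query: str
    choices: list[str]
    gold_index: int


def contrastset_prompt_sketch(line: dict) -> Doc:
    """Simplified stand-in for the fixed prompt function (field names are assumed)."""
    return Doc(
        query=f"{line['passage']}\nQuestion: {line['question']}\nAnswer:",
        choices=["Yes", "No"],
        # Same order as choices, so the index lines up with the answer.
        gold_index=["Yes", "No"].index(line["answer"]),
    )


def check_gold_matches_answer() -> None:
    """Assert the invariant described above for both possible answers."""
    for answer in ("Yes", "No"):
        line = {"passage": "p", "question": "q", "answer": answer}
        doc = contrastset_prompt_sketch(line)
        assert doc.choices[doc.gold_index] == answer


check_gold_matches_answer()
```

Swapping the lookup in the sketch back to `["No", "Yes"]` makes the assertion fail for both answers, matching the before/after behavior described for the real test.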

boolq_contrastset_prompt built its choices as ["Yes", "No"] but derived the gold index from a reversed ["No", "Yes"] lookup, so every contrast-set sample was graded against the opposite answer (answer "Yes" pointed at the "No" choice and vice versa). The sibling boolq_prompt and both record_to_sample helpers already use the ["Yes", "No"] order that matches the choices, so this was an isolated typo. Use the same ["Yes", "No"] lookup so the gold index points at the answer.

vineethsaivs wants to merge 1 commit into huggingface:main from vineethsaivs:fix/boolq-contrastset-gold-index
