Fix reversed gold index in boolq:contrastset prompt #1277

Open
vineethsaivs wants to merge 1 commit into huggingface:main from vineethsaivs:fix/boolq-contrastset-gold-index

@vineethsaivs

What

boolq_contrastset_prompt (the prompt function for the boolq:contrastset task) builds its choices as ["Yes", "No"] but derives the gold index from a reversed ["No", "Yes"] lookup:

choices=["Yes", "No"],
gold_index=["No", "Yes"].index(line["answer"]),

So the gold index points at the opposite choice: an answer of "Yes" resolves to gold_index=1 (choices[1] == "No"), and "No" resolves to gold_index=0 (choices[0] == "Yes"). Every contrast-set sample is graded against the wrong answer.
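The mismatch can be reproduced without any library code. The snippet below is a minimal sketch using plain lists; `buggy_gold_index` and `fixed_gold_index` are hypothetical helper names introduced only for illustration and do not exist in the repository.

```python
# Choices exactly as the prompt function builds them.
choices = ["Yes", "No"]


def buggy_gold_index(answer: str) -> int:
    # Hypothetical helper: lookup order reversed relative to `choices`.
    return ["No", "Yes"].index(answer)


def fixed_gold_index(answer: str) -> int:
    # Hypothetical helper: lookup order matches `choices`.
    return ["Yes", "No"].index(answer)


for answer in ("Yes", "No"):
    graded_against = choices[buggy_gold_index(answer)]
    expected = choices[fixed_gold_index(answer)]
    print(f"answer={answer!r} buggy gold={graded_against!r} fixed gold={expected!r}")
```

With the reversed lookup, each answer is graded against the other choice; with the matching lookup, `choices[gold_index]` is the answer itself.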

Root cause

The ["No", "Yes"] lookup table is reversed relative to the function's own choices. The sibling boolq_prompt and both record_to_sample helpers in the same file consistently use the ["Yes", "No"] order that matches their choices, so this is an isolated typo.

Fix

Use the same ["Yes", "No"] lookup so the gold index lines up with choices:

gold_index=["Yes", "No"].index(line["answer"]),

Test

Added tests/unit/tasks/test_boolq.py asserting doc.choices[doc.gold_index] == answer for both "Yes" and "No". It fails before the change (gold resolves to the opposite choice) and passes after.
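The shape of that check can be sketched roughly as follows. This is not the contents of tests/unit/tasks/test_boolq.py: `Doc` here is a hypothetical stand-in dataclass, `contrastset_prompt_sketch` is a hypothetical function that only mirrors the fixed lookup, and the `passage` and `question` field names are assumptions. The real test would presumably import boolq_contrastset_prompt instead.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    # Hypothetical stand-in holding only the fields this check needs.
    query: str
    choices: list
    gold_index: int


def contrastset_prompt_sketch(line: dict) -> Doc:
    # Hypothetical mirror of the fixed logic: the lookup order matches `choices`.
    choices = ["Yes", "No"]
    return Doc(
        query=f"{line['passage']}\nQuestion: {line['question']}\nAnswer:",
        choices=choices,
        gold_index=["Yes", "No"].index(line["answer"]),
    )


def test_gold_index_matches_answer():
    for answer in ("Yes", "No"):
        # Field names other than "answer" are assumed for illustration.
        line = {"passage": "A passage.", "question": "A question?", "answer": answer}
        doc = contrastset_prompt_sketch(line)
        # The invariant from the PR: the gold index must select the answer.
        assert doc.choices[doc.gold_index] == answer


test_gold_index_matches_answer()
```

Asserting the invariant `doc.choices[doc.gold_index] == answer`, rather than a hard-coded index, keeps the test valid even if the order of the choices changes later.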

boolq_contrastset_prompt built its choices as ["Yes", "No"] but derived
the gold index from a reversed ["No", "Yes"] lookup, so every contrast-set
sample was graded against the opposite answer (answer "Yes" pointed at the
"No" choice and vice versa). The sibling boolq_prompt and both
record_to_sample helpers already use the ["Yes", "No"] order that matches
the choices, so this was an isolated typo.

Use the same ["Yes", "No"] lookup so the gold index points at the answer.