Fix reversed gold index in boolq:contrastset prompt #1277
Open
vineethsaivs wants to merge 1 commit into
boolq_contrastset_prompt built its choices as ["Yes", "No"] but derived the gold index from a reversed ["No", "Yes"] lookup, so every contrast-set sample was graded against the opposite answer (answer "Yes" pointed at the "No" choice and vice versa). The sibling boolq_prompt and both record_to_sample helpers already use the ["Yes", "No"] order that matches the choices, so this was an isolated typo. Use the same ["Yes", "No"] lookup so the gold index points at the answer.
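The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not the code from the patched file: the helper names `gold_index_reversed` and `gold_index_fixed` are hypothetical, and using `list.index` as the lookup mechanism is an assumption.

```python
# Choice order used by the prompt function, per the description above.
CHOICES = ["Yes", "No"]


def gold_index_reversed(answer: str) -> int:
    """Hypothetical buggy lookup: the list order is the reverse of CHOICES."""
    return ["No", "Yes"].index(answer)


def gold_index_fixed(answer: str) -> int:
    """Hypothetical fixed lookup: the index comes from the same order as CHOICES."""
    return CHOICES.index(answer)


for answer in CHOICES:
    buggy = gold_index_reversed(answer)
    fixed = gold_index_fixed(answer)
    # The reversed lookup selects the opposite choice; the fixed one selects the answer.
    print(f"answer={answer!r} buggy->{CHOICES[buggy]!r} fixed->{CHOICES[fixed]!r}")
```

Deriving the index from the same list that is passed as the choices, as in `gold_index_fixed`, keeps the two from drifting apart.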
What
`boolq_contrastset_prompt` (the prompt function for the `boolq:contrastset` task) builds its choices as `["Yes", "No"]` but derives the gold index from a reversed `["No", "Yes"]` lookup.

So the gold index points at the opposite choice: an `answer` of `"Yes"` resolves to `gold_index=1` (`choices[1] == "No"`), and `"No"` resolves to `gold_index=0` (`choices[0] == "Yes"`). Every contrast-set sample is graded against the wrong answer.

Root cause
The `["No", "Yes"]` lookup table is reversed relative to the function's own `choices`. The sibling `boolq_prompt` and both `record_to_sample` helpers in the same file consistently use the `["Yes", "No"]` order that matches their choices, so this is an isolated typo.

Fix
Use the same `["Yes", "No"]` lookup so the gold index lines up with `choices`.

Test
Added `tests/unit/tasks/test_boolq.py` asserting `doc.choices[doc.gold_index] == answer` for both `"Yes"` and `"No"`. It fails before the change (gold resolves to the opposite choice) and passes after.
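The invariant the test checks can be sketched as follows. This is an illustration under stated assumptions, not the contents of the added test file: the `Doc` dataclass, the `contrastset_prompt` function, and the `passage`/`question`/`answer` field names are hypothetical stand-ins for the real objects.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for the document object returned by the prompt function."""
    query: str
    choices: list[str]
    gold_index: int


def contrastset_prompt(line: dict) -> Doc:
    """Hypothetical stand-in for the fixed prompt function (field names are assumed)."""
    choices = ["Yes", "No"]
    return Doc(
        query=f"{line['passage']}\nQuestion: {line['question']}\nAnswer:",
        choices=choices,
        # Index is taken from the same list that is passed as choices.
        gold_index=choices.index(line["answer"]),
    )


def check_gold_matches_answer() -> None:
    """Assert that the gold index selects the sample's own answer for both labels."""
    for answer in ("Yes", "No"):
        doc = contrastset_prompt({"passage": "p", "question": "q", "answer": answer})
        assert doc.choices[doc.gold_index] == answer


check_gold_matches_answer()
```

Asserting on `doc.choices[doc.gold_index]` rather than on a literal index keeps the check valid even if the choice order changes later.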