
Fix CorpusLevelF1Score(None) to report positive-class F1, not max-per-class #1275

Open
iamsharduld wants to merge 1 commit into huggingface:main from iamsharduld:fix/corpus-f1-binary

@iamsharduld


What

CorpusLevelF1Score.compute_corpus (the num_classes == 2 path) returns
np.max(f1_score(golds, preds, average=self.average)). When average=None (which is how
Metrics.loglikelihood_f1 instantiates it: CorpusLevelF1Score(None), used by glue:mrpc and
glue:qqp), f1_score returns the per-class F1 array, so np.max reports the best class's
F1 rather than the positive-class binary F1. The reported value is never lower than the
positive-class F1 and is inflated whenever the negative class scores higher.

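golds = [0,0,0,0,0,0,1,1,1,1]
preds = [0,0,0,0,0,1,0,0,0,1]
# per-class F1, f1_score(golds, preds, average=None): [0.714 (class 0), 0.333 (class 1)]
# items: the corpus inputs pairing golds[i] with preds[i]
CorpusLevelF1Score(None).compute_corpus(items)  # 0.714 (np.max); should be 0.333 (class 1)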
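The per-class numbers in the example can be reproduced without sklearn. The sketch below is a plain-Python stand-in (the helper `f1_for` is hypothetical, not part of lighteval or sklearn) that computes one-vs-rest F1 for each label, which is what `f1_score(..., average=None)` returns when both labels occur, and then compares the old `np.max` reduction with the positive-class pick.

```python
def f1_for(label, golds, preds):
    """Hypothetical helper: one-vs-rest F1 = 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(golds, preds))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    denom = 2 * tp + fp + fn
    # Undefined F1 becomes 0.0, as sklearn does by default (with a warning).
    return 2 * tp / denom if denom else 0.0

golds = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
preds = [0, 0, 0, 0, 0, 1, 0, 0, 0, 1]

per_class = [f1_for(c, golds, preds) for c in (0, 1)]  # like average=None
old_score = max(per_class)  # what np.max(...) reported
new_score = per_class[1]    # what the fix reports (fscore[1])

print([round(s, 3) for s in per_class])          # [0.714, 0.333]
print(round(old_score, 3), round(new_score, 3))  # 0.714 0.333
```

Class 0 gets TP=5, FP=3, FN=1 (F1 = 10/14 = 5/7), while class 1 gets TP=1, FP=1, FN=3 (F1 = 2/6 = 1/3), so taking the max silently swaps in the negative class's score.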

Fix

When average is None, return the positive-class score, fscore[1]. Scalar averages
(micro/macro/weighted) are unchanged, since np.max of a scalar was already a no-op.
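A minimal sketch of the patched branch's behaviour, assuming the labels are exactly {0, 1} with 1 as the positive class. `binary_corpus_f1` and `f1_for` are hypothetical names; the scalar branches re-derive sklearn's macro, weighted and micro definitions in plain Python purely for illustration, whereas the real code simply calls `f1_score` and returns the scalar it gets back.

```python
from statistics import mean

def f1_for(label, golds, preds):
    """Hypothetical helper: one-vs-rest F1 = 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(golds, preds))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def binary_corpus_f1(golds, preds, average=None):
    """Hypothetical stand-in for the patched num_classes == 2 branch."""
    per_class = [f1_for(c, golds, preds) for c in (0, 1)]
    if average is None:
        return per_class[1]  # fscore[1]: class 1 is the positive class
    if average == "macro":
        return mean(per_class)  # unweighted mean over classes
    if average == "weighted":
        support = [golds.count(c) for c in (0, 1)]  # true instances per class
        return sum(f * s for f, s in zip(per_class, support)) / sum(support)
    if average == "micro":
        # Single-label data with every class counted: micro-F1 equals accuracy.
        return sum(g == p for g, p in zip(golds, preds)) / len(golds)
    raise ValueError(f"unsupported average: {average!r}")

golds = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
preds = [0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
for avg in (None, "macro", "weighted", "micro"):
    print(avg, round(binary_corpus_f1(golds, preds, avg), 3))
# None 0.333
# macro 0.524
# weighted 0.562
# micro 0.6
```

Only the `average=None` branch changes behaviour; each scalar average is returned as-is, exactly what wrapping it in `np.max` produced before.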

Tests

tests/test_unit_base_metrics.py::test_corpus_level_f1_binary_positive_class asserts the
positive-class F1 (1/3) on a fixture where the max-per-class value would be 5/7 (~0.714).
The test fails before the fix and passes after it.
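For illustration, a test in the same spirit, written against a hypothetical stand-in rather than lighteval (this is not the PR's actual test body). Besides pinning the expected value, it checks that the fixture actually discriminates: class 0 must out-score class 1, otherwise `np.max` and `fscore[1]` would agree and the test could not fail before the fix.

```python
import math

def f1_for(label, golds, preds):
    """Hypothetical helper: one-vs-rest F1 = 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(golds, preds))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    return 2 * tp / (2 * tp + fp + fn)

def test_binary_positive_class_sketch():
    golds = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
    preds = [0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
    negative, positive = f1_for(0, golds, preds), f1_for(1, golds, preds)
    # Fixture sanity: the negative class must win, or old and new code agree.
    assert negative > positive
    # Post-fix expectation: positive-class F1, not the max over classes.
    assert math.isclose(positive, 1 / 3)
    assert not math.isclose(max(negative, positive), positive)

test_binary_positive_class_sketch()
```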

…-class

The num_classes==2 branch returned np.max(f1_score(..., average=None)). With
average=None, f1_score returns the per-class F1 array, so np.max reports the
BEST class's F1 instead of the positive-class binary F1 -- inflating the score.
loglikelihood_f1 (glue:mrpc, glue:qqp) uses CorpusLevelF1Score(None), so those
scores were overstated. Return the positive class (fscore[1]); scalar averages
(micro/macro/weighted) pass through unchanged.