Fix CorpusLevelF1Score(None) to report positive-class F1, not max-per-class by iamsharduld · Pull Request #1275 · huggingface/lighteval

iamsharduld · 2026-06-25T20:05:40Z

What

CorpusLevelF1Score.compute_corpus (the num_classes == 2 path) returns
np.max(f1_score(golds, preds, average=self.average)). Metrics.loglikelihood_f1, used by
glue:mrpc and glue:qqp, instantiates the metric as CorpusLevelF1Score(None), so average is
None and f1_score returns the per-class F1 array. np.max then reports the best class's F1
rather than the positive-class binary F1, which inflates the score whenever class 0 scores
higher than class 1.

golds = [0,0,0,0,0,0,1,1,1,1]
preds = [0,0,0,0,0,1,0,0,0,1]
# per-class F1: [0.714 (class 0), 0.333 (class 1)]
# items: corpus-level inputs wrapping golds/preds above (construction omitted)
CorpusLevelF1Score(None).compute_corpus(items)  # 0.714 (np.max); should be 0.333
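The numbers above can be checked without lighteval. The following is a minimal sketch that calls scikit-learn's f1_score directly on the same golds and preds; it does not go through CorpusLevelF1Score, and the variable names are illustrative only.

```python
import numpy as np
from sklearn.metrics import f1_score

golds = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
preds = [0, 0, 0, 0, 0, 1, 0, 0, 0, 1]

# average=None yields one F1 per label, in sorted label order: [class 0, class 1]
per_class = f1_score(golds, preds, average=None)

max_f1 = float(np.max(per_class))    # what the old code path reported
positive_f1 = float(per_class[1])    # F1 of the positive class (label 1)

# average="binary" with the default pos_label=1 is the usual binary F1
binary_f1 = float(f1_score(golds, preds, average="binary"))

print(round(max_f1, 3), round(positive_f1, 3), round(binary_f1, 3))
```

Here class 0 has precision 5/8 and recall 5/6 (F1 = 5/7), while class 1 has precision 1/2 and recall 1/4 (F1 = 1/3), so taking the maximum picks the negative class.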

Fix

When average is None, return the positive-class entry, fscore[1]. Scalar averages
(micro/macro/weighted) are unchanged, since np.max of a scalar was a no-op.
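As a rough sketch of the behaviour described here (not the actual diff), the binary branch could look like the function below. The name binary_corpus_f1 is hypothetical and stands in for the num_classes == 2 path; it assumes labels 0 and 1 both occur, so the per-class array has two entries.

```python
import numpy as np
from sklearn.metrics import f1_score


def binary_corpus_f1(golds, preds, average=None):
    """Hypothetical stand-in for the num_classes == 2 branch."""
    fscore = f1_score(golds, preds, average=average)
    if average is None:
        # Per-class array [F1(class 0), F1(class 1)]: report the positive class.
        # Assumes both labels appear in the data.
        return float(fscore[1])
    # micro/macro/weighted already return a scalar; pass it through.
    return float(fscore)


golds = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
preds = [0, 0, 0, 0, 0, 1, 0, 0, 0, 1]

positive = binary_corpus_f1(golds, preds, None)
micro = binary_corpus_f1(golds, preds, "micro")

# For a scalar average, the old np.max wrapper gives the same value.
old_micro = float(np.max(f1_score(golds, preds, average="micro")))
```

With this fixture, the None case gives 1/3, and the micro case matches the old np.max result, which is consistent with the claim that scalar averages are unaffected.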

Tests

tests/test_unit_base_metrics.py::test_corpus_level_f1_binary_positive_class asserts the
positive-class F1 (1/3) on a fixture where the max-per-class value would be ~0.714. The test
fails before this change and passes after it.

…-class

The num_classes==2 branch returned np.max(f1_score(..., average=None)). With average=None, f1_score returns the per-class F1 array, so np.max reports the BEST class's F1 instead of the positive-class binary F1 -- inflating the score.

loglikelihood_f1 (glue:mrpc, glue:qqp) uses CorpusLevelF1Score(None), so those scores were overstated.

Return the positive class (fscore[1]); scalar averages (micro/macro/weighted) pass through unchanged.

iamsharduld wants to merge 1 commit into huggingface:main from iamsharduld:fix/corpus-f1-binary
