Skip to content

[ML] Randomize train/test cluster boundary assignment in RDataLoader#22196

Merged
siliataider merged 2 commits into
root-project:masterfrom
siliataider:rdataloader
May 18, 2026
Merged

[ML] Randomize train/test cluster boundary assignment in RDataLoader#22196
siliataider merged 2 commits into
root-project:masterfrom
siliataider:rdataloader

Conversation

@siliataider
Copy link
Copy Markdown
Contributor

@siliataider siliataider commented May 8, 2026

This Pull request:

Previously RDataLoader always assigned the first fraction of each cluster to training and the last fraction to validation. This meant that across different runs, the train/val split was always identical regardless of the seed.

This PR fixes the issue by using the shuffle seed to randomly decide, per cluster, whether training takes the prefix or suffix of that cluster.

This PR fixes #22194

@siliataider siliataider requested a review from vepadulano as a code owner May 8, 2026 16:17
@siliataider siliataider self-assigned this May 8, 2026
@siliataider siliataider added the in:ML Everything under ROOT/ML label May 8, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 8, 2026

Test Results

    22 files      22 suites   3d 12h 6m 31s ⏱️
 3 855 tests  3 837 ✅ 0 💤 18 ❌
77 031 runs  77 012 ✅ 0 💤 19 ❌

For more details on these failures, see this check.

Results for commit 5d6dbf8.

♻️ This comment has been updated with latest results.

Copy link
Copy Markdown
Member

@vepadulano vepadulano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! See minor considerations from my side.

Comment thread bindings/pyroot/pythonizations/test/ml_dataloader.py Outdated
Comment thread tree/ml/inc/ROOT/ML/RClusterLoader.hxx
@siliataider siliataider force-pushed the rdataloader branch 2 times, most recently from ef1b198 to 87db451 Compare May 15, 2026 14:43
@dpiparo
Copy link
Copy Markdown
Member

dpiparo commented May 18, 2026

Is a backport to 6.40 needed?

@siliataider siliataider added this to the 6.40.00 milestone May 18, 2026
@siliataider siliataider merged commit b5ecb80 into root-project:master May 18, 2026
75 of 82 checks passed
@siliataider
Copy link
Copy Markdown
Contributor Author

/backport to 6.40

@root-project-bot
Copy link
Copy Markdown

Preparing to backport PR #22196 to branch 6.40 requested by siliataider

@root-project-bot
Copy link
Copy Markdown

This PR has been backported to branch 6.40: #22324

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

in:ML Everything under ROOT/ML

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RDataLoader results in same train/test split across runs

4 participants