Add input data parameter to the encode_dataset.py by ltsabadz · Pull Request #92 · INRIA/geoarches

ltsabadz · 2025-09-24T15:18:46Z

Hi! I needed to follow the docs a couple of times and noticed that the input data path is hardcoded as data/era5_240/full/. It would be helpful to have it as an argument.
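For illustration, here is a minimal sketch of what such an argument could look like with argparse. The flag name `--input-path` and the helper names are assumptions made for this example and are not taken from the actual diff. The default keeps the currently hardcoded path, so existing invocations would behave as before.

```python
import argparse
from pathlib import Path

# Path that is currently hardcoded in the script (as quoted in this PR).
DEFAULT_INPUT_PATH = Path("data/era5_240/full/")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the input data path (hypothetical flag name)."""
    parser = argparse.ArgumentParser(description="Encode a dataset.")
    parser.add_argument(
        "--input-path",  # assumption: the real flag may be named differently
        type=Path,
        default=DEFAULT_INPUT_PATH,
        help="Directory containing the input data.",
    )
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```

With this sketch, `parse_args(["--input-path", "/some/other/dir"]).input_path` yields the user-supplied directory, while omitting the flag falls back to the old default.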

robert-DL · 2025-09-24T15:51:01Z

Hi, thanks for the commit. In general, encode_dataset.py is far from perfect: it also does not allow parallel generation, and it cannot encode longer trajectories.

ltsabadz · 2025-09-25T08:40:28Z

I also noticed that when you run the script on a subset of the data (e.g. 2007–2018), it still starts by saving files named as 1979 rather than the first year in the subset.

Another issue: if an older file like era5_240_pred_1979_0h.nc is already present in the output directory, and you try to run the script only for later years, the entire dataset gets skipped because of the following block:

current_year = 1979
xr_list = []
for i, batch in tqdm(enumerate(dl)):
    fname = Path(args.output_path).joinpath(f"era5_240_pred_{current_year}_0h.nc")
    if fname.exists():
        continue

But that's another topic.
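As a rough sketch of one possible direction (not the fix adopted in the repository), the existence check could be made per year, so that a leftover 1979 file does not cause later years to be skipped. The helper names below are hypothetical. Only the file name pattern is taken from the block above.

```python
from pathlib import Path


def output_file_for_year(output_path, year: int) -> Path:
    """File name pattern from the script: era5_240_pred_{year}_0h.nc."""
    return Path(output_path).joinpath(f"era5_240_pred_{year}_0h.nc")


def years_to_encode(output_path, years) -> list:
    """Return only the requested years that have no output file yet.

    Hypothetical helper: checks each requested year individually instead
    of starting from a hardcoded 1979.
    """
    return [
        year
        for year in years
        if not output_file_for_year(output_path, year).exists()
    ]
```

For example, with only era5_240_pred_1979_0h.nc present in the output directory, `years_to_encode(output_dir, range(2007, 2019))` would still return every year from 2007 to 2018.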

robert-DL · 2025-09-25T15:18:32Z

The issue you mentioned here is actually already part of another issue.
Could you rebase to dev?

ltsabadz · 2025-10-01T12:24:19Z

> The issue you mentioned here is actually already part of another issue. Could you rebase to dev?

Rebased and changed the base branch to dev.

ltsabadz requested a review from robert-DL as a code owner September 24, 2025 15:18

ltsabadz added 2 commits October 1, 2025 12:17

add input data parameter to the script

cbbc798

fix linting

0af9556

ltsabadz force-pushed the feature/add_input_encode_dataset branch from 01d4abd to 0af9556 Compare October 1, 2025 12:18

ltsabadz requested a review from gcouairon as a code owner October 1, 2025 12:18

ltsabadz changed the base branch from main to dev October 1, 2025 12:23

ltsabadz wants to merge 2 commits into INRIA:dev from ltsabadz:feature/add_input_encode_dataset
