Skip to content

Fix local storage caching issue#3219

Open
ankush0545 wants to merge 1 commit into
Netflix:masterfrom
ankush0545:fix-localstorage-cache
Open

Fix local storage caching issue#3219
ankush0545 wants to merge 1 commit into
Netflix:masterfrom
ankush0545:fix-localstorage-cache

Conversation

@ankush0545

Copy link
Copy Markdown

Summary

Fixes an issue in LocalStorage where existing files could remain stale due to caching behavior when writing data to the local datastore.

Changes

  • Updated LocalStorage save logic to handle file writes more consistently.
  • Improved behavior when files already exist.
  • Ensured metadata and file contents remain synchronized after updates.

Testing

  • Reproduced the issue locally.
  • Verified that updated files are correctly reflected after writing.
  • Ran existing tests and confirmed they pass.

Impact

This change improves reliability when using LocalStorage as a datastore backend and helps prevent stale cached data from being served.

@greptile-apps

greptile-apps Bot commented May 30, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR modifies a single guard condition in LocalStorage.save_bytes with the intent of fixing stale cached data, but the change inverts the overwrite logic rather than correcting it.

  • Inverted skip condition: removing not from if not overwrite and os.path.exists(full_path) causes overwrite=True calls to silently skip every existing file (never updating them) and overwrite=False calls to always overwrite existing files (ignoring the flag entirely).
  • Correct original behavior: the original condition correctly skips writes only when the caller explicitly opts out of overwriting (overwrite=False) and the file already exists — reverting this change is the right course of action.

Confidence Score: 1/5

Not safe to merge — the change inverts the overwrite guard and will cause silent data-loss or incorrect skipping on every call to save_bytes where a file already exists.

The single changed line removes the not from the overwrite guard, which means any caller using overwrite=True (e.g. updating an artifact) will silently no-op on existing files, while any caller relying on the default overwrite=False to protect existing data will have those files silently replaced. Both failure modes are silent and affect core datastore write semantics.

metaflow/plugins/datastores/local_storage.py — the only changed file; the one-character edit to the overwrite guard inverts its behavior completely.

Important Files Changed

Filename Overview
metaflow/plugins/datastores/local_storage.py Removes not from the overwrite guard in save_bytes, inverting the skip logic so that existing files are never overwritten when overwrite=True and are always overwritten when overwrite=False.

Reviews (1): Last reviewed commit: "Fix local storage caching issue" | Re-trigger Greptile

Comment on lines +138 to 139
if overwrite and os.path.exists(full_path):
continue

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0 The guard condition has been inverted, breaking the semantics of the overwrite parameter entirely. With this change, when overwrite=True (caller explicitly wants to replace an existing file) the function now skips the write and does nothing. Conversely, when overwrite=False (the default, meaning "don't replace existing files") the continue is never reached and every existing file gets silently overwritten. The base-class docstring in datastore_storage.py is clear: overwrite=False must leave existing objects untouched. Removing the not reverses both cases simultaneously.

Suggested change
if overwrite and os.path.exists(full_path):
continue
if not overwrite and os.path.exists(full_path):
continue

@talsperre

Copy link
Copy Markdown
Collaborator

Can you show how your reprodcuded the bug and how you fixed this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants