Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
44 changes: 36 additions & 8 deletions metaflow/client/filecache.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,12 @@
from tempfile import NamedTemporaryFile
from hashlib import sha1

from urllib.parse import urlparse
try:
# Python 3
from urllib.parse import urlparse
except ImportError:
# Python 2
from urlparse import urlparse
Comment on lines +10 to +15

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Python 2 dead code — urlparse fallback unreachable

Metaflow requires Python 3.6+, so the except ImportError branch importing from urlparse import urlparse can never execute. The original single-line import was correct. This try/except adds noise without benefit.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duplicate tmpfile.close() is required. If an i/p exception occurs during write(), flush(), or fsync(), the file is never closed in the try block.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your comment seems to be addressing a different change than what my comment was about — I flagged the urlparse try/except as dead code (since Metaflow requires Python 3.6+), while you're discussing the tmpfile.close() in the except block of create_file.

On the tmpfile.close() point: you're correct. If write(), flush(), or fsync() raises, the tmpfile.close() inside the try block is never reached, so the handle would leak without the one in except. That's a valid and necessary pattern.

My original comment stands separately — the urlparse try/except wrapping from urllib.parse import urlparse is still unreachable dead code given Metaflow's Python 3.6+ requirement, and the original single-line import was fine.

Tip: You can customize Greptile's behavior for this repo with .greptile/rules.md and .greptile/config.json.


from metaflow.datastore import FlowDataStore
from metaflow.datastore.content_addressed_store import BlobCache
Expand All @@ -24,6 +29,22 @@

NEW_FILE_QUARANTINE = 10

if sys.version_info[0] >= 3:
_replace = os.replace
else:
import errno

def _replace(src, dst):
try:
os.rename(src, dst)
except OSError as e:
if e.errno == errno.EEXIST or (os.name == "nt" and e.errno in (13, 17)):
os.remove(dst)
os.rename(src, dst)
else:
raise
Comment on lines +32 to +45

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Python 2 _replace fallback is dead code

The else branch (Python < 3) and the custom _replace function are unreachable given Metaflow's Python 3.6+ requirement. os.replace (added in Python 3.3) is always available, handles atomic overwrite on all platforms including Windows, and can be called directly. The sys.version_info guard and the fallback implementation just add complexity. Consider replacing all uses of _replace with os.replace directly.



if sys.version_info[0] >= 3 and sys.version_info[1] >= 2:

def od_move_to_end(od, key):
Expand Down Expand Up @@ -113,7 +134,8 @@ def get_log_legacy(

log = task_ds.load_log_legacy(logtype, attempt_override=attempt)
# Store this in the file cache as well
self.create_file(path, log)
if log is not None:
self.create_file(path, log)
return log

def get_legacy_log_size(
Expand Down Expand Up @@ -263,16 +285,22 @@ def create_file(self, path, value):
dirname = os.path.dirname(path)
try:
FileCache._makedirs(dirname)
except: # noqa E722
except Exception: # noqa E722
raise FileCacheException("Could not create directory: %s" % dirname)
tmpfile = NamedTemporaryFile(dir=dirname, prefix="dlobj", delete=False)
# Now write out the file
try:
tmpfile.write(value)
tmpfile.flush()
os.rename(tmpfile.name, path)
except: # noqa E722
os.unlink(tmpfile.name)
os.fsync(tmpfile.fileno())
tmpfile.close()

_replace(tmpfile.name, path)

except Exception: # noqa E722
tmpfile.close()
if os.path.exists(tmpfile.name):
os.unlink(tmpfile.name)
Comment on lines +300 to +303

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Redundant close() call in the exception handler

tmpfile.close() is already called in the try block (line 296) before _replace. If _replace raises, the except block calls tmpfile.close() again on an already-closed file. While this is safe (Python's file close() is idempotent), it may be clearer to restructure with finally or remove the duplicate call.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

raise
size = os.path.getsize(path)
self._total += size
Expand Down Expand Up @@ -305,8 +333,8 @@ def _index_objects(self):
sha, ext = os.path.splitext(obj)
if ext in ["cached", "blob"]:
path = os.path.join(root, obj)
objects.insert(
0, (os.path.getctime(path), os.path.getsize(path), path)
objects.append(
(os.path.getctime(path), os.path.getsize(path), path)
)

self._total = sum(size for _, size, _ in objects)
Expand Down
Loading