mywhoosh: add file hash to .uploaded_files.json#77
Conversation
jat255
left a comment
There was a problem hiding this comment.
Thanks for the contribution! In general, I like this idea, as it adds support for "upload all" for MyWhoosh, which wasn't added when I added support for MyWhoosh (since I always use the app in "monitor" mode).
There are a few problems that need to be addressed, though. There are more details in the in-line comments, but generally:
Backward compatibility (needs fixing before merge)
Existing .uploaded_files.json files contain plain strings. After the upgrade, the deduplication comparison:
files = [i for i in files if i not in uploaded_files]...compares dicts against strings, which never match. Every previously-uploaded file will be re-uploaded on the first run after upgrading. A migration step is needed after loading uploaded_files in both upload_all and on_modified:
# Migrate legacy string entries (pre-hash format)
uploaded_files = [
e if isinstance(e, dict) else {"name": e, "hash": None}
for e in uploaded_files
]And a name-only match for migrated entries where hash is None.
Minor: unclosed file handles
Both the on_modified handler and upload_all open files without a context manager:
open(dir.joinpath(i), "rb").read()Since this project uses pathlib throughout, the idiomatic fix is:
with source_file.open("rb") as fh: # in on_modified
file_hash = sha256(fh.read()).hexdigest()
with dir.joinpath(i).open("rb") as fh: # in upload_all
file_hash = sha256(fh.read()).hexdigest()Minor: misplaced log in upload_all
The debug log Adding "{f["name"]}" with hash... sits inside if output: but uploaded_files.append(f) runs unconditionally outside that block. The log should move to just before the append so it fires regardless of which branch executed:
_logger.debug(
f'Adding "{f["name"]}" with hash "{f["hash"]}" to "uploaded_files"'
)
uploaded_files.append(f)No tests
Given the project requires 100% test coverage, please add tests for:
- A file with the same name but different hash gets uploaded (the core bug fix)
- Backward compatibility: a legacy string-format
.uploaded_files.jsondoes not cause re-uploads after upgrade
Also, I have some concerns that these changes were not tested at all, as running the ./run_tests.py script revealed the following failures:
FAILED tests/test_app.py::TestUploadAllFunction::test_upload_all_preinitialize - AssertionError: assert 'test1.fit' in [{'hash': '2c8c8e333842924d127d1c1cc267f51b0018659ff9307b08591b01bfdf34ff19', 'name': 'test1.fit'}, {'hash': '2c8c8e333842924d127d1c1cc267f51b0018659ff9307b08591b01b...
FAILED tests/test_app.py::TestNewFileEventHandler::test_mywhoosh_version_update_no_duplicate - AttributeError: 'builtin_function_or_method' object has no attribute 'joinpath'
FAILED tests/test_app.py::TestUploadAllFunction::test_upload_all_new_files - AssertionError: assert 'test_activity.fit' in [{'hash': '2c8c8e333842924d127d1c1cc267f51b0018659ff9307b08591b01bfdf34ff19', 'name': 'test_activity.fit'}]
FAILED tests/test_app.py::TestNewFileEventHandler::test_on_modified_processes_mywhoosh_file - AttributeError: 'builtin_function_or_method' object has no attribute 'joinpath'
FAILED tests/test_app.py::TestNewFileEventHandler::test_on_modified_adds_to_uploaded_files - AttributeError: 'builtin_function_or_method' object has no attribute 'joinpath'
FAILED tests/test_app.py::TestNewFileEventHandler::test_on_modified_tracking_idempotent - AttributeError: 'builtin_function_or_method' object has no attribute 'joinpath'
FAILED tests/test_app.py::TestUploadAllFunction::test_upload_all_skips_already_uploaded - AssertionError: Expected 'edit_fit' to not have been called. Called 1 times.
FAILED tests/test_integration_profiles.py::TestMultiProfileWorkflows::test_batch_upload_with_profile - AssertionError: assert 'activity1.fit' in [{'hash': '2c8c8e333842924d127d1c1cc267f51b0018659ff9307b08591b01bfdf34ff19', 'name': 'activity1.fit'}, {'hash': '2c8c8e333842924d127d1c1cc267f51b0018659ff9307b0...
Please add the tests for the new behavior and ensure the existing test suite passes locally.
| file = { | ||
| "name": source_file.name, | ||
| "hash": sha256( | ||
| open(dir.joinpath(source_file.name), "rb").read() |
There was a problem hiding this comment.
This should use Pathlib.Path and a context manager to ensure the file is properly closed. Also, I'm assuming this was copied from the change in upload_all() without testing? dir doesn't exist in this scope, and will be the built-in dir() function, so this will raise a NameError.
| files = [i for i in files if not i.endswith("_modified.fit")] | ||
| # append file hash for each file | ||
| files = [ | ||
| {"name": i, "hash": sha256(open(dir.joinpath(i), "rb").read()).hexdigest()} |
There was a problem hiding this comment.
This should use pathlib.Path and a context manager to ensure the file is properly closed. Something like:
with open(dir.joinpath(source_file.name), "rb") as fh:
file_hash = sha256(fh.read()).hexdigest()
| original_path=Path(f["name"]), | ||
| dryrun=dryrun, | ||
| ) | ||
| _logger.debug( |
There was a problem hiding this comment.
This was a pre-existing issue, but should be fixed here. The message saysAdding "{f["name"]}" with hash "{f["hash"]}" to "uploaded_files", but this log appears inside if output: (i.e. only when upload succeeds). However, uploaded_files.append(f) at line 413 runs unconditionally (outside that if block), which is consistent with existing behavior. The debug log placement is misleading but not broken. The log should go at line 433 right before uploaded_files.append(f)
| for i in files | ||
| ] | ||
| # remove files found in the "already uploaded" list | ||
| files = [i for i in files if i not in uploaded_files] |
There was a problem hiding this comment.
Backward compatibility break
When a user upgrades, their existing .uploaded_files.json will contain plain strings from the current code:
["MyNewActivity-5.6.1.fit", "MyNewActivity-5.7.0.fit"]The new code in this PR converts files to a list of dicts, then runs:
files = [i for i in files if i not in uploaded_files]A dict {"name": "MyNewActivity-5.6.1.fit", "hash": "..."} will never equal the string "MyNewActivity-5.6.1.fit", so all previously-uploaded files will be re-uploaded on the first run after upgrading.
A migration step is needed after loading uploaded_files:
# Migrate legacy string entries (pre-hash format)
uploaded_files = [
e if isinstance(e, dict) else {"name": e, "hash": None}
for e in uploaded_files
]Then adjust deduplication to match on name-only for legacy entries where hash is None:
def _already_uploaded(f: dict, uploaded: list) -> bool:
for entry in uploaded:
if entry["name"] == f["name"]:
if entry["hash"] is None or entry["hash"] == f["hash"]:
return True
return False
files = [f for f in files if not _already_uploaded(f, uploaded_files)]The same migration is needed in on_modified, which loads the same file independently.
Problem
When uploading activities with
fit-file-faker -uaonly the first MyWhoosh activity is uploaded. Subsequent activity uploads are removed from the list of FIT files here https://github.com/jat255/Fit-File-Faker/blob/main/fit_file_faker/app.py#L387-L388 and therefore skipped.This is because MyWhoosh does not store FIT files with a unique name but one per app version which overwrites previous FIT files:
Fix
Instead of pushing the FIT filename to
FILES_UPLOADED_NAMEonly, this PR adds objects per uploaded FIT file including a hash of the FIT file content.