refactor: Move IO specifics out of Schema, Collection #109

Merged
Andreas Albert (AndreasAlbertQC) merged 27 commits into main from 2025-08-06_refactor-io
Aug 29, 2025

Conversation

@AndreasAlbertQC Andreas Albert (AndreasAlbertQC) commented Aug 6, 2025

Motivation

The current parquet-based storage implementation is tightly coupled to the Schema and Collection classes, which makes it hard to reuse logic that would also be useful for other storage backends. To prepare for such backends, this PR refactors the current logic to introduce a clearer interface for storage backends.

Changes

  • Introduced the notion of a StorageBackend, i.e. a piece of code encapsulating a way of storing data and metadata.
  • Refactored the parquet storage logic to use the StorageBackend interface.
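
To illustrate the idea of a storage backend that encapsulates how data and metadata are stored, here is a minimal sketch. It is not the code from this PR: only the name StorageBackend comes from the description above. The `write`/`read` methods, `JsonFileBackend`, and `save_with_schema` are hypothetical names chosen for illustration, and a JSON file stands in for parquet so the example runs with the standard library alone.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class StorageBackend(ABC):
    """Hypothetical interface: persists rows together with string metadata."""

    @abstractmethod
    def write(
        self, key: str, rows: list[dict[str, Any]], metadata: dict[str, str]
    ) -> None: ...

    @abstractmethod
    def read(self, key: str) -> tuple[list[dict[str, Any]], dict[str, str]]: ...


class JsonFileBackend(StorageBackend):
    """Stand-in for a real backend such as parquet: one JSON file per key."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def write(self, key, rows, metadata):
        self.root.mkdir(parents=True, exist_ok=True)
        payload = {"metadata": metadata, "rows": rows}
        (self.root / f"{key}.json").write_text(json.dumps(payload))

    def read(self, key):
        payload = json.loads((self.root / f"{key}.json").read_text())
        return payload["rows"], payload["metadata"]


def save_with_schema(
    backend: StorageBackend, key: str, rows: list[dict[str, Any]], schema_name: str
) -> None:
    """Caller-side logic only attaches metadata; it never touches file formats."""
    backend.write(key, rows, {"schema": schema_name})
```

With this split, supporting another storage format would mean adding another `StorageBackend` subclass, while callers such as `save_with_schema` stay unchanged.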

codecov Bot commented Aug 6, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (d8916e8) to head (f68fbfc).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##              main      #109    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           42        45     +3     
  Lines         2450      2573   +123     
==========================================
+ Hits          2450      2573   +123     

Comment thread dataframely/schema.py Outdated
@AndreasAlbertQC
Collaborator Author

ping Oliver Borchert (@borchero)

Member

Just some cosmetic comments, thanks! 🚀

Comment thread dataframely/_serialization.py Outdated
Comment thread dataframely/_serialization.py Outdated
Comment thread dataframely/schema.py
Comment thread dataframely/_serialization.py Outdated
Comment thread dataframely/_serialization.py Outdated
@AndreasAlbertQC
Collaborator Author

Thanks Oliver Borchert (@borchero)! While refactoring into multiple files, I realized I had completely missed that FailureInfo also needs to be serialized. I implemented this analogously now.

Member

Let's go :D

Comment thread dataframely/_storage/base.py
@AndreasAlbertQC Andreas Albert (AndreasAlbertQC) merged commit e22274b into main Aug 29, 2025
20 checks passed
@AndreasAlbertQC Andreas Albert (AndreasAlbertQC) deleted the 2025-08-06_refactor-io branch August 29, 2025 08:32
2 participants