Improvements for PR #100 - ignore_metadata for assert_approx_df_equality #184
alexott wants to merge 2 commits
…df_equality
MrPowers#100 started to add support for `ignore_metadata` in `assert_approx_df_equality`, but most of the work was already merged in MrPowers#182. This PR fixes the missing piece in the `assert_approx_df_equality` implementation.
Pull request overview
This PR completes support for the ignore_metadata option in assert_approx_df_equality, aligning approximate DataFrame comparisons with existing schema-comparison capabilities (notably the schema comparer updates introduced in earlier work).
Changes:
- Add an `ignore_metadata` flag to `assert_approx_df_equality`.
- Pass `ignore_metadata` through to `assert_schema_equality` during approx comparisons.
- Add a unit test verifying metadata differences can be ignored for approx DataFrame equality.
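The idea of threading the flag from the approximate comparison into the schema check can be sketched in plain Python. This is a simplified model, not chispa's actual code: `Field`, `schemas_equal`, and `assert_approx_equal` are hypothetical names introduced only for illustration, and rows are assumed to hold numeric values.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    """Hypothetical stand-in for a schema field (not a Spark StructField)."""
    name: str
    dtype: str
    nullable: bool = True
    metadata: dict = field(default_factory=dict)


def schemas_equal(s1, s2, ignore_metadata=False):
    """Compare two lists of Field objects, optionally skipping metadata."""
    if len(s1) != len(s2):
        return False
    for f1, f2 in zip(s1, s2):
        if (f1.name, f1.dtype, f1.nullable) != (f2.name, f2.dtype, f2.nullable):
            return False
        if not ignore_metadata and f1.metadata != f2.metadata:
            return False
    return True


def assert_approx_equal(schema1, rows1, schema2, rows2, precision,
                        ignore_metadata=False):
    """Check schemas first, forwarding the flag, then compare values."""
    if not schemas_equal(schema1, schema2, ignore_metadata=ignore_metadata):
        raise AssertionError("schemas differ")
    if len(rows1) != len(rows2):
        raise AssertionError("row counts differ")
    for r1, r2 in zip(rows1, rows2):
        for v1, v2 in zip(r1, r2):
            if abs(v1 - v2) > precision:
                raise AssertionError(f"{v1} and {v2} differ by more than {precision}")


# Same column, different metadata, values within the tolerance.
schema_a = [Field("price", "double", metadata={"comment": "net"})]
schema_b = [Field("price", "double", metadata={"comment": "gross"})]
rows_a = [(1.00,), (2.00,)]
rows_b = [(1.05,), (2.02,)]

# Passes only because the flag reaches the schema comparison.
assert_approx_equal(schema_a, rows_a, schema_b, rows_b, 0.1, ignore_metadata=True)
```

Without `ignore_metadata=True` the same call raises `AssertionError` at the schema step, which is the gap this kind of change closes for approximate comparisons.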
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `chispa/dataframe_comparer.py` | Wires `ignore_metadata` into `assert_approx_df_equality` by threading it into schema equality checks. |
| `tests/test_dataframe_comparer.py` | Adds a regression test ensuring approx equality can ignore schema metadata differences. |
    def assert_approx_df_equality(
        df1: DataFrame,
        df2: DataFrame,
        precision: float,
        ignore_nullable: bool = False,
        transforms: list[Callable] | None = None,  # type: ignore[type-arg]
        allow_nan_equality: bool = False,
        ignore_column_order: bool = False,
        ignore_row_order: bool = False,
        ignore_columns: list[str] | None = None,
        ignore_metadata: bool = False,
        formats: FormattingConfig | None = None,
    ) -> None:
Fixed by moving `ignore_metadata` after the existing positional tail in both DataFrame assertion functions and the `Chispa.assert_df_equality` wrapper. Added regression coverage for legacy positional `formats` calls so the old call order stays compatible.
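The compatibility concern behind this fix can be reproduced with a small, self-contained sketch. The three functions below are hypothetical stand-ins with shortened signatures, not chispa's real ones; they only show how a legacy positional call binds its arguments when a new parameter is inserted before `formats` versus appended after it.

```python
def compare_legacy(df1, df2, ignore_columns=None, formats=None):
    """Hypothetical signature before the new flag existed."""
    return {"ignore_columns": ignore_columns, "formats": formats}


def compare_inserted(df1, df2, ignore_columns=None, ignore_metadata=False,
                     formats=None):
    """New flag inserted BEFORE formats: shifts the positional slot."""
    return {"ignore_columns": ignore_columns,
            "ignore_metadata": ignore_metadata,
            "formats": formats}


def compare_appended(df1, df2, ignore_columns=None, formats=None,
                     ignore_metadata=False):
    """New flag appended AFTER the existing positional tail."""
    return {"ignore_columns": ignore_columns,
            "ignore_metadata": ignore_metadata,
            "formats": formats}


# A caller written against the legacy signature, passing formats positionally.
legacy_args = ("left_df", "right_df", ["id"], "custom-formats")

bound_legacy = compare_legacy(*legacy_args)
bound_inserted = compare_inserted(*legacy_args)
bound_appended = compare_appended(*legacy_args)

# With the inserted variant, the formats value silently lands in ignore_metadata.
print(bound_inserted)
# With the appended variant, the legacy call keeps its original meaning.
print(bound_appended)
```

In the inserted variant the call still runs without error, which is what makes the breakage easy to miss. Making new options keyword-only (a bare `*` before them in the signature) is another common Python way to avoid this class of problem.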
I'm not sure about this. This change will make it compatible with https://github.com/MrPowers/chispa/blob/main/chispa/dataframe_comparer.py#L69 and https://github.com/MrPowers/chispa/blob/main/chispa/__init__.py#L43, although we still need to unify the order so that `ignore_metadata` comes before `ignore_columns`.
I would say backward compatibility is the first priority. It may be a good idea in theory to prepare something like a 1.0 release, but given the lack of maintenance I'm against it for now.
@alexott sorry, I lost track of this one. Could you please tell me what the status is? I kind of agree with Copilot about the breaking nature of this change. What do you think?
PR Checklist
- docs is updated

Description of changes
#100 started to add support for `ignore_metadata` in `assert_approx_df_equality`, but most of the work was already merged in #182. This PR fixes the missing piece in the `assert_approx_df_equality` implementation.