
Preserve within-file duplicates and simplify matching logic #34

Merged
danischm merged 7 commits into main from ds-merge on Mar 10, 2026
@danischm danischm commented Nov 12, 2025

Summary

Improves YAML list merging behavior to preserve within-file duplicates while simplifying the
matching logic. This makes the behavior more predictable and intuitive.

Changes

1. Preserve Within-File Duplicates (Order-Independent)

Previously, deduplicate=True would remove ALL duplicate list items globally, including
duplicates within a single file.

New behavior: If ANY file contains duplicates in a list, that entire list is concatenated
(no merging) to preserve all duplicates. This ensures:

  • ✅ All within-file duplicates preserved (from any file)
  • ✅ Order-independent results (same output regardless of file load order)
  • ✅ Predictable behavior ("duplicates disable merging")

Example:

# file1.yaml
devices:
  - name: switch1
  - name: switch1  # duplicate

# file2.yaml
devices:
  - name: switch1
    ip: 192.168.1.1

Before: 1 device (file1 duplicates lost)
After: 3 devices (all preserved, no merging)
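The "duplicates disable merging" rule can be illustrated with a small sketch. This is not the code from this PR: `has_duplicates`, `combine_lists`, and `same_name` are hypothetical names, the matcher is passed in as a parameter to keep the example short, and items are assumed to be flat dicts.

```python
from typing import Any, Callable

Item = dict[str, Any]
Matcher = Callable[[Item, Item], bool]


def has_duplicates(items: list[Item], matches: Matcher) -> bool:
    """Return True if any two items in the same list match each other."""
    return any(
        matches(items[i], items[j])
        for i in range(len(items))
        for j in range(i + 1, len(items))
    )


def combine_lists(dst: list[Item], src: list[Item], matches: Matcher) -> list[Item]:
    """Concatenate if either side has duplicates, otherwise merge matching items."""
    if has_duplicates(dst, matches) or has_duplicates(src, matches):
        # Duplicates on either side disable merging for the whole list.
        return [dict(i) for i in dst] + [dict(i) for i in src]
    result = [dict(i) for i in dst]
    for item in src:
        for existing in result:
            if matches(existing, item):
                existing.update(item)  # shallow merge, enough for illustration
                break
        else:
            result.append(dict(item))
    return result


def same_name(a: Item, b: Item) -> bool:
    """Hypothetical matcher: items match when both carry the same name."""
    return a.get("name") is not None and a.get("name") == b.get("name")


file1 = [{"name": "switch1"}, {"name": "switch1"}]
file2 = [{"name": "switch1", "ip": "192.168.1.1"}]
print(len(combine_lists(file1, file2, same_name)))  # 3
print(len(combine_lists(file2, file1, same_name)))  # 3
```

Because the duplicate check runs on both lists before any merging, the item count is the same whichever file is loaded first.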

2. Relaxed Matching Logic

Previously, dict items would NOT merge if both sides had unique primitive keys not present in
the other. This prevented useful scenarios like combining complementary configuration data.

New behavior: Items merge when all shared primitive keys have matching values — unique
keys present on only one side don't prevent merging and are combined in the result. At least
one primitive key must be shared for matching to occur.

Example:

# file1.yaml
devices:
  - name: switch1
    vlan: 100

# file2.yaml
devices:
  - name: switch1
    port: eth0

Before: 2 separate items (both have unique keys)
After: 1 merged item {name: switch1, vlan: 100, port: eth0}
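A minimal sketch of the relaxed matching rule as described above. The function names are illustrative and the bodies are an assumption based on this description, not the merged implementation.

```python
from typing import Any


def extract_primitives(item: dict[str, Any]) -> dict[str, Any]:
    """Keep only key-value pairs whose value is neither a dict nor a list."""
    return {k: v for k, v in item.items() if not isinstance(v, (dict, list))}


def items_match(a: dict[str, Any], b: dict[str, Any]) -> bool:
    """Match when at least one primitive key is shared and all shared keys agree."""
    pa, pb = extract_primitives(a), extract_primitives(b)
    shared = pa.keys() & pb.keys()
    return bool(shared) and all(pa[k] == pb[k] for k in shared)


a = {"name": "switch1", "vlan": 100}
b = {"name": "switch1", "port": "eth0"}
print(items_match(a, b))  # True: "name" is shared and equal
print({**a, **b})  # {'name': 'switch1', 'vlan': 100, 'port': 'eth0'}
print(items_match(a, {"name": "switch2", "vlan": 100}))  # False: shared key "name" differs
print(items_match({"vlan": 100}, {"port": "eth0"}))  # False: no shared primitive key
```

Keys present on only one side (`vlan`, `port`) never enter the comparison, which is why complementary items now combine.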

Implementation Details

  • Added _extract_primitives() helper to extract non-dict/non-list key-value pairs from a dict
  • Rewrote _has_duplicates_in_list() with an inverted index ((key, value) → [indices]) instead of O(n²) pairwise comparison
  • Added _merge_list_items_indexed() using inverted index for batch list merging in merge_dict()
  • Kept _items_would_merge() and merge_list_item() unchanged for public API compatibility
  • Modified merge_dict() to check both source and destination lists for duplicates before merging
  • Updated docstrings with clear examples of new behavior
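For readers unfamiliar with the inverted-index approach, here is a rough sketch of duplicate detection built on one. It is an assumption about how such a check could work, not the body of `_has_duplicates_in_list()`. The names `primitives` and `has_duplicates_indexed` are hypothetical, and it assumes list items are dicts with hashable primitive values.

```python
from collections import defaultdict
from itertools import combinations
from typing import Any


def primitives(item: dict[str, Any]) -> dict[str, Any]:
    """Keep only key-value pairs whose value is neither a dict nor a list."""
    return {k: v for k, v in item.items() if not isinstance(v, (dict, list))}


def has_duplicates_indexed(items: list[dict[str, Any]]) -> bool:
    """Detect two items in one list that would match each other."""
    prims = [primitives(i) for i in items]

    # Inverted index: (key, value) -> positions of the items carrying that pair.
    index: dict[tuple[str, Any], list[int]] = defaultdict(list)
    for pos, p in enumerate(prims):
        for pair in p.items():
            index[pair].append(pos)

    # Only items sharing a bucket can match, so only those pairs are verified.
    for positions in index.values():
        for i, j in combinations(positions, 2):
            shared = prims[i].keys() & prims[j].keys()
            if all(prims[i][k] == prims[j][k] for k in shared):
                return True
    return False


print(has_duplicates_indexed([{"name": "switch1"}, {"name": "switch1"}]))  # True
print(has_duplicates_indexed([{"name": "a", "vlan": 1}, {"name": "b", "vlan": 1}]))  # False
```

Items that share no (key, value) pair are never compared, so the pairwise work is limited to candidates within each bucket. A bucket where many items share one value can still cost quadratic time in this sketch.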

Breaking Changes

  1. Duplicate preservation: Lists containing duplicates in ANY file will now be concatenated
    instead of merged, preserving all within-file duplicates. This may result in more items than
    before if you have duplicates and previously relied on cross-file merging.

  2. Relaxed matching: Items now merge when all shared primitive keys match, even if each side
    also has unique keys not present in the other. Complementary configuration data will be combined
    instead of kept separate. Items that previously stayed separate may now merge.

Testing

  • ✅ All existing unit tests pass (with updated expectations)
  • ✅ New test cases for both-sides-unique-keys scenario
  • ✅ Verified duplicate preservation from any file
  • ✅ Verified order-independence
  • ✅ Ruff linting passes
  • ✅ Mypy type checking passes

@danischm danischm marked this pull request as draft November 12, 2025 10:42
@danischm danischm marked this pull request as ready for review March 10, 2026 08:56
@danischm danischm merged commit 48598cd into main Mar 10, 2026
7 checks passed
@danischm danischm deleted the ds-merge branch March 10, 2026 08:57