Skip to content

Support .gitattributes export-ignore in distribution archives#515

Open
jberdine wants to merge 9 commits into
tarides:mainfrom
jberdine:export-ignore
Open

Support .gitattributes export-ignore in distribution archives#515
jberdine wants to merge 9 commits into
tarides:mainfrom
jberdine:export-ignore

Conversation

@jberdine
Copy link
Copy Markdown

Parse .gitattributes files and exclude paths marked with the export-ignore
attribute from distribution tarballs. This is the mechanism used by
git-archive to exclude files, allowing projects to exclude dev-only files
like dune-workspace from releases.

Supported patterns:

  • Exact matches: dune-workspace
  • Directory patterns: .github/**
  • Glob patterns: .log, test_, file?.txt
  • Double star: /build, src//test.ml

Paths are normalized before matching (handles ./ and ../).

Not supported:

  • Escaped patterns (\! for literal !)
  • Quoted patterns ("a b" for patterns with spaces)
  • Case insensitivity (core.ignorecase)
  • Negation patterns (!pattern)
  • Subdirectory .gitattributes files

Testing:

  • Unit tests comparing behavior against git check-attr
  • Tests cover pattern matching and parsing edge cases
  • .gitattributes content is generated from test cases to ensure sync
  • Archive integration test verifies end-to-end exclusion without requiring git

Signed-off-by: Josh Berdine josh@berdine.net

Parse .gitattributes files and exclude paths marked with the export-ignore
attribute from distribution tarballs. This is the mechanism used by
git-archive to exclude files, allowing projects to exclude dev-only files
like dune-workspace from releases.

Supported patterns:
- Exact matches: dune-workspace
- Directory patterns: .github/**
- Glob patterns: *.log, test_*, file?.txt
- Double star: **/build, src/**/test.ml

Paths are normalized before matching (handles ./ and ../).

Not supported:
- Escaped patterns (\\! for literal \!)
- Quoted patterns ("a b" for patterns with spaces)
- Case insensitivity (core.ignorecase)
- Negation patterns (\!pattern)
- Subdirectory .gitattributes files

Testing:
- Unit tests comparing behavior against git check-attr
- Tests cover pattern matching and parsing edge cases
- .gitattributes content is generated from test cases to ensure sync
- Archive integration test verifies end-to-end exclusion without requiring git

Signed-off-by: Josh Berdine <josh@berdine.net>
Comment thread CHANGES.md Outdated
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Copy link
Copy Markdown
Member

@Leonidas-from-XIV Leonidas-from-XIV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice PR, the extensive test coverage is definitely a highlight and good to see as the glob to regex code is somewhat hairy.

I have some suggestions for improvements here.

Comment thread lib/archive.ml Outdated
let path_set_of_dir dir ~exclude_paths =
let not_excluded p = Ok (not (Fpath.Set.mem (Fpath.base p) exclude_paths)) in
let path_set_of_dir dir ~exclude_paths ~export_ignore =
let not_excluded p =
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid the negation, maybe rename it to included. It's kind of hard to wrap one's head around the double negated name and then the negation in the code.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed. My thinking was that conceptually what is being done is "excluding" and so kept that base term, but fair point about double-negation.

Comment thread lib/archive.ml
(* Skip directories - they will be created implicitly when their
contents are added. This ensures that directories whose contents
are excluded via export-ignore patterns don't appear as empty
directories in the archive. *)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But isn't this what one would expect to do? If I exclude a directory, I would expect the directory to show up, but not its contents.

Git does this but its more to do with the fact that Git doesn't track directories at all, so an empty directory is unrepresentable. But dune-release doesn't have this issue.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is that the context here is reading a .gitattributes file, and so the expectation is to interpret it the way that git archive does (as much as possible). As a user, if some workflows used git archive and some dune release, and I had to adjust the .gitattributes file because they interpreted it differently, I would assume that dune was wrong since the name .gitattributes indicates that git "owns" the format.

Comment thread lib/archive.mli
is preserved.
(** [tar dir ~exclude_paths ~export_ignore ~root ~mtime] is a (us)tar archive
that contains the file hierarchy [dir] except:
- relative hierarchies present in [exclude_paths] (basename matching)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it wouldn't make more sense to just translate exclude_paths to a Gitattributes.pattern and use the more generic mechanism. That way there's only one way to ignore paths.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could make sense. My worry is that exclude_paths excludes based on basename, so at any depth, while Gitattributes.patterns are path-anchored. Translating exclude_paths could have a semantic change if not done exactly right. So I think such a change ought to be in a separate PR, and I would want to have a much better idea for a good way to test it than I have at the moment.

Comment thread lib/gitattributes.ml Outdated
(any chars except /), [?] (single char except /), and [**] (any path
segments, but only when adjacent to /). *)
let glob_to_re pattern =
let buf = Buffer.create (String.length pattern * 2) in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the initial size pattern * 2?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped the * 2. It is a guess, based on the observation that each char of the pattern might expand into 2 regex chars, but the buffer grows fine so it is simpler to just skip the * 2.

Comment thread lib/gitattributes.ml
let glob_to_re pattern =
let buf = Buffer.create (String.length pattern * 2) in
Buffer.add_char buf '^';
let len = String.length pattern in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick but the length of the pattern is determined twice, so could be pulled up in the function an reused.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment thread lib/gitattributes.ml Outdated
(* Strip UTF-8 BOM if present at start of file *)
let content =
if String.is_prefix ~affix:"\xef\xbb\xbf" content then
String.Sub.to_string (String.sub ~start:3 content)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And use String.length utf8_bom here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread lib/gitattributes.mli Outdated

(** {1 Patterns} *)

type pattern
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type pattern
type t

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread tests/lib/test_gitattributes.ml Outdated
let files =
[
("CHANGES.md", "changes");
("foo.opam", "opam");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
("foo.opam", "opam");
("foo.opam", {|opam-version: "2.0"|});

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

in
List.fold_left
(fun acc file -> acc >>= fun () -> create_file file)
(Ok ()) files
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think List.iter followed by some kind of Result.List.all_unit (or the like) would be a bit easier to understand but it's of minor importance.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps. I like the way the current formulation uses >>=/Result.bind to chain the results just like the rest of the code.

Comment thread README.md Outdated
- Double star in path: `**/build`, `src/**/test.ml`
- Path normalization: handles `./` and `../` in paths

**Not supported:**
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to state what "not supported" means in practice. Will it error? Will it ignore the pattern?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I have pushed a commit that detects the unsupported pattern syntax, and updated the README to distinguish such cases (that are skipped with a warning) and other limitations (that are silent).

Avoid the double negation in the predicate name used by OS.Path.fold's
`Sat filter.

Signed-off-by: Josh Berdine <josh@berdine.net>
The Buffer grows on demand; the 2x factor was a guess with no
measurement behind it.

Signed-off-by: Josh Berdine <josh@berdine.net>
Compute the pattern length once and reuse for both the Buffer size
and the loop bound.

Signed-off-by: Josh Berdine <josh@berdine.net>
Bind `utf8_bom` so the byte sequence appears once and the skip length
is derived from it rather than a magic 3.

Signed-off-by: Josh Berdine <josh@berdine.net>
Follow the standard OCaml convention of naming the primary type of a
module `t`.

Signed-off-by: Josh Berdine <josh@berdine.net>
Signed-off-by: Josh Berdine <josh@berdine.net>
Detect negation (`\!pattern`), escape (`\\`), quoting (`"`), and
character class (`[...]`) syntax in parse_pattern and skip the pattern
with a Logs.warn rather than compiling to a regex that silently matches
no files. Change parse_pattern to return a t option to reflect this.

Split the README "not supported" list: syntactic cases are warned-and-
skipped uniformly, while case-insensitivity and subdirectory
.gitattributes are documented as separate limitations.

Signed-off-by: Josh Berdine <josh@berdine.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants