Package Specification and Ecosystem #700
I like the proposal overall. I would also suggest that this new feature deprecate the Although you don't advocate for an NF-Core style repository, that effort does have the advantage that it is trusted to be curated and reasonably reliable. I suspect one reason that dockstore hasn't taken off is that there is no requirement for the workflows to be complete (all artifacts available) or syntactically correct (who knows how many are even runnable). I have been doing a lot of analysis of WDL workflows in dockstore. Many are not runnable by miniwdl and are therefore arguably not spec compliant, several of the repos don't correctly identify the "main" entrypoint WDL for the workflow, and several don't contain required dependencies. I'd love to see something in this RFC or related RFCs to standardize ways to identify and import artifacts that can be relied on.
This is a fantastic write up and I really feel like the spec is in need of something like this. Dependency resolution has been a challenge in many cases allowing one of two options
In reality, the ecosystem is much more complex than that, and there is no good mechanism to ensure dependencies are properly resolved. I have a few general questions before launching into more ideas for things that I have encountered in the coding world.
At BioWDL we did some prior packaging work in a tool called "wdl-packager", which made a zip file that could be used by Cromwell. In miniwdl this was then further refined with the That however packages whole workflows and their dependencies. So it is not that "elegant". But it works well for distributing workflows as zip packages. We are very interested in having a better solution for WDL. I propose we do some hacking to get some things working alongside writing the spec, so we can quickly iterate when some things in the spec are either unwieldy or need to be added. Perhaps there is some prior art for packaging repositories that we can already use to quickly get something running.
I am 100% on board with this concept and would love to see it reach fruition. However, there is one major hurdle I have yet to see addressed in a way that satisfies me: the versioning scheme the WDL ecosystem should adopt. Clay's proposal mentions semantic versioning (SEMVER), a popular scheme adopted in many fields of software engineering, though I don't think it's an ideal fit for WDL. I've found in my time writing and maintaining WDL that many of the relatively routine and minor changes to WDL source happen at the API layer (inputs and outputs), which according to SEMVER count as breaking changes that necessitate a major version bump. This contrasts with many other languages, where the API layer of packages is defined via things like exported functions, data structures, etc., which are more static and robust to breaking changes than a WDL task or workflow. WDL development is most often concerned with the API layer of the called tooling in a way that can break backward compatibility. This is something @adthrasher and I have discussed at length while working on https://github.com/stjudecloud/workflows , which has been in development and in production for years but has still not adopted an official versioning scheme. We typically run historic cemented git tags/commits that rarely (if ever) change, or just run whatever is on HEAD of I'd love to hear what other groups and WDL authors think of this. I am short on proposed solutions for WDL versioning, and would appreciate other ideas from the community! Maybe SEMVER is the best we can do and the WDL ecosystem will simply have more major version churn than other languages' ecosystems, or maybe there's a solution I haven't thought of.
I love this idea, especially since we're working from a similar mindset with our WILDS WDL Library. Some thoughts:
Side note: I find GitHub discussions to be a really difficult format for design collaboration. Perhaps in the future we should solicit feedback on designs like this one via a pull request? I'll try to give some high-level feedback here. Reliance on
Module Specification and Ecosystem
Overview
This RFC proposes a module specification and ecosystem for WDL. It defines the structure of a WDL module, the manifest format (`module.json`) that describes each module and its dependencies, and how modules are imported in consumer workflows.

As a motivating example, here is what importing a `fastp` task from a hypothetical BioWDL module might look like:

```json
{
  "version": "1.0.0",
  "license": "MIT",
  "dependencies": {
    "biowdl": {
      "git": "https://github.com/biowdl/tasks",
      "branch": "develop"
    }
  }
}
```

Motivation
Importing in WDL today is limited—far too limited to support distributed development and a module ecosystem. Only local file references and concrete URLs are supported, neither of which is sufficient.
- `main`/`master` branch. Since no conventions or backwards compatibility guarantees are made, documents can easily change out from under you at any time.
- `https://raw.githubusercontent.com`), one can implement some level of versioned importing. The result is still undesirable though, as the version of dependencies is baked into the source code itself. This means that a change from `v2.0.1` to `v2.0.2` of a dependency means that all URLs pointing to this dependency need to be updated. This is unnecessary work and added noise for the git commit history.
- `http://` URLs are vulnerable to interception and tampering.

The module system proposed here addresses all three problems through Git-based sources with immutable references (tags, commits) and a required lockfile.

All of this is made a bit easier through the use of relative imports (e.g., `import "../task.wdl"`), which ensure that imports are resolved relative to the current document location. This means that, whether you run a workflow from a local document (`sprocket run ./workflow.wdl`) or a remote document (`sprocket run https://example.com/workflow.wdl`), imports will be sourced correctly. That being said, this does not solve the issue of importing from external modules.

A more ideal state
The following are, in our view, the minimum requirements for a functioning module ecosystem.
Goals
Given the above ideal state, this RFC sets out the following goals.
- `module-lock.json`) should pin the fully resolved dependency tree so that builds are reproducible without additional effort. Immutable Git references (tags, commits) should guarantee that dependencies do not change out from under you.
- `openwdl.github.io/registry` should aggregate metadata from registered repositories to make them searchable, without becoming a single point of failure for resolution.

Antigoals
Prior Art
#226
As far as I can tell, this is the earliest official proposal to add some level of package management/versioning to WDL. The issue is rather short and describes a syntax like the following.
As previously stated, I'm not in support of stopping at the proposed mechanism because it assumes a centralized package system (something I don't think is feasible with our relatively small contributor team nor advisable based on our emphasis on enabling distributed development and maintenance).
#493
Revived in early 2022, this proposal similarly focuses on a centralized package repository. For the pieces that overlap with this proposal (essentially just import syntax), the two overlap quite a bit—just a difference in the order of the "import" and "from" clauses.
#499
#499 outlined a more concrete package format. We drew inspiration from it—particularly the metadata fields (name, author, version, license) and the reproducibility concerns, which we folded into this proposal's goals. Much of the remaining discussion in #499 centers on centralized distribution (e.g., tar vs. zip), which is irrelevant here given our Git-based approach.
#698
#698 proposes that we relax the constraints around what document versions can be imported, essentially advocating that any WDL v1.x version must be able to load any WDL documents with versions v1.x or lower. This adheres to common expectations regarding backwards compatibility of software with the same major version and would be required for this proposal to be manageable (else, the entire ecosystem could become deadlocked waiting for root modules in the ecosystem to update their WDL version when a new minor revision of WDL is released).
Proposal
Definition of a WDL module
A WDL module is a directory containing a `module.json` manifest and one or more `.wdl` files. There is no other organizational concept: no "workspace" type, no special grouping files, no hierarchy requirements.

When the resolver encounters a dependency, it scans the entire source tree for `module.json` files. Each one it finds is a module, registered at whatever path it sits at relative to the source root. This means a Git repository can contain one module at the root, many modules in subdirectories, or both. The resolver does not care about the shape of the repository; it discovers what is there.

A single-module repository:

A multi-module repository:
Both are valid dependencies. The resolution logic is identical for both.
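As a rough illustration of the discovery rule above, the sketch below walks a checked-out source tree and registers every directory that contains a `module.json`. The function name `discover_modules` is hypothetical, and using `"."` as the key for a root-level module is an assumption borrowed from the lockfile convention described later in this RFC.

```python
import json
from pathlib import Path


def discover_modules(source_root):
    """Map each module's path (relative to the source root) to its parsed manifest.

    Hypothetical helper: the RFC only requires that every directory holding a
    module.json is registered at its path relative to the source root.
    """
    root = Path(source_root)
    modules = {}
    for manifest in sorted(root.rglob("module.json")):
        # as_posix() keeps "/" separators on every platform; a manifest at the
        # source root yields ".", the same key the lockfile format uses.
        rel = manifest.parent.relative_to(root).as_posix()
        modules[rel] = json.loads(manifest.read_text(encoding="utf-8"))
    return modules
```

Because the function only looks for manifests, a single-module and a multi-module repository go through exactly the same code path.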
Manifest file
The manifest file for a module always lives at the root of the module directory with the name `module.json`. It contains the following fields.

Core fields

- `name` (string, required). A human-readable display name for the module (e.g., `"fastp"`, `"samtools-sort"`). This is used by the registry and tooling for display purposes; it is not used for dependency resolution. Consumers still name dependencies in their own `module.json`, so there is no global namespace to manage and no squatting problem.
- `version` (string, required). The module version, following the SemVer v2.0.0 specification. The versioning contract is: if the version doesn't change, the expected output doesn't change. This means the module version must reflect changes to the WDL interface (inputs, outputs, behavior) and changes to the wrapped tool that alter expected output. The `tools` field (below) tracks the upstream tool version separately for metadata and provenance, but the module version is ultimately what consumers rely on for compatibility.
- `license` (string, required). An SPDX license expression (e.g., `"MIT"`, `"Apache-2.0"`, `"MIT OR Apache-2.0"`, `"MIT AND (Apache-2.0 WITH LLVM-exception)"`).
- `authors` (array of strings, optional). Author descriptions. The convention for individual authors is `"First Last <first.last@example.com>"`, but this is not enforced.
- `description` (string, optional). A brief description of what the module does.
- `repository` (string, optional). The canonical Git URL for the module's source repository (e.g., `"https://github.com/biowdl/tasks"`). The registry uses this to link back to source.
- `homepage` (string, optional). A URL for the module's documentation or landing page, if distinct from the repository.
- `readme` (string, optional). Path to a markdown file relative to the module root. If omitted, engines and the registry should look for `README.md` in the module directory. If explicitly set to `false`, no readme is associated with the module.

Tools
The `tools` field is an array of objects that tracks the upstream software wrapped by the module. Each entry records:

- `name` (string, required). The tool name.
- `version` (string, required). The version of the tool.
- `license` (string, required). The tool's SPDX license identifier.
- `homepage` (string, optional). URL for the tool's homepage or repository.
- `doi` (string, optional). DOI for the tool's publication.
- `biotools` (string, optional). bio.tools registry identifier.

WDL tasks sit at the boundary between the workflow language and the tools they call. The `tools` array tracks which version of the upstream software the module wraps, but this is metadata: it does not replace the module's own semver version. Module authors are responsible for bumping the module version whenever the wrapped tool changes in a way that alters expected output. If you update `fastp` from `0.23.4` to `0.24.0` and that changes the default trimming behavior, the module version must change even if no WDL inputs or outputs were added or removed. The `tools` field exists so that downstream consumers have machine-readable provenance (which tool, which version, which license) without overloading the module version with that information. See also "Guidance on API stability and optional inputs" for how to minimize interface-breaking changes when upstream tools evolve.

Dependencies
The `dependencies` field is a JSON object that contains one key per dependency. Each key must be a valid WDL identifier. The key is a consumer-chosen name; it does not need to match the module's `name` field. The importer names the dependency however they like, and two consumers can refer to the same module by different local names without ambiguity.

Each dependency must specify a source and a version selector. A dependency with a `git` URL but no `version`, `tag`, `branch`, or `commit` is invalid: you must be explicit about what you want.

Version requirements (default)
The recommended way to declare a dependency is with a `version` field containing a semver version requirement. The resolver lists Git tags from the repository, parses them as semver (stripping a leading `v` if present, e.g., `v1.2.0` → `1.2.0`), and selects the highest version that satisfies the constraint.

The version requirement syntax is as follows:

- `^1.2.0`: compatible updates, `>=1.2.0, <2.0.0`. This is the default behavior if no operator is specified (i.e., `"1.2.0"` is equivalent to `"^1.2.0"`).
- `~1.2.0`: patch-level updates only, `>=1.2.0, <1.3.0`.
- `=1.2.0`: exactly this version.
- `>=1.0.0, <2.0.0`: explicit range using comparison operators (`>=`, `>`, `<=`, `<`), combined with commas.
- `*`: any version. This is allowed but discouraged.

```json
{
  "dependencies": {
    "samtools": {
      "git": "https://github.com/someone/samtools-wdl",
      "version": "^1.2.0"
    },
    "biowdl": {
      "git": "https://github.com/biowdl/tasks",
      "version": ">=2.0.0, <3.0.0"
    }
  }
}
```

Alternative version selectors
For cases where semver version requirements are not sufficient (e.g., pre-release testing, pinning to a specific commit for debugging, or tracking a development branch), the following alternatives are available:

- `tag`: a specific Git tag name (e.g., `"v1.2.0-rc1"`). Does not go through semver resolution.
- `branch`: a Git branch name (e.g., `"main"`, `"develop"`). The resolved commit will vary over time; the lockfile pins the exact commit at resolution time.
- `commit`: a full Git commit SHA. The most precise and immutable selector.

The four selectors (`version`, `tag`, `branch`, and `commit`) are mutually exclusive. Specifying more than one on the same dependency is invalid.

```json
{
  "dependencies": {
    "bleeding_edge": {
      "git": "https://github.com/org/tool",
      "branch": "main"
    },
    "pinned": {
      "git": "https://github.com/org/tool",
      "commit": "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
    },
    "prerelease": {
      "git": "https://github.com/org/tool",
      "tag": "v2.0.0-rc1"
    }
  }
}
```

Local path dependencies
A dependency with a `path` key points to a local filesystem directory. No version selector is needed; the module is used as-is from the local path.

```json
{
  "dependencies": {
    "local_utils": {
      "path": "../../shared/utils"
    }
  }
}
```

Path within a repository
For any Git dependency, an optional `path` key can be included to set the root directory for module scanning within the repository. This is useful when WDL modules live in a subdirectory alongside other files (e.g., Docker files, CI configs, documentation).

```json
{
  "dependencies": {
    "mytool": {
      "git": "https://github.com/org/mytool",
      "version": "^1.0.0",
      "path": "wdl"
    }
  }
}
```

Full example
```json
{
  "name": "fastp",
  "version": "1.2.0",
  "license": "MIT OR Apache-2.0",
  "authors": ["Jane Doe <jane.doe@example.com>"],
  "description": "WDL wrapper for fastp quality control",
  "repository": "https://github.com/someone/fastp-wdl",
  "homepage": "https://someone.github.io/fastp-wdl",
  "tools": [
    {
      "name": "fastp",
      "version": "0.23.4",
      "license": "MIT",
      "homepage": "https://github.com/OpenGene/fastp",
      "doi": "10.1093/bioinformatics/bty560",
      "biotools": "fastp"
    }
  ],
  "dependencies": {}
}
```

Notes on the manifest format
- The `name` field is for display only (e.g., in the registry and tooling output). It plays no role in dependency resolution; consumers choose their own local names for dependencies. This means there is no global namespace to manage and no squatting problem.
- The `readme` field defaults to `README.md` if omitted. The registry renders this file as the module's landing page.
- Unknown fields anywhere in `module.json` (at the top level, within `tools` entries, within `dependencies` entries, and in any other nested object) are ignored rather than treated as errors. This allows the manifest format to evolve over time; new optional fields can be added without breaking older engines that don't understand them.

Symbolic imports
Symbolic imports use unquoted identifiers to refer to dependencies declared in `module.json`. Quoted imports remain for relative file paths. Because imports today must be enclosed in quotes, a backwards-compatible approach is that any non-quoted import is assumed to be a symbolic import.

Resolution
When the parser encounters `import X from foo/bar/baz`, the resolution proceeds as follows:

- `foo` is the dependency name, `bar/baz` is the module path within the dependency.
- `foo` in the current module's `module.json` dependencies.
- `module.json` files, registering each module at its relative path.
- `bar/baz` in the discovered modules.
- `X` available.

For `import X from foo` with no path component, the module must exist at the source root (i.e., a `module.json` at the top level of the dependency).

The steps above describe the logical resolution behavior that all compliant engines must produce. Engines are free to implement the mechanics however they choose (e.g., caching strategies, scan ordering, lazy vs. eager fetching).
Version discovery
How available versions are discovered depends on the source type.
For Git-based dependencies (the common case), the resolver lists the repository's Git tags and parses each as a semver version, stripping a leading `v` if present (e.g., tag `v1.2.0` → version `1.2.0`). Tags that do not parse as valid semver are ignored. The resulting set of versions is what the resolver matches against when evaluating a `version` requirement. This means module authors publish a new version by tagging a commit: there is no separate publish step, no upload, no registry submission. The Git tag is the release.

For local path dependencies, the resolver reads the `version` field from the module's `module.json` at the given path. If the dependency declaration includes a `version` requirement, the local module's version must satisfy it; otherwise resolution fails.

Transitive dependencies
Dependencies are fully transitive. If module A depends on module B, and B depends on C, the resolver walks the full tree.
Version precedence
Version precedence follows SemVer v2.0.0, section 11. When multiple tags satisfy a version requirement, the resolver selects the highest version according to semver precedence rules. Build metadata (i.e., anything after `+`) is ignored for precedence purposes.

Version resolution and conflicts
When multiple modules in the dependency tree require the same dependency with compatible version constraints (e.g., `^1.2.0` and `^1.5.0`), the resolver should attempt to find a single version that satisfies all constraints (e.g., `1.5.0` or higher). This avoids unnecessary duplication.

When the constraints are incompatible (e.g., `^1.0.0` and `^2.0.0`), both versions are fetched and used independently. No deduplication, no warnings. WDL modules are lightweight text files, and the tasks they define execute in isolated containers with no shared runtime state. There is nothing that can conflict. This sidesteps the diamond dependency problem entirely, at the cost of minor storage duplication: a few kilobytes of WDL source per duplicate.

Lockfile
The specification requires a `module-lock.json` file at the module root. This file pins the fully resolved dependency tree, including any duplicates. It must be committed to version control; it is what makes builds reproducible regardless of upstream changes.

Cached module sources (i.e., the local directories where resolved modules are downloaded or cloned) should not be committed. These are ephemeral and can be reconstructed from the lockfile. Engines should store cached modules in a location outside the project directory (e.g., a user-level cache) or in a directory that is `.gitignore`d by convention.

Lockfile format
The `module-lock.json` file is a JSON object with the following structure:

```json
{
  "version": 1,
  "dependencies": {
    "biowdl": {
      "source": {
        "git": "https://github.com/biowdl/tasks",
        "commit": "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
      },
      "modules": {
        "fastp": {
          "version": "1.2.0",
          "checksum": "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
          "dependencies": {
            "common": {
              "source": {
                "git": "https://github.com/biowdl/common",
                "commit": "d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5"
              },
              "modules": {
                ".": {
                  "version": "0.3.0",
                  "checksum": "sha256:4355a46b19d348dc2f57c046f8ef63d4538ebb936000f3c9ee954a27460dd865",
                  "dependencies": {}
                }
              }
            }
          }
        }
      }
    },
    "samtools": {
      "source": {
        "git": "https://github.com/someone/samtools-wdl",
        "commit": "b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3"
      },
      "modules": {
        ".": {
          "version": "3.0.1",
          "checksum": "sha256:d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592",
          "dependencies": {}
        }
      }
    },
    "local_utils": {
      "source": {
        "path": "../../shared/utils"
      },
      "modules": {
        ".": {
          "version": "0.5.0",
          "checksum": "sha256:9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
          "dependencies": {}
        }
      }
    }
  }
}
```

The structure is recursive: each module's `dependencies` field has the same shape as the top-level `dependencies` object, mirroring the full dependency tree.

The fields are as follows:
- `version` (integer, required). The lockfile format version. Currently `1`. Engines should reject lockfiles with an unrecognized version.
- `dependencies` (object, required). A map from consumer-chosen dependency name (matching the key in `module.json` `dependencies`) to its resolved state.

Each dependency entry contains:

- `source` (object, required). The resolved source. For Git sources, this contains `git` (the repository URL) and `commit` (the full 40-character SHA that the `tag`, `branch`, or `commit` reference resolved to at lock time). For local path sources, this contains only `path`.
- `modules` (object, required). A map from module path within the dependency source to that module's locked state. The key is the relative path from the source root to the directory containing `module.json`. For modules at the source root, the key is `"."`.

Each module entry contains:

- `version` (string, required). The version from the module's `module.json` at lock time.
- `checksum` (string, required). The module's content hash in the format `sha256:<hex_digest>`, computed using the content hashing algorithm specified below.
- `dependencies` (object, required). The module's own transitive dependencies, in the same format as the top-level `dependencies` object. Empty if the module has no dependencies.

When two modules in the dependency tree require different versions of the same source, both resolved versions appear in the tree at whatever point in the nesting they were required.
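Because every module's `dependencies` object has the same shape as the top level, a consumer of the lockfile can flatten the tree with a short recursive walk. The sketch below is only an illustration of that recursion; `iter_locked_modules` is a hypothetical name, and the tuple layout it yields is an arbitrary choice.

```python
def iter_locked_modules(dependencies, prefix=()):
    """Yield (dependency chain, module path, commit or None, version) for every locked module."""
    for name, entry in dependencies.items():
        chain = prefix + (name,)
        # "commit" is absent for local path sources, which only record "path".
        commit = entry["source"].get("commit")
        for module_path, module in entry["modules"].items():
            yield chain, module_path, commit, module["version"]
            # Each module's "dependencies" has the same shape as the top level.
            yield from iter_locked_modules(module["dependencies"], chain)
```

Run against the example lockfile above, the walk would report `fastp` under `biowdl`, then `common` nested beneath it, followed by the `samtools` and `local_utils` root modules.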
Content hashing
Both the lockfile checksum and module signatures depend on the same deterministic content hash. All compliant engines must produce the same digest for the same module contents.
The algorithm, following the approach used by Sprocket for call caching:
- `module.sig` and `module-lock.json`.
- `/` as the path separator, regardless of the host operating system.

a. Hash the relative path (UTF-8 bytes).
b. Hash the file contents (raw bytes).
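To make the shape of such a hash concrete, here is a minimal sketch. It follows only the parts of the algorithm that are stated in this section (excluding `module.sig` and `module-lock.json`, `/`-separated relative paths, hashing each path and then its contents, and including an entry count). Everything else is an assumption for illustration: sorting entries by path, hashing the entry count first, the 8-byte big-endian length prefix on each field, and running against an exported tree with no VCS metadata. It should not be read as the algorithm Sprocket uses.

```python
import hashlib
from pathlib import Path

# Files excluded from the hash, as named in this section (matched at the module root).
EXCLUDED = {"module.sig", "module-lock.json"}


def content_hash(module_root):
    """Return 'sha256:<hex_digest>' for a module directory (illustrative sketch)."""
    root = Path(module_root)
    # Assumption: entries are ordered by their "/"-separated relative path.
    entries = sorted(
        (p.relative_to(root).as_posix(), p)
        for p in root.rglob("*")
        if p.is_file() and p.relative_to(root).as_posix() not in EXCLUDED
    )
    digest = hashlib.sha256()
    # Assumption: the entry count is hashed before the entries themselves.
    digest.update(len(entries).to_bytes(8, "big"))
    for rel, path in entries:
        for field in (rel.encode("utf-8"), path.read_bytes()):
            # Assumption: a length prefix keeps field boundaries unambiguous.
            digest.update(len(field).to_bytes(8, "big"))
            digest.update(field)
    return "sha256:" + digest.hexdigest()
```

With explicit boundaries, a module whose only file is `a` with contents `bc` hashes differently from one whose only file is `ab` with contents `c`, and adding or changing `module.sig` leaves the digest untouched.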
The entry count in step 6 ensures that a module with files `a` and `bc` produces a different digest than a module with files `ab` and `c`, even if the concatenation of paths and contents happens to collide.

The lockfile records this digest in the format `sha256:<hex_digest>`.

Integrity: lockfile checksums
The `module-lock.json` checksum field provides tamper detection. Once a module is resolved and its checksum recorded, any modification to the cached content (whether by a compromised cache, a man-in-the-middle, or a corrupted download) is detectable. Engines must verify checksums against the lockfile before using cached modules. If the checksum does not match, the engine must refuse to proceed.

Module signing and supply chain security
Module ecosystems are targets for supply chain attacks—compromised repositories, force-pushed tags, impersonated maintainers. The signing model here addresses content tampering and maintainer impersonation without requiring centralized infrastructure.
Module signatures
Module authors can sign their modules by producing a `module.sig` file at the module root. This is a JSON file containing an Ed25519 signature computed over the module's content hash (i.e., the SHA-256 digest produced by the content hashing algorithm above).

Signature file format
```json
{
  "algorithm": "ed25519",
  "public_key": "base64-encoded-32-byte-public-key",
  "signature": "base64-encoded-64-byte-signature"
}
```

The fields:

- `algorithm` (string, required). The signing algorithm. Currently the only permitted value is `"ed25519"`. Future specification versions may add additional algorithms, at which point engines must reject unrecognized values.
- `public_key` (string, required). The signer's Ed25519 public key, base64-encoded.
- `signature` (string, required). The Ed25519 signature over the module's content hash (the raw 32-byte SHA-256 digest, not the hex-encoded string), base64-encoded.

A signed module looks like:
Ed25519 was chosen because it is fast, produces small signatures (64 bytes) and small keys (32 bytes), and has mature implementations in every major language—engines can verify signatures in-process without shelling out to external tools or depending on any system keychain. No SSH infrastructure, no GPG, no platform-specific credential stores.
Signing is optional but encouraged. Unsigned modules (i.e., modules without a `module.sig` file) are valid; they simply skip the provenance verification flow.

Why out-of-band rather than Git-native signing?
Git tag and commit signing would be simpler today: authors already sign tags, and engines could verify them directly. But it couples the security model to the transport mechanism. If modules are ever distributed as tarballs, through a package server, or through any non-Git mechanism, Git signatures don't travel with the content. A `module.sig` file does. The cost is that authors need a separate signing step, but the benefit is a security model that survives changes to the distribution infrastructure.

Trust on first use (TOFU)
The trust model follows "trust on first use":
- `module.sig` is present, the engine verifies the signature and records the signer's public key in `module-lock.json`.
- `sprocket module trust biowdl/fastp`). This protects against compromised repositories where an attacker replaces both the content and the signature.

The lockfile records the signer's public key on each signed module entry:

```json
{
  "version": 1,
  "dependencies": {
    "biowdl": {
      "source": {
        "git": "https://github.com/biowdl/tasks",
        "commit": "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
      },
      "modules": {
        "fastp": {
          "version": "1.2.0",
          "checksum": "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
          "signer": "base64-encoded-32-byte-public-key",
          "dependencies": {}
        }
      }
    }
  }
}
```

For unsigned modules, the `signer` field is absent.

Engine policy
Engines are encouraged to:

- `require_signed = true`) that rejects unsigned modules entirely. Whether this defaults to on or off is left to the engine implementor.

Deprecation of remote URL imports
Remote URL imports (`http://`, `https://`) are deprecated. Compliant engines should emit a warning when encountering them. Removal is targeted for a future WDL specification version.

Raw URL imports have several problems: the content at a URL can change without notice (breaking reproducibility), there are no versioning guarantees, and plain `http://` URLs are a security concern. The module system with Git-based sources and lockfile pinning addresses all of these. Relative file imports (`import "../path/to/file.wdl"`) are unaffected by this deprecation.

Guidance on API stability and optional inputs
A common source of unnecessary major version bumps in WDL modules is the addition of new task inputs. Under semver, adding a new required input is a breaking change—every downstream consumer that calls the task must be updated. This leads to version churn that ripples through the dependency tree and erodes the usefulness of version constraints.
The remedy is straightforward: module authors should make liberal use of optional inputs with sensible defaults. WDL already supports this well. A task that wraps `fastp`, for example, might initially expose only the required flags. When `fastp` adds a new `--cut_right` option, the WDL wrapper can add an optional input with a default of `false` rather than a required input. Existing consumers continue to work without changes, and the version bump is minor rather than major.

We believe optional inputs are underused in WDL today. In a module ecosystem with transitive dependencies and semver constraints, the difference between a required and optional input is the difference between a breaking change and a compatible one. Module authors should default to optional inputs for any parameter that has a reasonable default value, reserving required inputs for the genuinely mandatory ones (e.g., input files).
Discoverability
Modules are resolved directly from Git repositories—there is no central package server. No single organization controls module availability, institutional and private modules work identically to public ones, and resolution has no external runtime dependency beyond the Git host.
The tradeoff is that distributed systems are harder to search. To address this, the community will maintain a module index at `openwdl.github.io/registry`, backed by the `openwdl/registry` GitHub repository.

- `openwdl/registry` adding their repository URL to the index.
- `module.json`, and pass structural validation.
- `module.json` (version, description, tools, license, authors), and builds a static searchable site.

The index never hosts code. It points to code. If the index goes down, every existing import still works because imports resolve from Git, not from the index. The index is a convenience for discovery, not infrastructure for resolution.
Engine tooling
The WDL specification defines module format and resolution behavior; individual engines build their own CLI tooling on top. As an example, Sprocket plans a `module validate` command with four levels of checks:

- `module.json` exists, required fields present, `version` is valid semver, `license` is a valid SPDX expression, `tools` entries have their required fields.
- `.wdl` files in the module parse without errors.

Other engines may implement equivalent or different commands. The `openwdl/registry` CI could use any compliant engine's validation tooling to check submissions.

Design tradeoffs
Distributed hosting vs. centralized registry. We chose Git-based resolution over a central package server. The cost is discoverability, which we address with the community index. The benefit is that no single organization can become a bottleneck or point of failure for the ecosystem. Environments that cannot depend on third-party SaaS can use this system without modification.
Duplicate dependencies over conflict resolution. Other ecosystems (npm, Go) invest significant complexity in version resolution strategies. We skip all of that. WDL tasks run in isolated containers—there is no shared memory, no symbol table, no binary linking. Two versions of the same dependency coexist without interference. The duplication cost is negligible for text files.
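The policy behind this tradeoff (share one version when the constraints are compatible, otherwise resolve each constraint on its own) can be sketched as follows. This is a simplified illustration under stated assumptions: only plain `MAJOR.MINOR.PATCH` versions and the `^`, `~`, and `=` operators are handled, comparison ranges, `*`, pre-release tags, and any special treatment of `0.x` versions are left out, and the function names are hypothetical.

```python
import re


def parse(tag):
    """Parse 'v1.2.0' or '1.2.0' into a tuple; return None for tags that are not plain semver."""
    m = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    return tuple(int(p) for p in m.groups()) if m else None


def bounds(req):
    """Return (inclusive lower bound, exclusive upper bound) for a ^, ~, or = requirement."""
    op = req[0] if req[0] in "^~=" else "^"  # a bare version defaults to caret
    major, minor, patch = parse(req.lstrip("^~="))
    if op == "^":
        return (major, minor, patch), (major + 1, 0, 0)
    if op == "~":
        return (major, minor, patch), (major, minor + 1, 0)
    return (major, minor, patch), (major, minor, patch + 1)


def select(tags, reqs):
    """Pick the highest tagged version that satisfies every requirement, or None."""
    versions = [v for v in map(parse, tags) if v is not None]
    limits = [bounds(r) for r in reqs]
    matching = [v for v in versions if all(lo <= v < hi for lo, hi in limits)]
    return max(matching) if matching else None


def resolve(tags, reqs):
    """Share one version when all constraints are compatible; otherwise resolve each separately."""
    shared = select(tags, reqs)
    if shared is not None:
        return {r: shared for r in reqs}
    return {r: select(tags, [r]) for r in reqs}
```

For plain three-part versions, tuple comparison agrees with semver precedence, which is why `max` is enough here; a fuller implementation would need the pre-release ordering rules from SemVer section 11.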
Separate tool versioning. The module version and the upstream tool version are tracked in different fields, but they are not independent. The module version contract is that if the version doesn't change, the expected output doesn't change, so a tool update that alters output requires a module version bump. The `tools` array exists for provenance and license tracking, not as a substitute for proper semver on the module itself. The separation avoids conflating "which tool am I wrapping?" with "is my WDL interface stable?", two questions that consumers care about for different reasons (cf. the discussion in the comments on this RFC).

Display name, not resolution name. The `name` field exists for human consumption: the registry, tooling output, search results. It is not used for dependency resolution. The importer names each dependency locally in their own `module.json`, so there is no global namespace to manage, no squatting problem, and no need for a naming authority. Two teams can independently wrap the same tool and consumers pick whichever they prefer.

Lockfile as a specification requirement. Making `module-lock.json` optional would undermine reproducibility. If the lockfile is required, every module is reproducible by default. Authors who want to live on the edge can regenerate it; authors who want stability commit it and move on.

Out-of-band signing over Git-native signing. Git tag signing would be simpler today, but it couples security to the transport mechanism. A `module.sig` file travels with the module regardless of how it's distributed: Git clone, tarball, or something we haven't built yet. The cost is a separate signing step for authors, but engines can make this a single command (e.g., `sprocket module sign`).

Trust on first use over a certificate authority. TOFU has known downsides: the first resolution is unverified (if the repo is already compromised, you trust the attacker's key), key rotation requires manual acceptance from every consumer, and there is no revocation mechanism. A PKI would address all of these but would require infrastructure and governance that a small open-source community cannot realistically sustain. TOFU protects against the most common attack (a repository compromised after you started using it), and we accept the tradeoff that it cannot protect against pre-existing compromise. The `openwdl/registry` can partially mitigate this by recording maintainer keys at submission time, giving new consumers a cross-reference point.

Optional signing with encouraged adoption. Requiring signatures would be more secure but would raise the friction bar for every module author. Making it optional with engine-level policy (e.g., `require_signed = true`) lets security-conscious environments enforce signing while keeping the barrier to entry low for everyone else.

Soft URL deprecation. A hard break would strand existing workflows. Warnings give the ecosystem time to migrate while making the direction clear.

Auto-discovery over workspace configuration. We considered a separate workspace manifest (i.e., a `wdl-workspace.json` that lists member modules) and decided against it. One concept (the module) and one file format (`module.json`) is easier to explain, easier to implement, and sufficient for all repository layouts we examined.

Concerns left to address
- `.npmrc`-style config or reliance on the user's existing Git credential helpers).