docs: add CONTRIBUTING.md and data-source contribution guides #2705
m1n0 wants to merge 4 commits
Adds top-level CONTRIBUTING.md covering dev setup, code style, PR workflow, and a brief overview of the two extension surfaces (data sources and check types).
Adds CONTRIBUTING-DATA-SOURCE.md as a human-paced guide to adding a new data source, mirroring the v3 doc's structure but written purely against v4 (DataSourceImpl + SqlDialect + DataSourceConnection, pydantic connection models, entry-point auto-registration).
Adds the dense CONTRIBUTING-DATA-SOURCE.AGENT.md companion intended for coding agents: every base class, every override, common failure modes, verification commands.
Both human-readable docs cross-link to the agent reference so contributors and AI assistants can navigate naturally between overview and dense detail.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Correctness fixes:
- Drop AVG/ from sql_ast cheat sheet (only AVERAGE exists)
- Correct the plugin-discovery story in both docs: the entry-point name
is not used for matching. Loading the entry point triggers
__init_subclass__, which keys the registry on the Literal[...] default;
matching at runtime is YAML type: ↔ Literal[...]. Convention keeps the
entry-point suffix aligned, but the load-bearing match is elsewhere.
- Drop the brittle MetricImpl line-number range; use a grep hint instead.
- Qualify the pydantic config guidance: top-level models use extra="forbid",
connection-properties subclasses use extra="allow".
- Fix Postgres connection-variant count: three concrete variants
(ConnectionString, ConnectionPassword, ConnectionPasswordFile), not two.
AGENT.md example now shows a multi-auth infer_connection_type.
- Replace "11 working examples" with future-proof phrasing.
- Acknowledge the MissingAncValidityCheckYaml typo inline so readers don't
second-guess the doc (impl base is correctly MissingAndValidityCheckImpl).
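The plugin-discovery fix above can be illustrated with a minimal, stdlib-only sketch. It assumes a simplified registry: `DataSourceModel`, `AcmeDataSource`, `AcmeDataSourceImpl`, the `lookup` helper, and the single-underscore `_implementation_classes` name are hypothetical stand-ins, and the sketch reads the `Literal[...]` annotation where the real code may read the field default instead.

```python
from typing import ClassVar, Dict, Literal, Optional, Type, get_args, get_type_hints


class DataSourceModel:
    """Hypothetical stand-in for the connection/config model base."""

    type: str


class DataSourceImpl:
    """Simplified stand-in for the implementation base class."""

    # Registry keyed on the YAML `type:` value, not on any entry-point name.
    _implementation_classes: ClassVar[Dict[str, type]] = {}

    def __init_subclass__(
        cls, model_class: Optional[Type[DataSourceModel]] = None, **kwargs
    ) -> None:
        super().__init_subclass__(**kwargs)
        if model_class is None:
            # Without model_class= there is nothing to key on, so the
            # subclass is never registered.
            return
        literal_hint = get_type_hints(model_class)["type"]
        for type_name in get_args(literal_hint):
            DataSourceImpl._implementation_classes[type_name] = cls

    @classmethod
    def lookup(cls, type_name: str) -> Optional[type]:
        return cls._implementation_classes.get(type_name)


class AcmeDataSource(DataSourceModel):
    # The Literal value is what a YAML `type: acme` is matched against.
    type: Literal["acme"] = "acme"


# Merely defining (importing) this class registers it under "acme".
class AcmeDataSourceImpl(DataSourceImpl, model_class=AcmeDataSource):
    pass
```

In this sketch `DataSourceImpl.lookup("acme")` returns `AcmeDataSourceImpl`, while any other spelling returns `None`, which is why the entry-point name itself never takes part in matching.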
Completeness:
- Add SodaDataTypeName location + an enumeration one-liner.
- Add an ASCII runtime-flow diagram at the top of the AGENT doc.
- Add a "what not to commit" mini-section and a version-matching one-liner
(grep -m1 '^version' soda-core/pyproject.toml).
- Clarify pip fallback list installs only the packages you name, with an
inline note to add -e soda-{driver} for every driver you need.
- Add "when NOT to use base X" notes to the check-type section.
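For contributors who prefer to script the version check, the `grep -m1 '^version'` hint can be mirrored in Python. This is a small sketch, not part of the docs themselves; `first_version_line` is a hypothetical helper name.

```python
from typing import Optional


def first_version_line(pyproject_text: str) -> Optional[str]:
    """Return the first line that starts with 'version', like `grep -m1 '^version'`."""
    for line in pyproject_text.splitlines():
        if line.startswith("version"):
            return line
    return None
```

It could be called as `first_version_line(Path("soda-core/pyproject.toml").read_text())`, with `Path` imported from `pathlib`.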
Doc-practice:
- Add a 10-line TOC to the AGENT doc (~670 lines now).
- Mirror the human guide's "end-to-end" framing in the AGENT preamble.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
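The multi-auth `infer_connection_type` mentioned under the correctness fixes might look roughly like the sketch below. The three variant names come from the commit message; the field names (`connection_string`, `password`, `password_file`, `host`, `user`), the function signature, and the use of plain dataclasses in place of pydantic models are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Type, Union


@dataclass
class ConnectionString:
    connection_string: str  # hypothetical field name


@dataclass
class ConnectionPassword:
    host: str
    user: str
    password: str  # hypothetical field name


@dataclass
class ConnectionPasswordFile:
    host: str
    user: str
    password_file: str  # hypothetical field name


ConnectionVariant = Union[ConnectionString, ConnectionPassword, ConnectionPasswordFile]


def infer_connection_type(props: Mapping[str, Any]) -> Type[ConnectionVariant]:
    """Pick the connection variant from the keys present in the YAML properties."""
    if "connection_string" in props:
        return ConnectionString
    if "password_file" in props:
        return ConnectionPasswordFile
    if "password" in props:
        return ConnectionPassword
    raise ValueError(
        "Cannot infer connection type: expected one of "
        "'connection_string', 'password_file', or 'password'"
    )
```

Checking the most specific key first keeps the choice deterministic when a configuration happens to carry more than one authentication field.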
The "Data source type 'X' not available" row claimed an entry-point name mismatch could cause this error. The loader never reads the entry-point name (the prose in §2.6 / §5 already says so); the error fires when __implementation_classes.get(type_name) returns None.
Rewritten to list the real causes, ordered roughly by frequency: package not installed, silent ep.load() import failure, missing model_class= kwarg, or Literal[...] / YAML type: mismatch.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
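The failure path described in this commit can be sketched without any installed plugins. Everything here is a hypothetical stand-in: `FakeEntryPoint` imitates the `load()` method of an entry point, `REGISTRY` replaces the real `__implementation_classes` mapping, and `load_plugins` / `resolve` are illustrative names, not the loader's actual functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class FakeEntryPoint:
    """Hypothetical stand-in for an entry point; only load() matters here."""

    name: str
    loader: Callable[[], object]

    def load(self) -> object:
        return self.loader()


# Maps the YAML `type:` value to an implementation (a plain label in this sketch).
REGISTRY: Dict[str, str] = {}


def load_plugins(entry_points: Iterable[FakeEntryPoint]) -> Dict[str, Exception]:
    """Load every entry point, recording import failures instead of raising."""
    failures: Dict[str, Exception] = {}
    for ep in entry_points:
        try:
            ep.load()  # loading is what triggers registration
        except Exception as exc:
            failures[ep.name] = exc  # silent unless someone inspects this
    return failures


def resolve(type_name: str, failures: Dict[str, Exception]) -> str:
    impl = REGISTRY.get(type_name)
    if impl is None:
        detail = "; ".join(f"{n}: {e!r}" for n, e in failures.items()) or "none recorded"
        raise LookupError(
            f"Data source type '{type_name}' not available (import failures: {detail})"
        )
    return impl


def _register_postgres() -> None:
    REGISTRY["postgres"] = "PostgresDataSourceImpl"


def _broken_import() -> None:
    raise ImportError("No module named 'acme_driver'")


failures = load_plugins(
    [
        FakeEntryPoint("any-name-works", _register_postgres),
        FakeEntryPoint("acme", _broken_import),
    ]
)
```

Note that the first entry point registers `postgres` despite its unrelated name, while the second fails on import and leaves `acme` unregistered, so only the lookup for `acme` raises.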
Pull request overview
Adds three new top-level documentation files aimed at lowering the barrier for contributing to Soda Core v4, both for human contributors and for coding agents. No code changes.
Changes:
- New CONTRIBUTING.md covering setup, style, and PR workflow.
- New CONTRIBUTING-DATA-SOURCE.md as a human-paced guide for adding a new data source.
- New CONTRIBUTING-DATA-SOURCE.AGENT.md as a dense reference for coding agents.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| CONTRIBUTING.md | Top-level contributor onboarding guide. |
| CONTRIBUTING-DATA-SOURCE.md | Step-by-step human guide for adding a new data source. |
| CONTRIBUTING-DATA-SOURCE.AGENT.md | Dense agent-oriented reference covering base classes, overrides, and verification. |
Niels-b left a comment
Skimmed over these documents. LGTM
…ides
CONTRIBUTING-DATA-SOURCE.AGENT.md:
- §2.0 "Research the engine first": a question->override checklist to run before scaffolding
- override dependency-ordering note in §2.4
- escape_string backslash gotcha detail
- §7.1 "Base-class assumptions that don't always hold" punchlist
CONTRIBUTING-DATA-SOURCE.md:
- "Before you start: research your engine" section linking to the agent reference's full checklist
Engine references kept to the set the docs already use (postgres, duckdb, snowflake, databricks); no commit hashes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
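The commit does not spell out the `escape_string` backslash gotcha, so the following is only one plausible reading: some engines treat a backslash inside a single-quoted literal as an escape character, and others treat it as an ordinary character. The `backslash_is_escape` flag and the `string_literal` helper are hypothetical, and this is not the signature of the real `escape_string`.

```python
def escape_string(value: str, backslash_is_escape: bool) -> str:
    """Escape a value for embedding in a single-quoted SQL string literal."""
    if backslash_is_escape:
        # On engines that interpret backslashes, double them so they survive.
        value = value.replace("\\", "\\\\")
    # Doubling the single quote is the standard SQL way to escape it.
    return value.replace("'", "''")


def string_literal(value: str, backslash_is_escape: bool) -> str:
    return "'" + escape_string(value, backslash_is_escape) + "'"
```

Applying the backslash rule to an engine that treats backslashes literally would store doubled backslashes, and omitting it on an engine that interprets them can corrupt the literal, so the flag has to match the engine's documented behavior.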
5d38302 to 74b332d
I have a Claude skill for making new connectors, which lives in my personal dev/journal repo: https://github.com/sodadata/soda-core-dev/blob/main/.claude/skills/new-connector/SKILL.md
I had Claude review your docs against the skill, and they broadly agreed. I added a couple of things that were in the skill but not in your docs and might be helpful. They are already committed and pushed to your PR, so back to you for review.
Summary
Adds three documentation files to lower the bar for contributing to Soda Core v4 — both for human contributors and for coding agents (Claude Code, Cursor, …) that may drive parts of a contribution end-to-end.
- CONTRIBUTING.md: new top-level contributing guide. Covers where to ask questions, what kinds of contributions land where, a brief description of the two extension surfaces (data sources, check types), dev setup, code style, and PR workflow. Open-source only: no references to internal/extension content.
- CONTRIBUTING-DATA-SOURCE.md: human-paced guide to adding a new data source. Structured to mirror the v3 doc but written purely against v4 (DataSourceImpl + SqlDialect + DataSourceConnection, pydantic connection models, entry-point auto-registration via __init_subclass__).
- CONTRIBUTING-DATA-SOURCE.AGENT.md: denser companion intended for coding agents. Catalogues every base class, the required and frequently-overridden hooks, SQL-AST builders, plugin-discovery internals, verification commands, and common failure modes.
The human-paced docs cross-link to the agent reference so contributors (and AI assistants) can navigate between overview and dense detail.
Notes
soda-postgres/, soda-duckdb/, soda-snowflake/, etc.). No mention of any private extension packages.
Test plan
soda-core/source
CONTRIBUTING-DATA-SOURCE.md reads naturally for a new data-source contributor
🤖 Generated with Claude Code