
docs: add CONTRIBUTING.md and data-source contribution guides#2705

Open
m1n0 wants to merge 4 commits into main from docs/contributing-extensibility

m1n0 (Contributor) commented May 14, 2026

Summary

Adds three documentation files to lower the bar for contributing to Soda Core v4 — both for human contributors and for coding agents (Claude Code, Cursor, …) that may drive parts of a contribution end-to-end.

  • CONTRIBUTING.md — new top-level contributing guide. Covers where to ask questions, what kinds of contributions land where, a brief description of the two extension surfaces (data sources, check types), dev setup, code style, and PR workflow. Open-source only — no references to internal/extension content.
  • CONTRIBUTING-DATA-SOURCE.md — human-paced guide to adding a new data source. Structured to mirror the v3 doc but written purely against v4 (DataSourceImpl + SqlDialect + DataSourceConnection, pydantic connection models, entry-point auto-registration via __init_subclass__).
  • CONTRIBUTING-DATA-SOURCE.AGENT.md — denser companion intended for coding agents. Catalogues every base class, the required and frequently-overridden hooks, SQL-AST builders, plugin-discovery internals, verification commands, and common failure modes.
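
The entry-point auto-registration via __init_subclass__ can be pictured with a minimal sketch. All names here (DataSourceImplSketch, PostgresModelSketch, impl_for_yaml_type) are hypothetical, and a plain class stands in for the pydantic connection model to keep the example dependency-free. It assumes, as the commit notes below describe, that the registry is keyed on the model's Literal[...] default and that a subclass registers only when it passes model_class=.

```python
from typing import ClassVar, Literal, Optional


class PostgresModelSketch:
    """Stand-in for a pydantic connection model; only the `type` default matters here."""
    type: Literal["postgres"] = "postgres"


class DataSourceImplSketch:
    """Hypothetical base class standing in for DataSourceImpl."""

    # Registry keyed on the model's Literal[...] default, i.e. the YAML `type:` value.
    _registry: ClassVar[dict] = {}

    def __init_subclass__(cls, model_class: Optional[type] = None, **kwargs):
        super().__init_subclass__(**kwargs)
        if model_class is None:
            # No model_class= kwarg: the subclass is never registered.
            return
        DataSourceImplSketch._registry[model_class.type] = cls

    @classmethod
    def impl_for_yaml_type(cls, type_name: str) -> type:
        impl = DataSourceImplSketch._registry.get(type_name)
        if impl is None:
            raise ValueError(f"Data source type '{type_name}' not available")
        return impl


# Defining (i.e. importing) the subclass is what triggers registration.
class PostgresImplSketch(DataSourceImplSketch, model_class=PostgresModelSketch):
    pass
```

No entry-point name appears anywhere in this sketch: the lookup succeeds only because the YAML type: value matches the Literal default, and a subclass declared without model_class= silently stays out of the registry.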

The human-paced docs cross-link to the agent reference so contributors (and AI assistants) can navigate between overview and dense detail.

Notes

  • All examples reference only OSS packages already in this repo (soda-postgres/, soda-duckdb/, soda-snowflake/, etc.). No mention of any private extension packages.
  • No code changes. Docs only.

Test plan

  • Render the three files on GitHub and verify formatting + cross-links
  • Skim for accuracy: confirm class names, paths, and entry-point patterns match current soda-core/ source
  • Confirm the v3-style structure of CONTRIBUTING-DATA-SOURCE.md reads naturally for a new data-source contributor

🤖 Generated with Claude Code

m1n0 and others added 3 commits May 14, 2026 17:56
Adds top-level CONTRIBUTING.md covering dev setup, code style, PR workflow,
and a brief overview of the two extension surfaces (data sources and check
types). Adds CONTRIBUTING-DATA-SOURCE.md as a human-paced guide to adding
a new data source, mirroring the v3 doc's structure but written purely
against v4 (DataSourceImpl + SqlDialect + DataSourceConnection, pydantic
connection models, entry-point auto-registration). Adds the dense
CONTRIBUTING-DATA-SOURCE.AGENT.md companion intended for coding agents:
every base class, every override, common failure modes, verification
commands.

Both human-readable docs cross-link to the agent reference so contributors
and AI assistants can navigate naturally between overview and dense detail.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Correctness fixes:
- Drop AVG/ from sql_ast cheat sheet (only AVERAGE exists)
- Correct the plugin-discovery story in both docs: the entry-point name
  is not used for matching. Loading the entry point triggers
  __init_subclass__, which keys the registry on the Literal[...] default;
  matching at runtime is YAML type: ↔ Literal[...]. Convention keeps the
  entry-point suffix aligned, but the load-bearing match is elsewhere.
- Drop the brittle MetricImpl line-number range; use a grep hint instead.
- Qualify the pydantic config guidance: top-level models use extra="forbid",
  connection-properties subclasses use extra="allow".
- Fix Postgres connection-variant count: three concrete variants
  (ConnectionString, ConnectionPassword, ConnectionPasswordFile), not two.
  AGENT.md example now shows a multi-auth infer_connection_type.
- Replace "11 working examples" with future-proof phrasing.
- Acknowledge the MissingAncValidityCheckYaml typo inline so readers don't
  second-guess the doc (impl base is correctly MissingAndValidityCheckImpl).
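
A minimal sketch of the pydantic guidance above (extra="forbid" on the top-level model, extra="allow" on connection-properties subclasses) together with a multi-auth infer_connection_type. It assumes pydantic v2 (ConfigDict). The class names and the key-presence rule are hypothetical, loosely modeled on the three Postgres variants; the real infer_connection_type may decide differently.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict


class ConnectionPropertiesSketch(BaseModel):
    # Connection-properties subclasses: extra="allow" so driver-specific keys pass through.
    model_config = ConfigDict(extra="allow")


class ConnectionStringSketch(ConnectionPropertiesSketch):
    connection_string: str


class ConnectionPasswordSketch(ConnectionPropertiesSketch):
    host: str
    user: str
    password: str


class ConnectionPasswordFileSketch(ConnectionPropertiesSketch):
    host: str
    user: str
    password_file: str


def infer_connection_type_sketch(data: dict) -> type:
    """Pick a variant from which keys are present (hypothetical rule)."""
    if "connection_string" in data:
        return ConnectionStringSketch
    if "password_file" in data:
        return ConnectionPasswordFileSketch
    if "password" in data:
        return ConnectionPasswordSketch
    raise ValueError("Cannot infer connection type from keys: " + ", ".join(sorted(data)))


class DataSourceYamlSketch(BaseModel):
    # Top-level model: extra="forbid" so misspelled YAML keys fail loudly.
    model_config = ConfigDict(extra="forbid")
    type: Literal["postgres"] = "postgres"
    name: str
    connection: dict

    def connection_properties(self) -> ConnectionPropertiesSketch:
        variant = infer_connection_type_sketch(self.connection)
        return variant(**self.connection)
```

The asymmetry is deliberate: a typo at the top level is rejected, while unknown connection keys (for example a driver option such as sslmode) survive in model_extra and can be forwarded to the driver.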

Completeness:
- Add SodaDataTypeName location + an enumeration one-liner.
- Add an ASCII runtime-flow diagram at the top of the AGENT doc.
- Add a "what not to commit" mini-section and a version-matching one-liner
  (grep -m1 '^version' soda-core/pyproject.toml).
- Clarify pip fallback list installs only the packages you name, with an
  inline note to add -e soda-{driver} for every driver you need.
- Add "when NOT to use base X" notes to the check-type section.
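
The version-matching one-liner can also be done from Python, for instance to compare soda-core's version with a new driver package's. A minimal sketch with hypothetical function names; like grep -m1 '^version', it returns the first line starting with version rather than parsing TOML.

```python
from pathlib import Path
from typing import Optional


def first_version_line(pyproject: Path) -> Optional[str]:
    """Return the first line starting with 'version', like grep -m1 '^version'."""
    with pyproject.open(encoding="utf-8") as fh:
        for line in fh:
            if line.startswith("version"):
                return line.rstrip("\n")
    return None


def version_value(line: str) -> str:
    """Extract the quoted value from a line such as: version = "1.2.3"."""
    _, _, rhs = line.partition("=")
    return rhs.strip().strip('"').strip("'")
```

Comparing version_value(first_version_line(Path("soda-core/pyproject.toml"))) with the same call on your driver's pyproject.toml mirrors the manual grep check.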

Doc-practice:
- Add a 10-line TOC to the AGENT doc (~670 lines now).
- Mirror the human guide's "end-to-end" framing in the AGENT preamble.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The "Data source type 'X' not available" row claimed an entry-point name
mismatch could cause this error. The loader never reads the entry-point
name (the prose in §2.6 / §5 already says so); the error fires when
__implementation_classes.get(type_name) returns None. Rewritten to list
the real causes, ordered roughly by frequency: package not installed,
silent ep.load() import failure, missing model_class= kwarg, or
Literal[...] / YAML type: mismatch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Adds three new top-level documentation files aimed at lowering the barrier for contributing to Soda Core v4, both for human contributors and for coding agents. No code changes.

Changes:

  • New CONTRIBUTING.md covering setup, style, and PR workflow.
  • New CONTRIBUTING-DATA-SOURCE.md as a human-paced guide for adding a new data source.
  • New CONTRIBUTING-DATA-SOURCE.AGENT.md as a dense reference for coding agents.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| CONTRIBUTING.md | Top-level contributor onboarding guide. |
| CONTRIBUTING-DATA-SOURCE.md | Step-by-step human guide for adding a new data source. |
| CONTRIBUTING-DATA-SOURCE.AGENT.md | Dense agent-oriented reference covering base classes, overrides, and verification. |

Niels-b (Contributor) left a comment

Skimmed over these documents. LGTM

…ides

CONTRIBUTING-DATA-SOURCE.AGENT.md:
- §2.0 "Research the engine first" — a question->override checklist to run
  before scaffolding
- override dependency-ordering note in §2.4
- escape_string backslash gotcha detail
- §7.1 "Base-class assumptions that don't always hold" punchlist
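
A minimal sketch of the escape_string backslash gotcha. The doc refers to an escape_string override; the functions below are hypothetical stand-ins, not soda-core's API. In standard SQL only the single quote needs escaping (by doubling it), and PostgreSQL with its default standard_conforming_strings = on treats backslash literally. Engines that interpret backslash escape sequences in single-quoted literals (Snowflake and Databricks/Spark SQL do, as we understand it; confirm against each engine's docs) also need backslashes doubled.

```python
def escape_string_ansi(value: str) -> str:
    """Standard SQL: only the single quote needs escaping, by doubling it."""
    return value.replace("'", "''")


def escape_string_backslash_aware(value: str) -> str:
    """For engines where backslash is an escape character inside string literals."""
    return value.replace("\\", "\\\\").replace("'", "''")


def sql_literal(value: str, backslash_escapes: bool) -> str:
    """Render `value` as a single-quoted SQL string literal for the given engine behavior."""
    if backslash_escapes:
        escaped = escape_string_backslash_aware(value)
    else:
        escaped = escape_string_ansi(value)
    return f"'{escaped}'"
```

Using the ANSI variant on a backslash-escaping engine is where things break: a value like C:\temp comes back with a tab in it, and a value ending in a single backslash escapes the closing quote and produces invalid SQL.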

CONTRIBUTING-DATA-SOURCE.md:
- "Before you start: research your engine" section linking to the agent
  reference's full checklist

Engine references kept to the set the docs already use
(postgres, duckdb, snowflake, databricks); no commit hashes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

paulteehan force-pushed the docs/contributing-extensibility branch from 5d38302 to 74b332d on May 21, 2026 16:41

paulteehan (Contributor) left a comment

I have a Claude skill for making new connectors which lives here in my personal dev/journal repo: https://github.com/sodadata/soda-core-dev/blob/main/.claude/skills/new-connector/SKILL.md

I had Claude review your docs against the skill, and they broadly agreed. I added a couple of things that were in the skill but not in your docs and might be helpful. They are already committed and pushed to your PR, so it's back to you for review.

Labels

None yet

Projects

None yet

4 participants