[BETA] feat(nodespec): the single unified dataflow spec (replaces standard/flow/materialized_view) by haillew · Pull Request #104 · databricks-solutions/lakeflow_framework

haillew · 2026-06-26T07:24:50Z

READ FIRST: nodespec is NOT a fourth spec type

nodespec is the single, unified dataflow spec that replaces standard, flow, and materialized_view going forward.

This is not a fourth option to pick from. It is the one and only spec every new pipeline should use. The three legacy formats are kept working only for backward compatibility during migration and are on a path to be retired. If you are writing a new pipeline: write nodespec.

What this PR adds (BETA)

A single, unified, node-based dataflow spec — nodespec — that supersedes the standard, flow, and materialized-view formats.

A nodespec is a graph of nodes that chain together:

source  ->  transformation  ->  target

source — where data comes from (table, files, stream)
transformation — how data is reshaped (SQL or Python)
target — where data lands, carrying its own table-level settings (CDC, data quality, quarantine, clustering, …)

Targets declare what feeds them via an explicit input list; sources and transformations reference their upstream within their own definition. The node graph lowers into the framework's existing flow-spec representation, so no engine changes are required and every current capability (CDC, snapshots, data quality, quarantine, sinks, table migration, materialized views) is preserved. A single nodespec can mix streaming-table and materialized-view targets, including chains where one feeds the other.
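To make the wiring concrete, here is a minimal sketch of a three-node chain and how an evaluation order could be derived from it. The spec is written as a Python dict purely for illustration, and every key name (`nodes`, `type`, `upstream`, `input`) is an assumption rather than the framework's actual schema; see the spec reference for the real field names.

```python
from graphlib import TopologicalSorter

# Hypothetical nodespec-like structure. Key names are assumptions made for
# this sketch, not the framework's real schema.
spec = {
    "nodes": {
        "orders_raw": {"type": "source"},
        "orders_clean": {"type": "transformation", "upstream": "orders_raw"},
        "orders": {"type": "target", "input": ["orders_clean"]},
    }
}


def upstream_of(node: dict) -> list:
    """Return the names of the nodes that feed this node."""
    if node["type"] == "target":
        # Targets list their feeders explicitly.
        return list(node.get("input", []))
    # Sources and transformations name their upstream in their own definition.
    upstream = node.get("upstream")
    return [upstream] if upstream else []


def evaluation_order(spec: dict) -> list:
    """Order nodes so that each node comes after the nodes feeding it."""
    nodes = spec["nodes"]
    graph = {name: upstream_of(node) for name, node in nodes.items()}
    unknown = {dep for deps in graph.values() for dep in deps} - set(nodes)
    if unknown:
        raise ValueError(f"undefined upstream nodes: {sorted(unknown)}")
    # TopologicalSorter expects a mapping of node -> predecessors.
    return list(TopologicalSorter(graph).static_order())


order = evaluation_order(spec)
```

For this linear chain the resulting order is source, then transformation, then target. A reference to a node that is not defined raises an error instead of being silently ignored.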

Legacy formats continue to work, and scripts/migrate_to_nodespec.py converts existing standard/flow/materialized_view specs.

Why

The three separate formats force users to learn three layouts, leak engine internals (flow groups, view registration, staging tables), nest settings deeply, hide pipeline topology, and force streaming tables and materialized views into separate files. nodespec collapses all of that into one readable, chainable model — easier for newcomers writing their first pipeline and for large teams reading and editing each other's pipelines.

Details

See the design decision record: docs/decisions/0007-unified-nodespec-dataflow-spec.md (full rationale, before/after examples, and the key decisions).

Introduce the nodester dataflow spec: a single node-graph format (source -> transformation -> target) that replaces the separate standard, flow, and materialized-view formats. The transformer lowers a node graph into the framework's existing flow spec, so all current capabilities (CDC, snapshots, data quality, quarantine, sinks, table migration, materialized views) are preserved.

Highlights:
- Target nodes wire inputs via an `input` list; each item is a node name (auto flow name) or `{view, flow}` to define the SDP flow name explicitly and keep it stable across edits (renaming a flow forces a full refresh).
- A single spec may contain both streaming-table and materialized-view targets, including chains where one feeds the other.
- Python transformation nodes become their own view that applies apply_transform to their inferred upstream; inline python_transform on a source is still supported for backward compatibility.
- Inline SQL/Python sources (and append_sql) remain supported but warn, in both the nodester and legacy formats; the recommended alternative is a dedicated transformation node.
- Materialized view targets no longer accept an inline source_view (breaking); chain a source node via `input` instead.
- All 38 nodester samples updated and verified end to end on Databricks.
- Adds ADR-0007 and a rewritten nodester spec reference.

Co-authored-by: Isaac
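The two accepted shapes of an `input` item (a bare node name, or `{view, flow}` with an explicit flow name) can be sketched as a small normalizer. This is illustrative only: `InputRef` and `normalize_input` are hypothetical names, and how the framework derives the automatic flow name is not shown here, so the sketch represents it as `None`.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class InputRef:
    """Hypothetical normalized form of one item in a target's input list."""
    view: str
    flow: Optional[str]  # None means "leave the flow name to the framework"


def normalize_input(item: Union[str, dict]) -> InputRef:
    """Accept either a node name or a {view, flow} mapping."""
    if isinstance(item, str):
        # Bare node name: the flow name is chosen automatically.
        return InputRef(view=item, flow=None)
    if isinstance(item, dict) and set(item) == {"view", "flow"}:
        # Explicit flow name: stays stable across edits to the spec.
        return InputRef(view=item["view"], flow=item["flow"])
    raise ValueError(f"unsupported input item: {item!r}")


refs = [
    normalize_input("orders_clean"),
    normalize_input({"view": "orders_clean", "flow": "f_orders"}),
]
```

Pinning the flow name is the safer choice for long-lived targets, since renaming a flow forces a full refresh.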

The local .claude directory (Claude Code commands/settings) should not be part of the repository. Add .claude/ to .gitignore and remove the previously tracked command files from version control. The files remain on disk locally (now ignored), so they persist across branch switches. Co-authored-by: Isaac

Move the feature-test GitHub workflow and the pattern-samples validation notebooks (validate_run_1..4 + validation_utils) out of this branch. They are kept on a local-only branch and intentionally not published upstream. Co-authored-by: Isaac

Rename the spec type, schema (spec_nodespec.json + nodespecSpec), transformer (NodespecSpecTransformer), migration script, sample bundle (nodespec_sample), pipelines, docs, and the data_flow_type value from "nodester" to "nodespec". No behavioural change — purely a rename. Co-authored-by: Isaac

Rewrite the nodespec transformer around a single snake_case -> camelCase key map plus flat/recursive converters, replacing the per-context allowlist maps and per-source-type branches. Builders now copy every non-structural config key as passthrough detail. Output is unchanged (verified identical across all 38 samples and end to end on Databricks); ~593 -> 440 lines. Co-authored-by: Isaac
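As a rough illustration of the "single key map plus flat/recursive converters" idea, the sketch below renames known snake_case keys and copies everything else through unchanged. It is not the transformer's actual code: `KEY_MAP`, its entries, and both function names are assumptions chosen for the example.

```python
# Hypothetical key map; the real transformer's entries may differ.
KEY_MAP = {
    "source_view": "sourceView",
    "data_quality": "dataQuality",
    "cluster_by": "clusterBy",
}


def convert_flat(config: dict) -> dict:
    """Rename top-level keys only; unmapped keys pass through as-is."""
    return {KEY_MAP.get(key, key): value for key, value in config.items()}


def convert_recursive(value):
    """Rename keys at every nesting level, descending into dicts and lists."""
    if isinstance(value, dict):
        return {KEY_MAP.get(k, k): convert_recursive(v) for k, v in value.items()}
    if isinstance(value, list):
        return [convert_recursive(v) for v in value]
    return value


node = {
    "source_view": "orders_clean",
    "comment": "passthrough detail",
    "options": {"cluster_by": ["order_id"]},
}
flat = convert_flat(node)
deep = convert_recursive(node)
```

With one shared map, supporting a new config key means adding a single entry (or nothing at all, if the key can pass through untouched) rather than editing a per-context allowlist.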

haillew added 3 commits June 21, 2026 19:20

update json schema for nodester

9c9c74a

haillew requested a review from rederik76 as a code owner June 26, 2026 07:24

haillew added 2 commits June 26, 2026 17:30

haillew changed the title ~~[BETA] feat(nodester): Unified dataflow spec~~ [BETA] feat(nodespec): the single unified dataflow spec (replaces standard/flow/materialized_view) Jun 26, 2026

haillew linked an issue Jun 26, 2026 that may be closed by this pull request

[FEATURE]: nodespec — the single unified dataflow spec (replaces standard/flow/materialized_view) #102

Open

1 task

haillew wants to merge 6 commits into main from feature/uniform-dataflow-spec-poc
