databricks-solutions · liamperritt · May 27, 2026
diff --git a/VERSION b/VERSION
@@ -1 +1 @@
-v0.15.3
+v0.16.0
diff --git a/pipeline_bundle_template/README.md b/pipeline_bundle_template/README.md
@@ -1,54 +1,125 @@
-# bronze_sample
+# `pipeline_bundle_template` — Databricks Asset Bundle custom template
 
-The 'bronze_sample' project was generated by using the default-python template.
+This folder is a [DAB custom template][custom-templates] for scaffolding new Lakeflow Framework
+pipeline bundles. End users **don't edit files here** — they run `databricks bundle init` against
+this folder and get a new bundle populated from their answers.
 
-## Prerequisites:
-1. Execute the setup_data Notebook once bundle is deployed, to setup the Staging source tables and data.
+[custom-templates]: https://docs.databricks.com/aws/en/dev-tools/bundles/templates#custom-templates
 
-## Getting started
+## Initializing a new bundle
 
-1. Update the databricks.yml file with appropriate details (line 4 and line 23 and 25).
+From the repo root:
 
-1. Update the pipelines yml's in the resources folder accordingly:
-   - Change schemas.
+```bash
+databricks bundle init ./pipeline_bundle_template --output-dir /path/to/output
+```
 
-1. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/databricks-cli.html
+Or against this folder hosted at a Git URL:
 
-1. Authenticate to your Databricks workspace, if you have not done so already:
-    ```
-    $ databricks configure
-    ```
+```bash
+databricks bundle init https://github.com/liamperritt/lakeflow_framework --template-dir pipeline_bundle_template
+```
 
-1. To deploy a development copy of this project, type:
-    ```
-    $ databricks bundle deploy --target dev
-    ```
-    (Note that "dev" is the default target, so the `--target` parameter
-    is optional here.)
+The CLI will prompt for the values declared in `databricks_template_schema.json` (see below)
+and emit a new bundle under `<output-dir>/<project_name>/`.
 
-    This deploys everything that's defined for this project.
-    For example, the default template would deploy a job called
-    `[dev yourname] silver_ar_job` to your workspace.
-    You can find that job by opening your workpace and clicking on **Workflows**.
+Requires Databricks CLI `>= 0.218.0`.
 
-1. Similarly, to deploy a production copy, type:
-   ```
-   $ databricks bundle deploy --target prod
-   ```
+## Folder layout
 
-   Note that the default job from the template has a schedule that runs every day
-   (defined in resources/silver_ar_job.yml). The schedule
-   is paused when deploying in development mode (see
-   https://docs.databricks.com/dev-tools/bundles/deployment-modes.html).
+```
+pipeline_bundle_template/
+├── databricks_template_schema.json     # prompt definitions
+└── template/                           # Go-templated source tree
+    └── {{.project_name}}/              # root folder is named from the project_name prompt
+        ├── databricks.yml.tmpl
+        ├── README.md.tmpl
+        ├── .skip.tmpl                  # conditional file-skip rules
+        ├── resources/
+        │   └── {{.pipeline_name}}_pipeline.yml.tmpl
+        └── src/
+            ├── dataflows/{{.pipeline_name}}/
+            │   ├── dataflowspec/[flow]{{.example_target_table}}_main.json.tmpl
+            │   ├── dataflowspec/[standard]{{.example_target_table}}_main.json.tmpl
+            │   ├── schemas/{{.example_target_table}}_schema.json
+            │   └── expectations/{{.example_target_table}}_dqe.json
+            └── pipeline_configs/dev_substitutions.json.tmpl
+```
 
-1. To run a job or pipeline, use the "run" command:
-   ```
-   $ databricks bundle run
-   ```
+The Databricks CLI renders the tree under `template/` with Go's `text/template` engine. Path
+segments are always substituted. Files with a `.tmpl` suffix also have their contents substituted
+and the suffix stripped; files without it are copied verbatim.
 
-1. Optionally, install developer tools such as the Databricks extension for Visual Studio Code from
-   https://docs.databricks.com/dev-tools/vscode-ext.html.
+## Prompts (`databricks_template_schema.json`)
 
-1. For documentation on the Databricks asset bundles format used
-   for this project, and for CI/CD configuration, see
-   https://docs.databricks.com/dev-tools/bundles/index.html.
+| Property | Type | Default | Purpose |
+|---|---|---|---|
+| `project_name` | string | `my_project` | bundle name + output root folder |
+| `pipeline_name` | string | `{{.project_name}}` | first pipeline; drives `resources/*.yml` and `src/dataflows/*` folder names |
+| `layer` | enum (bronze/silver/gold) | `bronze` | medallion layer; baked into `layer` DAB variable default |
+| `catalog` | string | `main` | UC catalog; baked into `catalog` DAB variable default |
+| `schema` | string | `{{.project_name}}` | UC schema; baked into `schema` DAB variable default |
+| `include_example_dataflows` | enum (yes/no) | `yes` | if `no`, `.skip.tmpl` omits the `src/dataflows/{{.pipeline_name}}` folder |
+| `example_target_table` | string | `my_target_table` | (skipped if no examples) target table; drives `dataFlowId`, `flowGroupId`, filenames |
+| `example_source_table` | string | `my_source_table` | (skipped if no examples) upstream source table |
+| `source_catalog` | string | `{{.catalog}}` | (skipped if no examples) pre-populated into `dev_substitutions.json` as the `SOURCE_CAT_SCHEMA` token |
+| `source_schema` | string | `{{.schema}}` | (skipped if no examples) pre-populated into `dev_substitutions.json` |
+
+## What gets derived vs. what stays as scaffolding
+
+Every single-value placeholder in the source dataflow JSON files is **derived** from the prompts
+above (no extra typing). For example, in the rendered `[flow]<target>_main.json`:
+- `dataFlowId` = `<example_target_table>_flow`
+- `dataFlowGroup` = `<pipeline_name>`
+- `flowGroupId` = `fg_<example_target_table>`
+- `view` key = `v_<example_source_table>`
+- `sourceDetails.database` = `{SOURCE_CAT_SCHEMA}` (resolved at pipeline runtime via `dev_substitutions.json`)
+
+A few values are **hardcoded sensible defaults** the user edits if their data source differs:
+- `sourceType` = `delta`
+- `quarantineMode` = `off`
+
+A few variable-length lists **stay as literal `<...>` scaffolding** because they can't be cleanly
+prompted (the count varies):
+- Schema fields in `{{.example_target_table}}_schema.json`
+- DQE constraints in `{{.example_target_table}}_dqe.json`
+- `selectExp` column list in `[standard]{{.example_target_table}}_main.json`
+- Extra tokens / `prefix_suffix` entries in `dev_substitutions.json`
+
+## Extending the template
+
+To add a new prompt:
+
+1. Add a property entry to `databricks_template_schema.json` (set `type`, `description`, `default`,
+   `order`, plus optional `enum`, `pattern`, `pattern_match_failure_message`, `skip_prompt_if`).
+2. Reference it in any `.tmpl` file as `{{.your_new_property}}`.
+3. Test with `databricks bundle init ./pipeline_bundle_template --output-dir /tmp/init-test` and
+   inspect the generated bundle.
+
+To conditionally skip files based on user answers, extend `template/{{.project_name}}/.skip.tmpl`:
+
+```
+{{- if eq .some_property "value" -}}
+{{ skip (printf "path/to/%s" .other_property) }}
+{{- end -}}
+```
+
+The `skip` function takes a glob pattern relative to `template/{{.project_name}}/`. To compose
+paths from other properties, use Go template's `printf` — `{{...}}` inside string literals is
+**not** re-processed.
+
+## Verification (manual)
+
+```bash
+# Init with examples
+databricks bundle init ./pipeline_bundle_template --output-dir /tmp/test-init
+
+# Validate
+cd /tmp/test-init/<project_name>
+databricks bundle validate --target dev
+
+# Init without examples (verify skip path)
+databricks bundle init ./pipeline_bundle_template --output-dir /tmp/test-init-skip
+# answer 'no' to include_example_dataflows
+# confirm src/dataflows/<pipeline_name>/ is absent
+```
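The derivation rules listed under "What gets derived vs. what stays as scaffolding" can be sketched in Python as a quick sanity check on a rendered bundle. This is an illustration only: `derive_ids` is a hypothetical helper that is not part of the template, and the actual substitution is done by the CLI's Go templates, not by this code.

```python
def derive_ids(pipeline_name: str, target_table: str, source_table: str) -> dict:
    """Hypothetical helper mirroring the README's derivation rules."""
    return {
        "dataFlowId": f"{target_table}_flow",
        "dataFlowGroup": pipeline_name,
        "flowGroupId": f"fg_{target_table}",
        "view": f"v_{source_table}",
        # Left as a literal token; the README says it is resolved at
        # pipeline runtime via dev_substitutions.json.
        "sourceDetails.database": "{SOURCE_CAT_SCHEMA}",
    }


# Using the prompt defaults from databricks_template_schema.json.
expected = derive_ids("my_project", "my_target_table", "my_source_table")
for key, value in expected.items():
    print(f"{key} = {value}")
```

Comparing these values against the rendered `[flow]<target>_main.json` is one way to confirm that a template change did not break a derived field.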
diff --git a/pipeline_bundle_template/databricks.yml b/pipeline_bundle_template/databricks.yml
diff --git a/pipeline_bundle_template/databricks_template_schema.json b/pipeline_bundle_template/databricks_template_schema.json
@@ -0,0 +1,93 @@
+{
+    "welcome_message": "\nWelcome to the Lakeflow Framework pipeline bundle template.\n\nYou'll be prompted for a few details to scaffold a new pipeline bundle.\nDefaults are provided in [brackets]; press Enter to accept them.\n",
+    "properties": {
+        "project_name": {
+            "type": "string",
+            "description": "Project Name (used as the DAB bundle name and the root folder of the generated project)",
+            "default": "my_project",
+            "order": 1,
+            "pattern": "^[a-z][a-z0-9_]{2,}$",
+            "pattern_match_failure_message": "Project name must start with a lowercase letter and contain only lowercase letters, digits, and underscores (minimum 3 characters)."
+        },
+        "pipeline_name": {
+            "type": "string",
+            "description": "Pipeline Name (used in the initial pipeline resource yml filename and as the dataflow group folder under src/dataflows/)",
+            "default": "{{.project_name}}",
+            "order": 2,
+            "pattern": "^[a-z][a-z0-9_]+$",
+            "pattern_match_failure_message": "Pipeline name must start with a lowercase letter and contain only lowercase letters, digits, and underscores."
+        },
+        "layer": {
+            "type": "string",
+            "description": "Layer (medallion layer for this bundle's pipeline)",
+            "enum": ["bronze", "silver", "gold"],
+            "default": "bronze",
+            "order": 3
+        },
+        "catalog": {
+            "type": "string",
+            "description": "Catalog (target Unity Catalog catalog for this bundle's outputs - baked into the catalog DAB variable default)",
+            "default": "main",
+            "order": 4
+        },
+        "schema": {
+            "type": "string",
+            "description": "Schema (target Unity Catalog schema for this bundle's outputs - baked into the schema DAB variable default)",
+            "default": "{{.project_name}}",
+            "order": 5
+        },
+        "include_example_dataflows": {
+            "type": "string",
+            "description": "Include Example Dataflow? (recommended for new users)",
+            "enum": ["yes", "no"],
+            "default": "yes",
+            "order": 6
+        },
+        "example_target_table": {
+            "type": "string",
+            "description": "Example Target Table (name of the target table this example dataflow produces - drives dataFlowId, flowGroupId, filenames, etc.)",
+            "default": "my_target_table",
+            "order": 7,
+            "skip_prompt_if": {
+                "properties": {
+                    "include_example_dataflows": { "const": "no" }
+                }
+            }
+        },
+        "example_source_table": {
+            "type": "string",
+            "description": "Example Source Table (name of the upstream source table the example dataflow reads from)",
+            "default": "my_source_table",
+            "order": 8,
+            "skip_prompt_if": {
+                "properties": {
+                    "include_example_dataflows": { "const": "no" }
+                }
+            }
+        },
+        "source_catalog": {
+            "type": "string",
+            "description": "Source Catalog (Unity Catalog catalog where the example_source_table lives - pre-populated into dev_substitutions.json so the bundle works without manual edits)",
+            "default": "{{.catalog}}",
+            "order": 9,
+            "skip_prompt_if": {
+                "properties": {
+                    "include_example_dataflows": { "const": "no" }
+                }
+            }
+        },
+        "source_schema": {
+            "type": "string",
+            "description": "Source Schema (Unity Catalog schema where the example_source_table lives - pre-populated into dev_substitutions.json)",
+            "default": "{{.schema}}",
+            "order": 10,
+            "skip_prompt_if": {
+                "properties": {
+                    "include_example_dataflows": { "const": "no" }
+                }
+            }
+        }
+    },
+    "success_message": "\nProject '{{.project_name}}' created.\n\nNext steps:\n  cd {{.project_name}}\n  databricks bundle validate --target dev\n  databricks bundle deploy --target dev\n\nWhat's left for you to fill in (the variable-length scaffolding):\n  - src/dataflows/{{.pipeline_name}}/schemas/{{.example_target_table}}_schema.json.example\n      (replace the <FIELD NAME> / <FIELD TYPE> placeholders with your actual table columns, then remove the '.example' file suffix)\n  - src/dataflows/{{.pipeline_name}}/expectations/{{.example_target_table}}_dqe.json.example\n      (define your data quality constraints then remove the '.example' file suffix, or delete the file if not needed)\n  - src/pipeline_configs/dev_substitutions.json\n      (the SOURCE_CAT_SCHEMA token is already wired up; add more tokens here if you need them)\n  - src/pipeline_configs/prod_substitutions.json.example\n      (add prefix/suffix config and include more tokens here if you need them, then remove the '.example' file suffix)\n\nThe framework_source_path default in databricks.yml assumes the Lakeflow Framework's\n'dev' target is deployed. Override per-environment in your DAB targets if needed.\n",
+    "min_databricks_cli_version": "v0.218.0"
+}
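The two `pattern` values in the schema differ only in minimum length, which is easy to miss. A minimal sketch of how they behave, assuming Python's `re` module treats these simple patterns the same way the CLI's regex engine does (the `is_valid` helper is hypothetical):

```python
import re

# Patterns copied from databricks_template_schema.json.
PROJECT_NAME = re.compile(r"^[a-z][a-z0-9_]{2,}$")
PIPELINE_NAME = re.compile(r"^[a-z][a-z0-9_]+$")


def is_valid(pattern: re.Pattern, value: str) -> bool:
    """Hypothetical helper: True if the whole value matches the pattern."""
    return pattern.fullmatch(value) is not None


for candidate in ["my_project", "ab", "1abc", "My_Project"]:
    print(
        candidate,
        "project:", is_valid(PROJECT_NAME, candidate),
        "pipeline:", is_valid(PIPELINE_NAME, candidate),
    )
```

Because `pipeline_name` defaults to `{{.project_name}}` and its pattern is the looser of the two, any accepted project name is also an acceptable pipeline name under these patterns.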
diff --git a/pipeline_bundle_template/fixtures/.gitkeep b/pipeline_bundle_template/fixtures/.gitkeep
diff --git a/...e_bundle_template/src/dataflows/PIPELINE_NAME_1/dataflowspec/[flow]TARGET TABLE_main.json b/...e_bundle_template/src/dataflows/PIPELINE_NAME_1/dataflowspec/[flow]TARGET TABLE_main.json
diff --git a/...ndle_template/src/dataflows/PIPELINE_NAME_1/dataflowspec/[standard]TARGET TABLE_main.json b/...ndle_template/src/dataflows/PIPELINE_NAME_1/dataflowspec/[standard]TARGET TABLE_main.json