4 changes: 4 additions & 0 deletions Dockerfile
@@ -44,5 +44,9 @@ COPY --chown=ksml:0 build-output/NOTICE.txt /licenses/THIRD-PARTY-LICENSES.txt
COPY --chown=ksml:0 build-output/LICENSE.txt /licenses/LICENSE.txt
COPY --chown=ksml:0 build-output/libs/ /opt/ksml/libs/
COPY --chown=ksml:0 build-output/ksml-runner*.jar /opt/ksml/ksml.jar
COPY --chown=ksml:0 build-output/ksml-test-runner*.jar /opt/ksml/ksml-test.jar

# Default entrypoint runs the KSML pipeline runner.
# To run the test runner instead, override the entrypoint:
# docker run --entrypoint java <image> -Djava.security.manager=allow -jar /opt/ksml/ksml-test.jar /tests/my-test.yaml
ENTRYPOINT ["java", "-Djava.security.manager=allow", "-jar", "/opt/ksml/ksml.jar"]
9 changes: 6 additions & 3 deletions build-local-docker.sh
@@ -5,15 +5,18 @@

set -e # Exit on any error

mvn package -DskipITs=true
mvn clean package -DskipITs=true

# Prepare build artifacts
echo " - Creating build-output/ directory"
echo " - Copying ksml-runner JAR, libraries, and license files"
echo " - Copying ksml-runner JAR, ksml-test-runner JAR, libraries, and license files"

mkdir -p build-output
cp ksml-runner/target/ksml-runner*.jar build-output/
cp ksml-test-runner/target/ksml-test-runner*.jar build-output/
cp -r ksml-runner/target/libs build-output/
# Copy test-runner libs on top, so both manifests find the JARs they reference
cp ksml-test-runner/target/libs/*.jar build-output/libs/
cp ksml-runner/NOTICE.txt build-output/
cp LICENSE.txt build-output/
GRAALVM_JDK_VERSION=${GRAALVM_JDK_VERSION:-23.0.2}
@@ -42,4 +45,4 @@ docker buildx build \
-t axual/ksml:local \
--target ksml \
-f Dockerfile \
.
.
87 changes: 79 additions & 8 deletions docs/getting-started/schema-validation.md
@@ -1,12 +1,12 @@
# KSML Schema Validation

KSML provides two JSON Schema specification files that enable IDE validation, autocompletion, and error checking for your KSML files. These schemas help you write correct syntax and catch errors early in development.
KSML provides three JSON Schema specification files that enable IDE validation, autocompletion, and error checking for your KSML files. These schemas help you write correct syntax and catch errors early in development.

The KSML project is available at: [https://github.com/Axual/ksml](https://github.com/Axual/ksml)

## Understanding KSML Schemas

KSML provides **two separate JSON schemas** for different types of configuration files:
KSML provides **three separate JSON schemas** for different types of configuration files:

### 1. KSML Language Specification Schema

@@ -50,6 +50,24 @@ This schema validates the runner configuration that controls how KSML executes:
%}
```

### 3. KSML Test Definition Schema

**File:** `docs/ksml-test-spec.json`
**Validates:** KSML test definition files used by the [test runner](testing-your-pipeline.md)

This schema validates test definitions that describe how to test your pipelines:

- Test metadata (`name`, `pipeline`, `schemaDirectory`)
- Produce blocks with topic, key/value types, and inline messages
- Assert blocks with output topic, state store, and Python assertion code
- Required vs optional fields and default values (e.g. `keyType` defaults to `string`)

??? info "Example KSML test definition file"

```yaml
--8<-- "sample-filter-test.yaml"
```

### Schema Benefits

Using KSML schemas in your IDE provides:
@@ -99,6 +117,19 @@ kafka:
# ... rest of your configuration
```

#### For KSML Test Definition Files

Add this line at the top of your test YAML files:

```yaml
# yaml-language-server: $schema=https://axual.github.io/ksml/latest/ksml-test-spec.json

test:
name: "My pipeline test"
pipeline: my-pipeline.yaml
# ... rest of your test definition
```

#### How Inline Schemas Work

The special comment `# yaml-language-server: $schema=URL` tells your IDE:
@@ -164,7 +195,20 @@ Configure validation for KSML Runner configuration files:
- **For specific files**: `ksml-runner.yaml`, `application.yaml`
- **For file patterns**: `ksml-runner*.yaml`, `*-runner.yaml`

**Important**: Make sure the file patterns for each schema don't overlap. KSML definition files should map to the Language Specification schema, while runner configuration files should map to the Runner Configuration schema.
**Step 4: Add KSML Test Definition Schema**

Configure validation for KSML test definition files:

1. Click the **+** (plus) button again to add another schema mapping
2. Configure the mapping:
- **Name**: `KSML Test Definition`
- **Schema file or URL**: Browse to `docs/ksml-test-spec.json` in your KSML project directory
- **Schema version**: Select **JSON Schema version 7**

3. Add file mappings by clicking **+** in the mappings section:
- **For file patterns**: `*-test.yaml`, `test-*.yaml`

**Important**: Make sure the file patterns for each schema don't overlap. KSML definition files should map to the Language Specification schema, runner configuration files to the Runner Configuration schema, and test files to the Test Definition schema.

#### Visual Studio Code Setup

@@ -200,6 +244,10 @@ You need to map different file patterns to the appropriate schema.
"**/ksml-runner.yaml",
"**/*-runner.yaml",
"**/application.yaml"
],
"file:///path/to/ksml/docs/ksml-test-spec.json": [
"**/*-test.yaml",
"**/test-*.yaml"
]
}
}
@@ -221,6 +269,10 @@ Create a `.vscode/settings.json` file in your project root for project-specific
"./docs/ksml-runner-spec.json": [
"**/ksml-runner.yaml",
"examples/**/ksml-runner.yaml"
],
"./docs/ksml-test-spec.json": [
"**/*-test.yaml",
"**/test-*.yaml"
]
}
}
@@ -307,11 +359,21 @@ To generate both the KSML Language Specification and Runner Configuration schema
mvn package -DskipTests -pl ksml-runner -am
```

This builds the module with its dependencies and generates both schema files:
This builds the module with its dependencies and generates the pipeline and runner schema files:

- `docs/ksml-language-spec.json` for KSML definitions
- `docs/ksml-runner-spec.json` for runner configuration

To also generate the test definition schema:

```bash
mvn package -DskipTests -pl ksml-test-runner -am
```

This generates:

- `docs/ksml-test-spec.json` for test definitions

**Note:** The `-am` (also-make) flag is required to build all dependencies needed for schema generation.

### Generating Individual Schemas
@@ -336,9 +398,10 @@ java -jar ksml-runner/target/ksml-runner-*.jar --runner-schema docs/ksml-runner-

Schemas are automatically regenerated when running:

- `mvn clean package` for a full build (recommended)
- `mvn package -DskipTests -pl ksml-runner -am` for quick schema generation without tests
- Any Maven build that includes the `process-classes` phase for `ksml-runner`
- `mvn clean package` for a full build (recommended, generates all three schemas)
- `mvn package -DskipTests -pl ksml-runner -am` for pipeline and runner schemas
- `mvn package -DskipTests -pl ksml-test-runner -am` for the test definition schema
- Any Maven build that includes the `process-classes` phase for `ksml-runner` or `ksml-test-runner`

The schemas are always kept in sync with the codebase, ensuring your IDE validation matches the current KSML version.

@@ -362,9 +425,17 @@ docs/ksml-runner-spec.json
**Purpose:** Validates KSML Runner configuration files (Kafka settings, error handling, observability)
**Schema Version:** JSON Schema Draft 2019-09

### KSML Test Definition
```
docs/ksml-test-spec.json
```

**Purpose:** Validates KSML test definition files (test data, assertions, pipeline references)
**Schema Version:** JSON Schema Draft-07

### Schema Characteristics

Both schema files share these characteristics:
All schema files share these characteristics:

- Updated and version-controlled with each KSML release
- Comprehensive coverage of all features and configuration options
175 changes: 175 additions & 0 deletions docs/getting-started/testing-your-pipeline.md
@@ -0,0 +1,175 @@
# Testing Your Pipeline

KSML includes a test runner that lets you verify your pipeline logic without a running Kafka broker. You write a YAML test definition that describes what data to send and what to assert, and the test runner handles the rest using Kafka's `TopologyTestDriver`.

## How It Works

The test runner:

1. Parses your KSML pipeline definition and builds a Kafka Streams topology
2. Sends test messages into the topology's input topics
3. Runs Python assertions against output topics and/or state stores
4. Reports pass/fail results

No Kafka broker, no Schema Registry, no infrastructure required.

## Test Definition Format

A test definition is a YAML file with a `test` root element:

```yaml
test:
name: "Human-readable test name"
pipeline: path/to/pipeline.yaml
schemaDirectory: path/to/schemas # optional, for Avro schemas

produce:
- topic: input-topic-name
keyType: string # optional, defaults to "string"
valueType: "avro:SensorData" # optional, defaults to "string"
messages:
- key: "my-key"
value: { field: "value" }
timestamp: 1709200000000 # optional, epoch millis

assert:
- topic: output-topic-name
code: |
assert len(records) == 1
assert records[0]["key"] == "my-key"
```

### Produce Blocks

Each produce block targets one input topic. You can define multiple produce blocks to feed data into different topics, for example when testing joins.

| Field | Required | Default | Description |
|---|---|---|---|
| `topic` | yes | | Kafka topic name |
| `keyType` | no | `string` | Key serialization type (e.g. `string`, `avro:MySchema`) |
| `valueType` | no | `string` | Value serialization type |
| `messages` | yes | | List of messages with `key`, `value`, and optional `timestamp` |
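
As a sketch, a join test could feed two input topics from separate produce blocks. The topic names, fields, and types below are illustrative, not taken from the sample pipeline:

```yaml
produce:
  - topic: orders                  # illustrative topic name
    messages:
      - key: "order-1"
        value: { customer: "c-1", amount: 10 }
  - topic: customers               # second input, e.g. the join's table side
    keyType: string                # explicit here, though string is the default
    messages:
      - key: "c-1"
        value: { name: "Alice" }
```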

### Assert Blocks

Each assert block runs Python code with injected variables. At least one of `topic` or `stores` must be specified.

| Field | Required | Description |
|---|---|---|
| `topic` | no | Output topic to read records from. Injects a `records` list variable |
| `stores` | no | List of state store names to inject as Python variables |
| `code` | yes | Python assertion code using `assert` statements |

When `topic` is set, `records` is a list of dicts with `key`, `value`, and `timestamp` fields.
When `stores` is set, each store is available as a Python variable with the same API as in pipeline functions (e.g. `store.get(key)`).
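
Because `records` is plain Python data, assert blocks are ordinary Python. The following sketch builds a sample list by hand to show the shape the test runner injects (the records themselves are made up for illustration):

```python
# Illustration only: in a real test the runner injects `records` for you;
# here we construct a sample list to show the injected data shape.
records = [
    {"key": "sensor-1", "value": {"color": "blue"}, "timestamp": 1709200000000},
    {"key": "sensor-2", "value": {"color": "blue"}, "timestamp": 1709200001000},
]

# Typical assert-block code: plain assertions over the list of dicts.
assert len(records) == 2, f"Expected 2 records, got {len(records)}"
assert all(r["value"]["color"] == "blue" for r in records)
assert records[0]["key"] == "sensor-1"
```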

## Example: Testing a Filter Pipeline

Let's walk through testing a pipeline that filters sensor data, keeping only sensors with color "blue".

### The Pipeline

??? info "Pipeline definition: `test-filter.yaml` (click to expand)"

```yaml
--8<-- "pipelines/test-filter.yaml"
```

This pipeline reads from `ksml_sensordata_avro`, filters messages where the sensor color is "blue", and writes the matching messages to `ksml_sensordata_filtered`.

### The Test

??? info "Test definition: `sample-filter-test.yaml` (click to expand)"

```yaml
--8<-- "sample-filter-test.yaml"
```

The test sends three sensor messages (two blue, one red) and asserts that only the two blue sensors appear in the output topic.

## Running Tests with Docker

The KSML Docker image includes the test runner at `/opt/ksml/ksml-test.jar`. Mount your test files and override the entrypoint:

```bash
docker run --rm \
-v ./my-tests:/tests \
--entrypoint java \
axual/ksml:latest \
-Djava.security.manager=allow -jar /opt/ksml/ksml-test.jar \
/tests/my-test.yaml
```

You can pass multiple test files:

```bash
docker run --rm \
-v ./my-tests:/tests \
--entrypoint java \
axual/ksml:latest \
-Djava.security.manager=allow -jar /opt/ksml/ksml-test.jar \
/tests/filter-test.yaml /tests/join-test.yaml /tests/store-test.yaml
```

### Example Output

```
=== KSML Test Results ===

PASS Filter pipeline passes blue sensors

1 passed, 0 failed, 0 errors
```

The exit code is `0` when all tests pass, `1` otherwise. This makes it easy to integrate into CI/CD pipelines.

## Writing Assertions

Assertions use Python's `assert` statement. Some common patterns:

### Check record count

```python
assert len(records) == 3, f"Expected 3 records, got {len(records)}"
```

### Check specific record values

```python
assert records[0]["key"] == "sensor-1"
assert records[0]["value"]["color"] == "blue"
```

### Check timestamps

```python
assert records[0]["timestamp"] == 1709200000000
```

### Check state store contents

```python
# With stores: [my_store] in the assert block
value = my_store.get("sensor-1")
assert value is not None, "Expected sensor-1 in store"
assert value["temperature"] == "25.0"
```

## Schema Validation for Test Files

A JSON Schema is available for test definition files at `docs/ksml-test-spec.json`. See the [Schema Validation](schema-validation.md) page for instructions on setting up editor auto-completion and validation.

## Logging

The test runner ships with a default Logback configuration that keeps output quiet: `WARN` for everything, `INFO` for the test runner itself, so you still see the `Running test: ...` progress lines and the final results table.

To get verbose output for one run, point Logback at a custom configuration file at invocation time:

```bash
docker run --rm \
-v ./my-tests:/tests \
--entrypoint java \
axual/ksml:latest \
-Dlogback.configurationFile=/tests/logback-debug.xml \
-jar /opt/ksml/ksml-test.jar /tests/my-test.yaml
```
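
A minimal `logback-debug.xml` could look like the following. This is a sketch using standard Logback syntax; the file name is your choice, and you can tune levels per logger rather than raising the root level:

```xml
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <!-- DEBUG on the root logger is verbose; scope it down if needed -->
  <root level="DEBUG">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
```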