48 changes: 36 additions & 12 deletions content/self-hosting/deployment/infrastructure/clickhouse.mdx
@@ -24,17 +24,17 @@
Langfuse accepts the following environment variables to fine-tune your ClickHouse usage.
They need to be provided for the Langfuse Web and Langfuse Worker containers.

| Variable | Required / Default | Description |
| --------------------------------------------- | ------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `CLICKHOUSE_MIGRATION_URL` | Required | Migration URL (TCP protocol) for the ClickHouse instance. Pattern: `clickhouse://<hostname>:(9000/9440)` |
| `CLICKHOUSE_MIGRATION_SSL` | `false` | Set to true to establish an SSL connection to ClickHouse for the database migration. |
| `CLICKHOUSE_URL` | Required | Hostname of the ClickHouse instance. Pattern: `http(s)://<hostname>:(8123/8443)` |
| `CLICKHOUSE_USER` | Required | Username of the ClickHouse database. Needs SELECT, ALTER, INSERT, CREATE, DELETE grants. |
| `CLICKHOUSE_PASSWORD` | Required | Password of the ClickHouse user. |
| `CLICKHOUSE_DB` | `default` | Name of the ClickHouse database to use. |
| `CLICKHOUSE_CLUSTER_ENABLED` | `true` | Whether to run ClickHouse commands `ON CLUSTER`. Set to `false` for single-container setups. |
| `LANGFUSE_AUTO_CLICKHOUSE_MIGRATION_DISABLED` | `false` | Whether to disable automatic ClickHouse migrations. |
| `CLICKHOUSE_READ_ONLY_URL` | | Optional read-only endpoint used for UI and public-API read queries on traces, observations, scores, and sessions. Falls back to `CLICKHOUSE_URL` when unset. Reuses `CLICKHOUSE_USER`, `CLICKHOUSE_PASSWORD`, and `CLICKHOUSE_DB`. Only useful on [compute-compute separated](https://clickhouse.com/docs/cloud/reference/warehouses) clusters. See [Scaling](/self-hosting/configuration/scaling#clickhouse-read-only-url). |
| Variable | Required / Default | Description |
| --------------------------------------------- | ------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `CLICKHOUSE_MIGRATION_URL` | Required | Migration URL (TCP protocol) for the ClickHouse instance. Pattern: `clickhouse://<hostname>:(9000/9440)` |
| `CLICKHOUSE_MIGRATION_SSL` | `false` | Set to true to establish an SSL connection to ClickHouse for the database migration. |
| `CLICKHOUSE_URL` | Required | Hostname of the ClickHouse instance. Pattern: `http(s)://<hostname>:(8123/8443)` |
| `CLICKHOUSE_USER` | Required | Username of the ClickHouse database. Needs SELECT, ALTER, INSERT, CREATE, DELETE grants. |
| `CLICKHOUSE_PASSWORD` | Required | Password of the ClickHouse user. |
| `CLICKHOUSE_DB` | `default` | Name of the ClickHouse database to use. |
| `CLICKHOUSE_CLUSTER_ENABLED` | `true` | Whether to run ClickHouse commands `ON CLUSTER`. Set to `false` for single-container setups. |
| `LANGFUSE_AUTO_CLICKHOUSE_MIGRATION_DISABLED` | `false` | Whether to disable automatic ClickHouse migrations. |
| `CLICKHOUSE_READ_ONLY_URL` | | Optional read-only endpoint used for public-API reads and selected UI/filter read queries. Falls back to `CLICKHOUSE_URL` when unset. Reuses `CLICKHOUSE_USER`, `CLICKHOUSE_PASSWORD`, and `CLICKHOUSE_DB`. Only useful on [compute-compute separated](https://clickhouse.com/docs/cloud/reference/warehouses) clusters. See [Scaling](/self-hosting/configuration/scaling#clickhouse-read-only-url). |

Check failure on line 37 in content/self-hosting/deployment/infrastructure/clickhouse.mdx

Claude / Claude Code Review

Narrowed CLICKHOUSE_READ_ONLY_URL description contradicts linked docs

🔴 The CLICKHOUSE_READ_ONLY_URL description is narrowed here (lines 37 and 120) to 'public-API reads and selected UI/filter read queries', but two sibling docs still carry the original broader wording and now contradict this page: content/self-hosting/configuration/index.mdx:25 (master env-var index) still says 'UI and public-API read queries on traces, observations, scores, and sessions', and content/self-hosting/configuration/scaling.mdx:77 — the page this row links to as 'See [Scaling]' — still says 'route UI and public-API read queries to the given endpoint'. Please update those two locations to match the new narrower wording so users following the link don't get conflicting answers about what the env var actually routes.

Extended reasoning...

The bug

This PR intentionally narrows the description of CLICKHOUSE_READ_ONLY_URL in content/self-hosting/deployment/infrastructure/clickhouse.mdx in two places:

  • Line 37 (env-var table): 'UI and public-API read queries on traces, observations, scores, and sessions' → 'public-API reads and selected UI/filter read queries'
  • Line 120 (Cloud/BYOC narrative): 'read-heavy UI and public-API traffic' → 'public-API reads and selected UI/filter read traffic'

But the same variable is described in two other docs that were not touched and now disagree with the new wording.

Where the contradictions live

  1. content/self-hosting/configuration/index.mdx:25 — the master self-hosting env-var index table — still says:

    Optional read-only endpoint used for UI and public-API read queries on traces, observations, scores, and sessions.

  2. content/self-hosting/configuration/scaling.mdx:72-77 — the deep-dive page that this PR's new row explicitly links to as the canonical reference via See [Scaling](/self-hosting/configuration/scaling#clickhouse-read-only-url) — still says:

    This keeps dashboard and public-API read traffic from contending with ingestion inserts… Set CLICKHOUSE_READ_ONLY_URL and Langfuse will route UI and public-API read queries to the given endpoint while writes, migrations, and ingestion continue to use CLICKHOUSE_URL.

Step-by-step proof a reader gets contradictory information

  1. A self-hosting operator lands on /self-hosting/deployment/infrastructure/clickhouse.mdx and reads the new row at line 37: only public-API reads and selected UI/filter read queries are routed.
  2. They click the in-row link See [Scaling](/self-hosting/configuration/scaling#clickhouse-read-only-url) to learn more.
  3. The target section on scaling.mdx claims all UI and public-API read queries are routed — a strictly broader scope than what the page they came from just promised.
  4. If they cross-check the central env-var index at /self-hosting/configuration, they see yet a third variant of the same description (broader scope plus a specific entity list 'traces, observations, scores, and sessions' that the new wording dropped).

Impact

Documentation-only inconsistency, no runtime behavior change. But it is directly caused by this PR (the prior state of the docs was self-consistent at the broader wording), and the inconsistency is one hop away from the edited row via a link the PR itself places. A user trying to decide whether to provision a separate ClickHouse read compute group gets three different answers about what reads will be served from it.

Fix

Update the two sibling occurrences to the narrowed wording introduced here:

  • content/self-hosting/configuration/index.mdx:25 — replace the env-var row description with wording consistent with the new clickhouse.mdx text.
  • content/self-hosting/configuration/scaling.mdx:72-77 — adjust 'dashboard and public-API read traffic' and 'route UI and public-API read queries' to reflect that only the public API plus selected UI/filter reads are routed.

Decide whether the dropped entity list ('traces, observations, scores, and sessions') should be retained anywhere, or whether 'selected UI/filter read queries' is sufficiently specific in the new framing.


Langfuse uses `default` as the cluster name if `CLICKHOUSE_CLUSTER_ENABLED` is set to `true`.
You can override this by setting `CLICKHOUSE_CLUSTER_NAME` to a different value.
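
The defaults and fallbacks described in the table and the note above can be sketched as follows. This is an illustration only: `resolve_clickhouse_settings` is a hypothetical helper, not part of Langfuse, and treating an empty `CLICKHOUSE_READ_ONLY_URL` as unset is an assumption.

```python
from typing import Mapping, Optional


def resolve_clickhouse_settings(env: Mapping[str, str]) -> dict:
    """Hypothetical helper mirroring the documented defaults (not Langfuse code)."""
    # CLICKHOUSE_CLUSTER_ENABLED defaults to `true`.
    cluster_enabled = env.get("CLICKHOUSE_CLUSTER_ENABLED", "true").lower() == "true"
    # The cluster name defaults to `default` and only matters when clustering is on.
    cluster_name: Optional[str] = (
        env.get("CLICKHOUSE_CLUSTER_NAME", "default") if cluster_enabled else None
    )
    url = env["CLICKHOUSE_URL"]  # required, raises KeyError when missing
    return {
        "url": url,
        # Falls back to CLICKHOUSE_URL when unset (empty string treated as unset: assumption).
        "read_only_url": env.get("CLICKHOUSE_READ_ONLY_URL") or url,
        "database": env.get("CLICKHOUSE_DB", "default"),
        "cluster_name": cluster_name,
    }
```

For a single-container setup with `CLICKHOUSE_CLUSTER_ENABLED=false`, the sketch returns `cluster_name` as `None`, meaning no `ON CLUSTER` clause would be applied.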
@@ -76,6 +76,30 @@

Replace `'user'` with your actual ClickHouse username and adjust the database name if you're using a different database than `default`.

## Direct ClickHouse Access for Custom Tools [#direct-clickhouse-access]

Self-hosted deployments give you control over the underlying ClickHouse database.
You can query it directly for internal dashboards, audits, migrations, or one-off debugging.

For production integrations and custom applications, prefer the [Public API](/docs/api-and-data-platform/features/public-api), [SDK query helpers](/docs/api-and-data-platform/features/query-via-sdk), [MCP server](/docs/api-and-data-platform/features/mcp-server), or [Blob Storage Export](/docs/api-and-data-platform/features/export-to-blob-storage).
These interfaces are the compatibility targets across Langfuse releases.

<Callout type="warning">

The ClickHouse schema is not a stable API contract.
Major Langfuse upgrades, background migrations, and performance work such as [Simplify Langfuse for Scale](/changelog/2026-03-10-simplify-for-scale) can change tables, columns, deduplication behavior, or join patterns.
Custom queries that read `traces`, `observations`, `scores`, or internal materialized views should be validated as part of every Langfuse upgrade.

</Callout>

If you do query ClickHouse directly:

- Use a dedicated read-only user or a dedicated read-only compute group where available.
- Keep direct analytical traffic away from the primary ingestion path. On ClickHouse Cloud or BYOC, use separate compute groups and configure `CLICKHOUSE_READ_ONLY_URL` for supported Langfuse read paths, especially Public API and filter/helper reads.
- Always include project and time filters where possible. Langfuse tracing data is optimized around project and time access patterns.
- Do not write directly to Langfuse tables. Use the Public API or SDKs for creating and updating Langfuse data.
- If a missing API filter or field forces you to query ClickHouse directly, please open a [GitHub issue](https://github.com/langfuse/langfuse/issues) with your use case; APIs are the preferred long-term extension point.
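
The advice above on project and time filters can be sketched as a small query builder. The column names `project_id` and `timestamp` on `traces` are assumptions to verify against your deployed schema, and `build_trace_count_query` is a hypothetical helper. The `{name:Type}` placeholders are ClickHouse query parameters; how they are bound depends on the client you use.

```python
from datetime import datetime, timezone


def build_trace_count_query(project_id: str, start: datetime, end: datetime):
    """Return (sql, params) for a read-only count scoped by project and time."""
    if start >= end:
        raise ValueError("start must be before end")
    # Assumed column names (`project_id`, `timestamp`); check them after every upgrade.
    sql = (
        "SELECT count() AS trace_count "
        "FROM traces "
        "WHERE project_id = {project_id:String} "
        "AND timestamp >= {start:DateTime64(3)} "
        "AND timestamp < {end:DateTime64(3)}"
    )
    # Values travel separately from the SQL text, so nothing is interpolated by hand.
    params = {"project_id": project_id, "start": start, "end": end}
    return sql, params


# Example: one day of traces for a placeholder project ID.
sql, params = build_trace_count_query(
    "my-project-id",
    datetime(2025, 1, 1, tzinfo=timezone.utc),
    datetime(2025, 1, 2, tzinfo=timezone.utc),
)
```

Run such queries with the dedicated read-only user. Depending on the table engine and the deduplication behavior mentioned in the warning above, raw counts may need additional deduplication.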

## Deployment Options

This section covers different deployment options and provides example environment variables.
@@ -93,7 +117,7 @@

If you need the operational model of ClickHouse Cloud while keeping the ClickHouse data plane in your own cloud account, consider [ClickHouse BYOC](https://clickhouse.com/cloud/bring-your-own-cloud). BYOC is a fully managed ClickHouse Cloud deployment on infrastructure in your cloud account and is designed for large-scale deployments with strict data residency, compliance, or VPC-boundary requirements.

We recommend ClickHouse Cloud or BYOC for larger Langfuse deployments because they provide cloud-native scaling primitives that are not available in the self-managed OSS ClickHouse setup used by Langfuse. ClickHouse Cloud and BYOC separate storage from compute through [SharedMergeTree](https://clickhouse.com/docs/cloud/reference/shared-merge-tree), which helps scale compute independently of stored data, reduces replica storage overhead, and avoids manual shard planning for growth. They also support compute-compute separation through [warehouses](https://clickhouse.com/docs/cloud/reference/warehouses), so you can isolate ingestion writes, UI reads, analytical queries, or ad-hoc workloads on separate compute groups that share the same data but do not compete for the same CPU and memory. Langfuse can use this pattern via `CLICKHOUSE_READ_ONLY_URL` for read-heavy UI and public-API traffic.
We recommend ClickHouse Cloud or BYOC for larger Langfuse deployments because they provide cloud-native scaling primitives that are not available in the self-managed OSS ClickHouse setup used by Langfuse. ClickHouse Cloud and BYOC separate storage from compute through [SharedMergeTree](https://clickhouse.com/docs/cloud/reference/shared-merge-tree), which helps scale compute independently of stored data, reduces replica storage overhead, and avoids manual shard planning for growth. They also support compute-compute separation through [warehouses](https://clickhouse.com/docs/cloud/reference/warehouses), so you can isolate ingestion writes, supported Langfuse reads, analytical queries, or ad-hoc workloads on separate compute groups that share the same data but do not compete for the same CPU and memory. Langfuse can use this pattern via `CLICKHOUSE_READ_ONLY_URL` for public-API reads and selected UI/filter read traffic.

If you need assistance or want to talk to the ClickHouse team, you can reach out to them [here](https://clickhouse.com/company/contact).

30 changes: 19 additions & 11 deletions content/self-hosting/security/data-masking.mdx
@@ -42,7 +42,7 @@ For comprehensive documentation including code examples, advanced patterns, and

---

## Server-Side Ingestion Masking (EE)
## Server-Side Ingestion Masking (EE) [#server-side-ingestion-masking-ee]

<Callout type="info">
This feature requires an Enterprise license. Please add your [license
@@ -52,6 +52,14 @@ For comprehensive documentation including code examples, advanced patterns, and
Server-side ingestion masking allows self-hosted Langfuse administrators to define custom callback logic for masking or redacting sensitive data from tracing events as they are ingested.
This feature operates at the Langfuse Worker container level, providing centralized data masking across all clients.

<Callout type="warning">

Server-side masking is a centralized safety net, not a replacement for client-side masking when sensitive data must never leave the application boundary.
In the self-hosted ingestion pipeline, events are written to the event blob storage bucket before the Worker calls the masking callback.
The callback masks data before it is processed into ClickHouse and downstream Langfuse views.

</Callout>
Comment on lines +55 to +61

P1 security Blob storage data fate after masking is unspecified

The new warning correctly flags that events land in blob storage before the masking callback runs, but it doesn't say whether that unmasked blob-storage copy is ever replaced or deleted after the Worker processes the masked version. Looking at the sequence diagram (SDK → Web → S3 (unmasked) → Worker → Callback → ClickHouse), there is no "update S3" or "delete from S3" step. If unmasked data persists in blob storage indefinitely, users deploying this feature for GDPR, HIPAA, or PCI DSS compliance may believe their storage layer is protected when only ClickHouse and downstream views receive masked data. A single sentence clarifying the blob-storage retention behavior (e.g., whether it is retained as-is, or whether client-side masking is required to prevent PII reaching blob storage at all) would prevent a compliance blind-spot for operators who read this warning and draw the wrong conclusion.


**Key benefits:**

- Single point of configuration for all tracing data
@@ -134,19 +142,19 @@ Your callback must return:

<Callout type="warning">
The response object must maintain the same structure as the input.
Only modify the values you want to mask—do not add, remove, or rename fields.
Only modify the values you want to mask—do not add, remove, or rename fields. Langfuse parses the callback response as JSON and expects the OpenTelemetry shape during downstream processing, but it does not run a separate structural validation at the callback boundary.
</Callout>
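
Because Langfuse does not validate structure at the callback boundary, a masking service may want to check its own output before responding. The sketch below is one way to do that, not the Langfuse implementation: `mask_values` and `same_shape` are hypothetical helpers, the email pattern is a simplified example, and the payload used with them is illustrative rather than the exact event format.

```python
import re
from typing import Any

# Simplified email pattern for illustration; real masking rules will differ.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_values(node: Any) -> Any:
    """Return a copy with string values masked; keys and nesting stay unchanged."""
    if isinstance(node, dict):
        return {key: mask_values(value) for key, value in node.items()}
    if isinstance(node, list):
        return [mask_values(item) for item in node]
    if isinstance(node, str):
        return EMAIL.sub("[REDACTED_EMAIL]", node)
    return node  # numbers, booleans and None pass through untouched


def same_shape(a: Any, b: Any) -> bool:
    """True when both payloads have identical keys, list lengths and nesting."""
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(same_shape(a[k], b[k]) for k in a)
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(same_shape(x, y) for x, y in zip(a, b))
    # Leaves match only if neither side is a container.
    return not isinstance(a, (dict, list)) and not isinstance(b, (dict, list))
```

A service could call `same_shape(request_body, mask_values(request_body))` before returning and respond with an error if the check fails, which lets the configured fail-open or fail-closed behavior take over.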

#### Error Handling

Error handling behavior is configured via `LANGFUSE_INGESTION_MASKING_CALLBACK_FAIL_CLOSED`:

| Scenario | Fail Closed (default) | Fail Open |
| ----------------------- | ----------------------------- | ---------------------------------------- |
| Callback timeout | Event dropped, warning logged | Event processed unmasked, warning logged |
| HTTP error (4xx, 5xx) | Event dropped, warning logged | Event processed unmasked, warning logged |
| Invalid response schema | Event dropped, warning logged | Event processed unmasked, warning logged |
| Network error | Event dropped, warning logged | Event processed unmasked, warning logged |
| Scenario | Fail open (default, `false`) | Fail closed (`true`) |
| --------------------- | ---------------------------------------- | ----------------------------- |
| Callback timeout | Event processed unmasked, warning logged | Event dropped, warning logged |
| HTTP error (4xx, 5xx) | Event processed unmasked, warning logged | Event dropped, warning logged |
| Invalid JSON response | Event processed unmasked, warning logged | Event dropped, warning logged |
| Network error | Event processed unmasked, warning logged | Event dropped, warning logged |
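
The table above reduces to a single decision, sketched here under the assumption that any failure (timeout, HTTP error, invalid JSON, network error) is represented as `masked=None`. `resolve_event` is a hypothetical function for illustration and does not reflect the Worker's actual code; the default of `fail_closed=False` follows the updated table.

```python
import logging
from typing import Optional

logger = logging.getLogger("masking_sketch")


def resolve_event(
    original: dict, masked: Optional[dict], fail_closed: bool = False
) -> Optional[dict]:
    """Return the event to process, or None when the event is dropped."""
    if masked is not None:
        return masked  # callback succeeded: continue with the masked event
    if fail_closed:
        logger.warning("masking callback failed; dropping event")
        return None
    logger.warning("masking callback failed; processing event unmasked")
    return original
```

With fail open, a masking outage lets unmasked data through; with fail closed, the same outage loses events. Which trade-off is acceptable depends on your compliance requirements.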

### Limitations

@@ -232,14 +240,14 @@ LANGFUSE_INGESTION_MASKING_CALLBACK_FAIL_CLOSED=true
- Events are being dropped unexpectedly
1. Check that your masking service is responding within the configured timeout.
2. Verify the response schema matches the input schema exactly.
3. Review Langfuse Web container logs for warning messages.
3. Review Langfuse Worker container logs for warning messages.
4. Temporarily set `LANGFUSE_INGESTION_MASKING_CALLBACK_FAIL_CLOSED=false` to diagnose issues.
- High latency on trace ingestion
1. Monitor your masking service response times.
2. Consider increasing `LANGFUSE_INGESTION_MASKING_CALLBACK_TIMEOUT_MS` if your masking logic requires more time.
3. Optimize your masking logic or add caching where appropriate.
4. Ensure network latency between Langfuse and your masking service is minimal.
- Masking not being applied
1. Verify `LANGFUSE_INGESTION_MASKING_CALLBACK_URL` is correctly set on the Langfuse Web container.
2. Check that your masking service is reachable from the Langfuse Web container.
1. Verify `LANGFUSE_INGESTION_MASKING_CALLBACK_URL` is correctly set on the Langfuse Worker container.
2. Check that your masking service is reachable from the Langfuse Worker container.
3. Ensure your masking logic is correctly modifying the data and returning it.