From 9c837260c1499645bb5236a53d42f40af6012192 Mon Sep 17 00:00:00 2001 From: Marc Klingen Date: Fri, 22 May 2026 13:25:54 +0200 Subject: [PATCH] improvements to clickhouse and data masking --- .../deployment/infrastructure/clickhouse.mdx | 48 ++++++++++++++----- .../self-hosting/security/data-masking.mdx | 30 +++++++----- 2 files changed, 55 insertions(+), 23 deletions(-) diff --git a/content/self-hosting/deployment/infrastructure/clickhouse.mdx b/content/self-hosting/deployment/infrastructure/clickhouse.mdx index 864d529cc5..073edb30e6 100644 --- a/content/self-hosting/deployment/infrastructure/clickhouse.mdx +++ b/content/self-hosting/deployment/infrastructure/clickhouse.mdx @@ -24,17 +24,17 @@ Langfuse supports ClickHouse versions >= 24.3. Langfuse accepts the following environment variables to fine-tune your ClickHouse usage. They need to be provided for the Langfuse Web and Langfuse Worker containers. -| Variable | Required / Default | Description | -| --------------------------------------------- | ------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `CLICKHOUSE_MIGRATION_URL` | Required | Migration URL (TCP protocol) for the ClickHouse instance. Pattern: `clickhouse://:(9000/9440)` | -| `CLICKHOUSE_MIGRATION_SSL` | `false` | Set to true to establish an SSL connection to ClickHouse for the database migration. | -| `CLICKHOUSE_URL` | Required | Hostname of the ClickHouse instance. Pattern: `http(s)://:(8123/8443)` | -| `CLICKHOUSE_USER` | Required | Username of the ClickHouse database. Needs SELECT, ALTER, INSERT, CREATE, DELETE grants. 
| -| `CLICKHOUSE_PASSWORD` | Required | Password of the ClickHouse user. | -| `CLICKHOUSE_DB` | `default` | Name of the ClickHouse database to use. | -| `CLICKHOUSE_CLUSTER_ENABLED` | `true` | Whether to run ClickHouse commands `ON CLUSTER`. Set to `false` for single-container setups. | -| `LANGFUSE_AUTO_CLICKHOUSE_MIGRATION_DISABLED` | `false` | Whether to disable automatic ClickHouse migrations. | -| `CLICKHOUSE_READ_ONLY_URL` | | Optional read-only endpoint used for UI and public-API read queries on traces, observations, scores, and sessions. Falls back to `CLICKHOUSE_URL` when unset. Reuses `CLICKHOUSE_USER`, `CLICKHOUSE_PASSWORD`, and `CLICKHOUSE_DB`. Only useful on [compute-compute separated](https://clickhouse.com/docs/cloud/reference/warehouses) clusters. See [Scaling](/self-hosting/configuration/scaling#clickhouse-read-only-url). | +| Variable | Required / Default | Description | +| --------------------------------------------- | ------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `CLICKHOUSE_MIGRATION_URL` | Required | Migration URL (TCP protocol) for the ClickHouse instance. Pattern: `clickhouse://:(9000/9440)` | +| `CLICKHOUSE_MIGRATION_SSL` | `false` | Set to true to establish an SSL connection to ClickHouse for the database migration. | +| `CLICKHOUSE_URL` | Required | Hostname of the ClickHouse instance. Pattern: `http(s)://:(8123/8443)` | +| `CLICKHOUSE_USER` | Required | Username of the ClickHouse database. Needs SELECT, ALTER, INSERT, CREATE, DELETE grants. | +| `CLICKHOUSE_PASSWORD` | Required | Password of the ClickHouse user. 
| +| `CLICKHOUSE_DB` | `default` | Name of the ClickHouse database to use. | +| `CLICKHOUSE_CLUSTER_ENABLED` | `true` | Whether to run ClickHouse commands `ON CLUSTER`. Set to `false` for single-container setups. | +| `LANGFUSE_AUTO_CLICKHOUSE_MIGRATION_DISABLED` | `false` | Whether to disable automatic ClickHouse migrations. | +| `CLICKHOUSE_READ_ONLY_URL` | | Optional read-only endpoint used for public-API reads and selected UI/filter read queries. Falls back to `CLICKHOUSE_URL` when unset. Reuses `CLICKHOUSE_USER`, `CLICKHOUSE_PASSWORD`, and `CLICKHOUSE_DB`. Only useful on [compute-compute separated](https://clickhouse.com/docs/cloud/reference/warehouses) clusters. See [Scaling](/self-hosting/configuration/scaling#clickhouse-read-only-url). | Langfuse uses `default` as the cluster name if CLICKHOUSE_CLUSTER_ENABLED is set to `true`. You can overwrite this by setting `CLICKHOUSE_CLUSTER_NAME` to a different value. @@ -76,6 +76,30 @@ GRANT DROP TABLE ON default.* TO 'user'; Replace `'user'` with your actual ClickHouse username and adjust the database name if you're using a different database than `default`. +## Direct ClickHouse Access for Custom Tools [#direct-clickhouse-access] + +Self-hosted deployments give you control over the underlying ClickHouse database. +You can query it directly for internal dashboards, audits, migrations, or one-off debugging. + +For production integrations and custom applications, prefer the [Public API](/docs/api-and-data-platform/features/public-api), [SDK query helpers](/docs/api-and-data-platform/features/query-via-sdk), [MCP server](/docs/api-and-data-platform/features/mcp-server), or [Blob Storage Export](/docs/api-and-data-platform/features/export-to-blob-storage). +These interfaces are the compatibility targets across Langfuse releases. + + + +The ClickHouse schema is not a stable API contract. 
+Major Langfuse upgrades, background migrations, and performance work such as [Simplify Langfuse for Scale](/changelog/2026-03-10-simplify-for-scale) can change tables, columns, deduplication behavior, or join patterns.
+Custom queries that read `traces`, `observations`, `scores`, or internal materialized views should be validated as part of every Langfuse upgrade.
+
+
+
+If you do query ClickHouse directly:
+
+- Use a dedicated read-only user or a dedicated read-only compute group where available.
+- Keep direct analytical traffic away from the primary ingestion path. On ClickHouse Cloud or BYOC, use separate compute groups and configure `CLICKHOUSE_READ_ONLY_URL` for supported Langfuse read paths, especially Public API and filter/helper reads.
+- Always include project and time filters where possible. Langfuse tracing data is optimized around project and time access patterns.
+- Do not write directly to Langfuse tables. Use the Public API or SDKs for creating and updating Langfuse data.
+- If a missing API filter or field forces you to query ClickHouse directly, please open a [GitHub issue](https://github.com/langfuse/langfuse/issues) with your use case; APIs are the preferred long-term extension point.
+
 ## Deployment Options
 
 This section covers different deployment options and provides example environment variables.
@@ -93,7 +117,7 @@ ClickHouse Cloud clusters will be provisioned outside your cloud environment and
 
 If you need the operational model of ClickHouse Cloud while keeping the ClickHouse data plane in your own cloud account, consider [ClickHouse BYOC](https://clickhouse.com/cloud/bring-your-own-cloud). BYOC is a fully managed ClickHouse Cloud deployment on infrastructure in your cloud account and is designed for large-scale deployments with strict data residency, compliance, or VPC-boundary requirements.
 
-We recommend ClickHouse Cloud or BYOC for larger Langfuse deployments because they provide cloud-native scaling primitives that are not available in the self-managed OSS ClickHouse setup used by Langfuse. ClickHouse Cloud and BYOC separate storage from compute through [SharedMergeTree](https://clickhouse.com/docs/cloud/reference/shared-merge-tree), which helps scale compute independently of stored data, reduces replica storage overhead, and avoids manual shard planning for growth. They also support compute-compute separation through [warehouses](https://clickhouse.com/docs/cloud/reference/warehouses), so you can isolate ingestion writes, UI reads, analytical queries, or ad-hoc workloads on separate compute groups that share the same data but do not compete for the same CPU and memory. Langfuse can use this pattern via `CLICKHOUSE_READ_ONLY_URL` for read-heavy UI and public-API traffic.
+We recommend ClickHouse Cloud or BYOC for larger Langfuse deployments because they provide cloud-native scaling primitives that are not available in the self-managed OSS ClickHouse setup used by Langfuse. ClickHouse Cloud and BYOC separate storage from compute through [SharedMergeTree](https://clickhouse.com/docs/cloud/reference/shared-merge-tree), which helps scale compute independently of stored data, reduces replica storage overhead, and avoids manual shard planning for growth. They also support compute-compute separation through [warehouses](https://clickhouse.com/docs/cloud/reference/warehouses), so you can isolate ingestion writes, supported Langfuse reads, analytical queries, or ad-hoc workloads on separate compute groups that share the same data but do not compete for the same CPU and memory. Langfuse can use this pattern via `CLICKHOUSE_READ_ONLY_URL` for public-API reads and selected UI/filter read traffic.
 
 If you need assistance or want to talk to the ClickHouse team, you can reach out to them [here](https://clickhouse.com/company/contact).
 
diff --git a/content/self-hosting/security/data-masking.mdx b/content/self-hosting/security/data-masking.mdx
index 1f3cf01632..732d0648c9 100644
--- a/content/self-hosting/security/data-masking.mdx
+++ b/content/self-hosting/security/data-masking.mdx
@@ -42,7 +42,7 @@ For comprehensive documentation including code examples, advanced patterns, and
 
 ---
 
-## Server-Side Ingestion Masking (EE)
+## Server-Side Ingestion Masking (EE) [#server-side-ingestion-masking-ee]
 
 This feature requires an Enterprise license. Please add your [license
@@ -52,6 +52,14 @@ For comprehensive documentation including code examples, advanced patterns, and
 
 Server-side ingestion masking allows self-hosted Langfuse administrators to define custom callback logic for masking or redacting sensitive data from tracing events as they are ingested. This feature operates at the Langfuse Worker container level, providing centralized data masking across all clients.
+
+
+Server-side masking is a centralized safety net, not a replacement for client-side masking when sensitive data must never leave the application boundary.
+In the self-hosted ingestion pipeline, events are written to the event blob storage bucket before the Worker calls the masking callback.
+The callback masks data before it is processed into ClickHouse and downstream Langfuse views.
+
+
+
 **Key benefits:**
 
 - Single point of configuration for all tracing data
@@ -134,19 +142,19 @@ Your callback must return:
 
 The response object must maintain the same structure as the input.
- Only modify the values you want to mask—do not add, remove, or rename fields.
+ Only modify the values you want to mask—do not add, remove, or rename fields. Langfuse parses the callback response as JSON and expects the OpenTelemetry shape during downstream processing, but it does not run a separate structural validation at the callback boundary.
 
 #### Error Handling
 
 Error handling behavior is configured via `LANGFUSE_INGESTION_MASKING_CALLBACK_FAIL_CLOSED`:
 
-| Scenario | Fail Closed (default) | Fail Open |
-| ----------------------- | ----------------------------- | ---------------------------------------- |
-| Callback timeout | Event dropped, warning logged | Event processed unmasked, warning logged |
-| HTTP error (4xx, 5xx) | Event dropped, warning logged | Event processed unmasked, warning logged |
-| Invalid response schema | Event dropped, warning logged | Event processed unmasked, warning logged |
-| Network error | Event dropped, warning logged | Event processed unmasked, warning logged |
+| Scenario | Fail open (default, `false`) | Fail closed (`true`) |
+| --------------------- | ---------------------------------------- | ----------------------------- |
+| Callback timeout | Event processed unmasked, warning logged | Event dropped, warning logged |
+| HTTP error (4xx, 5xx) | Event processed unmasked, warning logged | Event dropped, warning logged |
+| Invalid JSON response | Event processed unmasked, warning logged | Event dropped, warning logged |
+| Network error | Event processed unmasked, warning logged | Event dropped, warning logged |
 
 ### Limitations
 
@@ -232,7 +240,7 @@ LANGFUSE_INGESTION_MASKING_CALLBACK_FAIL_CLOSED=true
 - Events are being dropped unexpectedly
   1. Check that your masking service is responding within the configured timeout.
   2. Verify the response schema matches the input schema exactly.
-  3. Review Langfuse Web container logs for warning messages.
+  3. Review Langfuse Worker container logs for warning messages.
   4. Temporarily set `LANGFUSE_INGESTION_MASKING_CALLBACK_FAIL_CLOSED=false` to diagnose issues.
 - High latency on trace ingestion
   1. Monitor your masking service response times.
@@ -240,6 +248,6 @@ LANGFUSE_INGESTION_MASKING_CALLBACK_FAIL_CLOSED=true
   3. Optimize your masking logic or add caching where appropriate.
   4. Ensure network latency between Langfuse and your masking service is minimal.
 - Masking not being applied
-  1. Verify `LANGFUSE_INGESTION_MASKING_CALLBACK_URL` is correctly set on the Langfuse Web container.
-  2. Check that your masking service is reachable from the Langfuse Web container.
+  1. Verify `LANGFUSE_INGESTION_MASKING_CALLBACK_URL` is correctly set on the Langfuse Worker container.
+  2. Check that your masking service is reachable from the Langfuse Worker container.
   3. Ensure your masking logic is correctly modifying the data and returning it.
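
Reviewer note: the two masking rules this patch documents (return the same structure, and fail open by default) can be illustrated with a small Python sketch. It is not Langfuse's implementation. The names `mask_values`, `process_event`, `MaskingCallbackError`, and the sample payload keys are hypothetical, and the callback is modelled as a plain function that takes and returns a JSON string.

```python
import json
import logging
import re
from typing import Any, Callable, Optional

# Illustrative pattern only; a real masking service would use its own rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class MaskingCallbackError(Exception):
    """Hypothetical error standing in for an HTTP 4xx/5xx from the callback."""


def mask_values(node: Any) -> Any:
    """Return a copy of `node` with e-mail addresses in string values redacted.

    Keys, nesting, and list lengths are left untouched, so the result has
    the same structure as the input.
    """
    if isinstance(node, dict):
        return {key: mask_values(value) for key, value in node.items()}
    if isinstance(node, list):
        return [mask_values(item) for item in node]
    if isinstance(node, str):
        return EMAIL_RE.sub("[REDACTED_EMAIL]", node)
    return node


def process_event(
    event: dict,
    call_masking_service: Callable[[str], str],
    fail_closed: bool = False,  # mirrors the documented default (fail open)
) -> Optional[dict]:
    """Model of the error-handling table.

    Returns the masked event on success. On timeout, network error,
    HTTP error, or invalid JSON it logs a warning and returns the
    unmasked event (fail open) or None, meaning dropped (fail closed).
    """
    try:
        return json.loads(call_masking_service(json.dumps(event)))
    except (TimeoutError, ConnectionError, MaskingCallbackError,
            json.JSONDecodeError) as exc:
        logging.warning("masking callback failed: %r", exc)
        return None if fail_closed else event
```

With `fail_closed=True`, a failing callback drops the event instead of letting unmasked data through. This matches the "Fail closed (`true`)" column of the table above.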