chore: improvements to clickhouse and data masking #2985
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -42,7 +42,7 @@ For comprehensive documentation including code examples, advanced patterns, and | |
|
|
||
| --- | ||
|
|
||
| ## Server-Side Ingestion Masking (EE) | ||
| ## Server-Side Ingestion Masking (EE) [#server-side-ingestion-masking-ee] | ||
|
|
||
| <Callout type="info"> | ||
| This feature requires an Enterprise license. Please add your [license | ||
|
|
@@ -52,6 +52,14 @@ For comprehensive documentation including code examples, advanced patterns, and | |
| Server-side ingestion masking allows self-hosted Langfuse administrators to define custom callback logic for masking or redacting sensitive data from tracing events as they are ingested. | ||
| This feature operates at the Langfuse Worker container level, providing centralized data masking across all clients. | ||
|
|
||
| <Callout type="warning"> | ||
|
|
||
| Server-side masking is a centralized safety net, not a replacement for client-side masking when sensitive data must never leave the application boundary. | ||
| In the self-hosted ingestion pipeline, events are written to the event blob storage bucket before the Worker calls the masking callback. | ||
| The callback masks data before it is processed into ClickHouse and downstream Langfuse views. | ||
|
|
||
| </Callout> | ||
|
Review comment (Contributor) on lines +55 to +61:
**Blob storage data fate after masking is unspecified**
The new warning correctly flags that events land in blob storage *before* the masking callback runs, but it doesn't say whether that unmasked blob-storage copy is ever replaced or deleted after the Worker processes the masked version. Looking at the sequence diagram (SDK → Web → S3 (unmasked) → Worker → Callback → ClickHouse), there is no "update S3" or "delete from S3" step. If unmasked data persists in blob storage indefinitely, users deploying this feature for GDPR, HIPAA, or PCI DSS compliance may believe their storage layer is protected when only ClickHouse and downstream views receive masked data. A single sentence clarifying the blob-storage retention behavior (e.g., whether it is retained as-is, or whether client-side masking is required to prevent PII reaching blob storage at all) would prevent a compliance blind spot for operators who read this warning and draw the wrong conclusion.
Path: content/self-hosting/security/data-masking.mdx
Line: 55-61 |
||
|
|
||
| **Key benefits:** | ||
|
|
||
| - Single point of configuration for all tracing data | ||
|
|
@@ -134,19 +142,19 @@ Your callback must return: | |
|
|
||
| <Callout type="warning"> | ||
| The response object must maintain the same structure as the input. | ||
| Only modify the values you want to mask—do not add, remove, or rename fields. | ||
| Only modify the values you want to mask—do not add, remove, or rename fields. Langfuse parses the callback response as JSON and expects the OpenTelemetry shape during downstream processing, but it does not run a separate structural validation at the callback boundary. | ||
| </Callout> | ||
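To illustrate the "same structure, changed values only" rule from the callout above, here is a minimal sketch of a masking routine. It is not taken from the Langfuse docs: the function names `mask_values` and `same_shape`, the email regex, and the sample payload used to exercise it are all hypothetical, and the real callback body follows the OpenTelemetry shape rather than this toy dictionary.

```python
import re
from typing import Any

# Hypothetical pattern: redact anything that looks like an email address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_values(node: Any) -> Any:
    """Return a copy of `node` with email-like substrings redacted.

    Keys, nesting, and list lengths are left untouched; only string
    values are rewritten.
    """
    if isinstance(node, dict):
        return {key: mask_values(value) for key, value in node.items()}
    if isinstance(node, list):
        return [mask_values(item) for item in node]
    if isinstance(node, str):
        return EMAIL_RE.sub("[REDACTED]", node)
    return node  # numbers, booleans, and None pass through unchanged


def same_shape(a: Any, b: Any) -> bool:
    """Check that two JSON-like values have identical keys, nesting, and leaf types."""
    if isinstance(a, dict):
        return (
            isinstance(b, dict)
            and a.keys() == b.keys()
            and all(same_shape(a[k], b[k]) for k in a)
        )
    if isinstance(a, list):
        return (
            isinstance(b, list)
            and len(a) == len(b)
            and all(same_shape(x, y) for x, y in zip(a, b))
        )
    return type(a) is type(b)
```

Because the diff notes that Langfuse does not run a separate structural validation at the callback boundary, a check like `same_shape(original, masked)` inside the masking service itself is one way to catch accidental field additions or removals before the response is returned.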
|
|
||
| #### Error Handling | ||
|
|
||
| Error handling behavior is configured via `LANGFUSE_INGESTION_MASKING_CALLBACK_FAIL_CLOSED`: | ||
|
|
||
| | Scenario | Fail Closed (default) | Fail Open | | ||
| | ----------------------- | ----------------------------- | ---------------------------------------- | | ||
| | Callback timeout | Event dropped, warning logged | Event processed unmasked, warning logged | | ||
| | HTTP error (4xx, 5xx) | Event dropped, warning logged | Event processed unmasked, warning logged | | ||
| | Invalid response schema | Event dropped, warning logged | Event processed unmasked, warning logged | | ||
| | Network error | Event dropped, warning logged | Event processed unmasked, warning logged | | ||
| | Scenario | Fail open (default, `false`) | Fail closed (`true`) | | ||
| | --------------------- | ---------------------------------------- | ----------------------------- | | ||
| | Callback timeout | Event processed unmasked, warning logged | Event dropped, warning logged | | ||
| | HTTP error (4xx, 5xx) | Event processed unmasked, warning logged | Event dropped, warning logged | | ||
| | Invalid JSON response | Event processed unmasked, warning logged | Event dropped, warning logged | | ||
| | Network error | Event processed unmasked, warning logged | Event dropped, warning logged | | ||
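The fail-open versus fail-closed behavior in the table above can be sketched as a small decision function. This is an illustration of the documented outcomes only, not Langfuse's Worker code: `apply_masking`, the `call_callback` parameter, and the logger name are hypothetical, and the `fail_closed` argument merely stands in for `LANGFUSE_INGESTION_MASKING_CALLBACK_FAIL_CLOSED`.

```python
import json
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("masking-sketch")


def apply_masking(
    event: dict,
    call_callback: Callable[[dict], str],
    fail_closed: bool = False,
) -> Optional[Any]:
    """Return the masked event, the original event, or None (event dropped).

    `call_callback` is assumed to raise on timeout, HTTP error, or network
    error, and to return the raw response body as a string otherwise.
    """
    try:
        raw_body = call_callback(event)
        # Invalid JSON raises json.JSONDecodeError, handled below.
        return json.loads(raw_body)
    except Exception as exc:
        logger.warning("masking callback failed: %s", exc)
        if fail_closed:
            return None  # fail closed: drop the event
        return event  # fail open: process the event unmasked
```

With `fail_closed=False` a failing callback yields the original, unmasked event; with `fail_closed=True` the same failure yields `None`, which corresponds to the "Event dropped" column.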
|
|
||
| ### Limitations | ||
|
|
||
|
|
@@ -232,14 +240,14 @@ LANGFUSE_INGESTION_MASKING_CALLBACK_FAIL_CLOSED=true | |
| - Events are being dropped unexpectedly | ||
| 1. Check that your masking service is responding within the configured timeout. | ||
| 2. Verify the response schema matches the input schema exactly. | ||
| 3. Review Langfuse Web container logs for warning messages. | ||
| 3. Review Langfuse Worker container logs for warning messages. | ||
| 4. Temporarily set `LANGFUSE_INGESTION_MASKING_CALLBACK_FAIL_CLOSED=false` to diagnose issues. | ||
| - High latency on trace ingestion | ||
| 1. Monitor your masking service response times. | ||
| 2. Consider increasing `LANGFUSE_INGESTION_MASKING_CALLBACK_TIMEOUT_MS` if your masking logic requires more time. | ||
| 3. Optimize your masking logic or add caching where appropriate. | ||
| 4. Ensure network latency between Langfuse and your masking service is minimal. | ||
| - Masking not being applied | ||
| 1. Verify `LANGFUSE_INGESTION_MASKING_CALLBACK_URL` is correctly set on the Langfuse Web container. | ||
| 2. Check that your masking service is reachable from the Langfuse Web container. | ||
| 1. Verify `LANGFUSE_INGESTION_MASKING_CALLBACK_URL` is correctly set on the Langfuse Worker container. | ||
| 2. Check that your masking service is reachable from the Langfuse Worker container. | ||
| 3. Ensure your masking logic is correctly modifying the data and returning it. | ||
🔴 The `CLICKHOUSE_READ_ONLY_URL` description is narrowed here (lines 37 and 120) to 'public-API reads and selected UI/filter read queries', but two sibling docs still carry the original broader wording and now contradict this page:
- `content/self-hosting/configuration/index.mdx:25` (master env-var index) still says 'UI and public-API read queries on traces, observations, scores, and sessions'.
- `content/self-hosting/configuration/scaling.mdx:77`, the page this row links to as 'See [Scaling]', still says 'route UI and public-API read queries to the given endpoint'.

Please update those two locations to match the new narrower wording so users following the link don't get conflicting answers about what the env var actually routes.

Extended reasoning...

**The bug**
This PR intentionally narrows the description of `CLICKHOUSE_READ_ONLY_URL` in `content/self-hosting/deployment/infrastructure/clickhouse.mdx` in two places (lines 37 and 120). But the same variable is described in two other docs that were not touched and now disagree with the new wording.

**Where the contradictions live**
- `content/self-hosting/configuration/index.mdx:25`, the master self-hosting env-var index table, still says 'UI and public-API read queries on traces, observations, scores, and sessions'.
- `content/self-hosting/configuration/scaling.mdx:72-77`, the deep-dive page that this PR's new row explicitly links to as the canonical reference via `See [Scaling](/self-hosting/configuration/scaling#clickhouse-read-only-url)`, still says 'dashboard and public-API read traffic' and 'route UI and public-API read queries to the given endpoint'.

**Step-by-step proof a reader gets contradictory information**
1. A reader opens `/self-hosting/deployment/infrastructure/clickhouse.mdx` and reads the new row at line 37: only public-API reads and selected UI/filter read queries are routed.
2. They follow `See [Scaling](/self-hosting/configuration/scaling#clickhouse-read-only-url)` to learn more and find the broader wording there.
3. If they also check `/self-hosting/configuration`, they see yet a third variant of the same description (broader scope plus a specific entity list 'traces, observations, scores, and sessions' that the new wording dropped).

**Impact**
Documentation-only inconsistency, no runtime behavior change. But it is directly caused by this PR (the prior state of the docs was self-consistent at the broader wording), and the inconsistency is one hop away from the edited row via a link the PR itself places. A user trying to decide whether to provision a separate ClickHouse read compute group gets three different answers about what reads will be served from it.

**Fix**
Update the two sibling occurrences to the narrowed wording introduced here:
- `content/self-hosting/configuration/index.mdx:25`: replace the env-var row description with wording consistent with the new `clickhouse.mdx` text.
- `content/self-hosting/configuration/scaling.mdx:72-77`: adjust 'dashboard and public-API read traffic' and 'route UI and public-API read queries' to reflect that only the public API plus selected UI/filter reads are routed.

Decide whether the dropped entity list ('traces, observations, scores, and sessions') should be retained anywhere, or whether 'selected UI/filter read queries' is sufficiently specific in the new framing.