feat(lookml): distinct/post-SQL measures and fiscal timeframes#209
Conversation
Add support for several LookML measure types and dimension_group timeframes that the adapter previously dropped or mishandled: - Distinct aggregate measures (sum_distinct, average_distinct, median_distinct, percentile_distinct), emitted as derived measures with explicit DISTINCT aggregation and honoring sql_distinct_key (preserved in meta). - Post-SQL / table-calculation measures: running_total maps to a cumulative metric over the base measure; percent_of_total and percent_of_previous map to derived window-function measures. - Non-standard / fiscal dimension_group timeframes: fiscal_quarter, fiscal_year, fiscal_month_num, fiscal_quarter_of_year, day_of_week, day_of_week_index, month_name, month_num, week_of_year, quarter_of_year, hour_of_day, day_of_month, day_of_year and related, producing numeric/categorical dimensions with portable extraction SQL while truncation timeframes remain time dimensions.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f51863230d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…cal timeframes
- sum_distinct/average_distinct with sql_distinct_key now dedupe by the key
entity via a symmetric aggregate instead of SUM(DISTINCT value), so two
distinct entities sharing a value are both counted (previously collapsed).
Uses DECIMAL casts to avoid float overflow on the hash offset.
- percent_of_total / percent_of_previous qualify their base measure ref with
{model} and wrap it in the base measure's own aggregate, so the generator
resolves it to the base measure's _raw CTE column. Previously the emitted SQL
referenced a bare, out-of-scope column and failed to compile.
- fiscal_quarter / fiscal_year honor fiscal_month_offset by shifting the
timestamp before the calendar truncation, so non-calendar fiscal years bucket
into correct fiscal periods instead of calendar boundaries.
Adds end-to-end regression tests that execute the compiled SQL against DuckDB.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: faa2b0dbe7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…nct sums, count_distinct post-SQL base - percentile_distinct now emits a parseable ordered-set PERCENTILE_CONT instead of the SQLGlot-rejected ORDER BY DISTINCT form, so the metric compiles and runs - keyed sum_distinct/average_distinct bound the HASH offset (% 1<<61) so the DECIMAL symmetric aggregate no longer overflows past ~100 distinct keys - percent_of_total/percent_of_previous over a count_distinct base measure now wrap the reference in COUNT(DISTINCT ...) via aggregate templates
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: df1915a0d6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
median_distinct and percentile_distinct with a sql_distinct_key emitted a plain ordered-set aggregate over the fanned-out rows, so a value repeated across joined rows skewed the quantile. Collapse (key, value) pairs to one value per distinct key via LIST + LIST_DISTINCT, then take the continuous quantile of that per-key list, matching Looker's fan-out-safe semantics.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 00321df7a4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
What changed
Extends the LookML import adapter (
sidemantic/adapters/lookml.py) to support several upstream measure types anddimension_grouptimeframes that were previously dropped or mishandled. Verified against the Looker LookML parameter reference (param-measure-types, param-field-dimension-group).Distinct aggregate measures
sum_distinct,average_distinct,median_distinct,percentile_distinctnow import as derived measures with an explicitDISTINCTaggregation (SUM(DISTINCT ...),AVG(DISTINCT ...),MEDIAN(DISTINCT ...),PERCENTILE_CONT(p) WITHIN GROUP (ORDER BY DISTINCT ...)). Thesql_distinct_keyparameter (which defines the unique entity used to avoid double counting under join fanout) is resolved and preserved in the metricmetaso the de-duplication entity is not lost.percentile_distincthonors thepercentile:parameter.Post-SQL / table-calculation measures
running_totalmaps to acumulativemetric over the referenced base measure.percent_of_totalmaps to a derived window measure:base / NULLIF(SUM(base) OVER (), 0).percent_of_previousmaps to a derived window measure usingLAG(...) OVER ().The referenced base measure (
sql: ${measure}) is resolved to its dependency name, and the originating table-calculation kind is recorded inmeta.Non-standard / fiscal dimension_group timeframes
Timeframes that truncate a timestamp (
time,date,week,month,quarter,year, plusminute/secondvariants andfiscal_quarter/fiscal_year) remaintype: timedimensions with the appropriate granularity. "Extracted part" timeframes now produce numeric or categorical dimensions with portable DuckDB-compatible extraction SQL:month_num,week_of_year,quarter_of_year,hour_of_day,day_of_month,day_of_year,day_of_week_index,fiscal_month_num,fiscal_quarter_of_year.day_of_week,month_name.fiscal_month_num/fiscal_quarter_of_yearhonorfiscal_month_offsetwhen present (defaulting to a calendar offset of 0).Why
An audit flagged these LookML constructs as unsupported. They appear in real-world Looker blocks already present in the test fixtures (e.g. GA360, redshift admin, pylookml), where extracted-part timeframes were previously coerced to day-granularity time dimensions and distinct/post-SQL measures were dropped or mis-typed.
Tests
New fixture
tests/fixtures/lookml/advanced_measure_types.lkmlandtests/adapters/lookml/test_advanced_measure_types.pycover each new measure type and timeframe, including end-to-end correctness of distinct de-duplication and timeframe extraction. All existing LookML tests remain green; backward compatibility for previously supported formats/fields is preserved.Known limitations
<AGG>(DISTINCT expr), which de-duplicates by the aggregated expression itself;sql_distinct_keyis preserved inmetafor fidelity but is not woven into a hash-based dedup of the exact form Looker generates. This is correct for the common case where the aggregated value is unique per entity.percent_of_total/percent_of_previoususe unpartitionedOVER ()window frames (column-wide), matching Looker's default table-calculation scope; query-time partitioning is not inferred.