feat(solana): add address_prefix predicates for account_activity partition pruning #9670

Merged
jeff-dude merged 2 commits into main from andre/dwh-642-account-activity-prefix-predicate
May 14, 2026

@a-monteiro
Member

@a-monteiro a-monteiro commented May 14, 2026

solana.account_activity is partitioned by (year, month, address_prefix), where address_prefix = substring(address, 1, 2). Trino 479 cannot derive this predicate from address = '<literal>' (trinodb/trino#19455), so the planner enumerates all 3,364 prefix partitions per month and frequently hits the 600s query.max-planning-time ceiling. The failure rate climbed from 3.8% in April to 76.2% by 2026-05-10, which triggered the war room and DWH-642.
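For readers unfamiliar with the partition layout, here is a minimal Python sketch of the prefix derivation. It assumes addresses use the standard Base58 alphabet, which is consistent with the 3,364 figure above (58 × 58). The function name and the placeholder address are illustrative and not part of this PR.

```python
# Standard Base58 alphabet (no 0, O, I, or l). Assumed here to be the
# character set of the address column.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def address_prefix(address: str) -> str:
    """Python equivalent of SQL substring(address, 1, 2): the first two characters."""
    return address[:2]


# Every two-character combination is a possible address_prefix value
# within one (year, month) partition.
possible_prefixes = len(BASE58_ALPHABET) ** 2

if __name__ == "__main__":
    print(possible_prefixes)
    # Placeholder string, not a real account.
    print(address_prefix("CYplaceholderAddress"))
```

Without an explicit address_prefix predicate, the planner has to consider every one of those values for each month in the query window.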

This inlines an explicit address_prefix predicate in 29 dbt models with hardcoded fee receiver addresses (20 single-address + 9 multi-address bot models, plus the lido Solana stSOL income query). Variable-based addresses use the Jinja slice form {{ var[:2] }} so the prefix stays in sync with the underlying address literal.
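As a rough illustration of what the inlined predicates look like once rendered, the sketch below mirrors the Jinja slice in plain Python (Jinja's `[:2]` follows Python slicing semantics). The helper `prefix_predicate` is hypothetical and only models the rendered output. The models themselves inline the predicate directly in SQL.

```python
from typing import Iterable


def prefix_predicate(addresses: Iterable[str]) -> str:
    """Build the address_prefix predicate for one or more address literals.

    Hypothetical helper for illustration only; it is not code from this PR.
    """
    # dict.fromkeys de-duplicates prefixes while keeping first-seen order.
    prefixes = list(dict.fromkeys(address[:2] for address in addresses))
    if not prefixes:
        raise ValueError("at least one address is required")
    if len(prefixes) == 1:
        return f"and address_prefix = '{prefixes[0]}'"
    quoted = ", ".join(f"'{prefix}'" for prefix in prefixes)
    return f"and address_prefix in ({quoted})"


if __name__ == "__main__":
    # Placeholder strings, not real fee receiver addresses.
    print(prefix_predicate(["32placeholderReceiver"]))
    print(prefix_predicate(["ZcPlaceholderA", "cbPlaceholderB", "3GPlaceholderC"]))
```

Because the prefix is sliced from the same value as the address filter, changing an address cannot leave a stale prefix behind.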

EXPLAIN ANALYZE on prod Trino over a 1-day window shows ~2.0x speedup for pepe_boost (single literal) and ~1.7x for banana_gun (8-address IN list). Splits generation drops from 80–130s to ~30s; physical I/O is unchanged (Parquet stats on address were already pruning files), confirming planner enumeration was the bottleneck.

Six models are intentionally excluded because the simple compile-time prefix derivation does not apply:

  • phantom_swapper_solana_fee_payments_raw — joins against 831k dynamic referral accounts spanning all 3,364 prefixes (Jupiter REFER4Zg... program)
  • readyswap_solana_bot_trades and bonkbot_solana_fee_payments_raw — dual SOL/SPL path where the second branch matches on token_balance_owner while address is the token account, so a single address_prefix constraint would drop valid token rows
  • base_app_swapper_solana_stg_sol_payments, ..._stg_token_payments_fees_paid, ..._stg_token_payments_fees_claimed — owner-based filtering with the same address vs. owner-account distinction
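To make the owner-based exclusions concrete, here is a small Python sketch with two made-up rows. It assumes, as described above, that on the SPL path the row's address is the token account and the fee receiver only appears in token_balance_owner. All addresses and the row shape are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityRow:
    address: str              # account on the row; the token account for SPL transfers
    token_balance_owner: str  # wallet that owns the token account; empty on the SOL path


# Placeholder strings, not real accounts.
FEE_RECEIVER = "32FeeReceiverPlaceholder"
TOKEN_ACCOUNT = "ZcTokenAccountPlaceholder"

ROWS = [
    ActivityRow(address=FEE_RECEIVER, token_balance_owner=""),             # SOL path
    ActivityRow(address=TOKEN_ACCOUNT, token_balance_owner=FEE_RECEIVER),  # SPL path
]


def is_fee_payment(row: ActivityRow) -> bool:
    """Dual-path match: direct SOL receipt, or SPL receipt via the owned token account."""
    return row.address == FEE_RECEIVER or row.token_balance_owner == FEE_RECEIVER


without_prefix = [row for row in ROWS if is_fee_payment(row)]

# Adding a single prefix constraint derived from the fee receiver drops the SPL row,
# because that row's address (the token account) starts with a different prefix.
with_prefix = [
    row for row in ROWS
    if is_fee_payment(row) and row.address[:2] == FEE_RECEIVER[:2]
]

if __name__ == "__main__":
    print(len(without_prefix), len(with_prefix))
```

A correct filter for these models would need the prefixes of the token accounts themselves, which are not known at compile time.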

These will be handled in follow-ups, along with the underlying partition redesign (block_date partitioning + ZORDER on full address) needed for the ~93% of ad-hoc queries that filter on address directly.

Fixes DWH-642

…ition pruning

solana.account_activity is partitioned by (year, month, address_prefix) where
address_prefix = substring(address, 1, 2). Trino 479 cannot derive this
predicate from address = '<literal>' (trinodb/trino#19455), so partition
enumeration during planning scans all 3,364 prefix partitions and frequently
hits the 600s query.max-planning-time ceiling.

Introduces a shared account_activity_prefix_filter macro that emits the
address_prefix predicate from compile-time address literals, and applies it
to 29 dbt models with hardcoded fee receiver addresses (20 single-address
and 9 multi-address bot models, plus the lido Solana stSOL income query).

Benchmarks on prod Trino (EXPLAIN ANALYZE, 1-day window) show ~2x speedup
for the previously-failing pepe_boost and ~1.7x for the multi-address
banana_gun pattern, with splits generation dropping from 80-130s to ~30s
and physical I/O unchanged.

Six models cannot use this approach (phantom_swapper, readyswap, bonkbot
fee_payments_raw, base_app x3) because they either match the entire prefix
space via dynamic subqueries or use token_balance_owner JOINs where the
account_activity row's address column is a token account, not the fee
receiver. These will be handled separately.
Member Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@github-actions github-actions Bot added labels WIP (work in progress), dbt: solana (covers the Solana dbt subproject), and dbt: hourly (covers the hourly dbt subproject) May 14, 2026
@a-monteiro
Member Author

@jeff-dude @0xRobin Can we review/sanity check plz?

@0xRobin
Contributor

0xRobin commented May 14, 2026

Trino 479 cannot derive this predicate from address = '<literal>' so the planner enumerates all 3,364 prefix partitions per month and frequently hits timeouts.

Wait, what? This means the prefix partitioning is essentially useless unless explicitly filtered on?
Why don't min/max partition stats prune the partition set correctly?

Contributor

@0xRobin 0xRobin left a comment

I think using a macro for this is a bit of an anti-pattern, introducing more coupling than necessary.
I would prefer that we just inline the logic explicitly in every model, without too much Jinja:

and address_prefix = '32'
and address_prefix in ('Zc', 'cb', '3G')
and address_prefix = '{{ address_variable[:2] }}'

@a-monteiro
Member Author

a-monteiro commented May 14, 2026

Trino 479 cannot derive this predicate from address = '<literal>' so the planner enumerates all 3,364 prefix partitions per month and frequently hits timeouts.

Wait, what? This means the prefix partitioning is essentially useless unless explicitly filtered on? Why don't min/max partition stats prune the partition set correctly?

Min/max stats work for file skipping, not partition skipping. Processing the delta log is still a problem in this case.

I'll show you the plans later

Per review feedback, removes the account_activity_prefix_filter macro and
inlines each predicate directly in the model. Variable-based addresses use
the Jinja slice form {{ var[:2] }} so the prefix stays in sync with the
underlying address literal; lido uses a hardcoded 'CY' since the address
is already inline in the SQL.
@a-monteiro a-monteiro requested a review from 0xRobin May 14, 2026 07:09
@a-monteiro
Member Author

@0xRobin removed the macro

@a-monteiro a-monteiro marked this pull request as ready for review May 14, 2026 07:10
@cursor

cursor Bot commented May 14, 2026

PR Summary

Low Risk
Low risk: changes are additional WHERE-clause filters derived from existing literal addresses to improve query planning, with minimal chance of altering results unless an address/prefix mismatch is introduced.

Overview
Adds explicit address_prefix predicates alongside existing address = ... filters across Solana bot trade/user dbt models and Lido’s Solana stSOL income query to force partition pruning on solana.account_activity.

Multi-fee-receiver models now also constrain address_prefix via IN (...) lists, with prefixes derived from the same Jinja address variables (e.g., {{ fee_receiver[:2] }}) to keep them consistent.

Reviewed by Cursor Bugbot for commit a8f60ee.

@github-actions github-actions Bot added label ready-for-review (this PR development is complete, please review) and removed label WIP (work in progress) May 14, 2026
Contributor

@0xRobin 0xRobin left a comment

@jeff-dude can you handle the deploy of this as amonit?

@jeff-dude jeff-dude merged commit a4967fd into main May 14, 2026
8 of 9 checks passed
@jeff-dude jeff-dude deleted the andre/dwh-642-account-activity-prefix-predicate branch May 14, 2026 14:42
@github-actions github-actions Bot locked and limited conversation to collaborators May 14, 2026
Labels

dbt: hourly (covers the hourly dbt subproject), dbt: solana (covers the Solana dbt subproject), ready-for-review (this PR development is complete, please review)

3 participants