feat(solana): add address_prefix predicates for account_activity partition pruning #9670
…ition pruning

solana.account_activity is partitioned by (year, month, address_prefix) where address_prefix = substring(address, 1, 2). Trino 479 cannot derive this predicate from address = '<literal>' (trinodb/trino#19455), so partition enumeration during planning scans all 3,364 prefix partitions and frequently hits the 600s query.max-planning-time ceiling.

Introduces a shared account_activity_prefix_filter macro that emits the address_prefix predicate from compile-time address literals, and applies it to 29 dbt models with hardcoded fee receiver addresses (20 single-address and 9 multi-address bot models, plus the lido Solana stSOL income query).

Benchmarks on prod Trino (EXPLAIN ANALYZE, 1-day window) show ~2x speedup for the previously-failing pepe_boost and ~1.7x for the multi-address banana_gun pattern, with splits generation dropping from 80-130s to ~30s and physical I/O unchanged.

Six models cannot use this approach (phantom_swapper, readyswap, bonkbot fee_payments_raw, base_app x3) because they either match the entire prefix space via dynamic subqueries or use token_balance_owner JOINs where the account_activity row's address column is a token account, not the fee receiver. These will be handled separately.
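The 3,364 figure is consistent with two base58 characters (58 × 58). A minimal Python sketch of that arithmetic, assuming the count does come from the base58 alphabet used by Solana addresses; `BASE58`, `address_prefix`, and `ALL_PREFIXES` are illustrative names, not identifiers from the repo:

```python
# Base58 alphabet (digits and letters without 0, O, I, and l).
BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def address_prefix(address: str) -> str:
    """Mirror substring(address, 1, 2): the first two characters."""
    return address[:2]


# Every possible two-character prefix, i.e. one partition per prefix
# within each (year, month) under the assumption stated above.
ALL_PREFIXES = [first + second for first in BASE58 for second in BASE58]

print(len(BASE58))        # 58
print(len(ALL_PREFIXES))  # 3364
```

Without an explicit address_prefix predicate, the planner has to consider every entry in that list for each month in the query window.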
@jeff-dude @0xRobin Can we review/sanity check plz?
wait what? This means the prefix partitioning is essentially useless unless explicitly filtered on?
0xRobin
left a comment
I think using a macro for this is a bit of an anti-pattern, introducing more coupling than necessary.
Would prefer if we just inline the logic explicitly in every model without too much jinja.
and address_prefix = '32'
and address_prefix in ('Zc', 'cb', '3G')
and address_prefix = '{{ address_variable[:2] }}'
min/max works for file skipping, not partition skipping. Processing the delta log is still a problem in this case. I'll show you the plans later
Per review feedback, removes the account_activity_prefix_filter macro and
inlines each predicate directly in the model. Variable-based addresses use
the Jinja slice form {{ var[:2] }} so the prefix stays in sync with the
underlying address literal; lido uses a hardcoded 'CY' since the address
is already inline in the SQL.
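Since the lido prefix is hardcoded rather than sliced, it can drift from its address if the address ever changes. A rough Python sketch of a sync check, using the same slice semantics as the Jinja form; the function names and the placeholder addresses are hypothetical and not part of this PR:

```python
# Hypothetical fee receiver addresses (placeholders, not real accounts).
FEE_RECEIVERS = [
    "Zc" + "1" * 42,
    "cb" + "1" * 42,
    "3G" + "1" * 42,
]


def expected_prefixes(addresses):
    """Two-character prefixes, matching substring(address, 1, 2)."""
    return {address[:2] for address in addresses}


def prefixes_in_sync(addresses, inlined_prefixes):
    """True when the hand-written prefixes are exactly the addresses' prefixes."""
    return expected_prefixes(addresses) == set(inlined_prefixes)


print(prefixes_in_sync(FEE_RECEIVERS, ["Zc", "cb", "3G"]))  # True
print(prefixes_in_sync(FEE_RECEIVERS, ["Zc", "cb"]))        # False
```

A missing prefix is the dangerous direction: the query still succeeds but silently drops rows for the uncovered address.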
@0xRobin removed the macro
PR Summary: Low Risk
Overview: Multi-fee-receiver models now also constrain
Reviewed by Cursor Bugbot for commit a8f60ee.
0xRobin
left a comment
@jeff-dude can you handle the deploy of this as amonit?

`solana.account_activity` is partitioned by `(year, month, address_prefix)` where `address_prefix = substring(address, 1, 2)`. Trino 479 cannot derive this predicate from `address = '<literal>'` (trinodb/trino#19455), so the planner enumerates all 3,364 prefix partitions per month and frequently hits the 600s `query.max-planning-time` ceiling. The failure rate climbed from 3.8% in April to 76.2% by 2026-05-10, which is what caused the war room and DWH-642.

This inlines an explicit `address_prefix` predicate in 29 dbt models with hardcoded fee receiver addresses (20 single-address + 9 multi-address bot models, plus the lido Solana stSOL income query). Variable-based addresses use the Jinja slice form `{{ var[:2] }}` so the prefix stays in sync with the underlying address literal.

EXPLAIN ANALYZE on prod Trino over a 1-day window shows ~2.0x speedup for pepe_boost (single literal) and ~1.7x for banana_gun (8-address IN list). Splits generation drops from 80–130s to ~30s; physical I/O is unchanged (Parquet stats on `address` were already pruning files), confirming planner enumeration was the bottleneck.

Six models are intentionally excluded because the simple compile-time prefix derivation does not apply:

- `phantom_swapper_solana_fee_payments_raw`: joins against 831k dynamic referral accounts spanning all 3,364 prefixes (Jupiter REFER4Zg... program)
- `readyswap_solana_bot_trades` and `bonkbot_solana_fee_payments_raw`: dual SOL/SPL path where the second branch matches on `token_balance_owner` while `address` is the token account, so a single `address_prefix` constraint would drop valid token rows
- `base_app_swapper_solana_stg_sol_payments`, `..._stg_token_payments_fees_paid`, `..._stg_token_payments_fees_claimed`: owner-based filtering with the same `address` vs. owner-account distinction

These will be handled in follow-ups, along with the underlying partition redesign (block_date partitioning + ZORDER on full `address`) needed for the ~93% of ad-hoc queries that filter on `address` directly.

Fixes DWH-642
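To illustrate why the dual SOL/SPL models are excluded, here is a toy Python sketch with two in-memory rows. The addresses are placeholders and the row shape is a simplification assumed for illustration, not the real account_activity schema:

```python
# Hypothetical addresses (placeholders, not real accounts).
FEE_RECEIVER = "Zc" + "1" * 42   # the fee receiver wallet
TOKEN_ACCOUNT = "9x" + "1" * 42  # a token account owned by that wallet

ROWS = [
    # SOL path: the row's address is the fee receiver itself.
    {"kind": "SOL", "address": FEE_RECEIVER, "token_balance_owner": None},
    # SPL path: the row's address is the token account; the receiver is the owner.
    {"kind": "SPL", "address": TOKEN_ACCOUNT, "token_balance_owner": FEE_RECEIVER},
]


def is_fee_row(row):
    """Match either branch of the dual SOL/SPL filter."""
    return (
        row["address"] == FEE_RECEIVER
        or row["token_balance_owner"] == FEE_RECEIVER
    )


def kinds(rows):
    return [row["kind"] for row in rows]


unpruned = [r for r in ROWS if is_fee_row(r)]
# Adding address_prefix = FEE_RECEIVER[:2] on top of the same filter:
pruned = [r for r in ROWS if r["address"][:2] == FEE_RECEIVER[:2] and is_fee_row(r)]

print(kinds(unpruned))  # ['SOL', 'SPL']
print(kinds(pruned))    # ['SOL']
```

The SPL row is lost because its address prefix belongs to the token account, which has no fixed relationship to the owner's prefix.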