Skip to content

fix: prevent fatal Sentry escalations from /api/anchor IDL fetch#994

Merged
Woody4618 merged 8 commits into
solana-foundation:masterfrom
hoodieshq:fix-anchor-idl-fatals
Jun 1, 2026
Merged

fix: prevent fatal Sentry escalations from /api/anchor IDL fetch#994
Woody4618 merged 8 commits into
solana-foundation:masterfrom
hoodieshq:fix-anchor-idl-fatals

Conversation

@askov

@askov askov commented May 12, 2026

Copy link
Copy Markdown
Collaborator

Description

/api/anchor routed every IDL-fetch failure through Logger.panic, producing a steady stream of fatal-level Sentry events. The two largest sources we observed:

  1. JSON-RPC Internal error returned for the derived IDL PDA of built-in programs (notably the Simd-296 testnet) instead of a clean null. Every transaction touching System / Token / ComputeBudget / Vote / Stake on that cluster paged.
  2. A mainnet program (C7QLEmDz81Usvy2sYa4xZSdA8EwEcYvZo8iuYZMaqXmj) whose IDL bytes are present but undecodable — JSON inflation/parse threw a non-buffer-bounds error and tripped the catch-all panic branch.

This PR replaces the AnchorProvider / Program.fetchIdl path with a direct @solana/kit RPC call plus a manual Anchor IDL decode, and classifies the four real outcomes explicitly:

Outcome Status Body Cached
Parsed Anchor IDL 200 { "idl": { ... } } yes, 1h
Not an Anchor program / undecodable 200 { "idl": null } yes, 1h
Transient upstream RPC error 502 { "error": "Upstream RPC error" } no, Logger.warn (not escalated)
Truly unexpected error 502 { "error": "Failed to fetch IDL" } no, Logger.panic

Key behavioural changes:

  • Denylist short-circuit for well-known non-Anchor programs (System, Token, Token-2022, ATA, ComputeBudget, Memo v1/v2, Vote, Stake, Config, Ed25519/Secp256k1 sig-verify, AddressLookupTable, ZkToken/ZkE1Gama, BPF/Move/Native loaders). The route returns { idl: null } before any RPC call, so an upstream that errors instead of returning null for the IDL PDA can no longer escalate.
  • SolanaError from the RPC client is treated as a transient 502 with Logger.warn, not a panic. App layer can't act on it, but we don't escalate either.
  • Account bytes present but undecodable (sub-discriminator length, malformed borsh, bad zlib, bad JSON, wrong IDL shape) are logged at warn and cached as { idl: null }, so the client never re-fetches them.

Bonus refactors that ride along:

  • Client-side cache in useIdlFromAnchorProgramSeed replaced with useSWRImmutable + suspense. The old hand-rolled module-scope promise cache had a race where two concurrent callers for the same program could each fire a request; SWR dedupes natively. The hook also now logs non-OK API responses (previously dropped silently).
  • clusterFromParam helper extracted to app/entities/cluster/server.ts with a round-trip integer check. The old Number(value) in Cluster form let "" and " " coerce to Cluster.MainnetBeta.
  • toAddress / toPublicKey web3.js↔kit bridge helpers consolidated into app/shared/lib/web3js-compat.ts.

Type of change

  • Bug fix

Testing

Unit tests

  • app/api/anchor/__tests__/route.spec.ts — denylist short-circuit, no-account-at-PDA, valid IDL, garbage bytes, wrong-shape JSON, sub-discriminator-length, transient SolanaError, and unexpected-error escalation.
  • app/entities/cluster/lib/__tests__/cluster-from-param.spec.ts — known values, out-of-range, non-numeric inputs.

Manual verification

Open DevTools → Network, filter anchor, on the Vercel preview below.

Base URL: https://explorer-git-fix-anchor-idl-fatals-solana-foundation.vercel.app

1. Parsed Anchor IDL — expect 200 with parsed { idl: { ... } }, IDL card renders.

2. Denylisted built-ins — expect 200 with { idl: null }, no RPC roundtrip (server-side response sub-millisecond), no IDL card rendered.

3. Undecodable IDL — expect 200 with { idl: null }, server logs Logger.warn '[api:anchor] IDL account present but undecodable', no Sentry panic.

4. Built-in on Simd-296 testnet — expect 200 with { idl: null } instantly, regardless of whether the upstream RPC returns null cleanly or errors out (the denylist short-circuits before any RPC call).

5. Multi-instruction transaction (denylist + Anchor mixed) — expect one /api/anchor request per distinct programId, built-ins return 200/{idl:null} instantly, real Anchor programs return parsed IDLs, no 502s. Reloading the page serves all responses from HTTP cache.

  • Any recent Jupiter / Drift swap tx page, e.g. https://explorer-git-fix-anchor-idl-fatals-solana-foundation.vercel.app/tx/<sig>?cluster=mainnet-beta.

6. Bad-request sanity

  • GET https://explorer-git-fix-anchor-idl-fatals-solana-foundation.vercel.app/api/anchor400 { "error": "Invalid query params" }
  • GET https://explorer-git-fix-anchor-idl-fatals-solana-foundation.vercel.app/api/anchor?cluster=999&programAddress=…400 { "error": "Invalid cluster" }
  • GET https://explorer-git-fix-anchor-idl-fatals-solana-foundation.vercel.app/api/anchor?cluster=0&programAddress=not-a-pubkey400 { "error": "Invalid program address" }

Post-deploy verification

24h after release, in the related Sentry issue (SOLANA-EXPLORER-15 in our tracker):

  • Event count for the issue should plateau / stop accumulating.
  • New events (if any) should be level=error from the unexpected-error branch, not level=fatal from the old catch-all panic.
  • No events with extra.programAddress matching any address in section 2.

Related Issues

Closes HOO-531

Checklist

  • My code follows the project's style guidelines
  • I have added tests that prove my fix/feature works
  • All tests pass locally and in CI
  • I have run build:info script to update build information
  • CI/CD checks pass

@vercel

vercel Bot commented May 12, 2026

Copy link
Copy Markdown

@askov is attempting to deploy a commit to the Solana Foundation Team on Vercel.

A member of the Team first needs to authorize it.

@greptile-apps

greptile-apps Bot commented May 12, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR replaces the AnchorProvider/Program.fetchIdl path in /api/anchor with a direct @solana/kit RPC call plus manual Anchor IDL decode, classifying error outcomes explicitly to prevent fatal Sentry escalations from expected failure modes (built-in programs, undecodable IDL bytes, transient RPC errors).

  • API route rewrite: Adds a denylist for well-known non-Anchor programs (short-circuits before any RPC call), classifies SolanaError as transient vs. misconfiguration, and separates "account undecodable" from "unexpected error" with targeted Logger.warn/Logger.panic calls. A new clusterFromParam helper fixes a coercion bug where \"\" could silently resolve to Cluster.MainnetBeta.
  • Client-side hook refactor: useIdlFromAnchorProgramSeed replaces the hand-rolled module-scope promise cache with useSWRImmutable, returning { idl, isLoading } and propagating loading state through useAnchorProgramIdlCard so a LoadingCard is shown instead of the "no IDL" message during fetch.

Confidence Score: 5/5

Safe to merge. The error-classification logic is correct, the denylist short-circuit is well-tested, and the IDL decode path handles all the failure modes that caused the original panics.

The route rewrite is thoroughly tested with unit tests covering every classified outcome (denylist, null account, valid IDL, garbage bytes, wrong-shape JSON, sub-discriminator-length, transient vs. misconfig SolanaErrors, and unexpected throws). The clusterFromParam round-trip guard, the SWR migration, and the isLoading propagation through IdlCard all behave correctly. No incorrectly classified error paths or data-loss scenarios were found.

No files require special attention.

Important Files Changed

Filename Overview
app/api/anchor/route.ts Core route rewrite: direct RPC fetch, explicit error classification (transient vs. misconfig), denylist short-circuit, and decode-failure handling. Logic is correct and well-tested.
app/api/anchor/config.ts New config module exporting the NON_ANCHOR_PROGRAMS denylist and TRANSIENT_RPC_ERROR_CODES set; well-commented and easy to extend.
app/api/anchor/tests/route.spec.ts Comprehensive test suite covering denylist, null account, valid IDL, garbage bytes, wrong-shape JSON, sub-discriminator-length, transient RPC errors, and unexpected errors.
app/entities/cluster/lib/cluster-from-param.ts New helper replacing the n in Cluster coercion bug; round-trip integer check correctly rejects empty strings, whitespace, and non-numeric inputs.
app/entities/idl/model/use-idl-from-anchor-program-seed.ts Replaced hand-rolled promise cache with useSWRImmutable; non-OK API responses are now logged. Returns { idl, isLoading } instead of just Idl
app/features/idl/ui/IdlCard.tsx Adds LoadingCard render while any IDL source is fetching; uses tabs.length > 0 guard to prevent "no IDL" flash during the useEffect tick before activeTabIndex is set.
app/shared/lib/web3js-compat.ts Consolidated bridge helpers (toKitAddress, toLegacyPublicKey, toKitInstruction) into a single file; replaces the deleted app/shared/model/address.ts.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[GET /api/anchor] --> B{Valid params?}
    B -- No --> C[400 Invalid query params / cluster / address]
    B -- Yes --> D{In NON_ANCHOR_PROGRAMS denylist?}
    D -- Yes --> E[200 idl: null + Cache-Control]
    D -- No --> F[fetchIdlAccountData via @solana/kit RPC]
    F -- SolanaError thrown --> G{classifySolanaError}
    G -- transient --> H[Logger.warn · 502 Upstream RPC error · no cache]
    G -- misconfig / unexpected --> I[Logger.panic · 502 Failed to fetch IDL · no cache]
    F -- Non-SolanaError thrown --> I
    F -- value: null --> J[200 idl: null + Cache-Control]
    F -- accountData bytes --> K[decodeIdl]
    K -- throws --> L[Logger.warn · 200 idl: null + Cache-Control]
    K -- valid Idl --> M[200 idl: ... + Cache-Control]
Loading

Reviews (9): Last reviewed commit: "fix(idl-card): gate loader on all IDL so..." | Re-trigger Greptile

Comment thread app/api/anchor/__tests__/route.spec.ts
Comment thread app/api/anchor/route.ts Outdated
Comment thread app/entities/idl/model/use-idl-from-anchor-program-seed.ts
askov added a commit to hoodieshq/explorer that referenced this pull request May 12, 2026
…d Address for denylist lookup

Greptile review comments (PR solana-foundation#994):
- Add the missing Logger.warn assertion to the sub-discriminator-length
  test, matching the other decode-failure tests in the file.
- Use the validated `programId` (branded Address) for the denylist
  Set.has() lookup instead of the raw query-string `programAddress`.
  Behaviour is identical today since @solana/kit's address() validates
  without normalising, but keeps the lookup consistent with the
  validated value.
@askov

askov commented May 12, 2026

Copy link
Copy Markdown
Collaborator Author

@greptile-apps please review

askov added a commit to hoodieshq/explorer that referenced this pull request May 12, 2026
…d Address for denylist lookup

Greptile review comments (PR solana-foundation#994):
- Add the missing Logger.warn assertion to the sub-discriminator-length
  test, matching the other decode-failure tests in the file.
- Use the validated `programId` (branded Address) for the denylist
  Set.has() lookup instead of the raw query-string `programAddress`.
  Behaviour is identical today since @solana/kit's address() validates
  without normalising, but keeps the lookup consistent with the
  validated value.
@askov askov force-pushed the fix-anchor-idl-fatals branch from a2c6c58 to 25c6faa Compare May 12, 2026 10:57
@vercel

vercel Bot commented May 12, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
explorer Ready Ready Preview, Comment May 29, 2026 8:58am

Request Review

@rogaldh

rogaldh commented May 12, 2026

Copy link
Copy Markdown
Collaborator

I haven't checked the implementation yet, but a bit worried about this case: Truly unexpected error.

Having the panics at the sentry allows us to see issues, such as a wrong api token, for example, early. I'd consider still keeping unexpected errors as panics, as they might show up potential fixable cases or allow reacting quickly to misconfiguration

@rogaldh

rogaldh commented May 12, 2026

Copy link
Copy Markdown
Collaborator

This PR addresses this one #693 partially

@askov

askov commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

I haven't checked the implementation yet, but a bit worried about this case: Truly unexpected error.

Having the panics at the sentry allows us to see issues, such as a wrong api token, for example, early. I'd consider still keeping unexpected errors as panics, as they might show up potential fixable cases or allow reacting quickly to misconfiguration

I've made the errors even more granular to skip everything non-actionable vividly

@rogaldh rogaldh left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread app/api/anchor/route.ts Outdated
askov added a commit to hoodieshq/explorer that referenced this pull request May 21, 2026
…d Address for denylist lookup

Greptile review comments (PR solana-foundation#994):
- Add the missing Logger.warn assertion to the sub-discriminator-length
  test, matching the other decode-failure tests in the file.
- Use the validated `programId` (branded Address) for the denylist
  Set.has() lookup instead of the raw query-string `programAddress`.
  Behaviour is identical today since @solana/kit's address() validates
  without normalising, but keeps the lookup consistent with the
  validated value.
@askov askov force-pushed the fix-anchor-idl-fatals branch from 2eda597 to 49461cd Compare May 21, 2026 08:10
askov added a commit to hoodieshq/explorer that referenced this pull request May 27, 2026
…d Address for denylist lookup

Greptile review comments (PR solana-foundation#994):
- Add the missing Logger.warn assertion to the sub-discriminator-length
  test, matching the other decode-failure tests in the file.
- Use the validated `programId` (branded Address) for the denylist
  Set.has() lookup instead of the raw query-string `programAddress`.
  Behaviour is identical today since @solana/kit's address() validates
  without normalising, but keeps the lookup consistent with the
  validated value.
@askov askov force-pushed the fix-anchor-idl-fatals branch from 4ed787b to 3c22e96 Compare May 27, 2026 14:24
askov added a commit to hoodieshq/explorer that referenced this pull request May 28, 2026
…d Address for denylist lookup

Greptile review comments (PR solana-foundation#994):
- Add the missing Logger.warn assertion to the sub-discriminator-length
  test, matching the other decode-failure tests in the file.
- Use the validated `programId` (branded Address) for the denylist
  Set.has() lookup instead of the raw query-string `programAddress`.
  Behaviour is identical today since @solana/kit's address() validates
  without normalising, but keeps the lookup consistent with the
  validated value.
@askov askov force-pushed the fix-anchor-idl-fatals branch from c48ebf8 to 1a284fc Compare May 28, 2026 10:12
askov added 4 commits May 29, 2026 15:02
The route previously routed every IDL-fetch failure through Logger.panic,
producing a steady stream of fatal Sentry events. The two largest sources
were JSON-RPC errors on the derived IDL PDA for built-in programs (notably
the Simd-296 testnet returning "Internal error" instead of null), and a
mainnet program whose IDL bytes were present but undecodable.

Replace AnchorProvider.fetchIdl with a direct @solana/kit RPC call plus
manual Anchor IDL decode, and classify the three real outcomes:

- 200 + { idl: ... }  parsed Anchor IDL (cached 1h)
- 200 + { idl: null } non-Anchor program / undecodable bytes (cached 1h)
- 502 + Upstream RPC error   transient SolanaError (not cached, warn-only)
- 502 + Failed to fetch IDL  truly unexpected (escalates to Sentry)

Add a denylist of well-known non-Anchor programs (System, Token, Token-2022,
ATA, ComputeBudget, Memo, Vote, Stake, Config, sig-verify, BPF/Move/Native
loaders, address-lookup-table, zk-token) so the route short-circuits before
any RPC call. This is what previously panicked when the upstream RPC
returned a JSON error instead of null for the built-ins' IDL PDA.

Account bytes that exist but are undecodable (truncated, bad borsh, bad
zlib, bad JSON, wrong shape) are logged at warn level and cached as
{ idl: null }, so the client never re-fetches them.

Client-side cache in useIdlFromAnchorProgramSeed replaced with
useSWRImmutable + suspense. The old hand-rolled module-scope promise cache
had a race where two concurrent callers for the same program could each
fire a request; SWR dedupes natively. The hook also now logs non-OK API
responses (previously dropped silently).

Add clusterFromParam helper with a round-trip integer check (the old
`Number(value) in Cluster` form let "" and " " coerce to MainnetBeta).
Lives in the entity's server.ts entry point per FSD convention.

Consolidate toAddress / toPublicKey web3.js<->kit bridge helpers into
app/shared/lib/web3js-compat.ts.
…d Address for denylist lookup

Greptile review comments (PR solana-foundation#994):
- Add the missing Logger.warn assertion to the sub-discriminator-length
  test, matching the other decode-failure tests in the file.
- Use the validated `programId` (branded Address) for the denylist
  Set.has() lookup instead of the raw query-string `programAddress`.
  Behaviour is identical today since @solana/kit's address() validates
  without normalising, but keeps the lookup consistent with the
  validated value.
`useSWRImmutable(key, fetcher, { suspense: true })` throws a promise on
first render with undefined data. The transaction inspector renders the
hook (via InstructionsSection) without an enclosing <Suspense> boundary,
so the suspense throw rolls back to the previous committed state —
`inspectorData = undefined` — and the page is stuck on the empty input
view. This broke `InspectorPage.spec.tsx` and `InspectorPage-SystemProgram.spec.tsx`
on CI.

(Subtlety: the spec file does vi.mock('swr', ...), but `swr/immutable`
resolves its own internal `import useSWR from 'swr'` via the pnpm-internal
module path, bypassing the mock. The real `useSWR` is what actually
runs, which is why suspense fires at all in the test.)

Drop `suspense: true`. The hook now returns `data ?? null` while the
fetch is in flight; cards already handle `null` by hiding the Anchor
tab. Existing <Suspense> boundaries on the address layout still cover
their other suspending children — they just no longer trigger a fallback
specifically for the IDL request.
…panic

The previous catch-all `isSolanaError → warn` branch silenced real
misconfiguration (wrong RPC token, missing API plan, header-stripping
proxy, `Method not found`) by lumping it in with the Simd-296 "Internal
error" storm the original fix was about.

Adds `classifySolanaError`: codes in a small allowlist of ephemeral
JSON-RPC server states stay transient (warn); `RPC__TRANSPORT_HTTP_ERROR`
is bucketed by `context.statusCode` (5xx and 429 transient, rest of 4xx
misconfig); everything else (auth, plan, method-not-found, malformed
responses, programmer-bug-class codes) falls through to `Logger.panic`
so misconfig pages Sentry on the first request instead of vanishing
into a stream of warns.
askov added 3 commits May 29, 2026 15:02
Follows the route+config convention used by app/api/receipt/price and
app/api/verification — keeps route.ts focused on request handling and
makes the denylist / transient-error codes importable from tests.
…loads

The suspense removal in 04eb0af collapsed "loading" and "no IDL" into the
same null return, so IdlCard flashed the empty "no IDL" warning before the
Anchor IDL resolved. Surface SWR's isLoading through useIdlFromAnchorProgramSeed
and useAnchorProgram (still no suspense, so the inspector's boundary-less
render tree stays intact) and gate the empty-state branch on it.
The previous loader fix only tracked isAnchorIdlLoading. For programs
whose IDL comes from Program Metadata (the default/preferred tab), the
Anchor fetch resolves to null first while Program Metadata is still in
flight, so IdlCard flashed the empty no-IDL state instead of the loader.

Surface isLoading from useProgramCanonicalMetadata through
useProgramMetadataIdl/useProgramMetadataCodamaIdl and gate the
empty-state branch on the combined loading flag.

Also fix a pre-existing tsc failure in the anchor route spec
(SOLANA_ERROR__RPC__API_PLAN_MISSING_FOR_RPC_METHOD context uses
method/params, not rpcMethod/rpcParams).
@Woody4618 Woody4618 merged commit 75ede83 into solana-foundation:master Jun 1, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants