fix: prevent fatal Sentry escalations from /api/anchor IDL fetch#994
Conversation
|
@askov is attempting to deploy a commit to the Solana Foundation Team on Vercel. A member of the Team first needs to authorize it. |
Greptile SummaryThis PR replaces the
Confidence Score: 5/5Safe to merge. The error-classification logic is correct, the denylist short-circuit is well-tested, and the IDL decode path handles all the failure modes that caused the original panics. The route rewrite is thoroughly tested with unit tests covering every classified outcome (denylist, null account, valid IDL, garbage bytes, wrong-shape JSON, sub-discriminator-length, transient vs. misconfig SolanaErrors, and unexpected throws). The clusterFromParam round-trip guard, the SWR migration, and the isLoading propagation through IdlCard all behave correctly. No incorrectly classified error paths or data-loss scenarios were found. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[GET /api/anchor] --> B{Valid params?}
B -- No --> C[400 Invalid query params / cluster / address]
B -- Yes --> D{In NON_ANCHOR_PROGRAMS denylist?}
D -- Yes --> E[200 idl: null + Cache-Control]
D -- No --> F[fetchIdlAccountData via @solana/kit RPC]
F -- SolanaError thrown --> G{classifySolanaError}
G -- transient --> H[Logger.warn · 502 Upstream RPC error · no cache]
G -- misconfig / unexpected --> I[Logger.panic · 502 Failed to fetch IDL · no cache]
F -- Non-SolanaError thrown --> I
F -- value: null --> J[200 idl: null + Cache-Control]
F -- accountData bytes --> K[decodeIdl]
K -- throws --> L[Logger.warn · 200 idl: null + Cache-Control]
K -- valid Idl --> M[200 idl: ... + Cache-Control]
Reviews (9): Last reviewed commit: "fix(idl-card): gate loader on all IDL so..." | Re-trigger Greptile |
…d Address for denylist lookup Greptile review comments (PR solana-foundation#994): - Add the missing Logger.warn assertion to the sub-discriminator-length test, matching the other decode-failure tests in the file. - Use the validated `programId` (branded Address) for the denylist Set.has() lookup instead of the raw query-string `programAddress`. Behaviour is identical today since @solana/kit's address() validates without normalising, but keeps the lookup consistent with the validated value.
|
@greptile-apps please review |
…d Address for denylist lookup Greptile review comments (PR solana-foundation#994): - Add the missing Logger.warn assertion to the sub-discriminator-length test, matching the other decode-failure tests in the file. - Use the validated `programId` (branded Address) for the denylist Set.has() lookup instead of the raw query-string `programAddress`. Behaviour is identical today since @solana/kit's address() validates without normalising, but keeps the lookup consistent with the validated value.
a2c6c58 to
25c6faa
Compare
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
I haven't checked the implementation yet, but a bit worried about this case: Truly unexpected error. Having the |
|
This PR addresses this one #693 partially |
I've made the errors even more granular to skip everything non-actionable vividly |
…d Address for denylist lookup Greptile review comments (PR solana-foundation#994): - Add the missing Logger.warn assertion to the sub-discriminator-length test, matching the other decode-failure tests in the file. - Use the validated `programId` (branded Address) for the denylist Set.has() lookup instead of the raw query-string `programAddress`. Behaviour is identical today since @solana/kit's address() validates without normalising, but keeps the lookup consistent with the validated value.
2eda597 to
49461cd
Compare
…d Address for denylist lookup Greptile review comments (PR solana-foundation#994): - Add the missing Logger.warn assertion to the sub-discriminator-length test, matching the other decode-failure tests in the file. - Use the validated `programId` (branded Address) for the denylist Set.has() lookup instead of the raw query-string `programAddress`. Behaviour is identical today since @solana/kit's address() validates without normalising, but keeps the lookup consistent with the validated value.
4ed787b to
3c22e96
Compare
…d Address for denylist lookup Greptile review comments (PR solana-foundation#994): - Add the missing Logger.warn assertion to the sub-discriminator-length test, matching the other decode-failure tests in the file. - Use the validated `programId` (branded Address) for the denylist Set.has() lookup instead of the raw query-string `programAddress`. Behaviour is identical today since @solana/kit's address() validates without normalising, but keeps the lookup consistent with the validated value.
c48ebf8 to
1a284fc
Compare
The route previously routed every IDL-fetch failure through Logger.panic,
producing a steady stream of fatal Sentry events. The two largest sources
were JSON-RPC errors on the derived IDL PDA for built-in programs (notably
the Simd-296 testnet returning "Internal error" instead of null), and a
mainnet program whose IDL bytes were present but undecodable.
Replace AnchorProvider.fetchIdl with a direct @solana/kit RPC call plus
manual Anchor IDL decode, and classify the three real outcomes:
- 200 + { idl: ... } parsed Anchor IDL (cached 1h)
- 200 + { idl: null } non-Anchor program / undecodable bytes (cached 1h)
- 502 + Upstream RPC error transient SolanaError (not cached, warn-only)
- 502 + Failed to fetch IDL truly unexpected (escalates to Sentry)
Add a denylist of well-known non-Anchor programs (System, Token, Token-2022,
ATA, ComputeBudget, Memo, Vote, Stake, Config, sig-verify, BPF/Move/Native
loaders, address-lookup-table, zk-token) so the route short-circuits before
any RPC call. This is what previously panicked when the upstream RPC
returned a JSON error instead of null for the built-ins' IDL PDA.
Account bytes that exist but are undecodable (truncated, bad borsh, bad
zlib, bad JSON, wrong shape) are logged at warn level and cached as
{ idl: null }, so the client never re-fetches them.
Client-side cache in useIdlFromAnchorProgramSeed replaced with
useSWRImmutable + suspense. The old hand-rolled module-scope promise cache
had a race where two concurrent callers for the same program could each
fire a request; SWR dedupes natively. The hook also now logs non-OK API
responses (previously dropped silently).
Add clusterFromParam helper with a round-trip integer check (the old
`Number(value) in Cluster` form let "" and " " coerce to MainnetBeta).
Lives in the entity's server.ts entry point per FSD convention.
Consolidate toAddress / toPublicKey web3.js<->kit bridge helpers into
app/shared/lib/web3js-compat.ts.
…d Address for denylist lookup Greptile review comments (PR solana-foundation#994): - Add the missing Logger.warn assertion to the sub-discriminator-length test, matching the other decode-failure tests in the file. - Use the validated `programId` (branded Address) for the denylist Set.has() lookup instead of the raw query-string `programAddress`. Behaviour is identical today since @solana/kit's address() validates without normalising, but keeps the lookup consistent with the validated value.
`useSWRImmutable(key, fetcher, { suspense: true })` throws a promise on
first render with undefined data. The transaction inspector renders the
hook (via InstructionsSection) without an enclosing <Suspense> boundary,
so the suspense throw rolls back to the previous committed state —
`inspectorData = undefined` — and the page is stuck on the empty input
view. This broke `InspectorPage.spec.tsx` and `InspectorPage-SystemProgram.spec.tsx`
on CI.
(Subtlety: the spec file does vi.mock('swr', ...), but `swr/immutable`
resolves its own internal `import useSWR from 'swr'` via the pnpm-internal
module path, bypassing the mock. The real `useSWR` is what actually
runs, which is why suspense fires at all in the test.)
Drop `suspense: true`. The hook now returns `data ?? null` while the
fetch is in flight; cards already handle `null` by hiding the Anchor
tab. Existing <Suspense> boundaries on the address layout still cover
their other suspending children — they just no longer trigger a fallback
specifically for the IDL request.
…panic The previous catch-all `isSolanaError → warn` branch silenced real misconfiguration (wrong RPC token, missing API plan, header-stripping proxy, `Method not found`) by lumping it in with the Simd-296 "Internal error" storm the original fix was about. Adds `classifySolanaError`: codes in a small allowlist of ephemeral JSON-RPC server states stay transient (warn); `RPC__TRANSPORT_HTTP_ERROR` is bucketed by `context.statusCode` (5xx and 429 transient, rest of 4xx misconfig); everything else (auth, plan, method-not-found, malformed responses, programmer-bug-class codes) falls through to `Logger.panic` so misconfig pages Sentry on the first request instead of vanishing into a stream of warns.
Follows the route+config convention used by app/api/receipt/price and app/api/verification — keeps route.ts focused on request handling and makes the denylist / transient-error codes importable from tests.
…loads The suspense removal in 04eb0af collapsed "loading" and "no IDL" into the same null return, so IdlCard flashed the empty "no IDL" warning before the Anchor IDL resolved. Surface SWR's isLoading through useIdlFromAnchorProgramSeed and useAnchorProgram (still no suspense, so the inspector's boundary-less render tree stays intact) and gate the empty-state branch on it.
1a284fc to
c2be1a1
Compare
The previous loader fix only tracked isAnchorIdlLoading. For programs whose IDL comes from Program Metadata (the default/preferred tab), the Anchor fetch resolves to null first while Program Metadata is still in flight, so IdlCard flashed the empty no-IDL state instead of the loader. Surface isLoading from useProgramCanonicalMetadata through useProgramMetadataIdl/useProgramMetadataCodamaIdl and gate the empty-state branch on the combined loading flag. Also fix a pre-existing tsc failure in the anchor route spec (SOLANA_ERROR__RPC__API_PLAN_MISSING_FOR_RPC_METHOD context uses method/params, not rpcMethod/rpcParams).
Description
/api/anchorrouted every IDL-fetch failure throughLogger.panic, producing a steady stream of fatal-level Sentry events. The two largest sources we observed:Internal errorreturned for the derived IDL PDA of built-in programs (notably the Simd-296 testnet) instead of a cleannull. Every transaction touching System / Token / ComputeBudget / Vote / Stake on that cluster paged.C7QLEmDz81Usvy2sYa4xZSdA8EwEcYvZo8iuYZMaqXmj) whose IDL bytes are present but undecodable — JSON inflation/parse threw a non-buffer-bounds error and tripped the catch-all panic branch.This PR replaces the
AnchorProvider/Program.fetchIdlpath with a direct@solana/kitRPC call plus a manual Anchor IDL decode, and classifies the four real outcomes explicitly:200{ "idl": { ... } }200{ "idl": null }502{ "error": "Upstream RPC error" }Logger.warn(not escalated)502{ "error": "Failed to fetch IDL" }Logger.panicKey behavioural changes:
{ idl: null }before any RPC call, so an upstream that errors instead of returningnullfor the IDL PDA can no longer escalate.SolanaErrorfrom the RPC client is treated as a transient502withLogger.warn, not a panic. App layer can't act on it, but we don't escalate either.warnand cached as{ idl: null }, so the client never re-fetches them.Bonus refactors that ride along:
useIdlFromAnchorProgramSeedreplaced withuseSWRImmutable+ suspense. The old hand-rolled module-scope promise cache had a race where two concurrent callers for the same program could each fire a request; SWR dedupes natively. The hook also now logs non-OK API responses (previously dropped silently).clusterFromParamhelper extracted toapp/entities/cluster/server.tswith a round-trip integer check. The oldNumber(value) in Clusterform let""and" "coerce toCluster.MainnetBeta.toAddress/toPublicKeyweb3.js↔kit bridge helpers consolidated intoapp/shared/lib/web3js-compat.ts.Type of change
Testing
Unit tests
app/api/anchor/__tests__/route.spec.ts— denylist short-circuit, no-account-at-PDA, valid IDL, garbage bytes, wrong-shape JSON, sub-discriminator-length, transientSolanaError, and unexpected-error escalation.app/entities/cluster/lib/__tests__/cluster-from-param.spec.ts— known values, out-of-range, non-numeric inputs.Manual verification
Open DevTools → Network, filter
anchor, on the Vercel preview below.Base URL:
https://explorer-git-fix-anchor-idl-fatals-solana-foundation.vercel.app1. Parsed Anchor IDL — expect
200with parsed{ idl: { ... } }, IDL card renders.2. Denylisted built-ins — expect
200with{ idl: null }, no RPC roundtrip (server-side response sub-millisecond), no IDL card rendered.3. Undecodable IDL — expect
200with{ idl: null }, server logsLogger.warn '[api:anchor] IDL account present but undecodable', no Sentry panic.SyntaxErrorpanics.4. Built-in on Simd-296 testnet — expect
200with{ idl: null }instantly, regardless of whether the upstream RPC returnsnullcleanly or errors out (the denylist short-circuits before any RPC call).5. Multi-instruction transaction (denylist + Anchor mixed) — expect one
/api/anchorrequest per distinctprogramId, built-ins return200/{idl:null}instantly, real Anchor programs return parsed IDLs, no502s. Reloading the page serves all responses from HTTP cache.https://explorer-git-fix-anchor-idl-fatals-solana-foundation.vercel.app/tx/<sig>?cluster=mainnet-beta.6. Bad-request sanity
GET https://explorer-git-fix-anchor-idl-fatals-solana-foundation.vercel.app/api/anchor→400 { "error": "Invalid query params" }GET https://explorer-git-fix-anchor-idl-fatals-solana-foundation.vercel.app/api/anchor?cluster=999&programAddress=…→400 { "error": "Invalid cluster" }GET https://explorer-git-fix-anchor-idl-fatals-solana-foundation.vercel.app/api/anchor?cluster=0&programAddress=not-a-pubkey→400 { "error": "Invalid program address" }Post-deploy verification
24h after release, in the related Sentry issue (
SOLANA-EXPLORER-15in our tracker):level=errorfrom the unexpected-error branch, notlevel=fatalfrom the old catch-all panic.extra.programAddressmatching any address in section 2.Related Issues
Closes HOO-531
Checklist
build:infoscript to update build information