- YAML frontmatter parsing: use `bytes.SplitN(content, []byte("---"), 3)` to split. Element 0 is empty (the text before the first `---`), element 1 is the YAML, and element 2 is the body. Handle `\r\n` line endings.
- `gitops.CommitQueue` per-project ordering, idle teardown: the queue spawns one goroutine per `Project` value and serializes that project's commits in enqueue order; different projects commit in parallel. Production wires `gitops.NewCommitQueue(git, 0, gitops.WithIdleTimeout(30*time.Minute))` in `main.go`. A worker that stays idle for 30 minutes therefore exits, and the next `Enqueue` for that project spawns a fresh one. `projectWorker.closed` plus the per-worker mutex stop an `Enqueue` from sending into a channel the worker is about to abandon. Do not assume worker identity across an idle gap when reasoning about cached state.
- `CardService.LockWrites` is paired with queue pause + drain: the gitsync layer holds `LockWrites` across a pull+rebase so no card mutation interleaves with the rebase. `LockWrites` also calls `commitQueue.Pause()` and `AwaitIdle(ctx)` with a 30-second budget so an in-flight go-git commit cannot collide with the shell-git rebase on `.git/index.lock`. `UnlockWrites` must call `Resume()` before releasing `writeMu`; reversing the order leaves the queue paused under a fresh write.
- Deferred git commits (`boards.git_deferred_commit`): when `boards.git_deferred_commit: true` is set in `config.yaml`, agent mutations (heartbeats, log entries, intermediate updates) are batched and committed in a single flush at release/complete time instead of once per operation. This reduces git churn during long agent work sessions. However, two categories of mutation always commit immediately, even in deferred mode: (1) card creation: the card file and `.board.yaml` are committed together so the new card survives a `git pull` on another machine; (2) human edits to unclaimed cards via the REST API: the PUT/PATCH handlers set `ImmediateCommit: true` when `card.AssignedAgent == ""`, triggering an immediate commit. MCP tool callers (agents) never set this flag, so their commits continue to defer normally.
- MCP middleware chain and body limit: `/mcp` is registered on the same inner `http.ServeMux` as the REST API, so it automatically inherits the shared middleware chain (recovery, security headers, CORS when enabled, request ID, observe/metrics+logging, body limit, `csrfGuard`). The body-size cap is 5 MB (`maxRequestBodySize`), sized to the largest legitimate MCP card payload and applied uniformly to every route. Requests whose `Content-Length` exceeds 5 MB are rejected with `413 Payload Too Large` before the body is read; requests without `Content-Length` are capped during reads via `http.MaxBytesReader`.
- SSE and MCP streaming vs. `WriteTimeout`: Go's `http.Server.WriteTimeout` is an absolute deadline measured from when the request headers are read; it is NOT reset by intermediate writes (keepalive comments, partial event data, etc.). Long-lived SSE connections will always hit it, so the client sees an abrupt disconnect every `WriteTimeout` interval regardless of keepalive activity. The fix is to call `http.NewResponseController(w).SetWriteDeadline(time.Time{})` before entering the streaming loop. This clears the deadline for that one connection only; all other endpoints keep the server-wide timeout. It is applied in `internal/api/events.go` (SSE event stream), `internal/api/runner_logs.go` (runner SSE log stream), and as the `clearWriteDeadlineForStreaming` middleware in `internal/mcp/server.go` (MCP GET stream). The MCP middleware scopes the clear to `GET` requests only; `POST` and `DELETE` (short RPC calls) keep the normal `WriteTimeout`. Critical: `ResponseController` finds the underlying connection by calling `Unwrap()` on the `ResponseWriter`. Any middleware that wraps the writer (e.g., the logging middleware's `responseWriter`) must implement `Unwrap() http.ResponseWriter`, or `SetWriteDeadline` silently fails: it returns an error matching `http.ErrNotSupported`, the error is treated as non-fatal, and the handler continues with the timeout still active.
- Tailwind v4 preflight strips `list-style` from `ul`/`ol`: `@import "tailwindcss"` injects `@layer base { ol, ul { list-style: none } }`, which overrides browser UA defaults. Third-party markdown libraries (e.g. `@uiw/react-markdown-preview`) set `list-style-type` only on nested levels and rely on UA defaults for the top level, so bullets and numbers silently disappear. Restore them with explicit `!important` rules scoped to the library's wrapper class (e.g. `.wmde-markdown ul { list-style: disc !important }`). Also re-assert the nested cascade (`lower-roman`, `lower-alpha`), because your `!important` on the base rule wins over the library's non-`!important` nested rules.
- Frontend embed: `//go:embed all:dist` in `web/embed.go` (package `web`). The `all:` prefix is required so dotfiles under `dist/` are included: a plain `dist` pattern silently skips every file whose name begins with `.` or `_`, and a `dist/*` glob picks such files up only at the top level, not in subdirectories. Build the frontend before building the Go binary. SPA routing requires a fallback to `index.html` for all non-API, non-file routes.
- 404 handling is React Router's job: `newSPAHandler` returns `index.html` for every path that isn't under the `/api/` prefix, `/healthz`, `/readyz`, `/mcp`, or a real static file. The Go layer never returns a 404 for UI paths. Unknown routes are caught by `<Route path="*" element={<NotFound />} />` placed as the last route in both `App.tsx` (top-level) and `ProjectShell.tsx` (nested project routes). If you add a new `Routes` subtree, give it its own catch-all, or users will see a blank screen instead of the 404 page.
- stdlib URL params: use `r.PathValue("project")` (Go 1.22+). Route patterns use `{project}` syntax: `mux.HandleFunc("GET /api/projects/{project}", handler)`.
- `time.Duration` in YAML: `time.Duration` doesn't unmarshal from strings like `"30m"` with `gopkg.in/yaml.v3`. Either use a custom type with `UnmarshalYAML`, or store the value as a string in config and parse it with `time.ParseDuration()` at load time.
- `/healthz` and `/readyz` requests are not logged: the HTTP logging middleware skips `slog.Info` for `GET /healthz` and `GET /readyz` so k8s liveness/readiness probe traffic doesn't spam the logs. Both endpoints still respond normally; only the log line is suppressed. When debugging, don't expect probe traffic in the logs: check the endpoints directly with `curl`, or hit any other path to confirm logging works.
- Firefox per-origin SSE connection limit: Firefox's connection manager cancels in-flight requests to the same origin with `NS_BINDING_ABORTED` / "connection interrupted while the page was loading" when a new navigation-adjacent fetch pushes the total past its limit. Practically: if the app opens ≥ 3 `EventSource('/api/events')` connections and then a 4th SSE stream opens at the same origin (e.g. `/api/runner/logs` on HITL start), Firefox aborts the earlier three simultaneously. Chrome does not exhibit this behaviour. The fix is to share a single `EventSource` for the whole app via `SSEProvider` and fan events out to subscribers in-process; see `web/src/hooks/useSSEBus.tsx`. Never open more than one `EventSource` per distinct URL; use the subscriber API for additional consumers of the same stream. For runner logs specifically, `ProjectShell` owns a single card-scoped `useRunnerLogs` call (enabled only while the selected card is a HITL running session) and passes the resulting `LogEntry[]` array down to `CardChat` as a prop; `CardChat` does not open its own `EventSource`.
- `sessionlog.Manager` fan-out invariants: `readUpstream` (card-scoped) and `readProjectUpstream` (project-scoped) both append to the ring buffer and fan out to subscribers under a single `m.mu` lock. The append and the fan-out must stay under the same lock; separating them reintroduces the duplicate-delivery race where an event lands in the snapshot AND in `sub.pending` for the same subscriber. The primed-flag protocol (`sub.primed`, `sub.pending`) is what enforces snapshot-before-live ordering: the pump stages live events in `sub.pending` while `sub.primed` is false, and the snapshot goroutine in `Subscribe`/`SubscribeProject` flips `primed = true` (under `m.mu`) only after draining both the snapshot slice and `sub.pending` into the subscriber's channel. Do not bypass this gate.
  Two additional channels on `subscriber` enforce lifecycle safety: `done` (closed by `unsub`, `Stop`, or the terminal-error path) signals the snapshot goroutine to exit early, and `snapDone` (closed by the snapshot goroutine via `defer`) signals that it has exited. `Stop` and the terminal-error path in both pumps call `closeSubscriber`, which closes `done`, waits on `snapDone` (up to 1 s), then sends the terminal event and calls `close(ch)`. This ordering is mandatory: closing `ch` while the snapshot goroutine is still sending on it panics. `close(done)` is guarded by `sync.Once` (`doneOnce`) so both `unsub` and `Stop` can call it safely. The snapshot goroutine blocks on each channel send (`select { case ch <- evt: case <-sub.done: return }`) rather than dropping, so slow subscribers receive the full snapshot and are never silently truncated. Project-scoped sessions use the key `"project:<name>"` in the shared `activeSessions`, `pendingSubs`, and `sessions` maps; this prefix prevents collisions with card IDs. The only difference from the card-scoped pump is that `readProjectUpstream` does not filter by card ID: it accepts every event and preserves the originating `CardID` field on `sessionlog.Event`.
- `request_id` log correlation: every HTTP request gets a `request_id` UUID injected into its context by the `requestID` middleware via `ctxlog.WithRequestID(ctx, id)`. All log sites must use `ctxlog.Logger(ctx)`, not `slog.Default()` or a package-level logger; otherwise the log line will not carry the correlation ID. Background goroutines (stall scanner, git-pull ticker) do not go through the middleware; `ctxlog.Logger(ctx)` falls back to `slog.Default()` safely in those paths.
- MCP tool name in the request log line: for `POST /mcp` requests the `observe` middleware emits two extra fields alongside the standard `method`, `path`, `status`, `duration_ms`, and `request_id` fields: `mcp_method` (JSON-RPC method, e.g. `tools/call`) and `mcp_tool` (tool name, e.g. `claim_card` or `report_usage`). Both fields are omitted for non-MCP routes and for MCP methods other than `tools/call` (e.g. `initialize`), where there is no tool name. The extraction is best-effort: a body-peeking middleware (`mcpRequestInfoMiddleware` in `internal/mcp/server.go`) reads the request body, parses the JSON-RPC envelope, restores the body, and writes the results into a `*ctxlog.MCPCall` stashed in the context by `observe`. Errors during extraction are swallowed; the log line is still emitted with whatever fields were successfully extracted.
- `/metrics` and pprof live on the admin port: Prometheus scraping (`GET /metrics`) and `/debug/pprof/*` are served only on the admin listener (`admin_port`), which binds to `127.0.0.1` by default (`admin_bind_addr`). The main listener never exposes them. There is no authentication on the admin listener: keep it loopback-only, or gate it with firewall / NetworkPolicy / service-mesh rules if your scrape setup requires a non-loopback bind. A non-loopback bind logs a warning at startup.
- PAT mode requires specific permissions: when `github.auth_mode: pat`, the fine-grained PAT must have `Contents: Read and write` on the boards repo and `Issues: Read-only` on each project repo referenced in `.board.yaml` that has `github.import_issues: true`. PAT mode only works with GitHub (github.com or GHEC/GHES); for non-GitHub hosts, use a different auth mode.
- All git remote URLs must be HTTPS: `boards.git_remote_url` and `task_skills.git_remote_url` are validated at startup and must start with `https://` regardless of `github.auth_mode`. SSH URLs are rejected unconditionally; there is no SSH transport fallback.
- Chat SQLite: WAL + `MaxOpenConns` + manager-level writer mutex: the chat store sets `MaxOpenConns=5` so concurrent readers (`ListMessages`, `MaxSeq`, `GetSession`) do not queue behind a writer. SQLite remains a single-writer engine regardless of pool size; the single-writer gate is `chat.Manager.mu`, held across the entire seq assignment + store insert in `AppendMessage`. Do not move the store write outside the lock: disk insertion order must match seq order, and the in-memory seq cache must stay consistent with the on-disk `(session_id, seq)` UNIQUE index. The pool size is a reader-concurrency knob; raising writer concurrency requires changing the manager's locking model, not the pool.
- SQLite driver is `modernc.org/sqlite`, pragmas live in the DSN: the chat store opens `sql.Open("sqlite", path+"?_pragma=foreign_keys(1)&_pragma=journal_mode(WAL)&_pragma=busy_timeout(5000)")`. The driver name is `"sqlite"`, not `"sqlite3"`: that registration belongs to `mattn/go-sqlite3`, which we do not import, to keep the binary CGO-free. The `_pragma=...` query-string parameters are a `modernc.org` extension; switching drivers means rewriting these as `PRAGMA` statements executed on the open connection.
- Chat SSE per-session subscriber cap: `SSEHub.Subscribe` returns an error if a session already has 32 live subscribers (`maxSubscribersPerSession` in `internal/chat/sse.go`). A normal browser tab is one subscriber; the cap stops runaway clients from exhausting goroutines and channel memory. The 128-entry ring buffer per session is also a hard cap: events older than the ring window are gone, and reconnects past the window depend on the REST bootstrap in `useChatStream` to backfill. Session-update events (`session_updated`) are NOT stored in the ring; only `message` events are, since session metadata is meant to be re-fetched on reconnect.
- Op-store schema is a clean-cut create; an existing `chats.db` must be deleted on upgrade: `ensureSchema` in `internal/opstore/sqlite/schema.go` runs plain `CREATE TABLE IF NOT EXISTS` DDL for every table (`model_blacklist`, `chat_sessions`, `chat_messages`, `chat_cost_archive`) in the shared `ops.db`. There is no migration ledger: no `schema_migrations(version, applied_at)` table, no stepwise history, no `addColumnIfMissing` helper. The DB is not migrated; an obsolete one is deleted and recreated by the operator. To change the schema, edit the `ensureSchema` DDL directly. Because every statement is an idempotent `CREATE ... IF NOT EXISTS`, an existing table is left untouched, so DDL changes only take effect on a freshly created database. A `chats.db` from a previous install is not forward-compatible; operators must delete it before starting a server that includes this change.
- `useChatStream` ring buffer + REST bootstrap seam: the hook uses `useRingBuffer(2000)` and pairs the SSE subscription with a REST bootstrap via `GET /api/chats/{id}/messages?since_seq=0`. On mount or `sessionID` change, the hook fetches the persisted transcript first, records the highest `seq`, then subscribes to the SSE stream with `since_seq=<last>`. SSE events whose seq falls inside the bootstrap window are deduped on the client. Reverting to SSE-only (no bootstrap) loses everything older than the server-side 128-entry ring on refresh.
- Chat rehydration is best-effort and never blocks `/open`: the runner's `prepareChatResume` writes `resume.jsonl` + `resume.meta.json` into a per-container host directory before starting the container. If the write fails (host tmp not writable, disk full, etc.), `manager.go` logs `StartChat: rehydration file prep failed; starting fresh agent`, omits `CM_CHAT_RESUME=1`, and starts the container anyway. The stdin priming envelope is still written (it is gated on the CM payload's `resume` field, not on the file-write outcome), so the agent receives the instructions, fails to read `/run/cm-chat/resume.jsonl`, and calls `chat_rehydration_complete` with a summary that says so. `/open` always returns `200`; surface failures via the transcript, never by refusing to open.
- `rehydration_phase` stamping prevents reopen pollution: every message appended while `Session.RehydrationActive=TRUE` gets stamped with `rehydration_phase=TRUE` in `chat_messages`. `chat.transcript.Build` drops those rows when assembling the next resume payload, so the resumed agent never sees prior agents' `ls`/`Read`/`Bash` chatter, only real conversation turns plus the prior `chat_rehydration_complete` summaries. Without this filter, each reopen would compound the previous reopen's rehydration noise into the next transcript.
- `claude -p PROMPT --input-format stream-json` does NOT auto-execute `PROMPT`: Claude treats `-p` as system context when stream-json input is enabled, not as a user message. The rehydration priming therefore has to arrive as a stream-json `user`-typed envelope written to stdin after `AttachChatStdin` succeeds; see `webhook/chat.go` and `streammsg.BuildUserMessage`. The same applies to any future "kick the agent off with X" pattern in chat or HITL modes: use a stream-json user envelope on stdin, not `-p`. This cost us several iterations during the chat rehydration build.
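The YAML frontmatter bullet can be sketched as follows, assuming every card file starts with `---`. The function name `splitFrontmatter` is hypothetical. It normalizes `\r\n` to `\n` before splitting, so both the YAML handed to the parser and the body have LF endings only.

```go
package main

import (
	"bytes"
	"errors"
)

// splitFrontmatter separates a "---"-delimited YAML header from the body.
// CRLF is normalized to LF first so neither part keeps stray '\r' bytes.
func splitFrontmatter(content []byte) (yamlPart, body []byte, err error) {
	content = bytes.ReplaceAll(content, []byte("\r\n"), []byte("\n"))
	parts := bytes.SplitN(content, []byte("---"), 3)
	// parts[0] is the text before the first "---"; it must be empty.
	if len(parts) != 3 || len(parts[0]) != 0 {
		return nil, nil, errors.New("frontmatter: missing leading and closing --- delimiters")
	}
	// parts[2] starts with the newline that ended the closing "---" line.
	return parts[1], bytes.TrimPrefix(parts[2], []byte("\n")), nil
}
```

Because `SplitN` matches `---` anywhere, a YAML value containing `---` would end the header early; if that can happen, match delimiter lines (`---` alone on a line) instead.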
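The `WriteTimeout` fix and the `Unwrap` requirement can be sketched together. `loggingWriter`, `withLogging`, `stream`, and `demo` are hypothetical stand-ins for the project's logging wrapper and SSE handlers. The 50 ms `WriteTimeout` and 200 ms delay are arbitrary values chosen so the server-wide deadline would expire before the first event is written. `http.ResponseController` requires Go 1.20+.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"time"
)

// loggingWriter stands in for a logging middleware's response wrapper.
type loggingWriter struct {
	http.ResponseWriter
	status int
}

func (lw *loggingWriter) WriteHeader(code int) {
	lw.status = code
	lw.ResponseWriter.WriteHeader(code)
}

// Unwrap lets http.ResponseController reach the real connection. Without it,
// SetWriteDeadline and Flush on the controller return http.ErrNotSupported.
func (lw *loggingWriter) Unwrap() http.ResponseWriter { return lw.ResponseWriter }

func withLogging(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		lw := &loggingWriter{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(lw, r)
		log.Printf("%s %s -> %d", r.Method, r.URL.Path, lw.status)
	})
}

// stream emits one SSE event after a delay longer than the server's WriteTimeout.
func stream(w http.ResponseWriter, r *http.Request) {
	rc := http.NewResponseController(w)
	// The zero time.Time means "no deadline", for this connection only.
	if err := rc.SetWriteDeadline(time.Time{}); err != nil {
		log.Printf("clear write deadline: %v", err) // non-fatal; WriteTimeout stays armed
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.WriteHeader(http.StatusOK)
	_ = rc.Flush() // push headers to the client now
	select {
	case <-time.After(200 * time.Millisecond): // the first event arrives "late"
	case <-r.Context().Done():
		return
	}
	fmt.Fprint(w, "data: hello\n\n")
	_ = rc.Flush()
}

// demo serves stream with a 50 ms WriteTimeout and returns what the client read.
func demo() (string, error) {
	srv := httptest.NewUnstartedServer(withLogging(http.HandlerFunc(stream)))
	srv.Config.WriteTimeout = 50 * time.Millisecond
	srv.Start()
	defer srv.Close()
	resp, err := http.Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}
```

Deleting the `Unwrap` method makes `SetWriteDeadline` return an error matching `http.ErrNotSupported`; the handler keeps going, the server-wide deadline stays armed, and the late write is cut off, which is the silent failure the streaming bullet warns about.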
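The `sessionlog.Manager` fan-out invariants can be sketched for a single stream. The field names (`primed`, `pending`, `done`, `snapDone`, `doneOnce`) and `closeSubscriber` follow the text. Everything else is an assumption: `publish` stands in for the `readUpstream` pump, the ring is an unbounded slice, and the non-blocking send to already-primed subscribers is a placeholder policy the text does not specify.

```go
package main

import (
	"sync"
	"time"
)

// Event is a stand-in for sessionlog.Event.
type Event struct {
	Seq    int
	CardID string
}

type subscriber struct {
	ch       chan Event
	pending  []Event // live events staged until primed; guarded by Manager.mu
	primed   bool    // guarded by Manager.mu
	done     chan struct{}
	doneOnce sync.Once
	snapDone chan struct{}
}

type Manager struct {
	mu   sync.Mutex
	ring []Event // sketch: unbounded instead of a fixed-size ring
	subs map[*subscriber]struct{}
}

func NewManager() *Manager { return &Manager{subs: map[*subscriber]struct{}{}} }

// publish plays the pump's role: append and fan-out happen under ONE lock, so
// each event is either in a new subscriber's snapshot or on its pending/live
// path, never both.
func (m *Manager) publish(evt Event) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.ring = append(m.ring, evt)
	for sub := range m.subs {
		if !sub.primed {
			sub.pending = append(sub.pending, evt)
			continue
		}
		select {
		case sub.ch <- evt:
		default: // placeholder policy for a full live buffer
		}
	}
}

// Subscribe snapshots the ring and registers sub atomically, then drains the
// snapshot and any pending events in order before flipping primed.
func (m *Manager) Subscribe() *subscriber {
	sub := &subscriber{
		ch:       make(chan Event, 64),
		done:     make(chan struct{}),
		snapDone: make(chan struct{}),
	}
	m.mu.Lock()
	batch := append([]Event(nil), m.ring...)
	m.subs[sub] = struct{}{}
	m.mu.Unlock()

	go func() {
		defer close(sub.snapDone)
		for {
			for _, evt := range batch {
				select {
				case sub.ch <- evt: // block rather than drop
				case <-sub.done:
					return
				}
			}
			m.mu.Lock()
			if len(sub.pending) == 0 {
				sub.primed = true // only once snapshot and pending are drained
				m.mu.Unlock()
				return
			}
			batch, sub.pending = sub.pending, nil
			m.mu.Unlock()
		}
	}()
	return sub
}

// closeSubscriber closes done, waits for the snapshot goroutine, then sends the
// terminal event and closes ch. Closing ch first could panic that goroutine's send.
func (m *Manager) closeSubscriber(sub *subscriber, terminal Event) {
	m.mu.Lock()
	delete(m.subs, sub) // publish stops sending to sub.ch from here on
	m.mu.Unlock()
	sub.doneOnce.Do(func() { close(sub.done) })
	select {
	case <-sub.snapDone:
	case <-time.After(time.Second):
		// Bounded wait as in the text; every send above also selects on done,
		// so the goroutine exits promptly once done is closed.
	}
	select {
	case sub.ch <- terminal:
	default:
	}
	close(sub.ch)
}
```

Events published after `Subscribe` land in `pending` until the snapshot goroutine flips `primed`, so a subscriber always sees the snapshot first, then live events, in sequence and without duplicates.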
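The body-peeking step from the MCP logging bullet can be sketched as below. `MCPCall`, `mcpCallKey`, `peekMCP`, and `demo` are hypothetical stand-ins for `*ctxlog.MCPCall` and `mcpRequestInfoMiddleware`. Following the text, the sketch records fields only for `tools/call`. It assumes the body-limit middleware earlier in the chain already bounds `io.ReadAll`.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// MCPCall stands in for *ctxlog.MCPCall: observe stashes it, the peeker fills it.
type MCPCall struct{ Method, Tool string }

type mcpCallKey struct{}

// readCloser pairs a replay reader with the original body's Close.
type readCloser struct {
	io.Reader
	io.Closer
}

// peekMCP reads the JSON-RPC envelope, restores the body, and records the
// method and tool name for tools/call. Every failure is swallowed.
func peekMCP(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		call, _ := r.Context().Value(mcpCallKey{}).(*MCPCall)
		if call != nil && r.Method == http.MethodPost && r.Body != nil {
			orig := r.Body
			raw, err := io.ReadAll(orig)
			// Replay the bytes already read, then continue from orig, so a read
			// error (e.g. a MaxBytesReader limit) still reaches the handler.
			r.Body = readCloser{io.MultiReader(bytes.NewReader(raw), orig), orig}
			var env struct {
				Method string `json:"method"`
				Params struct {
					Name string `json:"name"`
				} `json:"params"`
			}
			if err == nil && json.Unmarshal(raw, &env) == nil && env.Method == "tools/call" {
				call.Method, call.Tool = env.Method, env.Params.Name
			}
		}
		next.ServeHTTP(w, r)
	})
}

// demo runs one POST through peekMCP and returns what was recorded and what
// the downstream handler read from the restored body.
func demo(body string) (MCPCall, string) {
	call := &MCPCall{}
	req := httptest.NewRequest(http.MethodPost, "/mcp", strings.NewReader(body))
	req = req.WithContext(context.WithValue(req.Context(), mcpCallKey{}, call))
	var seen string
	h := peekMCP(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		b, _ := io.ReadAll(r.Body)
		seen = string(b)
	}))
	h.ServeHTTP(httptest.NewRecorder(), req)
	return *call, seen
}
```

Passing a pointer through the context is what lets the inner middleware report back to the outer `observe` layer, which emits the log line after `next` returns.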