Problem Statement
The SDK has no awareness of context window limits during agent execution. When the conversation history grows beyond the model's context window, the agent fails with FinishReasonLength. There is no proactive management, warning, or mitigation.
Specific issues:
- No token counting before sending: The agent builds a request and sends it to the provider without checking whether the message history fits within ModelConstraints.MaxInputTokens. The constraint data exists but is never used proactively.
- No graceful degradation: When context is 99% full, the agent happily accepts a new task, sends the request, and fails. There's no early warning or preemptive action.
- No compaction strategy: When messages accumulate over a long conversation, there's no way to trim, summarize, or compress the history to stay within limits. The session just grows until it hits the wall.
- No per-agent control: Different agents in a multi-agent system may need different context management strategies (e.g., a research agent can afford to lose old context, but an audit agent needs full history).
Proposed Solution
Context window management could be implemented as a combination of a built-in TurnInterceptor plugin and agent-level configuration. This keeps the core agent loop simple while providing opt-in management.
Token estimation
Add a token estimation utility that can approximate token count for a message list:
```go
// TokenEstimator estimates token counts for messages.
type TokenEstimator interface {
	EstimateTokens(messages []llm.Message) int
}

// DefaultEstimator uses a rough 4-chars-per-token heuristic.
// Provider-specific estimators can use tiktoken, etc.
```
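A minimal sketch of the 4-chars-per-token heuristic, with a local `Message` struct standing in for the SDK's `llm.Message` so the example is self-contained:

```go
package main

import "fmt"

// Message is a stand-in for the SDK's llm.Message type.
type Message struct {
	Role    string
	Content string
}

// DefaultEstimator approximates token count as ceil(totalChars / 4),
// a rough heuristic that avoids pulling in a tokenizer dependency.
type DefaultEstimator struct{}

func (DefaultEstimator) EstimateTokens(messages []Message) int {
	chars := 0
	for _, m := range messages {
		chars += len(m.Content)
	}
	return (chars + 3) / 4 // round up
}

func main() {
	est := DefaultEstimator{}
	msgs := []Message{
		{Role: "system", Content: "You are a helpful assistant."}, // 28 chars
		{Role: "user", Content: "Hello!"},                         // 6 chars
	}
	fmt.Println(est.EstimateTokens(msgs)) // (34+3)/4 = 9
}
```

A provider-specific estimator would implement the same interface but delegate to the provider's tokenizer for an exact count.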
Context management interceptor
A TurnInterceptor that checks context usage before each turn and applies a configured strategy:
```go
ctxManager := contextmgmt.New(
	contextmgmt.WithMaxTokenRatio(0.85),                     // Act when context is 85% full
	contextmgmt.WithStrategy(contextmgmt.SlidingWindow(20)), // Keep last 20 messages
	// OR: contextmgmt.WithStrategy(contextmgmt.Summarize(summaryModel))
	// OR: contextmgmt.WithStrategy(contextmgmt.DropOldest(/* keepSystem */ true))
	// OR: contextmgmt.WithStrategy(customStrategyFunc)
)

agent, _ := llmagent.New("assistant", prompt, model,
	llmagent.WithInterceptors(ctxManager),
)
```
Strategies
- Sliding window: Keep the N most recent messages (simplest, no LLM call needed)
- Drop oldest: Remove oldest messages while keeping system prompt and recent context
- Summarize: Use an LLM to summarize older messages into a condensed form, then replace them (most expensive but preserves information)
- Custom: User-provided function that receives messages and token budget, returns trimmed messages
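To make the sliding-window strategy concrete, here is an illustrative sketch (helper name and `Message` struct are stand-ins, not SDK API): it preserves a leading system message and keeps only the N most recent messages.

```go
package main

import "fmt"

// Message is a stand-in for the SDK's llm.Message type.
type Message struct {
	Role    string
	Content string
}

// slidingWindow keeps a leading system message (if present) plus the
// n most recent messages, dropping everything in between.
func slidingWindow(n int, msgs []Message) []Message {
	var head []Message
	rest := msgs
	if len(msgs) > 0 && msgs[0].Role == "system" {
		head = msgs[:1]
		rest = msgs[1:]
	}
	if len(rest) > n {
		rest = rest[len(rest)-n:]
	}
	return append(append([]Message{}, head...), rest...)
}

func main() {
	msgs := []Message{{Role: "system", Content: "prompt"}}
	for i := 0; i < 10; i++ {
		msgs = append(msgs, Message{Role: "user", Content: fmt.Sprintf("msg %d", i)})
	}
	trimmed := slidingWindow(3, msgs)
	fmt.Println(len(trimmed))       // 4: system + last 3
	fmt.Println(trimmed[1].Content) // msg 7
}
```

Note the copy into a fresh slice before appending, so the trimmed history doesn't alias the original backing array.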
Proactive warning
Emit a StatusEvent when context usage exceeds a configurable threshold, giving consumers visibility:
```go
StatusEvent{
	Stage:   StatusStageTurnStarted,
	Details: "context window at 88% capacity (35,200/40,000 tokens), compaction applied",
}
```
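The threshold check behind that event is simple arithmetic; a sketch (the helper name is illustrative, not SDK API):

```go
package main

import "fmt"

// usageDetails reports whether used/max has reached the configured ratio
// threshold and formats a human-readable status line like the one above.
func usageDetails(used, max int, threshold float64) (bool, string) {
	ratio := float64(used) / float64(max)
	details := fmt.Sprintf("context window at %.0f%% capacity (%d/%d tokens)", ratio*100, used, max)
	return ratio >= threshold, details
}

func main() {
	over, details := usageDetails(35200, 40000, 0.85)
	fmt.Println(over)    // true: 0.88 >= 0.85
	fmt.Println(details) // context window at 88% capacity (35200/40000 tokens)
}
```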
Use Case Example
Long-running support conversation:
```go
// After 50+ exchanges, the conversation has grown to 35k tokens on a 40k model.
// Without management: the next turn fails with FinishReasonLength.
// With management:
agent, _ := llmagent.New("support", prompt, model,
	llmagent.WithInterceptors(
		contextmgmt.New(
			contextmgmt.WithMaxTokenRatio(0.80),
			contextmgmt.WithStrategy(contextmgmt.SlidingWindow(30)),
		),
	),
)

// At turn 51, the interceptor detects context is at ~87%.
// It trims to the last 30 messages (keeping the system prompt).
// The agent continues working normally.
// A StatusEvent is emitted for observability.
```
Summarization for high-value conversations:
```go
contextmgmt.WithStrategy(contextmgmt.Summarize(cheapModel))

// When context exceeds the threshold, the strategy:
// 1. Takes messages 0..N-10 (old messages)
// 2. Sends them to cheapModel with a "summarize this conversation" prompt
// 3. Replaces the old messages with a single summary message
// 4. Keeps the last 10 messages intact for recency
```
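The four steps above could be sketched as follows; the `summarize` callback stands in for the real LLM call, and all names here are illustrative rather than SDK API:

```go
package main

import "fmt"

// Message is a stand-in for the SDK's llm.Message type.
type Message struct {
	Role    string
	Content string
}

// compactWithSummary replaces all but the last `keep` messages with a single
// summary message produced by the summarize callback (an LLM call in practice).
func compactWithSummary(msgs []Message, keep int, summarize func([]Message) string) []Message {
	if len(msgs) <= keep {
		return msgs // nothing old enough to compact
	}
	old, recent := msgs[:len(msgs)-keep], msgs[len(msgs)-keep:]
	summary := Message{
		Role:    "user",
		Content: "Summary of earlier conversation: " + summarize(old),
	}
	return append([]Message{summary}, recent...)
}

func main() {
	var msgs []Message
	for i := 0; i < 50; i++ {
		msgs = append(msgs, Message{Role: "user", Content: fmt.Sprintf("msg %d", i)})
	}
	// Fake summarizer in place of a cheapModel call.
	fake := func(old []Message) string { return fmt.Sprintf("%d messages condensed", len(old)) }
	out := compactWithSummary(msgs, 10, fake)
	fmt.Println(len(out))       // 11: summary + last 10
	fmt.Println(out[0].Content) // Summary of earlier conversation: 40 messages condensed
}
```

A real implementation would also need to decide where tool-call/result pairs fall, so a summary boundary never splits a call from its result.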
Why This Matters
- Production reliability: Long-running agent sessions (customer support, coding assistants, research tasks) will inevitably hit context limits. Failing with an error is the worst possible UX.
- Cost efficiency: Summarization reduces token usage for subsequent turns, directly reducing API costs for long conversations.
- Already have the data: ModelConstraints.MaxInputTokens is already populated per model. The infrastructure for proactive checking exists — it just needs to be wired up.
- Plugin-friendly: Implementing this as a TurnInterceptor means it's opt-in, composable with other interceptors, and doesn't add complexity to the core agent loop.