
Context window management and compaction #96

@weeco

Description


Problem Statement

The SDK has no awareness of context window limits during agent execution. When the conversation history grows beyond the model's context window, the agent fails with FinishReasonLength. There is no proactive management, warning, or mitigation.

Specific issues:

  1. No token counting before sending: The agent builds a request and sends it to the provider without checking whether the message history fits within ModelConstraints.MaxInputTokens. The constraint data exists but is never used proactively.

  2. No graceful degradation: When context is 99% full, the agent happily accepts a new task, sends the request, and fails. There's no early warning or preemptive action.

  3. No compaction strategy: When messages accumulate over a long conversation, there's no way to trim, summarize, or compress the history to stay within limits. The session just grows until it hits the wall.

  4. No per-agent control: Different agents in a multi-agent system may need different context management strategies (e.g., a research agent can afford to lose old context, but an audit agent needs full history).

Proposed Solution

Context window management could be implemented as a combination of a built-in TurnInterceptor plugin and agent-level configuration. This keeps the core agent loop simple while providing opt-in management.

Token estimation

Add a token estimation utility that can approximate token count for a message list:

```go
// TokenEstimator estimates token counts for messages.
type TokenEstimator interface {
    EstimateTokens(messages []llm.Message) int
}

// DefaultEstimator uses a rough 4-chars-per-token heuristic.
// Provider-specific estimators can use tiktoken, etc.
```
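
A minimal sketch of what `DefaultEstimator` could look like under that heuristic. The `Message` struct here is a local stand-in for `llm.Message`, and `PerMessageOverhead` is an illustrative knob for role/formatting overhead, not part of any existing API:

```go
// Message is a minimal stand-in for llm.Message.
type Message struct {
	Role    string
	Content string
}

// DefaultEstimator approximates tokens with the common ~4-characters-per-token
// heuristic, padding each message to cover role markers and separators.
type DefaultEstimator struct {
	PerMessageOverhead int // illustrative; e.g. 4 tokens per message
}

// EstimateTokens sums len(content)/4 plus the per-message overhead.
func (e DefaultEstimator) EstimateTokens(messages []Message) int {
	total := 0
	for _, m := range messages {
		total += len(m.Content)/4 + e.PerMessageOverhead
	}
	return total
}
```

The heuristic deliberately errs simple: it avoids a tokenizer dependency, and because the interceptor acts at a ratio threshold (e.g. 85%), a rough estimate is good enough to trigger compaction before the hard limit.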

Context management interceptor

A TurnInterceptor that checks context usage before each turn and applies a configured strategy:

```go
ctxManager := contextmgmt.New(
    contextmgmt.WithMaxTokenRatio(0.85),                     // Act when context is 85% full
    contextmgmt.WithStrategy(contextmgmt.SlidingWindow(20)), // Keep last 20 messages
    // OR: contextmgmt.WithStrategy(contextmgmt.Summarize(summaryModel))
    // OR: contextmgmt.WithStrategy(contextmgmt.DropOldest(true /* keep system prompt */))
    // OR: contextmgmt.WithStrategy(customStrategyFunc)
)

agent, _ := llmagent.New("assistant", prompt, model,
    llmagent.WithInterceptors(ctxManager),
)
```
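
The interceptor's per-turn decision could look roughly like this. Every type, field, and method name below (`ContextManager`, `BeforeTurn`, `Strategy`, etc.) is an illustrative stand-in, not the SDK's actual API:

```go
// Message is a minimal stand-in for llm.Message.
type Message struct{ Role, Content string }

// Strategy trims a message list to fit within a token budget.
type Strategy func(messages []Message, budget int) []Message

// ContextManager sketches the interceptor's configuration.
type ContextManager struct {
	EstimateTokens func([]Message) int // e.g. the 4-chars-per-token heuristic
	MaxInputTokens int                 // from ModelConstraints.MaxInputTokens
	MaxTokenRatio  float64             // e.g. 0.85
	Apply          Strategy            // sliding window, summarize, etc.
}

// BeforeTurn runs before each turn: if estimated usage crosses the ratio
// threshold, it applies the configured strategy; otherwise it passes the
// history through untouched.
func (c *ContextManager) BeforeTurn(messages []Message) []Message {
	threshold := int(float64(c.MaxInputTokens) * c.MaxTokenRatio)
	if c.EstimateTokens(messages) <= threshold {
		return messages
	}
	return c.Apply(messages, threshold)
}
```

Keeping the check in the interceptor (rather than the agent loop) is what makes this opt-in: agents without the interceptor behave exactly as today.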

Strategies

  • Sliding window: Keep the N most recent messages (simplest, no LLM call needed)
  • Drop oldest: Remove oldest messages while keeping system prompt and recent context
  • Summarize: Use an LLM to summarize older messages into a condensed form, then replace them (most expensive but preserves information)
  • Custom: User-provided function that receives messages and token budget, returns trimmed messages
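
The simplest of these, the sliding window, might be sketched as follows. `Message` is a stand-in type and the function signature is illustrative; the real strategy signature would come from the SDK:

```go
// Message is a minimal stand-in for llm.Message.
type Message struct{ Role, Content string }

// SlidingWindow returns a strategy that keeps the system prompt (when the
// history starts with one) plus the n most recent messages. The early return
// guarantees the system message is never in the kept tail, so it is never
// duplicated.
func SlidingWindow(n int) func(messages []Message) []Message {
	return func(messages []Message) []Message {
		if len(messages) <= n {
			return messages
		}
		out := []Message{}
		if messages[0].Role == "system" {
			out = append(out, messages[0])
		}
		return append(out, messages[len(messages)-n:]...)
	}
}
```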

Proactive warning

Emit a StatusEvent when context usage exceeds a configurable threshold, giving consumers visibility:

```go
StatusEvent{
    Stage:   StatusStageTurnStarted,
    Details: "context window at 87% capacity (35,200/40,000 tokens), compaction applied",
}
```

Use Case Example

Long-running support conversation:

```go
// After 50+ exchanges, the conversation has grown to 35k tokens on a 40k model
// Without management: next turn fails with FinishReasonLength
// With management:

agent, _ := llmagent.New("support", prompt, model,
    llmagent.WithInterceptors(
        contextmgmt.New(
            contextmgmt.WithMaxTokenRatio(0.80),
            contextmgmt.WithStrategy(contextmgmt.SlidingWindow(30)),
        ),
    ),
)

// At turn 51, the interceptor detects context is at 87%
// It trims to the last 30 messages (keeping system prompt)
// Agent continues working normally
// StatusEvent emitted for observability
```

Summarization for high-value conversations:

```go
contextmgmt.WithStrategy(contextmgmt.Summarize(cheapModel))
// When context exceeds threshold:
// 1. Takes messages 0..N-10 (old messages)
// 2. Sends them to cheapModel with "summarize this conversation"
// 3. Replaces old messages with a single summary message
// 4. Keeps last 10 messages intact for recency
```
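
That four-step flow could be sketched as below. `Summarizer` stands in for the actual call to `cheapModel`, and all names are illustrative:

```go
// Message is a minimal stand-in for llm.Message.
type Message struct{ Role, Content string }

// Summarizer stands in for the LLM call; the real version would send the old
// messages to cheapModel with a "summarize this conversation" prompt.
type Summarizer func(old []Message) string

// Summarize keeps the system prompt and the last keepRecent messages, and
// collapses everything in between into a single summary message.
func Summarize(llm Summarizer, keepRecent int) func([]Message) []Message {
	return func(messages []Message) []Message {
		head := 0
		if len(messages) > 0 && messages[0].Role == "system" {
			head = 1
		}
		if len(messages)-head <= keepRecent {
			return messages // nothing old enough to summarize
		}
		old := messages[head : len(messages)-keepRecent]
		out := append([]Message{}, messages[:head]...)
		out = append(out, Message{
			Role:    "assistant",
			Content: "Summary of earlier conversation: " + llm(old),
		})
		return append(out, messages[len(messages)-keepRecent:]...)
	}
}
```

One design note: the summary replaces the old messages in place of the history, so the reduced token count persists across all subsequent turns, which is where the cost savings come from.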

Why This Matters

  • Production reliability: Long-running agent sessions (customer support, coding assistants, research tasks) will inevitably hit context limits. Failing with an error is the worst possible UX.
  • Cost efficiency: Summarization reduces token usage for subsequent turns, directly reducing API costs for long conversations.
  • Already have the data: ModelConstraints.MaxInputTokens is already populated per model. The infrastructure for proactive checking exists — it just needs to be wired up.
  • Plugin-friendly: Implementing this as a TurnInterceptor means it's opt-in, composable with other interceptors, and doesn't add complexity to the core agent loop.
