Skip to content

Latest commit

 

History

History
352 lines (243 loc) · 17.6 KB

File metadata and controls

352 lines (243 loc) · 17.6 KB
id: thread-lifecycle
title: "Thread Lifecycle"
description: How threads are created, executed, and finalized
category: orchestration
tags: [threads, lifecycle, states, registry]
version: "1.1.0"

Thread Lifecycle

Every thread follows a deterministic lifecycle: generate an ID, register in the registry, load the directive, resolve limits and permissions, run the LLM loop, finalize spend and status.

Thread States

created ──→ running ──→ completed
                    ├──→ error
                    ├──→ cancelled
                    └──→ continued
State Meaning
created Registered in registry, not yet executing
running LLM loop is active
completed Finished successfully — result available
error Failed — error message in result
cancelled Cancelled via cancel_thread operation
continued Handed off to a new thread (context limit reached)

Thread ID Generation

Thread IDs are derived from the directive name and a Unix epoch timestamp:

thread_id = f"{directive_name}-{int(time.time())}"
# Example: "agency-kiwi/discover_leads-1739820456"

This makes thread IDs human-readable (you can see which directive spawned them) and unique (epoch seconds prevent collisions for typical usage).

For async tool execution (via _launch_async()), thread IDs are UUID v4 strings:

thread_id = str(uuid.uuid4())
# Example: "a1b2c3d4-e5f6-7890-abcd-ef1234567890"

Both formats are tracked in the same ThreadRegistry.

The Full Execution Flow

thread_directive.execute() runs these steps in order:

Step 1: Resolve parent context

The thread discovers its parent through three sources (first match wins):

  1. Explicit parent_thread_id parameter (used by handoff/resume)
  2. RYE_PARENT_THREAD_ID environment variable (set by parent threads)
  3. No parent — this is a root thread

If a parent is declared but its thread.json doesn't exist, execution fails immediately.

Step 2: Register thread

The thread is registered in the SQLite registry (registry.db) with status created, its directive name, and parent ID. The registry tracks all threads across the project.

Step 3: Load directive

The directive is loaded via DirectiveResolver, searching project → user → system spaces. The markdown/xml parser extracts metadata (limits, permissions, model, inputs) from the XML fence and preserves the raw content for the LLM prompt.

For normal execution, the inputs/validate and inputs/interpolate processors handle input validation and interpolation. For resume/handoff, FetchTool is used instead (no input validation needed since the directive ran before).

Step 4: Resolve extends

After the directive is loaded, resolve_extends hooks fire to allow dynamic routing into extends chains. Hooks can set or override the directive's extends attribute before the extends chain is walked.

Hook context includes:

Key Description
directive Directive name
has_extends Whether the directive already declares an extends target
category Directive category
inputs Resolved inputs
model Resolved model

First matching hook wins. A hook's set_extends action sets the directive's extends target.

Step 5: Resolve extends chain and compose context

The extends chain is walked — each parent directive's context is composed into the current directive. This merges capabilities, hooks, and context from ancestor directives.

Step 6: Reconstruct resume messages

For continuation/resume threads, previous thread transcript messages are reconstructed from the prior thread's transcript. For fresh threads this step is a no-op.

Step 7: Build limits

Limits are resolved through a layered merge:

defaults (resilience.yaml) → directive metadata → limit_overrides → parent upper bounds

Parent limits cap all values via min(). A child can never exceed its parent's limits. Depth decrements by 1 per level — if the parent has depth: 5, the child gets depth: 4.

Step 8: Check depth

If resolved depth is less than 0 (i.e., the parent's depth was already exhausted), the thread returns an error immediately. This prevents infinite recursion.

Step 9: Check spawns limit

If the thread has a parent, the orchestrator checks whether the parent has exceeded its spawns limit. If so, the thread returns an error. Otherwise, the parent's spawn count is incremented.

Step 10: Build hooks, harness, and preload tool schemas

Hooks are merged from five sources and sorted by layer:

Layer Source Config Location Purpose
0 User hooks ~/.ai/config/agent/hooks.yaml Cross-project personal hooks
1 Directive hooks Directive XML <hooks> block Per-directive hooks
2 Builtin hooks System hook_conditions.yaml Error/limit/compaction defaults
3 Project hooks .ai/config/agent/hooks.yaml Project-wide hooks
4 Infra hooks System hook_conditions.yaml Infrastructure (emitter, checkpoint)

User and project hooks use the same format as directive hooks — id, event, optional condition, and action. See Hooks Configuration below.

The SafetyHarness is constructed with the resolved limits, merged hooks, directive permissions, and parent capabilities.

After capabilities are resolved and attached to the harness, dynamic tool registration runs. tool_schema_loader.preload_tool_schemas() scans capabilities for rye.execute.tool.* patterns, resolves matching tools across the 3-tier space, extracts CONFIG_SCHEMA and __tool_description__ via AST, and returns structured tool_defs. Each tool gets a flattened API name (e.g., rye_file_system_read) and a _primary field for dispatch routing. Primary actions (rye/fetch, rye/sign) are included as peers. The tool defs are set on harness.available_tools and registered in the LLM's native tool palette. A capabilities tree is also generated for transcript metadata. Token budget is ~2000.

Step 11: Reserve budget

The hierarchical budget ledger handles cost tracking:

  • Root threads: ledger.register(thread_id, max_spend) — creates a top-level budget entry
  • Child threads: ledger.reserve(thread_id, spend_limit, parent_thread_id) — atomically reserves budget from the parent's remaining allocation

If the parent has insufficient remaining budget, the reservation fails and the thread returns an error.

Step 12: Write initial thread.json

The thread metadata file is written to .ai/state/threads/<thread_id>/thread.json:

{
  "thread_id": "agency-kiwi/discover_leads-1739820456",
  "directive": "agency-kiwi/discover_leads",
  "status": "running",
  "created_at": "2026-02-17T10:00:56+00:00",
  "updated_at": "2026-02-17T10:00:56+00:00",
  "model": "claude-3-5-haiku-20241022",
  "limits": {
    "turns": 10,
    "tokens": 200000,
    "spend": 0.10,
    "depth": 3,
    "spawns": 10
  },
  "capabilities": [
    "rye.execute.tool.scraping.gmaps.scrape_gmaps",
    "rye.fetch.knowledge.agency-kiwi.*"
  ]
}

The thread.json file is signed using canonical JSON serialization with a _signature field, protecting capabilities and limits from tampering.

Step 13: Set parent env var

RYE_PARENT_THREAD_ID is set to this thread's ID so any child subprocesses (spawned via async) inherit the parent relationship.

Step 14: Run thread synchronously / spawn async

  • Synchronous (default): Calls runner.run() directly and blocks until completion
  • Asynchronous (async: true): Triggered by execute directive with async: true, which delegates to thread_directive internally. spawn_detached() launches a subprocess that re-executes thread_directive.py with --thread-id and --pre-registered flags. The child rebuilds all state from scratch. Detached spawning uses the lillux exec spawn Rust binary for cross-platform support, with a POSIX subprocess.Popen fallback. The parent process returns immediately with {"thread_id": "...", "status": "running"}

Step 15: Report spend and finalize

After the LLM loop completes:

Note: When directive_return was called during the LLM loop, the final result includes an outputs dict (the structured key-value pairs from the return call) alongside the raw result text.

  1. Report actual spend to the ledger: ledger.report_actual(thread_id, actual_spend)
  2. Cascade spend to parent: ledger.cascade_spend(thread_id, parent_thread_id, actual_spend)
  3. Release budget reservation: ledger.release(thread_id, final_status)
  4. Update registry status: registry.update_status(thread_id, status)
  5. Store result in registry: registry.set_result(thread_id, cost)
  6. Write final thread.json with cost and updated status

The Runner's LLM Loop

runner.run() manages the core conversation loop. There is no system prompt — tools are passed via API tool definitions, and context framing is injected through hooks.

First Message Construction

run_hooks_context() takes an explicit event parameter (required, no default) and dispatches hooks matching that event:

  • Fresh threads: run_hooks_context(event="thread_started") fires thread_started hooks. Context includes directive_body and inputs. Hook context and the user prompt (full directive content) are concatenated into a single user message.
messages = [{"role": "user", "content": f"{hook_context}\n\n{directive_prompt}"}]
  • Continuation threads (when resume_messages is provided): run_hooks_context(event="thread_continued") fires thread_continued hooks instead. Context is injected near the last user message (not prepended). The context dict also includes previous_thread_id and inputs, available for interpolation.

Turn Loop

Each turn follows this sequence:

  1. Check limitsharness.check_limits(cost) tests turns, tokens, spend, duration. If exceeded, hooks evaluate the limit event. If no hook handles it, the thread terminates with a limit error.

  2. Check cancellationharness.is_cancelled() checks the _cancelled flag (set by cancel_thread operation). If cancelled, the thread terminates.

  3. LLM call — If the provider supports streaming, provider.create_streaming_completion() is used with a TranscriptSink that writes token_delta events to the transcript JSONL and appends text to the knowledge markdown in real-time. Otherwise, provider.create_completion() is used. Errors trigger the error classification system and hooks. See Per-Token Streaming.

  4. Track tokens — Input/output tokens and spend from the response are accumulated in the cost dict.

  5. Parse tool calls — Native tool_use blocks are used if the provider supports them. Otherwise, text_tool_parser.extract_tool_calls() parses tool calls from the response text.

  6. No tool calls — If the LLM responds with text only (no tool calls), the thread completes with the raw text as the result. For directives with <outputs>, the LLM should call rye_directive_return directly (registered as a tool in the palette), which provides structured key-value outputs that parent threads can consume programmatically. On the first turn with native tool_use, the runner nudges the model to use tools before accepting a text-only response.

  7. Dispatch each tool call:

    • Resolve the tool name to a dispatch route via tool_primary_map
    • Check permission via harness.check_permission() — denied calls return an error message to the LLM
    • If the tool is rye_directive_return, the runner intercepts the call before dispatch. It validates that all required output fields (declared in <outputs>) are present. If fields are missing, an error is returned to the LLM to retry. If valid, the directive_return hook event fires, and the thread finalizes with structured outputs in the result.
    • Auto-inject parent context for child thread spawns (parent_thread_id, parent_depth, parent_limits, parent_capabilities)
    • Execute via ToolDispatcher
    • Guard result (bound large results, deduplicate, store artifacts)
    • Append result as a tool message
  8. Run after_step hooks — Post-turn hooks evaluate (e.g., cost tracking, logging).

  9. Update cost snapshot — The registry is updated with current cost data (best-effort).

  10. Check context limit — If estimated token usage exceeds the threshold (default 0.9 of context window), trigger handoff_thread to continue in a new thread. The handoff no longer generates a summary — summarization is hook-driven via after_complete hooks declared by the directive.

After-Complete Hook Dispatch

After the turn loop exits and render_knowledge_transcript() runs, the runner dispatches after_complete hooks in the finally block. This is best-effort (wrapped in try/except) — failures do not affect the thread's final status. This enables directives to declare hooks for post-completion actions like summarization.

Thread Storage

Each thread creates a directory at .ai/state/threads/<thread_id>/ containing:

File Purpose
thread.json Signed thread metadata: ID, directive, status, model, cost, limits, capabilities
transcript.jsonl Append-only event log with inline checkpoint signatures
capabilities.md Signed snapshot of tool definitions and capabilities tree available to the thread

Thread transcripts are also exported as signed knowledge entries at .ai/knowledge/agent/threads/{directive}/{thread_id}.md for discoverability via rye search knowledge. The knowledge frontmatter includes a capabilities_ref field pointing to the capabilities file.

The thread registry (registry.db) and budget ledger (budget_ledger.db) are shared SQLite databases at .ai/state/threads/.

Thread Registry

The ThreadRegistry class provides these operations:

Method Purpose
register(thread_id, directive, parent_id) Create thread entry with status created
update_pid(thread_id, pid) Update child process PID after spawn
update_status(thread_id, status) Transition to a new state
get_thread(thread_id) Get full thread record
set_result(thread_id, result) Store final result (JSON serialized)
update_cost_snapshot(thread_id, cost) Update cost columns mid-execution
list_active() List all non-terminal threads
list_children(parent_id) List children of a thread
set_continuation(thread_id, continuation_thread_id) Mark thread as continued
set_chain_info(thread_id, chain_root_id, continuation_of) Set chain metadata
get_chain(thread_id) Get full continuation chain

Hooks Configuration

User and project hooks let you inject context, record learnings, or run directives on every thread — without modifying each directive individually.

Config format

Create .ai/config/agent/hooks.yaml at the project level, or ~/.ai/config/agent/hooks.yaml at the user level:

hooks:
  - id: "inject_project_conventions"
    event: "thread_started"
    action:
      primary: "fetch"
      item_type: "knowledge"
      item_id: "project/conventions"
    description: "Inject project conventions into every thread"

  - id: "inject_api_types"
    event: "thread_started"
    condition:
      path: "directive"
      op: "contains"
      value: "api"
    action:
      primary: "fetch"
      item_type: "knowledge"
      item_id: "project/api-types"
    description: "Inject API types for API-related directives only"

  - id: "record_learnings"
    event: "after_complete"
    condition:
      path: "cost.turns"
      op: "gte"
      value: 3
    action:
      primary: "execute"
      item_type: "directive"
      item_id: "project/record-learnings"
    description: "Record learnings after substantial threads"

Available events

Event When it fires Context available
resolve_extends After directive load, before extends chain is walked (step 4) directive, has_extends, category, inputs, model. Action: set_extends sets the directive's extends target.
thread_started Before first LLM turn (fresh threads) directive, directive_body, model, limits, inputs
thread_continued Before first LLM turn (continuation threads) directive, directive_body, model, limits, previous_thread_id, inputs
after_step After each turn in the LLM loop cost, thread_id
after_complete In the finally block after the loop ends thread_id, cost, project_path
error When an LLM call or tool execution fails error, classification
limit When a limit is exceeded limit_code, current_value, current_max

Hook actions

Actions use the same format as directive hooks:

  • primary: "fetch" — Fetch a knowledge or directive item (for context injection)
  • primary: "execute" — Execute a tool or directive

Conditions

Conditions use the same operators as the condition evaluator: eq, ne, gt, gte, lt, lte, in, contains, regex, exists. Combine with any, all, not for complex logic.

Layer ordering

Lower layers run first. Within a layer, hooks run in definition order. A hook at any layer can use conditions to selectively fire based on context (directive name, cost, model, etc.).

What's Next