id: thread-lifecycle
title: "Thread Lifecycle"
description: How threads are created, executed, and finalized
category: orchestration
tags: [threads, lifecycle, states, registry]
version: "1.1.0"

Every thread follows a deterministic lifecycle: generate an ID, register in the registry, load the directive, resolve limits and permissions, run the LLM loop, and finalize spend and status.

    created ──→ running ──→ completed
                       ├──→ error
                       ├──→ cancelled
                       └──→ continued

| State | Meaning |
|---|---|
| `created` | Registered in registry, not yet executing |
| `running` | LLM loop is active |
| `completed` | Finished successfully; result available |
| `error` | Failed; error message in result |
| `cancelled` | Cancelled via the `cancel_thread` operation |
| `continued` | Handed off to a new thread (context limit reached) |

Thread IDs are derived from the directive name and a Unix epoch timestamp:

    thread_id = f"{directive_name}-{int(time.time())}"
    # Example: "agency-kiwi/discover_leads-1739820456"

This makes thread IDs human-readable (you can see which directive spawned them) and unique for typical usage: two IDs collide only if the same directive is started twice within the same epoch second.
For async tool execution (via `_launch_async()`), thread IDs are UUID v4 strings:

    thread_id = str(uuid.uuid4())
    # Example: "a1b2c3d4-e5f6-7890-abcd-ef1234567890"

Both formats are tracked in the same `ThreadRegistry`.
thread_directive.execute() runs these steps in order:
The thread discovers its parent through three sources (first match wins):
- Explicit `parent_thread_id` parameter (used by handoff/resume)
- `RYE_PARENT_THREAD_ID` environment variable (set by parent threads)
- No parent: this is a root thread

If a parent is declared but its thread.json doesn't exist, execution fails immediately.
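The lookup order above can be sketched as a small function. This is a minimal illustration, not the project's implementation: `discover_parent`, its parameters, and the choice of `FileNotFoundError` are hypothetical names picked for the example.

```python
from pathlib import Path
from typing import Mapping, Optional


def discover_parent(
    explicit_parent_id: Optional[str],
    environ: Mapping[str, str],
    threads_dir: Path,
) -> Optional[str]:
    """Return the parent thread ID, or None for a root thread (hypothetical helper)."""
    # First match wins: explicit parameter, then the environment variable.
    parent_id = explicit_parent_id or environ.get("RYE_PARENT_THREAD_ID") or None
    if parent_id is None:
        return None  # no parent declared: this is a root thread
    # A declared parent must already have its metadata file on disk.
    if not (threads_dir / parent_id / "thread.json").exists():
        raise FileNotFoundError(f"parent thread.json missing for {parent_id}")
    return parent_id
```

Passing the environment as a mapping rather than reading `os.environ` directly keeps the sketch easy to test.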
The thread is registered in the SQLite registry (registry.db) with status created, its directive name, and parent ID. The registry tracks all threads across the project.
The directive is loaded via DirectiveResolver, searching project → user → system spaces. The markdown/xml parser extracts metadata (limits, permissions, model, inputs) from the XML fence and preserves the raw content for the LLM prompt.
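A rough sketch of the project → user → system search order follows. The `directives/<name>.md` layout and the `resolve_directive` function are assumptions made for illustration; the actual `DirectiveResolver` may organize its search differently.

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_directive(name: str, search_roots: Sequence[Path]) -> Optional[Path]:
    """Return the first matching directive file, searching roots in order.

    `search_roots` is expected as [project, user, system]; the file layout
    below is hypothetical.
    """
    for root in search_roots:
        candidate = root / "directives" / f"{name}.md"
        if candidate.is_file():
            return candidate  # earlier spaces shadow later ones
    return None
```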
For normal execution, the inputs/validate and inputs/interpolate processors handle input validation and interpolation. For resume/handoff, FetchTool is used instead (no input validation needed since the directive ran before).
After the directive is loaded, resolve_extends hooks fire to allow dynamic routing into extends chains. Hooks can set or override the directive's extends attribute before the extends chain is walked.
Hook context includes:
| Key | Description |
|---|---|
| `directive` | Directive name |
| `has_extends` | Whether the directive already declares an extends target |
| `category` | Directive category |
| `inputs` | Resolved inputs |
| `model` | Resolved model |

First matching hook wins. A hook's set_extends action sets the directive's extends target.
The extends chain is walked — each parent directive's context is composed into the current directive. This merges capabilities, hooks, and context from ancestor directives.
For continuation/resume threads, previous thread transcript messages are reconstructed from the prior thread's transcript. For fresh threads this step is a no-op.
Limits are resolved through a layered merge:
defaults (resilience.yaml) → directive metadata → limit_overrides → parent upper bounds
Parent limits cap all values via min(). A child can never exceed its parent's limits. Depth decrements by 1 per level — if the parent has depth: 5, the child gets depth: 4.
If resolved depth is less than 0 (i.e., the parent's depth was already exhausted), the thread returns an error immediately. This prevents infinite recursion.
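The layered merge and parent capping might look like the sketch below. It is an illustration under assumptions: `resolve_limits` is a hypothetical name, limits are modeled as flat dicts, and the child's depth is taken as the smaller of its own value and the parent's depth minus one.

```python
from typing import Dict, Optional

Limits = Dict[str, float]


def resolve_limits(
    defaults: Limits,
    directive_limits: Limits,
    limit_overrides: Limits,
    parent_limits: Optional[Limits] = None,
) -> Limits:
    # Later layers override earlier ones.
    resolved = {**defaults, **directive_limits, **limit_overrides}
    if parent_limits is None:
        return resolved  # root thread: nothing to cap against
    # Parent values act as upper bounds for every limit the parent declares.
    for key, parent_value in parent_limits.items():
        if key != "depth":
            resolved[key] = min(resolved.get(key, parent_value), parent_value)
    # Depth drops by one per level relative to the parent.
    if "depth" in parent_limits:
        child_depth = parent_limits["depth"] - 1
        resolved["depth"] = min(resolved.get("depth", child_depth), child_depth)
    if resolved.get("depth", 0) < 0:
        raise ValueError("depth exhausted: parent cannot spawn further children")
    return resolved
```

With a parent at `depth: 5`, a child that asks for `depth: 5` ends up with `depth: 4`; a parent at `depth: 0` makes the call fail.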
If the thread has a parent, the orchestrator checks whether the parent has exceeded its spawns limit. If so, the thread returns an error. Otherwise, the parent's spawn count is incremented.
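As a minimal sketch of the spawn check, assuming the parent's state is a dict with hypothetical `spawn_count` and `spawns_limit` keys and that reaching the limit blocks further spawns:

```python
from typing import Dict


def claim_spawn_slot(parent: Dict[str, int]) -> bool:
    """Increment the parent's spawn count unless its limit is already used up."""
    if parent["spawn_count"] >= parent["spawns_limit"]:
        return False  # caller returns an error instead of starting the thread
    parent["spawn_count"] += 1
    return True
```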
Hooks are merged from five sources and sorted by layer:
| Layer | Source | Config Location | Purpose |
|---|---|---|---|
| 0 | User hooks | `~/.ai/config/agent/hooks.yaml` | Cross-project personal hooks |
| 1 | Directive hooks | Directive XML `<hooks>` block | Per-directive hooks |
| 2 | Builtin hooks | System `hook_conditions.yaml` | Error/limit/compaction defaults |
| 3 | Project hooks | `.ai/config/agent/hooks.yaml` | Project-wide hooks |
| 4 | Infra hooks | System `hook_conditions.yaml` | Infrastructure (emitter, checkpoint) |

User and project hooks use the same format as directive hooks — id, event, optional condition, and action. See Hooks Configuration below.
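One way to picture the merge is a flatten-by-layer pass that keeps definition order within each layer. The `merge_hooks` function and the added `layer` key are hypothetical, chosen only to illustrate the ordering.

```python
from typing import Any, Dict, List

Hook = Dict[str, Any]


def merge_hooks(sources: Dict[int, List[Hook]]) -> List[Hook]:
    """Flatten hooks from all layers: lower layers first, definition order kept."""
    merged: List[Hook] = []
    for layer in sorted(sources):
        for hook in sources[layer]:
            # Tag each hook with its layer (illustrative field, not a documented one).
            merged.append({**hook, "layer": layer})
    return merged
```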
The SafetyHarness is constructed with the resolved limits, merged hooks, directive permissions, and parent capabilities.
After capabilities are resolved and attached to the harness, dynamic tool registration runs. tool_schema_loader.preload_tool_schemas() scans capabilities for rye.execute.tool.* patterns, resolves matching tools across the 3-tier space, extracts CONFIG_SCHEMA and __tool_description__ via AST, and returns structured tool_defs. Each tool gets a flattened API name (e.g., rye_file_system_read) and a _primary field for dispatch routing. Primary actions (rye/fetch, rye/sign) are included as peers. The tool defs are set on harness.available_tools and registered in the LLM's native tool palette. A capabilities tree is also generated for transcript metadata. Token budget is ~2000.
The hierarchical budget ledger handles cost tracking:
- Root threads: `ledger.register(thread_id, max_spend)` creates a top-level budget entry
- Child threads: `ledger.reserve(thread_id, spend_limit, parent_thread_id)` atomically reserves budget from the parent's remaining allocation

If the parent has insufficient remaining budget, the reservation fails and the thread returns an error.
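The reservation rule can be sketched with an in-memory stand-in for the ledger. This is an illustration only: the real ledger is SQLite-backed and atomic, and the `remaining` method and boolean return value are assumptions of this example. Actual spend is ignored here for brevity.

```python
from typing import Dict


class BudgetLedger:
    """In-memory stand-in for the hierarchical budget ledger (illustrative)."""

    def __init__(self) -> None:
        self._budget: Dict[str, float] = {}    # thread_id -> max spend
        self._reserved: Dict[str, float] = {}  # thread_id -> reserved by children

    def register(self, thread_id: str, max_spend: float) -> None:
        self._budget[thread_id] = max_spend
        self._reserved[thread_id] = 0.0

    def remaining(self, thread_id: str) -> float:
        return self._budget[thread_id] - self._reserved[thread_id]

    def reserve(self, thread_id: str, spend_limit: float, parent_thread_id: str) -> bool:
        # A child may only reserve what the parent has not already handed out.
        if spend_limit > self.remaining(parent_thread_id):
            return False
        self._reserved[parent_thread_id] += spend_limit
        self.register(thread_id, spend_limit)
        return True
```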
The thread metadata file is written to `.ai/state/threads/<thread_id>/thread.json`:

    {
      "thread_id": "agency-kiwi/discover_leads-1739820456",
      "directive": "agency-kiwi/discover_leads",
      "status": "running",
      "created_at": "2026-02-17T10:00:56+00:00",
      "updated_at": "2026-02-17T10:00:56+00:00",
      "model": "claude-3-5-haiku-20241022",
      "limits": {
        "turns": 10,
        "tokens": 200000,
        "spend": 0.10,
        "depth": 3,
        "spawns": 10
      },
      "capabilities": [
        "rye.execute.tool.scraping.gmaps.scrape_gmaps",
        "rye.fetch.knowledge.agency-kiwi.*"
      ]
    }

The `thread.json` file is signed using canonical JSON serialization with a `_signature` field, protecting capabilities and limits from tampering.
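To show why canonical serialization matters for signing, here is a sketch that signs a metadata dict with HMAC-SHA256. The signature algorithm and the function names are assumptions for illustration; the source only states that canonical JSON and a `_signature` field are used, not which scheme produces the signature.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def canonical_json(data: Dict[str, Any]) -> bytes:
    # Sorted keys and fixed separators give a byte-stable serialization.
    return json.dumps(data, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_metadata(data: Dict[str, Any], key: bytes) -> Dict[str, Any]:
    # The signature covers everything except the signature field itself.
    unsigned = {k: v for k, v in data.items() if k != "_signature"}
    digest = hmac.new(key, canonical_json(unsigned), hashlib.sha256).hexdigest()
    return {**unsigned, "_signature": digest}


def verify_metadata(data: Dict[str, Any], key: bytes) -> bool:
    expected = sign_metadata(data, key)["_signature"]
    return hmac.compare_digest(expected, str(data.get("_signature", "")))
```

Because keys are sorted before hashing, two dicts with the same content but different key order yield the same signature, while any change to `limits` or `capabilities` invalidates it.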
RYE_PARENT_THREAD_ID is set to this thread's ID so any child subprocesses (spawned via async) inherit the parent relationship.
- Synchronous (default): Calls `runner.run()` directly and blocks until completion.
- Asynchronous (`async: true`): Triggered by `execute directive` with `async: true`, which delegates to `thread_directive` internally. `spawn_detached()` launches a subprocess that re-executes `thread_directive.py` with the `--thread-id` and `--pre-registered` flags. The child rebuilds all state from scratch. Detached spawning uses the `lillux exec spawn` Rust binary for cross-platform support, with a POSIX `subprocess.Popen` fallback. The parent process returns immediately with `{"thread_id": "...", "status": "running"}`.

After the LLM loop completes:
- Report actual spend to the ledger: `ledger.report_actual(thread_id, actual_spend)`
- Cascade spend to parent: `ledger.cascade_spend(thread_id, parent_thread_id, actual_spend)`
- Release budget reservation: `ledger.release(thread_id, final_status)`
- Update registry status: `registry.update_status(thread_id, status)`
- Store result in registry: `registry.set_result(thread_id, cost)`
- Write final `thread.json` with cost and updated status

Note: When `directive_return` was called during the LLM loop, the final result includes an `outputs` dict (the structured key-value pairs from the return call) alongside the raw `result` text.

runner.run() manages the core conversation loop. There is no system prompt — tools are passed via API tool definitions, and context framing is injected through hooks.
run_hooks_context() takes an explicit event parameter (required, no default) and dispatches hooks matching that event:
- Fresh threads: `run_hooks_context(event="thread_started")` fires `thread_started` hooks. Context includes `directive_body` and `inputs`. Hook context and the user prompt (full directive content) are concatenated into a single user message:

      messages = [{"role": "user", "content": f"{hook_context}\n\n{directive_prompt}"}]

- Continuation threads (when `resume_messages` is provided): `run_hooks_context(event="thread_continued")` fires `thread_continued` hooks instead. Context is injected near the last user message (not prepended). The context dict also includes `previous_thread_id` and `inputs`, available for interpolation.

Each turn follows this sequence:

1. Check limits: `harness.check_limits(cost)` tests turns, tokens, spend, and duration. If a limit is exceeded, hooks evaluate the limit event. If no hook handles it, the thread terminates with a limit error.
2. Check cancellation: `harness.is_cancelled()` checks the `_cancelled` flag (set by the `cancel_thread` operation). If cancelled, the thread terminates.
3. LLM call: If the provider supports streaming, `provider.create_streaming_completion()` is used with a `TranscriptSink` that writes `token_delta` events to the transcript JSONL and appends text to the knowledge markdown in real time. Otherwise, `provider.create_completion()` is used. Errors trigger the error classification system and hooks. See Per-Token Streaming.
4. Track tokens: Input/output tokens and spend from the response are accumulated in the `cost` dict.
5. Parse tool calls: Native `tool_use` blocks are used if the provider supports them. Otherwise, `text_tool_parser.extract_tool_calls()` parses tool calls from the response text.
6. No tool calls: If the LLM responds with text only (no tool calls), the thread completes with the raw text as the result. For directives with `<outputs>`, the LLM should call `rye_directive_return` directly (registered as a tool in the palette), which provides structured key-value outputs that parent threads can consume programmatically. On the first turn with native `tool_use`, the runner nudges the model to use tools before accepting a text-only response.
7. Dispatch each tool call:
   - Resolve the tool name to a dispatch route via `tool_primary_map`
   - Check permission via `harness.check_permission()`; denied calls return an error message to the LLM
   - If the tool is `rye_directive_return`, the runner intercepts the call before dispatch. It validates that all required output fields (declared in `<outputs>`) are present. If fields are missing, an error is returned to the LLM to retry. If valid, the `directive_return` hook event fires, and the thread finalizes with structured `outputs` in the result.
   - Auto-inject parent context for child thread spawns (`parent_thread_id`, `parent_depth`, `parent_limits`, `parent_capabilities`)
   - Execute via `ToolDispatcher`
   - Guard result (bound large results, deduplicate, store artifacts)
   - Append result as a tool message
8. Run `after_step` hooks: Post-turn hooks evaluate (e.g., cost tracking, logging).
9. Update cost snapshot: The registry is updated with current cost data (best-effort).
10. Check context limit: If estimated token usage exceeds the threshold (default 0.9 of the context window), trigger `handoff_thread` to continue in a new thread. The handoff no longer generates a summary; summarization is hook-driven via `after_complete` hooks declared by the directive.

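The `rye_directive_return` interception in the dispatch step comes down to a required-field check. A minimal sketch, assuming the declared outputs are available as a list of names; `check_directive_return` and the shape of its return value are hypothetical:

```python
from typing import Any, Dict, List


def check_directive_return(
    required_outputs: List[str], outputs: Dict[str, Any]
) -> Dict[str, Any]:
    """Validate a directive_return call against the declared output names."""
    missing = [name for name in required_outputs if name not in outputs]
    if missing:
        # Handed back to the LLM as a tool error so it can retry the call.
        return {"ok": False, "error": f"missing required outputs: {', '.join(missing)}"}
    return {"ok": True, "outputs": outputs}
```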
After the turn loop exits and render_knowledge_transcript() runs, the runner dispatches after_complete hooks in the finally block. This is best-effort (wrapped in try/except) — failures do not affect the thread's final status. This enables directives to declare hooks for post-completion actions like summarization.
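The best-effort pattern described above can be sketched with a `try`/`finally` that swallows hook failures. `run_thread` and its callable-based hooks are hypothetical simplifications of the runner, used only to show the control flow.

```python
from typing import Any, Callable, Dict, List


def run_thread(
    loop: Callable[[], Dict[str, Any]],
    after_complete_hooks: List[Callable[[Dict[str, Any]], None]],
) -> Dict[str, Any]:
    result: Dict[str, Any] = {"status": "error"}  # used if the loop raises
    try:
        result = loop()
        return result
    finally:
        for hook in after_complete_hooks:
            try:
                hook(result)
            except Exception:
                # Best-effort: a failing hook must not change the final status.
                pass
```

A hook that raises is skipped, later hooks still run, and the caller receives the loop's result unchanged.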
Each thread creates a directory at .ai/state/threads/<thread_id>/ containing:
| File | Purpose |
|---|---|
| `thread.json` | Signed thread metadata: ID, directive, status, model, cost, limits, capabilities |
| `transcript.jsonl` | Append-only event log with inline checkpoint signatures |
| `capabilities.md` | Signed snapshot of tool definitions and capabilities tree available to the thread |

Thread transcripts are also exported as signed knowledge entries at .ai/knowledge/agent/threads/{directive}/{thread_id}.md for discoverability via rye search knowledge. The knowledge frontmatter includes a capabilities_ref field pointing to the capabilities file.
The thread registry (registry.db) and budget ledger (budget_ledger.db) are shared SQLite databases at .ai/state/threads/.
The ThreadRegistry class provides these operations:
| Method | Purpose |
|---|---|
| `register(thread_id, directive, parent_id)` | Create thread entry with status `created` |
| `update_pid(thread_id, pid)` | Update child process PID after spawn |
| `update_status(thread_id, status)` | Transition to a new state |
| `get_thread(thread_id)` | Get full thread record |
| `set_result(thread_id, result)` | Store final result (JSON serialized) |
| `update_cost_snapshot(thread_id, cost)` | Update cost columns mid-execution |
| `list_active()` | List all non-terminal threads |
| `list_children(parent_id)` | List children of a thread |
| `set_continuation(thread_id, continuation_thread_id)` | Mark thread as continued |
| `set_chain_info(thread_id, chain_root_id, continuation_of)` | Set chain metadata |
| `get_chain(thread_id)` | Get full continuation chain |

User and project hooks let you inject context, record learnings, or run directives on every thread — without modifying each directive individually.
Create `.ai/config/agent/hooks.yaml` at the project level, or `~/.ai/config/agent/hooks.yaml` at the user level:

    hooks:
      - id: "inject_project_conventions"
        event: "thread_started"
        action:
          primary: "fetch"
          item_type: "knowledge"
          item_id: "project/conventions"
        description: "Inject project conventions into every thread"
      - id: "inject_api_types"
        event: "thread_started"
        condition:
          path: "directive"
          op: "contains"
          value: "api"
        action:
          primary: "fetch"
          item_type: "knowledge"
          item_id: "project/api-types"
        description: "Inject API types for API-related directives only"
      - id: "record_learnings"
        event: "after_complete"
        condition:
          path: "cost.turns"
          op: "gte"
          value: 3
        action:
          primary: "execute"
          item_type: "directive"
          item_id: "project/record-learnings"
        description: "Record learnings after substantial threads"

| Event | When it fires | Context available |
|---|---|---|
| `resolve_extends` | After directive load, before the extends chain is walked (step 4) | `directive`, `has_extends`, `category`, `inputs`, `model`. The `set_extends` action sets the directive's extends target. |
| `thread_started` | Before first LLM turn (fresh threads) | `directive`, `directive_body`, `model`, `limits`, `inputs` |
| `thread_continued` | Before first LLM turn (continuation threads) | `directive`, `directive_body`, `model`, `limits`, `previous_thread_id`, `inputs` |
| `after_step` | After each turn in the LLM loop | `cost`, `thread_id` |
| `after_complete` | In the `finally` block after the loop ends | `thread_id`, `cost`, `project_path` |
| `error` | When an LLM call or tool execution fails | `error`, `classification` |
| `limit` | When a limit is exceeded | `limit_code`, `current_value`, `current_max` |

Actions use the same format as directive hooks:
- `primary: "fetch"`: Fetch a knowledge or directive item (for context injection)
- `primary: "execute"`: Execute a tool or directive

Conditions use the same operators as the condition evaluator: eq, ne, gt, gte, lt, lte, in, contains, regex, exists. Combine with any, all, not for complex logic.
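A small evaluator makes the operator semantics concrete. This is a sketch under assumptions: conditions are modeled as dicts with `path`/`op`/`value` keys (as in the YAML example), `any`/`all`/`not` are assumed to be keys wrapping nested conditions, and a missing path is assumed to fail every operator except `exists`. The real condition evaluator may differ on these points.

```python
import re
from typing import Any, Callable, Dict

_MISSING = object()  # sentinel: path not present in the context

_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda a, b: a == b,
    "ne": lambda a, b: a != b,
    "gt": lambda a, b: a > b,
    "gte": lambda a, b: a >= b,
    "lt": lambda a, b: a < b,
    "lte": lambda a, b: a <= b,
    "in": lambda a, b: a in b,
    "contains": lambda a, b: b in a,
    "regex": lambda a, b: re.search(b, str(a)) is not None,
}


def lookup(context: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as "cost.turns" through nested dicts."""
    current: Any = context
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def evaluate(condition: Dict[str, Any], context: Dict[str, Any]) -> bool:
    # Combinators recurse into nested conditions.
    if "any" in condition:
        return any(evaluate(c, context) for c in condition["any"])
    if "all" in condition:
        return all(evaluate(c, context) for c in condition["all"])
    if "not" in condition:
        return not evaluate(condition["not"], context)
    actual = lookup(context, condition["path"])
    if condition["op"] == "exists":
        return actual is not _MISSING
    if actual is _MISSING:
        return False  # assumption: comparisons against a missing path fail
    return _OPS[condition["op"]](actual, condition.get("value"))
```

For example, the `record_learnings` condition (`path: "cost.turns"`, `op: "gte"`, `value: 3`) holds for a context of `{"cost": {"turns": 4}}`.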
Lower layers run first. Within a layer, hooks run in definition order. A hook at any layer can use conditions to selectively fire based on context (directive name, cost, model, etc.).
- Per-Token Streaming — Real-time token streaming to transcript and knowledge files
- Spawning Children — How to spawn, wait, and collect results
- Safety and Limits — How limits resolve and what happens when they're exceeded