This document provides a comprehensive technical overview of TinyIntent's architecture, designed for developers who need to understand how the system works internally.
TinyIntent is a voice-activated AI platform that enables iPhone users to control local AI models through natural language. The system prioritizes local-first operation, security, and modularity.
- 🔒 Local-First: All AI inference happens locally (no cloud calls)
- 📱 Voice-Optimized: Designed specifically for iPhone Shortcuts + Siri
- 🛡️ Security-First: Multi-layer security with sandboxing and audit logging
- 🧩 Modular: Clear separation of concerns with pluggable components
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ iPhone │ │ Bridge │ │ Router │ │ Helpers │
│ (Shortcuts)│ │ (FastAPI) │ │ (CoreML) │ │ (Sandboxed) │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
│ │ │ │
│ HTTP POST │ Intent │ Schema │
│ /shortcut/route │ Classification │ Validation │
▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Request Flow │
│ │
│ Voice → Shortcut → Bridge → Security → Router → Helper → Response │
│ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ └─ JSON │
│ │ │ │ │ │ └─ Sandbox Execution │
│ │ │ │ │ └─ gen|act Classification │
│ │ │ │ └─ Auth + Validation + Rate Limiting │
│ │ │ └─ FastAPI + Audit Logging │
│ │ └─ HTTP Request │
│ └─ Dictated Text │
└─────────────────────────────────────────────────────────────────────────┘
Purpose: User interface and configuration management
tinyintent/
├── cli.py # Main CLI entry point
├── config.py # Configuration management with Pydantic
├── credentials.py # Secure credential generation and storage
├── simple_server.py # Fallback server for basic functionality
└── health.py # System health checks

Key Features:
- Auto-configuration: Generates secure credentials automatically
- Environment management: Handles all environment variable setup
- Graceful fallback: Falls back to simple server if full bridge unavailable
- Developer experience: --quiet, --verbose, and --reload modes
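The auto-configuration behaviour described above could look roughly like the following. This is a minimal sketch, not TinyIntent's actual `credentials.py`: the function name `ensure_credentials` and the JSON keys `secret` and `shortcut_token` are assumptions made for illustration.

```python
import json
import os
import secrets
from pathlib import Path


def ensure_credentials(path: Path) -> dict:
    """Load credentials from `path`, generating them on first run."""
    if path.exists():
        return json.loads(path.read_text())
    creds = {
        # token_urlsafe(32) encodes 32 random bytes as 43 URL-safe characters
        "secret": secrets.token_urlsafe(32),          # hypothetical key name
        "shortcut_token": secrets.token_urlsafe(32),  # hypothetical key name
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with owner-only permissions before any secret is written
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(creds, fh)
    return creds
```

Calling `ensure_credentials(Path.home() / ".tinyintent" / "credentials.json")` twice returns the same values, because the second call reads the file written by the first.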
Purpose: Core FastAPI service handling all requests
bridge/
├── tinyrpc.py # Main FastAPI application
├── routes/ # Modular API endpoints
│ ├── shortcut.py # iPhone Shortcut integration (M11.0)
│ ├── health.py # Health check endpoints
│ ├── helpers.py # Helper management APIs
│ ├── agents.py # Agent lifecycle management
│ └── system.py # System administration
├── security.py # Multi-layer security framework
├── validation.py # Input validation and sanitization
├── sandbox_security.py # Sandboxing security enforcement
├── secret_manager.py # Scoped secret management
├── gen_client.py # Ollama client with circuit breaker
├── provenance.py # Agent signing and tamper detection
└── logs/ # Audit logging system

Security Architecture:
┌─────────────────────────────────────────────────────────┐
│ Security Layers │
├─────────────────────────────────────────────────────────┤
│ 1. Authentication: X-TinyIntent-Secret + timing attack │
│ prevention with constant-time comparison │
├─────────────────────────────────────────────────────────┤
│ 2. Authorization: CSRF protection for state-changing │
│ operations with token validation │
├─────────────────────────────────────────────────────────┤
│ 3. Input Validation: Script injection detection, │
│ schema validation, length limits │
├─────────────────────────────────────────────────────────┤
│ 4. Rate Limiting: Per-session and global rate limits │
│ with rolling windows │
├─────────────────────────────────────────────────────────┤
│ 5. Sandboxing: Isolated execution with CPU/memory/ │
│ filesystem/network restrictions │
├─────────────────────────────────────────────────────────┤
│ 6. Audit Logging: Comprehensive event logging with │
│ integrity chains and tamper detection │
└─────────────────────────────────────────────────────────┘
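Layer 1 relies on constant-time comparison of the `X-TinyIntent-Secret` value. A minimal sketch of that check, assuming a hypothetical helper named `secret_matches`, can use the standard library's `hmac.compare_digest`:

```python
import hmac
from typing import Optional


def secret_matches(provided: Optional[str], expected: str) -> bool:
    """Compare a header value to the configured secret in constant time."""
    if provided is None:
        return False  # missing header
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))
```

A plain `==` comparison can return early at the first differing character, which is the timing signal this layer is meant to remove.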
Purpose: Intent classification using local CoreML models
router/
├── SmallIntent.mlmodel # CoreML intent classifier (70-85MB)
├── train_router.swift # Model training with CreateML
├── eval_router.swift # Model evaluation and metrics
├── data/ # Training datasets
│ ├── intents.tsv # Intent classification data
│ └── eval_results.json # Evaluation metrics
└── train_summary.json # Training results and performance

Classification Flow:
User Input: "Show me recent error logs"
↓
CoreML Model Processing (SmallIntent.mlmodel)
↓
Intent Classification: { route: "act", confidence: 0.94 }
↓
Route Decision: Execute helper for log analysis
Performance Characteristics:
- Inference Speed: <10ms on Apple Silicon (Neural Engine optimized)
- Model Size: ~70-85MB (INT8 quantized CoreML)
- Accuracy: >90% on validation set
- Fallback: Heuristic classification if model unavailable
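The heuristic fallback is not specified in detail here. One plausible shape, sketched under the assumption that action requests tend to start with an imperative verb, is shown below. The verb list, the confidence values, and the `model` callable are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical verb list; a real fallback would be tuned on intents.tsv
ACTION_VERBS = {"show", "run", "check", "tail", "list", "restart", "start", "stop"}


def heuristic_route(text: str) -> dict:
    """Classify as act when the first word is an imperative verb, else gen."""
    words = text.lower().split()
    first = words[0].strip(".,!?") if words else ""
    if first in ACTION_VERBS:
        return {"route": "act", "confidence": 0.6}  # illustrative confidence
    return {"route": "gen", "confidence": 0.5}      # illustrative confidence


def classify(text: str, model: Optional[Callable[[str], dict]] = None) -> dict:
    """Use the model when available; fall back to the heuristic otherwise."""
    if model is not None:
        try:
            return model(text)
        except Exception:
            pass  # model failed to load or predict: use the heuristic
    return heuristic_route(text)
```

With no model supplied, `classify("Show me recent error logs")` routes to `act` and `classify("What is the weather?")` routes to `gen`.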
Purpose: Sandboxed execution of specific tasks
helpers/
├── registry.py # Helper discovery and validation
├── executor.py # Sandboxed execution engine
├── manifest.py # Helper metadata management
├── sdk.py # Helper development SDK
├── sandbox.py # Sandboxing implementation
├── bot_guard/ # Crypto trading helper
│ ├── main.js # Node.js implementation
│ ├── helper.yaml # Helper metadata
│ ├── input.schema.json
│ └── output.schema.json
└── log_tailer/ # System log analysis helper
├── main.py # Python implementation
├── helper.yaml # Helper metadata
├── input.schema.json
└── output.schema.json

Sandboxing Architecture:
┌─────────────────────────────────────────────────────────┐
│ Helper Sandbox │
├─────────────────────────────────────────────────────────┤
│ Resource Limits: │
│ • CPU: 10 seconds max execution time │
│ • Memory: 256MB max memory usage │
│ • File Descriptors: 50 max open files │
│ • Network: Disabled by default │
├─────────────────────────────────────────────────────────┤
│ Filesystem Isolation: │
│ • Isolated working directory per execution │
│ • No access to sensitive system paths │
│ • Temporary file cleanup after execution │
├─────────────────────────────────────────────────────────┤
│ Process Isolation: │
│ • Separate process per helper execution │
│ • Process tree cleanup on timeout │
│ • Signal handling for graceful termination │
└─────────────────────────────────────────────────────────┘
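The resource limits and per-execution working directory listed above can be approximated on Unix with `resource.setrlimit` and `subprocess`. The sketch below is an assumption about how such an executor might be built, not the contents of `sandbox.py`; `run_helper` is a hypothetical name, network isolation and process-tree cleanup are not covered, and `RLIMIT_AS` behaviour differs across platforms, so the memory cap should be verified on macOS.

```python
import resource
import subprocess
import tempfile


def _limit_setter(cpu_seconds, memory_bytes, max_files):
    """Build a function that applies rlimits inside the child process."""
    def cap(kind, value):
        _, hard = resource.getrlimit(kind)
        if hard != resource.RLIM_INFINITY:
            value = min(value, hard)  # cannot raise a limit above the hard cap
        resource.setrlimit(kind, (value, value))

    def apply():
        cap(resource.RLIMIT_CPU, cpu_seconds)     # CPU seconds
        cap(resource.RLIMIT_AS, memory_bytes)     # address space
        cap(resource.RLIMIT_NOFILE, max_files)    # open file descriptors
    return apply


def run_helper(cmd, payload="", cpu_seconds=10,
               memory_bytes=256 * 1024 * 1024, max_files=50):
    """Run `cmd` in a throwaway directory with the limits from the diagram."""
    with tempfile.TemporaryDirectory() as workdir:  # deleted after the run
        try:
            proc = subprocess.run(
                cmd, input=payload, capture_output=True, text=True,
                cwd=workdir,
                timeout=cpu_seconds,  # wall-clock guard on top of the CPU limit
                preexec_fn=_limit_setter(cpu_seconds, memory_bytes, max_files),
            )
        except subprocess.TimeoutExpired:
            return {"status": "timeout", "returncode": None, "stdout": ""}
    status = "ok" if proc.returncode == 0 else "error"
    return {"status": status, "returncode": proc.returncode, "stdout": proc.stdout}
```

The defaults mirror the documented limits (10 seconds, 256MB, 50 file descriptors); callers can pass the helper's JSON input as `payload`, which is delivered on standard input.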
Purpose: Request logging, training data collection, and agent staging
data/episodes/
├── episodes.py # Episode logging and storage
├── schema.py # Data schemas and validation
├── staging.db # SQLite database for agent staging
└── logger.py # Async logging implementation

Data Flow:
Request → Bridge → Episode Logger → SQLite/JSON → Training Data
↓ ↓ ↓ ↓ ↓
Voice FastAPI Async Logger Structured Model Training
Input Handler Background Storage Data Pipeline
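The background-logging step in this flow can be illustrated with an `asyncio.Queue` drained by a writer task. This is a sketch under assumptions: the `episodes` table, its columns, and the function names are hypothetical, and the blocking `sqlite3` calls would need to move to a thread in a busy service.

```python
import asyncio
import json
import sqlite3
import time


async def episode_writer(queue: asyncio.Queue, conn: sqlite3.Connection) -> None:
    """Drain the queue and persist each episode; None is the stop signal."""
    while True:
        episode = await queue.get()
        if episode is None:
            break
        conn.execute(
            "INSERT INTO episodes (ts, payload) VALUES (?, ?)",
            (episode["ts"], json.dumps(episode)),
        )
        conn.commit()


async def demo() -> int:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE episodes (ts REAL, payload TEXT)")
    queue: asyncio.Queue = asyncio.Queue()
    writer = asyncio.create_task(episode_writer(queue, conn))
    # The request handler only enqueues, so the response is not delayed by I/O
    queue.put_nowait({"ts": time.time(), "text": "Show me recent error logs", "route": "act"})
    queue.put_nowait(None)
    await writer
    return conn.execute("SELECT COUNT(*) FROM episodes").fetchone()[0]
```

Running `asyncio.run(demo())` stores one episode and returns the row count.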
- Token-based authentication for iPhone Shortcuts
- API secret authentication for advanced operations
- Constant-time comparison to prevent timing attacks
- Rate limiting with per-session and global limits
- Comprehensive input validation with schema enforcement
- Script injection detection using pattern matching
- Length limits to bound payload size and prevent resource exhaustion
- Type validation for all parameters
- Process sandboxing with isolated working directories
- Resource limits (CPU, memory, file descriptors)
- Network isolation (disabled by default for helpers)
- Capability-based access control for helper permissions
- Scoped secret management with encryption
- Audit logging with integrity chains
- Provenance tracking with cryptographic signing
- Emergency kill switches for immediate execution shutdown
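"Audit logging with integrity chains" is commonly implemented as a hash chain, where each entry's hash covers the previous entry's hash. The following is a minimal sketch of that general technique; the entry layout and function names are assumptions, not TinyIntent's actual log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first entry


def _digest(prev_hash: str, event: dict) -> str:
    body = json.dumps(event, sort_keys=True)  # stable serialization
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: list, event: dict) -> dict:
    """Append an event whose hash covers the previous entry's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    entry = {"event": event, "prev": prev, "hash": _digest(prev, event)}
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

Editing any stored event changes its recomputed hash, so `verify_chain` returns `False` from that entry onward, which is what makes tampering detectable.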
Request → Auth Check → Input Validation → Rate Limit Check →
Helper Execution → Audit Logging → Response
↓ ↓ ↓ ↓ ↓
403 Forbidden 400 Bad 429 Too Many Security Tamper
if invalid Request if Requests if Violation Detection
credentials malicious rate limited Logging Alerts
Endpoint: POST /shortcut/route
Authentication: X-Shortcut-Token header
Request Flow:
// iPhone Shortcuts Request
{
"text": "Show me recent error logs",
"session_id": "optional-session-id",
"mode": "preview|execute",
"return_format": "text|json"
}
// TinyIntent Response
{
"speak": "Found 3 recent error logs from the last hour...",
"truncated": false,
"data": {
"status": "success",
"logs": [...],
"timestamp": "2024-01-15T10:30:00Z"
}
}

Voice Optimization:
- Response length limits: Truncate at 280 characters for TTS
- Clear error messages: User-friendly error responses
- Session tracking: Optional session continuity
- Format flexibility: Text for voice, JSON for advanced shortcuts
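The 280-character limit and the `truncated` flag in the response can be sketched as follows. The function name `voice_response` and the choice to cut at a word boundary are assumptions for illustration.

```python
from typing import Optional


def voice_response(text: str, data: Optional[dict] = None, limit: int = 280) -> dict:
    """Build a Shortcut response whose spoken text fits the TTS limit."""
    truncated = len(text) > limit
    if truncated:
        # Reserve one character for the ellipsis, then back up to a word boundary
        cut = text[: limit - 1].rsplit(" ", 1)[0]
        text = cut.rstrip(" ,;:") + "…"
    return {"speak": text, "truncated": truncated, "data": data or {}}
```

Short replies pass through unchanged with `truncated` set to `False`; longer ones are cut so that `speak` never exceeds `limit` characters.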
Training Data → CreateML → CoreML → Deployment
↓ ↓ ↓ ↓
intents.tsv Swift Model Neural Engine
(episodes) Training Export Inference
Model Characteristics:
- Input: Text sequences up to 128 tokens
- Output: gen (generative) or act (action) classification
- Architecture: Transformer-based text classifier
- Optimization: INT8 quantization for Neural Engine
- Evaluation: Precision, recall, F1 score tracking
User Input → Router → Ollama → Response Formatting → Voice Output
↓ ↓ ↓ ↓ ↓
"What is "gen" llama3.1 "The weather iPhone TTS
weather?" 8B model today is..." Playback
Components:
- Circuit Breaker: Prevents cascade failures from Ollama
- Async Client: Non-blocking generation with timeouts
- Retry Logic: Exponential backoff for transient failures
- Task Management: Cancellation support for long requests
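The circuit breaker and exponential backoff named above follow well-known patterns. The sketch below shows one common form; the class, thresholds, and delay values are illustrative assumptions rather than the settings used in `gen_client.py`.

```python
import time


class CircuitBreaker:
    """Stop calling a failing backend until a cool-down period has passed."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # After the cool-down, let a trial request through (half-open state)
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def backoff_delays(retries=3, base=0.5, cap=8.0):
    """Delays in seconds that double on each retry, bounded by `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]
```

A client would check `allow()` before each Ollama call, sleep for the next value from `backoff_delays()` between retries, and report the outcome with `record_success()` or `record_failure()`.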
1. iPhone Shortcuts
├─ Voice Input: "Show me system logs"
├─ HTTP POST: /shortcut/route
└─ Headers: X-Shortcut-Token
2. Bridge Authentication
├─ Token Validation (constant-time)
├─ Rate Limit Check
└─ CSRF Protection
3. Input Processing
├─ JSON Parsing
├─ Schema Validation
├─ Input Sanitization
└─ Length Limit Check
4. Intent Classification
├─ CoreML Model Inference
├─ Confidence Threshold Check
└─ Route Decision: gen|act
5. Helper Execution (if act)
├─ Helper Selection: log_tailer
├─ Sandbox Creation
├─ Resource Limit Setup
├─ Schema Validation
├─ Process Execution
└─ Output Capture
6. Response Generation
├─ Output Formatting
├─ Voice Optimization
├─ Error Handling
└─ JSON Response
7. Audit Logging
├─ Request/Response Logging
├─ Security Event Logging
├─ Performance Metrics
└─ Error Tracking
8. iPhone Response
├─ JSON Parsing
├─ Text Extraction
└─ TTS Playback
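The ordering of the server-side steps matters: each stage runs only if the previous one passed. A minimal sketch of that short-circuiting pipeline follows; the stage functions, the expected token value, and the 1000-character cap are hypothetical stand-ins for the real checks.

```python
def run_pipeline(request: dict, stages: list) -> dict:
    """Run stages in order; a stage returns a reason string to stop, or None."""
    trace = []
    for name, stage in stages:
        trace.append(name)
        problem = stage(request)
        if problem is not None:
            return {"ok": False, "failed_at": name, "reason": problem, "trace": trace}
    return {"ok": True, "failed_at": None, "reason": None, "trace": trace}


# Hypothetical stage implementations, for illustration only
def authenticate(req):
    return None if req.get("token") == "expected-token" else "invalid token"


def validate(req):
    text = req.get("text")
    if not isinstance(text, str) or not 0 < len(text) <= 1000:
        return "text missing or too long"
    return None


def classify(req):
    req["route"] = "act" if req["text"].lower().startswith("show") else "gen"
    return None


STAGES = [("auth", authenticate), ("validate", validate), ("classify", classify)]
```

Because the loop returns at the first failure, an unauthenticated request never reaches validation or classification, and the `trace` field records how far it got.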
| Error Type | Detection | Response | Recovery |
|---|---|---|---|
| Auth Failure | Token validation | 401 error | Re-authenticate |
| Input Error | Schema validation | 400 error | Fix input format |
| Rate Limit | Counter check | 429 error | Wait and retry |
| Helper Error | Execution failure | 500 error | Fallback response |
| System Error | Health check | 503 error | Service restart |
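The error mapping above translates naturally into a lookup table. In this sketch the status codes and recovery hints come from the mapping itself, while the enum, the function name, and the response shape are assumptions.

```python
from enum import Enum


class ErrorType(Enum):
    AUTH_FAILURE = "auth_failure"
    INPUT_ERROR = "input_error"
    RATE_LIMIT = "rate_limit"
    HELPER_ERROR = "helper_error"
    SYSTEM_ERROR = "system_error"


# (HTTP status, recovery hint) per error type
ERROR_TABLE = {
    ErrorType.AUTH_FAILURE: (401, "Re-authenticate"),
    ErrorType.INPUT_ERROR: (400, "Fix input format"),
    ErrorType.RATE_LIMIT: (429, "Wait and retry"),
    ErrorType.HELPER_ERROR: (500, "Fallback response"),
    ErrorType.SYSTEM_ERROR: (503, "Service restart"),
}


def error_response(kind: ErrorType) -> dict:
    """Return the status code and recovery hint for an error type."""
    status, recovery = ERROR_TABLE[kind]
    return {"status": status, "error": kind.value, "recovery": recovery}
```

Keeping the mapping in one table means every route reports the same status for the same failure.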
| Component | Target | Actual | Notes |
|---|---|---|---|
| Router Inference | <10ms | ~5ms | Neural Engine optimized |
| Helper Execution | <5s | 1-3s | Sandboxed process startup |
| Total Request | <10s | 3-7s | End-to-end including voice |
| Authentication | <1ms | <1ms | Constant-time comparison |
| Metric | Limit | Monitoring |
|---|---|---|
| Requests/session/min | 60 | Rate limiter |
| Global requests/min | 500 | Circuit breaker |
| Helper executions/min | 10 | Resource manager |
| Concurrent helpers | 4 | Process pool |
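The per-session and global request limits can be enforced with a rolling window of timestamps. The class below is a sketch of that approach using the documented limits of 60 and 500 requests per minute as defaults; the class name and in-memory storage are assumptions.

```python
import time
from collections import defaultdict, deque


class RollingRateLimiter:
    """Allow a request only if both session and global windows have room."""

    def __init__(self, session_limit=60, global_limit=500, window=60.0,
                 clock=time.monotonic):
        self.session_limit = session_limit
        self.global_limit = global_limit
        self.window = window
        self.clock = clock
        self._sessions = defaultdict(deque)  # session_id -> request timestamps
        self._global = deque()

    def _evict(self, stamps: deque, now: float) -> None:
        # Drop timestamps that have aged out of the window
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()

    def allow(self, session_id: str) -> bool:
        now = self.clock()
        stamps = self._sessions[session_id]
        self._evict(stamps, now)
        self._evict(self._global, now)
        if len(stamps) >= self.session_limit or len(self._global) >= self.global_limit:
            return False  # caller would answer with 429
        stamps.append(now)
        self._global.append(now)
        return True
```

Rejected requests are not recorded, so a client that keeps retrying while limited does not extend its own lockout.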
| Component | CPU | Memory | Disk |
|---|---|---|---|
| Bridge (FastAPI) | 5-15% | 50-100MB | Minimal |
| Router (CoreML) | 1-3% | 200MB | 70-85MB model |
| Helper (sandbox) | 10-30% | 50-256MB | Isolated dirs |
| Total System | 20-50% | 300-600MB | <1GB |
1. Environment Variables (highest priority)
├─ TINYINTENT_SECRET
├─ SHORTCUT_TOKEN
└─ TINYINTENT_LOG_LEVEL
2. Configuration Files
├─ ~/.tinyintent/credentials.json
├─ models.yaml
└─ helpers/registry.yaml
3. Defaults (lowest priority)
├─ Development-friendly defaults
├─ Auto-generated credentials
└─ Fallback configurations
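This precedence order (environment, then files, then defaults) maps directly onto `collections.ChainMap`, which returns the value from the first mapping that contains a key. The sketch below assumes the file values have already been loaded into a dict; the default shown and the function name are hypothetical.

```python
import os
from collections import ChainMap
from typing import Mapping, Optional

# Hypothetical default, for illustration only
DEFAULTS = {"TINYINTENT_LOG_LEVEL": "INFO"}


def build_config(file_values: Mapping[str, str],
                 env: Optional[Mapping[str, str]] = None) -> ChainMap:
    """Layer settings so that env beats files and files beat defaults."""
    env = os.environ if env is None else env
    return ChainMap(dict(env), dict(file_values), DEFAULTS)
```

For example, if the credentials file provides `SHORTCUT_TOKEN` and the environment also sets it, `build_config(...)["SHORTCUT_TOKEN"]` returns the environment value.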
# Security settings with validation
from pydantic import Field
from pydantic_settings import BaseSettings  # pydantic v2; in v1, import BaseSettings from pydantic

class SecuritySettings(BaseSettings):
    secret: str = Field(..., min_length=32)    # Required API secret
    allow_dev_local: bool = Field(True)        # Development bypass
    rate_limit_session: int = Field(60)        # Per-session limit (requests/min)
    rate_limit_global: int = Field(500)        # Global limit (requests/min)
    csrf_token_expiry: int = Field(3600)       # CSRF token TTL in seconds

Developer Machine:
├─ tinyintent (CLI)
├─ Bridge (FastAPI) on localhost:8787
├─ Ollama on localhost:11434
├─ iPhone on same WiFi network
└─ Direct HTTP communication
Mac Server:
├─ tinyintent as launchd service
├─ Bridge with production secrets
├─ Ollama with optimized models
├─ Tailscale for secure networking
└─ iPhone over encrypted tunnel
- Single-node design: Optimized for personal/small team use
- Local-first: No external dependencies required
- Resource-aware: Designed for standard Mac hardware
- Network-efficient: Minimal bandwidth usage
This architecture gives voice-activated AI a secure, modular, local-first foundation that remains simple to operate and extend.