This guide covers deployment options for TinyIntent in various environments.
- Python 3.11+
- Docker (for containerized deployment)
- Ollama server
- At least 4GB RAM and 2GB free disk space
- CoreML model artifacts (generated via `make learn`; see the Model Generation section)
- Copy the example environment file:

  ```bash
  cp .env.example .env
  ```
- Edit `.env` with your configuration:
  - Set a strong `TINYINTENT_SECRET` (minimum 32 characters)
  - Configure the Ollama URL if not using localhost
  - Set `TINYINTENT_EXECUTION_ENABLED=1` to enable helper execution
  - Adjust rate limits and resource constraints as needed
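As a minimal illustration, a development `.env` might look like the following. All values are placeholders; only variables documented in the configuration reference are shown:

```ini
# Placeholder values - replace before use
TINYINTENT_SECRET=replace-with-a-random-string-of-32-plus-characters
TINYINTENT_EXECUTION_ENABLED=0
HELPER_RATE_LIMIT=10
MODEL_OLLAMA_URL=http://localhost:11434
```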
```bash
# Generate both SmallIntent.mlmodel and TinyIntent.mlmodel
make learn

# Verify models were created
make doctor

# Expected output:
# ✅ SmallIntent.mlmodel present (macOS optimized, ≤52MB)
# ✅ TinyIntent.mlmodel present (mobile optimized, ≤10MB)
```

The following files must exist in `router/` before starting the bridge:

- `router/SmallIntent.mlmodel` - Main router model for macOS (ANE accelerated)
- `router/TinyIntent.mlmodel` - Compact version for mobile/constrained environments
- `router/train_summary.json` - Training metadata and performance metrics

If these files are missing, the bridge will fail to start with model loading errors.
```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Generate models (this will take several minutes)
make learn

# 3. Verify system readiness
make doctor

# 4. Start the bridge service
make bridgesrv
```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Start the service:

  ```bash
  make bridgesrv
  ```

  or

  ```bash
  python -m tinyintent.bridge.tinyrpc
  ```

- Verify deployment:

  ```bash
  curl http://localhost:8787/healthz
  ```
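For scripted deployments, the `curl` check above can be wrapped in a small readiness poll. This is an illustrative standard-library sketch, assuming only that the documented `/healthz` endpoint answers HTTP 200 when healthy:

```python
import time
import urllib.error
import urllib.request

def wait_for_health(url: str = "http://localhost:8787/healthz",
                    timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll the health endpoint until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not listening yet; retry
        time.sleep(interval)
    return False
```

Calling `wait_for_health()` after `make bridgesrv` returns `True` once the bridge answers, or `False` if the timeout elapses.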
- Set environment variables:

  ```bash
  export TINYINTENT_SECRET="your-production-secret-key-here"
  export GRAFANA_PASSWORD="your-grafana-password"
  ```

- Start all services:

  ```bash
  docker-compose up -d
  ```

- Check service health:

  ```bash
  docker-compose ps
  curl http://localhost:8787/healthz
  ```
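The repository's `docker-compose.yml` defines the actual services. Purely as a hypothetical sketch (the service and image names here are invented for illustration and are not the project's), a minimal compose file wiring the bridge to Ollama could look like:

```yaml
# Hypothetical sketch - service and image names are illustrative only
services:
  bridge:
    image: tinyintent/bridge:latest   # assumed image name
    ports:
      - "8787:8787"
    environment:
      TINYINTENT_SECRET: ${TINYINTENT_SECRET}
      MODEL_OLLAMA_URL: http://ollama:11434
    depends_on:
      - ollama
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
```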
- Create service user:

  ```bash
  sudo useradd --system --create-home --shell /bin/false tinyintent
  ```

- Install the application:

  ```bash
  sudo -u tinyintent git clone https://github.com/tinyintent/tinyintent.git /home/tinyintent/app
  cd /home/tinyintent/app
  sudo -u tinyintent python -m venv venv
  sudo -u tinyintent ./venv/bin/pip install -r requirements.txt
  ```

- Create the systemd service file `/etc/systemd/system/tinyintent.service`:

  ```ini
  [Unit]
  Description=TinyIntent Bridge Service
  After=network.target
  Requires=network.target

  [Service]
  Type=exec
  User=tinyintent
  Group=tinyintent
  WorkingDirectory=/home/tinyintent/app
  Environment=PATH=/home/tinyintent/app/venv/bin
  EnvironmentFile=/home/tinyintent/app/.env
  ExecStart=/home/tinyintent/app/venv/bin/python -m tinyintent.bridge.tinyrpc
  ExecReload=/bin/kill -HUP $MAINPID
  Restart=always
  RestartSec=5

  # Security settings
  NoNewPrivileges=true
  PrivateTmp=true
  ProtectSystem=strict
  ProtectHome=true
  ReadWritePaths=/home/tinyintent/app/data /home/tinyintent/app/logs

  [Install]
  WantedBy=multi-user.target
  ```

- Enable and start the service:

  ```bash
  sudo systemctl daemon-reload
  sudo systemctl enable tinyintent
  sudo systemctl start tinyintent
  ```
- Install Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Create production directories:

  ```bash
  mkdir -p data/episodes logs backups
  ```

- Set environment variables in the `.env` file

- Start with production settings:

  ```bash
  TINYINTENT_ENVIRONMENT=production \
  TINYINTENT_WORKERS=4 \
  TINYINTENT_BIND=0.0.0.0 \
  python -m tinyintent.bridge.tinyrpc
  ```
- Use a strong, randomly generated `TINYINTENT_SECRET`
- Disable `ALLOW_DEV_LOCAL` in production
- Use HTTPS in production (reverse proxy recommended)
- Bind to localhost (`127.0.0.1`) if using a reverse proxy
- Use firewall rules to restrict access
- Consider a VPN for remote access
- Start with `TINYINTENT_EXECUTION_ENABLED=0` until security review is complete
- Review all helpers before enabling execution
- Set appropriate resource limits for helpers
- Monitor helper execution logs
- Run as a non-root user
- Restrict file system access using sandboxing
- Regularly back up audit logs and episode data
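One way to produce a suitably strong secret; this is a standard-library sketch, not a project-specific tool:

```python
# Generate a random secret comfortably above the 32-character minimum.
import secrets

secret = secrets.token_urlsafe(48)  # 48 random bytes -> 64 URL-safe characters
print(f"TINYINTENT_SECRET={secret}")
```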
The TinyIntent router uses locally trained CoreML models for intent classification. Generate them using:
```bash
# Full automated learning pipeline
make learn

# Individual steps
make router-train   # Train PyTorch → ONNX → CoreML conversion
make router-eval    # Evaluate model performance
make promote        # Promote if evaluation passes criteria
```

This produces versioned CoreML artifacts:

- `router/SmallIntent.mlmodel` - macOS optimized (~70-85MB INT8 CoreML)
- `router/TinyIntent.mlmodel` - Mobile optimized (≤5MB)
- `router/train_summary.json` - Training metadata and metrics
Check training and model status via API:
```bash
# Get comprehensive training summary
curl -H "X-TinyIntent-Secret: $TINYINTENT_SECRET" \
  http://localhost:8787/router/train_summary

# Example response includes:
# - Training metrics (accuracy, precision/recall, F1)
# - Model artifact status and file sizes
# - Calibration statistics and confidence curves
# - Deployment readiness indicators
# - Sample predictions with confidence scores
```

The training pipeline includes automatic validation:
- Size constraints: SmallIntent 50-100MB (target 70-85MB), TinyIntent ≤5MB
- Performance gates: Minimum accuracy, F1 score thresholds
- Calibration quality: ECE (Expected Calibration Error) limits
- Latency requirements: P95 inference time under 50ms
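For reference, Expected Calibration Error can be computed as follows. This is the standard binned formulation, not necessarily the exact implementation the pipeline uses:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: size-weighted mean gap between confidence and accuracy.

    confidences: predicted top-class probabilities in [0, 1]
    correct: 1 if the prediction was right, else 0
    """
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bin (lo, hi]; the first bin also includes exactly-zero confidences.
        idx = [i for i, c in enumerate(confidences)
               if (lo < c <= hi) or (b == 0 and c == 0)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(avg_conf - accuracy)
    return ece
```

A perfectly calibrated model scores 0; for example, four predictions at 0.95 confidence with only two correct give ECE = |0.95 - 0.5| = 0.45.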
Before enabling the router in production:
- Verify model artifacts exist:

  ```bash
  ls -la router/*.mlmodel
  # Should show SmallIntent.mlmodel and TinyIntent.mlmodel
  ```

- Check model performance:

  ```bash
  curl -s -H "X-TinyIntent-Secret: $SECRET" \
    http://localhost:8787/router/train_summary | \
    jq '.evaluation_results.promotion_eligible'
  ```

- Validate that model sizes meet constraints:

  ```bash
  curl -s -H "X-TinyIntent-Secret: $SECRET" \
    http://localhost:8787/router/train_summary | \
    jq '.current_models'
  ```
- Training data is stored in `router/data/intents.tsv`
- Episode data can be exported using `make export-episodes`
- Models are retrained via `make learn`
- Training metadata includes unique training IDs for versioning
- GET `/healthz` - Basic health check
- GET `/readyz` - Readiness check
- GET `/doctor` - Comprehensive system health
- GET `/router/train_summary` - Training status and model artifacts
- Audit logs rotate automatically
- Monitor disk space for episode data
- Set up log retention policies
```bash
# Backup episode data
make backup-data

# Manual backup
tar -czf backup-$(date +%Y%m%d).tar.gz data/ logs/
```

- Stop the service
- Backup current installation
- Pull latest changes
- Update dependencies
- Run database migrations if needed
- Restart service
- Verify health
- Check environment variables
- Verify Ollama connectivity
- Check file permissions
- Review logs for errors
- Monitor resource usage
- Check helper execution patterns
- Review rate limiting settings
- Consider scaling with multiple workers
- Check helper manifests
- Verify sandbox constraints
- Review capability permissions
- Check environment variables
| Variable | Default | Description |
|---|---|---|
| `TINYINTENT_SECRET` | Required | Authentication secret |
| `TINYINTENT_ENVIRONMENT` | `development` | Environment mode |
| `TINYINTENT_BIND` | `127.0.0.1` | Server bind address |
| `TINYINTENT_PORT` | `8787` | Server port |
| `TINYINTENT_EXECUTION_ENABLED` | `0` | Enable helper execution |
| `HELPER_RATE_LIMIT` | `10` | Helper executions per minute |
| `MODEL_OLLAMA_URL` | `http://localhost:11434` | Ollama server URL |
See `.env.example` for complete configuration options.
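As a sketch of how these settings could be consumed in Python (illustrative only; `load_config` is not a TinyIntent API), using the defaults from the table above:

```python
import os

def load_config(env=os.environ):
    """Read the documented settings, applying their documented defaults."""
    secret = env.get("TINYINTENT_SECRET")
    if not secret or len(secret) < 32:
        raise RuntimeError("TINYINTENT_SECRET must be set (minimum 32 characters)")
    return {
        "environment": env.get("TINYINTENT_ENVIRONMENT", "development"),
        "bind": env.get("TINYINTENT_BIND", "127.0.0.1"),
        "port": int(env.get("TINYINTENT_PORT", "8787")),
        "execution_enabled": env.get("TINYINTENT_EXECUTION_ENABLED", "0") == "1",
        "helper_rate_limit": int(env.get("HELPER_RATE_LIMIT", "10")),
        "ollama_url": env.get("MODEL_OLLAMA_URL", "http://localhost:11434"),
    }
```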
For deployment issues:
- Check the troubleshooting section
- Review application logs
- Run system health checks
- Consult the project documentation