Reading time: 20 minutes | Skill level: Intermediate | Last updated: 2026-01-22
Comprehensive guide for optimizing cost, performance, and security across multi-provider AI development workflows.
- Strategic Model Selection
- Daily Development Workflows
- Team Best Practices
- Performance Optimization
- Security & Privacy Patterns
- Cost Management
- Common Anti-Patterns
Choose your provider and model based on three dimensions: cost, quality, and privacy requirements.
| Scenario | Recommended Provider | Reason |
|---|---|---|
| Daily feature development | ccc (Copilot) | Uses quota (1x), fast, balanced quality |
| Code review / critical analysis | ccd (Anthropic) or ccc-opus | Best quality for security-critical decisions |
| Proprietary/sensitive code | cco (Ollama) | 100% local, no data leaves machine |
| Rapid prototyping | ccc-gpt (Copilot) | 0x multiplier = doesn't consume quota |
| Learning/experimentation | ccc (Copilot) | Uses quota, good for exploration |
| Production deployment review | ccd --model opus (Anthropic) | Official API, no ToS risk |
| Offline/air-gapped development | cco (Ollama) | Only option without internet |
When using GitHub Copilot (ccc), choose models based on task complexity:
| Task Type | Model | Command | Quality | Speed | Use Case |
|---|---|---|---|---|---|
| Quick questions | Haiku 4.5 | ccc-haiku | ⭐⭐⭐ | ⚡⚡⚡ | "What does this function do?" |
| Feature implementation | Sonnet 4.5 | ccc or ccc-sonnet | ⭐⭐⭐⭐ | ⚡⚡ | Daily development (default) |
| Complex architecture | Opus 4.5 | ccc-opus | ⭐⭐⭐⭐⭐ | ⚡ | System design, critical code paths |
| Alternative perspective | GPT-4.1 | ccc-gpt | ⭐⭐⭐⭐ | ⚡⚡ | When Claude approach isn't working |
All are included in the Copilot Pro subscription ($10/month) at no extra charge, subject to each model's quota multiplier.
Use Copilot for 80-90% of work, reserve Anthropic Direct for critical 10-20%.
# Morning: Feature development (free)
ccc
> Implement JWT authentication middleware
# Afternoon: Continue development (free)
ccc
> Add rate limiting to API endpoints
# Before commit: Final review (paid, high quality)
ccd --model opus
> Security audit this authentication implementation
Cost savings: $40-80/month vs Anthropic-only workflow.
Start with faster/cheaper models, escalate only when needed.
# Level 1: Quick prototype (Haiku)
ccc-haiku
> Sketch out user profile component structure
# Level 2: Full implementation (Sonnet)
ccc-sonnet
> Implement complete user profile with validation
# Level 3: Critical review (Opus)
ccc-opus
> Review for security vulnerabilities and edge cases
Cost savings: Avoid using Opus for tasks Haiku can handle.
For critical decisions, validate across models before using expensive Anthropic.
# Free validation round
ccc-sonnet # Claude perspective
> Evaluate this caching strategy
ccc-gpt # GPT perspective
> Evaluate this caching strategy
# Only if disagreement warrants it:
ccd --model opus
> Final arbitration on caching approach
Cost savings: Often free models converge on the right answer, eliminating need for paid validation.
Use faster models (ccc-haiku, ccc-sonnet) when:
- Exploring unfamiliar codebase
- Rapid prototyping / spike solutions
- Refactoring with clear specifications
- Answering straightforward questions
- Iterating on feedback quickly
Do:
ccc-haiku
> What files handle user authentication?
Don't:
ccc-opus # Overkill for simple questions
> What files handle user authentication?

Use best models (ccc-opus, ccd --model opus) when:
- Security-critical code paths
- Production deployment reviews
- Complex architectural decisions
- Debugging subtle race conditions
- Compliance-sensitive implementations
Do:
ccd --model opus
> Analyze this payment processing flow for security vulnerabilities
Don't:
ccc-haiku # Insufficient for security analysis
> Analyze this payment processing flow for security vulnerabilities

Goal: Understand codebase, plan features, quick wins.
Recommended: Copilot Haiku or Sonnet (fast, free)
# Start session
ccc-haiku
# Typical morning tasks
> Explore project structure and identify auth components
> Generate test cases for new feature
> Refactor duplicated validation logic
> Answer team's code review questions
Why this works: Morning tasks are exploratory with rapid iterations. Haiku's speed keeps flow state intact.
Goal: Build features, write tests, integrate components.
Recommended: Copilot Sonnet (balanced)
# Continue or new session
ccc-sonnet
# Typical afternoon tasks
> Implement user profile API endpoints with validation
> Write integration tests for authentication flow
> Debug failing test cases
> Optimize database queries
Why this works: Sonnet balances quality and speed for heads-down implementation work.
Goal: Code review, security checks, documentation.
Recommended: Copilot Opus or Anthropic Direct (quality focus)
# High-quality review session
ccc-opus
# Typical evening tasks
> Review today's code for security issues
> Analyze edge cases in error handling
> Generate comprehensive API documentation
> Architectural review of new microservice
Why this works: End-of-day reviews benefit from the highest-quality model and are worth the slower speed.
# Start with Sonnet
ccc-sonnet
> Debug why authentication fails for federated users
# If stuck after 15 minutes, escalate
ccc-opus
> Deep analysis: authentication flow for federated vs local users
# If still stuck, alternative perspective
ccc-gpt
> Review authentication logic from different angle

# Planning phase
ccc-sonnet
> Design user notification system architecture
# Implementation phase (continue same session)
> Implement notification service with queue integration
> Add rate limiting and retry logic
> Write unit and integration tests
# Review phase (switch to Opus)
/exit
ccc-opus
> Security and performance review of notification system

# Fast model sufficient for clear refactoring
ccc-haiku
> Extract payment processing into separate module
> Rename ambiguous variables in auth service
> Remove deprecated API endpoints
> Update function signatures for TypeScript strict mode

Run multiple terminals with different providers for complex problems:
# Terminal 1: Primary investigation (free)
ccc-sonnet
> Analyze performance bottleneck in API response times
# Terminal 2: Alternative approach (free)
ccc-gpt
> Suggest database query optimizations
# Terminal 3: Private code analysis (offline)
cco
> Review proprietary algorithm performance characteristics

Pass code through quality gates with increasing rigor:
# Gate 1: Implementation (Copilot Sonnet)
ccc-sonnet
> Implement feature X
# Gate 2: Self-review (Copilot Opus)
ccc-opus
> Review implementation for correctness
# Gate 3: Security audit (Anthropic Direct)
ccd --model opus
> Security analysis before production deployment
Cost optimization: Most problems are caught at Gate 1 or 2, so only code that passes both reaches the expensive Gate 3 review.
Standardized onboarding reduces configuration variability:
Day 1: Installation
# 1. Verify prerequisites
which claude
which jq
which nc
# 2. Install cc-copilot-bridge
curl -fsSL https://raw.githubusercontent.com/FlorianBruniaux/cc-copilot-bridge/main/install.sh | bash
source ~/.zshrc
# 3. Verify installation
ccs # Check all provider status

Day 2: Provider configuration
# Configure Anthropic (optional but recommended for critical work)
export ANTHROPIC_API_KEY="<YOUR_API_KEY>"
# Configure Copilot (primary provider)
npm install -g copilot-api
copilot-api start
# Follow authentication flow
# Verify both work
ccd # Test Anthropic
ccc # Test Copilot

Day 3: Optional Ollama setup
# For sensitive code work
brew install ollama
ollama serve &
ollama pull qwen2.5-coder:7b # Start with small model
cco # Test Ollama

Standardize aliases across team in shared .bash_aliases.team:
# === cc-copilot-bridge Team Standards ===
# Primary providers
alias ccd='claude-switch direct'
alias ccc='claude-switch copilot'
alias cco='claude-switch ollama'
alias ccs='claude-switch status'
# Copilot models (team conventions)
alias cc='ccc-sonnet' # Default for daily work
alias cc-review='ccc-opus' # Code reviews
alias cc-fast='ccc-haiku' # Quick questions
alias cc-alt='ccc-gpt' # Alternative perspective
# Production quality gate (Anthropic)
alias cc-prod='ccd --model opus' # Production reviews only
# Ollama for sensitive work
alias cc-private='cco' # Proprietary code
Team convention: Everyone uses cc for daily work, escalates to cc-review or cc-prod as needed.
Establish when to use each model tier:
Level 1: Peer Review (Sonnet)
- Standard pull requests
- Non-critical bug fixes
- Documentation changes
- Test additions
Level 2: Senior Review (Opus via Copilot)
- Security-sensitive changes
- API contract modifications
- Database schema migrations
- Critical bug fixes
Level 3: Production Gate (Anthropic Direct)
- Production hotfixes
- Infrastructure changes
- Security patches
- Compliance-critical code
Implementation:
# In .github/pull_request_template.md
## AI Review Checklist
- [ ] Level 1 review completed (`cc`)
- [ ] Level 2 review for security-sensitive changes (`cc-review`)
- [ ] Level 3 review for production deployments (`cc-prod`)

Enable team-wide session tracking:
# Shared team function in .bash_aliases.team
cc-log-summary() {
echo "=== Your AI Usage Summary ==="
echo ""
echo "Copilot sessions:"
grep "mode=copilot" ~/.claude/claude-switch.log | wc -l
echo ""
echo "Anthropic sessions:"
grep "mode=direct" ~/.claude/claude-switch.log | wc -l
echo ""
echo "Model breakdown:"
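# Extract the model name that follows "mode=copilot:" in each log line
grep "mode=copilot:" ~/.claude/claude-switch.log | sed -n 's/.*mode=copilot:\([^ ]*\).*/\1/p' | sort | uniq -c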
}
# Weekly review
cc-log-summary
Team lead review: Monthly analysis to optimize model selection patterns.
Recommended budget split for 5-person team:
| Provider | Monthly Budget | Usage Pattern |
|---|---|---|
| GitHub Copilot | $50 (5 × $10) | 80-90% of all AI interactions |
| Anthropic Direct | $100-200 total | Critical reviews, production gates |
| Ollama | $0 (infrastructure only) | Sensitive code, offline work |
Total: $150-250/month vs $500-1000+ for Anthropic-only team.
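The totals in the budget table are simple arithmetic. A minimal sketch, assuming the flat $10 per seat and the $100-200 pooled Anthropic allowance from the table; `team_budget` is a hypothetical helper name, not part of cc-copilot-bridge:

```shell
# Hypothetical helper: monthly team budget from seat count and a pooled
# Anthropic allowance. Figures mirror the table above; adjust to your plan.
team_budget() {
  seats=$1        # number of developers
  seat_price=$2   # Copilot price per seat, USD/month
  pool=$3         # pooled Anthropic budget, USD/month
  echo $(( seats * seat_price + pool ))
}

team_budget 5 10 100   # low end of the range: prints 150
team_budget 5 10 200   # high end of the range: prints 250
```

Changing the seat count or the pool shows quickly how the split shifts for a larger team.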
Share monthly cost analysis:
#!/bin/bash
# monthly-ai-report.sh
echo "=== Monthly AI Cost Report ==="
echo "Period: $(date +'%B %Y')"
echo ""
# Copilot usage
total_sessions=$(grep "mode=copilot" ~/.claude/claude-switch.log | wc -l)
echo "Copilot Sessions: ${total_sessions}"
echo ' Cost: $10.00 (flat rate)' # single quotes so the shell does not expand $1
echo ""
# Anthropic usage (estimate from logs)
anthropic_sessions=$(grep "mode=direct" ~/.claude/claude-switch.log | wc -l)
echo "Anthropic Sessions: ${anthropic_sessions}"
echo " Estimated cost: Check billing dashboard"
echo ""
# Model distribution
echo "Model usage breakdown:"
grep "mode=copilot:" ~/.claude/claude-switch.log | sed -n 's/.*mode=copilot:\([^ ]*\).*/\1/p' | sort | uniq -c | sort -rn
Team lead action: If Anthropic costs exceed $50/person/month, review escalation patterns.
Train team to recognize when escalation is worth the cost:
Escalate to Opus/Anthropic when:
- Security vulnerability analysis
- Production incident post-mortems
- Architectural design decisions
- Compliance code review
Don't escalate for:
- Syntax questions
- Simple refactoring
- Test writing
- Documentation generation
- Code exploration
Example training exercise:
# Good escalation
ccc-sonnet
> Initial implementation of OAuth flow
ccd --model opus # Escalate for security
> Security audit of OAuth implementation
# Bad escalation (waste of money)
ccd --model opus # Unnecessary
> What does this forEach loop do?

Standardize MCP server configurations:
# Shared team repo: .claude-config/
├── mcp-profiles/
│ ├── excludes.yaml # Team-agreed exclusions
│ ├── generate.sh # Profile generator
│ └── prompts/
│ ├── gpt-4.1-team.txt # Team conventions for GPT
│ └── gemini-team.txt # Team conventions for Gemini
Setup script for new members:
#!/bin/bash
# setup-team-mcp.sh
# Copy team MCP config
cp -r .claude-config/mcp-profiles ~/.claude/
# Generate profiles
~/.claude/mcp-profiles/generate.sh
echo "Team MCP configuration installed"

Keep team aliases in sync with version control:
# In team repo: config/bash_aliases.team
# Team members source this in their ~/.zshrc
if [ -f ~/workspace/team-repo/config/bash_aliases.team ]; then
source ~/workspace/team-repo/config/bash_aliases.team
fi
Benefits:
- Consistent commands across team
- Easy updates via git pull
- Onboarding documentation matches reality
- Match tool to task: Don't use Opus for tasks Haiku can handle
- Minimize context switching: Stay in one session for related tasks
- Batch similar operations: Group file operations, code reviews, etc.
- Monitor response times: If consistently slow, investigate provider health
# Check API latency
time ccd --model haiku << EOF
> Calculate 1+1
> /exit
EOF
# Expected: 1-3 seconds for Haiku
If slow (>5 seconds):
- Check internet connection: `ping api.anthropic.com`
- Verify API key validity: `echo $ANTHROPIC_API_KEY`
- Review Anthropic status: https://status.anthropic.com
Anthropic charges per token (input + output):
Minimize input tokens:
# Bad: Verbose context
ccd
> I have this really long file with lots of code and I want you to analyze
every single line and tell me everything about it in great detail...
# Good: Concise requests
ccd
> Security review: auth.js lines 45-67
Minimize output tokens:
# Bad: Open-ended responses
ccd
> Explain everything about React hooks
# Good: Specific questions
ccd
> When should I use useEffect vs useLayoutEffect?
Monitor usage:
# Check Anthropic dashboard monthly
# Target: <$20/month for individual developer

Copilot performance depends on copilot-api being healthy:
# Check if running
nc -zv localhost 4141
# Check logs for errors
copilot-api logs
# Restart if unhealthy
copilot-api restart
Auto-restart on Mac (recommended):
# Create ~/Library/LaunchAgents/com.copilot-api.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.copilot-api</string>
<key>ProgramArguments</key>
<array>
<string>/usr/local/bin/copilot-api</string>
<string>start</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
</dict>
</plist>
Load:
launchctl load ~/Library/LaunchAgents/com.copilot-api.plist

Response time comparison (typical):
| Model | Latency | Use When |
|---|---|---|
| Haiku 4.5 | 0.5-1s | Need immediate feedback |
| Sonnet 4.5 | 1-2s | Balanced work |
| Opus 4.5 | 3-5s | Quality over speed |
| GPT-4.1 | 1-2s | Alternative perspective |
| GPT-5 | 2-4s | Advanced reasoning |
Speed optimization workflow:
# Quick iteration loop (Haiku)
ccc-haiku
> Try approach A
> Try approach B
> Try approach C
# Once approach settled, switch to Sonnet for quality
/exit
ccc-sonnet
> Implement chosen approach with production quality

MCP servers add latency to each request:
# Minimal MCP profile for speed
# ~/.claude/mcp-profiles/generated/minimal.json
{
"mcpServers": {
"bash": { ... }, # Essential
"filesystem": { ... } # Essential
# Disable: grepai, playwright, context7, etc.
}
}
# Use for speed-critical sessions
COPILOT_MODEL=claude-haiku-4.5 claude --mcp-config ~/.claude/mcp-profiles/generated/minimal.json
Latency per MCP server: ~50-200ms per server initialization.
Problem: Claude Code sends ~60K context, Ollama defaults to 8K → 87% truncation → constant reprocessing.
Solution: Increase context length:
# Set 32K context (recommended for Claude Code)
launchctl setenv OLLAMA_CONTEXT_LENGTH 32768
brew services restart ollama
# Verify
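# Variables set with launchctl setenv are not visible to the current shell; query launchd instead
launchctl getenv OLLAMA_CONTEXT_LENGTH
Trade-off: Higher context = slower inference but correct behavior.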
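The 87% truncation figure can be reproduced with a short calculation. This is a sketch under the numbers quoted in the problem statement (about 60K tokens sent, 8K window); `truncation_pct` is a hypothetical helper name:

```shell
# Share of a prompt that is dropped when it exceeds the context window.
# Arguments are token counts (any consistent unit); names are illustrative.
truncation_pct() {
  LC_ALL=C awk -v sent="$1" -v window="$2" 'BEGIN {
    if (sent <= window) { print 0; exit }
    printf "%.0f\n", (sent - window) * 100 / sent
  }'
}

truncation_pct 60 8   # prints 87
```

The same helper can be pointed at other window sizes to see how much of a prompt of a given size would still be cut.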
Full optimization guide: OPTIMISATION-M4-PRO.md
Quick wins:
# Enable flash attention (2-4x speed boost)
launchctl setenv OLLAMA_FLASH_ATTENTION 1
# Increase parallel processing
launchctl setenv OLLAMA_NUM_PARALLEL 4
# Restart Ollama
brew services restart ollama
Expected performance (M4 Pro 48GB, qwen2.5-coder:32b):
- Without optimization: 8-12 tok/s
- With optimization: 26-39 tok/s
- Improvement: 3-4x faster
Match model size to available RAM:
| Hardware | Recommended Model | RAM Usage | Performance |
|---|---|---|---|
| M1 16GB | qwen2.5-coder:7b | 6-8 GB | 15-25 tok/s |
| M2/M3 24GB | qwen2.5-coder:14b | 12-14 GB | 20-30 tok/s |
| M4 Pro 48GB | qwen2.5-coder:32b | 26 GB | 26-39 tok/s |
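The table can be read as a simple lookup from installed RAM to model size. The sketch below assumes the three rows are thresholds (24 GB and 48 GB as cut-offs), which is an interpretation of the table and not an Ollama rule; `recommend_model` is a hypothetical helper:

```shell
# Hypothetical helper: map installed RAM (GB) to the model tier in the table.
# Thresholds are assumptions read off the three rows above.
recommend_model() {
  ram_gb=$1
  if [ "$ram_gb" -ge 48 ]; then
    echo "qwen2.5-coder:32b"
  elif [ "$ram_gb" -ge 24 ]; then
    echo "qwen2.5-coder:14b"
  else
    echo "qwen2.5-coder:7b"
  fi
}

recommend_model 16   # prints qwen2.5-coder:7b
```

The result can be passed along in the same style used elsewhere in this guide, for example `OLLAMA_MODEL="$(recommend_model 16)" cco`.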
Don't:
# M1 16GB trying to run 32B model
OLLAMA_MODEL=qwen2.5-coder:32b cco # Will crash or swap heavily
Do:
# M1 16GB with appropriate model
OLLAMA_MODEL=qwen2.5-coder:7b cco # Smooth performance

Model quantization affects quality and speed:
| Quantization | Size | Quality | Speed | Use Case |
|---|---|---|---|---|
| q4_k_m | Smallest | ⭐⭐⭐ | Fastest | Simple refactoring |
| q5_k_m | Medium | ⭐⭐⭐⭐ | Balanced | Daily work (recommended) |
| q8_0 | Largest | ⭐⭐⭐⭐⭐ | Slowest | Critical code |
Example:
# Fast, good enough
ollama pull qwen2.5-coder:7b-instruct-q4_k_m
OLLAMA_MODEL=qwen2.5-coder:7b-instruct-q4_k_m cco
# Best quality
ollama pull qwen2.5-coder:7b-instruct-q8_0
OLLAMA_MODEL=qwen2.5-coder:7b-instruct-q8_0 cco

# This will be painfully slow (2-6 minute responses)
cco # Default 8K context with 60K Claude Code context
Fix:
# Configure context length first
launchctl setenv OLLAMA_CONTEXT_LENGTH 32768
brew services restart ollama
cco # Now usable

# Waste of time and money
ccd --model opus
> What does map() do in JavaScript?
Fix:
# Use fast free model
ccc-haiku
> What does map() do in JavaScript?

# copilot-api crashes silently, you get confusing errors
ccc
# ERROR: Connection refused
Fix:
# Check status before starting session
ccs # Shows copilot-api health
# Or add to shell prompt
PS1='[$(nc -z localhost 4141 && echo "✓" || echo "✗")] $ '

# Terminal 1
OLLAMA_MODEL=qwen2.5-coder:32b cco # Uses 26GB RAM
# Terminal 2
OLLAMA_MODEL=qwen2.5-coder:32b cco # Tries to use another 26GB
# System crashes or swaps heavily
Fix:
# Use cloud providers for parallel sessions
ccc-sonnet # Terminal 1
ccc-gpt # Terminal 2
# Ollama only in Terminal 3 if needed

Classify your code by sensitivity level:
Sensitivity: None
Allowed providers: All (ccc, ccd, cco)
Examples: Open-source contributions, learning projects, public tools
# Any provider is fine
ccc
> Refactor this open-source React component

Sensitivity: Medium
Allowed providers: Copilot, Anthropic (check company policy), Ollama
Examples: Internal tools, standard business features, non-confidential APIs
# Copilot generally acceptable (verify company policy)
ccc
> Implement user profile management feature

Sensitivity: High
Allowed providers: Ollama ONLY
Examples: Proprietary algorithms, trade secrets, client confidential code, NDA work
# Use local-only provider
cco
> Optimize proprietary recommendation algorithm

Sensitivity: Critical
Allowed providers: Ollama ONLY + manual audit
Examples: Healthcare (HIPAA), Finance (PCI-DSS), Government (FedRAMP)
# Local only + human review mandatory
cco
> Review payment card processing implementation
# Then: Manual security audit by qualified personnel

| Code Type | Use | Don't Use | Why |
|---|---|---|---|
| Open-source | ccc, ccd, cco | N/A | Public anyway |
| Standard features | ccc, ccd | N/A | Acceptable per most policies |
| Trade secrets | cco | ccc, ccd | Must stay local |
| Regulated data | cco + audit | ccc, ccd | Compliance requirement |
| Client NDA code | cco | ccc, ccd | Contractual obligation |
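The provider table can be encoded as a small guard function. This is a sketch only: the category keywords (`open-source`, `standard`, `trade-secret`, `regulated`, `nda`) and the helper names are hypothetical labels for the table rows, and unknown categories fall back to the most restrictive choice by assumption:

```shell
# Hypothetical lookup: providers permitted for each code type in the table.
allowed_providers() {
  case "$1" in
    open-source)                echo "ccc ccd cco" ;;
    standard)                   echo "ccc ccd" ;;
    trade-secret|regulated|nda) echo "cco" ;;
    *)                          echo "cco" ;;  # unknown: assume most restrictive
  esac
}

# Succeeds (exit 0) only when provider $2 is permitted for code type $1.
provider_allowed() {
  for p in $(allowed_providers "$1"); do
    [ "$p" = "$2" ] && return 0
  done
  return 1
}

provider_allowed nda ccc || echo "ccc is not permitted for NDA code"
```

A check like this could run at the top of a wrapper script before launching a session, while the manual audit required for regulated data remains a human step.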
Understanding where your code goes:
Your machine → HTTPS → api.anthropic.com (AWS US) → Anthropic servers
↓
Retained 30 days for abuse detection
Privacy considerations:
- Anthropic retains data for 30 days
- Used for abuse/safety monitoring
- Not used for model training (per Anthropic policy)
- Subject to Anthropic Terms of Service
Acceptable for: Standard business code, non-confidential work
Not acceptable for: Trade secrets, NDA code, regulated data
Your machine → HTTPS → copilot-api (localhost:4141) → GitHub Copilot API
↓
Retention per GitHub policy
Privacy considerations:
- Managed by GitHub/Microsoft
- Retention policy varies by subscription type
- May be used for service improvement (check current policy)
- Subject to GitHub Copilot Terms
Acceptable for: Standard business code (check company policy)
Not acceptable for: Highly confidential or regulated code
Your machine → localhost:11434 (Ollama) → STAYS LOCAL
↓
No external network calls
Privacy guarantees:
- 100% local processing
- No data leaves your machine
- No telemetry or cloud calls
- You control data retention
Acceptable for: Everything, including most sensitive code
Required for: Trade secrets, NDA work, regulated data
Confirm Ollama doesn't phone home:
# Monitor non-loopback traffic during Ollama use.
# tcpdump output carries no process names, so close other network apps first.
sudo tcpdump -i any -n 'not host localhost' &
TCPDUMP_PID=$!
# Use Ollama
cco
> Analyze this sensitive code
# Stop monitoring
sudo kill $TCPDUMP_PID
# Should see: No external connections
Alternative verification:
# Check Ollama process network connections
lsof -i -P | grep ollama
# Should only show: localhost:11434 (no external IPs)

Before adopting cc-copilot-bridge, verify:
- AI Tool Policy: Does your company allow AI coding assistants?
- Cloud Provider Restrictions: Are GitHub/Anthropic approved vendors?
- Data Residency: Any geographic restrictions on data processing?
- Audit Requirements: Need logging/traceability of AI usage?
- Code Review: Must AI-generated code be reviewed by humans?
Recommend this classification to your manager:
Company AI Coding Policy - cc-copilot-bridge
Allowed Use Cases:
- GitHub Copilot (ccc): Standard feature development, testing, documentation
- Anthropic Direct (ccd): Code review, architecture analysis, learning
- Ollama Local (cco): Sensitive/confidential code, regulated data
Prohibited:
- Sending customer data to AI models (any provider)
- Using cloud AI (ccc/ccd) for code under NDA
- Bypassing code review for AI-generated code
Required:
- Human review of all AI-generated code before commit
- Use of local Ollama (cco) for High and Critical sensitivity code
- Monthly audit of AI usage logs
All sessions are logged to ~/.claude/claude-switch.log:
[2026-01-22 09:42:33] [INFO] Provider: GitHub Copilot - Model: claude-sonnet-4-6
[2026-01-22 09:42:33] [INFO] Session started: mode=copilot:claude-sonnet-4-6 pid=12345
[2026-01-22 10:15:20] [INFO] Session ended: duration=32m47s exit=0
Audit queries:
# How many times did I use each provider this month?
grep "Session started" ~/.claude/claude-switch.log | \
grep "$(date +%Y-%m)" | \
sed -n 's/.*mode=\([a-z]*\).*/\1/p' | sort | uniq -c
# Did I use Copilot for sensitive project? (Audit violation check)
grep "mode=copilot" ~/.claude/claude-switch.log | grep -C 3 "$(date +%Y-%m-%d)"
# Cross-reference with git commits in sensitive repos
For compliance: Retain logs for duration required by policy (typically 1-3 years).
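For a reusable audit query, the provider count can be wrapped in a function that takes the log path and month as arguments. This sketch assumes the log format shown in the sample entries ("Session started: mode=<provider>[:<model>] pid=..."); `sessions_by_provider` is a hypothetical name:

```shell
# Count "Session started" entries per provider for one month (YYYY-MM).
# Assumes the log line format from the sample entries in this section.
sessions_by_provider() {
  log_file=$1
  month=$2
  grep "Session started" "$log_file" \
    | grep "^\[$month" \
    | sed -n 's/.*mode=\([a-z]*\).*/\1/p' \
    | sort | uniq -c | awk '{print $2 "=" $1}'
}

# Example call (path taken from this guide):
#   sessions_by_provider ~/.claude/claude-switch.log "$(date +%Y-%m)"
```

Each output line has the form `provider=count`, which is easy to paste into a monthly audit note.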
If you accidentally used wrong provider for sensitive code:
# Immediate action: Document incident
echo "[$(date)] INCIDENT: Used ccc for project X (should be cco)" >> ~/ai-audit.log
# Inform security team if required by policy
# Document: What code was exposed, when, to which provider
# Prevention: Add pre-commit hook
# .git/hooks/pre-commit
#!/bin/bash
if git diff --cached | grep -q "SENSITIVE"; then
echo "WARNING: Committing sensitive code. Verify you used Ollama (cco)."
echo "Last AI session: $(tail -1 ~/.claude/claude-switch.log)"
# Git hooks do not run with a terminal on stdin, so read the answer from /dev/tty
read -p "Continue? (y/n) " -n 1 -r < /dev/tty
echo
[[ ! $REPLY =~ ^[Yy]$ ]] && exit 1
fi

When working on client code under NDA:
Strict approach:
# ONLY use Ollama for any client work
alias cc-client='cco'
# Add to shell prompt to remind you
PS1='[CLIENT-NDA] $ '
# Disable other providers during client work
alias ccc='echo "ERROR: Use cc-client for NDA work" && false'
alias ccd='echo "ERROR: Use cc-client for NDA work" && false'
Moderate approach (if policy allows):
# Non-confidential scaffolding: Copilot OK
ccc
> Generate boilerplate Express.js server
# Client-specific logic: Ollama only
cco
> Implement client's proprietary pricing algorithm
Document your usage:
# Log which provider used for which parts
echo "[$(date)] Project ClientX - used ccc for boilerplate only" >> ~/client-ai-log.txt
echo "[$(date)] Project ClientX - used cco for pricing algorithm" >> ~/client-ai-log.txt

Recommended monthly budget allocation:
| Scenario | Copilot | Anthropic | Ollama | Total |
|---|---|---|---|---|
| Cost-conscious | $10 | $0-5 | $0 | $10-15 |
| Balanced | $10 | $10-20 | $0 | $20-30 |
| Quality-focused | $10 | $30-50 | $0 | $40-60 |
| Privacy-focused | $0 | $0 | $0 | $0* |
*Ollama hardware costs not included (one-time)
Target: Stay under $30/month for most developers.
Fixed cost: $10/month (GitHub Copilot Pro subscription)
Usage tracking:
# Count Copilot sessions this month
grep "mode=copilot" ~/.claude/claude-switch.log | \
grep "$(date +%Y-%m)" | wc -l
# Average session duration
grep "mode=copilot.*duration=" ~/.claude/claude-switch.log | \
grep "$(date +%Y-%m)" | \
sed 's/.*duration=\([0-9]*\)m.*/\1/' | \
awk '{sum+=$1; count++} END {if (count) print sum/count " minutes"}'
Value metrics:
- Sessions per day: Target 3-5 for daily usage
- Monthly quota: Pro = 300 requests, Pro+ = 1,500 requests
- Different models consume different quota (see multipliers)
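The average-duration query above keeps only the whole minutes of each session. If seconds matter, a duration token can be converted first. The sketch assumes durations look like `32m47s`, as in the sample log entry, with no hours component; `duration_to_seconds` is a hypothetical helper:

```shell
# Convert a duration token such as "32m47s" into seconds.
# Assumes an optional minutes part and an optional seconds part, no hours.
duration_to_seconds() {
  echo "$1" | LC_ALL=C awk '{
    m = 0; s = 0
    if (match($0, /[0-9]+m/)) m = substr($0, RSTART, RLENGTH - 1)
    if (match($0, /[0-9]+s/)) s = substr($0, RSTART, RLENGTH - 1)
    print m * 60 + s
  }'
}

duration_to_seconds 32m47s   # prints 1967
```

Summing these values and dividing by the session count gives an average that does not round every session down to the minute.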
Variable cost: Per-token usage
Check dashboard:
- Login: https://console.anthropic.com
- Navigate: Usage → Current month
- Monitor: Input tokens, output tokens, cost
Cost estimation from logs:
# Count Anthropic sessions
grep "mode=direct" ~/.claude/claude-switch.log | grep "$(date +%Y-%m)" | wc -l
# Rough estimate (actual cost varies by model and tokens)
# Haiku: ~$0.50/session
# Sonnet: ~$2/session
# Opus: ~$5/session
Budget alerts: Set up in Anthropic Console:
- Navigate: Settings → Billing → Usage alerts
- Set alert: $20, $40, $60 thresholds
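The rough per-session figures quoted above can be combined into a quick estimate to compare against the alert thresholds. This is an order-of-magnitude sketch only, since real cost depends on tokens; `estimate_cost` is a hypothetical helper and the rates are the approximations from this section:

```shell
# Rough Anthropic spend from session counts, using the approximate
# per-session figures above (Haiku ~$0.50, Sonnet ~$2, Opus ~$5).
estimate_cost() {
  LC_ALL=C awk -v h="$1" -v s="$2" -v o="$3" \
    'BEGIN { printf "%.2f\n", h * 0.50 + s * 2 + o * 5 }'
}

estimate_cost 4 6 2   # 4 Haiku, 6 Sonnet, 2 Opus sessions: prints 24.00
```

The billing dashboard remains the source of truth; the estimate is only useful for spotting when usage is drifting toward a threshold.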
Operational cost: $0 (electricity negligible)
Infrastructure cost:
- Apple Silicon Mac: One-time purchase ($1500-3500)
- Amortized over 3-4 years: ~$30-100/month
- Benefit: Usable for all work, not just AI
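The amortized figure is the purchase price divided by the number of months of use. A minimal sketch with the price range and lifetimes from the list above; `amortized_monthly` is a hypothetical helper:

```shell
# Monthly cost of a one-time hardware purchase spread over N years.
amortized_monthly() {
  LC_ALL=C awk -v price="$1" -v years="$2" \
    'BEGIN { printf "%.0f\n", price / (years * 12) }'
}

amortized_monthly 1500 4   # cheapest machine, longest lifetime: prints 31
amortized_monthly 3500 3   # priciest machine, shortest lifetime: prints 97
```

The two extremes come out at roughly $31 and $97 per month, in line with the approximate range given above.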
Usage tracking:
# Ollama sessions
grep "mode=ollama" ~/.claude/claude-switch.log | grep "$(date +%Y-%m)" | wc -l
# Disk usage (models)
du -sh ~/.ollama/models

Set Copilot as your default:
# In ~/.zshrc
alias cc='ccc-sonnet' # Default to free Copilot
# Only use paid explicitly
alias cc-paid='ccd'
Result: Requires conscious decision to spend money.
Build habits for right model selection:
# Cheap/free by default
ccc-haiku # Quick questions
ccc-sonnet # Implementation
# Expensive only when justified
cc-paid # Critical security review
Mental model: "Is this question worth $2-5?" If no, use Copilot.
Instead of multiple expensive reviews, batch them:
# Bad: Multiple Opus sessions
ccd --model opus
> Review function A
/exit
ccd --model opus
> Review function B
/exit
# Cost: 2 sessions × $5 = $10
# Good: Single batched session
ccd --model opus
> Review functions A, B, C, and D
/exit
# Cost: 1 session × $5 = $5
Savings: 50% on review costs.
Start vague, add detail only if needed:
# First try: Free Copilot with vague question
ccc-sonnet
> This authentication isn't working
# If response insufficient, add detail
> Here's the code: [paste code]
# Only if still stuck, escalate
/exit
ccd --model opus
> Deep debugging: [paste code + context]
Savings: Often Copilot solves it, avoiding Anthropic cost.
Use Ollama for rapid iteration (free), Anthropic for final review:
# Iterate with Ollama (free)
cco
> Try caching strategy A
> Try caching strategy B
> Try caching strategy C
# Once settled, validate with paid model
ccd --model opus
> Final review of caching strategy C
Cost strategy: Iterate with Ollama (free, local), finalize with Anthropic Direct (official API).
For teams using shared Anthropic account:
Track individual usage:
# Add to ~/.zshrc for each team member
alias ccd='echo "[$(date)] USER=$(whoami)" >> ~/.claude/claude-switch.log && claude-switch direct'
# Monthly team report
grep "USER=" ~/.claude/claude-switch.log | \
cut -d= -f2 | sort | uniq -c
Cost allocation:
Total monthly bill: $150
User A sessions: 60 (40%)
User B sessions: 45 (30%)
User C sessions: 45 (30%)
User A pays: $60
User B pays: $45
User C pays: $45
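The allocation above is a proportional split by session count. A minimal sketch, assuming session counts have already been collected per user; `allocate_bill` is a hypothetical helper and the user labels are placeholders:

```shell
# Split a shared bill in proportion to each user's session count.
# Input on stdin: one "<user> <sessions>" pair per line.
allocate_bill() {
  LC_ALL=C awk -v bill="$1" '
    { user[NR] = $1; count[NR] = $2; sum += $2 }
    END {
      if (sum == 0) exit 1   # nothing to allocate
      for (i = 1; i <= NR; i++)
        printf "%s %.2f\n", user[i], bill * count[i] / sum
    }'
}

printf 'A 60\nB 45\nC 45\n' | allocate_bill 150
# A 60.00
# B 45.00
# C 45.00
```

Session count is a coarse proxy for spend, since an Opus session costs more than a Haiku one; weighting by model would be a natural refinement.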
Alternative: Team pool with agreed limits:
Team: 5 developers
Anthropic budget: $200/month pooled
Limit per dev: $40/month
Weekly check-in:
- Review: Who used what models
- Flag: Anyone approaching $40 limit
- Adjust: Encourage Copilot for that developer
Symptom: High Anthropic bills for routine work.
Example:
# Every day, for everything
ccd --model opus
> Write unit test
> Refactor variable names
> Explain this function
> Fix typo in comment
# Monthly cost: $200+
Fix:
# Use free Copilot for routine work
ccc-haiku
> Write unit test
> Refactor variable names
# Reserve Opus for critical decisions
ccd --model opus
> Architecture review for new microservice
# Monthly cost: $20
Rule: If Haiku could do it, Opus is a waste.
Symptom: Intermittent failures, confusion about what's broken.
Example:
# Just try commands until something works
ccc # ERROR: Connection refused
ccd # ERROR: API key invalid
cco # ERROR: Model not found
# Frustration: Nothing works!
Fix:
# Check status FIRST
ccs
# Output shows exactly what's broken
# Anthropic API: ✓ Reachable
# copilot-api: ✗ Not running ← Fix this
# Ollama: ✗ Not running ← And this
# Fix identified issues
copilot-api start
ollama serve &
# Now try again
ccc # Works!
Rule: Run ccs before debugging.
Symptom: Compliance violations, NDA breaches.
Example:
# Using cloud AI for trade secret
ccc
> Optimize our proprietary recommendation algorithm
# PROBLEM: Sent to GitHub Copilot servers
Fix:
# Use local-only provider for sensitive code
cco
> Optimize our proprietary recommendation algorithm
# Safe: Stays on your machine
Rule: If the code is under NDA or a trade secret, use cco only.
Symptom: Ollama unbearably slow (2-6 minute responses).
Example:
# Using Ollama with default 8K context
cco # Takes forever, responses incoherent
Fix:
# Configure 32K context ONCE
launchctl setenv OLLAMA_CONTEXT_LENGTH 32768
brew services restart ollama
# Now Ollama works properly
cco # Reasonable speed, coherent responses
Rule: Configure Ollama before complaining it's slow.
Symptom: Random model choice, inconsistent results, waste.
Example:
# No strategy, just random
ccc-opus # Overkill for simple question
ccc-haiku # Insufficient for security review
ccc-gpt # When Claude would work fine
Fix:
# Clear decision tree
# Quick question? → Haiku
ccc-haiku
> What's the syntax for async/await?
# Implementation? → Sonnet
ccc-sonnet
> Implement user authentication
# Critical review? → Opus or Anthropic
ccc-opus
> Security review before production
Rule: Match model to task complexity.
Symptom: System crashes, extreme swap usage.
Example:
# Terminal 1
OLLAMA_MODEL=qwen2.5-coder:32b cco # 26GB RAM
# Terminal 2
OLLAMA_MODEL=qwen2.5-coder:32b cco # Another 26GB
# Total: 52GB on 48GB machine → crash
Fix:
# Use cloud for parallel, Ollama for one
# Terminal 1
ccc-sonnet # Cloud-based, no RAM issue
# Terminal 2
ccc-gpt # Cloud-based, no RAM issue
# Terminal 3 (only if needed)
cco # Local, uses RAM
Rule: Only one Ollama session at a time.
Symptom: ccc fails randomly after reboots.
Example:
# After reboot
ccc
# ERROR: copilot-api not running on :4141
# Have to remember to start manually every time
copilot-api start
Fix:
# Set up auto-start ONCE (macOS)
# Create ~/Library/LaunchAgents/com.copilot-api.plist
# (see Performance Optimization section for full plist)
launchctl load ~/Library/LaunchAgents/com.copilot-api.plist
# Now copilot-api starts automatically
Rule: Automate what you use daily.
Symptom: Surprise bills, no visibility into spending.
Example:
# Just use whatever, whenever
ccd --model opus # Multiple times daily
# End of month: $300 bill, surprise!
Fix:
# Weekly cost check
cc-cost-check() {
echo "This month so far:"
echo " Copilot: $(grep mode=copilot ~/.claude/claude-switch.log | grep "$(date +%Y-%m)" | wc -l) sessions"
echo " Anthropic: $(grep mode=direct ~/.claude/claude-switch.log | grep "$(date +%Y-%m)" | wc -l) sessions"
echo "Check the Anthropic dashboard for the dollar amount"
}
# Run every Monday
cc-cost-check
Rule: Track spending, don't discover it.
ccs # Check provider status (start here)
ccc # Daily work (Copilot Sonnet, free)
ccc-haiku # Quick questions (fastest, free)
ccc-opus # Code reviews (best quality, free)
ccd # Critical analysis (Anthropic, paid)
cco # Sensitive code (local, free)

Task type?
├─ Quick question → ccc-haiku
├─ Feature implementation → ccc-sonnet (or just ccc)
├─ Code review → ccc-opus
├─ Security-critical → ccd --model opus
└─ Trade secret → cco
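The decision tree translates directly into a lookup function. This is a sketch: the task keywords (`question`, `implementation`, `review`, `security`, `trade-secret`) and the name `pick_command` are hypothetical, while the commands are the ones listed in the tree:

```shell
# Hypothetical lookup: map a task type to the command from the tree above.
pick_command() {
  case "$1" in
    question)       echo "ccc-haiku" ;;
    implementation) echo "ccc-sonnet" ;;
    review)         echo "ccc-opus" ;;
    security)       echo "ccd --model opus" ;;
    trade-secret)   echo "cco" ;;
    *)              echo "unknown task type: $1" >&2; return 1 ;;
  esac
}

pick_command review   # prints ccc-opus
```

Unknown task types return a non-zero status, so a wrapper script can stop instead of silently picking a provider.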
- Default to Copilot (free)
- Use Haiku for speed, Sonnet for balance
- Reserve Opus/Anthropic for critical work only
- Batch expensive reviews
- Check `ccs` before debugging
- Track costs weekly
Code sensitivity?
├─ Public/open-source → ccc, ccd, cco (any)
├─ Internal business logic → ccc, ccd (check policy)
├─ Trade secret/NDA → cco (local only)
└─ Regulated (HIPAA, PCI) → cco + manual audit
- copilot-api running (for `ccc`): `nc -zv localhost 4141`
- Ollama context configured (for `cco`): `launchctl getenv OLLAMA_CONTEXT_LENGTH` should show `32768`
- Using appropriate model size for hardware
- Not running multiple Ollama sessions
- MCP profiles generated for GPT models
Effective use of cc-copilot-bridge requires:
- Strategic thinking: Match provider and model to task requirements
- Cost discipline: Default to free, escalate consciously
- Security awareness: Classify code sensitivity, choose provider accordingly
- Performance optimization: Configure tools properly, monitor health
- Team coordination: Standardize practices, share configurations
Remember: The most expensive model isn't always the best choice. Use the right tool for each task, and you'll maximize value while minimizing costs.
- Quick Start Guide - Get started in 2 minutes
- Model Switching Guide - Dynamic model selection
- Decision Trees - Task-to-command mapping
- FAQ - Common questions answered
- Troubleshooting - Problem resolution
- MCP Profiles - Advanced MCP configuration
- Ollama Optimization - Apple Silicon tuning