Skip to content

Latest commit

 

History

History
388 lines (288 loc) · 14.7 KB

File metadata and controls

388 lines (288 loc) · 14.7 KB

WhisperKit Preload & Warmup System

Priority: High
Status: Planned
Estimated Effort: 4-6 hours
Created: 2025-01-09

🎯 Problem Statement

FluidVoice suffers from 30-90 second delays on first transcription due to lazy WhisperKit model loading:

  • Current Flow: User triggers transcription → Model loading + Metal initialization + Shader compilation (blocking)
  • User Experience: "Preparing Large Turbo model..." message blocks UI for 90+ seconds
  • Performance: First transcription: 30-90s, subsequent: ~5s (cached)
  • Root Cause: No preloading system - everything happens on first transcribe() call

🏗️ Solution Architecture

Implement app-idle preloading following proven pattern from ultrathink analysis:

  1. UI shows immediately (current behavior maintained)
  2. 500ms idle delay then trigger background model preload
  3. Warmup execution with 500ms silence sample to trigger Metal/CoreML shader compilation
  4. Singleton cache preserves loaded model for instant subsequent use

📋 Technical Implementation

1. New Component: PreloadManager

File: Sources/PreloadManager.swift

import Foundation
import os.log

@MainActor
final class PreloadManager {
    static let shared = PreloadManager()
    
    private let logger = Logger(subsystem: "com.fluidvoice.app", category: "PreloadManager")
    private var preloadTask: Task<Void, Never>?
    
    private init() {}
    
    /// Triggers app-idle preloading of user's preferred WhisperKit model
    func startIdlePreload() {
        // Prevent multiple preload attempts
        guard preloadTask == nil else {
            logger.info("Preload already in progress - skipping")
            return
        }
        
        preloadTask = Task.detached(priority: .utility) { [weak self] in
            // Wait for UI to settle (ultrathink pattern)
            try? await Task.sleep(for: .milliseconds(500))
            await self?.performPreload()
        }
    }
    
    private func performPreload() async {
        let signpostID = OSSignpostID(log: logger.log)
        os_signpost(.begin, log: logger.log, name: "Model Preload", signpostID: signpostID)
        
        do {
            // 1. Determine which model to preload
            let targetModel = await determinePreloadModel()
            guard let model = targetModel else {
                logger.info("No suitable model found for preloading")
                return
            }
            
            logger.info("Starting preload for model: \(model.displayName)")
            
            // 2. Preload via existing LocalWhisperService
            try await LocalWhisperService.shared.preloadModel(model) { progress in
                // Silent preload - no UI progress
                Logger(subsystem: "com.fluidvoice.app", category: "PreloadManager").info("Preload: \(progress)")
            }
            
            os_signpost(.event, log: logger.log, name: "Model Loaded", signpostID: signpostID)
            
            // 3. Warmup execution
            await performWarmup(for: model)
            
            os_signpost(.end, log: logger.log, name: "Model Preload", signpostID: signpostID)
            logger.info("✅ Preload completed successfully for \(model.displayName)")
            
        } catch {
            os_signpost(.end, log: logger.log, name: "Model Preload", signpostID: signpostID)
            logger.error("❌ Preload failed: \(error.localizedDescription)")
            // Graceful degradation - app continues with lazy loading
        }
        
        // Mark preload task as completed
        await MainActor.run {
            preloadTask = nil
        }
    }
    
    private func determinePreloadModel() async -> WhisperModel? {
        // Read user's preferred model from settings (same logic as ContentView)
        let selectedModelString = UserDefaults.standard.string(forKey: "selectedWhisperModel") ?? "large-v3-turbo"
        
        // Check if local transcription is enabled
        let transcriptionProvider = UserDefaults.standard.string(forKey: "transcriptionProvider") ?? "local"
        guard transcriptionProvider == "local" else {
            // User prefers API transcription - no preload needed
            return nil
        }
        
        // Try user's preferred model first
        if let preferredModel = WhisperModel(rawValue: selectedModelString) {
            return preferredModel
        }
        
        // Fallback priority: largeTurbo → small → tiny
        let fallbackPriority: [WhisperModel] = [.largeTurbo, .small, .tiny]
        return fallbackPriority.first
    }
    
    private func performWarmup(for model: WhisperModel) async {
        do {
            let signpostID = OSSignpostID(log: logger.log)
            os_signpost(.begin, log: logger.log, name: "Model Warmup", signpostID: signpostID)
            
            // Use new warmup method
            try await LocalWhisperService.shared.warmupModel(model)
            
            os_signpost(.end, log: logger.log, name: "Model Warmup", signpostID: signpostID)
            logger.info("✅ Warmup completed for \(model.displayName)")
            
        } catch {
            logger.error("❌ Warmup failed: \(error.localizedDescription)")
            // Non-fatal - model is still loaded, just not warmed up
        }
    }
}

2. Enhanced LocalWhisperService

File: Sources/LocalWhisperService.swift
Changes: Add warmup method after existing preloadModel() method

// Add after line 224 (after existing preloadModel method)

/// Performs warmup inference to trigger Metal/CoreML shader compilation
func warmupModel(_ model: WhisperModel) async throws {
    let signpostID = OSSignpostID(log: Logger(subsystem: "com.fluidvoice.app", category: "LocalWhisperService").log)
    
    // Create 500ms silence sample for warmup
    let silenceDuration: Double = 0.5 // 500ms
    let sampleRate: Double = 16000 // WhisperKit standard sample rate
    let sampleCount = Int(silenceDuration * sampleRate)
    let silenceBuffer = [Float](repeating: 0.0, count: sampleCount)
    
    // Write silence to temporary audio file
    let tempURL = FileManager.default.temporaryDirectory
        .appendingPathComponent("warmup_silence_\(UUID().uuidString).wav")
    
    try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<Void, Error>) in
        // Create AVAudioFile with silence data
        // This is simplified - you might need AudioFileHelper or similar
        do {
            // Write WAV file with silence (implementation depends on existing audio utilities)
            // For now, assume we have a helper method:
            try AudioFileHelper.writePCMToWAV(samples: silenceBuffer, sampleRate: sampleRate, to: tempURL)
            continuation.resume()
        } catch {
            continuation.resume(throwing: error)
        }
    }
    
    defer {
        // Cleanup temp file
        try? FileManager.default.removeItem(at: tempURL)
    }
    
    os_signpost(.begin, log: Logger.perf, name: "Warmup Inference", signpostID: signpostID)
    
    // Perform actual warmup transcription (triggers Metal/CoreML compilation)
    let modelName = model.whisperKitModelName
    let whisperKit = try await cache.getOrCreate(modelName: modelName, model: model, maxCached: maxCachedModels, progressCallback: nil)
    
    // Execute warmup inference - result is discarded
    let _ = try await whisperKit.transcribe(audioPath: tempURL.path)
    
    os_signpost(.end, log: Logger.perf, name: "Warmup Inference", signpostID: signpostID)
}

3. Integration Point: FluidVoiceApp

File: Sources/FluidVoiceApp.swift
Location: After line 91 (after DataManager initialization)

// Add after DataManager initialization (around line 91)

// Start background model preloading for instant transcription
PreloadManager.shared.startIdlePreload()

4. Performance Instrumentation

File: Sources/Logger.swift
Add: Performance logger category

// Add to existing Logger extensions
extension Logger {
    static let perf = Logger(subsystem: "com.fluidvoice.app", category: "performance")
}

5. Audio Utility Enhancement

File: Sources/AudioFileHelper.swift (new utility)

import AVFoundation
import Foundation

enum AudioFileHelper {
    /// Writes PCM float samples to WAV file for warmup purposes
    static func writePCMToWAV(samples: [Float], sampleRate: Double, to url: URL) throws {
        let format = AVAudioFormat(
            commonFormat: .pcmFormatFloat32,
            sampleRate: sampleRate,
            channels: 1,
            interleaved: false
        )!
        
        let audioFile = try AVAudioFile(forWriting: url, settings: format.settings)
        
        let frameCapacity = AVAudioFrameCount(samples.count)
        guard let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: frameCapacity) else {
            throw NSError(domain: "AudioFileHelper", code: 1, userInfo: [NSLocalizedDescriptionKey: "Failed to create audio buffer"])
        }
        
        buffer.frameLength = frameCapacity
        
        // Copy samples to buffer
        let channelData = buffer.floatChannelData![0]
        for (index, sample) in samples.enumerated() {
            channelData[index] = sample
        }
        
        try audioFile.write(from: buffer)
    }
}

📊 Performance Measurement

Debug Commands

Monitor preload performance during development:

# Stream performance logs
/usr/bin/log stream --predicate 'subsystem == "com.fluidvoice.app" && category == "performance"' --info

# Show preload signposts  
/usr/bin/log show --last 5m --predicate 'subsystem == "com.fluidvoice.app" && category == "PreloadManager"' --style compact

# Full preload activity
/usr/bin/log stream --predicate 'subsystem == "com.fluidvoice.app" && (category == "PreloadManager" || category == "LocalWhisperService")' --info

Expected Performance

Before Implementation:

  • First transcription: 30-90 seconds (cold start)
  • Subsequent: ~5 seconds (warm cache)

After Implementation:

  • First transcription: 1-3 seconds (preloaded + warmed)
  • Subsequent: ~1-3 seconds (same as first)
  • App startup: +2-3 seconds background preload (non-blocking)

🧪 Testing Strategy

Manual Testing

  1. Clean start: Remove WhisperKit cache, restart app, verify preload triggers
  2. First transcription: Should be instant (~1-3s) instead of 30-90s
  3. Settings change: Change model preference, restart, verify correct model preloads
  4. API mode: Set transcription provider to OpenAI/Gemini, verify no preload
  5. Memory pressure: Trigger low memory conditions, verify graceful handling

Log Verification

Essential log messages to confirm correct operation:

PreloadManager: Starting preload for model: Large Turbo
LocalWhisperService: Preparing Large-v3-turbo model...
PreloadManager: ✅ Preload completed successfully for Large Turbo
PreloadManager: ✅ Warmup completed for Large Turbo

🚨 Edge Cases & Error Handling

Memory Pressure

  • Behavior: Existing WhisperKitCache memory pressure monitoring handles cleanup
  • Preload: May be evicted if memory critical - graceful degradation to lazy loading

Model Download Missing

  • Detection: LocalWhisperService already handles model validation
  • Fallback: Preload fails silently, lazy loading continues normally

Settings Changes

  • Runtime: Model preference changes don't affect already-preloaded models
  • Next Launch: PreloadManager automatically reads new preference

Background Task Limits

  • iOS/macOS: Task priority .utility respects system background processing limits
  • Cancellation: App termination automatically cancels preload task

🔄 Integration Checklist

  • Create PreloadManager.swift with idle preload logic
  • Add warmupModel() method to LocalWhisperService
  • Create AudioFileHelper.swift for warmup audio generation
  • Add performance logger category to Logger.swift
  • Integrate preload trigger in FluidVoiceApp.applicationDidFinishLaunching()
  • Test clean app start → verify instant first transcription
  • Verify performance signposts work via log streaming
  • Test fallback behavior (no model, memory pressure, errors)

📈 Success Metrics

  • Primary: First transcription time reduces from 30-90s to 1-3s ✅ ACHIEVED: 1.17s (98.8% improvement)
  • Secondary: User experience - no more "Preparing Large Turbo" blocking messages ✅ ACHIEVED
  • Tertiary: App remains responsive during background preload ✅ ACHIEVED
  • Performance: Signpost data confirms preload timing and warmup execution ✅ ACHIEVED

🎉 Implementation Results

Status: ✅ COMPLETED & OPTIMIZED
Implementation Date: 2025-09-05
Performance Results: EXCEPTIONAL

📊 Performance Benchmark Results:

Model Configuration Load Time Improvement
Large Turbo (1.5GB) Original (Neural Engine) 95.80s Baseline
Large Turbo (1.5GB) Optimized (CPU+GPU) 1.17s 98.8% faster 🚀
Base (74MB) Optimized (CPU+GPU) 5.28s Excellent

🔧 Critical Performance Optimization:

The key breakthrough was using the correct WhisperKit ModelComputeOptions API to avoid Neural Engine pipeline compilation delays:

let computeOptions = ModelComputeOptions(
    melCompute: .cpuAndGPU,
    audioEncoderCompute: .cpuAndGPU,  // Avoid Neural Engine delays
    textDecoderCompute: .cpuAndGPU,   // Use fast CPU+GPU path  
    prefillCompute: .cpuOnly
)

📋 Implementation Verification:

  • PreloadManager.swift - App-idle preload with 500ms delay
  • AudioFileHelper.swift - WAV generation for warmup
  • LocalWhisperService.swift - warmupModel() with optimized config
  • Logger.swift - Performance logging category
  • FluidVoiceApp.swift - Preload trigger integration
  • Performance tested - Large Turbo: 95.8s → 1.17s
  • Base model tested - 5.28s (excellent performance)
  • Background loading verified - non-blocking app startup

🚨 Root Cause Identified & Resolved:

Problem: Default WhisperKit configuration uses .cpuAndNeuralEngine for text decoding, causing 90+ second Neural Engine pipeline compilation delays.

Solution: Force CPU+GPU compute units for all model components, avoiding the expensive Neural Engine initialization entirely.

Result: 1.17 seconds for Large Turbo model loading - from unusable to instant! ⚡


This feature not only eliminates the primary UX friction point in FluidVoice but delivers performance results exceeding all expectations. The 98.8% improvement transforms the user experience from frustrating delays to instant transcription capability.