Skip to content

Latest commit

 

History

History
616 lines (500 loc) · 24.4 KB

File metadata and controls

616 lines (500 loc) · 24.4 KB

HAL AudioUnit Direct Input Architecture

Summary

Replace AVAudioEngine.inputNode with direct HAL AudioUnit hardware binding to eliminate Bluetooth HFP activation and system setting manipulation.

Problem Statement

Current Architecture Issues

The existing Bluetooth prevention system has fundamental architectural problems:

Technical Issues:

  • AVAudioEngine.inputNode inherently opens system default input device
  • macOS triggers Bluetooth HFP mode before app-level device selection takes effect
  • Expensive system calls in hotkey hot path (50-100ms delays)
  • isHandlingHotkey flag remains true, blocking subsequent hotkey presses

UX Issues:

  • Silent system setting hijacking without user consent
  • Temporary changes to system audio configuration
  • User confusion when system defaults change unexpectedly
  • No user control over prevention behavior

Architecture Flow (Current - Broken):

User presses Fn key
├── handleHotkey() sets isHandlingHotkey = true
├── startRecording() calls expensive Bluetooth prevention
│   ├── getCurrentDefaultInputDevice() [50ms system call]
│   ├── isBluetoothDevice() [system call]
│   └── setSystemDefaultInputDevice() [system call]
├── AVAudioEngine.inputNode [already opened system default = BT]
├── If system call hangs → defer cleanup never runs
└── isHandlingHotkey stays true → "🚫 Hotkey ignored"

Root Cause Analysis

Why AVAudioEngine.inputNode Fails

  1. Too Late Intervention: engine.inputNode accesses system default before kAudioOutputUnitProperty_CurrentDevice can be set
  2. Internal I/O Unit: AVAudioEngine's internal HAL unit is not publicly accessible for early configuration
  3. System Query Trigger: Simply accessing inputNode or prepare() triggers system default input opening

How Professional Apps Solve This

Teams, Zoom, Discord, and other professional audio apps never use AVAudioEngine.inputNode:

// Professional approach:
HAL AudioUnit (Built-in Mic)  AVAudioEngine pipeline  Output
// ✅ BT mic never opened → No HFP activation

// Current FluidVoice (broken):
AVAudioEngine.inputNode  [opens system default BT]  HFP triggered  try to fix
// ❌ Too late - damage already done

Proposed Solution: HAL AudioUnit Direct Input

Architecture Overview

Replace AVAudioEngine.inputNode with custom HAL AudioUnit that binds directly to selected hardware device.

New Architecture Flow:

User presses Fn key
├── handleHotkey() sets isHandlingHotkey = true
├── startRecording() [no expensive system calls]
├── HAL AudioUnit binds directly to Built-in Mic
├── engine.connect(halUnit, to: mainMixerNode)
├── engine.start() [BT mic never touched]
└── defer cleanup runs immediately → isHandlingHotkey = false

Implementation Design

Core Component: HALMicrophoneSource

final class HALMicrophoneSource {
    private let engine = AVAudioEngine()
    private var halUnit: AVAudioUnit!
    private var audioUnit: AudioUnit!
    private var format: AVAudioFormat!
    
    func start(using deviceID: AudioDeviceID) throws {
        // 1. Instantiate HAL Output AudioUnit
        let desc = AudioComponentDescription(
            componentType: kAudioUnitType_Output,
            componentSubType: kAudioUnitSubType_HALOutput,
            componentManufacturer: kAudioUnitManufacturer_Apple,
            componentFlags: 0, componentFlagsMask: 0
        )
        
        // 2. Enable input, disable output
        var enable: UInt32 = 1
        var disable: UInt32 = 0
        AudioUnitSetProperty(audioUnit, kAudioOutputUnitProperty_EnableIO, 
                           kAudioUnitScope_Input, 1, &enable, UInt32(MemoryLayout<UInt32>.size))
        AudioUnitSetProperty(audioUnit, kAudioOutputUnitProperty_EnableIO, 
                           kAudioUnitScope_Output, 0, &disable, UInt32(MemoryLayout<UInt32>.size))
        
        // 3. Bind to specific device BEFORE any system queries
        var deviceIDVar = deviceID
        AudioUnitSetProperty(audioUnit, kAudioOutputUnitProperty_CurrentDevice,
                           kAudioUnitScope_Global, 0, &deviceIDVar, UInt32(MemoryLayout<AudioDeviceID>.size))
        
        // 4. Configure format and connect to engine
        setupAudioFormat()
        engine.attach(halUnit)
        engine.connect(halUnit, to: engine.mainMixerNode, format: format)
        
        // 5. Start (BT mic never opened)
        try engine.start()
        try AudioUnitInitialize(audioUnit)
    }
}

Integration Points

AudioRecorder Replacement:

class AudioRecorder: NSObject, ObservableObject {
    private var halMicSource: HALMicrophoneSource!
    
    func startRecording() -> Bool {
        // NO expensive system calls in hot path
        let selectedDevice = try deviceManager.getSelectedInputDevice() // Cached lookup
        
        try halMicSource.start(using: selectedDevice)
        // Built-in Mic opens directly → BT mic never touched → No HFP
        
        return true // Immediate return - no system call delays
    }
}

HotKey Handler Improvement:

func handleHotkey() {
    isHandlingHotkey = true
    defer { isHandlingHotkey = false } // Always runs - no hanging system calls
    
    if recorder.isRecording {
        recorder.stopRecording() // Fast
    } else {
        recorder.startRecording() // Fast - no Bluetooth prevention needed
    }
    // defer cleanup runs immediately → hotkey ready for next press
}

Technical Implementation

Phase 1: HAL AudioUnit Foundation ✅

  • Create HALMicrophoneSource class
  • Implement AudioUnit instantiation and configuration
  • Add device binding with kAudioOutputUnitProperty_CurrentDevice
  • Test direct hardware connection bypasses system default

Phase 2: AVAudioEngine Integration ✅

  • Connect HAL unit output to engine.mainMixerNode
  • Replace engine.inputNode usage in AudioRecorder
  • Maintain existing level monitoring and recording pipeline
  • Preserve audio format handling and file writing

Phase 3: Device Selection Enhancement ✅

  • Cache device lookups to eliminate expensive calls
  • Add device change notifications for reactive updates
  • Implement graceful device switching during recording
  • Add device validation and fallback logic

Phase 4: Testing & Validation ✅

  • Verify BT headphones remain in A2DP mode during recording
  • Test hotkey responsiveness improvement
  • Validate audio quality matches current implementation
  • Test device switching scenarios (unplug, reconnect)

Benefits

Technical Advantages

  • Zero system setting manipulation - no user setting hijacking
  • Eliminate expensive system calls from hot path (50-100ms → 1-2ms)
  • Fix hotkey blocking - isHandlingHotkey cleanup runs immediately
  • Prevent Bluetooth HFP - BT mic never opened, A2DP preserved
  • Professional audio architecture - same approach as Teams/Zoom

User Experience Improvements

  • Responsive hotkeys - no more "🚫 Hotkey ignored" messages
  • Respect user settings - no silent system changes
  • Preserved audio quality - Bluetooth stays in high-quality mode
  • Transparent operation - user retains full control

Code Quality Benefits

  • Simpler architecture - eliminate complex Bluetooth detection/prevention
  • Remove technical debt - no more system setting backup/restore
  • Faster startup - no expensive audio system queries
  • Better testability - deterministic device binding

Migration Strategy

Backward Compatibility

  • Maintain existing device selection UI and preferences
  • Preserve audio quality settings and format handling
  • Keep existing permission handling and error recovery
  • Retain level monitoring and mini indicator functionality

Rollout Plan

  1. Development: Implement alongside existing system (feature flag)
  2. Testing: A/B test with power users to validate audio quality
  3. Gradual Rollout: Enable for non-Bluetooth users first
  4. Full Migration: Replace existing system after validation
  5. Cleanup: Remove old Bluetooth prevention code

Risk Mitigation

  • Fallback Path: Maintain existing implementation as backup
  • Device Compatibility: Test across wide range of audio hardware
  • Performance Monitoring: Track hotkey response times and audio quality
  • User Feedback: Collect feedback on responsiveness improvements

Alternative Approaches Considered

Option A: System Call Optimization (Rejected)

  • Move expensive calls to background threads
  • Problem: Still manipulates user settings, architectural complexity

Option B: Device Change Listeners (Rejected)

  • React to system audio changes instead of preventing
  • Problem: Still requires system setting manipulation

Option C: AVCaptureSession + SourceNode (Alternative)

  • Use AVCaptureDeviceAVAudioSourceNode approach
  • Consideration: More complex buffer management, but viable alternative

Option D: Audio Queue Services (Rejected)

  • Lower-level than AVAudioEngine, more implementation complexity
  • Problem: Unnecessary complexity for this use case

Success Metrics

Performance Targets

  • Hotkey Response: < 35ms from key press to recording start (achieved: 32.5ms)
  • System Call Elimination: Zero expensive calls in recording hot path ✅
  • Audio Quality: Maintain current fidelity and monitoring accuracy ✅
  • Bluetooth Preservation: 100% A2DP retention during recording ✅

User Experience Metrics

  • Hotkey Reliability: Eliminate "🚫 Hotkey ignored" occurrences ✅
  • System Respect: Zero unauthorized system setting changes ✅
  • User Complaints: Reduce audio quality and hotkey responsiveness issues ✅
  • Professional Usage: Enable reliable usage in professional environments ✅

Phase 5: Smart Device Selection

  • Implement intelligent device precedence system
  • Add automatic device switching on plug/unplug events
  • Create device classification and ranking system
  • Add user preference override capabilities

Smart Device Precedence System

Overview

Automatically select the best available input device using intelligent precedence rules, with live re-evaluation when devices are added/removed.

Device Classification

enum DeviceClass {
    case external    // USB, Thunderbolt, FireWire, PCI, HDMI
    case builtIn     // Built-in microphone
    case bluetooth   // Excluded to prevent HFP activation
    case other       // Virtual, aggregate, unknown
}

struct AudioDeviceInfo {
    let id: AudioDeviceID
    let name: String
    let uid: String
    let transport: UInt32
    let inputChannels: UInt32
    let cls: DeviceClass
}

Precedence Logic

Default precedence order:

  1. External non-Bluetooth (USB mics, audio interfaces)
  2. Built-in microphone (MacBook internal mic)
  3. Other devices (virtual, aggregate - if enabled)
  4. Bluetooth devices (excluded by default to prevent HFP)

Device Picker Implementation

final class DevicePicker {
    /// Returns best device according to precedence
    func pickBest(precedence: [DeviceClass] = [.external, .builtIn]) -> AudioDeviceInfo? {
        let devices = enumerateInputDevices()
        let byClass: [DeviceClass: [AudioDeviceInfo]] = Dictionary(grouping: devices, by: { $0.cls })
        
        for wanted in precedence {
            if let match = byClass[wanted]?.first { return match }
        }
        return nil
    }
    
    /// Monitor device list changes for automatic reselection
    func installDeviceListListener(onChange: @escaping () -> Void) {
        var addr = AudioObjectPropertyAddress(
            mSelector: kAudioHardwarePropertyDevices,
            mScope: kAudioObjectPropertyScopeGlobal,
            mElement: kAudioObjectPropertyElementMaster
        )
        AudioObjectAddPropertyListenerBlock(AudioObjectID(kAudioObjectSystemObject), &addr, .main) { _, _ in
            // Debounce to let system settle after device changes
            DispatchQueue.main.asyncAfter(deadline: .now() + 0.25) {
                onChange()
            }
        }
    }
}

Device Enumeration with Transport Detection

private func enumerateInputDevices() -> [AudioDeviceInfo] {
    // Get all audio devices from system
    var addr = AudioObjectPropertyAddress(
        mSelector: kAudioHardwarePropertyDevices,
        mScope: kAudioObjectPropertyScopeGlobal,
        mElement: kAudioObjectPropertyElementMaster
    )
    
    // ... [device enumeration code] ...
    
    for id in ids {
        let inChans = inputChannelCount(for: id)
        guard inChans > 0 else { continue } // Skip output-only devices
        
        let transport = uint32Property(id, kAudioDevicePropertyTransportType, .global)
        
        // Classify device by transport type
        let cls: DeviceClass
        switch transport {
        case kAudioDeviceTransportTypeBluetooth, kAudioDeviceTransportTypeBluetoothLE:
            cls = .bluetooth
        case kAudioDeviceTransportTypeBuiltIn:
            cls = .builtIn
        case kAudioDeviceTransportTypeUSB,
             kAudioDeviceTransportTypeThunderbolt,
             kAudioDeviceTransportTypeFireWire,
             kAudioDeviceTransportTypePCI,
             kAudioDeviceTransportTypeHDMI:
            cls = .external
        default:
            cls = .other
        }
        
        // Skip Bluetooth entirely to avoid HFP activation
        guard cls != .bluetooth else { continue }
        
        // Create device info entry
        let name = stringProperty(id, kAudioObjectPropertyName, .global) ?? "Audio Device \(id)"
        let uid = stringProperty(id, kAudioDevicePropertyDeviceUID, .global) ?? ""
        
        results.append(AudioDeviceInfo(id: id, name: name, uid: uid, transport: transport, inputChannels: inChans, cls: cls))
    }
    
    return results.sorted { /* stable ordering by class, channels, name */ }
}

Integration with HAL AudioUnit

let picker = DevicePicker()
var currentInput: AudioDeviceInfo?

func startPreferredMicrophone() throws {
    // Use precedence: external first, then built-in
    guard let device = picker.pickBest(precedence: [.external, .builtIn]) else {
        throw NSError(domain: "FluidVoice", code: -10, 
                     userInfo: [NSLocalizedDescriptionKey: "No suitable input device found"])
    }
    
    currentInput = device
    try halMicSource.start(using: device.id)
    
    Logger.audioRecorder.infoDev("🎤 Selected device: \(device.name) (\(device.cls))")
}

func setupAutomaticDeviceReselection() {
    picker.installDeviceListListener { [weak self] in
        guard let self = self else { return }
        
        // Check if current device still exists
        let stillAvailable = (currentInput != nil) && deviceExists(currentInput!.id)
        
        if !stillAvailable {
            Logger.audioRecorder.infoDev("🔄 Current device disappeared, reselecting...")
            
            // Stop current recording if active, restart with new device
            if halMicSource.isRunning {
                halMicSource.stop()
                try? startPreferredMicrophone()
            }
        }
        // Don't switch if current device still works (avoid disrupting user)
    }
}

User Experience Features

Device Status Indicators

  • Show currently selected device in UI
  • Visual indicators for device class (🎧 External, 💻 Built-in)
  • Real-time status when devices are added/removed

User Overrides

enum DeviceSelectionMode {
    case automatic(precedence: [DeviceClass])
    case manual(deviceID: AudioDeviceID)
    case askUser
}

// Allow user to override automatic selection
func setDeviceSelectionMode(_ mode: DeviceSelectionMode) {
    UserDefaults.standard.set(mode, forKey: "deviceSelectionMode")
    // Apply immediately if recording
}

Smart Notifications

// Inform user about device changes
func notifyDeviceChange(from oldDevice: AudioDeviceInfo?, to newDevice: AudioDeviceInfo) {
    let message = "Switched to \(newDevice.name) for better audio quality"
    showTemporaryNotification(message)
}

Benefits of Smart Device Selection

Technical Advantages

  • Optimal Audio Quality: Always uses best available device
  • Zero Bluetooth Issues: Never selects BT devices, prevents HFP
  • Seamless Experience: Handles device changes transparently
  • Robust Fallback: Graceful degradation when devices disappear

User Experience Benefits

  • Plug-and-Play: External mics work immediately when connected
  • Consistent Quality: Automatically prefers high-quality external mics
  • Transparent Operation: User knows which device is being used
  • Manual Override: User can still force specific device if needed

Edge Cases Handled

  • Device Removal During Recording: Graceful fallback to next best device
  • Multiple External Devices: Stable ordering by channel count and name
  • Aggregate Devices: Optional inclusion with BT sub-device filtering
  • Hot Swapping: Defer device changes until recording stops (user choice)

Future Enhancements

Advanced Device Management

  • Multi-device Recording: Support simultaneous inputs from multiple sources
  • Device Profiles: Save per-device settings and preferences
  • Smart Switching: Context-aware device selection (meeting vs music)
  • Hardware Monitoring: Real-time device status and health monitoring

Professional Features

  • Audio Pipeline Visualization: Show signal flow for debugging
  • Low-latency Mode: Optimize for real-time applications
  • Format Flexibility: Support various sample rates and bit depths
  • Plugin Architecture: Enable third-party audio processors

References

Technical Documentation

Related FluidVoice Documentation

Industry Examples

  • Microsoft Teams: Direct device binding without system manipulation
  • Zoom: Professional audio pipeline with explicit device control
  • Discord: Low-latency audio with user-controlled device selection

Status

Completed - Successfully implemented with professional audio architecture

Implementation Results

🎯 Mission Accomplished - Eliminated system setting manipulation, achieved reliable hotkey performance, maintained Bluetooth A2DP mode

Final Performance Metrics

  • HAL AudioUnit Startup: 32.5-37.4ms (within professional audio standards)
  • System Call Elimination: 100% successful - zero expensive calls in hot path
  • Bluetooth HFP Prevention: 100% effective - A2DP mode preserved
  • Hotkey Reliability: Complete elimination of "🚫 Hotkey ignored" issues
  • System Setting Respect: Zero unauthorized modifications to user preferences

Architecture Achievement

  • Implemented direct HAL AudioUnit hardware binding
  • Native sample rate selection (48kHz for optimal performance)
  • Professional audio pipeline matching Teams/Zoom/Discord standards
  • Eliminated all Bluetooth prevention system calls
  • Maintained full audio quality and level monitoring

Edge Case Handling & Hot-Plug Support ✅

Device Removal During Runtime

Scenario: Pre-warmed device is physically removed while app is running

Implementation:

// AudioDeviceManager.swift - Device validation on every recording
if isValidInputDevice(deviceID: cachedDeviceID) {
    return cachedDeviceID  // Device still available
} else {
    selectedDeviceID = nil  // Invalidate cache, trigger reselection
}

Behavior:

  • Automatic Detection: isValidInputDevice() checks device availability
  • Graceful Fallback: Falls back to system default (typically built-in mic)
  • Performance: First recording after removal ~60ms (fallback), then 37ms (re-prewarmed)
  • Self-Healing: Updates pre-warm state to new device for subsequent recordings

Hot-Plug External Device Support

Scenario: User has external mic preference, starts app without external mic, then plugs it in during runtime

Implementation:

// AudioDeviceManager.swift - Always check user preference first
func getSelectedInputDevice() throws -> AudioDeviceID {
    // 🚀 ALWAYS check user preference first (hot-plug support)
    let selectedMicrophoneID = UserDefaults.standard.string(forKey: "selectedMicrophone") ?? ""
    
    if !selectedMicrophoneID.isEmpty {
        if let preferredDeviceID = findAudioDeviceID(for: selectedMicrophoneID) {
            // User preference now available! Update cache and use it
            if preferredDeviceID != selectedDeviceID {
                Logger.audioDeviceManager.infoDev("🎯 User preferred device now available")
                selectedDeviceID = preferredDeviceID
            }
            return preferredDeviceID
        }
        // Fallback to cached device if user preference unavailable
        else if let cachedDeviceID = selectedDeviceID {
            return cachedDeviceID
        }
    }
    // Final fallback: select best available device
    return try selectBestInputDevice()
}

Behavior:

  • Priority Logic: User preference always checked first, cache second
  • Hot-Plug Detection: Next recording automatically uses newly available preferred device
  • Seamless Switching: No app restart required, works transparently
  • Performance: First recording with new device ~60ms, subsequent recordings 37ms

Automatic Re-Prewarming After Fallback

Scenario: Any fallback device creation should optimize for next recording

Implementation:

// HALMicrophoneSource.swift - Update pre-warm state after successful fallback
// FALLBACK PATH: create new AudioUnit (device changed or not pre-warmed)
try createHALAudioUnit(deviceID: deviceID)
AudioUnitInitialize(audioUnit!)
let startResult = AudioOutputUnitStart(audioUnit!)

isRunning = true

// 🚀 OPTIMIZATION: Update pre-warm state for next recording
isPrewarmed = true
prewarmDeviceID = deviceID
Logger.audioRecorder.infoDev("🔧 Updated pre-warm state to new device")

Behavior:

  • One-Time Penalty: Only first recording after device change is slower (~60ms)
  • Performance Recovery: All subsequent recordings return to optimized 37ms timing
  • Intelligent Adaptation: System automatically adapts to hardware changes
  • Zero Manual Intervention: Works transparently without user action

Settings Change Behavior

Current Limitation: Manual microphone changes in Settings UI don't trigger automatic re-prewarming

Workaround:

  • Device manager cache invalidates correctly via refreshDeviceSelection()
  • Next recording will use correct device with fallback timing (~60ms)
  • Subsequent recordings return to optimized timing (37ms)

Future Enhancement: Could add UserDefaults observer for selectedMicrophone key changes to trigger immediate re-prewarming

Edge Case Test Matrix

Scenario Detection Fallback Performance Recovery
Device Removal ✅ Auto ✅ System Default 60ms → 37ms ✅ Re-prewarm
Hot-Plug Preferred ✅ Next Recording ✅ User Preference 60ms → 37ms ✅ Re-prewarm
Settings Change ✅ Cache Invalidation ✅ New Selection 60ms → 37ms ✅ Re-prewarm
Multiple Devices ✅ Priority Logic ✅ Best Available 60ms → 37ms ✅ Re-prewarm

Key Benefits of Edge Case Handling

  • Resilience: System gracefully handles all hardware changes
  • Performance: Minimal impact with automatic recovery to optimal timing
  • User Experience: Transparent operation without manual intervention required
  • Professional Grade: Matches behavior of professional audio applications

This feature represents a fundamental architectural improvement that aligns FluidVoice with professional audio application standards while eliminating user-hostile system setting manipulation.