🎤 Gemini Voice V2: Audio Architecture Evolution

This page tracks the audio setups attempted during V2 development so that failed configurations are not repeated.

1. The ALSA/PyAudio Attempt (V1 Style)

Setup: Used standard PyAudio streams with standard ALSA device names.
Result: FAILED.
Failure Mode: “Expression 'alsa_snd_pcm_mmap_begin' failed.”
Lessons: PyAudio struggled with ALSA memory mapping on Kubuntu 25.10 when multiple system apps held the audio device.

2. The Split-Stream sounddevice Attempt

Setup: Switched to `sounddevice` for robustness. Microphone and speaker ran on separate `InputStream` and `OutputStream` threads.
Result: UNSTABLE.
Failure Mode: “1011 keepalive ping timeout.”
Lessons: The separate hardware threads occasionally blocked the `asyncio` event loop, preventing the WebSocket from answering server pings.
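
The effect of a blocked event loop can be reproduced without any audio hardware. The sketch below is an illustration only: `heartbeat` stands in for the WebSocket answering pings, and `blocking_audio_call` is a hypothetical stand-in for whichever hardware call stalled (the notes above do not say which one it was). It compares running the blocking call directly on the loop against handing it to a worker thread with `asyncio.to_thread`.

```python
import asyncio
import time


async def heartbeat(gaps, interval=0.02):
    """Record the time between wake-ups; stands in for answering server pings."""
    last = time.monotonic()
    while True:
        await asyncio.sleep(interval)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


def blocking_audio_call(seconds=0.5):
    """Hypothetical stand-in for a hardware read/write that blocks."""
    time.sleep(seconds)


async def max_gap(offload):
    """Return the longest heartbeat gap seen while the blocking call runs."""
    gaps = []
    task = asyncio.create_task(heartbeat(gaps))
    await asyncio.sleep(0.1)  # let the heartbeat settle first
    if offload:
        # The blocking call runs in a worker thread; the loop stays responsive.
        await asyncio.to_thread(blocking_audio_call)
    else:
        # The blocking call runs on the event loop; nothing else can be scheduled.
        blocking_audio_call()
    await asyncio.sleep(0.1)  # give the heartbeat a chance to record the gap
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return max(gaps)


if __name__ == "__main__":
    print("max gap, call on the loop: ", asyncio.run(max_gap(offload=False)))
    print("max gap, call in a thread: ", asyncio.run(max_gap(offload=True)))
```

When the call runs on the loop, the heartbeat gap grows to roughly the length of the blocking call, which is the same mechanism that lets a keepalive ping go unanswered.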

3. The Mic-Muting Attempt

Setup: Muted the mic while the assistant was speaking to prevent echo.
Result: FAILED.
Failure Mode: Assistant would answer once, then lock up.
Lessons: The VAD (Voice Activity Detection) state was getting stuck, and the AI was confused by the hard cut in audio data.

4. The Duplex Stream Setup (Current)

Setup: Uses a single `sd.Stream` (duplex), which handles both mic and speaker in one hardware-managed thread.
Result: ACTIVE / TESTING.
Key Features:
- 16 kHz Unified Rate: Maximizes Bluetooth (q20i) and network stability.
- Thread-Safe Queues: Decouples hardware from the WebSocket loop completely.
- Ollama Integration: Gemini uses function calling to route system tasks to the local brain.
- Self-Healing Loop: Background `while` loop automatically restarts on 1011 errors.
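
A minimal sketch of how the queue decoupling and the self-healing loop could fit together, not the project's actual code. It uses `sd.RawStream`, the bytes-based sibling of the `sd.Stream` named above, so the callback needs no NumPy. The 20 ms block size, the queue sizes, the `session` coroutine, and the use of `ConnectionError` as the retry trigger are all assumptions made for illustration.

```python
import asyncio
import queue

SAMPLE_RATE = 16_000   # unified rate from the feature list
BLOCK_FRAMES = 320     # assumed 20 ms blocks at 16 kHz; not from the notes

# Thread-safe queues: the audio thread and the WebSocket loop only meet here.
mic_q = queue.Queue(maxsize=50)   # audio thread -> network side
spk_q = queue.Queue(maxsize=50)   # network side -> audio thread


def duplex_callback(indata, outdata, frames, time, status):
    """Runs on the audio thread, so it must never block."""
    try:
        mic_q.put_nowait(bytes(indata))
    except queue.Full:
        pass  # drop a block rather than stall the hardware thread
    try:
        chunk = spk_q.get_nowait()
    except queue.Empty:
        chunk = b""  # nothing to play: output silence
    size = len(outdata)
    # Trim or zero-pad so the chunk fills the output buffer exactly.
    outdata[:] = chunk[:size].ljust(size, b"\x00")


def open_duplex_stream():
    """Open one mono int16 duplex stream; needs sounddevice and PortAudio."""
    import sounddevice as sd  # imported lazily so the rest runs without hardware
    return sd.RawStream(samplerate=SAMPLE_RATE, blocksize=BLOCK_FRAMES,
                        channels=1, dtype="int16", callback=duplex_callback)


async def run_forever(session, retry_on=(ConnectionError,),
                      retry_delay=2.0, max_restarts=3):
    """Re-run `session` (a hypothetical coroutine function) when it fails.

    Returns the number of restarts needed before a clean exit and re-raises
    once `max_restarts` is exhausted.
    """
    restarts = 0
    while True:
        try:
            await session()
            return restarts
        except retry_on:
            if restarts >= max_restarts:
                raise
            restarts += 1
            await asyncio.sleep(retry_delay)
```

In practice `retry_on` would be set to whatever exception the WebSocket client raises on a 1011 close, and `session` would open the stream with `open_duplex_stream()` and shuttle data between the two queues and the socket.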