====== 🎤 Gemini Voice V2: Audio Architecture Evolution ======

This page tracks the audio setups attempted during V2 development, so that failed configurations are not repeated.

===== 1. The ALSA/PyAudio Attempt (V1 Style) =====

  * **Setup:** Used standard PyAudio streams with standard ALSA device names.
  * **Result:** **FAILED.**
  * **Failure Mode:** "Expression 'alsa_snd_pcm_mmap_begin' failed."
  * **Lessons:** PyAudio struggled with ALSA memory mapping on Kubuntu 25.10 when multiple system apps held the audio device.

===== 2. The Sounddevice Split-Thread Setup =====

  * **Setup:** Switched to `sounddevice` for robustness. Microphone and speaker ran on separate `InputStream` and `OutputStream` threads.
  * **Result:** **UNSTABLE.**
  * **Failure Mode:** "1011 keepalive ping timeout."
  * **Lessons:** The separate hardware threads occasionally blocked the `asyncio` event loop, preventing the WebSocket from answering server pings.

===== 3. The Half-Duplex (Mic Muting) Setup =====

  * **Setup:** The mic was muted while the assistant was speaking, to prevent echo.
  * **Result:** **FAILED.**
  * **Failure Mode:** The assistant would answer once, then lock up.
  * **Lessons:** The VAD (Voice Activity Detection) state got stuck, and the AI was confused by the hard cut in audio data.

===== 4. The "Tank" Duplex Engine (Final Production) =====

  * **Setup:** Uses a single `sd.Stream` (duplex), which handles both mic and speaker in one hardware-managed thread.
  * **Result:** **ACTIVE / TESTING.**
  * **Key Features:**
    * **16kHz Unified Rate:** Maximizes Bluetooth (q20i) and network stability.
    * **Thread-Safe Queues:** Decouples the hardware from the WebSocket loop completely.
    * **Ollama Integration:** Gemini uses function calling to route system tasks to the local brain.
    * **Self-Healing Loop:** A background `while` loop automatically restarts the session on 1011 errors.
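
The "Thread-Safe Queues" item above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `mic_q`, `speaker_q`, `duplex_callback`, `next_mic_block` and `open_stream`, the 20 ms block size, and the queue depth are all assumptions. To stay dependency-free it works on raw byte buffers (the form `sd.RawStream` hands to its callback) instead of the NumPy arrays that `sd.Stream` provides.

```python
import asyncio
import queue

SAMPLE_RATE = 16_000   # unified rate used by the duplex engine
BLOCK_FRAMES = 320     # assumption: 20 ms blocks (not specified on this page)
BYTES_PER_FRAME = 2    # mono, 16-bit PCM

# Hypothetical queue names; the depth of 50 blocks is an arbitrary choice.
mic_q = queue.Queue(maxsize=50)      # audio thread -> WebSocket loop
speaker_q = queue.Queue(maxsize=50)  # WebSocket loop -> audio thread


def duplex_callback(indata, outdata, frames, time, status):
    """Runs on the audio thread, so it must never block."""
    # Capture: if the network side is behind, drop the block instead of waiting.
    try:
        mic_q.put_nowait(bytes(indata))
    except queue.Full:
        pass
    # Playback: fall back to silence when nothing is queued.
    # Assumes queued chunks are at most one block long; any excess is discarded.
    try:
        chunk = speaker_q.get_nowait()
    except queue.Empty:
        chunk = b""
    n = min(len(chunk), len(outdata))
    outdata[:n] = chunk[:n]
    if n < len(outdata):
        outdata[n:] = bytes(len(outdata) - n)  # zero-fill the remainder


async def next_mic_block(timeout=1.0):
    """Wait for one captured block in a worker thread, keeping the event loop free."""
    return await asyncio.to_thread(mic_q.get, True, timeout)


def open_stream():
    """Open the duplex stream; needs the sounddevice package and audio hardware."""
    import sounddevice as sd
    return sd.RawStream(samplerate=SAMPLE_RATE, blocksize=BLOCK_FRAMES,
                        channels=1, dtype="int16", callback=duplex_callback)
```

Because the callback only uses `put_nowait` and `get_nowait`, a slow network cannot stall the audio thread, and because the async side waits in a worker thread, a quiet microphone cannot stop the event loop from answering WebSocket pings.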
===== 🛠️ Technical Baseline =====

  * **Library:** `sounddevice` + `numpy`
  * **Transport:** Multimodal Live API (v1alpha WebSockets)
  * **Encoding:** 16-bit little-endian PCM
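
To make the encoding line concrete, here is a small sketch of packing samples as 16-bit little-endian PCM at the 16 kHz rate used above. The helper names are hypothetical and the scaling convention (multiply by 32767 after clipping) is one common choice, not necessarily the one this project uses. It relies on the standard-library `struct` module so it runs without extra packages.

```python
import struct

SAMPLE_RATE = 16_000  # Hz, the unified rate of the duplex engine


def floats_to_pcm16le(samples):
    """Encode floats in [-1.0, 1.0] as 16-bit little-endian PCM bytes."""
    # Clip first so out-of-range input cannot overflow the int16 range.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm16le_to_floats(data):
    """Decode 16-bit little-endian PCM bytes back to floats."""
    count = len(data) // 2  # ignore a trailing odd byte, if any
    values = struct.unpack(f"<{count}h", data[:count * 2])
    return [v / 32768.0 for v in values]


def chunk_duration_ms(data, sample_rate=SAMPLE_RATE):
    """Playback length of a mono PCM16 chunk in milliseconds."""
    return 1000.0 * (len(data) // 2) / sample_rate
```

With NumPy, the same byte layout can be obtained from an integer array via `astype("<i2").tobytes()`. At 16 kHz mono, a 640-byte chunk holds 320 samples, which is 20 ms of audio.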