🎤 Gemini Voice V2: Audio Architecture Evolution

This page tracks the audio setups attempted during V2 development so that failed configurations are not repeated.

1. The ALSA/PyAudio Attempt (V1 Style)

  • Setup: Used standard PyAudio streams with standard ALSA device names.
  • Result: FAILED.
  • Failure Mode: “Expression 'alsa_snd_pcm_mmap_begin' failed.”
  • Lessons: PyAudio struggled with ALSA memory mapping on Kubuntu 25.10 when multiple system apps held the audio device.

2. The Sounddevice Split-Thread Setup

  • Setup: Switched to `sounddevice` for robustness. The microphone and speaker ran on separate `InputStream` and `OutputStream` threads.
  • Result: UNSTABLE.
  • Failure Mode: “1011 keepalive ping timeout.”
  • Lessons: The separate hardware threads occasionally blocked the `asyncio` event loop, preventing the WebSocket from answering server pings.
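
The blocking effect described above can be reproduced without any audio hardware. The sketch below is an illustration only, not the project's code: `blocking_audio_io` is a hypothetical stand-in for a device call that blocks, and `heartbeat` stands in for the WebSocket keepalive. It measures how late the heartbeat fires when the blocking call runs on the event loop versus in a worker thread via `asyncio.to_thread` (Python 3.9+).

```python
import asyncio
import time


def blocking_audio_io(seconds: float) -> str:
    """Hypothetical stand-in for a device read/write that blocks the caller."""
    time.sleep(seconds)
    return "done"


async def heartbeat(ticks: int, interval: float) -> float:
    """Stand-in for the keepalive task; returns the worst tick lateness in seconds."""
    loop = asyncio.get_running_loop()
    worst = 0.0
    for _ in range(ticks):
        start = loop.time()
        await asyncio.sleep(interval)
        worst = max(worst, loop.time() - start - interval)
    return worst


async def measure(offload: bool) -> float:
    """Run the heartbeat next to one blocking call and report its worst lateness."""
    hb = asyncio.create_task(heartbeat(ticks=4, interval=0.05))
    await asyncio.sleep(0)  # give the heartbeat a chance to start
    if offload:
        # The blocking call runs in a worker thread; the loop stays free.
        await asyncio.to_thread(blocking_audio_io, 0.3)
    else:
        # The blocking call runs on the loop; nothing else can be scheduled.
        blocking_audio_io(0.3)
    return await hb


if __name__ == "__main__":
    print("on the loop:", round(asyncio.run(measure(offload=False)), 3), "s late")
    print("offloaded:  ", round(asyncio.run(measure(offload=True)), 3), "s late")
```

If a real keepalive is delayed by more than the server's ping timeout, the connection is closed, which is consistent with the 1011 failure noted above.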

3. The Half-Duplex (Mic Muting) Setup

  • Setup: Mic was muted while the assistant was speaking to prevent echo.
  • Result: FAILED.
  • Failure Mode: Assistant would answer once, then lock up.
  • Lessons: The VAD (Voice Activity Detection) state was getting stuck, and the AI was confused by the hard cut in audio data.

4. The "Tank" Duplex Engine (Final Production)

  • Setup: Uses a single `sd.Stream` (Duplex) which handles both Mic and Speaker in one hardware-managed thread.
  • Result: ACTIVE / TESTING.
  • Key Features:
    • 16 kHz Unified Rate: Maximizes Bluetooth (q20i) and network stability.
    • Thread-Safe Queues: Decouples hardware from the WebSocket loop completely.
    • Ollama Integration: Gemini uses function calling to route system tasks to the local brain (Ollama).
    • Self-Healing Loop: Background `while` loop automatically restarts the session on 1011 errors.
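
A minimal sketch of how the duplex callback, the thread-safe queues, and the self-healing loop could fit together. It is not the project's actual code, and several things are assumptions: the block size and queue depth are guesses, `SessionDropped` and `supervise` are hypothetical names, and the sketch uses `sd.RawStream` (plain byte buffers) rather than `sd.Stream` so that the callback can be exercised without `numpy` or audio hardware.

```python
import asyncio
import queue

SAMPLE_RATE = 16_000   # unified 16 kHz rate from the list above
BLOCK_FRAMES = 320     # assumption: 20 ms blocks
QUEUE_BLOCKS = 50      # assumption: roughly 1 s of buffering each way

mic_q = queue.Queue(maxsize=QUEUE_BLOCKS)  # audio thread -> WebSocket sender
spk_q = queue.Queue(maxsize=QUEUE_BLOCKS)  # WebSocket receiver -> audio thread


def duplex_callback(indata, outdata, frames, time_info, status):
    """Runs on the audio thread for every block; it must never block."""
    try:
        mic_q.put_nowait(bytes(indata))
    except queue.Full:
        pass  # drop one block rather than stall the hardware thread
    try:
        chunk = spk_q.get_nowait()
    except queue.Empty:
        chunk = b""  # nothing queued: play silence
    n = len(outdata)
    # Assumes producers enqueue block-sized chunks; longer ones are truncated.
    outdata[:] = chunk[:n].ljust(n, b"\x00")


def open_duplex_stream():
    """Create the duplex stream (imported lazily so the rest runs without audio)."""
    import sounddevice as sd
    return sd.RawStream(samplerate=SAMPLE_RATE, blocksize=BLOCK_FRAMES,
                        channels=1, dtype="int16", callback=duplex_callback)


class SessionDropped(Exception):
    """Hypothetical error raised when the WebSocket closes with a status code."""

    def __init__(self, code):
        super().__init__(f"session closed with code {code}")
        self.code = code


async def supervise(run_session, retry_delay=1.0, sleep=asyncio.sleep):
    """Restart run_session after 1011 closes; return the number of restarts."""
    restarts = 0
    while True:
        try:
            await run_session()
            return restarts  # clean exit
        except SessionDropped as exc:
            if exc.code != 1011:
                raise  # other close codes are not retried in this sketch
            restarts += 1
            await sleep(retry_delay)
```

Because the callback only uses `put_nowait` and `get_nowait`, a slow or stalled WebSocket loop can cost dropped or silent blocks but cannot stall the audio thread, and the audio thread never touches the event loop directly.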

🛠️ Technical Baseline

  • Library: `sounddevice` + `numpy`
  • Transport: Multimodal Live API (v1alpha WebSockets)
  • Encoding: 16-bit Little Endian PCM
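
To make the encoding concrete, here is a small sketch of converting float samples to 16-bit little-endian PCM and back. It uses only the standard library's `struct` module so it runs anywhere; the function names and the scale factor of 32767 are illustrative choices, not taken from the project, which lists `numpy` for this kind of work.

```python
import struct

SAMPLE_RATE = 16_000    # Hz, matching the unified rate used by the engine
BYTES_PER_SAMPLE = 2    # 16-bit mono


def float_to_pcm16le(samples):
    """Clamp floats to [-1.0, 1.0] and pack them as little-endian int16."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm16le_to_float(data):
    """Unpack little-endian int16 bytes into floats in [-1.0, 1.0]."""
    count = len(data) // BYTES_PER_SAMPLE
    ints = struct.unpack(f"<{count}h", data[:count * BYTES_PER_SAMPLE])
    return [v / 32767 for v in ints]


def duration_ms(data):
    """Playback time of a mono PCM chunk at SAMPLE_RATE, in milliseconds."""
    return len(data) / BYTES_PER_SAMPLE / SAMPLE_RATE * 1000
```

At this rate and sample width, one second of mono audio is 32,000 bytes, so a 640-byte chunk holds 20 ms.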