🤖 Gemini Voice Assistant Project
Project Overview
A real-time, low-latency voice assistant integrated into the Kubuntu 25.10 desktop. It uses the Gemini 2.5 Flash Multimodal Live API for bidirectional audio interaction, `ydotool` for system-level automation, and a local Ollama model for offline reasoning.
⚙️ Technical Architecture
AI & Software Stack
| Component | Technology | Role |
|---|---|---|
| AI Model | `gemini-2.5-flash-native-audio-latest` | Multimodal Live API Core |
| Language | Python 3.11+ | Backend Implementation |
| Audio API | `PyAudio` (PortAudio) | Real-time PCM Streaming |
| Local LLM | Ollama (`gemma2:2b`) | Offline & Complex Reasoning |
| Automation | `ydotool` | System Input Injection |
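The hand-off to the local model can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes Ollama is running on its default local port and that `gemma2:2b` has already been pulled. The helper names `build_ollama_request` and `ask_local_llm` are hypothetical.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumed; adjust if the server is configured differently).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ollama_request(prompt: str, model: str = "gemma2:2b") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_llm(prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt to Ollama and return the generated text."""
    request = build_ollama_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # With "stream": False, Ollama returns one JSON object whose
        # "response" field holds the full completion.
        return json.loads(response.read())["response"]
```

Calling `ask_local_llm("Summarize the last command output")` needs the Ollama server to be reachable; building the request does not.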
Audio Routing (System Default Strategy)
The system is hardware-agnostic: it targets the PulseAudio/PipeWire system default streams rather than a specific device. The user can then route the assistant to any connected device (Bluetooth headsets, USB mics) using standard KDE tools.
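In PyAudio terms, targeting the system default means opening streams without a device index. The sketch below shows one way this could look; the contents of `live_voice.py` are not shown here, so the sample rates (16 kHz capture, 24 kHz playback), the chunk size, and the helper names are assumptions to verify against the current Live API documentation.

```python
SEND_RATE = 16_000     # assumed microphone sample rate in Hz
RECEIVE_RATE = 24_000  # assumed playback sample rate in Hz
CHUNK = 1024           # frames per buffer (hypothetical value)


def stream_kwargs(direction: str, sample_format: int) -> dict:
    """Return PyAudio.open() keyword arguments for the system default device."""
    if direction not in ("input", "output"):
        raise ValueError(f"direction must be 'input' or 'output', got {direction!r}")
    is_input = direction == "input"
    return {
        "format": sample_format,
        "channels": 1,
        "rate": SEND_RATE if is_input else RECEIVE_RATE,
        "input": is_input,
        "output": not is_input,
        "frames_per_buffer": CHUNK,
        # No input_device_index / output_device_index is set, so PortAudio
        # falls back to the default device chosen by PulseAudio/PipeWire.
    }


def open_default_streams():
    """Open one capture and one playback stream on the default devices."""
    import pyaudio  # imported lazily so stream_kwargs() works without PyAudio

    pa = pyaudio.PyAudio()
    mic = pa.open(**stream_kwargs("input", pyaudio.paInt16))
    speaker = pa.open(**stream_kwargs("output", pyaudio.paInt16))
    return pa, mic, speaker
```

Because no device is pinned in code, moving the stream in `pavucontrol` or KDE Sound Settings is enough to change hardware.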
🛠️ Configuration & Routing
Connecting a Device
1. Pair Hardware: Connect a Bluetooth headset or USB microphone.
2. Open Sound Settings: Use `pavucontrol` or KDE Sound Settings.
3. Identify Streams: Look for the "python3" or "ALSA" capture/playback streams.
4. Set Routes:
   - Playback: Route to your preferred speakers/headset.
   - Recording: Route to your preferred microphone.
Virtual Sinks (Advanced)
Optional routing can be established via `bash gemini-voice/setup_audio.sh` to create loopbacks or dedicated assistant sinks.
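The contents of `setup_audio.sh` are not reproduced here. As a rough illustration of what a dedicated assistant sink involves, the sketch below builds the `pactl` commands for a null sink and a loopback, written in Python to match the rest of the project. The sink name `gemini_assistant` and the helper names are hypothetical.

```python
import subprocess


def null_sink_command(sink_name: str) -> list[str]:
    """Command that creates a virtual (null) sink with the given name."""
    return [
        "pactl", "load-module", "module-null-sink",
        f"sink_name={sink_name}",
        f"sink_properties=device.description={sink_name}",
    ]


def loopback_command(source: str, sink: str) -> list[str]:
    """Command that forwards audio from a source to a sink."""
    return ["pactl", "load-module", "module-loopback", f"source={source}", f"sink={sink}"]


def create_assistant_sink(sink_name: str = "gemini_assistant", run=subprocess.run) -> None:
    """Create the virtual sink; `run` is injectable so the call can be dry-run."""
    run(null_sink_command(sink_name), check=True)
```

A null sink exposes a `<sink_name>.monitor` source, which `loopback_command` can forward to real speakers or to another application.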
Core Features
* Live Audio Streaming: Low-latency bidirectional PCM audio.
* CLI Approval Watcher: Verbally notifies the user when the Gemini CLI requires confirmation.
* Remote Approval Integration: Works alongside KDE Approval for multi-modal confirmation.
* Input Injection: Can trigger 'y' + 'Enter' via voice command using `ydotool`.
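The input-injection feature could be implemented along these lines. This is a sketch under assumptions: it uses the `key <code>:<state>` syntax of ydotool 1.x (older releases use a different syntax), it expects the `ydotoold` daemon to be running, and the function names are hypothetical.

```python
import subprocess

KEY_Y = 21      # Linux input event code for the 'y' key
KEY_ENTER = 28  # Linux input event code for Enter


def approval_command() -> list[str]:
    """Press and release 'y', then press and release Enter (1 = down, 0 = up)."""
    return [
        "ydotool", "key",
        f"{KEY_Y}:1", f"{KEY_Y}:0",
        f"{KEY_ENTER}:1", f"{KEY_ENTER}:0",
    ]


def send_approval(run=subprocess.run) -> None:
    """Inject the approval keystrokes; `run` is injectable for dry runs."""
    run(approval_command(), check=True)
```

The keystrokes go to whichever window has focus, so a real implementation would likely confirm that the Gemini CLI terminal is active before calling `send_approval()`.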
Project Structure
* `live_voice.py`: Main application loop and API handler.
* `setup_audio.sh`: Utility for creating virtual audio sinks.
* `check_audio.py`: Diagnostic tool to list available hardware devices.
* `memory.txt`: Persistent storage for assistant context.
- Audio Evolution Log: History of failed and successful setups.
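A diagnostic like `check_audio.py` can be approximated with PyAudio's device enumeration calls. The sketch below is an assumption about how such a tool might look, not the actual script; `describe_devices` and `main` are hypothetical names.

```python
def describe_devices(pa) -> list[str]:
    """Return one summary line per audio device known to PortAudio."""
    lines = []
    for index in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(index)
        lines.append(
            f"[{index}] {info['name']} "
            f"(in={info['maxInputChannels']}, out={info['maxOutputChannels']}, "
            f"rate={int(info['defaultSampleRate'])} Hz)"
        )
    return lines


def main() -> None:
    """Print all devices; requires PyAudio to be installed."""
    import pyaudio  # imported lazily so describe_devices() can be tested without it

    pa = pyaudio.PyAudio()
    try:
        print("\n".join(describe_devices(pa)))
    finally:
        pa.terminate()
```

Running `main()` on the target machine shows whether a newly paired headset or USB microphone is visible before any routing is attempted.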
