Gemini Voice
A real-time, low-latency voice assistant integrated into the Kubuntu 25.10 desktop. It uses the Gemini 2.5 Flash Multimodal Live API for bidirectional audio conversation, adds system-level automation, and can hand work to a local model for offline reasoning.
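Latency in a streaming voice loop depends heavily on how much audio is buffered per chunk. The minimal sketch below computes chunk sizes for 16-bit mono PCM. The 16 kHz input and 24 kHz output rates are assumptions based on the Live API's published audio formats, and the 20 ms chunk length is an illustrative choice rather than a value taken from `live_voice.py`.

```python
from dataclasses import dataclass

BYTES_PER_SAMPLE = 2  # 16-bit signed little-endian PCM


@dataclass(frozen=True)
class PcmFormat:
    rate_hz: int
    channels: int = 1

    def frames_per_chunk(self, chunk_ms: int) -> int:
        """Number of sample frames covering chunk_ms milliseconds."""
        return self.rate_hz * chunk_ms // 1000

    def bytes_per_chunk(self, chunk_ms: int) -> int:
        """Size in bytes of one chunk of interleaved 16-bit PCM."""
        return self.frames_per_chunk(chunk_ms) * self.channels * BYTES_PER_SAMPLE

    def duration_ms(self, num_bytes: int) -> float:
        """Playback time represented by num_bytes of PCM data."""
        frames = num_bytes / (self.channels * BYTES_PER_SAMPLE)
        return frames * 1000 / self.rate_hz


# Assumed formats: 16 kHz mono upstream (mic -> API), 24 kHz mono downstream.
MIC_FORMAT = PcmFormat(rate_hz=16_000)
SPEAKER_FORMAT = PcmFormat(rate_hz=24_000)

if __name__ == "__main__":
    chunk_ms = 20  # illustrative chunk length
    print("mic frames/chunk:", MIC_FORMAT.frames_per_chunk(chunk_ms))
    print("mic bytes/chunk:", MIC_FORMAT.bytes_per_chunk(chunk_ms))
    print("4800 output bytes =", SPEAKER_FORMAT.duration_ms(4800), "ms")
```

With PyAudio, the `frames_per_chunk` value is what would be passed as `frames_per_buffer` to `PyAudio.open()`. Smaller chunks reduce latency at the cost of more callbacks and more messages sent to the API.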
| Component | Technology | Role |
|---|---|---|
| AI Model | `gemini-2.5-flash-native-audio-latest` | Multimodal Live API Core |
| Language | Python 3.11+ | Backend Implementation |
| Audio API | `PyAudio` (PortAudio) | Real-time PCM Streaming |
| Local LLM | Ollama (`gemma2:2b`) | Offline & Complex Reasoning |
| Automation | `ydotool` | System Input Injection |
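Offloading a request to the local `gemma2:2b` model can be as simple as calling Ollama's REST endpoint. The sketch below uses only the standard library and assumes Ollama is listening on its default `localhost:11434` port. The function names are hypothetical, and how `live_voice.py` actually decides when to use the local model is not described here.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
LOCAL_MODEL = "gemma2:2b"


def build_ollama_request(prompt: str, model: str = LOCAL_MODEL,
                         url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming /api/generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt: str, timeout: float = 60.0) -> str:
    """Send a prompt to Ollama and return the generated text."""
    with urllib.request.urlopen(build_ollama_request(prompt), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False, Ollama returns a single JSON object whose
    # "response" field holds the full completion.
    return body["response"]
```

Calling `ask_local_model("...")` requires `ollama serve` to be running and the model to have been fetched with `ollama pull gemma2:2b`.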
The system is hardware-agnostic: instead of opening a specific device, it uses the system default capture and playback streams provided by PulseAudio/PipeWire. This lets you route the assistant to any connected device (Bluetooth headset, USB microphone) with standard KDE tools.
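To see which device PortAudio currently treats as the default, a diagnostic along the lines of `check_audio.py` can enumerate devices through PyAudio. The contents of `check_audio.py` aren't shown, so the function below is a hypothetical sketch. It accepts any object with PyAudio's device-query methods, which keeps it testable without audio hardware.

```python
from typing import Any, Callable, Dict, List, Optional


def list_audio_devices(pa: Any) -> List[Dict[str, Any]]:
    """Summarize every PortAudio device and flag the system defaults.

    `pa` is expected to be a pyaudio.PyAudio instance (or a stand-in
    exposing the same methods).
    """
    def default_index(getter: Callable[[], Dict[str, Any]]) -> Optional[int]:
        try:
            return getter()["index"]
        except OSError:  # PyAudio raises IOError (an OSError) if no default exists
            return None

    default_in = default_index(pa.get_default_input_device_info)
    default_out = default_index(pa.get_default_output_device_info)

    devices = []
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        devices.append({
            "index": i,
            "name": info["name"],
            "inputs": info["maxInputChannels"],
            "outputs": info["maxOutputChannels"],
            "default_input": i == default_in,
            "default_output": i == default_out,
        })
    return devices
```

In practice you would call it as `list_audio_devices(pyaudio.PyAudio())` and call `terminate()` on that instance afterwards. Opening a PyAudio stream without `input_device_index` or `output_device_index` uses these defaults, so the physical device is decided by whatever routing KDE or `pavucontrol` has set.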
1. Pair Hardware: Connect a Bluetooth headset or USB microphone.
2. Open Sound Settings: Use `pavucontrol` or KDE Sound Settings.
3. Identify Streams: Look for the "python3" or "ALSA" capture and playback streams.
4. Set Routes:
Optionally, run `bash gemini-voice/setup_audio.sh` to create loopbacks or a dedicated assistant sink.
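The contents of `setup_audio.sh` aren't shown, but a dedicated sink plus loopback is typically built with `pactl load-module`, which PipeWire's PulseAudio compatibility layer also accepts. The hypothetical Python equivalent below builds those commands and prints them as a dry run. The sink name, description, and 20 ms latency are illustrative assumptions.

```python
import shlex
import subprocess
from typing import List, Optional


def null_sink_cmd(sink_name: str, description: str) -> List[str]:
    """pactl command that creates a virtual (null) sink."""
    return ["pactl", "load-module", "module-null-sink",
            f"sink_name={sink_name}",
            f"sink_properties=device.description={description}"]


def loopback_cmd(source: str, sink: Optional[str] = None,
                 latency_ms: int = 20) -> List[str]:
    """pactl command that loops a source (e.g. a sink's .monitor) into a sink."""
    cmd = ["pactl", "load-module", "module-loopback", f"source={source}"]
    if sink is not None:
        cmd.append(f"sink={sink}")  # omit to follow the current default sink
    cmd.append(f"latency_msec={latency_ms}")
    return cmd


def run(cmd: List[str]) -> str:
    """Execute a pactl command; load-module prints the new module index."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


# Hypothetical names; the real setup_audio.sh may use different ones.
ASSISTANT_SINK = "gemini_assistant"
PLAN = [
    null_sink_cmd(ASSISTANT_SINK, "Gemini_Assistant"),
    # Mirror the assistant sink to the default output so it stays audible.
    loopback_cmd(f"{ASSISTANT_SINK}.monitor"),
]

if __name__ == "__main__":
    for cmd in PLAN:
        print(shlex.join(cmd))  # dry run: print instead of executing
```

Passing each command to `run()` would apply it. A module loaded this way can be removed again with `pactl unload-module <index>`, using the index that `load-module` printed.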
Features:
* Live Audio Streaming: Low-latency, bidirectional PCM audio.
* CLI Approval Watcher: Announces aloud when the Gemini CLI is waiting for confirmation.
* Remote Approval Integration: Works alongside KDE Approval, so a confirmation can be given by voice or through the desktop.
* Input Injection: Types `y` followed by Enter via `ydotool` in response to a voice command.
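The voice-triggered `y` + Enter injection can be sketched as a transcript check followed by two `ydotool` calls. This assumes ydotool 1.x, where `key` takes `<keycode>:<state>` pairs (28 is the Linux keycode for Enter), and a running `ydotoold` daemon. The trigger phrases and function names are hypothetical, not the project's actual vocabulary.

```python
import re
import subprocess
from typing import List, Sequence

KEY_ENTER = 28  # Linux input-event code for Enter (linux/input-event-codes.h)

# Hypothetical trigger phrases; the real assistant's vocabulary may differ.
APPROVE_PATTERN = re.compile(
    r"\b(approve|yes,? (?:go ahead|do it)|confirm)\b", re.IGNORECASE
)


def is_approval(transcript: str) -> bool:
    """Return True if a spoken transcript contains an approval phrase."""
    return APPROVE_PATTERN.search(transcript) is not None


def approval_keystrokes() -> List[List[str]]:
    """ydotool commands that type 'y', then press and release Enter."""
    return [
        ["ydotool", "type", "y"],
        ["ydotool", "key", f"{KEY_ENTER}:1", f"{KEY_ENTER}:0"],  # 1=press, 0=release
    ]


def inject(commands: Sequence[Sequence[str]]) -> None:
    """Run each ydotool command in order, raising if one fails."""
    for cmd in commands:
        subprocess.run(list(cmd), check=True)


def handle_transcript(transcript: str) -> bool:
    """Inject 'y' + Enter when the transcript is an approval; report whether it did."""
    if not is_approval(transcript):
        return False
    inject(approval_keystrokes())
    return True
```

Keyword matching like this ignores negation ("don't approve" would still match), so a fuller version might let the model classify intent before injecting keystrokes.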
Files:
* `live_voice.py`: Main application loop and API handler.
* `setup_audio.sh`: Utility for creating virtual audio sinks.
* `check_audio.py`: Diagnostic tool that lists available audio devices.
* `memory.txt`: Persistent storage for assistant context.
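The format of `memory.txt` isn't specified. The sketch below assumes a plain one-fact-per-line file that is read at startup and prepended to the session's system instruction. The function names, line cap, and prompt wording are hypothetical.

```python
from pathlib import Path
from typing import List

MEMORY_PATH = Path("memory.txt")  # assumed to sit next to live_voice.py


def load_memory(path: Path = MEMORY_PATH, max_lines: int = 50) -> List[str]:
    """Return the most recent non-empty memory lines, oldest first."""
    if not path.exists():
        return []
    lines = [ln.strip() for ln in path.read_text(encoding="utf-8").splitlines()]
    return [ln for ln in lines if ln][-max_lines:]


def remember(fact: str, path: Path = MEMORY_PATH) -> None:
    """Append one fact per line so the file stays human-editable."""
    fact = " ".join(fact.split())  # collapse newlines and repeated spaces
    if fact:
        with path.open("a", encoding="utf-8") as fh:
            fh.write(fact + "\n")


def memory_prompt(path: Path = MEMORY_PATH) -> str:
    """Render stored memory as a block for the session's system instruction."""
    facts = load_memory(path)
    if not facts:
        return ""
    return "Things you remember about the user:\n" + "\n".join(f"- {f}" for f in facts)
```

Capping the number of lines keeps the prompt size bounded as the file grows, while the plain-text format lets you edit or prune memories by hand.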