🎤 Gemini Voice Assistant Project

🌟 Project Overview

This project is a real-time, low-latency voice assistant integrated into the Kubuntu 25.10 desktop. It uses the Gemini 2.5 Flash Multimodal Live API for bidirectional audio interaction, `ydotool` for system-level automation, and a local Ollama model for offline reasoning.

⚙️ Technical Architecture

AI & Software Stack

Component	Technology	Role
AI Model	`gemini-2.5-flash-native-audio-latest`	Multimodal Live API Core
Language	Python 3.11+	Backend Implementation
Audio API	`PyAudio` (PortAudio)	Real-time PCM Streaming
Local LLM	Ollama (`gemma2:2b`)	Offline & Complex Reasoning
Automation	`ydotool`	System Input Injection
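
As an illustration of the Local LLM row, the following is a minimal sketch of how a prompt could be sent to Ollama over its local HTTP API. It assumes Ollama is listening on its default port (11434) and that `gemma2:2b` has already been pulled; the function names `build_ollama_request` and `ask_local_llm` are hypothetical and not taken from `live_voice.py`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed: Ollama's default local endpoint
LOCAL_MODEL = "gemma2:2b"  # model tag from the stack table


def build_ollama_request(prompt: str, model: str = LOCAL_MODEL) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_llm(prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt to the local Ollama server and return its text reply."""
    with urllib.request.urlopen(build_ollama_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Keeping request construction separate from the network call makes the payload easy to inspect without a running Ollama server.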

Audio Routing (System Default Strategy)

The system is designed to be hardware-agnostic: it opens the PulseAudio/PipeWire system default input and output streams rather than a specific device. This lets the user manually route the assistant to any connected device (Bluetooth headsets, USB mics) using standard KDE tools.
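
The default-device strategy can be sketched with PyAudio as follows. This is an illustrative sketch, not the project's code: the helper names are hypothetical, and the 16 kHz capture rate, 24 kHz playback rate, and buffer size are assumptions that should be checked against the Live API settings actually in use. The key point is that no device index is passed, so PortAudio uses whatever the sound server exposes as the default.

```python
INPUT_RATE_HZ = 16_000   # assumed capture rate (16-bit mono PCM)
OUTPUT_RATE_HZ = 24_000  # assumed playback rate
CHUNK_FRAMES = 1024      # hypothetical buffer size


def default_stream_kwargs(direction: str) -> dict:
    """Return keyword arguments for PyAudio.open() on the system default device.

    Neither input_device_index nor output_device_index is set, so PortAudio
    falls back to the default device chosen by the sound server.
    """
    if direction not in ("input", "output"):
        raise ValueError("direction must be 'input' or 'output'")
    is_input = direction == "input"
    return {
        "channels": 1,
        "rate": INPUT_RATE_HZ if is_input else OUTPUT_RATE_HZ,
        "input": is_input,
        "output": not is_input,
        "frames_per_buffer": CHUNK_FRAMES,
    }


def open_default_streams():
    """Open capture and playback streams on the default devices."""
    import pyaudio  # imported lazily so the helper above works without PyAudio

    pa = pyaudio.PyAudio()
    mic = pa.open(format=pyaudio.paInt16, **default_stream_kwargs("input"))
    speaker = pa.open(format=pyaudio.paInt16, **default_stream_kwargs("output"))
    return pa, mic, speaker
```

Because the streams are bound to the defaults, rerouting happens entirely in the sound server and requires no change to the script.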

🛠️ Configuration & Routing

Connecting a Device

1. Pair Hardware: Connect Bluetooth headset or USB microphone.
2. Open Sound Settings: Use `pavucontrol` or KDE Sound Settings.
3. Identify Streams: Look for the “python3” or “ALSA” capture/playback streams.
4. Set Routes:
   * Playback: Route to your preferred speakers/headset.
   * Recording: Route to your preferred microphone.

Virtual Sinks (Advanced)

For optional routing, run `bash gemini-voice/setup_audio.sh` to create loopbacks or dedicated assistant sinks.
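
The contents of `setup_audio.sh` are not shown here, so the following is only a sketch of how a dedicated sink and a loopback are commonly created with `pactl`, which works against both PulseAudio and PipeWire's PulseAudio layer. The sink name `gemini_assistant` and the function names are hypothetical.

```python
import subprocess

SINK_NAME = "gemini_assistant"  # hypothetical sink name, not from setup_audio.sh


def null_sink_command(sink_name: str = SINK_NAME) -> list[str]:
    """Command that creates a virtual (null) sink the assistant can play into."""
    return [
        "pactl", "load-module", "module-null-sink",
        f"sink_name={sink_name}",
        f"sink_properties=device.description={sink_name}",
    ]


def loopback_command(sink_name: str = SINK_NAME) -> list[str]:
    """Command that forwards the virtual sink's monitor to the default sink."""
    return [
        "pactl", "load-module", "module-loopback",
        f"source={sink_name}.monitor",
    ]


def create_assistant_sink(sink_name: str = SINK_NAME) -> list[str]:
    """Run both commands and return the module indexes printed by pactl."""
    module_ids = []
    for cmd in (null_sink_command(sink_name), loopback_command(sink_name)):
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        module_ids.append(result.stdout.strip())
    return module_ids
```

The returned module indexes can later be passed to `pactl unload-module` to tear the routing down again.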

🚀 Core Features

* Live Audio Streaming: Low-latency bidirectional PCM audio.
* CLI Approval Watcher: Verbally notifies the user when the Gemini CLI requires confirmation.
* Remote Approval Integration: Works alongside KDE Approval for multi-modal confirmation.
* Input Injection: Can trigger 'y' + 'Enter' via voice command using `ydotool`.
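
The Input Injection feature can be sketched as below. This assumes ydotool 1.x, where `ydotool key` takes raw Linux key codes as `code:state` pairs (28 is Enter) and the `ydotoold` daemon must be running; older ydotool releases use a different `key` syntax. The trigger phrases and function names are hypothetical, since the real voice-command handling lives in `live_voice.py`.

```python
import subprocess

KEY_ENTER = 28  # Linux input event code for the Enter key
APPROVAL_PHRASES = ("yes", "approve", "confirm")  # hypothetical trigger words


def is_approval(transcript: str) -> bool:
    """Return True if the spoken transcript is exactly one of the trigger words."""
    return transcript.lower().strip(" .!?") in APPROVAL_PHRASES


def approval_commands() -> list[list[str]]:
    """ydotool invocations that type 'y' and then press and release Enter."""
    return [
        ["ydotool", "type", "y"],
        ["ydotool", "key", f"{KEY_ENTER}:1", f"{KEY_ENTER}:0"],
    ]


def inject_approval(transcript: str) -> bool:
    """Inject 'y' + Enter only when the transcript is an approval phrase."""
    if not is_approval(transcript):
        return False
    for cmd in approval_commands():
        subprocess.run(cmd, check=True)
    return True
```

Matching the whole transcript rather than a substring is a deliberately conservative choice, so that a sentence that merely contains "yes" does not confirm a pending CLI prompt.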

📂 Project Structure

* `live_voice.py`: Main application loop and API handler.
* `setup_audio.sh`: Utility for creating virtual audio sinks.
* `check_audio.py`: Diagnostic tool to list available hardware devices.
* `memory.txt`: Persistent storage for assistant context.
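
A diagnostic in the spirit of `check_audio.py` might look like the sketch below. It is not the project's script: the function names and output format are hypothetical, and it relies only on PyAudio's `get_device_count()` and `get_device_info_by_index()`, whose result dictionaries include `index`, `name`, `maxInputChannels`, and `maxOutputChannels`.

```python
def format_device(info: dict) -> str:
    """Render one PortAudio device-info dict as a single report line."""
    roles = []
    if info.get("maxInputChannels", 0) > 0:
        roles.append("in")
    if info.get("maxOutputChannels", 0) > 0:
        roles.append("out")
    role_text = "/".join(roles) or "none"
    return f"[{info['index']}] {info['name']} ({role_text})"


def list_devices() -> list[str]:
    """Return one formatted line per audio device PortAudio can see."""
    import pyaudio  # imported lazily so format_device works without PyAudio

    pa = pyaudio.PyAudio()
    try:
        return [
            format_device(pa.get_device_info_by_index(i))
            for i in range(pa.get_device_count())
        ]
    finally:
        pa.terminate()  # always release PortAudio resources


if __name__ == "__main__":
    print("\n".join(list_devices()))
```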

🔊 Audio Evolution Log – History of failed/successful setups.
