
🎀 Gemini Voice Assistant Project


🌟 Project Overview

A real-time, low-latency voice assistant integrated into the Kubuntu 25.10 environment. This project utilizes the Gemini 2.5 Flash Multimodal Live API to provide bidirectional audio interaction, system-level automation, and local reasoning capabilities.

⚙️ Technical Architecture

AI & Software Stack

| Component | Technology | Role |
|---|---|---|
| AI Model | `gemini-2.5-flash-native-audio-latest` | Multimodal Live API Core |
| Language | Python 3.11+ | Backend Implementation |
| Audio API | `PyAudio` (PortAudio) | Real-time PCM Streaming |
| Local LLM | Ollama (`gemma2:2b`) | Offline & Complex Reasoning |
| Automation | `ydotool` | System Input Injection |
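
As an illustration of the Local LLM row, the following is a minimal sketch of how a prompt might be handed to Ollama over its local HTTP API. It assumes Ollama is running on its default port (11434) with `gemma2:2b` already pulled; the function names `build_ollama_request` and `ask_local_llm` are hypothetical and not taken from the project.

```python
import json
import urllib.request

# Default endpoint of a locally running Ollama server (assumption: default port).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ollama_request(prompt: str, model: str = "gemma2:2b") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_llm(prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt to Ollama and return the generated text (hypothetical helper)."""
    with urllib.request.urlopen(build_ollama_request(prompt), timeout=timeout) as resp:
        # With "stream": False the server answers with one JSON object
        # whose "response" field holds the full completion.
        return json.loads(resp.read())["response"]
```

Calling `ask_local_llm("Summarize this log line: ...")` would block until the model finishes, so a real-time voice loop would likely run it off the audio thread.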

Audio Routing (System Default Strategy)

The system is hardware-agnostic: it targets the PulseAudio/PipeWire system default streams rather than a specific device. The user can therefore route the assistant to any connected device (Bluetooth headset, USB microphone) with standard KDE tools.
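
The default-stream strategy can be sketched with PyAudio by simply not passing a device index, which leaves the choice to PortAudio and the sound server. This is an illustrative sketch, not the code in `live_voice.py`: the sample rates (16 kHz capture, 24 kHz playback), the chunk size, and the helper names are assumptions.

```python
CHANNELS = 1
INPUT_RATE = 16000    # assumed microphone sample rate
OUTPUT_RATE = 24000   # assumed playback sample rate
CHUNK_FRAMES = 1024   # assumed frames per buffer


def stream_kwargs(direction: str, sample_format: int) -> dict:
    """Arguments for PyAudio.open() that leave device selection to the system default."""
    if direction not in ("input", "output"):
        raise ValueError("direction must be 'input' or 'output'")
    return {
        "format": sample_format,
        "channels": CHANNELS,
        "rate": INPUT_RATE if direction == "input" else OUTPUT_RATE,
        direction: True,
        "frames_per_buffer": CHUNK_FRAMES,
        # No input_device_index / output_device_index is set on purpose:
        # PortAudio then opens the default device, which the user can
        # re-route in pavucontrol or KDE Sound Settings.
    }


def open_default_streams():
    """Open capture and playback streams on the system default devices."""
    import pyaudio  # imported lazily so the helper above works without audio hardware

    pa = pyaudio.PyAudio()
    mic = pa.open(**stream_kwargs("input", pyaudio.paInt16))
    speaker = pa.open(**stream_kwargs("output", pyaudio.paInt16))
    return pa, mic, speaker
```

Because no hardware is named in the code, switching from a USB microphone to a Bluetooth headset needs no code change, only a new route in the sound settings.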

🛠️ Configuration & Routing

Connecting a Device

1. Pair Hardware: Connect Bluetooth headset or USB microphone.
2. Open Sound Settings: Use `pavucontrol` or KDE Sound Settings.
3. Identify Streams: Look for the “python3” or “ALSA” capture/playback streams.
4. Set Routes:

Virtual Sinks (Advanced)

Optional routing can be established via `bash gemini-voice/setup_audio.sh` to create loopbacks or dedicated assistant sinks.
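
The contents of `setup_audio.sh` are not shown here, so the following is only a sketch of what creating a dedicated sink and a loopback could look like, written in Python around the standard `pactl load-module` command. The sink name `gemini_assistant` and all function names are hypothetical.

```python
import subprocess

SINK_NAME = "gemini_assistant"  # hypothetical sink name, not taken from the project


def null_sink_command(sink_name: str = SINK_NAME) -> list[str]:
    """pactl invocation that creates a virtual (null) sink."""
    return [
        "pactl", "load-module", "module-null-sink",
        f"sink_name={sink_name}",
        f"sink_properties=device.description={sink_name}",
    ]


def loopback_command(source: str, sink: str) -> list[str]:
    """pactl invocation that forwards audio from a source to a sink."""
    return ["pactl", "load-module", "module-loopback", f"source={source}", f"sink={sink}"]


def create_assistant_sink(sink_name: str = SINK_NAME) -> str:
    """Create the sink and return the module index printed by pactl."""
    result = subprocess.run(
        null_sink_command(sink_name), check=True, capture_output=True, text=True
    )
    # The index can later be passed to `pactl unload-module` to remove the sink.
    return result.stdout.strip()
```

A loopback from the new sink's monitor source (`gemini_assistant.monitor`) to a physical output would then be built with `loopback_command`, with the target sink name taken from `pactl list short sinks`.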

🚀 Core Features

* Live Audio Streaming: Low-latency bidirectional PCM audio.
* CLI Approval Watcher: Verbally notifies the user when the Gemini CLI requires confirmation.
* Remote Approval Integration: Works alongside KDE Approval for multi-modal confirmation.
* Input Injection: Can trigger 'y' + 'Enter' via voice command using `ydotool`.
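
The input injection feature could be implemented roughly as below. This sketch assumes ydotool 1.x, where `ydotool key` takes `keycode:state` pairs and the `ydotoold` daemon is running; older releases use a different syntax. The key codes come from the Linux input event codes (`KEY_Y` = 21, `KEY_ENTER` = 28), and the function names are hypothetical.

```python
import subprocess

KEY_Y = 21      # Linux input event code for the "y" key
KEY_ENTER = 28  # Linux input event code for Enter


def key_taps(*keycodes: int) -> list[str]:
    """Expand key codes into press (:1) and release (:0) pairs for `ydotool key`."""
    args: list[str] = []
    for code in keycodes:
        args += [f"{code}:1", f"{code}:0"]
    return args


def approve_command() -> list[str]:
    """Command line that types 'y' followed by Enter."""
    return ["ydotool", "key", *key_taps(KEY_Y, KEY_ENTER)]


def send_approval() -> None:
    """Inject the approval keystrokes into the focused window."""
    subprocess.run(approve_command(), check=True)
```

Since the keystrokes go to whichever window has focus, a voice-triggered `send_approval()` is best gated on the approval watcher having just announced a pending confirmation.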

📂 Project Structure

* `live_voice.py`: Main application loop and API handler.
* `setup_audio.sh`: Utility for creating virtual audio sinks.
* `check_audio.py`: Diagnostic tool to list available hardware devices.
* `memory.txt`: Persistent storage for assistant context.
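
A diagnostic along the lines of `check_audio.py` might look like the sketch below. It is not the project's actual script: the output format and the function names are assumptions, while the PyAudio calls (`get_device_count`, `get_device_info_by_index`) and the dictionary keys are the ones PyAudio provides.

```python
def describe_device(info: dict) -> str:
    """Format one PyAudio device-info dictionary as a single line."""
    roles = []
    if info.get("maxInputChannels", 0) > 0:
        roles.append("input")
    if info.get("maxOutputChannels", 0) > 0:
        roles.append("output")
    return f"[{info['index']}] {info['name']} ({'/'.join(roles) or 'no channels'})"


def list_devices() -> list[str]:
    """Return one description line per audio device known to PortAudio."""
    import pyaudio  # imported lazily so describe_device() works without PyAudio

    pa = pyaudio.PyAudio()
    try:
        return [
            describe_device(pa.get_device_info_by_index(i))
            for i in range(pa.get_device_count())
        ]
    finally:
        pa.terminate()  # always release PortAudio resources
```

Printing the result of `list_devices()` line by line gives a quick check that a newly paired headset or USB microphone is visible before routing streams to it.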