====== 🎤 Gemini Voice Assistant Project ======

[[projects:start|📂 Projects]] > **🎤 Gemini Voice**

===== 🌟 Project Overview =====

A real-time, low-latency voice assistant integrated into the Kubuntu 25.10 environment. This project uses the **Gemini 2.5 Flash Multimodal Live API** to provide bidirectional audio interaction, system-level automation, and local reasoning capabilities.

===== ⚙️ Technical Architecture =====

==== AI & Software Stack ====

^ Component ^ Technology ^ Role ^
| AI Model | `gemini-2.5-flash-native-audio-latest` | Multimodal Live API Core |
| Language | Python 3.11+ | Backend Implementation |
| Audio API | `PyAudio` (PortAudio) | Real-time PCM Streaming |
| Local LLM | Ollama (`gemma2:2b`) | Offline & Complex Reasoning |
| Automation | `ydotool` | System Input Injection |

==== Audio Routing (System Default Strategy) ====

The system is designed to be **hardware-agnostic**: it targets the PulseAudio/PipeWire **System Default** streams rather than a specific device. This lets the user manually route the assistant to any connected device (Bluetooth headsets, USB mics) using standard KDE tools.

===== 🛠️ Configuration & Routing =====

==== Connecting a Device ====

1. **Pair Hardware:** Connect a Bluetooth headset or USB microphone.
2. **Open Sound Settings:** Use `pavucontrol` or KDE Sound Settings.
3. **Identify Streams:** Look for the "python3" or "ALSA" capture/playback streams.
4. **Set Routes:**
    * **Playback:** Route to your preferred speakers/headset.
    * **Recording:** Route to your preferred microphone.

==== Virtual Sinks (Advanced) ====

Optional routing can be set up by running `bash gemini-voice/setup_audio.sh`, which creates loopbacks or dedicated assistant sinks.

===== 🚀 Core Features =====

  * **Live Audio Streaming:** Low-latency bidirectional PCM audio.
  * **CLI Approval Watcher:** Verbally notifies the user when the Gemini CLI requires confirmation.
  * **Remote Approval Integration:** Works alongside [[projects:kde_approval|KDE Approval]] for multimodal confirmation.
  * **Input Injection:** Can trigger 'y' + 'Enter' via voice command using `ydotool`.

===== 📂 Project Structure =====

  * `live_voice.py`: Main application loop and API handler.
  * `setup_audio.sh`: Utility for creating virtual audio sinks.
  * `check_audio.py`: Diagnostic tool to list available hardware devices.
  * `memory.txt`: Persistent storage for assistant context.
  * [[projects:gemini_voice:audio_evolution|🔊 Audio Evolution Log]] -- History of failed/successful setups.
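
The **Input Injection** feature listed under Core Features could look roughly like the sketch below. It is not taken from `live_voice.py`. The function names `build_approval_command` and `send_approval` are hypothetical. It assumes a ydotool 1.x release, where `ydotool key` takes `<keycode>:<state>` pairs and the `ydotoold` daemon is already running. Older releases use a different key syntax, so check `ydotool key --help` on the target machine first.

```python
import shutil
import subprocess

# Linux input event codes, as defined in linux/input-event-codes.h.
KEY_Y = 21
KEY_ENTER = 28


def build_approval_command(binary: str = "ydotool") -> list[str]:
    """Return the argv that presses and releases 'y', then Enter.

    Assumes the ydotool 1.x syntax: `ydotool key <code>:1 <code>:0 ...`.
    """
    events: list[str] = []
    for code in (KEY_Y, KEY_ENTER):
        events.append(f"{code}:1")  # key down
        events.append(f"{code}:0")  # key up
    return [binary, "key", *events]


def send_approval(dry_run: bool = False) -> bool:
    """Inject 'y' + Enter; return True only if ydotool ran and exited with 0."""
    cmd = build_approval_command()
    if dry_run or shutil.which(cmd[0]) is None:
        # Nothing is injected: either a dry run or ydotool is not installed.
        print("would run:", " ".join(cmd))
        return False
    return subprocess.run(cmd, check=False).returncode == 0


if __name__ == "__main__":
    send_approval(dry_run=True)
```

Keeping command construction separate from execution lets the keystroke sequence be checked without touching the real input device, which is helpful because an injected 'y' lands in whichever window has focus.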