====== 🎤 Gemini Voice Assistant Project ======

===== 🌟 Project Overview =====
A real-time, low-latency voice assistant integrated into the Kubuntu 25.10 desktop. It uses the **Gemini 2.5 Flash Multimodal Live API** for bidirectional audio interaction, `ydotool` for system-level automation, and a local Ollama model for offline reasoning.

===== ⚙️ Technical Architecture =====
==== AI & Software Stack ====
^ Component ^ Technology ^ Role ^
| AI Model | `gemini-2.5-flash-native-audio-latest` | Multimodal Live API Core |
| Language | Python 3.11+ | Backend Implementation |
| Audio API | `PyAudio` (PortAudio) | Real-time PCM Streaming |
| Local LLM | Ollama (`gemma2:2b`) | Offline & Complex Reasoning |
| Automation | `ydotool` | System Input Injection |
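
A minimal sketch of how the Live API core in the table above might be driven from Python. It assumes the `google-genai` SDK, which this page does not name, and 16 kHz 16-bit mono PCM input. `live_config`, `pcm_mime_type`, and `stream_once` are illustrative names, not functions from `live_voice.py`.

```python
MODEL = "gemini-2.5-flash-native-audio-latest"  # model name from the table above
INPUT_RATE_HZ = 16_000  # assumption: 16 kHz, 16-bit, mono PCM input


def live_config() -> dict:
    """Session config that asks the model to answer with audio."""
    return {"response_modalities": ["AUDIO"]}


def pcm_mime_type(rate_hz: int = INPUT_RATE_HZ) -> str:
    """MIME type string describing raw PCM at the given sample rate."""
    return f"audio/pcm;rate={rate_hz}"


async def stream_once(pcm_chunk: bytes) -> bytes:
    """Send one chunk of microphone PCM and gather the audio returned for that turn."""
    # Imported lazily so the helpers above work without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # assumption: API key is provided via the environment
    reply = bytearray()
    async with client.aio.live.connect(model=MODEL, config=live_config()) as session:
        await session.send_realtime_input(
            audio=types.Blob(data=pcm_chunk, mime_type=pcm_mime_type())
        )
        async for message in session.receive():
            if message.data:  # raw PCM bytes of the model's speech
                reply.extend(message.data)
    return bytes(reply)
```

A real assistant would stream microphone chunks continuously from one task while a second task plays replies; `asyncio.run(stream_once(chunk))` only illustrates the shape of a single exchange.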

==== Audio Routing (System Default Strategy) ====
The assistant is **hardware agnostic**: it opens the PulseAudio/PipeWire **system default** input and output streams instead of a specific device. The user can then route the assistant to any connected device (Bluetooth headsets, USB mics) with standard KDE tools.
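
The default-stream strategy can be sketched with PyAudio as follows. The key point is that no device index is passed, so PortAudio falls back to whatever the sound server reports as default. The sample rates and the names `stream_kwargs` and `open_default_streams` are assumptions for illustration, not taken from `live_voice.py`.

```python
INPUT_RATE_HZ = 16_000   # assumed capture rate
OUTPUT_RATE_HZ = 24_000  # assumed playback rate
CHUNK_FRAMES = 1024      # frames read or written per call


def stream_kwargs(direction: str, sample_format: int) -> dict:
    """Build PyAudio.open() arguments that leave device selection to the system."""
    if direction not in ("input", "output"):
        raise ValueError("direction must be 'input' or 'output'")
    rate = INPUT_RATE_HZ if direction == "input" else OUTPUT_RATE_HZ
    # Deliberately no input_device_index / output_device_index here:
    # omitting them selects the default device.
    return {
        "format": sample_format,
        "channels": 1,
        "rate": rate,
        direction: True,
        "frames_per_buffer": CHUNK_FRAMES,
    }


def open_default_streams():
    """Open microphone and speaker streams on the system default devices."""
    import pyaudio  # imported lazily; requires PortAudio at runtime

    pa = pyaudio.PyAudio()
    mic = pa.open(**stream_kwargs("input", pyaudio.paInt16))
    speaker = pa.open(**stream_kwargs("output", pyaudio.paInt16))
    return pa, mic, speaker
```

Because the streams are tied to the defaults, moving them in `pavucontrol` changes the hardware in use without touching the code.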

===== 🛠️ Configuration & Routing =====
==== Connecting a Device ====
1. **Pair Hardware:** Connect a Bluetooth headset or USB microphone.
2. **Open Sound Settings:** Launch `pavucontrol` or KDE Sound Settings.
3. **Identify Streams:** With the assistant running, look for the "python3" or "ALSA" capture and playback streams. They are listed only while the streams are open.
4. **Set Routes:**
   * **Playback:** Route to your preferred speakers or headset.
   * **Recording:** Route to your preferred microphone.

==== Virtual Sinks (Advanced) ====
For more advanced routing, run `bash gemini-voice/setup_audio.sh` to create loopbacks or a dedicated assistant sink.
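
The contents of `setup_audio.sh` are not shown on this page. As a rough illustration of what a dedicated sink plus loopback involves, the sketch below builds the equivalent `pactl` commands from Python. The sink name `gemini_assistant` and the helper names are hypothetical; `module-null-sink` and `module-loopback` are standard PulseAudio modules, and the sketch assumes they are also available through PipeWire's PulseAudio compatibility layer.

```python
import subprocess


def null_sink_cmd(sink_name: str, description: str) -> list[str]:
    """Command that creates a virtual (null) sink with a readable description."""
    return [
        "pactl", "load-module", "module-null-sink",
        f"sink_name={sink_name}",
        f"sink_properties=device.description={description}",
    ]


def loopback_cmd(source: str, sink: str) -> list[str]:
    """Command that copies audio from a source into a sink."""
    return ["pactl", "load-module", "module-loopback", f"source={source}", f"sink={sink}"]


def create_assistant_sink(target_sink: str, sink_name: str = "gemini_assistant") -> list[str]:
    """Create the virtual sink and loop its monitor to a real sink.

    Returns the module indices printed by pactl, which can later be passed
    to `pactl unload-module` to undo the setup.
    """
    commands = [
        null_sink_cmd(sink_name, "Gemini-Assistant"),
        loopback_cmd(f"{sink_name}.monitor", target_sink),
    ]
    indices = []
    for cmd in commands:
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        indices.append(result.stdout.strip())
    return indices
```

The `target_sink` name would come from `pactl list short sinks`. Keeping the command builders separate from the function that runs them makes the setup easy to inspect before anything is loaded.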

===== 🚀 Core Features =====
* **Live Audio Streaming:** Low-latency bidirectional PCM audio.
* **CLI Approval Watcher:** Verbally notifies the user when the Gemini CLI requires confirmation.
* **Remote Approval Integration:** Works alongside [[projects:kde_approval|KDE Approval]] for multi-modal confirmation.
* **Input Injection:** On a voice command, types `y` and presses Enter through `ydotool`.
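
The input-injection feature above could look roughly like this. The trigger words and function names are hypothetical, and the `key` syntax assumes ydotool 1.x, which takes `keycode:state` pairs and needs the `ydotoold` daemon running; older 0.1.x releases use key names instead.

```python
import subprocess

KEY_ENTER = 28  # Linux input event code for the Enter key

APPROVAL_WORDS = {"yes", "approve", "confirm"}  # hypothetical trigger words


def is_approval(transcript: str) -> bool:
    """Return True if the spoken transcript contains an approval word."""
    cleaned = transcript.lower().replace(",", " ").replace(".", " ")
    return any(word in APPROVAL_WORDS for word in cleaned.split())


def approval_commands() -> list[list[str]]:
    """ydotool invocations that type 'y' and then press and release Enter."""
    return [
        ["ydotool", "type", "y"],
        ["ydotool", "key", f"{KEY_ENTER}:1", f"{KEY_ENTER}:0"],
    ]


def send_approval(dry_run: bool = True) -> list[list[str]]:
    """Run the approval commands, or only return them when dry_run is True."""
    commands = approval_commands()
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Defaulting to `dry_run=True` is a deliberate choice in this sketch: keystrokes land in whichever window has focus, so real injection should be an explicit opt-in.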

===== 📂 Project Structure =====
* `live_voice.py`: Main application loop and API handler.
* `setup_audio.sh`: Utility for creating virtual audio sinks.
* `check_audio.py`: Diagnostic tool to list available hardware devices.
* `memory.txt`: Persistent storage for assistant context.
* [[projects:gemini_voice:audio_evolution|🔊 Audio Evolution Log]]: History of failed and successful audio setups.
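
For orientation, a diagnostic like `check_audio.py` might enumerate devices along these lines. This is a sketch of the general PyAudio approach, not the actual script; `format_device` and `list_devices` are illustrative names.

```python
def format_device(info: dict) -> str:
    """Render one PyAudio device-info dict as a single summary line."""
    index = info.get("index")
    name = info.get("name")
    inputs = info.get("maxInputChannels", 0)
    outputs = info.get("maxOutputChannels", 0)
    return f"[{index}] {name} (in: {inputs}, out: {outputs})"


def list_devices() -> list[str]:
    """Return a summary line for every device PortAudio can see."""
    import pyaudio  # imported lazily; requires PortAudio at runtime

    pa = pyaudio.PyAudio()
    try:
        return [
            format_device(pa.get_device_info_by_index(i))
            for i in range(pa.get_device_count())
        ]
    finally:
        pa.terminate()  # always release PortAudio resources
```

Printing the result of `list_devices()` shows which entries accept input or output, which helps confirm that the sound server's default device is visible to PortAudio.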