GLaSSIST - Desktop Voice Assistant for Home Assistant (Windows)
Hey everyone!
I'm a lazy bastard with a terrible memory, and clicking through the HA interface or opening browser tabs was driving me nuts. So I built myself a desktop voice assistant that actually works.
What it does
Sits in your system tray, listens for wake words like "Alexa" or "Hey Jarvis", then connects to HA Assist to process voice commands. Simple as that.
Say "Alexa, turn on kitchen lights" - lights turn on. "Add milk to shopping list" - gets added. "When is my next meeting" - tells you. No clicking, no opening browsers, no bullshit.
Key features
- 100+ wake word models - Alexa, Hey Jarvis, Computer, whatever you want
- WebRTC VAD - actually knows when you're speaking vs background noise
- Real-time audio visualization - because why not make it look cool
- Proper Windows integration - lives in system tray, hotkey support
- GUI settings - no YAML editing required
- Pipeline selection - works with your existing HA Assist setup
Why I built this
Because I'm forgetful and lazy. Voice commands should be instant, not require opening apps or remembering where buttons are. This thing just sits there waiting for me to talk to it.
Been using it to manage my smart home, calendar, and task lists through HA. Works way better than pulling out my phone or opening browser tabs.
Download & Setup
Installer - https://github.com/SmolinskiP/GLaSSIST/releases/download/glassist/GLaSSIST-Setup.exe
Requirements:
- Windows (sorry, not porting to other platforms)
- Home Assistant with Assist enabled
- Long-lived access token
Setup is pretty straightforward - download, run, enter your HA details in the GUI, pick your wake words, done.
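If you want to sanity-check your long-lived token before pointing GLaSSIST at HA, a quick Python snippet like this works (a sketch - the URL and token are placeholders you'd swap for your own, not anything GLaSSIST requires):

```python
import requests  # pip install requests

resp = requests.get(
    "http://homeassistant.local:8123/api/",  # replace with your HA URL
    headers={"Authorization": "Bearer YOUR_LONG_LIVED_ACCESS_TOKEN"},
    timeout=5,
)
# 200 with {"message": "API running."} means the token is good
print(resp.status_code, resp.json())
```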
Looking for feedback
If anyone wants to test this thing, I'd appreciate feedback on:
- Wake word detection accuracy on different microphones
- Performance/CPU usage
- UI usability
- Any bugs or crashes
Confidence level: 80% - works great for me and a few others, but you know how it is with code… "works on my machine"
Been thinking about adding more features but figured Iâd see if anyone else finds this useful first.
Technical Details
For those interested in the implementation:
Architecture:
- Python backend with asyncio WebSocket client for the HA API (see the handshake sketch after this list)
- Three.js frontend with GLSL shaders for audio visualization
- WebRTC VAD for speech detection (supports 8/16/32kHz sample rates)
- openWakeWord integration with ONNX models (Windows optimized)
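For the curious, the HA WebSocket handshake boils down to something like this (a minimal sketch using the `websockets` package, not GLaSSIST's actual code; URL and token are placeholders):

```python
import asyncio
import json

import websockets  # pip install websockets

HA_URL = "ws://homeassistant.local:8123/api/websocket"  # your HA instance
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"

async def connect():
    async with websockets.connect(HA_URL) as ws:
        await ws.recv()  # HA greets with {"type": "auth_required"}
        await ws.send(json.dumps({"type": "auth", "access_token": TOKEN}))
        reply = json.loads(await ws.recv())
        assert reply["type"] == "auth_ok", reply
        # every command after auth carries an incrementing "id"
        await ws.send(json.dumps({"id": 1, "type": "assist_pipeline/pipeline/list"}))
        print(json.loads(await ws.recv()))

asyncio.run(connect())
```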
Audio Processing:
- Real-time FFT analysis for visual feedback
- Configurable VAD sensitivity (levels 0-3; see the sketch after this list)
- Smart silence detection with customizable thresholds
- Frame duration options: 10/20/30ms for optimal performance
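The VAD loop is roughly this shape (a sketch using the `webrtcvad` package; the 750 ms silence threshold is an illustrative value, not the app's default):

```python
import webrtcvad  # pip install webrtcvad

SAMPLE_RATE = 16000   # webrtcvad accepts 8/16/32 kHz (48 kHz too)
FRAME_MS = 30         # only 10/20/30 ms frames are valid
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 16-bit mono PCM

vad = webrtcvad.Vad(2)  # aggressiveness 0 (permissive) .. 3 (strict)

def utterance_ended(pcm: bytes, max_silent_frames: int = 25) -> bool:
    """True once the buffer ends with ~750 ms of consecutive non-speech."""
    silent = 0
    for i in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
        frame = pcm[i:i + FRAME_BYTES]
        silent = 0 if vad.is_speech(frame, SAMPLE_RATE) else silent + 1
    return silent >= max_silent_frames
```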
Wake Word Detection:
- Pre-trained models converted to ONNX format for Windows compatibility (see the sketch after this list)
- Support for custom model training via Google Colab
- Adaptive thresholds and noise suppression
- Multiple simultaneous wake word support
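Wake word scoring with openWakeWord comes down to a few lines (again a sketch; the model names and 0.5 threshold are illustrative, not the app's settings):

```python
import numpy as np
from openwakeword.model import Model  # pip install openwakeword

# ONNX inference sidesteps tflite-runtime headaches on Windows
oww = Model(wakeword_models=["alexa", "hey_jarvis"], inference_framework="onnx")

THRESHOLD = 0.5  # tune per microphone and environment

def detect(frame: np.ndarray) -> list[str]:
    """Score one chunk of 16 kHz int16 audio against every loaded model."""
    scores = oww.predict(frame)  # e.g. {"alexa": 0.91, "hey_jarvis": 0.02}
    return [name for name, s in scores.items() if s >= THRESHOLD]
```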
Integration:
- Full HA Assist pipeline support with automatic discovery (see the pipeline-run sketch after this list)
- WebSocket communication with proper error handling
- System tray integration with hotkey support (configurable)
- Transparent overlay window that doesn't interfere with desktop usage
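Running a command through an Assist pipeline reuses the authenticated socket from the handshake sketch above. A text-in/intent-out run looks roughly like this (a sketch of the pattern, not the app's exact code):

```python
import json

async def run_text_command(ws, text: str, msg_id: int) -> None:
    """Send text through the default Assist pipeline, print events until run-end."""
    await ws.send(json.dumps({
        "id": msg_id,
        "type": "assist_pipeline/run",
        "start_stage": "intent",
        "end_stage": "intent",
        "input": {"text": text},
    }))
    while True:
        msg = json.loads(await ws.recv())
        print(msg)  # run-start, intent-start, intent-end, ...
        if msg.get("event", {}).get("type") == "run-end":
            break
```

Voice runs use the same command with an earlier start stage and the microphone audio streamed as binary frames.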
Configuration:
- GUI settings dialog with connection testing
- Pipeline selection with real-time validation
- Audio device detection and configuration
- Debug mode with detailed logging
The app uses a WebSocket server (default port 8765) to communicate between the Python backend and Three.js frontend, enabling real-time audio visualization and state management.
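That backend-to-frontend bridge can be as small as this (a sketch of the pattern with the `websockets` package; the message fields are made up for illustration):

```python
import asyncio
import json

import websockets  # pip install websockets

CLIENTS = set()

async def handler(ws):
    """Register each connected frontend until it disconnects."""
    CLIENTS.add(ws)
    try:
        await ws.wait_closed()
    finally:
        CLIENTS.discard(ws)

def broadcast(state: str, levels: list[float]) -> None:
    """Fan out app state and FFT levels to every connected frontend."""
    websockets.broadcast(CLIENTS, json.dumps({"state": state, "levels": levels}))

async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()  # run until killed

asyncio.run(main())
```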
Thanks!