HA voice control interests me, so I installed Almond. After that, everything requires the skills of Sherlock Holmes.

There seem to be no instructions and no place to discuss it.

The Home Assistant integration page seems to describe something different, or at least a different version, from what Almond is today. It even links to some “genie-server”, and I have no idea what that is.

I could find hardly any discussion about Almond as it is today (version 2.0.1). All that the add-on says is:

  1. Start the add-on.
  2. Go to Home Assistant frontend → Configuration → Integrations to configure Home Assistant to use this add-on. After starting, it will be automatically discovered by Home Assistant.

…which I did. The configuration was just the press of a button. Almond itself seems to be working and reacts to written commands in “conversation”, but I cannot hear anything from the speaker, and Almond does not react to the microphone.

Should it? Again, there are no instructions to be found. The add-on says that no configuration is needed, yet it is possible to select an audio input and output. This is not documented anywhere. My audio devices are listed, but changing these settings has no effect on anything. The log says nothing about audio.

I’m not even sure this is the right place to ask about it. I could not find other places either. I do not know whether Almond is even regarded as a “Third party integration” or what it is…

So, how should I proceed with audio? I suppose this is related to HA configuration, not an Almond issue, but what do I know…

(Home Assistant OS 6.6 + core-2021.11.5 + Almond 2.0.1 on Raspberry Pi 4B 4GB, tried with Jabra 501 and some Microsoft USB headset)

I think Rhasspy is the voice project that has the forward momentum at the moment.

Unless things have drastically changed in the past several months, I would guess you won’t be satisfied doing STT (speech-to-text) commands with just a Pi 4. Last I read on this, you need something more like an i5 or i7 to have a chance (without the help of cloud computers, anyway).

Rhasspy is actually not that bad, but it requires a bit of work, because there is no automatic speech-to-text for natural speech. You have to define all the possible sentences you want to use.
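
To give a feel for what "define all the possible sentences" means in practice, here is a rough Python sketch of the idea. It is not Rhasspy's actual file format or code; the template syntax, the intent names and the helper functions (`expand`, `build_lookup`, `recognize`) are all made up for illustration. The point is only that every accepted sentence is known in advance, so anything outside the list is simply not recognized.

```python
import itertools
import re

# Hypothetical templates: each intent lists its sentences, with (a|b) marking alternatives.
TEMPLATES = {
    "ChangeLightState": "turn (on|off) the (kitchen|bedroom) light",
    "GetTime": "what time is it",
}


def expand(template):
    """Expand every (a|b) group into the full list of plain sentences."""
    parts = re.split(r"\(([^()]*)\)", template)
    # Odd indices hold the alternatives; even indices hold literal text.
    choices = [p.split("|") if i % 2 else [p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


def build_lookup(templates):
    """Map every fully expanded sentence to the intent it belongs to."""
    lookup = {}
    for intent, template in templates.items():
        for sentence in expand(template):
            lookup[sentence] = intent
    return lookup


def recognize(text, lookup):
    """Return the intent for an exactly matching sentence, or None."""
    normalized = " ".join(text.lower().split())
    return lookup.get(normalized)


LOOKUP = build_lookup(TEMPLATES)
print(recognize("Turn OFF the bedroom light", LOOKUP))
print(recognize("dim the lights", LOOKUP))
```

The second call finds nothing, because "dim the lights" was never defined. That is the trade-off: a closed sentence list is much lighter to run than open-ended speech recognition, but you have to write the list yourself.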

