Year of the Voice - Chapter 5

Set the device area (device page > pencil top right > Area)

Unfortunately area awareness doesn’t seem to be working for me. One of my LIFX bulbs is in Study, and my Wyoming device (a RasPi 3 running wyoming-satellite and wyoming-openwakeword) is also set to Study.

However, the log from the device shows that it couldn’t understand “turn off the light” unless I called it “study light”.

Dec 16 22:42:48 HA-voice-2 run[1273]: Playing raw data 'stdin' : Signed 16 bit Little Endian, Rate 22050 Hz, Mono
Dec 16 22:42:53 HA-voice-2 run[945]: DEBUG:root:Streaming audio
Dec 16 22:42:53 HA-voice-2 run[945]: DEBUG:root:Connected to snd service
Dec 16 22:42:53 HA-voice-2 run[1275]: Playing raw data 'stdin' : Signed 16 bit Little Endian, Rate 22050 Hz, Mono
Dec 16 22:42:59 HA-voice-2 run[945]: DEBUG:root:Event(type='transcript', data={'text': ' Turn on study light.'}, payload=None)
Dec 16 22:42:59 HA-voice-2 run[945]: INFO:root:Waiting for wake word
Dec 16 22:42:59 HA-voice-2 run[945]: DEBUG:root:Connected to snd service
Dec 16 22:42:59 HA-voice-2 run[1277]: Playing raw data 'stdin' : Signed 16 bit Little Endian, Rate 22050 Hz, Mono
Dec 16 22:42:59 HA-voice-2 run[945]: DEBUG:root:Event(type='synthesize', data={'text': 'Turned on light', 'voice': {'name': 'en_GB-alan-low'}}, payload=None)
Dec 16 22:43:00 HA-voice-2 run[945]: DEBUG:root:Connected to snd service
Dec 16 22:43:00 HA-voice-2 run[1280]: Playing raw data 'stdin' : Signed 16 bit Little Endian, Rate 22050 Hz, Mono
Dec 16 23:18:02 HA-voice-2 run[945]: DEBUG:root:Streaming audio
Dec 16 23:18:02 HA-voice-2 run[945]: DEBUG:root:Connected to snd service
Dec 16 23:18:02 HA-voice-2 run[1312]: Playing raw data 'stdin' : Signed 16 bit Little Endian, Rate 22050 Hz, Mono
Dec 16 23:18:08 HA-voice-2 run[945]: DEBUG:root:Event(type='transcript', data={'text': ' Turn off the light.'}, payload=None)
Dec 16 23:18:08 HA-voice-2 run[945]: INFO:root:Waiting for wake word
Dec 16 23:18:08 HA-voice-2 run[945]: DEBUG:root:Event(type='synthesize', data={'text': "Sorry, I couldn't understand that", 'voice': {'name': 'en_GB-alan-low'}}, payload=None)
Dec 16 23:18:08 HA-voice-2 run[1314]: Playing raw data 'stdin' : Signed 16 bit Little Endian, Rate 22050 Hz, Mono
Dec 16 23:18:08 HA-voice-2 run[945]: DEBUG:root:Connected to snd service
Dec 16 23:18:09 HA-voice-2 run[945]: DEBUG:root:Connected to snd service
Dec 16 23:18:09 HA-voice-2 run[1317]: Playing raw data 'stdin' : Signed 16 bit Little Endian, Rate 22050 Hz, Mono
Dec 16 23:18:14 HA-voice-2 run[945]: DEBUG:root:Streaming audio
Dec 16 23:18:14 HA-voice-2 run[945]: DEBUG:root:Connected to snd service
Dec 16 23:18:14 HA-voice-2 run[1319]: Playing raw data 'stdin' : Signed 16 bit Little Endian, Rate 22050 Hz, Mono
Dec 16 23:18:19 HA-voice-2 run[945]: DEBUG:root:Event(type='transcript', data={'text': ' Turn off study light.'}, payload=None)
Dec 16 23:18:20 HA-voice-2 run[945]: INFO:root:Waiting for wake word
Dec 16 23:18:20 HA-voice-2 run[945]: DEBUG:root:Connected to snd service
Dec 16 23:18:20 HA-voice-2 run[1321]: Playing raw data 'stdin' : Signed 16 bit Little Endian, Rate 22050 Hz, Mono
Dec 16 23:18:20 HA-voice-2 run[945]: DEBUG:root:Event(type='synthesize', data={'text': 'Turned off light', 'voice': {'name': 'en_GB-alan-low'}}, payload=None)
Dec 16 23:18:21 HA-voice-2 run[945]: DEBUG:root:Connected to snd service

Thanks, that’s what I did, even though there’s no information written down anywhere.

It would be awesome if both Nabu Casa and third parties could replicate the functionality and relatively high-quality audio of the speaker and microphone hardware in the second-generation Google Nest Mini for under $99, with an ARM64 SoC powerful enough to handle custom wake words onboard.

IMHO, replicating the whole Google Home / Google Nest smart speaker and smart display series with fully Open-Source Hardware (OSH) compliant projects at affordable prices should be the goal!

It would be great if we could order such products the way we can build or buy the Home Assistant Yellow today!

1 Like

I would be very surprised if this ever happens. Not because it wouldn’t be great (I think we would all love something like this), but because that hardware from Google, Amazon, or Apple is actually worth far more than you can buy it for. The companies are prepared to lose money when they sell them because they expect a financial return on the insight they gain from the voice data they collect. I think this is part of the reason many of the original suppliers are losing money on their smart speaker bets and are closing or reducing services. This would certainly not be possible for any open source project. That doesn’t mean I wouldn’t love to see it, but I won’t hold my breath on this one.

2 Likes

Which quality aspect hurts the most for you? Reliable wake word? Range? Hardware build quality? Reliable command recognition?

Why I’m asking: with Year of the Voice where it stands, I’m wondering how much of the delta between HA on one side and Google/Alexa on the other is due to hardware, code, training data, algorithms, …
Depending on the ratio, the potential to close the gap might be anything between “totally doable” and “sorry, this is the maximum for a non-billionaire-run company”.

1 Like

Can you - or will you be able to - use the Box-3’s screen to display sensors, buttons and the like? I currently use an NSPanel with blackymas’ excellent blueprint and am experimenting with M5Stack Core2 running OpenHASP. It would be amazing if the Box-3 acted as both a voice satellite and a touch-screen interface for HA at the same time.

Same for me

Unfortunately that was a minor oversight: the only sentence that works is the plural form. I have an open PR that fixes this and also makes (basically all) other supported voice actions area aware.

In the meantime, use “turn off the lights” and it will work.

2 Likes

I currently display an HA sensor together with a clock while the device also acts as a voice satellite, once it’s idle. I just modified their YAML a bit.
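
For anyone wanting to try the same, here is a rough sketch of the kind of additions involved, assuming the stock Box-3 ESPHome config already defines the display and its lambda; the entity ID, font, and coordinates below are placeholders, not my actual config:

time:
  - platform: homeassistant
    id: ha_time

sensor:
  - platform: homeassistant
    id: study_temp
    entity_id: sensor.study_temperature

font:
  - file: "gfonts://Roboto"
    id: font_main
    size: 24

# Then, inside the existing display lambda, draw the clock and the
# sensor value while the satellite is idle:
#   it.strftime(10, 10, id(font_main), "%H:%M", id(ha_time).now());
#   it.printf(10, 50, id(font_main), "Study: %.1f°C", id(study_temp).state);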

1 Like

Answering my own question. You have to define a separate sentence with a context requirement, like this:

- sentences:
    - "(whats | what's | what is) [current] humidity"
  requires_context:
    area:
      slot: true

The area name will then be available in the slot, and can be used in service calls.
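
For illustration, here is one way the captured slot could be consumed in an intent_script; the intent name GetHumidity and the sensor.<area>_humidity naming scheme below are just assumptions for the sketch, not from my setup:

# configuration.yaml (sketch): the custom sentence above would be
# registered under this intent name in custom_sentences/<lang>/<file>.yaml
intent_script:
  GetHumidity:
    speech:
      text: >-
        Humidity in {{ area }} is
        {{ states('sensor.' ~ (area | slugify) ~ '_humidity') }} percent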

100% this. Expecting an open source software project to provide you with hardware at a similar price point and spec to what for-profit companies produce by the millions as a loss leader to mine your personal data is just not going to happen.

The price that Nabu Casa would have to sell these small-batch, niche devices for just to break even would lose the interest of most people.

2 Likes

Are you sharing your code?

This is awesome, but speech recognition in Spanish is awful. I haven’t been able to get any sentence correctly recognised locally.
I like Assist, but I can’t even run tests because of this.
Until this is solved, the Year of the Voice is only for English or for the Nabu Casa cloud, not for the fully local solution, which is the goal for many of us.

1 Like

My understanding is that the currently available hardware is up to the task - the limitation is that the Digital Signal Processing (DSP) software magic is pretty much all currently proprietary.

Seeed have demos of using the multiple mics on their reSpeaker products - but their demos are not open source :frowning: and their device driver uses only one mic. Espressif’s software for the ESP-S3 BOX is really impressive - but it only works with their ESP-IDF, uses several key proprietary modules, and doesn’t do more than their demo. To be fair, neither of these companies is trying to sell product directly to the public - they are selling the building blocks for other companies to use in their own products. And those other companies (quite reasonably) want to get the most advantage from their own software investment … which means locking customers in to cloud servers :frowning:

Agreed that no FOSS project can subsidise the hardware.

BUT it seems to me that we are currently spending about twice as much (for ESP-S3 BOX, RasPi + Jabra conference speaker, RasPi + reSpeaker, etc) as for the for-profit mass-produced devices, and getting lower quality - most of which is down to the lack of DSP. I believe that with better firmware/software that price difference can be justified.

Mike has hinted several times throughout the year that Nabu Casa do see voice satellites as a key component of Home Assistant, and are closely watching developments in that area. In Chapter 5 we see that Nabu Casa have put in the effort to bring the ESP32-S3 BOX 3 from ESP-IDF to ESPHome. I understand that the ESP32-S3 chip includes some specialised AI hardware which could potentially be used to improve sound quality, wake word detection, and some of the other DSP magic. I don’t care if some parts of it are proprietary and source code is not available (like RasPi video) - more important that it works reliably, independently, and LOCALLY.

The ESP S3 BOX 3 with ESPHome seems pretty close to my ideal voice satellite … and yet it was developed to showcase Espressif’s hardware, not to be sold directly to the public as a finished product. Once the software is more mature, I anticipate Nabu Casa could leverage its ESP expertise to tweak the hardware and market a Voice Assistant Satellite, in much the same way that Home Assistant Blue, Yellow and Green are not totally new products developed from the ground up. Did “these small-batch, niche devices” “lose the interest of most people” and lose money for Nabu Casa?

Just my thoughts. Nabu Casa will make its own decisions based on its own business intelligence and other factors. For the present, with Nabu Casa support, I am happy to recommend the ESP-S3 BOX 3 running ESPHome as the preferred voice satellite, with the M5 ATOM Echo as an option to have lots of cheap “ears” around the house.

2 Likes

To not clog the thread with generic content not specifically related to Chapter 5, I set up a dedicated poll over in the voice assistant section of the forum:

2 Likes

I completely agree. Just wanted to add that the BOX/Atom solution should have local wake word recognition first - otherwise the HA server will be pretty overloaded. It’s on the roadmap, and the Espressif demo does it, so it’s possible. Fingers crossed. :slight_smile:

Why? I don’t think it’s stable enough to use if it’s not included by the HA devs themselves. I can share, but if you can’t do it yourself, is it worth bringing in more potential bugs? :slight_smile:

It seems that this feature isn’t available in French, so I opened an issue on GitHub

1 Like

I can’t remember anymore :smile: A Wyoming/Hermes bridge has been on my TODO list forever. For the purposes of controlling LEDs, the bridge could be pretty simple, as you just need to know about specific states.