Year of the Voice - Chapter 5

Can you - or will you be able to - use the Box-3’s screen to display sensors, buttons and the like? I currently use an NSPanel with blackymas’ excellent blueprint and am experimenting with M5Stack Core2 running OpenHASP. It would be amazing if the Box-3 acted as both a voice satellite and a touch-screen interface for HA at the same time.

Same for me

Unfortunately that was a minor oversight: the only sentence that works is the one with the plural form. I have an open PR that fixes this and also makes (basically all) other supported voice actions area-aware.

In the meantime, use “turn off the lights” and it will work.


Right now I display an HA sensor together with a clock once the device is idle, and it still works as a voice satellite. I just modified their YAML a bit.


Answering my own question: you have to define a separate sentence with a context requirement, like this:

- sentences:
    - "(whats | what's | what is) [current] humidity"
  requires_context:
    area:
      slot: true

The area name will then be available in a slot and can be used in service calls.
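For anyone wanting to try this, here is a minimal sketch of where such a sentence might live, assuming the usual custom sentences layout. The file name `humidity.yaml` and the intent name `AreaHumidity` are made up for illustration and are not built-in names.

```yaml
# config/custom_sentences/en/humidity.yaml (file name is illustrative)
language: "en"
intents:
  # Hypothetical custom intent name, not a built-in intent
  AreaHumidity:
    data:
      - sentences:
          - "(whats | what's | what is) [current] humidity"
        # Only match when the request carries an area,
        # and copy that area into a slot named "area"
        requires_context:
          area:
            slot: true
```

The slot could then be referenced from an `intent_script` entry. This part is an assumption based on the post above: whether `{{ area }}` renders as the area name or something else may depend on your Home Assistant version, so test it first.

```yaml
# configuration.yaml
intent_script:
  # Must match the hypothetical intent name used in the sentences file
  AreaHumidity:
    speech:
      text: "You asked about the humidity in {{ area }}."
```

A spoken reply like this is a quick way to confirm what the slot contains before using it in an `action:` block.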

100% this. Expecting an open source software project to provide you with hardware at a similar price point and spec to what for-profit companies produce by the millions as a loss leader to mine your personal data is just not going to happen.

The price that Nabu Casa would have to sell these small-batch, niche devices for just to break even would lose the interest of most people.


Are you sharing your code?

This is awesome, but speech recognition in Spanish is awful. I haven’t been able to get a single sentence correctly recognised locally.
I like Assist, but I can’t even test it because of this.
Until this is solved, the Year of the Voice is only for English or for the Nabu Casa cloud, not for the local solution that is the goal for many of us.


My understanding is that hardware currently available is up to the task - the limitation is that the Digital Signal Processing software magic is pretty much all currently proprietary.

Seeed have demos of using the multiple mics on their reSpeaker products - but their demos are not open source :frowning: and their device driver uses only one mic. Espressif’s software for the ESP-S3 BOX is really impressive - but only works with their IDE, uses several key proprietary modules, and doesn’t do more than their demo. To be fair, neither of these companies is trying to sell product direct to public - they are selling the building blocks for other companies to use in their own products. And these other companies are (quite reasonably) wanting to get the most advantage from their own software investment … which means locking customers in to cloud servers :frowning:

Agreed that no FOSS project can subsidise the hardware.

BUT it seems to me that we are currently spending about twice as much (for ESP-S3 BOX, RasPi + Jabra conference speaker, RasPi + reSpeaker, etc) as the for-profit mass-produced devices, and getting lower quality - most of which is the lack of DSP. I believe that with better firmware/software that price difference can be justified.

Mike has hinted several times throughout the year that Nabu Casa do see voice satellites as a key component of Home Assistant, and are closely watching developments in that area. In Chapter 5 we see that Nabu Casa have put in effort to bring the ESP32-S3-BOX-3 from ESP-IDF to ESPHome. I understand that the ESP32-S3 chip includes some specialised AI hardware which could potentially be used to improve sound quality, wake word detection, and some of the other DSP magic. I don’t care if some parts of it are proprietary and source code is not available (like RasPi video) - it is more important that it works reliably, independently, and LOCALLY.

The ESP S3 BOX 3 with ESPHome seems pretty close to my ideal voice satellite … and yet it was developed to showcase Espressif’s hardware, not to sell direct to the public as a finished product. Once the software is more mature, I anticipate Nabu Casa could leverage its ESP expertise to tweak the hardware and market a Voice Assistant Satellite, in much the same way that Home Assistant Blue, Yellow and Green are not totally new products developed from the ground up. Did “these small-batch, niche devices” “lose the interest of most people” and lose money for Nabu Casa?

Just my thoughts. Nabu Casa will make its own decisions based on its own business intelligence and other factors. For the present time, with Nabu Casa support, I am happy to recommend the ESP-S3 BOX 3 running ESPHome as the preferred voice satellite, with the M5 ATOM Echo as an option to have lots of cheap “ears” around the house.


To avoid clogging the thread with generic content not specifically related to Chapter 5, I set up a dedicated poll over in the voice assistant section of the forum:


I completely agree. I just wanted to add that the BOX/Atom solution should have local wake word recognition first - otherwise the HA server will be pretty overloaded. It’s on the roadmap, and the Espressif demo does it - so it’s possible. Fingers crossed. :slight_smile:

Why? I don’t think it’s stable enough to use, if it’s not included by HA devs themselves. I can share, but if you can’t do it yourself, is it worth bringing more potential bugs? :slight_smile:

It seems that this feature isn’t available in French, so I opened an issue on GitHub.


I can’t remember anymore :smile: A Wyoming/Hermes bridge has been on my TODO list forever. For the purposes of controlling LEDs, the bridge could be pretty simple as you just need to know about specific states.

I have a BOX-3 and 2 Atom Echos running local-only. I am very happy with the 3 separate wake words I created for them. As a novice I am very happy with them: fast wake word action and replies. I would love to use the presence detection as well.

For now, a happy novice camper.

Whisper (1.0.0), Piper (1.4.0), openWakeWord (1.8.2), ESPHome (beta) (2023.11.6)


I am happy to see the Year of the Voice working well for ESP32 & RPi devices.

We standardized on Galaxy Tab A8s for their performance, price, and high-resolution screen. These Android tablets have sufficient processing power to handle a lot of the functionality, but I am limited because I have to use the Fully Kiosk Browser for these always-on wall panels. WallPanel.xyz is not viable because Assist does not function at all.

I would love to see the native Home Assistant Android app provide similar functionality, including on-device wake word detection.


Would you consider posting this, or the equivalent of this, on the ESPHome “DIY Examples” page to make it more available to others?

I think it would be very helpful to others and it would be located in a place where many might look for it.
If there is a better place to document this, that would be OK as well. I couldn’t think of one, however. I wanted to be able to do this, and bumped into a link to here in an [FR] forum post, so thanks for your insight. It just was kind of buried here low in another thread.

Nokia do quite an interesting tablet, thanks to its twin-mic design and audio DSP with its OZO Spatial Audio Recording and Playback software.
The Nokia T21 is relatively budget but all-metal, 10.4", for £175, and you can pick up the relatively similar T20 for £100.

There is also the cheaper Alcatel, which also has 2 mics and starts at £100 or lower.
It also boasts noise cancellation: +3m far-field.

Software could likely be written for them.

I spent some time on this before Xmas and purchased two 1080p pan-tilt cams with two-way audio, the kind many will have seen for $25.
I got an IMUI Ranger2 and a generic one, and the audio on both, for output and mic alike, is pretty awful.
So I scrapped that idea until a manufacturer includes an isolated far-field mic, as I think they could be perfect as a device.

The tablets above are just the best budget ones I could find that have a mic array built in; they could very well make excellent dev bases.
Samsung is also a big name, but seemed to be lacking mic arrays for far field.

https://www.nokia.com/phones/en_gb/nokia-t-21?sku=719901216501

I would LOVE support for the Alcatel tablets, as I already have two of them as wall-panels :stuck_out_tongue:
The ones I have also come with an ‘audio station’ dock that has some very nice speakers in it.

That seems to be a slightly newer version than the ones I have;