Year of the Voice - Chapter 5

Answering my own question: you have to define a separate sentence with a context requirement, like this:

- sentences:
    - "(whats | what's | what is) [current] humidity"
  requires_context:
    area:
      slot: true

Then the area name will be in the slot and can be used for services.
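As a rough sketch of how the snippet above might fit into a complete setup, the following assumes a custom sentence file at `custom_sentences/en/humidity.yaml` plus an `intent_script` entry in `configuration.yaml`. The intent name `CurrentHumidity`, the file name, and the notification action are hypothetical and chosen only for illustration. Whether the slot carries the area's name or its ID may depend on your Home Assistant version, so treat the templates as something to verify.

```yaml
# custom_sentences/en/humidity.yaml (hypothetical file name)
language: "en"
intents:
  CurrentHumidity:  # hypothetical intent name
    data:
      - sentences:
          - "(whats | what's | what is) [current] humidity"
        # Only match when the request comes with an area in its context
        # (e.g. the satellite is assigned to an area), and copy that
        # area into a slot named "area".
        requires_context:
          area:
            slot: true
```

```yaml
# configuration.yaml
intent_script:
  CurrentHumidity:  # must match the intent name used above
    action:
      # Illustrative action only: the "area" slot is used in a service call.
      - service: persistent_notification.create
        data:
          message: "Humidity was requested for {{ area }}"
    speech:
      text: "Checking the humidity in {{ area }}."
```

If the satellite has no area assigned, the sentence should simply not match, which is why a separate sentence without the context requirement is still useful as a fallback.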

100% this. Expecting an open source software project to provide you with hardware at a similar price point and spec to what for-profit companies produce by the millions, as a loss leader to mine your personal data, is just not going to happen.

The price that Nabu Casa would have to sell these small-batch, niche devices for just to break even would lose the interest of most people.

Are you sharing your code?

This is awesome, but speech recognition in Spanish is awful. I haven’t been able to get a single sentence correctly recognised locally.
I like Assist, but I can’t even test it because of this.
Until this is solved, the Year of the Voice is only for English or for Nabu Casa Cloud, not for the local solution that is the goal for many of us.

My understanding is that hardware currently available is up to the task - the limitation is that the Digital Signal Processing software magic is pretty much all currently proprietary.

Seeed have demos of using the multiple mics on their reSpeaker products - but their demos are not open source :frowning: and their device driver uses only one mic. Espressif’s software for the ESP-S3 BOX is really impressive - but only works with their IDE, uses several key proprietary modules, and doesn’t do more than their demo. To be fair, neither of these companies is trying to sell product direct to public - they are selling the building blocks for other companies to use in their own products. And these other companies are (quite reasonably) wanting to get the most advantage from their own software investment … which means locking customers in to cloud servers :frowning:

Agreed that no FOSS project can subsidise the hardware.

BUT it seems to me that we are currently spending about twice as much (for ESP-S3 BOX, RasPi + Jabra conference speaker, RasPi + reSpeaker, etc) as the for-profit mass-produced devices, and getting lower quality - most of which is the lack of DSP. I believe that with better firmware/software that price difference can be justified.

Mike has hinted several times throughout the year that Nabu Casa do see voice satellites as a key component of Home Assistant, and are closely watching developments in that area. In Chapter 5 we see that Nabu Casa have put in effort to bring the ESP32-S3-BOX-3 from ESP-IDF to ESPHome. I understand that the ESP32-S3 chip includes some specialised AI hardware which could potentially be used to improve sound quality, wake word detection, and some of the other DSP magic. I don’t care if some parts of it are proprietary and source code is not available (like RasPi video) - it is more important that it works reliably, independently, and LOCALLY.

The ESP32-S3-BOX-3 with ESPHome seems pretty close to my ideal voice satellite … and yet it was developed to showcase Espressif’s hardware, not to sell direct to the public as a finished product. Once the software is more mature, I anticipate Nabu Casa could leverage its ESP expertise to tweak the hardware and market a Voice Assistant Satellite, in much the same way that Home Assistant Blue, Yellow and Green are not totally new products developed from the ground up. Did “these small-batch, niche devices” “lose the interest of most people” and lose money for Nabu Casa?

Just my thoughts. Nabu Casa will make its own decisions based on its own business intelligence and other factors. For the present time, with Nabu Casa support, I am happy to recommend the ESP32-S3-BOX-3 running ESPHome as the preferred voice satellite, with the M5 ATOM Echo as an option to have lots of cheap “ears” around the house.

To avoid clogging the thread with generic content not specifically related to Chapter 5, I set up a dedicated poll over in the voice assistant section of the forum:

I completely agree. Just wanted to add that the BOX/Atom solution should have local wake word recognition first - otherwise the HA server will be pretty overloaded. It’s on the roadmap, and the Espressif demo does it - so it’s possible. Fingers crossed. :slight_smile:
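To illustrate where the server load comes from, here is a minimal ESPHome sketch for an M5 ATOM Echo style satellite using server-side wake word detection. The device name is hypothetical, and the pin numbers are assumptions based on commonly published ATOM Echo configurations, so check them against your own board. With `use_wake_word: true`, the satellite continuously streams microphone audio to Home Assistant, where the wake word engine (e.g. openWakeWord) runs; every satellite configured this way adds another audio stream for the server to process.

```yaml
esphome:
  name: atom-echo-satellite  # hypothetical device name

esp32:
  board: m5stack-atom
  framework:
    type: esp-idf

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

# Native API connection to Home Assistant
api:

# I2S bus shared by the audio components (pins are assumptions)
i2s_audio:
  i2s_lrclk_pin: GPIO33
  i2s_bclk_pin: GPIO19

microphone:
  - platform: i2s_audio
    id: echo_microphone
    i2s_din_pin: GPIO23
    adc_type: external
    pdm: true

voice_assistant:
  microphone: echo_microphone
  # Wake word detection happens in the Assist pipeline on the HA server,
  # not on the device, so audio is streamed continuously.
  use_wake_word: true
```

On-device wake word detection would remove that constant stream, which is the point being made above.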

Why? I don’t think it’s stable enough to use, if it’s not included by HA devs themselves. I can share, but if you can’t do it yourself, is it worth bringing more potential bugs? :slight_smile:

It seems that this feature isn’t available in French, so I opened an issue on GitHub.

I can’t remember anymore :smile: A Wyoming/Hermes bridge has been on my TODO list forever. For the purposes of controlling LEDs, the bridge could be pretty simple as you just need to know about specific states.

I have a BOX-3 and 2 Atom Echos running local only. Very happy with the 3 separate wake words I created for them. As a novice I am very happy with them: fast wake word action and replies. Would love to use the presence detection as well.

For now, a happy novice camper.

Whisper (1.0.0), Piper (1.4.0), openWakeWord (1.8.2), ESPHome (beta) (2023.11.6)

I am happy to see the Year of the Voice working well for ESP32 & RPi devices.

We standardized on Galaxy Tab A8s for their performance, price, and high-resolution screen. These Android tablets have sufficient processing power to handle a lot of the functionality, but I am limited because I have to use Fully Kiosk Browser for these always-on wall panels. WallPanel.xyz is not viable because Assist does not function at all.

I would love to see the native Home Assistant Android app provide similar functionality, including on-device wake word detection.

Would you consider posting this, or the equivalent of it, on the ESPHome “DIY Examples” page to make it more available to others?

I think it would be very helpful to others and it would be located in a place where many might look for it.
If there is a better place to document this, that would be OK as well. I couldn’t think of one, however. I wanted to be able to do this, and bumped into a link to here in an [FR] forum post, so thanks for your insight. It just was kind of buried here low in another thread.

Nokia do quite an interesting tablet, thanks to its twin-mic design and audio DSP with its OZO Spatial Audio Recording and Playback software.
The Nokia T21 is relatively budget but all-metal, 10.4", for £175, and you can pick up the fairly similar T20 for £100.

There is also the cheaper Alcatel, which also has 2 mics and starts at £100 and lower.
It also boasts noise cancellation: +3m far-field.

Software could likely be written for these.

I spent some time on this before Xmas and purchased 2 of the 1080p pan-tilt cams with 2-way audio that many will have seen for $25.
I got an Imou Ranger 2 and a generic one, and the audio on both, for output as well as mic, is pretty awful.
So I scrapped that idea until a manufacturer maybe does include an isolated far-field mic, as I think they could be perfect as a device.

The tablets above, though, are just the best budget ones I could find that have a mic array built in, and they could very well make excellent dev bases.
Samsung is also a big name, but seemed to be lacking mic arrays for far field.

https://www.nokia.com/phones/en_gb/nokia-t-21?sku=719901216501

I would LOVE support for the Alcatel tablets, as I already have two of them as wall-panels :stuck_out_tongue:
The ones I have also come with an ‘audio station’ dock that has some very nice speakers in it.

That seems to be a slightly newer version than the ones I have;

I wasn’t sure about the Alcatel 3T10, as it’s only the 3T10 that has 2 mics and far-field technology.
In charging mode it auto-activates Google Assistant, and I wasn’t sure how much you could hack out of the ROM.
I have purchased a Nokia T20 (it was second-hand and cheap), purely to test what to expect in terms of far field and noise, as the Android version looked more vanilla and editable, though I could be wrong.
I think the 2020 is a slight upgrade, but both have the 2-mic far field.
The Nokia has a slightly better CPU with 2x Cortex-A75 & 6x A55, but really it’s any tablet with an array microphone that supports far field, as the Alcatel does.
I might try to get the 3T10 as well and show you, as there are Android/Gradle solutions on GitHub, and I will have to check the store.

Apparently Magisk (GitHub: topjohnwu/Magisk, “The Magic Mask for Android”) will run on the 3T10, which would likely make everything customisable. It might not even be needed.
You can still test the 3T10 with Google Assistant and give some feedback on how it does with far-field distance and third-party noise.

The display resolution is too low for our dashboards; my partner does not want to scroll.

The Galaxy A8 resolution is 1920x1200 powered by 2x Cortex-A75 + 6x Cortex-A55 running at 2.0GHz. I cannot find anything about the microphone(s) however.

Now I am pondering connecting some sort of microphone array via USB-C or a nearby ESP32-S3…then overlay something when listening on the tablet screen….#OverlyComplicated #ReachAroundMyElbowToGetToMyAss

Yeah, the Alcatel resolution is not that great, but the Nokia T20 I went for is also 2K (1200x2000) and, I’m pretty sure, has exactly the same CPU, also 2x A75 + 6x A55.

The Samsung has only a single mic and likely only works well near field. The reason to get a tablet with a mic array is that it likely already has mic array DSP software built in, as both the Alcatel and Nokia seem to.

Looking at what has been done so far with the ESP32-S3-Box, which from reading the forum doesn’t seem to work that well with HA, you can get tablets that are 2x the cost but are a ready-made product that can also do much else.

I can make a 2-mic beamformer on a Pi Zero 2 and have a repo with a C/C++ delay-sum beamformer, but the little bit of soldering and tinkering needed seems to be off-putting to many who just want a finished product such as a conference mic.

The Alcatel and Nokia are the only 2 budget Android tablets I have researched so far, but none of the well-priced Samsung tablets seemed to have mic arrays or much interest in that direction.
I think most of the iPads have array mics, but I haven’t looked as they carry that Apple $ premium.

If anyone knows of any other mic array tablets, please say, as you really do have to dredge the specs to be sure. For example, the Alcatel 3T8 is single-mic whilst the 3T10 is dual.

So it’s any ready-made tablet that already has a mic array and software, and hopefully is budget; if anyone knows of others, please post.
I will probably post some APKs here that anyone can sideload, as that will likely bypass much of what I don’t like, and try to use the Conversation API (Home Assistant Developer Docs) directly.

I’m trying to use area awareness with the German voice assistant, and it only works for lights.

How can I use your example to have area awareness for the switch and cover domains? I tried this, but it does not work for the switch domain:

# Example on_off.yaml entry
language: "de"
intents:
  HassTurnOn:
    data:
      - sentences:
          - "schalte <name> ein [(in|im) <area>]"
          - "schalte <name> [(in|im) <area>] ein"
        requires_context:
          domain: "switch"
      - sentences:
          - "schalte <name> ein"
        requires_context:
          domain: "switch"
          area:
            slot: true
  HassTurnOff:
    data:
      - sentences:
          - "schalte <name> aus [(in|im) <area>]"
          - "schalte <name> [(in|im) <area>] aus"
        requires_context:
          domain: "switch"
      - sentences:
          - "schalte <name> aus"
        requires_context:
          domain: "switch"
          area:
            slot: true