Year of the Voice - Chapter 5

stuartiannaylor · December 29, 2023, 11:36pm

I wasn’t sure about the Alcatel 3T10, as it’s only the 3T10 that has 2 mics and far-field technology.
In charging mode it auto-activates Google Assistant, and I wasn’t sure how much you could hack out of the ROM.
I have purchased a Nokia T20 (it was 2nd user and cheap), purely to test what to expect in terms of far field and noise, as its Android version looked more vanilla and editable, though I could be wrong.
I think the 2020 is a slight upgrade, but both have the 2-mic far field.
The Nokia has a slightly better CPU with 2x Cortex-A75 & 6x A55, but really it’s any tablet with an array microphone that supports far field, as the Alcatel does.
I might try and get the 3T10 also and show you, as there are Android/Gradle solutions on GitHub, and I will have to check the store.

Apparently Magisk (GitHub - topjohnwu/Magisk: The Magic Mask for Android) will run on the 3T10, so it is likely all customisable. It might not even be needed.
You can still test the 3T10 with Google Assistant and give some feedback on how it does with far-field distance and 3rd-party noise.

PureeTofu · December 31, 2023, 4:59pm

The display resolution is too low for our dashboards, my partner does not want to scroll.

The Galaxy Tab A8 resolution is 1920x1200, powered by 2x Cortex-A75 + 6x Cortex-A55 running at 2.0GHz. I cannot find anything about the microphone(s), however.

Now I am pondering connecting some sort of microphone array via USB-C or a nearby ESP32-S3…then overlay something when listening on the tablet screen….#OverlyComplicated #ReachAroundMyElbowToGetToMyAss

stuartiannaylor · December 31, 2023, 5:35pm

Yeah, the Alcatel resolution is not that great, but the Nokia T20 I went for is 2K (1200x2000), and I’m pretty sure it has exactly the same CPU, as it is also 2x A75 + 6x A55.

The Samsung is only a single mic and likely only works well near field. The reason to get a tablet with a mic array is that it likely already has mic-array DSP software built in, as both the Alcatel & Nokia seem to.

Looking at what has been done so far with the ESP32-S3-Box, which from reading the forum doesn’t seem to work that great with HA, you can get tablets that are 2x the cost but are a ready-made product that can also do much else.

I can make a beamforming 2-mic on a PiZero2 and have a repo with a C/C++ delay-sum beamformer, but the little bit of soldering and tinkering to create a finished product seems to be off-putting to many who just want a finished product such as a conference mic.

The Alcatel & Nokia are the only 2 budget Android tablets I have researched so far; none of the great-priced Samsung tablets seemed to have mic arrays or much interest that way.
I think most of the iPads have array mics, but I haven’t looked, as they carry that Apple $ premium.

If anyone knows of any other mic-array tablets please say, as you really do have to dredge the specs to be sure. The Alcatel 3T8, for example, is single mic whilst the 3T10 is dual, so you really have to check the specs.

So it’s any ready-made tablet that already has a mic array and software, hopefully budget, and if anyone knows of others please post.
I will probably post some APKs here that anyone can sideload; they will likely bypass much of what I don’t like and try to use the Conversation API (Home Assistant Developer Docs) directly.

ha_frw · December 31, 2023, 6:34pm

I’m trying to use area awareness with the German voice assistant, and it only works for lights.

How can I use your example to get area awareness for the switch and cover domains? I tried this, but it does not work for the switch domain:

# Example on_off.yaml entry
language: "de"
intents:
  HassTurnOn:
    data:
      - sentences:
          - "schalte <name> ein [(in|im) <area>]"
          - "schalte <name> [(in|im) <area>] ein"
        requires_context:
          domain: "switch"
      - sentences:
          - "schalte <name> ein"
        requires_context:
          domain: "switch"
          area:
            slot: true
  HassTurnOff:
    data:
      - sentences:
          - "schalte <name> aus [(in|im) <area>]"
          - "schalte <name> [(in|im) <area>] aus"
        requires_context:
          domain: "switch"
      - sentences:
          - "schalte <name> aus"
        requires_context:
          domain: "switch"
          area:
            slot: true

formatBCE · December 31, 2023, 6:58pm

My conversation part is identical to yours, but I use custom intent scripts rather than extending existing intents. AFAIK light context works by default (IDK about switch), so it looks like your changes don’t affect the default intent.

ha_frw · December 31, 2023, 7:26pm

Would you be willing to share your code with the custom intent script?

PureeTofu · December 31, 2023, 7:48pm

Wow, I wish I had known about the NOKIA T20 when I was shopping. I managed to get the Galaxy Tab A8s on sale for $109 USD, including two-year warranties.

I am okay with near-field assistant use for the time being. If beamforming or similar becomes necessary, then I would be willing to solder something together, especially if I can recycle some of my ESP32 or ODROID N2+ boards.

My biggest hurdle right now is that I have not found any method to have FullyKiosk “always listen” for the wake word.

stuartiannaylor · December 31, 2023, 8:03pm

I think they had supply problems and made a revision, hence the T21, which is nearly the same.
The T21 is £175, but the remaining T20 stock is being sold off for £100 (I got a 2nd-user T20 for dev for £80).

I think I can write a manifest so that certain apps run when it is in charging mode.
So the idea is to get a wireless charging dock, so that when docked it works like Google Assistant and in your hand it is the normal tablet.
Dunno, but I’ve been meaning to play with Android Studio and have already done a bit of research, and most of the code already exists on GitHub.

formatBCE · December 31, 2023, 8:05pm

Sorry, away from PC now. But it should be just switch.turn_off service with template for switches in area. You have to create your own name for this intent though - like “MySwitchTurnOff” instead of “HassTurnOff”, to avoid clashing.
P.S. like here: Assist - custom sentences - Home Assistant
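A minimal sketch of that suggestion (a custom intent name instead of HassTurnOff, plus a switch.turn_off call templated over the switches in the area). The intent name MySwitchTurnOff, the file name and the German sentence are hypothetical examples, and it assumes the request comes from a satellite that has an area assigned, so that requires_context with area slot: true fills in the area slot:

```yaml
# File 1 (hypothetical name): custom_sentences/de/switch_area.yaml
language: "de"
intents:
  MySwitchTurnOff:             # own name, so it does not clash with HassTurnOff
    data:
      - sentences:
          - "schalte [alle] Schalter aus"
        requires_context:
          area:
            slot: true         # area of the satellite becomes the "area" slot
---
# File 2: matching intent_script entry in configuration.yaml
intent_script:
  MySwitchTurnOff:
    action:
      # turn off every switch entity assigned to the satellite's area
      - service: switch.turn_off
        target:
          entity_id: >-
            {{ expand(area_entities(area))
               | selectattr('domain', 'eq', 'switch')
               | map(attribute='entity_id') | list }}
    speech:
      text: "Schalter in {{ area }} ausgeschaltet"
```

expand() turns the area’s entity IDs into state objects so they can be filtered on their domain. The same pattern should carry over to covers by filtering on 'cover' and calling cover.close_cover instead.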

ha_frw · December 31, 2023, 8:33pm

Thank you, I tried it, but it still does not recognize the area.

Sharing it some day later, when you are online again, would be appreciated.

formatBCE · December 31, 2023, 8:51pm

Yeah will do later.

Make sure you’re calling it from a satellite with an area assigned. E.g. from a phone it won’t work, because phone Assist doesn’t have an area.

UPD: Here’s my code to ask for current humidity:

yaml file in custom_sentences folder:

language: "en"
intents:
  GetHumidityInArea:
    data:
      - sentences:
          - "(whats | what's | what is) [current] <area> humidity"
          - "(whats | what's | what is) humidity in [the] <area> [currently]"
          - "(whats | what's | what is) [current] humidity in [the] <area>"
      - sentences:
          - "(whats | what's | what is) [current] humidity"
        requires_context:
          area:
            slot: true

corresponding section in configuration.yaml:

intent_script:
  GetHumidityInArea:
    action:
      - service: script.humidity_get_for_area
        data:
          f_area: "{{ area }}"
        response_variable: result
      - stop: ""
        response_variable: result
    speech:
      text: |-
        {% set hum = action_response["humidity"] %}
        {% if hum > 0 %}
          Humidity is {{ hum }}%
        {% else %}
          Can't get humidity data
        {% endif %}

Here you can see how I use the area in the script field ("{{ area }}").
That script takes the humidity level in the area and returns it in result. Then I take the value from result via action_response["humidity"] and speak it.

So you can do the same. I have plenty of that already.

ha_frw · January 1, 2024, 6:42am

Thank you! and HNY!
My use case was to say "turn on TV" and, depending on the area, have the TV in that area turned on.

So I guess the magic is your script: script.humidity_get_for_area to find the humidity device for that area, would you mind sharing that?

formatBCE · January 1, 2024, 7:46am

HNY!
Will post my script. It’s pretty easy. Drinking now

RT1080 · January 1, 2024, 7:26pm

Interesting, in my case it picks up the temperature sensor of my Aqara Zigbee unit, whilst I have multiple climate entities. I have multiple sensors; is there a way to set a preference in the area tab?

intent:
  name: HassGetState
slots:
  name: Living room
  domain: sensor
  device_class: temperature
details:
  name:
    name: name
    value: Living room
    text: living room
  domain:
    name: domain
    value: sensor
    text: ''
  device_class:
    name: device_class
    value: temperature
    text: ''
targets:
  sensor.sensor_livingroom:
    matched: true

formatBCE · January 2, 2024, 2:47am

Don’t think so, that’s why I have my own scripts for things like this. I guess it picks the first in the list by default.

formatBCE · January 2, 2024, 2:51am

So my script is like this:

alias: "Humidity: get for area"
sequence:
  - variables:
      result:
        humidity: >-
          {{ (expand(states.climate)|selectattr('entity_id', 'in',
          area_entities(f_area))|first).attributes.current_humidity }}
  - stop: Result
    response_variable: result
mode: parallel
icon: mdi:water-percent
fields:
  f_area:
    selector:
      text: null
    name: Area
    required: true
max: 10

The case is that in all areas of interest I have Mysa thermostats, which have humidity readings, so for uniformity I use them. They’re pretty accurate as well.
But if you want an exact set of sensors, just create a group with those sensors (make sure you have an area set for each of them), and use expand('group.my_humidity_sensors') instead of the climate entities.
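A sketch of that group-based variant, assuming a group named my_humidity_sensors (the entity IDs below are placeholders). One difference from the climate version: a humidity sensor reports its value as the entity state, not in a current_humidity attribute. The default(0) fallback is an addition, so that the existing speech template says "Can't get humidity data" instead of erroring when no sensor in the area matches:

```yaml
# configuration.yaml: hypothetical group of preferred humidity sensors
group:
  my_humidity_sensors:
    name: My humidity sensors
    entities:
      - sensor.living_room_humidity   # placeholder entity IDs
      - sensor.bedroom_humidity
---
# "Humidity: get for area" script, sequence reading the group instead of climate entities
sequence:
  - variables:
      result:
        humidity: >-
          {{ expand('group.my_humidity_sensors')
             | selectattr('entity_id', 'in', area_entities(f_area))
             | map(attribute='state')
             | first | default(0) | float(0) }}
  - stop: Result
    response_variable: result
```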

ha_frw · January 2, 2024, 6:20am

Appreciated! I even got mine working for TVs in different areas with area awareness. Since that doesn’t work in the German translation, I had to build it for lights, TV and covers. It’s all good and working now.

The only thing I still struggle with is brightness control with area awareness.
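One way brightness could be handled with the same custom-intent pattern, sketched under assumptions: the intent name MyLightBrightnessInArea and the list name helligkeit are hypothetical, the number is captured with a range list (0 to 100), and light.turn_on targets the whole area via area_id(), so every light in the satellite’s area gets the new brightness_pct. Whether it takes priority over the built-in German light sentences has not been checked:

```yaml
# File 1 (hypothetical name): custom_sentences/de/helligkeit_area.yaml
language: "de"
intents:
  MyLightBrightnessInArea:
    data:
      - sentences:
          - "(setze|stelle) [die] Helligkeit auf {helligkeit} [Prozent]"
        requires_context:
          area:
            slot: true         # area of the satellite becomes the "area" slot
lists:
  helligkeit:                  # numeric slot, 0..100
    range:
      from: 0
      to: 100
---
# File 2: matching intent_script entry in configuration.yaml
intent_script:
  MyLightBrightnessInArea:
    action:
      - service: light.turn_on
        target:
          area_id: "{{ area_id(area) }}"   # area name -> area ID
        data:
          brightness_pct: "{{ helligkeit | int }}"
    speech:
      text: "Helligkeit in {{ area }} auf {{ helligkeit }} Prozent gesetzt"
```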

Hedda · January 4, 2024, 12:13pm

donburch888:

Endlessvoid:

expecting an open source software project to provide you with hardware at a similar price point and spec to what for-profit companies produce by the millions

Endlessvoid:

The price that Nabu Casa would have to sell these small-batch, niche devices for

Agreed that no FOSS project can subsidise the hardware.

BUT it seems to me that we are currently spending about twice as much (for ESP-S3 BOX, RasPi + Jabra conference speaker, RasPi + reSpeaker, etc) as the for-profit mass-produced devices, and getting lower quality - most of which is the lack of DSP. I believe that with better firmware/software that price difference can be justified.

Mike has hinted several times throughout the year that Nabu Casa do see voice satellites as a key component of Home Assistant, and are closely watching developments in that area. In Chapter 5 we see that Nabu Casa have put in effort to bring the ESP-32-S3 BOX 3 from ESP-IDE to ESPHome. I understand that the ESP32-S3 chip includes some specialised AI hardware which could potentially be used to improve sound quality, wakeword detection, and some of the other DSP magic. I don’t care if some parts of it are proprietary and source code is not available (like RasPi video) - more important that it works reliably, independently, and LOCALLY.

Perhaps it would be more realistic to expect Nabu Casa to design different satellite products as custom Carrier Boards (CB), built in a modular way so that they depend on a reusable and replaceable SoM (System on Module) with a long lifecycle, like the Raspberry Pi Compute Module 4 (CM4) or similar.

Nabu Casa itself could perhaps design a completely new reusable custom SoM compute module based on a similarly powerful SoC with less expensive parts and more importantly release it under an open-source hardware license so that others could also manufacture compatible compute modules and carrier boards without royalty fees.

An idea could be to reach out to Amlogic (headquartered in the USA) or Rockchip (headquartered in China) about collaborating on, or getting help with, designing/building an open-source-hardware-licensed swappable SoM board based on one of their SoCs for this purpose, if a good choice can be found that keeps costs down and lifecycles long (and that could also be reused for other DIY projects, though Amlogic and Rockchip only deal directly with companies).

Their best-suited SoCs today are probably the Amlogic A111, A112 and A113 processors (primarily designed for AI IoT smart speakers) or the Rockchip RV1109 and RV1126 SoCs (primarily designed for AI IoT smart cameras).

Not as a price comparison, but check out, for example, the official Rockchip PRO-RV1126 Core Board, as well as existing third-party SoM modules like this MINI1126 SoM, Think Core TC-RV1126, Runwelltek RWA023, and LY1126-1GF8GLC-Y.

PS: I understand that Xiaomi Mi Speaker L09G model is based on Amlogic A113X and it is relatively inexpensive + widely available so maybe one of those could be hacked as proof-of-concept?

stuartiannaylor · January 5, 2024, 1:06pm

Really it’s not about hardware, it’s the software and the complex DSP audio algorithms.
The ESP32-S3 comes with the only free, although closed-source, BSS (Blind Source Separation) blob, likely a DUET algorithm, where two positionally unique signals are separated out of a binaural mix.
This works well for smart speakers because, 80/20, you mostly have command speech & 3rd-party noise; also, because it separates by position, it dereverberates, which solves a huge problem for far field.

There is also beamforming, which can be less effective than BSS, as really it just focuses and dereverberates. Many conference mics such as the Jabra use beamforming, but unlike a smart speaker they have no mechanism to lock onto the speaker direction for that command sentence. So in the presence of noise or another voice they will just jump to the loudest.

As said, it’s the software, and the critically important start-of-chain audio DSP that gets a clear voice from differing volumes and distances.
Just because a piece of hardware can employ multiple mics, it’s a myth that that is all that is needed; it needs DSP algorithms containing quite high-end science.
Google and even Xiaomi have the resources and contacts to get these things commissioned. In fact, Google go one further than BSS and use targeted voice extraction, a type of BSS that works with user profiles.
The one thing about hardware, as with Google, Amazon & Xiaomi, is that dictating both the hardware and the software is extremely beneficial, as the ASR models are trained specifically for that hardware and for the software earlier in the chain.

Google & Amazon are miles ahead, firstly because they have the resources, but also because they have the weight and engineers to create application-specific SoCs, with absolutely huge purchasing power through economies of scale.
Even then, Google makes a small loss on them, whilst Amazon is currently leaking money like a sieve.
The gap between the best in academia (working for them and publishing papers on the latest technical innovation) and some very capable HA Python/ESPHome programmers is still huge, and in the DSP world we are completely dependent on free and open-source software provided by the big players & academia, which is in very short supply.
Basically you have the BSS blob provided by Espressif, a delay-sum beamformer by myself, and various filters such as DTLN, DeepFilterNet & Conv-TasNet that are a massive evolution over early attempts such as RNNoise.

As for hardware, the Amlogic chips you mention are very low-end, as they are meant for custom embedded systems written in a performant language like C/C++ or even Rust or Go.
Python is more of a RAD language, which is why we never see it in kernels or drivers. So to make provision for Python, and for code far removed from performant custom embedded systems, the hardware choice needs Victorian-style over-engineering to compensate, i.e. going higher end.

The Orangepi5 is considerably better than the Rpi5 at a similar price, and nearly all RK3588(s) boards have received good support, especially mainline.
OKdo now supply the OKdo ROCK ZERO 3W 1GB with Wi-Fi/BLE without GPIO, which is Cortex-A55 based and an even bigger step up over the Rpi02W, again at a similar price. Unlike the RK3588(s), its support is currently not good, but with some community backing it can likely be supported quickly.

When a Rpi5 4GB is only £2 more than a Rpi4, the Rpi4 doesn’t make sense anymore, and likely neither does its CM4, as a similar CM5 is likely.
For an Arm board the Rpi5 is strangely inefficient, which is one reason I prefer the Opi5, which posts nearly 2x the GFLOPS/watt.

The Raspberry Pi Zero2W is an underclocked Pi3 at £17, whilst the £18 Radxa Zero3W has a lot more under the hood.

Be it Raspberry, OrangePi or Radxa, the current SBCs that stand out are the above Pi5 & Zero ones.

[EDIT] The Radxa Zero3W will likely get an image from Joshua-Riek/ubuntu-rockchip (GitHub - Joshua-Riek/ubuntu-rockchip: Ubuntu 22.04 for Rockchip RK3588 Devices), who does the best Ubuntu RK3588(s) images; it would be great to have Ubuntu images for the Radxa Zero3W too, as the Radxa ones still look bad.
The OrangePi3B is likely to be supported also.

stuartiannaylor · January 8, 2024, 11:35am

@PureeTofu Got an update on the 2nd-user Nokia T20, and it isn’t good. The mics cancel noise, but the volume seems to be really low.
Thing is, it was cheap on eBay and could be faulty, but I’m wondering if that is why they made a quick revision to the T21.

@Fraddles I also got an Alcatel 3T10 2020, and the far field is pretty damn good on those; Google Assistant works brilliantly from some distance. I need to check how resilient it is to noise…

It seems to only upgrade to Android 10, and I think it’s Android 12 that has the Android API allowing an alternative wake word to run in place of Google Assistant, but I guess there will be other ways.

The Nokia T20 felt a lot more responsive, and there are mic amplifier apps, but they don’t seem to get many reviews, so I just returned it.
If I get hold of a T21 I will review it, or if anyone else can, please review.

The ultra-budget Alcatel 3T10 2020 has quite impressive far field, and I’m likely going to stick a wireless USB-C charging pad to its back so I can swap between the wall mount and table mount I have, and fit them with wireless chargers.