You will not get a hardware modkit for Google and Alexa products before the software is ready.
You said yourself you won't pay for Raspberry Pis and add-ons, so what are the chances you would pay for something that requires software which hasn't even been developed yet?
The developers can build the software, and those of us who hate/fear the big data-gathering giants can then start to use it; then easy DIY ESP projects will emerge, and then the modkits will wrap it up.
My thought as well…
Most interested users would probably be coming from another platform (Alexa, Google…), which means that unless they can somehow reuse the hardware they already have, this may not be as attractive.
This was my thought: HA Assistant being a backup for when my internet, or Amazon's servers, go offline.
The biggest issue is the amount of training data those open-source approaches have, not just in quality but in overall volume.
As I see it, Mozilla DeepSpeech is supported as well. It is powered by Mozilla's open platform Common Voice, where anyone can contribute voice samples and validate those of others.
Common Voice has been live for a long time now (years), yet the amount of data is still very low.
For example, there are only about 1.2K hours of German samples.
So the issue is:
People are lazy and just want a solution.
They don't want to spend time on it.
That extends to not even wanting to train their own language models.
This is not going to work out unless there is some sort of automatic sample collection that is enabled by default and requires an explicit opt-out (with the drawback that opting out means having to train your own language models rather than just taking from the community).
–
Another hope would be Amazon or Google open-sourcing their language models as/if they drop Alexa/TTS support or put it on life support.
But that is very unlikely to happen.
I think this is going in a completely different direction than I was expecting. I own a Google Home and I run Home Assistant at home. They are interconnected, so if I want to control a Home Assistant entity by voice, I can; and vice versa, if I have a light that doesn't support transitions, I can send a REST command from an automation to Google Home to wake or sleep that light.
I hardly speak any commands out loud, though. Usually when people ask me what a smart home is, my reply is that it is a home that does stuff for you instead of you doing it. Some people get a smart device, use their phone to turn it on and off, and think they have a smart home. Others just have Alexa or Google Home and speak commands. But for me, the smartest home is the one where I don't need my phone, my voice, or switches. I just do the things I do every day: I open a door, I lie down in bed, I sit on the sofa, I watch TV, I work, I talk on the phone, and so on. For me, the smart home is the one that understands what I am doing and adjusts accordingly, without me having to instruct it in any way to perform a certain action.
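This kind of activity-driven behavior maps naturally onto Home Assistant automations. As a minimal sketch, assuming a hypothetical bed occupancy sensor (binary_sensor.bed_occupied) and a bedroom light (light.bedroom), neither of which comes from this thread:

# Minimal sketch: react to lying down in bed instead of a voice command.
# binary_sensor.bed_occupied and light.bedroom are hypothetical entities.
automation:
  - alias: "Lights off when I lie down in bed"
    trigger:
      - platform: state
        entity_id: binary_sensor.bed_occupied
        to: "on"
        for: "00:02:00"   # ignore briefly sitting on the bed
    condition:
      - condition: sun
        after: sunset     # only react in the evening
    action:
      - service: light.turn_off
        target:
          entity_id: light.bedroom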
Anyway, I wish you best of luck with the project!
I’ve heard this a lot, and it’s a great idea. Obviously it works for you, and for many people.
I'm the opposite. I don't want some computer doing things for me. I want to be able to control devices myself, with minimal effort. I'm not a fan of "learning" thermostats; I want to decide when I want it warmer or cooler. I don't want lights coming on when I enter a room. I don't want mood music to start playing. I don't even like the fact that my washing machine won't unlock the door until it completes some pre-programmed cycle.
I’m not opposed to the idea of a smart home. It’s just never going to be smart enough to read my mind. For me, a Home Assistant is good enough.
I'm curious about the idea of using Echos or Google Homes for parts (mic, speaker, case, and power supply) and adding an ESP.
I’ve read the blog and don’t see anywhere that this is proposed as a product. Or did I miss it? I would just like to see a proof of concept that can be built upon.
I primarily want to avoid a cloud-based solution. I don’t mind that Alexa is always listening for a trigger word, but I am really annoyed by the “suggestions” and unsolicited help from Alexa.
I was thinking exactly like you until I had kids and they started to grow up. There came a point where my kids would leave all the lights on, or would try to speak to the Google assistant (in English, which is not their native language) and it would do nothing. It was also no use to me to shout commands at a speaker while they were sleeping.
That's when I stumbled upon Home Assistant. The best thing is that you can expose Home Assistant entities to Google Home and run Home Assistant scripts via Google Assistant voice commands. It's like having the best of both worlds.
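For anyone wondering how that bridging looks, it is done with the google_assistant integration in configuration.yaml. A minimal sketch, assuming a manually created Google Cloud project; your-project-id and SERVICE_ACCOUNT.json are placeholders for your own setup, and script.movie_time is a hypothetical script:

# Minimal sketch of exposing Home Assistant entities to Google Home.
# your-project-id and SERVICE_ACCOUNT.json come from your own Google
# Cloud / Actions console setup; script.movie_time is hypothetical.
google_assistant:
  project_id: your-project-id
  service_account: !include SERVICE_ACCOUNT.json
  report_state: true
  exposed_domains:
    - light
    - script
  entity_config:
    script.movie_time:
      name: Movie Time
      expose: true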
Having said that, it feels like the HA team is trying to reinvent the wheel. The only differences are that it will be local and that it will support more languages, but it will be hard to replace existing setups.
Don’t get your hopes up people.
What the majority fails to recognize is that Alexa and Google devices contain technology that is both extremely advanced and patented to death: beamforming phased-array microphones (along with local, low-power voice recognition for the activation phrase).
You see, what makes those devices so useful is their ability to pick out the meaningful signal that carries the instructions. It's hard to appreciate what an incredible job these devices do without a solid understanding of DSP and psychoacoustics. These two companies bought up the entire field of research and filed a huge number of patents.
Don’t expect to be able to get the same convenience and results anytime soon with what is available in the market. And as someone pointed out, nobody so far has been able to hack those devices either.
So, while we might be able to run decent voice recognition, you will still have to be reasonably close to the transducer in a relatively quiet environment for it to work.
Many people refuse to use cloud services for privacy reasons, including a non-negligible number of Home Assistant users.
I totally understand. There's real power in automations. I use them for notifications and even critical functions like switching to the backup sump pump if the primary fails. Believe me, I know kids can do unpredictable things, and automations must really help in detecting and correcting those.
HA can do so many things that almost anyone can find their ideal configuration. I’m not opposed to HA experimenting with improved voice functionality. It sounds kinda fun.
My concern is that narrowing HA’s focus could make it more restrictive rather than more diverse. To me, the focus should be on the needs of the average user with a minimal hardware investment. I feel that paid developer resources would be better applied to work on the hundreds of great ideas actual users have submitted via FRs and WTHs.
Having been a Rhasspy user for a few years, “tell me the weather” and “what is the weather” were among my first Rhasspy intents. At that time, weather was being provided by the Ecobee thermostat integration. I've since added an outdoor weather station, but haven't switched the Rhasspy automation over to that very local information just yet.
Automation:

data_template:
  target: |-
    {% if trigger.event.data._intent.siteId == "Listener1" %}
    192.168.1.xx
    {% elif trigger.event.data._intent.siteId == "Listener2" %}
    192.168.1.yy
    {% elif trigger.event.data._intent.siteId == "Listener3" %}
    192.168.1.zz
    {% endif %}
  payload: >-
    The weather is currently {{ states.weather.downstairs.state }} The forecast
    is {{ states.weather.downstairs.attributes.forecast[0].condition }} The
    temperature is {{ state_attr('weather.downstairs','temperature') }} degrees
    The humidity is {{ state_attr('weather.downstairs','humidity') }} percent
service: rest_command.rhasspy_speak
configuration.yaml REST command:

#
# RESTful commands for
# Rhasspy
#
rest_command:
  rhasspy_speak:
    url: 'http://{{ target }}:12101/api/text-to-speech'
    method: 'POST'
    payload: '{{ payload }}'
    content_type: 'text/plain'
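For completeness, the trigger side of that automation isn't shown above. With Rhasspy's intent handling set to send Home Assistant events, the event type is the intent name prefixed with rhasspy_. A minimal sketch, assuming the intent is named GetWeather (a hypothetical name, not taken from the post):

# Minimal sketch of the automation trigger, assuming Rhasspy sends
# intents to Home Assistant as events and the intent is named
# GetWeather (hypothetical).
trigger:
  - platform: event
    event_type: rhasspy_GetWeather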
What a great investment!
My greatest concern, like many others have flagged, is the hardware. It's going to take a lot more than a Raspberry Pi with a cheap microphone. To do it properly, it'll need an array of microphones, far-field tuning, echo cancellation, DSP filtering, etc.
Seeed Studio has tried to do something like this in a relatively attractive case: ReSpeaker USB Mic Array - Seeed Wiki
But these are just microphones, so they need to be paired with a Raspberry Pi or perhaps an ESP32.
I think the path to success would be to partner with one of the Arduino hardware designers (e.g. Seeed Studio, DFRobot, TTGO, Adafruit, etc.). They'll have the supply chain to design and manufacture a custom product to support this project (and other open-source ones).
A possible solution is to capture the audio with an ESP32 plus a microphone and transmit it to the HA server for further processing. It's better to use an ESP32 with Ethernet connectivity (e.g. the WT32-ETH01) to avoid the delays you get with WiFi.
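As a rough sketch of the Ethernet side, an ESPHome configuration for a WT32-ETH01 might look like the following. The LAN8720 pin mapping below is the commonly cited one for this board, so verify it against your revision, and the microphone would still need to be wired to free pins via I2S:

# Rough ESPHome sketch for a WT32-ETH01 using its onboard LAN8720 PHY.
# Pin mapping is the commonly cited one for this board; verify against
# your revision before flashing.
esphome:
  name: voice-satellite

esp32:
  board: esp32dev

ethernet:
  type: LAN8720
  mdc_pin: GPIO23
  mdio_pin: GPIO18
  clk_mode: GPIO0_IN
  phy_addr: 1
  power_pin: GPIO16

api:      # native API so Home Assistant can talk to the device
logger: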
So the solution to the voice assistant privacy issue is to have a network connected microphone streaming every word you say 24/7 over the network, in every room… How ironic.
That’s even more creepy than having cameras inside the house (which is creepy enough). If someone gets into that network… fun times. Not even talking about family trust issues when they realize that you have the technical means to listen to every conversation at any time and even record everything.
If the HA guys want to support a privacy-minded voice assistant, then they need to do it properly: with on-device wake word recognition, beamforming mic arrays, etc. Not only is a simple ESP not powerful enough to do the required signal processing without additional hardware, the field is also heavily patent-encumbered, as someone above said. Hardware is going to be a massive problem if people expect this to work as well as the current voice assistants.
Please read the Rhasspy docs. The recommended hardware does do local, on-device wake word detection.
The recommended Rhasspy hardware for wake word recognition is a Raspberry Pi. Good luck with that.