Jarvis-like automation

OK. This is kind of a long shot.

If I can let my imagination run wild, I would love to see HASS gain the ability to be customized into something like Jarvis from the movie Iron Man. These are the features I would love HASS to have…

  1. Voice/facial recognition. That is, it accepts voice commands only from assigned users.

  2. It can identify which device sent a command and respond only to that device.

  3. I will place speakerphones (attached to a Raspberry Pi) or Android tablets in every room; each acts as a device (client) and is always ready to listen for voice commands.

  4. Send a randomly chosen notification for the same automation to simulate normal conversation. For example, instead of just saying “Welcome home master”, I can specify a set of sentences to pick from at random, such as “I am glad to see you again”, “How was your day master?”, “Glad you are home early. It is about to rain soon”, etc…

  5. There will be an app for Android/iOS/Linux/Windows that accepts notifications from HASS in text form. A TTS (text-to-speech) app on the device, such as Shouter, then reads the notification aloud. The app will also listen for voice commands (point #3).
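
To make point 4 a bit more concrete, here is a minimal Python sketch of the idea. It is only an illustration and not an existing HASS feature: the function name `pick_greeting` and the `GREETINGS` list are made up for this example, and the chosen sentence would still have to be handed to whatever notification or TTS service is in use.

```python
import random

# Hypothetical phrase list for a "welcome home" automation,
# reusing the example sentences from point 4.
GREETINGS = [
    "Welcome home master",
    "I am glad to see you again",
    "How was your day master?",
]


def pick_greeting(phrases, last=None, rng=random):
    """Return a random phrase, avoiding an immediate repeat of `last` when possible."""
    # Drop the previously used phrase; fall back to the full list
    # if that would leave nothing to choose from.
    candidates = [p for p in phrases if p != last] or list(phrases)
    return rng.choice(candidates)


if __name__ == "__main__":
    previous = None
    for _ in range(3):
        previous = pick_greeting(GREETINGS, last=previous)
        print(previous)
```

Remembering the last phrase is optional, but it avoids hearing the same greeting twice in a row, which helps the "normal conversation" feel.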

These are all I can think of right now.

Any more ideas on how to make HASS more like Jarvis?


Very ambitious, but wouldn’t it take a lot of processing power to tell one voice from another? Especially once you add lip-reading and 3D real-time face detection from a camera, to check that the face belongs to a live person and did not come from a still photo?

Companies would certainly want to invest in proprietary solutions like the one you are describing, and those may work only with their own products and not with Home Assistant.

From a more technical standpoint, this is the kind of workload that HSA (Heterogeneous System Architecture) is meant for: the CPU and GPU work together to process the information and confirm that you are who you say you are.

The voice/facial recognition can be done on the client side. The client is most probably a tablet that is always plugged in, so battery life is not a concern. However, the challenge is how to make it act like the Echo, which always listens for commands.

I came across this project, Jasper Integration, which I think is a step closer to this idea.

Anyway, there are still rough edges to this idea. Please feel free to share your thoughts on this.

Today, I came across Mycroft. It seems like a pretty good example of what I have in mind. It is always listening for its name and it is powered by a Raspberry Pi. The casing looks pretty cool too. Unfortunately, it connects to the ‘cloud’ and relies on the internet to function. Not depending on the cloud is exactly why I love HASS.

Perhaps I can make something similar with Jasper by following the above guide.

Oh, yeah. I read the thread about Jasper some time ago (probably last week or so). Could Jasper do voice-print authentication? And if there is some program that can do face analysis for authentication, Home Assistant could become a powerhouse! All the cool technologies could be within our reach, so long as we don’t have to stand in front of a microphone and camera.

“Voice print and face recognition analysis enabled. Welcome back, Ambassador Spock.”

Since the TTS component is already available, we are now one step closer to Jarvis. Actually, the voice component is also available, but it is still very basic and not practical. I hope that when a dedicated Android app is ready, it will always listen for commands like the Amazon Echo.

On my side, I installed Jarvis on a Raspberry Pi 2, and it works fine with the new Snowboy STT module included.
I can use voice commands to push values to Home Assistant or trigger scripts. For example:

I parse some websites (surfcast conditions) to extract wave height and local conditions.

When I ask Jarvis “How are the surf conditions today?”,
Jarvis executes an HA script that fetches the parsed information and returns the response via the TTS component.
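
A rough Python sketch of the parse-and-reply step, for anyone who wants to try something similar. This is not the actual script used here: the `SAMPLE` text, the "Wave height: … m" format, and the function names are all assumptions for illustration, and a real surf report page will need its own pattern.

```python
import re

# Hypothetical piece of scraped text; a real surf report page will look different.
SAMPLE = "Wave height: 1.5 m, period 9 s"


def parse_wave_height(text):
    """Return the wave height in metres as a float, or None if it is not found."""
    match = re.search(r"wave height:\s*(\d+(?:\.\d+)?)\s*m\b", text, re.IGNORECASE)
    return float(match.group(1)) if match else None


def surf_sentence(height):
    """Build the sentence that would be passed on to the TTS component."""
    if height is None:
        return "Sorry, I could not find the surf conditions."
    return f"The waves are {height:g} metres high today."


if __name__ == "__main__":
    print(surf_sentence(parse_wave_height(SAMPLE)))
```

Returning `None` for a failed parse keeps the spoken reply sensible when the website changes its layout.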

It’s really fun and it works fine.

I decided to install OpenCV on the Raspberry Pi to test facial recognition; it’s really slow but it works.
I had to take some Python code (the Treasure Box project) and modify it to write the name of the detected person to a file.

And HA gets the information from the file with a sensor (command line) that runs cat on the file :slight_smile:

When HA detects the state changing from Nobody to Arnaud (my name), it executes an automation.
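
The file hand-off described above could look roughly like this in Python. It is a sketch under assumptions, not the modified Treasure Box code: the helper names are invented for the example, and the recognition step itself is left out. The write goes through a temporary file so that a reader running cat never sees a half-written name.

```python
import os
import tempfile


def write_detected_name(path, name):
    """Atomically write the detected name so a reader never sees a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(name + "\n")
    # mkstemp creates the file as owner-only; relax that so another user can read it.
    os.chmod(tmp_path, 0o644)
    # os.replace swaps the file in one step on the same filesystem.
    os.replace(tmp_path, path)


def read_detected_name(path, default="Nobody"):
    """Read the current name, falling back to `default` if the file is missing or empty."""
    try:
        with open(path) as handle:
            return handle.read().strip() or default
    except FileNotFoundError:
        return default


def arrived(previous, current, nobody="Nobody"):
    """True when the state changes from nobody to a named person."""
    return previous == nobody and current != nobody
```

The `arrived` check mirrors the Nobody to Arnaud transition; in HA itself that comparison would be done by the automation trigger rather than by this script.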

It works, but with a lag (4-5 seconds)…
I think the Pi 2 is too slow for this…

Wow, that’s great. What do you mean…

The STT module is included? Can you give me some pointers on how to make use of this?

If a more powerful machine can improve that, I don’t mind investing in one. But please elaborate on this. Any pointers will be greatly appreciated. Thanks.

The Jarvis STT module ;p

After one year, points 2 to 5 have come true. Please see My Jarvis using Zanzito (Skills Collection)
