AppDaemon: voice-controlled commands (Appi is born)

After I tried to use Alexa installed on my RPi to command HA, I decided that I needed to look in another direction.

Alexa on the RPi was nice, but I ran into a lot of trouble connecting it to HA,
so I decided to keep it closer to home.

Let's see: I can manipulate HA with AppDaemon using any Python code.
Is there decent speech recognition in Python?
After some searching I found one pretty quickly.
Installing was quite easy also.

Let's see what we can do with it.

Like the loop I made for my TTS, I started in that direction, and I took note of what @aimc has given me :wink:
So I made a worker loop that listens to me.
And it works!!
I am now able to turn one of my lights on and off by voice.

I have sound feedback, so I know when a command is recognized.
Next steps:
creating more commands and trying to make it work for everyone, so let's move some things to the config.
And maybe I should try to find out how to work with callbacks, because that would make things a bit easier (a minimal sketch below).
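
For the curious, a minimal sketch of what an AppDaemon callback could look like, using run_in to reschedule itself instead of blocking in a loop (the class name and the 5 second timing are just illustrative, not part of Appi):

import appdaemon.appapi as appapi

class callbackdemo(appapi.AppDaemon):

  def initialize(self):
    # ask AppDaemon to call check_speech 5 seconds from now
    self.run_in(self.check_speech, 5)

  def check_speech(self, kwargs):
    self.log("callback fired")
    # reschedule ourselves: a polling loop without a dedicated thread
    self.run_in(self.check_speech, 5)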

After that, some voice feedback!

And what will the end result be?
Our very own AppDaemon Alexa.

But let's not call it Alexa!
Let's call it Appi.

###########################################################################################
#                                                                                         #
#  Rene Tode ( [email protected] )                                                            #
#  ( with a lot off help from andrew cockburn (aimc) )                                    #
#  2017/01/18 Germany                                                                     #
#                                                                                         #
###########################################################################################

import appdaemon.appapi as appapi
import datetime
import speech_recognition as sr
from queue import Queue
import threading
import tempfile
import subprocess

class hearspeech(appapi.AppDaemon):

  def initialize(self):
    self.speechlog("workerloop; OOOOOOOOOOOOOOOOOOOOOOO Initialize start 00000000000000000000000000")
    # create queue (left over from the TTS worker pattern, not used yet)
    self.queue = Queue(maxsize=0)
    # the hotword that wakes Appi up
    self.startcommando = "hallo"
    # counts listen attempts after the hotword, so we can give up eventually
    self.commandlisten = 0
    self.startcommandseen = False
    self.r = sr.Recognizer()
    self.m = sr.Microphone()
    self.log("A moment of silence, please...")
    # calibrate the recognizer to the background noise level
    with self.m as source:
      self.r.adjust_for_ambient_noise(source)
    self.speechlog("Set minimum energy threshold to {}".format(self.r.energy_threshold))

    # Create worker thread
    t = threading.Thread(target=self.worker)
    t.daemon = True
    self.speechlog("workerloop; OOOOOOOOOOOOOOOOOOOOOOO Initialize done 000000000000000000000000000")
    t.start()

  def worker(self):
    # endless loop: wait for the hotword, then for a known command
    while True:
      if self.startcommandseen:
        self.speechlog("workerloop; --------------------start listening for command--------------------------")
      else:
        self.speechlog("workerloop; ====================start listening for startcommand=====================")
      spokentext = self.listen()
      if spokentext != "":
        self.speechlog("workerloop; text heard: " + spokentext)
        self.speechlog2(spokentext)
        command = self.find_known_command(spokentext)
        if self.startcommando in spokentext and command == "":
          # hotword heard on its own: start listening for a command
          self.commandlisten = 0
          self.startcommandseen = True
          self.speechlog("workerloop; startcommand found in: " + spokentext)
          self.startcommand_beep()
        elif command != "" and (self.startcommandseen or self.startcommando in spokentext):
          # known command heard (possibly in the same sentence as the hotword)
          self.speechlog("workerloop; known command found: " + command)
          self.recognize_beep()
          self.do_command(command)
          self.startcommandseen = False
          self.commandlisten = 0
          self.speechlog("workerloop; ****************************command done and loop reset******************************")
        elif command == "" and (self.startcommandseen or self.startcommando in spokentext):
          self.speechlog("workerloop; no command found in: " + spokentext)
          self.commandlisten = self.commandlisten + 1
      else:
        self.speechlog("workerloop; heard nothing")
      if self.commandlisten > 5:
        # too many tries without a known command: go back to hotword mode
        self.startcommandseen = False
        self.commandlisten = 0
        self.speechlog("workerloop; it took too long to hear a command")
        self.end_beep()
         
  def find_known_command(self, spokentext):
    # map spoken phrases to commands; matching is case-insensitive,
    # so one lowercase alias per phrase is enough
    known_command_list = {"wandmeubel uit": "turn_off wandmeubel",
                          "meubel uit": "turn_off wandmeubel",
                          "wandmeubel aan": "turn_on wandmeubel",
                          "meubel aan": "turn_on wandmeubel"}
    for known_command_key, known_command_value in known_command_list.items():
      if known_command_key in spokentext.lower():
        return known_command_value
    return ""
 
  def do_command(self, command):
    # translate a recognized command into an HA service call
    self.speechlog("commando;   " + command)
    if command == "turn_on wandmeubel":
      self.turn_on("switch.ewandmeubel")
    elif command == "turn_off wandmeubel":
      self.turn_off("switch.ewandmeubel")

  def listen(self):
    value = ""
    self.speechlog("listen;     start")
    try:
      # record a single phrase from the microphone
      with self.m as source:
        audio = self.r.listen(source)
      self.speechlog("listen;     stopped listening")
      try:
        self.speechlog("listen;     try recognizing")
        value = self.r.recognize_google(audio, language="nl-NL")
      except sr.UnknownValueError:
        self.speechlog("listen;     didn't recognize any text")
      except sr.RequestError:
        self.speechlog("listen;     recognition service not reachable")
    except Exception:
      self.speechlog("listen;     didn't hear any text")
    self.speechlog("listen;     i heard: " + str(value))
    return value

  def speechlog(self, logtext):
    # general debug log
    runtime = datetime.datetime.now().strftime("%d-%m-%Y %H:%M:%S.%f")
    try:
      with open("/home/pi/.homeassistant/speech.log", 'a') as log:
        log.write(runtime + ";" + logtext + "\n")
    except Exception:
      self.log("SPEECHLOGFILE NIET BEREIKBAAR!!")  # "speech logfile not reachable"

  def speechlog2(self, logtext):
    # separate log with only the recognized text, to read back what was heard
    runtime = datetime.datetime.now().strftime("%d-%m-%Y %H:%M:%S.%f")
    try:
      with open("/home/pi/.homeassistant/speech2.log", 'a') as log:
        log.write(runtime + ";" + logtext + "\n")
    except Exception:
      self.log("SPEECHLOGFILE NIET BEREIKBAAR!!")  # "speech logfile not reachable"

  def recognize_beep(self):
    self.speechlog("beep;       recognize")
    self.beep("/home/pi/SPUTTER2.mp3")
  def end_beep(self):
    self.speechlog("beep;       end")
    self.beep("/home/pi/WARBLE.mp3")
  def startcommand_beep(self):
    self.speechlog("beep;       startcommand")
    self.beep("/home/pi/POING.mp3")


  def beep(self, filename):
    # play a feedback sound with mpg321, dumping its console output
    # into a temporary file so it doesn't clutter the AppDaemon log
    self.speechlog("beep;       " + filename)
    cmd = ['mpg321', filename]
    with tempfile.TemporaryFile() as f:
      subprocess.call(cmd, stdout=f, stderr=f)


Remember folks, this is a work in progress!

A little tweak already!

I split up entity and command recognition.
At this point I can turn 3 switches and 1 input_boolean on and off,
and it is very easy to add others.
I also have a log recording the text that is spoken,
so you can read back what it heard when you spoke to it.
You can repeat the same word several times and see if it understands you most of the time. If it misunderstands you the same way every time, you can simply use that misheard word as the command or entity.
Does it understand my wife differently from me? Probably. But no problem: I just put both variants in the list and it understands us both in the end.
It is like training the old speech recognition systems to know your voice.
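
As a hedged illustration of that split (the names and aliases below are made up for this example, not the exact code): one alias map for command words and one for entity words, with misheard variants simply added as extra aliases:

# illustrative only: separate alias maps for commands and entities
command_words = {"aan": "turn_on",
                 "uit": "turn_off", "out": "turn_off"}  # "out" = a mishearing of "uit"
entity_words = {"wandmeubel": "switch.ewandmeubel",
                "meubel": "switch.ewandmeubel"}

def find_command_and_entity(spokentext):
  # match on whole words so aliases can't hide inside longer words
  words = spokentext.lower().split()
  command = next((v for k, v in command_words.items() if k in words), "")
  entity = next((v for k, v in entity_words.items() if k in words), "")
  return command, entity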

Possible option:
commands and entity IDs in a txt file.
If Appi doesn't know a word, he asks whether he should add it to the known list, and which entity or command should be connected with it.

So I am on the road to a self-learning system.
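
A minimal sketch of that idea (file name, path and format are assumptions, not a finished design): one "phrase;command" pair per line in a plain text file, loaded at startup and appended to whenever a new word is confirmed:

# hypothetical persistence for learned phrases
def load_known_commands(path="/home/pi/.homeassistant/commands.txt"):
  known = {}
  try:
    with open(path) as f:
      for line in f:
        phrase, command = line.strip().split(";", 1)
        known[phrase] = command
  except FileNotFoundError:
    pass  # no file yet: start with an empty list
  return known

def add_known_command(phrase, command, path="/home/pi/.homeassistant/commands.txt"):
  # append a newly learned phrase so it survives restarts
  with open(path, "a") as f:
    f.write(phrase + ";" + command + "\n")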


Pretty neat.

I have a similar project going with the Google Home + IFTTT + HA.

I’ve created scripts for most of the devices I control with HA. I can get their status, turn them on / off (if applicable), open / close, arm / disarm, activate scenes, etc. I basically created a script for every action I could think of, and can access them all through voice commands via Google Home.

It’s pretty neat!

Looks like you’re bringing it to a whole new level though! Looks great so far!


Google Home and Amazon Alexa are nice.

But they are still not available here, and if they are, they are in English, and maybe in a short while in German.
And we are Dutch.
Before that arrives in those devices we will be in 2018.

So I thought: why not create my own?
Maybe the speech recognition won't be that perfect, but I can create more commands, so even less-than-perfect recognition will still work OK.

Maybe in the near future I'll think about a more powerful server than the RPi,
and then RPis in every room to listen to the text and send it to the server.

At this moment it comes in all Google languages! :wink:

Do you have a link to your project? I’d like to learn more!

Awesome job on this @ReneTode! You continue to extend AD in the imaginative ways that I think @aimc hoped for when he created the platform!


Yep - Rene is my star user!


Cool! You might want to have a look at snowboy.kitt.ai for the hotword detection. They provide a Python example, and you can let the user decide which hotword they'd like to use :slight_smile: jarvis.sh is using the Bing API for the actual recognition of spoken words, but I am likewise not very happy with sending my voice into the “cloud”.
Hth, Hannes

In this code it is also possible to set your own hotword.

At first, when I was testing in English, I used hello; then I used the Dutch hallo, and now I use Appie :wink:
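
(A hedged sketch of making that configurable: assuming a hotword entry in the app's config section, initialize() could read it from self.args instead of hard-coding it.)

# in initialize(): read the hotword from the app config,
# falling back to "hallo" when none is set
self.startcommando = self.args.get("hotword", "hallo")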

At this moment I use Google (so I send it into the cloud as well), but with speech_recognition you can use several recognizers.
I could make it optional.
Dutch versions that run locally are probably no good; I think English is the only language that works well enough locally, but when it is complete people can decide for themselves.


I am just the most crazy one :wink:
Others would probably use HA directly for the stuff that I do in AppDaemon.
I know that almost all components in HA can be rebuilt with AppDaemon, and because I like to play around and love the freedom, I use AppDaemon for unusual things :wink:


This is more than awesome.

At the end of the day, the commands I use in Alexa are few.

Having the possibility to talk in my own language, shutting out the profit-minded corporations, the NSA and the CIA, is top of the pops!!!

@ReneTode This is cool. You should definitely take a look at Mycroft (https://mycroft.ai/), which is an Open Source equivalent to Alexa/Google Assistant. They use the same speech recognition APIs internally.

I started work on a Home Assistant skill, but haven’t had time to finish it yet: Mycroft AI

Nice project, but I'd need a new RPi for that, because they only provide an SD card image.

I think I'll stick with speech_recognition for now and try what I can do with that.
I have used the Google part today and that works, but I need to learn more, because sometimes it hangs; I haven't played around with timeouts etc. yet.
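
(For reference, a hedged sketch: speech_recognition's listen() accepts a timeout and a phrase_time_limit, so it cannot block forever; the 5 and 10 second values here are just examples.)

try:
  with self.m as source:
    # give up if no phrase starts within 5 s, and cap a phrase at 10 s
    audio = self.r.listen(source, timeout=5, phrase_time_limit=10)
except sr.WaitTimeoutError:
  audio = None  # nothing was said in time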

I will try out Sphinx too, because it works locally.

It can work with:

CMU Sphinx (works offline)
Google Speech Recognition
Google Cloud Speech API
Wit.ai
Microsoft Bing Voice Recognition
Houndify API
IBM Speech to Text

So I need some testing time :wink:
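
A hedged sketch of making the backend switchable (the recognize_* methods are from the speech_recognition library; the engine keys and placeholder API keys are made up):

def recognize(self, audio, engine="google"):
  # pick a recognizer backend; all of these are methods of sr.Recognizer
  if engine == "sphinx":
    return self.r.recognize_sphinx(audio)  # offline, English models by default
  elif engine == "wit":
    return self.r.recognize_wit(audio, key="YOUR_WIT_KEY")
  elif engine == "bing":
    return self.r.recognize_bing(audio, key="YOUR_BING_KEY")
  else:
    return self.r.recognize_google(audio, language="nl-NL")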


Mycroft has a Debian repo too, which contains armhf packages that should run on the Pi. Their custom hardware platform is based on the Pi, so it should work pretty well. See the “Installing using apt” section of their getting started page: https://docs.mycroft.ai/development/getting.started

Thanks.
I can't find anything about it except that they use Google for the default STT.
Do you know if you can set another STT?

I was just looking around in the speech_recognition docs. (Always useful AFTER you install things :stuck_out_tongue: )
Sphinx has no Dutch available without a big amount of work :frowning:
Google wants you to get an account and grants 50 requests a day (but it also works without a key (though that can be revoked)).
I don't know if that will cause trouble.

Cloud Speech, Wit, Bing and IBM require you to set up an account (and probably to pay or hand over credit card data),
so those are no favourites anyway.

And Houndify too, but they have only 2 or 3 languages.

So I'll probably also stick with Google for now, but I will try to get Sphinx to understand enough Dutch.

Mycroft uses Sphinx for the wake word detection, which is done locally. The rest of the phrase is then sent to the cloud for processing. I think the default is Google, but they have support for others, which can be set up in the config file. It's been a while since I looked at it. Probably best to ask on their support forum: https://community.mycroft.ai/

Because I actually want commands, I will probably go to Sphinx too.
But if Mycroft uses Google too, it has the same problem: you need an account and/or it is restricted.
Unless Google has given up the restrictions; then speech_recognition has no problem either.

I actually see no real advantage to using Mycroft over doing it directly with this code.

In Mycroft I'd need to create skills, and that's actually all I need to do with this program right now.
But it helped me learn how to program the skills part :wink:

Cool, each to their own. I look forward to seeing your results. Good luck!


By the way @aimc,

this kind of app doesn't get reset like it should.
If I take the app out of the config, it says it deleted the entry, but the app stays running.
So I end up restarting AppDaemon over and over now.

Also the connections to ALSA etc. are not reset, so if I edit the code I can't save it without errors and restarting AppDaemon after that.

Is there something we can do about that?
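
(One hedged workaround sketch until that's solved in AppDaemon itself: have the worker check a stop flag each pass and set that flag in a terminate() method. This assumes an AppDaemon version that calls terminate() when an app is unloaded; the class name below is just for illustration.)

import time
import threading
import appdaemon.appapi as appapi

class stoppable_app(appapi.AppDaemon):

  def initialize(self):
    self.stop_event = threading.Event()
    t = threading.Thread(target=self.worker)
    t.daemon = True
    t.start()

  def worker(self):
    # checking the flag each pass lets the thread end cleanly on reload
    while not self.stop_event.is_set():
      time.sleep(0.1)  # the listen/recognize work would go here

  def terminate(self):
    # assumed: called by AppDaemon when the app is unloaded or reloaded
    self.stop_event.set()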