Telegram2Snips for external access using HA only

Hi,

I want to present my first (alpha) approach for bridging commands from Telegram to Snips using Home Assistant only :grinning: Looking through the forum, I found several users who tried to achieve the same thing, but none of the solutions seemed ready and convenient to me. So I decided to build my own. Before going into the details, let me explain my motivation.

Motivation
Like most of you, I want to access my Home Assistant from outside my LAN. However, I am paranoid about opening ports because I consider every open port a potential threat. So far, I have used OpenVPN to connect to my LAN, and then I was able to mess around with my home. However, this is sometimes annoying when I just want to check a temperature or control one switch. Furthermore, a suitable internet connection is required, which may not be available everywhere.

I was very happy when I found the Telegram polling component, which is able to receive text from trusted users without opening ports. It is easy to use and does not necessarily require a strong internet connection. However, I was too lazy to write an automation for every command that may be received. Furthermore, my wife would go crazy if I tried to explain to her which command she has to use for which switch. At this point, I realized that bridging the commands from Telegram to the powerful (offline) voice control “Snips”, which I have already been using for quite some time, would cover many of my wishes:

  • External restricted access without opening a port
  • Using natural language for controlling and querying my home switches and sensors
  • No automations besides the already existing intent_scripts are required for control
  • Besides potential vulnerabilities in the telegram polling component, I think it is pretty safe.
  • The bridge may be switched on or off

Idea
Snips' internal communication works via MQTT, using the Hermes protocol. Hooking into this communication should make it possible to inject commands without actually recording them. I think a convenient way to do this is to mimic a (self-designed) satellite of a Snips multi-room setup. First, I started writing a Python program that could run on any computer within the LAN. However, I figured that an implementation in HA itself would be of greater benefit to most potential users (furthermore, I did not have to struggle with the Telegram implementation :wink:).

Prerequisites

  1. Telegram polling component needs to be ready for use
  2. Snips component including some skills and an assistant need to be ready for use
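
For orientation, the two prerequisites could look roughly like this in configuration.yaml. This is only a sketch: the API key, the chat ID, the intent name and the temperature sensor are placeholders I made up for illustration, and it assumes that HA's MQTT integration is already connected to the same broker that Snips uses.

```yaml
# configuration.yaml (sketch, all values are placeholders)
telegram_bot:
  - platform: polling
    api_key: YOUR_BOT_API_KEY   # placeholder: the token of your own bot
    allowed_chat_ids:
      - 123456789               # placeholder: chat ID of a trusted user

# Enables the Snips component (listens for recognized intents via MQTT)
snips:

intent_script:
  # Hypothetical intent name; use the intents of your own Snips assistant
  GetLivingRoomTemperature:
    speech:
      text: "It is {{ states('sensor.living_room_temperature') }} degrees"
```

Only chat IDs listed under allowed_chat_ids can send commands, which is what keeps the external access restricted to trusted users.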

Implementation
Please keep in mind that this implementation is an early alpha version: it is usable, but it has plenty of room for improvement, which I will highlight later.

We need two MQTT sensors:

sensor:
  - platform: mqtt
    name: "snips_asr"
    state_topic: "hermes/asr/startListening"
    expire_after: 30
    json_attributes:
      - "siteId"
      - "sessionId"
  - platform: mqtt
    name: "snips_tts"
    state_topic: "hermes/tts/say"
    expire_after: 30
    json_attributes:
      - "text"
      - "siteId"
      - "sessionId"

These sensors receive the necessary information from Snips (i.e., the sessionId and the response text).

So far, one automation is sufficient for bridging the commands:

- alias: 'Telegram2Snips'
  # We don't hide the entry in order to allow a fast on/off switch
  hide_entity: false
  trigger:
    platform: event
    event_type: telegram_text
  action:
    # We publish that we received a hotword for initializing the dialogue
    - service: mqtt.publish
      data:
        topic: "hermes/hotword/telegram/detected"
        payload: '{"siteId":"telegram","modelId":"telegram","modelType":"universal"}'
        retain: false
    - delay:
        seconds: 1
    #  We transmit the received command from telegram while respecting the initiated sessionId
    - service: mqtt.publish
      data_template:
        topic: "hermes/asr/textCaptured"
        payload_template: >
          {"text":"{{ trigger.event.data.text }}","likelihood":0.1,"seconds":2.0,"siteId":"telegram","sessionId":"{{ state_attr('sensor.snips_asr','sessionId') }}"}
        retain: false
    - delay:
        seconds: 1
    # We catch the response from the communication between the Snips Dialogue manager and the TTS process
    - service: telegram_bot.send_message
      data_template:
        target: '{{ trigger.event.data.user_id }}'
        message: "{{ state_attr('sensor.snips_tts','text') }}"
    # We close the session by telling the Snips Dialogue Manager that the response has been played (not working yet)
    - service: mqtt.publish
      data_template:
        topic: "hermes/audioServer/telegram/playFinished"
        payload_template: >
          {"siteId":"telegram","sessionId":"{{ state_attr('sensor.snips_asr','sessionId') }}"}
        retain: false

That’s it. If you find something to improve, please tell me; I am quite new to Home Assistant :slight_smile:

Shortcomings and Potential Improvements

  • The delays in the automation are necessary to give the Snips Dialogue Manager time to react. However, additional MQTT sensors and a wait statement would be more appropriate and may speed up the response time.
  • So far, the session is not finished in a clean way. At some point Snips runs into a timeout and thereby ends the session. I have not found a way to extract the requestID, which is most likely required for this, from the request Snips sends to the audioServer.
  • A true dialogue is not possible yet. In my opinion, small adjustments to the automation (some kind of session monitoring) may be enough to enable this, but they are not implemented yet. Furthermore, the Snips timeout would need to be extended because writing is slower than speaking.
  • It would be cool to bridge audio commands as well. I think this would require changes in the Telegram polling component to support incoming audio messages (or does it?). Furthermore, I would expect this to need an on-the-fly encoding/decoding process, which may be challenging in HA only.
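
Regarding the first point, the fixed delays might be replaced by waits along these lines. This is an untested sketch: it assumes the two MQTT sensors defined above, it assumes that wait_template with timeout and continue_on_timeout is available in your HA version, and the five-second timeouts are arbitrary. Each wait compares the sensor's last_updated with the time the Telegram event was fired, so that a sessionId or response left over from a previous command is not picked up.

```yaml
# Sketch only: the two "delay" steps of the automation replaced by waits
action:
  # ... publish hermes/hotword/telegram/detected as before ...

  # Wait until Snips has opened a session for this request
  - wait_template: >-
      {{ states.sensor.snips_asr is not none and
         as_timestamp(states.sensor.snips_asr.last_updated) >
         as_timestamp(trigger.event.time_fired) }}
    timeout: '00:00:05'
    continue_on_timeout: false

  # ... publish hermes/asr/textCaptured as before ...

  # Wait until a fresh response text has arrived for the TTS
  - wait_template: >-
      {{ states.sensor.snips_tts is not none and
         as_timestamp(states.sensor.snips_tts.last_updated) >
         as_timestamp(trigger.event.time_fired) }}
    timeout: '00:00:05'
    continue_on_timeout: false

  # ... send the Telegram message as before ...
```

With continue_on_timeout set to false, the automation stops if Snips does not answer in time instead of sending an outdated or empty message.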

If you have any other ideas or improvements, please tell me. What do you think about the idea and implementation?
