Snips vs Zanzito

There is a new player in town: Snips.

From my initial understanding, it seems a lot like Zanzito, but it runs on a Raspberry Pi.

The similarities are…

  1. Uses MQTT to communicate
  2. Can do speech recognition
  3. Works without internet (except for voice commands: Zanzito still needs internet for voice commands to work).

However, these are the pros and cons of Snips compared to Zanzito.

Pros:

  1. STT is done locally, so it doesn’t need internet to work
  2. The statistics look promising (I haven’t personally tried it out yet)
  3. Can teach it to understand more complicated voice commands like “Brew me a strong cappuccino with skimmed milk and three extra sugars”
  4. Can use any USB microphone that is supported by the RPi (better hardware choices)

Cons:

  1. Doesn’t seem to have an option to choose which TTS engine or which voice to use.
  2. Custom hot word detection is not yet supported.
  3. Cannot use an existing MQTT broker yet; must use the built-in broker at the moment.
  4. More complicated to install compared to Zanzito.
  5. Bluetooth speaker support on the RPi is hit and miss.
  6. An RPi cannot work during a power outage unless plugged into a UPS (costly). Android phones and tablets already come with a battery that can last a long time during an outage.

While I love Zanzito for its simplicity, I generally think Snips is a more proper replacement for Alexa if the cons I listed here are addressed. What I foresee is that Zanzito will remain the best companion on my Android phone, but Snips might take the spot once held by my Echo Dots.

So what do you guys think?

I am hoping, once they have their MQTT broker situation fixed, that I will be able to run Snips on a Pi Zero W with a microphone attached, and connect that to HA. If it works, that would emulate an Echo Dot quite closely.

Currently they specify a Pi 3 as the supported hardware, but I think it would be worth trying.

I think Zanzito is for the phone, so it’s very portable; Snips is for a non-movable Pi 3 acting as an Echo.

Two different programs.

Agree. I initially wanted to replace my Echo with Zanzito installed on old tablets/phones. Now Snips seems more promising.

Although now, with the new intercom feature of the Echo (and I think also multiroom soon), a $50 object is hard to beat.

Yes, I know Snips is not cloud based and you don’t have a powerful corporation in the house, but… I will switch to Snips if I see very fast responses and if I find a good speaker/microphone system with quality like the Echo’s, which is very difficult to find.

Yes. The Echo mic is hard to beat.

This is a very good summary. Another (temporary) con, although not specific to the comparison with Zanzito, would be the lack of multi-zone/room support. They did say in a Reddit thread that it’s a planned feature, which I think would definitely help sell it to many more people.

MQTT is fixed, but without username and password support.

Another difference is the developer.

Zanzito: you write, the developer answers within an hour, and the next day you have a fix.

Snips: you write, the developer answers after a week, and maybe after a month you have a fix.

Crazy question, but not having looked too deeply into the Snips setup, is there any way to use Zanzito to transmit voice commands to the Snips MQTT server and then let Snips interpret and process the intents in HA? I’m looking into adding some Pis with mics, but would still prefer tablets for the visual option, as well as still being able to use my phone to command things when I’m out of the house.

That’s what I’m interested in, too. Home Assistant is already able to receive transcribed speech from Chrome (mobile or desktop) for its conversation component. It seems arbitrary that Snips requires its own speech recognition engine to be used when it could instead take text as input.

You absolutely can do that with Snips.

mosquitto_pub -h SNIPS_MQTT_IP -p 1883 -t 'hermes/nlu/query' -m '{"input": "play christmas music"}'

Snips will parse that and go through the rest of the chain. Snips has five programs that communicate via MQTT, so you can send various info at different points. For example, I cheat and use the audio server to play WAV files generated by Amazon Polly to avoid the Pico TTS it uses by default.
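
To illustrate that audio-server trick, here is a minimal sketch. The topic path (hermes/audioServer/<siteId>/playBytes/<requestId>), the “default” site id, and the request id are my assumptions, so check them against your Snips install:

# Sketch: publish raw WAV bytes (e.g. a file generated by Amazon Polly)
# to the Snips audio server. "default" is the assumed site id, "req1" is
# an arbitrary request id, and response.wav is a file already on disk.
mosquitto_pub -h SNIPS_MQTT_IP -p 1883 -t 'hermes/audioServer/default/playBytes/req1' -f response.wav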

Well, Zanzito can do the voice recognition and send it as an MQTT message to Snips as I detailed below. But Zanzito uses Google voice recognition, so at that point you may as well look into using the Google voice assistant.

What would be more interesting to me is being able to send an arbitrary WAV file to Snips to be parsed. You could sort of do this using PulseAudio and piping something to the mic device.

But Snips does support running just its audio server on other Linux machines, be they Raspberry Pi or not. The audio server actually pipes its output to MQTT, but it doesn’t seem to really work in a quick test: I tried toggling the ASR on and then piping to the audio server, but no luck. This could definitely be done, but I’m not sure it’s worth the effort, to be honest. Snips would also need to be configured with a second siteId, which isn’t documented yet.

Interesting idea. Ideally, I’m agnostic about which service handles the speech recognition; the goal is migrating from MQTT-triggered automations to intent-based controls. If I’m reading this correctly, will sending the voice command as a payload to hermes/nlu/query on the Snips MQTT server allow it to be parsed?

No, I haven’t figured out how to do that, although there are ways to do it. You could use PulseAudio to handle the mic input, which would allow you to send the audio from multiple sources into Snips.

hermes/nlu/query takes text input; you could use the Google ASR to parse the audio and then send the text to Snips.

Thanks, that’s what I was trying to say.

The snips component listens to the MQTT server you have configured in Home Assistant.
If you publish the JSON to hermes/nlu/intentParsed, the snips component will pick it up.

The JSON should be in a format Snips can work with; you can check the format in the console on snips.io.
I have successfully posted a command (turn on the light) just via a publish to the MQTT server.
My JSON looked like this:

{
  "input": "turn on the light",
  "intent": {
    "intentName": "ActivateObject",
    "probability": 0.89950687
  },
  "slots": [
    {
      "rawValue": "light",
      "value": {
        "kind": "Custom",
        "value": "light"
      },
      "range": {
        "start": 12,
        "end": 17
      },
      "entity": "objectType",
      "slotName": "objectType"
    }
  ]
}

This is the output from the test console on snips.io, but I needed to change
“intentName”: “Alpha:ActivateObject” to “intentName”: “ActivateObject”.
The result was the exact JSON as sent from my snips addon to the MQTT server.
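
For reference, a publish along those lines could look like the sketch below, assuming the JSON above is saved as intent.json and HA_MQTT_IP is a placeholder for the broker Home Assistant uses:

# Sketch: publish the intent JSON to the topic the HA snips component listens on
mosquitto_pub -h HA_MQTT_IP -p 1883 -t 'hermes/nlu/intentParsed' -f intent.json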

That’s awesome - thanks for sharing that. I can turn that into a shell_command pretty easily, but I’d still have to look into how to get HASS to pass along arbitrary transcribed speech to it. (My reading of the conversation component is that it’s only built to map fixed strings to intents.)
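
One hypothetical way to wire that up, assuming a helper script (snips_query.sh is my name, not from the thread) that a Home Assistant shell_command could call with the transcribed text as its argument, sending it to the hermes/nlu/query topic shown earlier:

#!/bin/sh
# Hypothetical helper: forward transcribed speech to Snips for intent parsing.
# SNIPS_MQTT_IP is a placeholder; $1 is the transcribed text passed in by HA.
mosquitto_pub -h SNIPS_MQTT_IP -p 1883 -t 'hermes/nlu/query' -m "{\"input\": \"$1\"}"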

Why would you send the intent to Snips, though? HA listens on that exact topic, so you could just send it to HA directly. Or, at that point, just send some other arbitrary command that does what you want without getting into formatting a JSON intent.

I meant the snips component, not the snips platform. Sorry for the confusion :wink:
Indeed, when you post a format Snips can use to that topic, HA will process it.

So, in fact, you can use ANY software that can publish JSON to an MQTT broker to connect with the snips component.