Wyze Cams using wz_mini_hacks, go2rtc RTSP, motion detection, and two-way audio. Sorta

I’ve been running Home Assistant in Docker for a little over a year at this point. I got sick of writing little isolated scripts to connect different smart devices, and I would have started sooner if I had realized how well HA can glue it all together.

I’ve been using Wyze Cams for a few years now. I started with some V2 cams using one of the earlier RTSP beta firmware versions. At some point I added a few more V3 cams. I’m up to about a half-dozen of each and one Pan V1.

RTSP was always clunky, but worked fine in iSpy where I used it primarily. When I added the cameras to Home Assistant it bothered me that the audio channel didn’t work right, but it wasn’t a show-stopper. I could always just pop open the Wyze app on my phone if I really needed audio.

Over the past couple of months I’ve been playing with wz_mini_hacks and found that after switching to go2rtc, audio worked. The camera feeds also load faster, since wz_mini_hacks provides a snapshot URL. I played with cmd waitMotion to do area detection within Home Assistant, initially via API, and now via an Automation Webhook. Everything I’ve read says two-way audio doesn’t work with the Wyze cams out of the box.

I noticed, though, that I could run cmd aplay. Isn’t that sort-of two-way audio? I played with converting files to a suitably low-quality .wav format and could trigger those via script. Then I played with the file pipe (mkfifo) method listed in the wiki.

I figured out I could use the gstreamer integration to configure a media_player in Home Assistant that writes out to a fifo. Now I have HA writing to a fifo in Docker:

# configuration.yaml: the gstreamer platform lives under media_player
media_player:
  - platform: gstreamer
    name: 'tts'
    # Downsample to 8 kHz mono 16-bit WAV and write it to the fifo
    pipeline: "audioresample ! audioconvert ! audio/x-raw,rate=8000,channels=1,format=S16LE ! wavenc ! filesink location=/media/voice/snapcast_gstreamer"

and cmd aplay listening to a fifo on the cam:

cmd aplay /opt/wz_mini/tmp/stream 50 2> /dev/null &
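For anyone following along, the fifo has to exist on the camera before cmd aplay can open it. A minimal sketch of that step, where ensure_fifo is a hypothetical helper name of my own and the path and volume are the ones from the command above:

```shell
# Hypothetical helper: make sure a named pipe exists at the given path.
# If something that is not a FIFO is already there, replace it.
ensure_fifo() {
    fifo_path="$1"
    if [ ! -p "$fifo_path" ]; then
        rm -f "$fifo_path"
        mkfifo "$fifo_path"
    fi
}

# On the camera (left commented out; paths assume a wz_mini_hacks install):
# ensure_fifo /opt/wz_mini/tmp/stream
# cmd aplay /opt/wz_mini/tmp/stream 50 2> /dev/null &
```

Calling it twice is harmless, so it can sit at the top of whatever startup script launches the player.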

I’m looking for a better way to connect the two together. Ideally with something like snapcast, which I’m just dipping my toe into now, so I could potentially pick which cameras are listening to which streams, perhaps have multiple streams going around the house at the same time.

So far this works, sorta. I can start a listener in the Docker container. It would probably work outside of the container as well, and I could probably mount the same directory that snapcast would use for the pipe into the HA container, but this is what I’m doing:

while true; do cat /config/voice/snapcast_gstreamer; done | nc -l -p 4001

Then on the Wyze cam I can connect to that listener and output the stream to the pipe that cmd aplay is listening to:

nc 4001 >> /opt/wz_mini/tmp/stream &

And if I time it all right, I can play a media file in Home Assistant and it will come out of the speaker of the Wyze Cam. That’s two-way audio, right? The problem is that I have to start everything in the right order, and then it only plays once. I haven’t figured out the exact timing yet. I think this has to do with nc liking to exit when it gets an EOF. I think I got the server/listener side working, but both the Wyze nc “client” and the cmd aplay command exit after the stream ends.
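One likely reason the player goes away is that a reader on a fifo sees EOF as soon as the last writer closes it. A common shell trick on Linux is to keep one extra descriptor open on the fifo so the reader never sees that EOF. This is only a sketch under that assumption: hold_fifo_open is a hypothetical helper name, descriptor 3 is an arbitrary choice, and I have not confirmed how cmd aplay behaves with it on a Wyze cam.

```shell
# Hypothetical helper: open the FIFO read-write on descriptor 3 in the
# current shell. Linux lets this succeed without blocking, and as long as
# the shell stays alive the FIFO always has at least one writer, so a
# separate reader is not handed EOF when a short-lived writer exits.
hold_fifo_open() {
    exec 3<> "$1"
}

# On the camera (left commented out; path taken from the post):
# hold_fifo_open /opt/wz_mini/tmp/stream
# cmd aplay /opt/wz_mini/tmp/stream 50 2> /dev/null &
```

Each new clip from the wavenc pipeline would probably still begin with its own WAV header, which a long-running player may treat as audio data, so expect a possible click at the start of each clip.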

I’m sure I could loop the client side too. Maybe digging into snapcast is the right path: I could just have it send silence, which would keep the stream going and everything active, and I’m overthinking it. Either way, I need to write some daemons or command_line integrations to start these processes and keep them running on HA and on each of the cameras.
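Looping the client side could look something like the sketch below. keep_running, MAX_RUNS, RETRY_DELAY and HA_HOST are all names I made up for illustration; HA_HOST stands for whatever address the listener is on and would need to be exported first.

```shell
# Hypothetical helper: rerun a command every time it exits.
# RETRY_DELAY (seconds, default 1) avoids a tight loop when the command
# fails immediately. MAX_RUNS bounds the loop for testing; 0 means forever.
keep_running() {
    max_runs="${MAX_RUNS:-0}"
    runs=0
    while [ "$max_runs" -eq 0 ] || [ "$runs" -lt "$max_runs" ]; do
        "$@"
        runs=$((runs + 1))
        sleep "${RETRY_DELAY:-1}"
    done
}

# On the camera (left commented out; port and path taken from the post):
# keep_running sh -c 'nc "$HA_HOST" 4001 > /opt/wz_mini/tmp/stream' &
# keep_running cmd aplay /opt/wz_mini/tmp/stream 50 2> /dev/null &
```

This does not fix the ordering problem by itself, but it means neither side has to be started by hand again after a clip finishes.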

Instead of continuing to build the Rube Goldberg contraption I’ve started, I ask you: is there a better, more direct way to get audio out of Home Assistant? I think I want it to show up as a media_player, but I could be convinced I’m doing it wrong. I’d love to eventually have a PTT (push-to-talk) button on my glance camera cards that could record a bit of audio and stream it to the speaker. If I have to go into Media → Text-to-speech and type it out, that’s fine too.

