I tried all the camera platforms so you don't have to

Hi, forgive my noobness to HA here.
I’m planning my setup, so I find this thread very interesting.
I’ll most probably host HA as a VM on a QNAP NAS, which should also store the camera streams.
I haven’t decided yet whether to go with UniFi or generic Chinese cams (those clunky plasticky big boxes honestly look awful in a newly renovated apartment, imho).

Now regarding the tech, I’ve got some questions:
Is it possible to open iframes to the cameras directly? That would point the browser straight at the camera’s own page, bypassing any additional component.

Another interesting thing to do would be to stream the ZoneMinder/Synology/QNAP/Shinobi/iSpy/what-have-you overview page straight into HA.
Is that doable/worth it?

Incoming

Go for Hikvision turret cameras. They are nice and small and quite well priced.

Yes, but iframes will only work with the same scheme you use to view HA, i.e. if you view HA over HTTPS, then the cam feed needs to be served over HTTPS too, or they both need to be plain HTTP.

1 Like

We’re all in it together.

Thanks for doing all this work! Really appreciated.

I think part of the delay you see when the stream component is enabled is a deliberate feature, not completely a bug. I notice that when popping up the “live” view of a camera, you can drag the scrub control at the bottom of the window all the way to the right and end up with a shorter delay, maybe 5 seconds? I’m not certain, and it might vary with the I-frame interval as well.

It would be great if you could skip this if you didn’t care about seeing recent history. I believe the buffer is there so that upon some event (motion detection, etc.), you can start recording 10 seconds ago (rough sketch of the idea below). That’s a really great feature, but perhaps not what’s expected in a live camera view.

I opened a “WTH does a streaming camera display live video 15 seconds delayed by default” post to this effect as well.
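
To make that concrete, here’s roughly how I picture that lookback buffer working (just my mental model, sketched in Python, not the actual stream component code):

```python
from collections import deque
import time


class LookbackBuffer:
    """Rolling buffer that keeps roughly the last `lookback_sec` seconds
    of segments so a recording triggered by an event can start in the past.
    Purely illustrative - not how Home Assistant actually implements it."""

    def __init__(self, lookback_sec=10, segment_sec=2):
        # Enough slots to cover the lookback window; the oldest falls off automatically
        self.segments = deque(maxlen=max(1, lookback_sec // segment_sec))

    def add_segment(self, segment_bytes):
        self.segments.append((time.time(), segment_bytes))

    def start_recording(self):
        # A recording started "now" begins with the buffered segments,
        # i.e. footage from up to ~10 seconds ago
        return list(self.segments)
```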

1 Like

As @HeyImAlex points out, HLS will have latency by its nature. We are taking a stream and packaging it up into segments which get served over HTTP. Since segments are discrete, there will be a latency of at least one segment length, and the player itself might buffer another few segments on top of that. Our implementation of HLS already uses a non-standard fragment duration (Apple recommends 6 seconds, we use 2 seconds). It will probably be hard for us to get below ~4-5 seconds of latency using HLS. That might be reasonable if you’re viewing the stream remotely. If you’re local and want lower latency, you can use methods that trade bandwidth for latency (MJPEG) or require other TCP/UDP ports (just use the original RTSP stream).
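
To put rough numbers on it (the player-buffering figure here is an assumption on my part, not a measurement):

```python
def hls_latency_estimate(segment_sec, player_buffer_segments=2, encode_overhead_sec=0.5):
    """Back-of-the-envelope HLS latency: one whole segment has to be finished
    before it can be served, plus whatever the player buffers on top, plus a
    bit of packaging overhead. The buffer/overhead numbers are guesses."""
    return segment_sec + player_buffer_segments * segment_sec + encode_overhead_sec


# Apple's recommended 6s segments vs the 2s segments the stream component uses
print(hls_latency_estimate(6.0))  # ~18.5 seconds
print(hls_latency_estimate(2.0))  # ~6.5 seconds
```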

The Low Latency HLS link you’ve shared is not just about tweaking/tuning. It’s an extension to HLS which involves a different encoding/HTTP pipeline: it has to produce encoded chunks before the whole segment is done muxing/encoding, and it uses HTTP/2 to send those chunks as soon as they are available. The PyAV package we use to access the ffmpeg libraries doesn’t make the chunked encoding part easy, but we may be able to work around that. A bigger issue is actually HTTP/2, since aiohttp does not look like it will support that anytime soon, although it looks like the HTTP/2 requirement may have been relaxed. Anyway, as @hunterjm pointed out, LL-HLS player/browser support is not really there either. We may revisit this sometime in the future, but probably not anytime soon.
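
For the chunked-transfer half of the problem, serving parts of a segment as they become available over plain HTTP/1.1 could in principle look something like this; the `parts_as_available()` generator is entirely made up, and getting the encoder to actually produce parts early is the hard bit:

```python
from aiohttp import web


async def parts_as_available(segment_name):
    # Hypothetical stand-in for an encoder that yields fMP4 parts
    # before the whole segment has finished muxing.
    for part in (b"part-1", b"part-2", b"part-3"):
        yield part


async def serve_partial_segment(request):
    """Stream segment parts to the player as soon as we have them,
    using HTTP/1.1 chunked transfer instead of HTTP/2 push."""
    resp = web.StreamResponse()
    resp.content_type = "video/mp4"
    resp.enable_chunked_encoding()
    await resp.prepare(request)
    async for part in parts_as_available(request.match_info["segment"]):
        await resp.write(part)
    await resp.write_eof()
    return resp


app = web.Application()
app.add_routes([web.get("/segment/{segment}", serve_partial_segment)])
```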

Note: one significant source of lag can come from the use of H264+/H265+. These non-standard codecs reduce bandwidth by significantly reducing the number of keyframes sent, which increases the segment duration and therefore the latency. Also, since they cause segment durations to vary and differ from the target segment duration, they affect things like the lookback period in stream recordings. We should probably recommend against using these if latency is of any concern.
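
If you want to check what your camera is actually doing, a quick PyAV script (the same library the stream component uses) can show the keyframe spacing; swap in your own RTSP URL:

```python
import av

# Replace with your camera's RTSP URL
URL = "rtsp://user:pass@192.168.1.10:554/Streaming/Channels/101"

container = av.open(URL, options={"rtsp_transport": "tcp"})
video = container.streams.video[0]

last_pts = None
measured = 0
for packet in container.demux(video):
    if packet.pts is None or not packet.is_keyframe:
        continue
    if last_pts is not None:
        interval = float((packet.pts - last_pts) * video.time_base)
        # Anything much above ~2s means longer HLS segments and more latency
        print(f"keyframe interval: {interval:.1f}s")
        measured += 1
        if measured >= 5:
            break
    last_pts = packet.pts
container.close()
```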

4 Likes

Thanks Justin, it’s always really valuable to hear the real technical reasons behind the issues. It makes it much easier to understand what’s actually happening and respond to it the right way in our configs.

When you say “just use the original RTSP stream”, is there a way to use the stream component while still pointing to the original RTSP stream? The only alternative to the stream component that I know of is to repackage as MJPEG, and that takes a lot of CPU for an h.264 stream.

I meant maybe looking for another way to display the RTSP stream locally, such as what @HeyImAlex suggested with Fully Kiosk.
The problem with latency comes from trying to get something from a packetized stream (RTSP) into a format which we can serve over HTTP (HLS). The latency comes because we have to batch everything into segments before we send it. We can get around this by using MJPEG since each picture can be like its own segment, but as you noticed, this takes a lot of CPU and bandwidth (it essentially sends a stream of pictures instead of a compressed video - you lose all the bandwidth benefits of compression across time and you also have to use CPU to convert from video to pictures). Going straight to the RTSP source gets around the latency limitations imposed by trying to batch everything to send over HTTP - you have one connection continuously open so you don’t have to send everything in chunks.
Low latency HLS is probably the right path for us, but that will take quite some time.
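
To make the MJPEG cost concrete, this is roughly what that path boils down to: every frame is fully decoded and re-encoded as a JPEG (a simplified sketch, not the actual camera proxy code):

```python
import io

import av


def mjpeg_frames(rtsp_url):
    """Decode every frame of the H.264 stream and re-encode it as a JPEG.
    This is essentially what an MJPEG proxy has to do, which is why it costs
    so much more CPU and bandwidth than just repackaging H.264 into segments."""
    container = av.open(rtsp_url, options={"rtsp_transport": "tcp"})
    for frame in container.decode(video=0):
        image = frame.to_image()        # full decode to an RGB picture (needs Pillow)
        buf = io.BytesIO()
        image.save(buf, format="JPEG")  # re-encode each picture individually
        yield buf.getvalue()            # one self-contained "segment" per frame


# Each yielded JPEG would then be pushed out as one part of a
# multipart/x-mixed-replace HTTP response to the browser.
```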

Great writeup. I also have a couple of Hikvision cameras, most of them 2 MP, and have pretty much stuck to the generic camera serving the latest picture snapshot once every second (rough sketch of what I mean below).

It works for a couple of seconds at full resolution and then goes blank.

The ffmpeg camera never worked on my Hikvision for some reason or other. Will give ONVIF a try.

I also use Frigate for person detection but still rely on normal snapshots to take pics of other events (such as the gates outside opening).
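
Going back to the generic camera approach I mentioned: it’s basically just polling the camera’s snapshot URL once a second, something roughly like this (the ISAPI path is what my Hikvisions expose, yours may differ, and Hikvision usually wants digest auth):

```python
import time

import requests
from requests.auth import HTTPDigestAuth

# Example Hikvision-style snapshot endpoint - adjust for your own camera
SNAPSHOT_URL = "http://192.168.1.10/ISAPI/Streaming/channels/101/picture"


def poll_snapshots(interval=1.0):
    """Fetch a fresh JPEG once a second, much like the generic camera
    platform does with still_image_url. Cheap on CPU, but it's a slideshow."""
    while True:
        resp = requests.get(SNAPSHOT_URL, auth=HTTPDigestAuth("admin", "password"), timeout=5)
        if resp.ok:
            with open("latest.jpg", "wb") as f:
                f.write(resp.content)
        time.sleep(interval)
```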

Also @scstraus, I did some further research into displaying RTSP natively on Fully. Sadly it doesn’t work too well. Fully uses the WebView RTSP decoder supplied with Android, which is apparently very picky about what kind of streams it accepts, and it doesn’t seem to like my Hikvision streams; it just flat out refuses to open them. With an old D-Link DCS-2330-something I still had lying around, it works perfectly. Go figure. Then again, ffmpeg also seems to have a lot of issues with Hikvision streams; I often get image corruption when decoding recorded footage pulled as RTSP from the NVR. Not too sure who is to blame here.

Anyway, I got in touch with Alexey from Fully, and this is what he had to say (hope he doesn’t mind me quoting him here):

Basically, opening an RTSP URL in the browser should do the job if the Play Videos option in Fully is enabled. However, the RTSP stream format may be incompatible with those that are supported by Android. In my experience some camera streams are working, others not.

Supported media formats  |  Android media  |  Android Developers

1 Like

The only web-native, truly real-time streaming system at the moment is WebRTC.

In theory it would be possible to have Home Assistant extract the raw H264 data from the RTSP stream and repackage it as WebRTC for a browser to consume (rough sketch below). That would give very low latency and fairly low server-side CPU (with the mandatory encryption accounting for most of it). The big problem is that it is a peer-to-peer style protocol over UDP, and will likely need at least a STUN server in remote-access cases.

The complexity of setting all that up would be pretty remarkable.
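
For the curious, the building blocks for that proxy-peer idea do exist in Python already (aiortc, from the same author as PyAV). A very rough sketch, with all of the signalling and STUN/TURN plumbing deliberately left out; whether it ends up transcoding depends on what codecs the browser and camera can agree on:

```python
from aiortc import RTCPeerConnection, RTCSessionDescription
from aiortc.contrib.media import MediaPlayer


async def answer_offer(offer_sdp: str, rtsp_url: str) -> str:
    """Take a browser's SDP offer, attach the camera's RTSP stream as the
    outgoing video track, and return the SDP answer. How the offer/answer
    get exchanged (signalling) and any STUN/TURN setup for remote access
    are left out of this sketch entirely."""
    pc = RTCPeerConnection()
    player = MediaPlayer(rtsp_url, options={"rtsp_transport": "tcp"})
    if player.video:
        pc.addTrack(player.video)

    await pc.setRemoteDescription(RTCSessionDescription(sdp=offer_sdp, type="offer"))
    answer = await pc.createAnswer()
    await pc.setLocalDescription(answer)
    return pc.localDescription.sdp
```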

1 Like

Yes, WebRTC is the final answer to all issues. Someday… Hopefully…

Hey,
I have a Xiaomi Home 360 camera.
Is it possible to integrate it into HA? Did anybody try and succeed?
Thx

Unfortunately WebRTC was designed for real-time peer-to-peer communication. Most cameras don’t have native peering capabilities, and having Home Assistant act as a proxy peer is non-trivial. The maintainer of PyAV also produces a library called aiortc, which I looked at using when building the stream component initially. Unfortunately WebRTC has a lot of rules around how video and audio are packaged and transmitted, which basically forces transcoding of the RTSP stream from the camera. If you think you see high CPU usage with MJPEG, wait until you try transcoding video on a small ARM device like the Pi. With the plethora of supported devices available to run Home Assistant on, any amount of transcoding would raise a new WTH for “why is my Pi on fire?”

4 Likes

It is good to have a low-CPU option for Pi users, and I agree that probably nothing will beat HLS for that use case. But an awful lot of us are also running on decent Intel hardware. I, for one, find little use in a camera stream that’s delayed by 13 seconds. I run the cameras so I can see when someone is coming to the door or pulling up in the driveway. By the time I get that information, it’s not usable any more. Indeed, I am planning a hardware upgrade just so that I can transcode everything to MJPEG and get those nice realtime streams on all my devices. So it’s also good to think about use cases for guys like us.

I think in the end it would be nice to merge everything into a single camera component with a set of options that are easy to switch per camera, and that make it clear what is happening and what the tradeoffs are, such as below. You could make an even nicer table showing all the tradeoffs… I would be happy to do the first pass of documentation for such a component based on my understanding. Understanding what you are actually doing is very difficult with the camera components we have now (the reason this post was needed to begin with). This would make it much simpler and clearer.

Camera Options

| Name | Type | Default | Supported options | Description |
|------|------|---------|-------------------|-------------|
| input_format | string | h264_rtsp | h264_rtsp \| mjpeg \| h264_sip \| h264_onvif | Encoding of input stream |
| output_format | string | h264_hls | h264_hls \| mjpeg \| h264_sip \| h264_hlsll \| h264_webrtc | Encoding of output stream |
| proxy | boolean | off | on \| off | Proxy the camera stream so that only one stream is opened to the camera (only valid with h264_rtsp and h264_onvif input streams) |

h.264 input and h264_hls output are the recommended options as they produce the lowest CPU usage, although HLS does increase the latency of the stream.

The MJPEG and SIP options are low latency but will increase CPU usage greatly. If the encoding of the input stream and output stream don’t match, it will require one instance of ffmpeg transcoding per camera being displayed on each device. This means that 1 camera being displayed on 4 devices will spawn 4 ffmpeg transcoding sessions and the associated CPU overhead. WebRTC has the highest CPU usage and adds further transcoding load on top of normal transcoding, but has the best browser support.

2 Likes

Implementing that in Home Assistant is way out of scope. It’s a home automation platform, not an NVR or camera viewer. I’m actually curious about what your real-time use cases are. Even with the delay I don’t use the live feed, because by the time I, or the other devices I would send it to, notice an event, most of the time it is said and done. If I’m casually watching things around the house, a 5-10 second delay isn’t the end of the world. If I’m remote viewing, I don’t even notice it. The only time it’s really noticeable is if I’m trying to test something and am walking in front of my cameras while looking at the browser.

1 Like

Great write up!

This gave me some insight into my efforts and failures with the same. I became so frustrated with live video in the HA interface that I gave up on it. Now I just use camera cards to display snapshots and use my Blue Iris UI for actual monitoring.

Realistically, while it would be nice to display live video in HA, BI can do it better.

I’m not asking for any NVR features. And the fact that it has camera support at all means that it is a de facto camera viewer already. I am not asking for camera-wall type features, just what I consider the MVP for any kind of camera integration.

Also, it should be noted that though I listed SIP and WebRTC as possible input and output formats in my example, my point wasn’t so much about implementing those as it was to simply take the technology that’s already in Home Assistant today and merge it into a single component that can be more easily documented and understood. Cameras are currently so fragmented, and the tradeoffs between the options so poorly documented, that it’s very confusing to even try to choose the appropriate solution for your needs.

As for what I use the cameras for, I use the cameras for a number of things:

  1. Watching for when packages are arriving at the front of the house and seeing if someone else opened the door for the delivery guy (it takes me a while to get to the gate buzzer and usually my wife does it)
  2. Watching the kids as they are playing outside, making sure they don’t kill themselves (there are a lot of parts of the outside that you can’t see from the inside, and we have a 3-year-old boy)
  3. Checking for intruders if we get an alert about a human on premises at night. Hass is a bit quicker to load up than Synology, and I can click directly on the notification from Hass to get to a camera. Right now the cameras aren’t really reliable enough to do this, but I would like them to be.
  4. Being able to combine the camera views with other information about the motion sensors or human detection, such as histories or real-time status. Also, having them as small, always-updating thumbnails next to the dashboard stuff on the tablets is nice; sometimes I see something useful there, like someone pulling up to the house.
  5. I would like to use it in the future for doorbell cameras. I already have tablets around the house that usually display a general info feed but pop up the camera feeds when a human is spotted outside. I would also like to put these tablets near the intercom for the doorbell so that we can see who’s there when someone is at the front door, but display the normal dashboard the rest of the time.

Honestly, I have the opposite problem: trying to come up with any use case that’s served by a camera with a long delay… I would think most of those are things like seeing if the lights are on or if a door is open, which would be better served with other types of sensors anyway.

I did run the HLS cameras for quite a while but eventually realized that they weren’t serving any purpose for me, and that an unreliable low-latency stream was actually still a lot more useful than a reliable high-latency stream.

1 Like

I’m not really trying to start an argument on the internet here, so I hope you didn’t take it that way. I guess we just have differing opinions on what qualifies as a “long delay”. I typically see an 8 second delay in the HLS stream of my cameras. In my opinion, a 5 to 15 second delay does not make a bit of difference in real world scenarios. I personally actually like the delay, because if I hear my children scream outside it may take me 5 to 15 seconds to pull up the feed, or glance at the PIP on my screen. At that point, I can see what caused the disturbance instead of only looking at the aftermath. Same with package deliveries. When I get a notification to my phone, it takes time to load. Even with the delay the person/vehicle of interest has already left the frame by the time the live feed is up and I have to review recordings or snapshots from my object detection setup.

Your 5th use case is really the only one I see as potentially needing a real-time feed, because the offset between the video from Hass and the audio from the intercom would be a little awkward. At that point, though, they also make video intercom systems, which might be a better solution.

Edit: I do agree with you, however, about the confusion caused by the plethora of camera integrations that are in various stages of being broken or under active development/maintenance. Unfortunately, not all cameras are created equal or support the same things.

1 Like

I’m also not trying to argue. I think we are both making valid points. And, yes, seeing what happened after the kids screamed was one area where the delay was useful, though it’s much easier to add a delay if needed than it is to take one away.

I think the difference in our package delivery use cases is due to the fact that I have to go buzz someone in. By the time I see them on the camera (my delay is 13s), they have already long since buzzed the door, and any chance I had to make sure my wife was getting it, or to begin the trip out of the office down to the buzzer, is long gone. Or if I want to see whether she already buzzed them in, it’s actually faster for me to get up from the desk and walk out onto the balcony to look than it is to check the camera feed. But everyone has different use cases and preferences. It would be nice if there were good solutions for each.

Anyhow, the main point was that if we merged the cameras into one integration and made everything an option of that one integration, it would be much easier to explain when to use each option, rather than doing what I’m trying to do in this thread: explaining all the different camera integrations, how they differ and overlap, and which combinations produce good outcomes for which use cases. What each integration is actually doing in the background is very opaque. I don’t think it needs to be this hard.

2 Likes