Also @scstraus, I did some further research into displaying RTSP natively on Fully. Sadly it doesn’t work too well. Fully uses the WebView RTSP decoder supplied with Android, which is apparently very picky about what kind of streams it accepts, and it doesn’t seem to like my Hikvision streams; it just flat out refuses to open them. With an old D-Link DCS-2330-something I still had lying around, it works perfectly. Go figure. Then again, ffmpeg also seems to have a lot of issues with Hikvision streams; I often get image corruption when decoding recorded footage pulled as RTSP from the NVR. Not too sure who is to blame here.
Anyway, I got in touch with Alexey from Fully, and this is what he had to say (I hope he doesn’t mind me quoting him here):
Basically, opening an RTSP URL in the browser should do the job if the Play Videos in Fully option is enabled. However the RTSP stream format may be incompatible with those that are supported by Android. In my experience some camera streams are working, others not.
The only web native true real-time streaming system at the moment is WebRTC.
In theory it would be possible to have Home Assistant extract the raw H264 data from the RTSP stream and represent it as WebRTC for a browser to consume. That would get very low latency and fairly low server-side CPU (the mandatory encryption accounting for most of it). The big problem is that it is a peer-to-peer style protocol over UDP, and will likely need at least a STUN server in remote access cases.
The complexity of setting all that up would be pretty remarkable.
Unfortunately WebRTC was designed for real time peer-to-peer communication. Most cameras don’t have native peering capabilities, and having Home Assistant act as a proxy peer is non-trivial. The maintainer of pyAV also produces a library called aiortc which I was looking to use when building the stream component initially. Unfortunately WebRTC has a lot of rules around how video and audio are packaged and transmitted which basically forces transcoding on the RTSP stream from the camera. If you think you see high CPU usage with MJPEG, wait until you try transcoding video on a small ARM device like the Pi. With the plethora of supported devices available to run Home Assistant on, any amount of transcoding would raise a new WTH for “why is my Pi on fire?”
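For anyone curious, a minimal sketch of what the aiortc approach looks like is below. It is purely illustrative: the camera URL is a placeholder, signalling and STUN/TURN for remote access are left out entirely, and MediaPlayer decodes the stream so aiortc has to re-encode it for the browser, which is exactly the transcoding cost described above.

```python
# Rough sketch only: relay an RTSP camera to a browser peer via aiortc.
# Assumes some existing signalling channel delivers the browser's SDP offer
# and returns our answer; "rtsp://camera/stream" is a placeholder URL.
from aiortc import RTCPeerConnection, RTCSessionDescription
from aiortc.contrib.media import MediaPlayer


async def handle_offer(offer_sdp: str) -> str:
    pc = RTCPeerConnection()

    # MediaPlayer uses PyAV/FFmpeg under the hood; the frames get decoded
    # here and re-encoded by aiortc, i.e. a full transcode per viewer.
    player = MediaPlayer("rtsp://camera/stream",
                         options={"rtsp_transport": "tcp"})
    if player.video:
        pc.addTrack(player.video)

    await pc.setRemoteDescription(
        RTCSessionDescription(sdp=offer_sdp, type="offer"))
    answer = await pc.createAnswer()
    await pc.setLocalDescription(answer)
    return pc.localDescription.sdp
```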
It is good to have a low CPU option for Pi users, and I agree that probably nothing will beat HLS for that use case. But an awful lot of us are also running on decent Intel hardware. I, for one, find little use in a camera stream that’s delayed by 13 seconds. I run the cameras so I can see when someone is coming to the door or pulling up in the driveway. By the time I get that information, it’s no longer usable. Indeed, I am planning a hardware upgrade just so that I can transcode everything to MJPEG and get those nice real-time streams on all my devices. So it’s also good to think about use cases for guys like us.
I think in the end, it would be nice to merge everything into a single camera component with a number of per-camera options that make it clear what is happening and what the tradeoffs are, such as below. You could make an even nicer table showing all the tradeoffs… I would be happy to do the first pass of documentation for such a component based on my understanding. Understanding what you are actually doing is very difficult with the camera components we have now (the reason this post was needed to begin with). This would make it much simpler and clearer.
Proxy the camera stream so that only one stream is opened to the camera (only valid with h264_rtsp and h264_onvif input streams)
h.264 input and h264_hls output are the recommended options as they produce the lowest CPU usage, although HLS does increase the latency of the stream.
MJPEG and SIP options are low latency but will increase CPU usage greatly. If the encoding of the input stream and output stream don’t match, it will require 1 instance of FFMPEG transcoding per camera being displayed on each device. This means that 1 camera being displayed on 4 devices will spawn 4 FFMPEG transcoding sessions and the associated CPU overhead. WebRTC has the highest CPU usage and will add additional transcoding load above that of normal transcoding, but has the best browser support.
Implementing that in Home Assistant is way out of scope. It’s a home automation platform, not an NVR or camera viewer. I’m actually curious what your real-time use cases are. Even with the delay I don’t use the live feed, because by the time I (or the other devices I would send it to) notice an event, most of the time it is said and done. If I’m casually watching things around the house, a 5-10 second delay isn’t the end of the world. If I’m remote viewing I don’t even notice it. The only time it’s really noticeable is if I’m trying to test something and am walking in front of my cameras while looking at the browser.
This gave me some insight into my efforts and failures with the same. I became so frustrated with live video in the HA interface that I gave up on it. Now I just use camera cards to display snapshots and use my Blue Iris UI for actual monitoring.
Realistically, while it would be nice to display live video in HA, BI can do it better.
I’m not asking for any NVR features. And the fact that it has camera support at all means that it is a de-facto camera viewer already. I am not asking for camera-wall type features, just what I consider MVP for any kind of camera integration.
Also, it should be noted that though I listed SIP and WebRTC as possible input and output formats in my example, my point wasn’t so much about implementing those as about simply taking the technology that’s already in Home Assistant today and merging it into a single component that can be more easily documented and understood. Cameras are currently so fragmented, and the tradeoffs between the options so poorly documented, that it’s very confusing to even try to choose the appropriate solution for your needs.
As for what I use the cameras for, there are a number of things:
Watching for when packages are coming in the front of the house and seeing if someone else opened the door for the delivery guy (it takes me a while to get to the gate buzzer and usually my wife does it)
Watching the kids as they are playing outside, making sure they don’t kill themselves (there’s a lot of parts of the outside that you can’t see from the inside and we have a 3 year old boy)
Checking for intruders if we get an alert about a human on the premises at night. Hass is a bit quicker to load up than Synology, and I can click directly on the notification from Hass to get to a camera. Right now the cameras aren’t really reliable enough to do this, but I would like them to be.
Being able to combine the camera views with other information, like motion sensor or human-detection histories or real-time status. Also having them as small, always-updating thumbnails next to the dashboard stuff on the tablets is nice; sometimes I see something useful there, like someone pulling up to the house.
I would like to use it in the future for doorbell cameras. I already have tablets around the house that usually display a general info feed, but which pop up the camera feeds when a human is spotted outside. I would also like to put these tablets near the intercom for the doorbell so that we can see who’s there when someone is at the front door, but the rest of the time display the normal dashboard.
Honestly, I have the opposite problem: trying to come up with any use case that’s served by a camera with a long delay… I would think most of those are things like seeing if the lights are on or if a door is open, which would be better served with other types of sensors anyway.
I did run the HLS cameras for quite a while, but eventually realized that they weren’t serving any purpose for me, and that an unreliable low-latency stream was actually still a lot more useful than a reliable high-latency stream.
I’m not really trying to start an argument on the internet here, so I hope you didn’t take it that way. I guess we just have differing opinions on what qualifies as a “long delay”. I typically see an 8 second delay in the HLS stream of my cameras. In my opinion, a 5 to 15 second delay does not make a bit of difference in real world scenarios. I personally actually like the delay, because if I hear my children scream outside it may take me 5 to 15 seconds to pull up the feed, or glance at the PIP on my screen. At that point, I can see what caused the disturbance instead of only looking at the aftermath. Same with package deliveries. When I get a notification to my phone, it takes time to load. Even with the delay the person/vehicle of interest has already left the frame by the time the live feed is up and I have to review recordings or snapshots from my object detection setup.
Your 5th use case is really the only one I see potentially needing a real-time feed for, because the offset of video from hass and audio from the intercom would be a little awkward. At that point though, they also make video intercom systems which might be a better solution.
Edit: I do agree with you, however, about the confusion that can be caused by the plethora of camera integrations that are under various stages of broken or active development/maintenance. Unfortunately, not all cameras are created equal and support the same thing.
I’m also not trying to argue. I think we are both making valid points. And, yes, seeing what happened after the kids screamed was one area where the delay was useful, though it’s much easier to add a delay if needed than it is to take one away.
I think the difference in our package delivery use cases is due to the fact that I have to go buzz someone in. By the time I see them on the camera (my delay is 13s), they have already long since buzzed the door, and any chance I had to make sure my wife was getting it, or to begin the trip out of the office down to the buzzer, is long gone. Or if I want to see if she did buzz them in already, it’s actually faster for me to get up from the desk and walk out onto the balcony to look than it is to check it on the camera feed. But everyone has different use cases and preferences. It would be nice if there were good solutions for each.
Anyhow, the main point was that if we merged the cameras into one integration and made everything options for that one camera integration, then it would be much easier to explain when you would use each option, rather than doing what I’m trying to do in this thread: explain all the different camera integrations, what they do differently and the same, and which combinations of things produce good outcomes for which use cases. What each integration is actually doing in the background is very opaque. I don’t think it needs to be this hard.
Yeah, I think it heavily depends on your workflow as well. All of my notifications are based on the following workflow, which does not require any viewing of the camera in Home Assistant.
ONVIF Event fires for motion/sound/field detection - Real Time
Trigger TensorFlow Scan for Person/Vehicle detection (and save image in www if found) - adds ~1 second
Send notification to phone with detection results and applicable actionable notifications - adds < 1 second
So all in all, the majority of my use cases where I need verification happen in less than 2 seconds, with picture (not video) confirmation and the ability for me to trigger additional things if necessary. I have taken video out of that workflow for the most part because there is really no need for it. A still is good enough.
I do something very similar for cases where there is a potential intruder: if the alarm is on or we are sleeping and a human is detected on the property, send us an emergency notification with a photo. That already gives me a pretty good idea if there’s a problem, but if someone’s there, I’d definitely want to see what they are doing at that exact moment. I can tap the notification to get to my Home Assistant dashboard, which has the cameras, and that is useful if they are reliable and real time. Currently for these cases I usually skip it and go find my Synology app and open that up instead, as it’s more reliable and more real time, but that adds ~5 seconds to getting to what I need. Reliable real-time cameras in Hass would save me some seconds for this case.
For the kids, when someone is detected outside and we are home, it just pulls the feed up on the tablets around the house so we can see what they are doing. For this there are advantages to both real time (preventing disasters before they happen) and delayed (seeing what happened after something bad happens)… The ideal would be a real-time feed with an “instant replay” button. I think this could be managed with the recorder functionality, but I haven’t tried it yet. Anyhow, again, easier to add delay than to take it away.
For package delivery it’s a bit trickier to do with object detection, since I will have to monitor the street for objects, which means it will pick up passersby too, but I may give it a shot relatively soon and see if I can somehow tune it to be useful.
But I’d still like that combination homeassistant dashboard and video intercom… I don’t really want to add a dedicated video intercom if I can have both.
I did. It didn’t work very well. The main problem was that I’m using two streams per camera. The lower res substream is used for viewing, especially remotely. But when something happens, like an intrusion event is fired, I want to record the main 4k stream. The problem here is that the stream component first has to open the stream before being able to record it, and that takes a very long time. Too long to make it useful. And I don’t want multiple 4k streams running 24/7. I also never got the lookback feature working when calling the record service, even if the stream was already open.
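For reference, this is roughly the kind of service call I mean, sketched here against the REST API with placeholder host, token, entity and filename (an automation would pass the same service data); lookback is the option that is supposed to prepend a few seconds from the already-buffered stream:

```python
# Sketch of the camera.record service call being discussed, sent over the
# Home Assistant REST API. Host, token, entity and filename are placeholders.
import requests

HA = "http://homeassistant.local:8123"
HEADERS = {"Authorization": "Bearer LONG_LIVED_TOKEN"}  # placeholder token

requests.post(
    f"{HA}/api/services/camera/record",
    headers=HEADERS,
    json={
        "entity_id": "camera.driveway_main",   # hypothetical 4k main-stream camera
        "filename": "/media/driveway_last_event.mp4",
        "duration": 30,    # seconds to keep recording after the call
        "lookback": 10,    # seconds of buffered video to prepend
    },
)
```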
So after a lot of frustrating days trying to make this work, I decided to buy an external hardware NVR instead (a DS-7608NI-I2/8P). I control it from HA over its REST API, where I can start and stop stream recording, access recorded footage, search for past events, etc. It has a very well-working pre-record feature, which gives me up to 10 seconds of video from before the event occurred. I wrote some connector code that lets a HA automation pull recorded footage from the NVR for instant playback. I also use that feature to automatically create three still shots (one a second before the event occurred, one on the event, and one two seconds after) and add them to a timeline I can view remotely.
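To give an idea of what that kind of connector code involves, here is a simplified, illustrative sketch of asking the NVR for recordings in a time window over ISAPI. The endpoint path and XML fields are as I understand Hikvision’s ContentMgmt search and should be checked against the ISAPI documentation for your firmware; host, credentials, track ID and times are placeholders.

```python
# Illustrative sketch: query a Hikvision NVR for recorded clips via ISAPI.
# Verify endpoint path and XML schema against your NVR's ISAPI docs.
import requests
from requests.auth import HTTPDigestAuth

NVR = "http://192.168.1.10"            # placeholder NVR address
AUTH = HTTPDigestAuth("user", "pass")  # placeholder credentials

SEARCH_XML = """<?xml version="1.0" encoding="UTF-8"?>
<CMSearchDescription>
  <searchID>C0000001-0000-0000-0000-000000000001</searchID>
  <trackIDList><trackID>101</trackID></trackIDList>
  <timeSpanList>
    <timeSpan>
      <startTime>2021-01-01T12:00:00Z</startTime>
      <endTime>2021-01-01T12:05:00Z</endTime>
    </timeSpan>
  </timeSpanList>
  <maxResults>10</maxResults>
</CMSearchDescription>"""

# POST the search; the response lists matching segments with playback URIs
# that can then be handed to a player or downloaded for instant playback.
resp = requests.post(f"{NVR}/ISAPI/ContentMgmt/search", data=SEARCH_XML, auth=AUTH)
resp.raise_for_status()
print(resp.text)  # contains <playbackURI> entries for the matching recordings
```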
That said I have to agree with @hunterjm about live viewing. I managed to get my delay down to around 4 to 5 seconds. For me that’s fine. I don’t actually view live feeds all that often. I mostly use the timeline or the instant playback feature when an event was triggered that someone entered the yard or drove up the driveway or something like that.
I’d like to post one more thing in this thread just to finish up my general thoughts on this topic.
I think we’ve discussed the stream component a lot; it’s getting plenty of attention and is really doing about the best it can within the limits of HLS, so until we get LL-HLS support in browsers, we probably can’t expect much better there. It is great for people who want low CPU usage and don’t mind some lag. We also discussed other protocols; it would be great to see those, and indeed I think some use cases like video doorbell/intercom are really only possible with SIP or WebRTC, but I acknowledge that is a lot of work. We also discussed merging the components into something a bit less fragmented and better documented. Also likely a lot of work.
What I think we haven’t discussed enough are some potentially simple things that could be done with the existing non-stream-based camera platforms for people who need lower latency. Many of these camera platforms feel half finished and could probably be improved relatively easily compared to the other things we discussed.
Do we really need one FFMPEG process for every browser that accesses each camera? I.e. if we have 3 browsers open to 4 cameras, do we really need 12 FFMPEG processes to handle that? Couldn’t we make the MJPEG stream once and just send that to all clients? It could reduce CPU and also open fewer streams to the camera.
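To illustrate the idea, here is a rough sketch (nowhere near production code, and not how Home Assistant does it today) of a “decode once, serve many” MJPEG relay: a single ffmpeg process per camera, with every connected browser fed the latest frame from that one process. Camera URL, port and frame rate are placeholders, and client-disconnect handling is omitted.

```python
# Sketch of a shared MJPEG relay: one ffmpeg per camera feeds any number of
# browsers, instead of one ffmpeg per viewer.
import asyncio
from aiohttp import web

CAMERA_URL = "rtsp://camera/stream"   # placeholder
BOUNDARY = "frame"

latest_jpeg = None  # most recent frame, shared by all client handlers


async def read_frames():
    """Run one ffmpeg per camera and keep only the newest JPEG frame."""
    global latest_jpeg
    proc = await asyncio.create_subprocess_exec(
        "ffmpeg", "-rtsp_transport", "tcp", "-i", CAMERA_URL,
        "-f", "mjpeg", "-q:v", "5", "-r", "5", "pipe:1",
        stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.DEVNULL,
    )
    buf = b""
    while True:
        chunk = await proc.stdout.read(65536)
        if not chunk:
            break
        buf += chunk
        # JPEG frames start with FFD8 and end with FFD9
        start, end = buf.find(b"\xff\xd8"), buf.find(b"\xff\xd9")
        if start != -1 and end != -1 and end > start:
            latest_jpeg = buf[start:end + 2]
            buf = buf[end + 2:]


async def mjpeg_handler(request):
    """Serve multipart/x-mixed-replace to each client from the shared frame."""
    resp = web.StreamResponse(headers={
        "Content-Type": f"multipart/x-mixed-replace; boundary={BOUNDARY}",
    })
    await resp.prepare(request)
    while True:
        if latest_jpeg is not None:
            await resp.write(
                b"--" + BOUNDARY.encode() + b"\r\n"
                b"Content-Type: image/jpeg\r\n\r\n" + latest_jpeg + b"\r\n"
            )
        await asyncio.sleep(0.2)  # ~5 fps to each client


async def start_background(app):
    # spawn the single ffmpeg reader when the web app starts
    asyncio.create_task(read_frames())


app = web.Application()
app.router.add_get("/camera.mjpeg", mjpeg_handler)
app.on_startup.append(start_background)
web.run_app(app, port=8090)
```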
What about adding hardware acceleration? I moved to an Intel i5-7500 and was surprised that FFMPEG still took about 20% of a core, when it took maybe 25% of a core on the old Core 2 Duo. If we had FFMPEG hardware acceleration, I think I’d get a lot more out of this newer processor than I am. As it is, it’s really only the extra 2 cores that are helping out, not the better GPU capabilities.
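For what it’s worth, on Intel chips like that i5 the usual route would be FFmpeg’s VAAPI support. Purely as a sketch of what the flags look like (device path and camera URL are placeholders, and ffmpeg has to be built with VAAPI support for this to work):

```python
# Rough sketch: spawning ffmpeg with VAAPI hardware decoding on an Intel iGPU.
import subprocess

proc = subprocess.Popen(
    [
        "ffmpeg",
        "-hwaccel", "vaapi",                       # decode H.264 on the GPU
        "-hwaccel_device", "/dev/dri/renderD128",  # typical Intel render node
        "-i", "rtsp://camera/stream",              # placeholder camera URL
        "-f", "mjpeg", "-q:v", "5", "pipe:1",      # MJPEG still encoded in software
    ],
    stdout=subprocess.PIPE,  # the MJPEG output would then be read from proc.stdout
)
```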
If the camera fails to load in the UI, maybe we could get an error message there that tells us what went wrong?
It would be good to know in which cases which integration is opening new streams to the camera, and in which cases it uses the fixed image URL. It’s often hard to tell what resources I’m consuming depending on how I have things configured, or how many separate connections it’s opening to the same camera.
Why is the ONVIF integration so much more reliable than the other integrations using the same streams? Maybe there is some technology there that we could bring over to the others?
It would be good to have control over options about when the streams connect and disconnect. For example with the proxy cameras, I’d like to tell them to always leave the connection to the camera open and reconnect if they lose it so that they are ready to go when needed.
Just a few ideas, I will think of more about 1 hour after I press send on this. I wish my programming skills were up to taking this on myself, but unfortunately I’m still riding the trike and just getting basic python scripts going.
I am so confused by this. I’m just lost as to how to set this up as recommended by you. I currently have 4 RTSP/HTTP cameras running locally that I have integrated with ffmpeg. It’s not great; I’d seriously love to improve my feeds in HassOS. And having a similar server age/ability to yours, I think I could gain a lot from learning what you’re trying to teach here.
I guess to get started, how does one use the ONVIF integration? I tried to just muscle through the install of it, but no dice; I’m missing some info. Any help you can give would be great. Thanks for your time and effort in testing all these platforms.
Ah, I kind of figured, looking into it further. I doubt this firmware is compliant. Wyze? Highly doubt it. Stuck with this then, I suppose. Other than disabling the stream component, I’m at a loss, I believe. Not that I’m incredibly bothered by it; they are only $25 cameras.