2021.4 has given me all sorts of problems with my cameras on the tablet and has just generally been crashing Fully Kiosk, so I’ve been experimenting again. My latest tests have given me worse results with the framerate on the proxy component (averaging 1 frame every 7 seconds), so I’ve updated that.
Also, I’ve tried AlexxIT’s WebRTC card, which promises the best of both worlds: high-frame-rate realtime video with low lag. And mostly it delivers that, but unfortunately the initial load-up time was quite slow (something like 10 seconds), and on my Kindle Fires the streams would regularly drop the connection and always display a “connected” banner over the image. So I haven’t been able to find a good use for them yet: on the desktops and phones where they work well, the FFMPEG cameras work equally well and load faster. I will keep checking back in with them from time to time and continue testing with different settings, but I have posted my initial findings on the original post.
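For anyone who hasn’t tried it yet, adding the card to a Lovelace dashboard looks roughly like this (the RTSP URL is only a placeholder, and the exact options should be checked against the card’s own documentation):

```yaml
# Rough sketch of an AlexxIT WebRTC card in Lovelace; the URL is a
# placeholder and the available options should be checked against the
# card's documentation.
type: 'custom:webrtc-camera'
url: 'rtsp://user:password@192.168.1.20:554/stream1'
```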
Thanks for the updates. For me, I have decided to drop the use of HA for camera streams entirely. I feel that development on that front has really stalled and even regressed; there have been no real improvements to live camera support in HA since last year. And the current support is, frankly, not good. I used to be more or less happy with my 4-5 sec lag, but with every major HA update this lag seems to be increasing. I also had a real-life situation recently where I needed to monitor the live streams remotely - the lag turned out to be a real problem when you actually need the damn things for once. This situation is a little surprising considering how popular and important cameras are these days.
AlexxIT’s WebRTC thing doesn’t really work well for me, especially with multiple cameras. When it works, it’s nice, but it’s very unreliable. Streams would often just not open at all, which is arguably worse than having them open with lag. It also requires you to open a port (or to VPN into your LAN all the time); you can’t just reverse-proxy the stream as you can with HA.
HA camera support is still good for generating preview shots every n seconds.
I agree. I’ve had to set up my cameras in HA using their sub-streams at a lower resolution. My NVR records the full-HD main streams and HA just displays the low-res ones. Otherwise HA often wouldn’t even open them, and my HA runs on an i7 machine with 16 GB of RAM.
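For reference, pointing HA at a sub-stream is just a matter of using the sub-stream’s RTSP URL in the camera config. A rough sketch with a generic camera (the URLs are placeholders; the actual paths are camera-specific):

```yaml
# Sketch only: the snapshot and RTSP paths below are placeholders and
# vary by camera model - check your camera's documentation.
camera:
  - platform: generic
    name: Living Room (sub-stream)
    still_image_url: http://192.168.1.30/cgi-bin/snapshot.cgi
    stream_source: rtsp://user:password@192.168.1.30:554/substream
```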
I’m still waiting for proper integration of audio from camera streams, as my front-door camera has two-way audio, but this is not at all accessible through HA. Currently HA only supports a few audio codecs, and none of them are the ones commonly used by CCTV cameras. Pretty poor, really.
I have a few Dahua Dome Lite 4MP cameras which I use to see what is happening in my apartment and for motion detection to switch the lights. The latter mostly works, but I am having some problems with the former. The ONVIF integration only sets up substream1, which is unusable because of its low resolution. I believe that is because the main stream uses the H.265 codec? Is there a way to make this work?
I am also curious whether it is possible for HA to work as an NVR for the cameras.
Thank you for your great work in documenting the various camera platforms. It really saved me tons of time testing which works best.
Just to update on my findings with AlexxIT’s WebRTC card: I’ve been using it for 3 weeks now on 2 sites. So far it’s been promising and loads successfully all the time. The lag is less than a second, but sometimes it will skip some frames, which could be a combination of camera CPU or network issues. I’m using it to open/close my gate, so it’s a good upgrade from the 4-5s lag.
@HeyImAlex the latest version of AlexxIT’s card does not need port forwarding. It works for me, and I did not open any ports on my router.
Yes, AlexxIT’s card is a really great option for low CPU and real-time video; there’s still nothing else with those two features other than the proxy cameras, which no longer give me full frame rate. I’m starting to suspect that I might be having intermittent WiFi problems, as my routers are rather old (I’m holding out for WiFi 6E before making the upgrade).
Actually it does; your router just opened the port for you without telling you. WebRTC uses a (random) UDP port that it opens by itself, and many consumer-level routers with asymmetric, cone-style NAT will allow this through UDP hole punching, trading security for convenience. But that won’t work with SSH reverse-tunneled connections, as only a single port is forwarded there: your home IP is never exposed to the remote device, so it can’t establish the UDP connection.
In the latest release of HA, the camera component has been moved to the “core” startup stage, which changes when and how it starts when HA boots up. As a result, I have observed some weird behavior from my camera components when a camera is offline: it crashes other integrations, most notably cloud ones. As a reminder, I have forked my own version of HA to massively improve the camera stream handling of the ffmpeg component (by using OpenCV’s ffmpeg bindings instead of ffmpeg directly, then by using GPU decoding, and finally by having the camera return a numpy array to the video-processing component instead of decoding, re-encoding to JPEG, and then re-decoding to numpy). This has reduced the CPU load by 80% per camera.
Wow, I didn’t know that, thanks for the lesson. It’s less secure, but the performance is hard to ignore. I was putting up with 4-5s lags until AlexxIT’s card; it’s hard to go back.
Well technically it’s not that much of a security concern. The port is random and only open during streaming. So as long as you don’t stream 24/7, it’s not that bad. This is a common technique for other protocols too, like peer-to-peer stuff. But it just doesn’t work with certain more secure ways to remotely access HA (or only with a much more complex setup that adds maintenance and lag again). So it’s not a magic bullet, sadly.
Sorry to hear that the stream component isn’t working out for many of you. As discussed previously, the latency is a function of the HLS protocol. Glad to see alternatives out there like AlexxIT’s WebRTC addon - it looks pretty cool and seems to be working well for some people. The WebRTC protocol is probably your best option if latency is your main concern. Of course, as with most protocols, there are tradeoffs: since the transport is not over HTTP, the connection from HA to your client will be different from the rest of HA. Generally this is not an issue, and the transport is probably “faster” than going over HTTP in most cases, but since the protocol allows for trying several different connection types, your experience may differ depending on what network your client is on. Another thing to consider is that WebRTC is driven by Google, so it doesn’t, and most likely won’t ever, support H.265, which is the preferred codec on today’s cameras (HLS supports H.265, but client support is also limited thanks to … Google). WebRTC also doesn’t support AAC audio, but it does support PCM codecs, which HLS doesn’t (why is everything so difficult?!). Also, while WebRTC is fundamentally quite secure, there have been security issues related to the protocol (see e.g. https://www.rtcsec.com), and since there are more moving parts involved with the add-on, the attack surface will be larger.
I do have a PR open to add LL-HLS to stream, which should improve latency by quite a bit, but it has yet to be tested by anyone other than me, so I’m not sure how well it will work across different hardware. The latency definitely won’t be as good as with WebRTC, but it might be sufficient for most. In addition to the tradeoffs mentioned above, there are actually some other advantages to the HLS protocol - since segments are being transferred instead of packets, problematic connections will probably see buffering instead of visual artifacts. And this is a more technical detail, but also because of segments vs packets, it can probably deal better with specialized codec variations with very low I-frame frequency such as H.264+/H.265+ (those codec variations can offer massive bandwidth savings).
A note (excuse?) about the pace of development - this is an open-source project, so contributors are really just other users working in their spare time. It’s hard to juggle our own things sometimes, and to get things done we have to submit PRs and reviews, which means coordinating with other devs who are also juggling their own things (that’s why turnaround is generally much faster for custom components). Also, there are always issues that pop up, especially with a component like stream which is used on all types of setups/hardware. One more note - the slower speed of open source is compounded because the libraries we use are also open source, so if we identify an issue in a library (or even if we submit a PR to address it), it may take months or even longer before the issue is addressed. To illustrate, to prepare for the aforementioned LL-HLS pull request I first had to wait for the hls.js and ExoPlayer projects to be updated with LL-HLS client support. I then submitted this PR along with another one on the ExoPlayer repo back in March, and it has yet to be reviewed and merged. And that is already reasonably quick, since the ExoPlayer project is driven by Google.
BTW, I’m not even a developer by trade (my first GitHub PR was in 2020) - I just had some programming background and decided to start contributing because there were a few things I wanted to fix in my HA setup. In this year and a half I’ve learned quite a bit about video streaming, networking, and programming, and I now maintain a custom component and am a codeowner on two core integrations (although I sorely need to update one of them). Anyway, maybe my experience can inspire someone else to get started contributing to HA - I highly recommend it.
Justin,
I think I speak for most people here when I say that you did a tremendous job on the stream component. The issues are indeed due to the underlying limitations of HLS, which was never meant to be a realtime streaming protocol. I think everybody is well aware of that. Your component works great considering the technical constraints of trying to squeeze near-realtime streaming into HLS.
I’m excited about your work on LL-HLS! While WebRTC is certainly the best choice in terms of latency, it does come with its own unique issues, as you mentioned. LL-HLS could sit at an interesting in-between spot. I’d love to try out your PR, if you feel it is ready for testing.
I’d like to echo the comments above. My intention was never to bag on the HLS component, and I hope it never came across that way. I still think the HLS component is by far the most reliable and easiest way to get started with cameras in HA and should be the starting point for most people. It just didn’t happen to fit my use case.
In this way, it’s good that we have options. I just wish they were all as high quality as yours, and as verbose in communicating what they are doing as AlexxIT’s is. Then we could truly have an option for each situation and could debug them effectively. Hopefully we will get there eventually if people continue to maintain the other camera platforms.
I still wish we could have a “master” camera platform where you just choose some parameters and can be certain that it is as up to date as the HLS platform is.
With LL-HLS, WebRTC, and MSE, I think we are now becoming quite modern with our camera handling, and I have great hope that between these options alone we can cover most use cases. But the more I mess with this stuff, the more I realize that every option still has its pluses and minuses, and there are so many factors coming into play in the transport, encoding, server, network, and endpoint layers that it will probably always be a bit of a black art to get everything working perfectly.
@HeyImAlex Thanks for the kind words. I wouldn’t call stream my component - @hunterjm is the original author. I’ve just made some contributions to it (@allenporter also did a large refactoring earlier this year in order to better support cameras with expiring tokens).
I agree that having different/overlapping ways to set up cameras can be confusing. I myself am not familiar with anything except generic (and the stream component, but that’s integrated into camera and not a camera platform). I find FFMPEG and MJPEG inefficient with processing and network resources, so I wouldn’t ever use those myself. I am sure some people find them useful for certain cases, like using ffmpeg for reducing lag, changing codecs, or scaling, but I would consider these advanced cases and there may be better ways to achieve the same goal. Anyway, given that they are fundamentally different components that were contributed by different developers and use different parameters (e.g. MJPEG uses MJPEG instead of RTSP streams), I don’t see a good way to put them under one umbrella (well, they are essentially under one umbrella now - the camera platform). I think documentation or user guides could be helpful - @scstraus maybe you could transfer some of your camera experiences from this post to a PR on the documentation repo.
On a tangential note, I think particularly with how seamlessly we are able to experience multimedia across our devices today, it’s easy for users to develop unrealistic expectations of video capabilities. Any tablet or phone from the past decade has been able to display smooth, full screen video. But what codecs does the device support? An older device most likely won’t play HEVC videos, for instance. Also, while they might seem similar, displaying full screen video is one thing, but paneling several feeds together is another thing. A device can only decode so many streams at once.

Tablets are also misunderstood - I’ve seen some people complain about not being able to view their full resolution 8MP camera feeds on tablets. Guess what - many tablets have less capable GPUs than top end phones and many can’t decode video much higher than 1080p (2MP cameras are 1080p and most of today’s security cameras are way above that). Try opening a 4K video on your Android tablet. Network is another issue - MJPEG uses a lot of bandwidth, so it’s no surprise that a panel of high resolution MJPEG feeds doesn’t work reliably over WiFi.

The point here is not to rant, but rather to communicate that there are some constraints that HA can’t work around, and that these constraints will come into play differently across different hardware/frontend setups/user requirements. Users will have to experiment to see what works for them, and they will need to understand that there are limitations and work around them accordingly. There’s no magic bullet from the developer side, I’m afraid.
As for LL-HLS, yes feel free to give the PR above a shot. You will need to drop the files from the PR into the correct folders on your HA install, and you will need to use the generic component like so:
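A minimal sketch of what that generic camera setup might look like (the URLs below are placeholders, and any LL-HLS-specific options should be taken from the PR itself):

```yaml
# Sketch only: URLs are placeholders, and any LL-HLS-specific options
# should come from the PR itself.
camera:
  - platform: generic
    name: LL-HLS test
    still_image_url: http://192.168.1.40/snapshot.jpg
    stream_source: rtsp://user:password@192.168.1.40:554/stream1
```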
You will need to be on 2021.5 for the frontend to be updated with support (although Android and iOS shouldn’t require this). The Android app will have the wrong starting position until the ExoPlayer PR I mentioned previously is merged, so you’ll need to drag the timeline forward a few seconds after starting the feed. Post on the GitHub PR if you have any questions.
I’ve considered doing that, but I feel like my experience is a bit too anecdotal to be useful for documentation. What would be excellent is if someone who actually understands the workings of these components could verify some of my guesses as to why things are the way they are, so those could go into the documentation. For now they are just observations with guesses as to what the causes might be. I haven’t been able to concretely prove almost any of my guesses yet… I just know that some things work better than others in my environment, and I have no idea how much of that maps to a different environment, or why/why not.
Some things you mention would be great candidates… What is the bandwidth usage of a stream in the different camera components? Which options can be used to increase/decrease CPU usage? Which technologies use more or fewer resources on the frontend? Which browsers/OSes have better support for which technologies? None of these are things I know for sure, but I’d bet some of you devs do.
If you open a PR, I’d be happy to provide some input, and I’m sure others will chime in as well. Actually I think a lot of the people in this thread would be able to provide good input too. I am a bit busy these few weeks so I won’t have that much time (and I just got summoned to troubleshoot an old issue), but it might actually not take too much time if we have a few people working together.
Yeah, you could make a draft PR with something like that. I guess we have two different inputs to better documentation - you know what questions you want the answers to and we (hopefully) know the answers to some of them.
I was thinking of a simple analogy to compare the mjpeg, ffmpeg, and stream (HLS) components and came up with this. The video comes in from the camera and is encoded by hardware using a video codec like H.264. This is like a present that is packed nicely into a box and then wrapped up. In this analogy, demuxing the stream is like unwrapping the box and decoding the codec is like unpacking it. Unwrapping and rewrapping are pretty easy, unpacking is harder, and repacking is the hardest.
The present is easiest to transfer while it is still boxed up. We can transfer it unboxed, but then it’s an unruly mess that takes up a lot more space.
For the video feed to get to the client device, we have to transport it from the camera to HA and then from HA to the client.
For the mjpeg component, you have to access an MJPEG feed from the camera. MJPEG is much larger than regular video codecs because it doesn’t compress anything across time. In the analogy, the camera doesn’t give you a nice small box; it gives you a huge box that is not very efficient space-wise. Depending on the camera, this might add a lot of processing load to the camera itself if it cannot do this encoding in hardware. So the downsides of this method are that it might load the camera CPU, and it also takes a lot of bandwidth, both from the camera to HA and from HA to the client. The upside, staying with the analogy, is that since the box is packed so loosely, it’s very easy to unpack, so as long as the transport is OK the client should be able to open it easily.
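For a concrete picture, an MJPEG camera is set up along these lines (the URL is just a placeholder for whatever MJPEG endpoint your camera exposes):

```yaml
# Sketch of an mjpeg camera; the URL is a placeholder for the
# camera's own MJPEG endpoint.
camera:
  - platform: mjpeg
    name: Garage (MJPEG)
    mjpeg_url: http://192.168.1.50/mjpeg/stream
```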
The ffmpeg component is basically a repacking station running on the HA server. In the analogy, it will unpack the box and repackage it, but how it repackages everything depends on the options used. It usually repackages everything into an MJPEG stream, so it ends up with a similar result to the mjpeg component, only the work is done on the HA server rather than at the camera. The downsides of this are high load on the HA server and high bandwidth from the HA server to the client.
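A rough sketch of an ffmpeg camera, where any “repackaging” options go into extra_arguments (the input URL is a placeholder, and the scaling filter is just one example of an option you might use):

```yaml
# Sketch of an ffmpeg camera; the input URL is a placeholder, and the
# extra_arguments shown here just downscale the feed as one example of
# the repackaging options mentioned above.
camera:
  - platform: ffmpeg
    name: Driveway (ffmpeg)
    input: rtsp://user:password@192.168.1.60:554/stream1
    extra_arguments: -vf scale=1280:720
```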
The stream (HLS) component is basically a rewrapping station. It doesn’t do any transcoding (unboxing or reboxing in the analogy), so there is not much CPU overhead (although there will be more with LL-HLS), and the size stays small all the way to the client. The downside of this is that the client needs to be powerful enough/have the right codecs to decode the stream, which isn’t generally a problem but can be when devices are old/cheap or there are too many streams to decode. (An aside - I don’t think it makes sense to have a panel of live streams: with HLS you might get too many streams to decode on the client, and with the other methods there will be high bandwidth usage. Using substreams may help. I don’t use a live view on my own setup.) The other downside of HLS is the latency, which has been discussed ad nauseam above.
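There isn’t much to configure for this one - the stream integration is already included with default_config, and enabling it explicitly is just a one-liner; any camera with an RTSP stream_source (like the generic examples above) is then served to the frontend over HLS:

```yaml
# stream (HLS) is already part of default_config; enabling it
# explicitly is just this line in configuration.yaml.
stream:
```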
WebRTC is similar to stream in that there’s no transcoding, but as discussed earlier the transport is different and the supported codecs are different for whatever reason. Besides the differences in codecs, the upside vs HLS is better latency, but the downside is less robustness to missing packets.
Sorry for the muddled analogy, it was just a shower thought and then it lost traction as I was typing. Anyway, hope this helps.
Okay, let’s give it a try. Rather than trying to open a PR with text I don’t have yet, I opened an issue here to get the data together so that I can compile the actual text to include in the PR. Anything I can get confirmed well enough to include will be brought into a PR I will write later.