In terms of actual video streaming (i.e. not MJPEG with high refresh rates), on basically everything other than iPhones (and older pre-iPadOS iPads), it is not hard to do better than normal HLS. The trick is the concept that LL-HLS calls partial segments and CMAF calls chunks. These are basically much smaller fragments (moof+mdat) that don’t necessarily start with or contain a keyframe. Chunks can be as small as one frame’s worth of samples. For LL-HLS you want them bigger, since one HTTP request is required per LL-HLS “partial segment” (unless doing fancy things with byte ranges). Officially, chunks/“partial segments” are required to consist of a subset of the video frames from a full-sized segment.
I’ve done some experimentation using a slightly modified version of the stream component, set up to generate chunks of 100ms in duration (or of a single frame, for video slower than 10 fps), and send them to the browser over websockets. The browser uses Media Source Extensions to pass this into a video tag (which is why iPhones and older iPads won’t work, since Apple deliberately disabled MSE on those devices). Using that, I was able to get latency in a browser that is lower than using VLC to watch the RTSP stream with default settings (by default VLC uses a 1 second buffer). Under this technique, latency is also independent of key frame interval, which only influences how long it takes to load the stream.
My experimentation was only with 1 camera at a time, and the code I used is not really suitable for merging into the stream component, since I took the easy path of breaking HLS support while testing.
To avoid breaking HLS support, I would need to create both chunks and segments, which is needed for LL-HLS anyway. Per the new HLS spec, it is legal for a segment to consist of concatenated chunks [0], so this is not particularly difficult.
To do this right we would just need to render to smaller segments (which we label as chunks) based only on time (ignoring keyframes). On top of that, we would track when a new full segment should begin. At that point, as soon as the next keyframe is seen, we force a new chunk to start, even if it is “too soon” per the 100ms timeframe. We also keep track of which chunks belong to each complete segment.
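A rough sketch of that splitting rule, in Python since that is what the stream component is written in. This is not the code from my experiment; `Frame`, `Chunk`, `split_into_chunks` and the 2 second segment target are all made-up names and values for illustration, and timestamps are kept as integer milliseconds to avoid float rounding.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    pts_ms: int      # presentation timestamp in milliseconds
    keyframe: bool


@dataclass
class Chunk:
    segment: int                                  # full segment this chunk belongs to
    frames: List[Frame] = field(default_factory=list)


def split_into_chunks(frames, chunk_ms=100, segment_ms=2000):
    """Group frames into time-based chunks and tag each with its segment.

    chunk_ms and segment_ms are illustrative defaults, not values taken
    from the stream component.
    """
    chunks: List[Chunk] = []
    segment = 0
    chunk_start = segment_start = None
    for frame in frames:
        if chunk_start is None:
            # The very first frame opens the first chunk of the first segment.
            chunk_start = segment_start = frame.pts_ms
            chunks.append(Chunk(segment, [frame]))
            continue
        segment_due = frame.pts_ms - segment_start >= segment_ms
        if segment_due and frame.keyframe:
            # A new segment is wanted and a keyframe arrived: force a new
            # chunk now, even if the current chunk is shorter than chunk_ms.
            segment += 1
            chunk_start = segment_start = frame.pts_ms
            chunks.append(Chunk(segment, [frame]))
        elif frame.pts_ms - chunk_start >= chunk_ms:
            # Ordinary time-based chunk boundary; keyframes are ignored here.
            chunk_start = frame.pts_ms
            chunks.append(Chunk(segment, [frame]))
        else:
            chunks[-1].frames.append(frame)
    return chunks
```

With 25 fps input and a keyframe every 2 seconds, this yields chunks of three frames each, except that the last chunk before each keyframe is cut short so the new segment can begin exactly on the keyframe.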
When requesting a full segment via HLS, just serve up the concatenation of the chunks that occurred in that segment (without the “initialization section” of each chunk of course). Later when LL-HLS support is added, the chunks would become the “partial segments”.
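Roughly, serving a full segment could look like the sketch below. It assumes each stored chunk is a standalone fragmented MP4 blob whose initialization section is the top-level `ftyp` and `moov` boxes; `iter_boxes`, `strip_init`, `build_segment` and `make_box` are hypothetical helpers, not existing stream component functions.

```python
import struct

# Top-level boxes that make up the initialization section (assumption:
# nothing else needs to be stripped from a chunk).
INIT_BOX_TYPES = {b"ftyp", b"moov"}


def iter_boxes(data: bytes):
    """Yield (type, raw_bytes) for each top-level ISO BMFF box in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size == 1:
            # 64-bit "largesize" follows the 8-byte header.
            size = struct.unpack_from(">Q", data, offset + 8)[0]
        elif size == 0:
            # Box extends to the end of the data.
            size = len(data) - offset
        if size < 8 or offset + size > len(data):
            raise ValueError("malformed box at offset %d" % offset)
        yield box_type, data[offset:offset + size]
        offset += size


def strip_init(chunk: bytes) -> bytes:
    """Return the chunk with its initialization boxes removed."""
    return b"".join(
        raw for box_type, raw in iter_boxes(chunk)
        if box_type not in INIT_BOX_TYPES
    )


def build_segment(chunks) -> bytes:
    """Concatenate the media boxes (moof+mdat) of all chunks in a segment."""
    return b"".join(strip_init(chunk) for chunk in chunks)


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a minimal box; only used to make toy data for trying this out."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload
```

The HLS playlist would still point at the initialization section separately, so the segment body served here only needs the moof+mdat pairs in order.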
For a low latency websocket connection, we would simply push chunks as they are generated. The first chunk pushed would include the initialization segment; all others would omit it.
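A transport-agnostic sketch of that push logic follows. `ChunkPusher` and its methods are hypothetical, and `send` stands in for whatever the websocket layer provides. One assumption of mine that goes beyond the description above: a client that joins mid-stream is held back until a chunk that starts with a keyframe, since the decoder cannot start anywhere else.

```python
from typing import Callable, Dict, List


class ChunkPusher:
    """Fan chunks out to clients, prefixing the init segment exactly once."""

    def __init__(self, init_segment: bytes):
        self._init = init_segment
        # client id -> [send callable, has this client been started?]
        self._clients: Dict[str, List] = {}

    def add_client(self, client_id: str, send: Callable[[bytes], None]) -> None:
        self._clients[client_id] = [send, False]

    def remove_client(self, client_id: str) -> None:
        self._clients.pop(client_id, None)

    def publish(self, chunk: bytes, starts_with_keyframe: bool) -> None:
        """Push one freshly generated chunk (init-less moof+mdat) to everyone."""
        for state in list(self._clients.values()):
            send, started = state
            if started:
                # Already playing: send the bare chunk.
                send(chunk)
            elif starts_with_keyframe:
                # First message for this client: init segment plus chunk.
                send(self._init + chunk)
                state[1] = True
            # Otherwise the client keeps waiting for a keyframe chunk.
```

On the browser side each received message would simply be appended to the MSE source buffer in arrival order.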
Footnotes:
[0] If one is using sidx boxes, a top level index for the whole segment really ought to be made that points to the chunk level indexes, even though it is not strictly required.