Building livestreaming and chat services is hard; thankfully, some very kind people have already done most of the legwork.
In part 1 we looked at some existing solutions, and the problems they pose. In this part we look at some of these services in a bit more detail to work out what they are good at, and what challenges we will need to overcome.
This is part two of a three-part series.
In the course of doing all the research and trialing the above, I stumbled across OvenMediaEngine (OME). By itself it doesn’t do much: it focuses purely on ingesting a source and sending it to multiple outputs. As a foundation, however, it seems to have all the requirements for the video streaming side of things.
Essentially, all that was missing was a chat system; but since OME didn’t come with a ready-made interface, that was a problem I would need to solve anyway.
OME out of the box supports several ingestion protocols, including:

- RTMP(S)
- SRT
- WebRTC
- MPEG-2 TS
For this use case I’ll be using RTMPS as it’s the most widely supported. TLS termination can be done either by OME itself or by a reverse proxy in front of it if you want to use Let’s Encrypt.
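If you go the reverse-proxy route, TLS termination for RTMPS can be handled by nginx’s stream module. A minimal sketch, assuming OME’s RTMP provider has been moved to port 1936 on the same host (the hostname, port, and certificate paths are placeholders):

```nginx
stream {
    server {
        # Accept RTMPS from broadcasters and terminate TLS here
        listen 1935 ssl;
        ssl_certificate     /etc/letsencrypt/live/stream.example.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/stream.example.com/privkey.pem;

        # Forward the decrypted RTMP stream to OME (assumed to listen on 1936)
        proxy_pass 127.0.0.1:1936;
    }
}
```

OME then only ever sees plain RTMP on the loopback interface, and certificate renewal stays entirely in the proxy’s hands.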
In order to reliably stream to different devices, you unfortunately need to support several protocols, because people can’t agree on which one to use.
By far the most important here for low latency is WebRTC, the protocol used for video calls. It offers near-realtime latency, with only a minimal delay for transcoding where needed. It works on most devices, but relies on a few non-standard ports and often needs ICE to perform NAT traversal (hole punching) in order to establish a connection. OME also currently has experimental support for P2P distribution via WebRTC, which already includes a direct fallback for when a peer cannot establish a connection.
In order to support a full array of devices though, HLS and MPEG-DASH are also required. These are buffered protocols that split the stream into short chunks (usually around 5 seconds each) and publish a continuously updating playlist telling clients where to find the next chunk. Clients will usually buffer a few chunks ahead of what is displayed, so playback isn’t interrupted if there is a delay in fetching the next chunk. All this buffering on both the server and client side leads to quite a noticeable delay in the stream, often around a minute or more.
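To make the chunking concrete, a live HLS media playlist is just a rolling text file that the server rewrites as new chunks are published. A simplified sketch (the segment names and durations are illustrative):

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:5
#EXT-X-MEDIA-SEQUENCE:1042
#EXTINF:5.000,
segment_1042.ts
#EXTINF:5.000,
segment_1043.ts
#EXTINF:5.000,
segment_1044.ts
```

The media sequence number increments as old chunks drop off the front of the playlist, which is how clients keep following along with a live stream.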
In order to work smoothly on different internet connections, most streaming services offer multiple resolutions and frame rates. This ensures that clients can select a quality that is appropriate for their network connection and hardware capabilities, downgrading to a lower quality if either one is a bottleneck. Different output methods also need different encoding, something generally not provided by most sources.
OME supports this by transcoding every input stream into each required output format/codec as soon as it is ingested. The number of quality levels here is something to watch, as transcoding is very heavy on both server CPU and memory. GPU acceleration is supported, but I haven’t experimented with it yet.
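For illustration, renditions in OME are declared as output profiles in its Server.xml. This is a sketch from memory of the config format rather than a copy-paste-ready file; the names and bitrates are placeholders:

```xml
<OutputProfiles>
    <OutputProfile>
        <Name>default</Name>
        <OutputStreamName>${OriginStreamName}</OutputStreamName>
        <Encodes>
            <!-- Each extra <Video> block here is another full transcode -->
            <Video>
                <Name>720p</Name>
                <Codec>h264</Codec>
                <Bitrate>2500000</Bitrate>
                <Width>1280</Width>
                <Height>720</Height>
                <Framerate>30</Framerate>
            </Video>
            <Audio>
                <Codec>opus</Codec>
                <Bitrate>128000</Bitrate>
                <Samplerate>48000</Samplerate>
                <Channel>2</Channel>
            </Audio>
        </Encodes>
    </OutputProfile>
</OutputProfiles>
```

Every additional video entry multiplies the CPU/memory cost per ingested stream, which is why the quality ladder deserves careful thought.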
To go with OME, OvenPlayer (OMP) provides a ready-to-use player library that supports all the features of OME. It comes with cross-platform support, and all the required signalling for WebRTC video streaming.
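Embedding it is a small snippet plus some configuration. A minimal sketch, assuming hypothetical host/app/stream names and the jsDelivr CDN path; in this sketch OME’s TLS WebRTC signalling is assumed to be on port 3334:

```html
<script src="https://cdn.jsdelivr.net/npm/ovenplayer/dist/ovenplayer.js"></script>
<div id="player"></div>
<script>
  // All URLs are placeholders; substitute your own OME host, app, and stream.
  // The WebRTC source is listed first so low latency is preferred, with HLS
  // as the compatibility fallback.
  const player = OvenPlayer.create("player", {
    sources: [
      { label: "Low latency", type: "webrtc", file: "wss://stream.example.com:3334/app/stream" },
      { label: "Fallback", type: "hls", file: "https://stream.example.com/app/stream/playlist.m3u8" }
    ]
  });
</script>
```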
If you look into the self-hosted chat world for long, you’ll come across Matrix. Matrix is a federated chat protocol based on JSON over HTTPS, and supports multiple media types including images, stickers, and VoIP. Quite importantly, Matrix itself does not specify exactly what software you need to use. Whilst there are reference implementations, it is just a formal protocol for clients and servers to follow.
Each client in Matrix talks to a primary homeserver. There are several options to choose from here, each written differently depending on the use case. Whilst the spec does say which features need to be supported, it’s a bit hit and miss whether a server actually provides those features, let alone enables them. In order for this all to work, the homeserver needs to support appservices and guest access.
Guest access to Matrix, whilst officially supported by the protocol, has very intermittent support in practice. The official reference client, Element, can be very finicky about when it decides to grant guest access, and when it will allow a guest to post to a room. This likely means we will need to use a different client for our system.
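Under the hood, guest access is just one endpoint of the Matrix client-server API: an empty POST to the register endpoint with `kind=guest`, which returns a `user_id` and `access_token` for the new guest. A small Python sketch that builds (but doesn’t send) that request; the homeserver URL is a placeholder:

```python
import json
import urllib.request

# Placeholder homeserver URL - substitute your own deployment
HOMESERVER = "https://matrix.example.org"

def guest_register_request(homeserver: str) -> urllib.request.Request:
    """Build the guest-registration request from the Matrix client-server API.

    POST /_matrix/client/v3/register?kind=guest with an empty JSON body;
    a homeserver with guest access enabled responds with a user_id and
    access_token for the newly created guest.
    """
    url = f"{homeserver}/_matrix/client/v3/register?kind=guest"
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = guest_register_request(HOMESERVER)
print(req.full_url)  # the request is built here but intentionally not sent
```

Whether that call succeeds at all is exactly the homeserver feature flag discussed above; many deployments disable it outright.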
Cactus Comments is a small service built on top of the Matrix protocol; in fact, it’s even on this page! It has an appservice to manage the chat rooms, and a very basic client to create a guest account and interact with the chat room. Because it speaks standard Matrix, we can interact with the stream chat from any Matrix client. Thanks to the appservice, the owner of the stream has moderation rights over the room and can therefore moderate the chat as appropriate. Unfortunately, due to its open nature, banning a user from the chat is a bit more difficult.
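For completeness, embedding Cactus on a page is a small snippet along these lines (sketched from memory of its docs; the `serverName`, `siteName`, and section id values are placeholders):

```html
<div id="comment-section"></div>
<link rel="stylesheet" href="https://latest.cactus.chat/style.css" type="text/css">
<script type="text/javascript" src="https://latest.cactus.chat/cactus.js"></script>
<script>
  initComments({
    node: document.getElementById("comment-section"),
    defaultHomeserverUrl: "https://matrix.cactus.chat:8448",  // Cactus' public homeserver
    serverName: "cactus.chat",
    siteName: "my-site",             // placeholder: your registered site name
    commentSectionId: "stream-chat"  // placeholder: one id per stream/page
  })
</script>
```

Each comment section maps to a Matrix room managed by the appservice, which is what gives the stream owner their moderation powers.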
Now that we know a bit more about the technologies involved, it’s time to start looking into how we can tie them all together and create our own streaming system.