Real-time interactions on live video
This note is based on an InfoQ talk from LinkedIn, "Streaming a Million Likes/Second: Real-Time Interactions on Live Video".
Traditional way to stream likes
The persistent connection uses HTTP long polling with Server-Sent Events (SSE).
- Client sends a `GET` request with `Accept: text/event-stream`.
- Server responds with `200 OK` and `Content-Type: text/event-stream`.
- The connection is established and left open.
- Server sends `data: {"like", object}` or `data: {"comment", object}` to the client.
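```javascript
// Client library: open a persistent SSE connection to the server.
const evtSource = new EventSource("https://www.linkedin.com/realtime/connect");
// Called once for each "data:" event the server pushes.
evtSource.onmessage = function (e) {
  const likeObj = JSON.parse(e.data);
};
```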
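On the server side, the handshake above amounts to answering the `GET` with the event-stream content type and then writing `data:` frames to the response without ending it. The following is a minimal sketch, assuming a response object shaped like Node.js's `http.ServerResponse` (`writeHead`, `write`). The names `formatSseEvent` and `handleConnect` and the payload shape are illustrative assumptions, not LinkedIn's implementation.

```javascript
// Format one Server-Sent Events frame: a "data:" line followed by a blank line.
function formatSseEvent(payload) {
  return "data: " + JSON.stringify(payload) + "\n\n";
}

// Hypothetical connect handler. `res` is assumed to expose writeHead() and
// write(), as Node's http.ServerResponse does.
function handleConnect(req, res) {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
  });
  // res.end() is deliberately not called: the connection stays open.
  // The returned function pushes one event to this client.
  return function send(payload) {
    res.write(formatSseEvent(payload));
  };
}
```

With Node's built-in `http` module, `handleConnect` could be called from the request listener passed to `http.createServer`, and the returned `send` function stored per connection.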

Challenges
Tons of connections

LinkedIn uses Akka and the Play framework for connection management.
- Each Akka actor listens for like events and manages one connection.
- Each Akka actor has a mailbox (event queue) that holds the events to be published.
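The actor-per-connection idea can be illustrated without Akka itself. The sketch below is a plain JavaScript analogy and does not use Akka's API: a hypothetical `ConnectionActor` owns one connection and one mailbox, events are queued with `tell`, and they are delivered one at a time when the mailbox is drained.

```javascript
// Plain-JavaScript analogy of an actor; this is not Akka's API.
class ConnectionActor {
  constructor(connectionId, send) {
    this.connectionId = connectionId;
    this.send = send;   // function that writes one event to the connection
    this.mailbox = [];  // event queue holding events waiting to be published
  }

  // Enqueue an event; senders never touch the connection directly.
  tell(event) {
    this.mailbox.push(event);
  }

  // Deliver queued events one at a time, in arrival order.
  drain() {
    let delivered = 0;
    while (this.mailbox.length > 0) {
      this.send(this.mailbox.shift());
      delivered += 1;
    }
    return delivered;
  }
}
```

Because only the actor reads its own mailbox, each connection sees its events in the order they were queued.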

Subscriptions
We cannot blindly broadcast likes to all clients, because different users are watching different live videos.
- Maintain an in-memory subscription table.
- When a client likes `Live Video 2`, the server looks up all the connection IDs that are subscribed to `Live Video 2`.
- The server sends the `like` events only to that subset of connections.
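The lookup described above can be sketched as a map from video ID to a set of connection IDs. The class and function names here (`SubscriptionTable`, `publishLike`) are assumptions made for illustration.

```javascript
// Hypothetical in-memory subscription table: videoId -> Set of connection IDs.
class SubscriptionTable {
  constructor() {
    this.table = new Map();
  }

  subscribe(videoId, connectionId) {
    if (!this.table.has(videoId)) this.table.set(videoId, new Set());
    this.table.get(videoId).add(connectionId);
  }

  unsubscribe(videoId, connectionId) {
    const ids = this.table.get(videoId);
    if (!ids) return;
    ids.delete(connectionId);
    if (ids.size === 0) this.table.delete(videoId); // drop empty entries
  }

  subscribers(videoId) {
    return [...(this.table.get(videoId) || [])];
  }
}

// Send a like only to the connections subscribed to that video.
// `connections` maps a connection ID to a function that writes to it.
function publishLike(table, connections, videoId, like) {
  const ids = table.subscribers(videoId);
  for (const id of ids) connections.get(id)(like);
  return ids;
}
```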

Scale to 10K or more viewers
- Add an abstraction between the clients and the backend dispatcher, known as the frontend server. Each frontend server handles a portion of the connections.
- A frontend server subscribes to `Live Video 2`.
- The dispatcher maintains an in-memory subscription table that maps `Live Video 2` to `frontend-server-x`.
- The frontend server maintains an in-memory subscription table that maps `Live Video 2` to `connection-1`.
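The two tables above form a two-level routing scheme: the dispatcher knows only which frontend servers care about a video, and each frontend server knows which of its own connections do. A rough sketch follows, with hypothetical class names and a `delivered` array standing in for real network writes.

```javascript
// Frontend server: maps a video to its own connections.
class FrontendServer {
  constructor(name) {
    this.name = name;
    this.videoToConnections = new Map(); // e.g. "Live Video 2" -> Set {"connection-1"}
    this.delivered = [];                 // [connectionId, event] pairs, for illustration
  }

  // Returns true for the first local viewer of a video, which is when the
  // frontend server needs to subscribe with the dispatcher.
  addViewer(videoId, connectionId) {
    const isFirst = !this.videoToConnections.has(videoId);
    if (isFirst) this.videoToConnections.set(videoId, new Set());
    this.videoToConnections.get(videoId).add(connectionId);
    return isFirst;
  }

  receive(videoId, event) {
    for (const connectionId of this.videoToConnections.get(videoId) || []) {
      this.delivered.push([connectionId, event]);
    }
  }
}

// Dispatcher: maps a video to frontend servers, never to connections.
class Dispatcher {
  constructor() {
    this.videoToFrontends = new Map(); // e.g. "Live Video 2" -> Set {frontend-server-x}
  }

  subscribe(videoId, frontend) {
    if (!this.videoToFrontends.has(videoId)) this.videoToFrontends.set(videoId, new Set());
    this.videoToFrontends.get(videoId).add(frontend);
  }

  publish(videoId, event) {
    for (const frontend of this.videoToFrontends.get(videoId) || []) {
      frontend.receive(videoId, event);
    }
  }
}
```

The dispatcher's table grows with the number of frontend servers rather than the number of viewers, which is what lets this layout scale past a single node's connection limit.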

Dispatcher is the bottleneck
How do we handle 1000 likes per second?
- We could run multiple dispatcher nodes and balance the connections evenly across them.
- A `like` can then be sent to any dispatcher node and still be delivered to the right clients.
- This requires moving the in-memory mapping table out to a global key-value store.
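Once the mapping lives in a shared store, dispatcher nodes become interchangeable. The sketch below uses an in-process `KeyValueStore` class as a stand-in for the global key-value store. The store's interface, the round-robin balancer, and all names are assumptions for illustration.

```javascript
// Stand-in for a global key-value store shared by all dispatcher nodes.
// A real deployment would use a networked store; that part is assumed, not modeled.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  addToSet(key, value) {
    if (!this.data.has(key)) this.data.set(key, new Set());
    this.data.get(key).add(value);
  }
  getSet(key) {
    return [...(this.data.get(key) || [])];
  }
}

class DispatcherNode {
  constructor(name, store) {
    this.name = name;
    this.store = store; // shared, so no single node owns the mapping table
  }
  // Look up the frontend servers for a video in the shared store.
  route(videoId) {
    return this.store.getSet(videoId);
  }
}

// Round-robin selection: any node can handle any like.
function makeBalancer(nodes) {
  let next = 0;
  return function pick() {
    const node = nodes[next];
    next = (next + 1) % nodes.length;
    return node;
  };
}
```

Every node returns the same routing answer for a given video, so the balancer is free to choose whichever node is next.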

Multi data centers
There are no subscribers for the red video in DC-1 or DC-2. The only subscribers are in DC-3.

Cross data center subscriptions
- Frontend nodes subscribe to all dispatchers in all DCs.
- DC-1's subscription table looks similar to this:
live-video-red: dc-3-front-1
live-video-green: dc-2-front-1
live-video-green: dc-1-front-1
...
- `likes` for live-video-1 (the red video) that are sent to the dispatcher in DC-1 are dispatched to `dc-3-front-1`.
A few points to keep in mind:
- Cross-data-center subscribe and unsubscribe requests might not update all subscription tables at the same time, so the tables can be inconsistent.
  - Some frontend nodes will not receive the `likes`.
  - Some frontend nodes will still receive the `likes` even after they unsubscribe from the video.
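Both failure modes follow from each dispatcher holding its own copy of the table. The sketch below models two dispatchers whose copies are updated at different times. The helper names are hypothetical, and propagation delay is simulated simply by applying an update to one dispatcher and not the other.

```javascript
// Each DC's dispatcher holds its own copy of the cross-DC subscription table:
// videoId -> Set of frontend node names (a video may map to several nodes).
function makeDispatcher(dc) {
  return { dc, table: new Map() };
}

function applySubscribe(dispatcher, videoId, frontend) {
  if (!dispatcher.table.has(videoId)) dispatcher.table.set(videoId, new Set());
  dispatcher.table.get(videoId).add(frontend);
}

function applyUnsubscribe(dispatcher, videoId, frontend) {
  const nodes = dispatcher.table.get(videoId);
  if (nodes) nodes.delete(frontend);
}

// Frontend nodes that this dispatcher would send a like to.
function targetsFor(dispatcher, videoId) {
  return [...(dispatcher.table.get(videoId) || [])];
}

const dc1 = makeDispatcher("DC-1");
const dc2 = makeDispatcher("DC-2");

// dc-3-front-1 subscribes; the update has reached DC-1 but not yet DC-2.
applySubscribe(dc1, "live-video-red", "dc-3-front-1");
const viaDc1 = targetsFor(dc1, "live-video-red"); // ["dc-3-front-1"]
const viaDc2 = targetsFor(dc2, "live-video-red"); // []: likes entering via DC-2 are missed

// dc-3-front-1 unsubscribes; this time only DC-2 has processed the update.
applyUnsubscribe(dc2, "live-video-red", "dc-3-front-1");
const staleDc1 = targetsFor(dc1, "live-video-red"); // still ["dc-3-front-1"]
```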
Publish likes to all data centers

- `likes` are sent to the dispatcher in DC-1.
- The dispatcher in DC-1 sends the `likes` to the dispatchers in all other DCs.
A few points to keep in mind:
- This approach might not have the data inconsistency issue for subscribe and unsubscribe.
- A high volume of traffic is sent to all DCs whether or not they have any subscribers.
  - For example, DC-2 will always receive the `likes` for live-video-1 (the red one).
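The trade-off can be made concrete with a small model in which every dispatcher forwards each like to its peers and each peer delivers only to its local subscribers. The class `DcDispatcher` and its counters are illustrative assumptions, not the talk's implementation.

```javascript
class DcDispatcher {
  constructor(dc) {
    this.dc = dc;
    this.peers = [];            // dispatchers in the other DCs
    this.localSubs = new Map(); // videoId -> Set of local frontend node names
    this.received = 0;          // number of events received from peers
    this.delivered = [];        // [frontend, event] pairs, for illustration
  }

  subscribeLocal(videoId, frontend) {
    if (!this.localSubs.has(videoId)) this.localSubs.set(videoId, new Set());
    this.localSubs.get(videoId).add(frontend);
  }

  // Entry point for a like arriving from a client in this DC.
  publish(videoId, event) {
    this.deliverLocal(videoId, event);
    for (const peer of this.peers) peer.receiveFromPeer(videoId, event);
  }

  // Every peer pays the cost of receiving, even with no local subscribers.
  receiveFromPeer(videoId, event) {
    this.received += 1;
    this.deliverLocal(videoId, event);
  }

  deliverLocal(videoId, event) {
    for (const frontend of this.localSubs.get(videoId) || []) {
      this.delivered.push([frontend, event]);
    }
  }
}
```

Subscriptions stay local to each dispatcher in this model, so there is no cross-DC table to keep in sync; the cost shows up in the `received` counter of DCs that have nobody to deliver to.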