
Design Netflix or YouTube

Requirements

Functional requirements

Non-functional requirements

Assumptions

Data models

type Account struct {
  CustomerID string // unique identifier for the account
  Email string
  Password string // hash of the user's login password
  Name string
  PlanID string // indicates which plan the account is subscribed to
  Locale string // indicates the language used for i18n
  ...
}

type PaymentMethod struct {
  FullName string
  BillingAddress string
  CardNumber string // unique identifier of a payment method
  CVV string
  CustomerID string // linked to the account
}

type Billing struct {
  CustomerID string // which account the billing applies to
  Date Date // when the bill is filed
  Amount float32 // the amount charged
  CardNumber string // which card was used for the payment
}

type Video struct {
  VideoID string // unique identifier of a video
  Title string
  Subtitle string
  Description string
  SeriesName string // only applies to series
  Episode int // only applies to series
  Likes int
  MediaURL string // endpoint used to access the media
  ...
}

type Comment struct {
  CustomerID string // the account that posted the comment
  VideoID string // the video the comment applies to
  Content string
}

type ViewHistory struct {
  Email string // identifies the account
  History []struct {
    Video Video
    Date string
    WatchTime string
    ...
  }
}

Storage

Netflix uses a MySQL cluster and Cassandra.

Member Viewing History

Netflix uses Cassandra to store the viewing history.

Member Viewing History - First solution

RowKey: customerID. Column: each column holds one viewing history record.


Problems:

Improvements:

Second solution

The improvements above still fall short when a user's data is huge: either the Cassandra row becomes very large, or the data cannot fit into a single EVCache entry.

This led to the idea of splitting the data into LiveVH and CompressedVH.
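A minimal sketch of the live/compressed split, assuming a small uncompressed list of recent records that is rolled up into one gzip blob once it grows past a threshold. The `History` type, the `liveLimit` value, and the JSON+gzip encoding are illustrative assumptions, not the format Netflix uses.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
)

type View struct {
	VideoID string
	Date    string
}

// History keeps recent records uncompressed (Live) and older ones
// in a single gzip-compressed blob (Compressed).
type History struct {
	Live       []View
	Compressed []byte
}

const liveLimit = 3 // hypothetical roll-up threshold

func compress(views []View) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if err := json.NewEncoder(zw).Encode(views); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func decompress(blob []byte) ([]View, error) {
	if len(blob) == 0 {
		return nil, nil
	}
	zr, err := gzip.NewReader(bytes.NewReader(blob))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	data, err := io.ReadAll(zr)
	if err != nil {
		return nil, err
	}
	var views []View
	err = json.Unmarshal(data, &views)
	return views, err
}

// Add appends to Live and rolls everything up once Live exceeds liveLimit.
func (h *History) Add(v View) error {
	h.Live = append(h.Live, v)
	if len(h.Live) <= liveLimit {
		return nil
	}
	old, err := decompress(h.Compressed)
	if err != nil {
		return err
	}
	blob, err := compress(append(old, h.Live...))
	if err != nil {
		return err
	}
	h.Compressed, h.Live = blob, nil
	return nil
}

// All returns the older (compressed) records followed by the live ones.
func (h *History) All() ([]View, error) {
	old, err := decompress(h.Compressed)
	if err != nil {
		return nil, err
	}
	return append(old, h.Live...), nil
}

func main() {
	var h History
	for i := 0; i < 5; i++ {
		if err := h.Add(View{VideoID: fmt.Sprintf("v%d", i), Date: "2024-01-01"}); err != nil {
			panic(err)
		}
	}
	all, err := h.All()
	if err != nil {
		panic(err)
	}
	fmt.Println("live:", len(h.Live), "total:", len(all))
}
```

Frequent writes only touch the small live part; the roll-up cost is paid occasionally, and reads of the full history merge both parts.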


[Figure: vh-rollup]

Problems:

Improvements:

[Figure: compressed-vh-chunks]

Further improvements have been made; please refer to this blog for more details.

[Figure: viewing-datastore-rearchitecture]

Comments

It might be a good idea to store comments in a NoSQL database as well, since a wide-column database is already used for viewing history; keeping the same tech stack makes it easier for the dev team to develop and maintain. With a wide-column store, the rowKey could be the videoID and the columnKey could be the customerID.

Comments - First solution

RowKey: videoID. Column: timestamp + customerID.


The same improvement ideas used for viewing history apply here.

Videos

Netflix uses S3 to store the initially uploaded videos, then transcodes them and pushes the results to remote CDNs. Object storage is a good choice for storing media files, and it also offers a REST API.

How admins upload videos to Netflix or users upload videos to YouTube

A movie is often shot in 8K, and the source file can be several hundred GB or even measured in TB. A YouTube video can be several GB as well. So there are a few challenges here:

The evolution of the architecture to support big file uploads can be summarized as follows (ranging from early multi-tier web applications to modern-day architectures).

When uploading a large file, the client side will usually do the following (more details can be found here):

Individual Amazon S3 objects can range in size from a minimum of 0 bytes to a maximum of 5 terabytes. A 4K movie is about 100 GB on average, so storing it in S3 object storage is feasible. (Reference)

For large files, users can use S3 multipart upload. (Reference)
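To make the sizing concrete, here is a small sketch that plans the part size and part count for a multipart upload. It assumes the documented S3 multipart limits (at most 10,000 parts, parts between 5 MiB and 5 GiB except the last) and treats the 5 TB object limit quoted above as 5 TiB; the `planParts` helper and the 64 MiB preferred part size are illustrative choices, not part of any SDK.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	MiB       int64 = 1 << 20
	GiB       int64 = 1 << 30
	minPart         = 5 * MiB        // minimum part size (the last part may be smaller)
	maxPart         = 5 * GiB        // maximum part size
	maxParts  int64 = 10000          // maximum number of parts per upload
	maxObject       = 5 * 1024 * GiB // object size limit quoted above, taken as 5 TiB
)

// planParts picks a part size close to preferred that keeps the upload within
// the limits above, and returns it with the resulting number of parts.
func planParts(size, preferred int64) (partSize, parts int64, err error) {
	if size <= 0 || size > maxObject {
		return 0, 0, errors.New("object size out of range")
	}
	partSize = preferred
	if partSize < minPart {
		partSize = minPart
	}
	if partSize > maxPart {
		partSize = maxPart
	}
	// Grow the part size if the preferred one would need more than maxParts parts.
	if need := (size + maxParts - 1) / maxParts; need > partSize {
		partSize = need
	}
	parts = (size + partSize - 1) / partSize
	return partSize, parts, nil
}

func main() {
	// A 100 GB movie uploaded with a preferred part size of 64 MiB.
	partSize, parts, err := planParts(100000000000, 64*MiB)
	if err != nil {
		panic(err)
	}
	fmt.Printf("part size: %d bytes, parts: %d\n", partSize, parts)
}
```

With these inputs the plan stays far below the 10,000-part limit, so the preferred part size is used unchanged; only objects in the multi-terabyte range force a larger part size.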

How to upload large video files to thousands of CDN servers

[Figure: cdn]

When transferring the video, the same idea as large file upload can be used: split the file into chunks and transfer them in parallel.

The following two blogs describe how Netflix fills its CDN servers with videos:

How to stream/download large video files from CDN servers

How to build personalized home page per user profile

How to auto switch between different resolutions while streaming

https://en.wikipedia.org/wiki/Adaptive_bitrate_streaming
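As a rough illustration of adaptive bitrate streaming, the sketch below picks the highest rendition whose bitrate fits within a fraction of the measured throughput. The bitrate ladder values, the `pickRendition` helper, and the safety factor are assumptions; real players also weigh buffer level and other signals.

```go
package main

import "fmt"

// Rendition is one encoded version of the video.
type Rendition struct {
	Name        string
	BitrateKbps float64
}

// pickRendition returns the highest-bitrate rendition that fits within
// safety*throughputKbps. ladder must be non-empty and sorted by ascending bitrate.
func pickRendition(ladder []Rendition, throughputKbps, safety float64) Rendition {
	best := ladder[0] // fall back to the lowest quality
	budget := throughputKbps * safety
	for _, r := range ladder {
		if r.BitrateKbps <= budget {
			best = r
		}
	}
	return best
}

func main() {
	// Hypothetical bitrate ladder.
	ladder := []Rendition{
		{Name: "240p", BitrateKbps: 400},
		{Name: "480p", BitrateKbps: 1000},
		{Name: "720p", BitrateKbps: 2500},
		{Name: "1080p", BitrateKbps: 5000},
	}
	// The client re-measures throughput after each segment and switches as needed.
	for _, measured := range []float64{6500, 4000, 300} {
		fmt.Println(measured, "kbps ->", pickRendition(ladder, measured, 0.8).Name)
	}
}
```

Since the video is cut into short segments and every rendition uses the same segment boundaries, the player can switch renditions at the next segment without interrupting playback.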

How to remember last played time

Solution 1

How users search for videos

Spark, collaborative filtering, and content-based filtering

[Figure: recommendation]

https://netflixtechblog.com/system-architectures-for-personalization-and-recommendation-e081aa94b5d8

Offline jobs

[Figure: offline-jobs]

Model training and batch computation of intermediate or final results can be done offline. The results are used later, either for subsequent online processing or for direct presentation to the user.

Event and Data distribution

[Figure: event-and-data-distribution]

Netflix wants to collect as many user inputs as possible. These events can be aggregated into the base data for ML algorithms.

Recommendation results

[Figure: recommendation-results]

The computed recommendation results are stored in a database or a cache, depending on the requirements.

References