Skip to the content.

Dynamo

Paper

What is Dynamo

A highly available key-value storage system that some Amazon’s core services use to provide an “always-on” experience. It does not mean to provide a fit-all solution.

Note: Dynamo is not the same as DynamoDB which is a leader based key-value database.

Why does Amazon need Dynamo

What are the requirements and assumptions

Design considerations

Design details

APIs and keyspace

Context has the information such as the version of the object.

Partitioning algorithm

Replication

From above diagram, if a key k is assigned to Node B and replicated to Node C and Node D. Nodes [B,C,D] are a preference list. Node B is called coordinator. In order to handle failure case, the preference list has more nodes than the replication factor.

Data versioning and conflict handling

Read and write flow

Client to node load balancing

Note: Facebook’s shard manager(Aka: Akkio) solves this in a generic way.

Quorum mechanism for consistency

Handle temporary failure: Hinted handoff

Handle permanent failure: Replica sync

It is possible that a coordinator node is permanently down before it sends out the data replication requests, and in this case data will be lost. Dynamo tries to mitigate this issue by using an anti-entropy process to keep replicas are in-sync.

Anti-entropy: A background process looks for diffs in the data between replicas, and copy the data from one to another.

Merkle tree: Since the replicas are distributed, how could we minimize the data transfer when comparing two data replicas? Dynamo uses merkle tree to solve the problem.

Membership and failure detection

Flow of membership change

As a result, each node knows the key range handled by its peers, so it could redirect the request to proper node directly.

Failure detection

The reason of having failure detection is that we do not want to send requests to those nodes which are down already.

Adding/Removing storage nodes