
FOQS: Scaling a distributed priority queue

FOQS (Facebook Ordered Queueing Service) is a fully managed, horizontally scalable, multitenant, persistent distributed priority queue built on top of sharded MySQL. See here for more details on designing a distributed delay queue.

Why FOQS

Facebook’s distributed systems and microservices benefit from running workloads asynchronously:

User stories

Design Details

Item

An item is similar to a Task in "Design distributed delayed job queuing system". A FOQS item has the following properties:

Enqueue

[Figure: enqueue]

Dequeue

[Figure: dequeue]

ACK and NACK

If an ack or nack is lost for any reason, such as MySQL unavailability or a crash of a FOQS node, the item is considered for redelivery after its lease expires.
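To make the lease behaviour concrete, here is a minimal in-memory sketch. It is not FOQS code: the class name `LeaseQueue`, its methods, and the injectable `clock` are hypothetical, and the real service keeps this state in sharded MySQL. The treatment of nack (clearing the lease so the item is eligible again right away) is also an assumption made for illustration.

```python
import time


class LeaseQueue:
    """Toy single-process queue with lease-based redelivery (illustrative only)."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self._lease_seconds = lease_seconds
        self._clock = clock  # injectable so tests can control time
        # item_id -> [payload, lease_expires_at]; None means "not leased"
        self._items = {}

    def enqueue(self, item_id, payload):
        self._items[item_id] = [payload, None]

    def dequeue(self):
        """Return (item_id, payload) for one deliverable item, or None."""
        now = self._clock()
        for item_id, entry in self._items.items():
            payload, lease_expires_at = entry
            # Deliverable if never leased, or if the previous lease ran out
            # without an ack or nack ever arriving.
            if lease_expires_at is None or lease_expires_at <= now:
                entry[1] = now + self._lease_seconds
                return item_id, payload
        return None

    def ack(self, item_id):
        # Successful processing: the item is gone for good.
        self._items.pop(item_id, None)

    def nack(self, item_id):
        # Assumption: a nack makes the item eligible again immediately.
        if item_id in self._items:
            self._items[item_id][1] = None
```

If the consumer crashes after `dequeue` and never calls `ack` or `nack`, nothing else has to happen: once the lease time passes, the next `dequeue` hands the same item out again.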

Pull vs Push

FOQS uses a pull model, similar to the approach taken in designing a distributed delay queue.

Checkpoint

FOQS is backed by MySQL, and it appears that background threads repeatedly query MySQL to find items that are ready to be delivered and items whose leases have expired. Done naively, this polling is inefficient. The blog suggests using a checkpoint as a lower bound to narrow each query: WHERE <checkpoint> <= timestamp_column AND timestamp_column <= UNIX_TIMESTAMP().
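The sketch below shows how such a bounded query might be used in a polling loop. It uses an in-memory SQLite database as a stand-in for a MySQL shard, so the current time is passed as a parameter instead of calling UNIX_TIMESTAMP(). The table `items`, its columns `deliver_after` and `delivered`, and the rule for advancing the checkpoint are all assumptions for illustration, not the FOQS schema.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical table; the index lets the range predicate avoid a full scan.
    conn.execute(
        "CREATE TABLE items ("
        " id INTEGER PRIMARY KEY,"
        " deliver_after INTEGER NOT NULL,"
        " delivered INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("CREATE INDEX idx_deliver_after ON items (deliver_after)")


def poll_ready(conn, checkpoint, now):
    """Return (ready_ids, new_checkpoint) for items due in [checkpoint, now]."""
    rows = conn.execute(
        "SELECT id, deliver_after FROM items"
        " WHERE delivered = 0"
        " AND ? <= deliver_after AND deliver_after <= ?"
        " ORDER BY deliver_after, id",
        (checkpoint, now),
    ).fetchall()
    if not rows:
        return [], checkpoint
    conn.executemany(
        "UPDATE items SET delivered = 1 WHERE id = ?",
        [(row[0],) for row in rows],
    )
    # Assumption: advance the checkpoint to the newest timestamp just handled,
    # so the next poll skips everything older than it.
    return [row[0] for row in rows], rows[-1][1]
```

One caveat of this sketch: an item inserted later with a `deliver_after` below the current checkpoint would never be seen, so a real system needs a rule for such late arrivals.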

Personal suggestion: using a condition variable, as Java's DelayQueue does, would be a good idea.
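A rough Python sketch of that idea, using `threading.Condition` and a heap ordered by ready time. The class `DelayQueue` here is a hypothetical, single-process illustration and not a port of Java's implementation; it only shows how a consumer can sleep until the head item is due instead of polling.

```python
import heapq
import threading
import time


class DelayQueue:
    """Blocking delay queue: take() sleeps until the earliest item is due."""

    def __init__(self):
        self._heap = []  # entries are (ready_at, sequence, item)
        self._seq = 0    # tie-breaker so items themselves are never compared
        self._cond = threading.Condition()

    def put(self, item, delay_seconds):
        ready_at = time.monotonic() + delay_seconds
        with self._cond:
            heapq.heappush(self._heap, (ready_at, self._seq, item))
            self._seq += 1
            # The head may have changed, so waiters must recompute their sleep.
            self._cond.notify_all()

    def take(self):
        with self._cond:
            while True:
                if not self._heap:
                    self._cond.wait()  # nothing queued: sleep until a put()
                    continue
                ready_at, _, item = self._heap[0]
                remaining = ready_at - time.monotonic()
                if remaining <= 0:
                    heapq.heappop(self._heap)
                    return item
                # Sleep exactly until the head is due, or until a put() wakes us.
                self._cond.wait(timeout=remaining)
```

The consumer wakes only when an item becomes due or a new item arrives, so there is no periodic query. This works within one process; coordinating it across FOQS nodes that share MySQL state is a separate problem that the sketch does not address.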

Disaster Readiness

References