
Design a distributed message broker (RabbitMQ) and a message streaming platform (Kafka)

Requirements

Functional requirements

Non-functional requirements

Assumptions

128 bytes/message * 1,000 messages/second/producer * 10,000 producers = 1,280 MB/second (~1.28 GB/second)

Data models and APIs

type Message struct {
  ID string // the uuid of the message
  Topic string // used for topic based message routing
  Labels map[string]string // used for content based message routing
  Content []byte // the message payload
}

func (p *Producer) Publish(messageContent []byte, topic string, labels map[string]string) {
  p.Send(Message{
    ID: uuid.NewString(), // generate a unique message ID, e.g. with github.com/google/uuid
    Topic: topic,
    Labels: labels,
    Content: messageContent,
  })
}

func (c *Consumer) Consume(topic string, labels map[string]string) []byte {
  message := c.Receive(topic, labels) // Receive pulls the next matching message from the broker (method name assumed)
  return message.Content
}
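
To see how the data model and the Publish/Consume APIs fit together, here is a toy, single-process sketch that backs them with one in-memory channel per topic. The broker type, the label-matching logic, and the buffer size are illustrative stand-ins only; a real broker would persist messages and serve them over the network.

package main

import (
  "fmt"

  "github.com/google/uuid"
)

// Message mirrors the data model above.
type Message struct {
  ID string
  Topic string
  Labels map[string]string
  Content []byte
}

// broker is a toy in-memory stand-in for the middleware: one buffered channel per topic.
type broker struct {
  topics map[string]chan Message
}

func (b *broker) queue(topic string) chan Message {
  if _, ok := b.topics[topic]; !ok {
    b.topics[topic] = make(chan Message, 1024)
  }
  return b.topics[topic]
}

type Producer struct{ b *broker }

// Publish assigns an ID and routes the message to its topic queue.
func (p *Producer) Publish(content []byte, topic string, labels map[string]string) {
  p.b.queue(topic) <- Message{ID: uuid.NewString(), Topic: topic, Labels: labels, Content: content}
}

type Consumer struct{ b *broker }

// Consume blocks until a message on the topic matches all requested labels.
func (c *Consumer) Consume(topic string, labels map[string]string) []byte {
  for m := range c.b.queue(topic) {
    if matches(m.Labels, labels) {
      return m.Content
    }
  }
  return nil
}

func matches(have, want map[string]string) bool {
  for k, v := range want {
    if have[k] != v {
      return false
    }
  }
  return true
}

func main() {
  b := &broker{topics: map[string]chan Message{}}
  p, c := &Producer{b}, &Consumer{b}
  p.Publish([]byte("order created"), "orders", map[string]string{"region": "us"})
  fmt.Printf("%s\n", c.Consume("orders", map[string]string{"region": "us"}))
}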

Architecture

Kafka Architecture

kafka-architecture

How does Kafka know which consumers subscribe to a specific topic?

Kafka tracks subscriptions through consumer groups: each consumer joins a group for the topics it wants, and a broker acting as the group coordinator records the membership metadata, assigns the topic's partitions among the group's members, and uses heartbeats to detect when members join or leave so it can rebalance the assignment. Committed offsets record how far each group has read in each partition. Together these mechanisms deliver each partition's messages to exactly one consumer in the group while adapting to changes in group membership and partition leadership.
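
As an illustration, here is a minimal consumer-group sketch using the segmentio/kafka-go client; the broker address, group ID, and topic name are assumed. Joining with a GroupID is what registers this consumer with the group coordinator, which then assigns it partitions and rebalances as members come and go.

package main

import (
  "context"
  "log"

  "github.com/segmentio/kafka-go"
)

func main() {
  // Setting GroupID makes the group coordinator track this reader's membership,
  // assign it a share of the topic's partitions, and rebalance on join/leave.
  r := kafka.NewReader(kafka.ReaderConfig{
    Brokers: []string{"localhost:9092"}, // assumed broker address
    GroupID: "order-processors",         // assumed consumer group name
    Topic:   "orders",                   // assumed topic
  })
  defer r.Close()

  for {
    // ReadMessage blocks until a message arrives on one of this member's
    // assigned partitions, then commits the offset on behalf of the group.
    m, err := r.ReadMessage(context.Background())
    if err != nil {
      log.Fatal(err)
    }
    log.Printf("partition=%d offset=%d value=%s", m.Partition, m.Offset, m.Value)
  }
}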

RabbitMQ Architecture

rabbitmq-architecture

Data persistence

Delivery guarantees

reliable-transfer

How to guarantee at-least-once

How to guarantee at-most-once

How to guarantee exactly-once

Two-phase commit:
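
Below is a minimal sketch of the two-phase-commit idea, assuming a coordinator and two hypothetical participants (a broker partition and the consumer's offset store): commit only if every participant votes yes in the prepare phase, otherwise abort everywhere.

package main

import "fmt"

// Participant is anything that must atomically apply the message hand-off,
// e.g. the broker partition and the consumer-offset store (names assumed).
type Participant interface {
  Prepare(txID string) bool // phase 1: vote yes/no and hold resources
  Commit(txID string)       // phase 2: make the change durable
  Abort(txID string)        // phase 2: release held resources
}

// twoPhaseCommit is a minimal coordinator sketch.
func twoPhaseCommit(txID string, participants []Participant) bool {
  for _, p := range participants {
    if !p.Prepare(txID) {
      for _, q := range participants {
        q.Abort(txID)
      }
      return false
    }
  }
  for _, p := range participants {
    p.Commit(txID)
  }
  return true
}

type fakeStore struct{ name string }

func (s *fakeStore) Prepare(txID string) bool { fmt.Println(s.name, "prepared", txID); return true }
func (s *fakeStore) Commit(txID string)       { fmt.Println(s.name, "committed", txID) }
func (s *fakeStore) Abort(txID string)        { fmt.Println(s.name, "aborted", txID) }

func main() {
  ok := twoPhaseCommit("tx-1", []Participant{&fakeStore{"broker"}, &fakeStore{"offset-store"}})
  fmt.Println("exactly-once hand-off committed:", ok)
}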

Scalability

rabbitmq-queue-partition

kafka-partition

Availability

Fault tolerance

Producer failure

Producer fails before sending the message

The producer could implement its own retry logic, for example resending with exponential backoff until the broker acknowledges the message, as sketched below.
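
A sketch of producer-side retries with exponential backoff; send stands in for whatever call actually pushes the message to the broker. Note that retrying after a lost ack can duplicate messages, which is why retries alone give at-least-once rather than exactly-once delivery.

package main

import (
  "errors"
  "fmt"
  "time"
)

// publishWithRetry retries send with exponential backoff up to maxAttempts times.
func publishWithRetry(send func() error, maxAttempts int) error {
  backoff := 100 * time.Millisecond
  var err error
  for attempt := 1; attempt <= maxAttempts; attempt++ {
    if err = send(); err == nil {
      return nil
    }
    fmt.Printf("attempt %d failed: %v, retrying in %v\n", attempt, err, backoff)
    time.Sleep(backoff)
    backoff *= 2
  }
  return fmt.Errorf("publish failed after %d attempts: %w", maxAttempts, err)
}

func main() {
  calls := 0
  err := publishWithRetry(func() error {
    calls++
    if calls < 3 {
      return errors.New("broker unavailable")
    }
    return nil
  }, 5)
  fmt.Println("result:", err)
}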

Producer fails after sending the message but before getting the ack

Use a separate store for message status so that the producer can check a message's status asynchronously. The broker side could update the status asynchronously as well (RabbitMQ when the message is consumed and deleted from the queue; Kafka in batch as the group's last consumed offset advances, which newer versions track in the __consumer_offsets topic rather than in ZooKeeper).
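
A rough sketch of that idea, with a toy in-memory status store standing in for a real key-value database; the store, status values, and message ID here are all hypothetical.

package main

import (
  "fmt"
  "sync"
)

const (
  StatusPending   = "PENDING"   // recorded by the producer before/while sending
  StatusDelivered = "DELIVERED" // updated asynchronously from the broker/consumer side
)

// statusStore is a toy stand-in for the separate message-status store.
type statusStore struct {
  mu     sync.Mutex
  status map[string]string
}

func (s *statusStore) Set(id, st string) {
  s.mu.Lock()
  defer s.mu.Unlock()
  s.status[id] = st
}

func (s *statusStore) Get(id string) string {
  s.mu.Lock()
  defer s.mu.Unlock()
  return s.status[id]
}

func main() {
  store := &statusStore{status: map[string]string{}}

  // Producer: mark the message pending before sending it to the broker.
  store.Set("msg-123", StatusPending)

  // Broker/consumer side (asynchronously in the real system): mark it delivered
  // once it is consumed or the group's consumed offset advances past it.
  store.Set("msg-123", StatusDelivered)

  // Producer after a restart: consult the store instead of blindly resending.
  if store.Get("msg-123") == StatusPending {
    fmt.Println("msg-123 still pending: retry the publish")
  } else {
    fmt.Println("msg-123 already delivered: skip the retry")
  }
}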

Producer fails after receiving the ack

No need to retry; the producer simply resumes after restarting.

Middleware side failure

Host down or network partition

In all of the above cases, both the leader and the followers can accept read and write requests; the only difference is that a write request received by a follower is redirected to the current leader. The Raft consensus algorithm already provides the leader election and leadership metadata needed for this.
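
A simplified sketch of the redirect, with illustrative types and method names (a real implementation would sit on top of a Raft library that replicates each entry to a quorum before acknowledging it).

package main

import "fmt"

// node is a minimal stand-in for a broker replica in a Raft group.
type node struct {
  id       string
  isLeader bool
  leader   *node // current leader as known from Raft metadata
  log      []string
}

// HandleWrite accepts a write on any replica; a follower simply forwards
// the request to the node it currently believes is the leader.
func (n *node) HandleWrite(entry string) {
  if !n.isLeader {
    fmt.Printf("%s: not leader, redirecting write to %s\n", n.id, n.leader.id)
    n.leader.HandleWrite(entry)
    return
  }
  // The real system would replicate the entry to a quorum before acking.
  n.log = append(n.log, entry)
  fmt.Printf("%s: committed %q\n", n.id, entry)
}

func main() {
  leader := &node{id: "broker-1", isLeader: true}
  follower := &node{id: "broker-2", leader: leader}
  follower.HandleWrite("message-42") // redirected to broker-1
}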

Process crashes

Consumer failure

Consumer crashes

References