
Distributed monitoring system

These are notes from researching Prometheus. Google Monarch has a completely different design; see this article for more details.

Architecture

Prometheus server

PushGateway

AlertManager

TSDB within Prometheus Server

The following is based on the V3 design (Prometheus TSDB from scratch).

Good to know

Compare Prometheus with others

This section is a summary of this doc

VS Graphite

VS Graphite Scope

Prometheus actively scrapes data from its targets, whereas Graphite passively waits for data to be sent to it.
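To make the pull versus push difference concrete, here is a minimal sketch of the two wire formats. It assumes Graphite's plaintext protocol (`path value timestamp`, pushed by the client) and the Prometheus text exposition format (served by the target and pulled by the server). The helper names, the timestamp, and the sample values are made up for illustration, and label-value escaping is deliberately left out.

```python
def graphite_line(path: str, value: float, timestamp: int) -> str:
    """Line a client would push to Graphite (plaintext protocol)."""
    return f"{path} {value} {timestamp}\n"


def prometheus_line(name: str, labels: dict, value: float) -> str:
    """Line a target would expose for the Prometheus server to scrape.

    No timestamp is written here: the server stamps the sample at scrape time.
    Escaping of special characters in label values is omitted in this sketch.
    """
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}\n"


# Pushed by the application (or StatsD) to Graphite.
pushed = graphite_line("stats.api-server.tracks.post.500", 93, 1700000000)

# Held by the application until Prometheus comes to fetch it.
scraped = prometheus_line(
    "api_server_http_requests_total",
    {"method": "POST", "handler": "/tracks", "status": "500"},
    34,
)
```

In the push case the sender decides when data arrives; in the pull case the monitoring server does, which also lets it notice when a target stops responding.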

VS Graphite Data model

Graphite uses StatsD-aggregated data, with dot-separated components in the metric name:

stats.api-server.tracks.post.500 -> 93

Prometheus uses labels (key-value pairs) and preserves the instance as a dimension:

api_server_http_requests_total{method="POST",handler="/tracks",status="500",instance="<sample1>"} -> 34
api_server_http_requests_total{method="POST",handler="/tracks",status="500",instance="<sample2>"} -> 28
api_server_http_requests_total{method="POST",handler="/tracks",status="500",instance="<sample3>"} -> 31
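The three per-instance values above add up to the single Graphite value (34 + 28 + 31 = 93), so the aggregate can always be recomputed from the labelled series, but not the other way round. The sketch below shows that aggregation in plain Python; `sum_without` is a hypothetical helper that mimics, in spirit, what a PromQL `sum without (instance)` expression does.

```python
from collections import defaultdict

# Per-instance samples from the example above: (labels, value).
samples = [
    ({"method": "POST", "handler": "/tracks", "status": "500", "instance": "<sample1>"}, 34),
    ({"method": "POST", "handler": "/tracks", "status": "500", "instance": "<sample2>"}, 28),
    ({"method": "POST", "handler": "/tracks", "status": "500", "instance": "<sample3>"}, 31),
]


def sum_without(samples, dropped_labels):
    """Sum values across series after removing the given labels."""
    totals = defaultdict(int)
    for labels, value in samples:
        # Series that agree on all remaining labels fall into the same group.
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in dropped_labels))
        totals[key] += value
    return dict(totals)


aggregated = sum_without(samples, {"instance"})
total = list(aggregated.values())[0]  # the Graphite-style aggregate
```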

VS Graphite Storage

Graphite stores time series data on local disk in the Whisper format, an RRD-style database that expects samples to arrive at regular intervals.
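As a rough illustration of why an RRD-style database expects regularly spaced samples, the sketch below models an archive as a fixed number of slots, one per interval. This is a toy model and not Whisper's actual on-disk layout; the class name and parameters are assumptions.

```python
class FixedIntervalArchive:
    """Toy RRD-style archive: one slot per step, oldest data overwritten."""

    def __init__(self, step_seconds: int, slots: int):
        self.step = step_seconds
        self.slots = slots
        self.data = [None] * slots  # each entry: (aligned_timestamp, value)

    def _align(self, timestamp: int) -> int:
        # Round down to the start of the interval the timestamp falls into.
        return timestamp - (timestamp % self.step)

    def write(self, timestamp: int, value: float) -> None:
        aligned = self._align(timestamp)
        self.data[(aligned // self.step) % self.slots] = (aligned, value)

    def read(self, timestamp: int):
        aligned = self._align(timestamp)
        entry = self.data[(aligned // self.step) % self.slots]
        # The slot may hold a newer interval that wrapped around onto it.
        if entry is not None and entry[0] == aligned:
            return entry[1]
        return None


archive = FixedIntervalArchive(step_seconds=10, slots=6)  # 60 s of retention
archive.write(1000, 1.0)
archive.write(1007, 2.0)  # same 10 s interval as t=1000: replaces the value
archive.write(1060, 3.0)  # one full retention period later: reuses that slot
```

Two samples inside the same interval collapse into one value, and anything older than the retention window is overwritten, which is why irregular or bursty samples fit this model poorly.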

Prometheus uses a different storage mechanism, described in the TSDB section above.

VS InfluxDB

VS InfluxDB Scope

The fair comparison is Kapacitor together with InfluxDB, since in combination they address the same problem space as Prometheus and the Alertmanager. Kapacitor’s scope is a combination of Prometheus recording rules, alerting rules, and the Alertmanager’s notification functionality.

VS InfluxDB Data model

InfluxDB has a second level of labels called fields, which are more limited in use. InfluxDB supports timestamps with up to nanosecond resolution, and float64, int64, bool, and string data types. Prometheus, by contrast, supports the float64 data type with limited support for strings, and millisecond resolution timestamps.
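The sketch below illustrates what these differences mean when one InfluxDB-style point is mapped onto Prometheus-style samples: numeric and boolean fields become float64 values, string fields have no direct counterpart, and the nanosecond timestamp is truncated to milliseconds. The `InfluxPoint` type and the `measurement_field` naming scheme are assumptions made for this example, not an official conversion.

```python
from typing import NamedTuple


class InfluxPoint(NamedTuple):
    measurement: str
    tags: dict         # key-value pairs, comparable to Prometheus labels
    fields: dict       # second level of data: float, int, bool or str values
    timestamp_ns: int  # nanosecond resolution


def to_prometheus_samples(point: InfluxPoint):
    """Return (metric_name, labels, float_value, timestamp_ms) tuples."""
    timestamp_ms = point.timestamp_ns // 1_000_000  # sub-millisecond part is lost
    out = []
    for field, value in sorted(point.fields.items()):
        if isinstance(value, str):
            continue  # no general-purpose string sample on the Prometheus side
        name = f"{point.measurement}_{field}"  # hypothetical naming scheme
        out.append((name, dict(point.tags), float(value), timestamp_ms))
    return out


point = InfluxPoint(
    measurement="http",
    tags={"handler": "/tracks"},
    fields={"requests": 34, "healthy": True, "last_error": "timeout"},
    timestamp_ns=1_700_000_000_123_456_789,
)
samples = to_prometheus_samples(point)
```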

VS InfluxDB Storage

InfluxDB uses a variant of a log-structured merge tree for storage with a write ahead log, sharded by time.

Prometheus uses an append-only file per time series.

VS InfluxDB Architecture

Prometheus servers run independently of each other and only rely on their local storage for their core functionality: scraping, rule processing, and alerting. The open source version of InfluxDB is similar.

The commercial InfluxDB offering is, by design, a distributed storage cluster with storage and queries being handled by many nodes at once.

VS InfluxDB Summary

Where InfluxDB is better:

Where Prometheus is better:

VS OpenTSDB

VS OpenTSDB Scope

The same scope differences apply as in the Graphite comparison.

VS OpenTSDB Data model

VS OpenTSDB Storage

OpenTSDB’s storage is implemented on top of Hadoop and HBase. This means that it is easy to scale OpenTSDB horizontally, but you have to accept the overall complexity of running a Hadoop/HBase cluster from the beginning.

Prometheus will be simpler to run initially, but will require explicit sharding once the capacity of a single node is exceeded.
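One common way to shard explicitly is to hash each scrape target and let every Prometheus server keep only the targets that fall into its bucket. The sketch below shows the idea; it is similar in spirit to hash-modulo relabeling, but the function, the target names, and the shard count are assumptions, and it is not guaranteed to reproduce the buckets a real Prometheus configuration would compute.

```python
import hashlib


def shard_for(target: str, shard_count: int) -> int:
    """Map a scrape target to one of shard_count Prometheus servers."""
    digest = hashlib.md5(target.encode("utf-8")).digest()
    # Use part of the digest as an integer and reduce it to a bucket index.
    return int.from_bytes(digest[8:], "big") % shard_count


targets = [f"api-server-{i}:9100" for i in range(8)]  # hypothetical targets
assignment = {t: shard_for(t, 3) for t in targets}

# Server number 0 would scrape only the targets assigned to bucket 0.
shard_zero_targets = [t for t, shard in assignment.items() if shard == 0]
```

Because the mapping depends only on the target name and the shard count, every server can compute it independently. Changing the shard count reshuffles most targets, which is part of the operational cost of sharding by hand.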

References