Prometheus at scale with Thanos

Running Prometheus at scale tends to run into a few issues:
  1. Data retention costs money: the more historical data you keep, the more SSD you need, and that gets expensive.
  2. You scale Prometheus by sharding, and the ensuing contortions you go through to get a unified view involve all sorts of madness with federation, Grafana queries, and whatnot.
  3. An HA setup inevitably runs into the entertainment associated with de-duplicating Prometheus data.
  4. If you run into more than one of the above, then whatever you build ends up being somewhat Rube-Goldberg-ish, with the associated maintenance headaches, replicability problems, and inefficiency.
Enter Thanos, which integrates transparently with Prometheus, and pretty much takes care of all of the above issues, and remarkably elegantly at that. It consists of a bunch of modular components that can be snapped together to — transparently! — provide a unified view across your existing Prometheus setup, while — inexpensively! — letting you scale.
Unified View 
At the simplest level, it provides a “meta” service (the Querier) that sits in front of your existing Prometheus servers. You can now hit this single endpoint for everything — no more worrying about federation, no more Grafana chaos (since a query can now “hit” multiple Prometheus servers!), automatic de-duplication of data — oh, this is good stuff indeed!
(Technically speaking, you also add a sidecar to each Prometheus server. The Querier and the sidecars all gossip away and figure out what's what.)
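For the curious, the wiring looks roughly like this on the command line. The host names here are hypothetical, and newer Thanos releases point the Querier at sidecars via static `--store` flags (or DNS/file service discovery) rather than gossip, so treat this as a sketch rather than a recipe:

```sh
# One sidecar per Prometheus server, pointing at the local TSDB and HTTP API.
thanos sidecar \
  --tsdb.path      /var/prometheus/data \
  --prometheus.url http://localhost:9090 \
  --grpc-address   0.0.0.0:10901

# A single Querier that fans out to every sidecar and de-duplicates series
# from HA pairs that differ only in their "replica" label.
thanos query \
  --http-address        0.0.0.0:19090 \
  --store               prom-shard-1:10901 \
  --store               prom-shard-2:10901 \
  --query.replica-label replica
```

Grafana then gets pointed at the Querier's HTTP port instead of at the individual Prometheus servers.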
Scaling
The same Prometheus sidecars can also be configured to back their data up to GCS or S3. The cool thing is that if you now add the Store component in front of GCS/S3, then the Querier will — transparently! — hit that along with the existing Prometheus servers, and you’ll never notice the difference. The point here is that you can reduce the retention period on your Prometheus servers (object storage is way, way cheaper than the SSD on your Prometheus instance!), and not worry about losing historical data — the Querier will happily get historical data from GCS/S3, and current data from Prometheus, for optimal Grafana goodness 😆
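A minimal sketch of what that looks like, assuming a GCS bucket (the bucket name is made up, and credentials are assumed to come from the environment); the S3 variant just swaps the type and config fields:

```sh
# Object storage config shared by the sidecar, Store, and Compactor.
cat > objstore.yml <<'EOF'
type: GCS
config:
  bucket: my-thanos-metrics   # hypothetical bucket name
EOF

# The sidecar now also uploads completed TSDB blocks to the bucket...
thanos sidecar \
  --tsdb.path            /var/prometheus/data \
  --prometheus.url       http://localhost:9090 \
  --objstore.config-file objstore.yml

# ...and the Store gateway serves that bucket back to the Querier, which
# treats it as just another store to fan out to (add --store store-host:10905).
thanos store \
  --data-dir             /var/thanos/store-cache \
  --objstore.config-file objstore.yml \
  --grpc-address         0.0.0.0:10905
```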
Downsampling
As a bonus, the Compactor component will trawl through GCS/S3, and happily compact and downsample all your historical data, for when you want to look at historical data over large time periods (“show me memory pressure over the last year!”). It does this without clobbering the raw data, which means you get the ability to drill down into the gory stuff at will, but without the latency associated with having to decompress a bajillion samples.
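A sketch of the Compactor invocation; the retention values below are made up, and they control how long each resolution (raw, 5-minute, and 1-hour downsampled) sticks around in the bucket:

```sh
# Runs compaction and downsampling over the bucket, keeping raw data for 90d
# and progressively coarser resolutions for longer.
thanos compact \
  --wait \
  --data-dir                 /var/thanos/compact \
  --objstore.config-file     objstore.yml \
  --retention.resolution-raw 90d \
  --retention.resolution-5m  180d \
  --retention.resolution-1h  2y
```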
It’s really, really good stuff, and it works remarkably well. For a much more detailed description, do check out this blog post. And for those who are interested, there is an excellent writeup on the architecture and design on GitHub.

Comments

Flávio Stutz said…
We've done something like this, but using plain Prometheus in hierarchical federation for sharing scrapes and doing aggregations through federation, to avoid too much metrics diversity (our deployment has thousands of servers and we have hundreds of apps). Storage stays low because we only need to keep some history at the last aggregated level, which has low diversity. We've released the utility we created for configuring Prometheus dynamically at https://github.com/flaviostutz/promster
