Building on my previous learning exercises, I need to learn how to query Prometheus so that I can work with production systems running on Kubernetes. For large-scale datasets, I find histograms to be an excellent tool for summarizing and visualizing throughput and latency. However, this is all a bit confusing and new to me in Prometheus.

## Summaries, Histograms, Oh My!

Prometheus has two metric types for histograms -- Summaries and Histograms.

### Summaries

Summaries are calculated client side, meaning the CPU must dedicate cycles to computing these values when it could be serving your customers. Exact percentiles are precomputed and stored in a ready-to-use state in Prometheus. However, you can't calculate any new percentiles that weren't explicitly configured ahead of time. You can think of a Summary as being stored inside Prometheus something like this:

You can see in the diagram that the metric name provided in the application code, `http_response_duration_ms`, is stored with explicit labels for the exact percentile (called `quantile`). You can retrieve any of those time series with an instant selector, but you can't apply any new functions. Here's a sample PromQL query for the median value from the diagram above:

```
http_response_duration_ms{quantile="0.5"}
```

Sadly, because the percentiles are all precomputed, you can't meaningfully combine these values from multiple Kubernetes pods at query time. Some other system would have to calculate a combined Summary ahead of time...so we're not going to talk about Summaries anymore.
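To see why precomputed percentiles don't combine, here's a quick Python sketch (the per-pod latency numbers are made up for illustration): averaging each pod's median is not the same as the median of all observations pooled together.

```python
import statistics

# Hypothetical per-pod latency observations (ms)
pod1 = [1, 2, 3]
pod2 = [10, 20, 30]

# Each pod's Summary would precompute its own median client-side
median_pod1 = statistics.median(pod1)   # 2
median_pod2 = statistics.median(pod2)   # 20

# Naively averaging the precomputed medians...
naive_combined = (median_pod1 + median_pod2) / 2   # 11.0

# ...does not match the true median over all observations
true_combined = statistics.median(pod1 + pod2)     # 6.5

print(naive_combined, true_combined)
```

The only way to get the true combined percentile is to have the raw observations (or something close to them, like histogram buckets) at query time.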

### Histograms

Histograms are calculated server side, within Prometheus itself at query time. Prometheus stores histograms internally in buckets that have an upper bound (labeled `le`, for "less than or equal"), but no lower bound. You must configure the number and upper bound of each bucket ahead of time. Each bucket time series will contain the cumulative count of observations that were less than or equal to its `le` value at a given timestamp.
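Here's a minimal Python sketch of how a client records observations into cumulative buckets (the bucket bounds and observations are made up; real client libraries like `prometheus_client` do this for you):

```python
# Hypothetical bucket upper bounds (ms); +Inf catches everything
BOUNDS = [100, 500, 1000, float("inf")]

def record(observation, buckets):
    """Increment every bucket whose upper bound the observation fits under.

    Buckets are cumulative: an observation of 42 ms counts toward
    le=100, le=500, le=1000, and le=+Inf.
    """
    for le in BOUNDS:
        if observation <= le:
            buckets[le] += 1

buckets = {le: 0 for le in BOUNDS}
for ms in [42, 250, 250, 900, 7000]:
    record(ms, buckets)

print(buckets)
# le=100 -> 1, le=500 -> 3, le=1000 -> 4, le=+Inf -> 5
```

Note that the `+Inf` bucket ends up equal to the total number of observations, which is why it always matches `_count`.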

Things look quite a bit different from Summaries. Each bucket is stored in a separate time series under a metric name suffixed with `_bucket`, and the upper bound for that time series is in the label `le`. There is always a largest bucket with an infinite upper bound, `{le="+Inf"}`, which will always have the same value as `_count`. Because you can't meaningfully use the buckets directly (AFAICT), you generally use the function `histogram_quantile()` to estimate whatever quantile you want. The accuracy depends mostly on the bucket sizes you choose in the client: more, smaller buckets are more accurate, but require more data to store and more work to compute.

## Querying Histograms

Let's start playing with the `histogram_quantile()` function. Here's a simple PromQL query against the histogram from the diagram:

```
histogram_quantile(.99, rate(http_response_duration_ms_bucket[1m]))
```

Even though this is the simplest usage of `histogram_quantile()`, there's kind of a lot going on. Let's break it down:

- `.99` - this is the "rank" or "percentile" being requested, in a range from 0 to 1
- `rate()` - we'll talk more about this, but basically it converts a counter metric into a usable form
- `[1m]` - we're aggregating over 1-minute windows here

### Counters and `rate()`

If you read the docs, you'll see that counters are "monotonically increasing". That means that a given metric name with a given set of labels will only ever go up (resetting to zero when the process restarts), but the problem is that graphing a line that always goes up just isn't very helpful. So, `rate()` will break the range down into time slices, look at the increase during each slice, and make that the value (strictly speaking it's a per-second average rate over the slice, but per-slice increases are easier to eyeball). Let's take some observations of a counter at different times as an example:

Time | Counter Value |
---|---|
T1 | 3 |
T2 | 5 |
T3 | 7 |
T4 | 7 |
T5 | 17 |

So, if we apply `rate()` to this time series, then the values will be changed like this:

Time | `rate(counter)` Value |
---|---|
T1 | 0 |
T2 | 2 |
T3 | 2 |
T4 | 0 |
T5 | 10 |

So, basically, because sending histograms to Prometheus generates a whole bunch of different counters (specifically the `_bucket` metric names), we normally need to use `rate()` first to get usable data.
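The tables above can be sketched in Python, treating `rate()` as the per-slice increase (which matches the per-second rate when samples are one second apart; counter resets would need the extra handling the real `rate()` provides):

```python
def per_slice_rate(samples):
    """Per-slice increase of a monotonically increasing counter.

    `samples` is a list of counter values at consecutive timestamps.
    The first slice has no prior point, so we emit 0 for it, matching
    the table above.
    """
    rates = [0]
    for prev, curr in zip(samples, samples[1:]):
        rates.append(curr - prev)
    return rates

print(per_slice_rate([3, 5, 7, 7, 17]))  # [0, 2, 2, 0, 10]
```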

### Aggregating Histograms by Series

So, let's say you have a lot of pods running a service, and you add a label with the pod name to all your metrics. For this particular metric, you've also included the HTTP status code. However, you now want to build a dashboard out of one metric showing the 99th percentile of something, for example `http_response_duration_ms`. If you naively use the query I shared above, you'll see a time series graphed for the combination of each unique pod and status code!!

Let's not do that. Instead, try this:

```
histogram_quantile(.99, sum(rate(http_response_duration_ms_bucket[1m])) by (le, status_code))
```

The main thing that's new and different here is that `sum(..) by (..)` will look for *any* series that share the same labels `{le=.., status_code=..}` and add all their values together into a single series. Let's take an example using a single sample/timestamp across many series to show what happens:

le | pod | status_code | value |
---|---|---|---|
500 | pod1 | 200 | 5 |
500 | pod2 | 200 | 5 |
1000 | pod3 | 200 | 10 |
1000 | pod4 | 200 | 10 |
1000 | pod5 | 404 | 1 |

After running the PromQL above, Prometheus sums up any series that have the same `le` and `status_code`, so the end result looks like:

le | status_code | value |
---|---|---|
500 | 200 | 10 (5+5) |
1000 | 200 | 20 (10+10) |
1000 | 404 | 1 |

Now, you'll see one series per status code across your whole cluster. If we had any other labels (like `endpoint`, `service`, etc.), those would also disappear after the aggregation above.

### Don't Forget Your `le`!

Behind the scenes, at query time, the `histogram_quantile()` function will *secretly* look for the `{le=..}` labels on the provided series. Before I understood this, I kept hitting errors saying:

> No datapoints found.

Once you know that `histogram_quantile()` requires the time series passed in to have `{le=..}` labels, this actually makes sense. This is why, anytime we combine histogram bucket series, we must be careful to preserve the `le` label.

## Conclusion

Using histograms as a lens, we've dug deep into how Prometheus:

- expects data to come from client applications
- stores time series
- converts counters into usable time series using `rate()`
- exposes histograms via the `histogram_quantile()` function
- allows us to combine different labels for a metric name

Hopefully, this lets you start doing interesting analysis of your production systems!