The hunt
Hello everyone, I hope you’ve been enjoying my poetry in the last few posts. Today I return to a more technical topic. For the last couple of weeks, I’ve been dipping my toes into Prometheus as a tool for monitoring and alerting on stuff I cannot discuss. That is what this post is about: Prometheus alert severity levels, which are hard to find. At least, harder than I had hoped.
If you want to skip the journey and go straight to the answer without scrolling, jump to the TL;DR at the end of the post.
I like to think that there can be several levels of alerting, and yet it felt like there weren’t many out there for Prometheus users. Googling “Prometheus alert severity levels” returns hundreds of pages of nothing useful: no guide to the possible values you can set for your Prometheus alert severity.
After a couple of hours of googling, coupled with some GitHub digging, I figured it out. I’m probably not the first to notice, but considering the time I spent finding a whole lot of nothing, I might be the first one to put it in blog form. That, or other posts got buried by the algorithm gods.
Putting together the list
The first two were quite easy. Literally every blog post and example out there features them: “critical” and “warning”. I found it odd that these would be the only two options, so I dug a little further.
Later on, I decided to head back to the Prometheus website for another read-through of the documentation. Unsurprisingly, I found the definition of the alert severity field. However, its default value surprised me: “error”. That’s three, but wait, there’s more.
The last value I have found so far was on GitHub, in the prometheus/alertmanager repository, more precisely in some acceptance tests.
The final piece, at least for now, is “info”.
Case closed
There might be more values out there, and I will update this post if I find them. Until then, I’m closing this strange case. I would love for the guys developing Prometheus to update their documentation so that all the levels are listed there and nobody has to go on a wild goose chase. Since it’s open source, I could even go ahead and add it myself. Wish me luck. In the meantime, let’s recap the values below.
TL;DR: Prometheus alert severity levels
Here are the Prometheus alert severity levels I could gather, in decreasing order of severity:
- critical
- error
- warning
- info
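Since the list above is ordered, one way to put it to use in your own tooling is to turn it into a ranking. The following is a minimal Python sketch, assuming your alerts are plain dictionaries with a `labels` mapping (a hypothetical shape for illustration, not a Prometheus API). Because severity is a free-form label, anything unrecognised is sorted after “info” rather than rejected.

```python
# Severity labels gathered above, most severe first.
SEVERITY_ORDER = ["critical", "error", "warning", "info"]
SEVERITY_RANK = {name: rank for rank, name in enumerate(SEVERITY_ORDER)}


def severity_rank(alert: dict) -> int:
    """Return a sort key for an alert; lower means more severe.

    Alerts with a missing or unrecognised severity label get the
    largest rank, so they sort after every known level.
    """
    severity = alert.get("labels", {}).get("severity", "")
    return SEVERITY_RANK.get(severity, len(SEVERITY_ORDER))


def sort_by_severity(alerts: list[dict]) -> list[dict]:
    """Return the alerts ordered from most to least severe (stable sort)."""
    return sorted(alerts, key=severity_rank)


if __name__ == "__main__":
    # Hypothetical alerts, shaped only for this example.
    alerts = [
        {"labels": {"alertname": "DiskAlmostFull", "severity": "warning"}},
        {"labels": {"alertname": "InstanceDown", "severity": "critical"}},
        {"labels": {"alertname": "NewVersion", "severity": "info"}},
        {"labels": {"alertname": "Unlabelled"}},
    ]
    for alert in sort_by_severity(alerts):
        print(alert["labels"]["alertname"], alert["labels"].get("severity", "-"))
```

Using a stable sort means alerts that share a severity keep the order they arrived in.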
Thanks for reading, and don’t hesitate to share this with your friends and colleagues so that they can avoid all the googling I did. It also makes the blog more visible to others who might need something in it. Go and have a great day!
EDIT: I’m returning to this post after spending more time with Prometheus and receiving some feedback on it. It has become clear that severity levels are basically just labels. You can use the ones listed above, but you can also pick anything else that makes sense in your context. You can then bind these severity levels to whatever actions you need through the tools that process your events.
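To illustrate binding severity labels to actions, here is a small Python sketch of a tool that receives alerts and decides what to do with each one. It assumes the JSON body that Alertmanager’s webhook receiver sends, relying only on its `alerts` list and each alert’s `status` and `labels`. The action names and the `ACTIONS` mapping are hypothetical placeholders. Alertmanager’s own routing tree matches on labels in a similar spirit, so in many setups you would configure this there instead.

```python
import json
from collections import defaultdict

# Hypothetical mapping from severity label to an action name.
# Severity is just a label, so any value your team agrees on works here.
ACTIONS = {
    "critical": "page-on-call",
    "error": "open-ticket",
    "warning": "post-to-chat",
    "info": "log-only",
}
DEFAULT_ACTION = "log-only"  # used for missing or unknown severities


def dispatch(payload: str) -> dict[str, list[str]]:
    """Group firing alerts from a webhook JSON body by the action their severity maps to.

    Returns a mapping of action name -> list of alert names.
    """
    body = json.loads(payload)
    plan: dict[str, list[str]] = defaultdict(list)
    for alert in body.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # this sketch ignores resolved alerts
        labels = alert.get("labels", {})
        action = ACTIONS.get(labels.get("severity"), DEFAULT_ACTION)
        plan[action].append(labels.get("alertname", "<unnamed>"))
    return dict(plan)


if __name__ == "__main__":
    example = json.dumps({"alerts": [
        {"status": "firing", "labels": {"alertname": "InstanceDown", "severity": "critical"}},
        {"status": "firing", "labels": {"alertname": "HighLatency", "severity": "page"}},
        {"status": "resolved", "labels": {"alertname": "DiskFull", "severity": "warning"}},
    ]})
    print(dispatch(example))
```

Here the custom “page” severity simply falls through to the default action; adding it to `ACTIONS` is all it takes to give it its own behaviour.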
There is also the severity: page (https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/).
Any idea when this is used over others?
Hi Dennis, from what I see in the docs, it seems to be a severity level introduced in the latest (2.24) version of Prometheus. I hadn’t seen it before, but looking into it could make for a nice appendix to this post. Thank you.
Slack is also a severity level.
Maybe just an oversight, or me not understanding, but:
Why is the severity label in your TL;DR “warn” instead of “warning”?
Good spot, fixed it. Thank you!