Alerting and Notifications
All about Levitate's declarative alerting capabilities and notifications support
Levitate comes with complete monitoring support, including alerting and notification capabilities. Irrespective of your tool choice, a few problems plague today's alerting journey — coverage, fatigue, and cleanup. Unfortunately, there are no easy answers to these complex problems.
However, with advanced features like Pattern-based Alerting and a redesigned Alert Manager designed with High Cardinality in mind, Levitate helps you stay ahead.
In addition to being fully PromQL compatible, it provides features like a real-time alert monitor and a historical health view. You can also perform advanced tasks, such as correlating alerts with events, while focusing on the desired outcome: keeping up with constantly evolving infrastructure and services.
Features
- Fully PromQL compatible.
- Full compatibility with Prometheus alertmanager, including migration of existing alertmanager alerting configurations.
- Automation of alerting using the IaC tool - a declarative way of defining alert rules.
- Support for setting alert rules manually.
- Real-time view for monitoring alerts.
- 14-day Health view to understand the history of alerts.
- Supports understanding why an alert fired using the "Explain alert" functionality.
- Supports PagerDuty, Slack, and OpsGenie as notification channels and allows building powerful incident workflows using the notification messages.
Alert Groups
Alert groups are collections of alert rules. Groups can indicate membership by team, service, infra component, etc. All the configured alert groups are under Alert Studio -> Alert Groups.
Each alert group shows its configured alert rules and lets you add more.
Alert Monitor
Firing alerts indicate how a system is performing at any given point. Levitate provides ways to get that information for the system's current state.
Alert Monitor highlights all alerts that are currently firing. Additionally, you can look back at all alerts firing in the last 15 minutes.
Clicking a firing alert reveals further details, such as the rule configuration, the threshold, and its behavior over the past 2 hours, along with the ability to debug it further in Embedded Grafana or take action via runbooks.
System Health
For any time range within the last 14 days, the Health tab gives an overview of which alert rules fired, which time series (label sets) were under alert, and for how long the system was impacted.
Migrating from alertmanager
You can migrate an alert rules YAML file compatible with Prometheus alertmanager to a Levitate-compatible alert configuration via an HTTP endpoint.
POST /v4/organizations/last9/entities/migrate/alertmanager
Request body - The alertmanager YAML config.
Response - The Levitate-compatible YAML config.
Example:
# sample.yaml contains the alertmanager YAML config
curl -XPOST https://gamma.last9.io/api/v4/organizations/last9/entities/migrate/alertmanager \
--data-binary @sample.yaml \
-H "X-LAST9-API-TOKEN: Bearer $TOKEN"
Response:
entities:
- name: payment service
  type: alert-manager
  external_ref: payment service-alert-manager-alert-manager
  entity_class: alert-manager
  indicators:
  - name: "EXPR: HighRequestLatency - breach"
    query: job:request_latency_seconds:mean5m{service="payment"} > 0.5
  - name: "EXPR: HighRequestLatency - threat"
    query: job:request_latency_seconds:mean5m{service="payment"} > 0.1
  alert_rules:
  - name: High request latency
    indicator: "EXPR: HighRequestLatency - breach"
    total_minutes: 10
    bad_minutes: 10
    greater_than: 0
  - name: HighRequestLatency - threat
    indicator: "EXPR: HighRequestLatency - threat"
    total_minutes: 10
    bad_minutes: 10
    greater_than: 0
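The same call can be made from Python using only the standard library. This is a minimal sketch assuming the endpoint and auth header from the curl example above; the helper name, the file name, and the token are placeholders:

```python
import urllib.request

MIGRATE_URL = (
    "https://gamma.last9.io/api/v4/organizations/last9"
    "/entities/migrate/alertmanager"
)

def build_migration_request(yaml_body: bytes, token: str) -> urllib.request.Request:
    """Build the POST request for the alertmanager migration endpoint."""
    return urllib.request.Request(
        MIGRATE_URL,
        data=yaml_body,
        headers={"X-LAST9-API-TOKEN": f"Bearer {token}"},
        method="POST",
    )

# Sending the request requires network access and a valid token:
# with open("sample.yaml", "rb") as f:
#     req = build_migration_request(f.read(), "<your-token>")
# print(urllib.request.urlopen(req).read().decode())
```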
Notification Channels
Levitate supports sending alert notifications to PagerDuty, Slack, and OpsGenie. The notification channels can be configured from the alert rules configuration or via the YAML file.
Notification Payloads
PagerDuty
PagerDuty field | Type | Description
---|---|---
payload | object |
payload.summary | string | Title for the incident
payload.timestamp | timestamp | The ending time of this alert, in ISO 8601 format
payload.severity | string | critical / warning for alerts marked as breach / threat in the alert rule
payload.source | string | Dedup key for the incident
payload.component | string | Empty
payload.group | string | Dedup key for the incident
payload.class | string | Alert Rule Type
payload.custom_details | object | Described below
routing_key | string | PagerDuty integration key
event_action | string | 'trigger' for active notifications, 'resolve' for resolved notifications
dedup_key | string | Dedup key for the incident
client | string | "Last9 Dashboard"
client_url | string | Link to the health dashboard for the alert in Levitate
links | array of objects | Empty array
images | array of objects | Empty array
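To make the table concrete, here is a hedged sketch of what a 'trigger' event shaped like the fields above could look like. The helper name and its arguments are illustrative, not part of Levitate's API:

```python
def build_trigger_event(summary: str, severity: str, dedup_key: str,
                        routing_key: str, client_url: str,
                        custom_details: dict) -> dict:
    """Assemble a PagerDuty event matching the field table above."""
    return {
        "payload": {
            "summary": summary,
            "severity": severity,        # "critical" (breach) or "warning" (threat)
            "source": dedup_key,         # the dedup key doubles as source and group
            "group": dedup_key,
            "custom_details": custom_details,
        },
        "routing_key": routing_key,      # PagerDuty integration key
        "event_action": "trigger",       # "resolve" for resolved notifications
        "dedup_key": dedup_key,
        "client": "Last9 Dashboard",
        "client_url": client_url,
        "links": [],
        "images": [],
    }
```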
Custom Details
- `alert_condition` - Condition set on the alert. For static alerts, it is of the format `expr > 10`, based on the configured threshold. For pattern-based alerts, it is of the format `algo_type(tunable, expr)`. For example, a high spike alert set with tunable 3 would be `high_spike(3, expr)`.
- `algo_type` - Type of alert (`static_threshold`, `increasing_changepoint`, etc.)
- `client_url` - Link to the health dashboard for this alert on Levitate.
- `description` - Description of the alert. If a description is provided while configuring the rule, it appears here. Otherwise, a default description based on the algorithm, indicator, and entity is shown.
- `start` - Starting time of this alert, in ISO 8601 format.
- `end` - Ending time of this alert, in ISO 8601 format.
- `expression` - Name of the indicator.
- `entity_name` - Entity name.
- `entity_type` - Entity type.
- `entity_team` - Entity team. Is `None` if not assigned.
- `entity_tier` - Entity tier. Is `None` if not assigned.
- `entity_workspace` - Entity workspace. Is `None` if not assigned.
- `entity_namespace` - Entity namespace. Is `None` if not assigned.
- `severity` - Severity of the alert (`breach`/`threat`).
- `notification_call` - Whether this alert is sent for the first time or repeated (`first`/`repeat`).
- `runbook` - Link to the runbook for this alert (has to be configured while setting up the alert). This key is omitted if the runbook isn't configured.
- If the entity under alert has `tags` associated with it, they are included in custom details as `tag_<tag_name>` = `true`.
- `time_in_alert` - Duration for which this alert was observed, e.g., 8 in 10 minutes.
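A consumer of `custom_details` can branch on the documented keys. The field names below come from the list above; the helper and its routing logic are hypothetical:

```python
def summarize_custom_details(details: dict) -> dict:
    """Pull out the routing-relevant parts of a custom_details object."""
    # tag_<tag_name> = true entries become a plain list of tag names
    tags = [k[len("tag_"):] for k, v in details.items()
            if k.startswith("tag_") and v]
    return {
        # illustrative policy: page only on the first breach notification
        "page_now": details.get("severity") == "breach"
                    and details.get("notification_call") == "first",
        "runbook": details.get("runbook"),   # key is omitted if not configured
        "tags": tags,
    }
```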
OpsGenie
OpsGenie field | Description | Type
---|---|---
message | Title for the incident | string |
alias | Dedup key for the incident | string |
description | Description of the alert. If a description is provided while configuring the rule, it appears here. Otherwise, this field is omitted. | string |
tags | Tags associated with the entity | array of strings |
actions | ["Debug"] | array of strings |
details | Described below | object |
entity | null | string |
source | Levitate Dashboard | string |
note | A string description of the alert, along with the health dashboard link for the alert | string |
responders | Not used | array of objects |
visibleTo | Not used | array of objects |
priority | Not used | string |
user | Not used | string |
Details
- `alert_condition` - Condition set on the alert. For static alerts, it is of the format `expr > 10`, based on the configured threshold. For anomaly alerts, it is of the format `algo_name(tunable, expr)`. For example, a high spike alert set with tunable 3 would be `high_spike(3, expr)`.
- `algorithm` - Type of alert (`static_threshold`, `increasing_changepoint`, etc.)
- `component` - `null`
- `last9_dashboard` - Link to the health dashboard for this alert.
- `expression` - Name of the indicator.
- `service` - Name and type of the entity.
- `source` - Dedup key for this incident.
- `entity_name` - Entity name.
- `entity_type` - Entity type.
- `entity_team` - Entity team. Is `None` if not assigned.
- `entity_tier` - Entity tier. Is `None` if not assigned.
- `entity_workspace` - Entity workspace. Is `None` if not assigned.
- `entity_namespace` - Entity namespace. Is `None` if not assigned.
- `severity` - Severity of the alert (`breach`/`threat`).
- `notification_call` - Whether this alert is sent for the first time or repeated (`first`/`repeat`).
- `runbook` - Link to the runbook for this alert (has to be configured while setting up the alert). This key is omitted if the runbook isn't configured.
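Since `alias` carries the dedup key and `notification_call` distinguishes first from repeated sends, a receiver can suppress duplicate incident creation. A minimal, hypothetical sketch (the function and its policy are not part of Levitate or OpsGenie):

```python
def should_create_incident(seen_aliases: set, alias: str,
                           notification_call: str) -> bool:
    """Create an incident only for the first notification of an unseen alias."""
    if notification_call == "repeat" or alias in seen_aliases:
        return False
    seen_aliases.add(alias)
    return True
```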