
Alerting and Notifications

All about Levitate's declarative alerting capabilities and notification support

Levitate comes with complete monitoring support, including alerting and notification capabilities. Whatever tools you use, a few problems plague alerting today: coverage, fatigue, and cleanup. Unfortunately, there are no easy answers to these complex problems.

However, Levitate helps you stay ahead of them with advanced features such as pattern-based alerting and an Alert Manager redesigned with high cardinality in mind.

In addition to being fully PromQL compatible, Levitate provides a real-time alert monitor and a historical health view. You can also perform advanced tasks, such as correlating alerts with events, so that your alerting keeps up with constantly evolving infrastructure and services.

Features

  • Full PromQL compatibility.
  • Full compatibility with Prometheus Alertmanager, including migration of existing Alertmanager alerting configurations.
  • Alerting automation through the IaC tool, a declarative way of defining alert rules.
  • Support for configuring alert rules manually.
  • A real-time view for monitoring alerts.
  • A 14-day health view for understanding the history of alerts.
  • An "Explain alert" feature that shows why an alert fired.
  • PagerDuty, Slack, and OpsGenie as notification channels, with notification messages that can drive powerful incident workflows.

Alert Groups

Alert groups are collections of alert rules. A group can represent a team, a service, an infrastructure component, and so on. All configured alert groups are listed under Alert Studio → Alert Groups.

(Screenshot: Alert Groups)

Each alert group lists its configured alert rules and lets you add more.

(Screenshot: Alert Rule)

Alert Monitor

Firing alerts indicate how a system is performing at any given point. Levitate provides ways to see this information for the system's current state.

Alert Monitor highlights all alerts that are currently firing. You can also look back at all alerts that fired in the last 15 minutes.

(Screenshot: Alert Monitor)

Clicking a firing alert shows more details, such as the rule configuration, the threshold, and the alert's behavior over the past 2 hours. From there, you can debug it further in Embedded Grafana or take action via runbooks.

(Screenshot: Alert Rule Configuration)

System Health

For any time range within the last 14 days, the Health tab gives an overview of broken alert rules, the time series (label sets) that were under alert, and how long the system was impacted.

(Screenshot: System Health)

Configuring an Alert

Migrating from Alertmanager

You can convert an alert rules YAML file written for Prometheus Alertmanager into a Levitate-compatible alert configuration through an HTTP endpoint:

POST /v4/organizations/last9/entities/migrate/alertmanager

Request body: the Alertmanager-compatible alert rules YAML.

Response: the equivalent Levitate-compatible YAML configuration.

Example:

# sample.yaml contains the Alertmanager-compatible alert rules to migrate
curl -X POST https://gamma.last9.io/api/v4/organizations/last9/entities/migrate/alertmanager \
  --data-binary @sample.yaml \
  -H "X-LAST9-API-TOKEN: Bearer $TOKEN"
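For scripted migrations, the same call can be made from Python. This is a minimal sketch using only the standard library's `urllib.request`; the helper names `build_migrate_request` and `migrate_file` are hypothetical, and the URL and header format are copied from the curl example rather than from a separate API reference.

```python
import urllib.request

# Base URL and path copied from the curl example.
MIGRATE_URL = (
    "https://gamma.last9.io/api/v4/organizations/last9"
    "/entities/migrate/alertmanager"
)


def build_migrate_request(rules_yaml: bytes, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that converts the rules.

    Hypothetical helper: it mirrors the curl call, sending the raw YAML
    bytes as the body and the token in the X-LAST9-API-TOKEN header.
    """
    return urllib.request.Request(
        MIGRATE_URL,
        data=rules_yaml,
        headers={"X-LAST9-API-TOKEN": f"Bearer {token}"},
        method="POST",
    )


def migrate_file(path: str, token: str) -> str:
    """Send the rules file at `path` and return the Levitate YAML as text."""
    with open(path, "rb") as f:
        request = build_migrate_request(f.read(), token)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

For example, `print(migrate_file("sample.yaml", token))` prints the converted configuration; how you obtain `token` (an environment variable, a secrets manager) is up to you.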

Response:

entities:
  - name: payment service
    type: alert-manager
    external_ref: payment service-alert-manager-alert-manager
    entity_class: alert-manager
    indicators:
      - name: "EXPR: HighRequestLatency - breach"
        query: job:request_latency_seconds:mean5m{service="payment"} > 0.5
      - name: "EXPR: HighRequestLatency - threat"
        query: job:request_latency_seconds:mean5m{service="payment"} > 0.1
    alert_rules:
      - name: High request latency
        indicator: "EXPR: HighRequestLatency - breach"
        total_minutes: 10
        bad_minutes: 10
        greater_than: 0
      - name: HighRequestLatency - threat
        indicator: "EXPR: HighRequestLatency - threat"
        total_minutes: 10
        bad_minutes: 10
        greater_than: 0
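The migrated alert rules describe their evaluation window with total_minutes, bad_minutes, and greater_than. This page does not spell out the exact semantics, so the sketch below rests on an assumption: a minute counts as "bad" when the indicator's value exceeds greater_than, and the rule fires when at least bad_minutes of the last total_minutes minutes are bad. The function name `rule_fires` and the one-value-per-minute sample list are hypothetical.

```python
from typing import Optional, Sequence


def rule_fires(
    samples: Sequence[Optional[float]],
    total_minutes: int,
    bad_minutes: int,
    greater_than: float,
) -> bool:
    """Assumed window check for a migrated alert rule.

    `samples` holds one indicator value per minute, oldest first. None
    means the indicator query returned no data for that minute; for a
    PromQL comparison such as `... > 0.5`, that means the threshold was
    not crossed.
    """
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    window = samples[-total_minutes:]
    if len(window) < total_minutes:
        # Not enough history yet to evaluate the full window.
        return False
    bad = sum(1 for value in window if value is not None and value > greater_than)
    return bad >= bad_minutes
```

Under this reading, bad_minutes: 10 out of total_minutes: 10 requires the threshold to be crossed in every minute of the window. Counting bad minutes rather than requiring them to be consecutive is also part of the assumption.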

Notification Channels

Levitate can send alert notifications to PagerDuty, Slack, and OpsGenie. Notification channels can be configured from the alert rule configuration or in the YAML file.

(Screenshot: Notification Channels)

Notification Payloads

PagerDuty

| PagerDuty field | Type | Description |
|---|---|---|
| payload | object | Container for the payload.* fields below |
| payload.summary | string | Title for the incident |
| payload.timestamp | timestamp | The ending time of this alert, in ISO 8601 format |
| payload.severity | string | critical for alerts marked as breach in the alert rule, warning for alerts marked as threat |
| payload.source | string | Dedup key for the incident |
| payload.component | string | Empty |
| payload.group | string | Dedup key for the incident |
| payload.class | string | Alert rule type |
| payload.custom_details | object | Described below |
| routing_key | string | PagerDuty integration key |
| event_action | string | 'trigger' for active notifications, 'resolve' for resolved notifications |
| dedup_key | string | Dedup key for the incident |
| client | string | "Last9 Dashboard" |
| client_url | string | Link to the health dashboard for the alert in Levitate |
| links | array of objects | Empty array |
| images | array of objects | Empty array |
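These fields follow the shape of PagerDuty's Events API v2, in which a `resolve` event carries the same `dedup_key` as the `trigger` it closes. The sketch below shows how a consumer of these payloads (for example, a test harness or a relay you run yourself, both assumptions) could track open incidents by that key. `IncidentTracker` is a hypothetical name, and the severity mapping simply reverses the payload.severity row.

```python
import json
from typing import Dict

# Reverse of the payload.severity row: critical <- breach, warning <- threat.
SEVERITY_TO_LEVITATE = {"critical": "breach", "warning": "threat"}


class IncidentTracker:
    """Hypothetical in-memory tracker of open incidents keyed by dedup_key."""

    def __init__(self) -> None:
        self.open: Dict[str, dict] = {}

    def handle(self, body: str) -> str:
        """Apply one JSON event and report what happened to the incident."""
        event = json.loads(body)
        key = event["dedup_key"]
        action = event["event_action"]
        if action == "trigger":
            payload = event["payload"]
            is_new = key not in self.open
            # Repeat notifications re-trigger the same key; keep the latest data.
            self.open[key] = {
                "summary": payload["summary"],
                "severity": SEVERITY_TO_LEVITATE.get(payload["severity"], "unknown"),
                "dashboard": event["client_url"],
            }
            return "opened" if is_new else "updated"
        if action == "resolve":
            # A resolve for an unknown key is ignored rather than treated as an error.
            return "resolved" if self.open.pop(key, None) is not None else "ignored"
        raise ValueError(f"unexpected event_action: {action!r}")
```

Keying on dedup_key rather than on the summary keeps repeat notifications for the same alert from opening duplicate incidents.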

Custom Details

  • alert_condition - Condition set on the alert. For static alerts, it has the format expr > 10, based on the configured threshold. For pattern-based alerts, it has the format algo_type(tunable, expr); for example, a high spike alert set with tunable 3 yields high_spike(3, expr).
  • algo_type - Type of alert (static_threshold, increasing_changepoint, etc.).
  • client_url - Link to the health dashboard for this alert in Levitate.
  • description - Description of the alert. If a description is provided while configuring the rule, it appears here. Otherwise, a default description based on the algorithm, indicator, and entity is used.
  • start - Starting time of this alert, in ISO 8601 format.
  • end - Ending time of this alert, in ISO 8601 format.
  • expression - Name of the indicator.
  • entity_name - Entity name.
  • entity_type - Entity type.
  • entity_team - Entity team; None if not assigned.
  • entity_tier - Entity tier; None if not assigned.
  • entity_workspace - Entity workspace; None if not assigned.
  • entity_namespace - Entity namespace; None if not assigned.
  • severity - Severity of the alert (breach or threat).
  • notification_call - Whether this notification is the first for this alert or a repeat (first or repeat).
  • runbook - Link to the runbook for this alert (must be configured while setting up the alert). This key is omitted if no runbook is configured.
  • If the entity under alert has tags, each one is included in custom details as tag_<tag_name> = true.
  • time_in_alert - Duration for which this alert was observed, e.g., 8 in 10 minutes.
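Two custom-detail fields are convenient to consume programmatically: entity tags arrive as tag_<tag_name> keys, and pattern-based conditions follow algo_type(tunable, expr). The sketch below extracts both. The function names are hypothetical, the condition parser is a simple regular-expression heuristic rather than a PromQL parser, and tag values are accepted as either a JSON `true` or the string `"true"` because this page does not say which form is sent.

```python
import re
from typing import Dict, List, Optional, Tuple

# Heuristic match for "algo_type(tunable, expr)", e.g. "high_spike(3, expr)".
_PATTERN_CONDITION = re.compile(r"^\s*([A-Za-z_]\w*)\(\s*([^,]+?)\s*,\s*(.+)\)\s*$")


def entity_tags(custom_details: Dict[str, object]) -> List[str]:
    """Return the sorted tag names from tag_<tag_name> keys set to true."""
    return sorted(
        key[len("tag_"):]
        for key, value in custom_details.items()
        if key.startswith("tag_") and value in (True, "true")
    )


def parse_pattern_condition(condition: str) -> Optional[Tuple[str, str, str]]:
    """Split "algo(tunable, expr)" into (algo, tunable, expr).

    Returns None for conditions that do not look pattern-based,
    such as a static "expr > 10".
    """
    match = _PATTERN_CONDITION.match(condition)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)
```

The tunable is returned as a string; convert it only if your workflow needs to compare it numerically.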

OpsGenie

| OpsGenie field | Type | Description |
|---|---|---|
| message | string | Title for the incident |
| alias | string | Dedup key for the incident |
| description | string | Description of the alert. If a description is provided while configuring the rule, it appears here. Otherwise, this field is omitted. |
| tags | array of strings | Tags associated with the entity |
| actions | array of strings | ["Debug"] |
| details | object | Described below |
| entity | string | null |
| source | string | "Levitate Dashboard" |
| note | string | A string description of the alert, along with the health dashboard link for the alert |
| responders | array of objects | Not used |
| visibleTo | array of objects | Not used |
| priority | string | Not used |
| user | string | Not used |
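Because description is omitted when none was configured, a consumer needs a fallback. The sketch below, with hypothetical names, uses description when present and otherwise falls back to note, which the table says always carries a description and the dashboard link. It keys the alert by alias, the dedup key.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OpsgenieAlertView:
    """Hypothetical, trimmed view of a Levitate OpsGenie notification."""

    alias: str
    title: str
    text: str
    tags: List[str]


def view_from_payload(payload: dict) -> OpsgenieAlertView:
    """Build a view from a decoded OpsGenie notification body."""
    return OpsgenieAlertView(
        alias=payload["alias"],      # dedup key for the incident
        title=payload["message"],    # incident title
        # description is omitted when none was configured; fall back to note.
        text=payload.get("description") or payload.get("note", ""),
        tags=list(payload.get("tags", [])),
    )
```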

Details

  • alert_condition - Condition set on the alert. For static alerts, it has the format expr > 10, based on the configured threshold. For anomaly alerts, it has the format algo_name(tunable, expr); for example, a high spike alert set with tunable 3 yields high_spike(3, expr).
  • algorithm - Type of alert (static_threshold, increasing_changepoint, etc.).
  • component - null.
  • last9_dashboard - Link to the health dashboard for this alert.
  • expression - Name of the indicator.
  • service - Name and type of the entity.
  • source - Dedup key for this incident.
  • entity_name - Entity name.
  • entity_type - Entity type.
  • entity_team - Entity team; None if not assigned.
  • entity_tier - Entity tier; None if not assigned.
  • entity_workspace - Entity workspace; None if not assigned.
  • entity_namespace - Entity namespace; None if not assigned.
  • severity - Severity of the alert (breach or threat).
  • notification_call - Whether this notification is the first for this alert or a repeat (first or repeat).
  • runbook - Link to the runbook for this alert (must be configured while setting up the alert). This key is omitted if no runbook is configured.
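The severity and notification_call fields make it possible to escalate only on the first notification of a breach. The sketch below shows one possible routing rule for the OpsGenie details object; it describes a workflow you might build, not Levitate behavior. It routes by entity_team with a hypothetical fallback team, and it treats both a null and the string "None" as unassigned, since this page only says the field is None if not assigned.

```python
from typing import Optional, Tuple

DEFAULT_TEAM = "platform"  # hypothetical fallback owner for unassigned entities


def route(details: dict) -> Tuple[str, str, Optional[str]]:
    """Return (action, team, runbook) for an OpsGenie `details` object."""
    team = details.get("entity_team")
    if team in (None, "", "None"):
        team = DEFAULT_TEAM
    first_breach = (
        details.get("severity") == "breach"
        and details.get("notification_call") == "first"
    )
    # Page only on the first breach notification; everything else is a lighter notify.
    action = "page" if first_breach else "notify"
    # runbook is omitted entirely when no runbook was configured.
    return action, team, details.get("runbook")
```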