Declarative Alerting via IaC
Last9 supports configuring alerts and notifications automatically using a Python-based SDK tool that keeps them in step with infrastructure changes.
Configuring alerting and notifications for observability at scale is hard to start,
maintain, and fix manually, just like provisioning infrastructure at scale. As
infrastructure changes, the observability stack must keep up with it. Otherwise,
gaps in observability can cause issues or leave you exposed to black swan events.
Last9 introduced the l9iac tool to solve exactly this problem.
Installation
Last9's IaC (Infrastructure as Code) tool is available as a Docker image, providing a consistent and isolated environment for automating entity creation and alert configuration.
- Pull the Docker Image
docker pull last9system/iac:latest
The image is available on Docker Hub.
- Prepare Your Working Directory
Create a directory containing:
  - Your IaC YAML files
  - config.json with your refresh tokens (see Configuration File Structure)
  - Space for the state lock file
- Run the Docker Container
docker run --name l9iac -d -v <local-path>:<container-path> last9system/iac:<version>
Example:
docker run -d -v /home/user/iac-files:/app/rules last9system/iac:2.4.2
💡 Note: If you use Docker Desktop, make sure file sharing is enabled for the mounted directory.
- Execute IaC Commands
docker exec -it <container-id> l9iac -mf <model-file-path> -c <config-file-path> <command>
Example:
docker exec -it bcdea6660fd4 l9iac -mf /app/rules/alert-rules.yaml -c /app/rules/config.json plan
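If you script these steps, for example in a CI job, it helps to build the `docker exec` command in one place. The following is a minimal Python sketch. It uses only the `-mf` and `-c` flags and the `plan`/`apply` subcommands shown in this guide. The function names are hypothetical. The sketch drops `-it`, because non-interactive jobs usually have no terminal attached. It also assumes that l9iac exits with a non-zero code on failure.

```python
import shlex
import subprocess
from typing import List

SUPPORTED_COMMANDS = ("plan", "apply")  # the two subcommands shown in this guide


def build_l9iac_command(container: str, model_file: str, config_file: str,
                        command: str) -> List[str]:
    """Return the argv for running l9iac inside an already-running container.

    `container` may be the container ID or the name given with --name.
    Paths are paths inside the container (i.e. under the mounted directory).
    """
    if command not in SUPPORTED_COMMANDS:
        raise ValueError(f"unsupported l9iac command: {command!r}")
    # No -it: scripts and CI jobs usually have no interactive terminal attached.
    return ["docker", "exec", container, "l9iac",
            "-mf", model_file, "-c", config_file, command]


def run_l9iac(container: str, model_file: str, config_file: str, command: str) -> None:
    """Run the command; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_l9iac_command(container, model_file, config_file, command),
                   check=True)


if __name__ == "__main__":
    # Print the command for the example above, using the container name "l9iac".
    argv = build_l9iac_command("l9iac", "/app/rules/alert-rules.yaml",
                               "/app/rules/config.json", "plan")
    print(shlex.join(argv))
```

If you started the container with `--name l9iac`, you can pass that name instead of looking up the container ID.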
Configuration File Structure
The IaC tool requires a config.json file with the following structure:
{
  "api_config": {
    "read": {
      "refresh_token": "<LAST9_API_READ_REFRESH_TOKEN>",
      "api_base_url": "https://app.last9.io/api/v4",
      "org": "<ORG_SLUG>"
    },
    "write": {
      "refresh_token": "<LAST9_API_WRITE_REFRESH_TOKEN>",
      "api_base_url": "https://app.last9.io/api/v4",
      "org": "<ORG_SLUG>"
    },
    "delete": {
      "refresh_token": "<LAST9_API_DELETE_REFRESH_TOKEN>",
      "api_base_url": "https://app.last9.io/api/v4",
      "org": "<ORG_SLUG>"
    }
  },
  "state_lock_file_path": "state.lock"
}
The state lock file should be in the same directory as the model file and the config file.
Important Notes
- The refresh_token values can be obtained from the API Access page in the Last9 dashboard.
- The <ORG_SLUG> can be obtained from the app's URL: app.last9.io/v2/organizations/<ORG_SLUG>
- For on-premise Last9 setups, contact cs@last9.io to get the correct api_base_url.
- The state_lock_file_path should be accessible from the directory where you run the IaC commands.
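To keep refresh tokens out of version control, you might generate config.json at deploy time instead of committing it. The following is a minimal Python sketch of that approach. It assumes the tokens and org slug are exposed as environment variables with the hypothetical names LAST9_READ_REFRESH_TOKEN, LAST9_WRITE_REFRESH_TOKEN, LAST9_DELETE_REFRESH_TOKEN and LAST9_ORG_SLUG. The sketch writes the structure shown above. The hypothetical helper `org_slug_from_url` extracts <ORG_SLUG> from a dashboard URL of the form described in the notes.

```python
import json
import os
from pathlib import Path
from typing import Dict
from urllib.parse import urlparse

DEFAULT_API_BASE_URL = "https://app.last9.io/api/v4"
OPERATIONS = ("read", "write", "delete")  # l9iac needs a refresh token for each


def org_slug_from_url(url: str) -> str:
    """Return <ORG_SLUG> from a URL like app.last9.io/v2/organizations/<ORG_SLUG>/..."""
    if "://" not in url:
        url = "https://" + url  # without a scheme, urlparse treats the host as part of the path
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "organizations" in parts:
        idx = parts.index("organizations")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    raise ValueError(f"no organization slug found in {url!r}")


def build_config(tokens: Dict[str, str], org: str,
                 api_base_url: str = DEFAULT_API_BASE_URL,
                 state_lock_file_path: str = "state.lock") -> Dict:
    """Build the config.json structure: one block per operation plus the lock file path."""
    missing = set(OPERATIONS) - set(tokens)
    if missing:
        raise ValueError(f"missing refresh tokens for: {sorted(missing)}")
    return {
        "api_config": {
            op: {"refresh_token": tokens[op], "api_base_url": api_base_url, "org": org}
            for op in OPERATIONS
        },
        "state_lock_file_path": state_lock_file_path,
    }


def main() -> None:
    # Hypothetical environment variable names; adapt them to your secret store.
    tokens = {op: os.environ[f"LAST9_{op.upper()}_REFRESH_TOKEN"] for op in OPERATIONS}
    config = build_config(tokens, os.environ["LAST9_ORG_SLUG"])
    Path("config.json").write_text(json.dumps(config, indent=2) + "\n")
```

Call `main()` from your deploy step before running l9iac. For on-premise setups, pass the api_base_url you received from Last9 to `build_config`.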
Quick Start
- Create a YAML file for your alert rule configuration.
Example: notification_service_am.yaml
# notification_service_am.yaml
entities:
  - name: Notification Backend Alert Manager
    type: service_alert_manager
    data_source: prod-cluster
    entity_class: alert-manager
    external_ref: unique-slug-identifier
    indicators:
      - name: availability
        query: count(sum by (job, taskid)(up{job !~ "ome.*"}) > 0) / count(sum by (job, taskid) (up{job=~".*vmagent.*", job !~ "ome.*"})) * 100
      - name: loss_of_signal
        query: 'absent(up{job !~ "ome.*"})'
    alert_rules:
      - name: Availability of notification service should not be less than 99.5%
        description: The error rate (5xx / total requests) defines the availability; a lower value means more degradation
        indicator: availability
        less_than: 99.5
        severity: breach
        bad_minutes: 3
        total_minutes: 5
        group_timeseries_notifications: false
        annotations:
          team: payments
          description: Error Rate described as number of 5xx/throughput
          runbook: https://notion.com/runbooks/payments/error_rates_fixing_strategies
- Prepare the configuration file for running the IaC tool
Create a config.json file with the structure shown in Configuration File Structure above. Keep the following in mind:
  - You need refresh_token values for all three operations (read, write, and delete), because the l9iac tool performs all three actions while applying the alert rules. The tokens can be obtained from the API Access page in the Last9 dashboard.
  - <ORG_SLUG> is your organization's unique slug in Last9. It can be obtained from the API Access page of the Last9 dashboard.
  - The default api_base_url is https://app.last9.io/api/v4. If you are on an on-premise setup of Last9, contact cs@last9.io to get the api_base_url.
  - state_lock_file_path is the name of the file where l9iac stores the state lock of the current alerting state, similar in spirit to Terraform's state file.
- Run the following command to do a dry run of the changes:
l9iac -mf notification_service_am.yaml -c config.json plan
- Run the following command to apply the changes:
l9iac -mf notification_service_am.yaml -c config.json apply
We will provision a GitOps flow that runs the apply command once changes are
merged into the master branch of your GitHub repository. Contact cs@last9.io for
more details.
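If you run your own CI instead, the same idea can be sketched as a small gate. The gate runs `plan` on every build. It runs `apply` only on the master branch, and only after `plan` succeeds. The sketch below is hypothetical and is not the flow Last9 provisions. It assumes l9iac exits non-zero when a command fails. The runner is injectable, so you can exercise the logic without calling l9iac.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]


def run_subprocess(argv: Sequence[str]) -> int:
    """Run a command, let its output go to the CI log, and return its exit code."""
    return subprocess.run(list(argv)).returncode


def plan_then_apply(model_file: str, config_file: str, branch: str,
                    run: Runner = run_subprocess,
                    apply_branch: str = "master") -> List[str]:
    """Run `plan` on every build; run `apply` only on apply_branch after a clean plan.

    Returns the l9iac subcommands that were executed, in order.
    Assumes l9iac exits non-zero when a command fails.
    """
    executed: List[str] = []
    for command in ("plan", "apply"):
        if command == "apply" and branch != apply_branch:
            break  # e.g. pull-request builds stop after the dry run
        executed.append(command)
        code = run(["l9iac", "-mf", model_file, "-c", config_file, command])
        if code != 0:
            raise RuntimeError(f"l9iac {command} failed with exit code {code}")
    return executed
```

In a CI job you would call `plan_then_apply("notification_service_am.yaml", "config.json", branch)`. Take `branch` from your CI system's environment; the variable that holds it differs between CI providers.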
Schema
Here is the complete schema for the .yaml model file shown above:
Entities
Field | Type | Unique | Required | Description |
---|---|---|---|---|
name | string | false | required | Name of the entity (alert manager) |
type | string | false | required | Type of the entity |
external_ref | string | true | required | External reference for the entity; it's a unique, slug-format identifier for each alert manager |
adhoc_filter | object | false | optional | List of common rule filters for the entity |
alert_rules | array | false | optional | List of alert rules for the entity |
data_source | string | false | optional | Data source |
data_source_id | string | false | optional | The ID of the data source |
description | string | false | optional | Description of the entity |
entity_class | string | false | optional | Denotes the class of the entity. Supported values: alert-manager |
indicators | array | false | optional | List of indicators for the entity |
labels | object | false | optional | List of key value pairs of group label names and values |
links | array | false | optional | List of links associated with the entity |
namespace | string | false | optional | The namespace of the entity |
notification_channels | string OR array | false | optional | List of notification channels applicable to the entity |
tags | array | false | optional | List of tags for the entity |
team | string | false | optional | The team that owns the entity |
tier | string | false | optional | Tier of the entity |
ui_readonly | boolean | false | optional | Disable any sort of edits to the alert group from the UI |
workspace | string | false | optional | Workspace of the entity |
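Entities are plain data, so you can describe a large fleet by generating the model file instead of writing it by hand. The following Python sketch builds one alert-manager entity per service from the fields in the table above. It rejects duplicate external_ref values. The slug convention, the placeholder PromQL query, and the thresholds are all hypothetical. The output is JSON. YAML 1.2 parsers accept JSON as YAML; this sketch assumes l9iac's parser does too.

```python
import json
from typing import Dict, List


def alert_manager_entity(service: str, team: str, data_source: str) -> Dict:
    """Build one alert-manager entity (hypothetical naming and thresholds)."""
    slug = service.lower().replace(" ", "-")
    return {
        "name": f"{service} Alert Manager",
        "type": "service_alert_manager",
        "external_ref": f"{slug}-alert-manager",  # must be unique across entities
        "entity_class": "alert-manager",
        "data_source": data_source,
        "team": team,
        "indicators": [
            # Placeholder query: percentage of scrape targets for this job that are up.
            {"name": "availability", "query": f'avg(up{{job="{slug}"}}) * 100'},
        ],
        "alert_rules": [
            {
                "name": f"Availability of {service} should not be less than 99.5%",
                "indicator": "availability",
                "less_than": 99.5,
                "severity": "breach",
                "bad_minutes": 3,
                "total_minutes": 5,
            }
        ],
    }


def model_file(services: Dict[str, str], data_source: str) -> str:
    """Render a model file for {service name: owning team}; output is JSON."""
    entities: List[Dict] = [alert_manager_entity(s, t, data_source)
                            for s, t in sorted(services.items())]
    refs = [e["external_ref"] for e in entities]
    if len(refs) != len(set(refs)):
        raise ValueError("external_ref values must be unique")
    return json.dumps({"entities": entities}, indent=2)


if __name__ == "__main__":
    print(model_file({"Checkout": "payments", "Notification Backend": "comms"},
                     "prod-cluster"))
```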
Common Rule Filters (Adhoc Filters)
Field | Type | Unique | Required | Description |
---|---|---|---|---|
labels | object | false | required | List of key value pairs of label names and values |
data_source | string | false | required | Defaults to entity's data source |
Alert Rules
Field | Type | Unique | Required | Description |
---|---|---|---|---|
name | string | true | required | Rule name that describes the alert |
indicator | string | false | required | Name of the indicator |
bad_minutes | integer | false | required | Number of minutes the indicator must be in a bad state before alerting |
total_minutes | integer | false | required | Total number of minutes the indicator is sampled over |
description | string | true | optional | Description for an alert rule that is included in the alert payload |
expression | string | false | optional | Alert rule expression, to be used only for pattern-based alerts |
greater_than | number | false | optional | Alert triggers when the indicator value is greater than this |
greater_than_eq | number | false | optional | Alert triggers when the indicator value is greater than or equal to this |
less_than | number | false | optional | Alert triggers when the indicator value is less than this |
less_than_eq | number | false | optional | Alert triggers when the indicator value is less than or equal to this |
equal_to | number | false | optional | Alert triggers when the indicator value is equal to this |
not_equal | number | false | optional | Alert triggers when the indicator value is not equal to this |
group_timeseries_notifications | boolean | false | optional | If multiple impacted time series in an alert need to be grouped as one notification or not |
is_disabled | boolean | false | optional | Whether the alert is disabled or not |
label_filter | map/object | false | optional | Mapping of the variables present in the indicator query and their pattern for the alert rule |
mute | boolean | false | optional | If alert notifications need to be muted or not |
runbook | object | false | optional | Runbook link to be included in the alert payload (see Runbook) |
severity | string | false | optional | Severity of the alert. Allowed values: threat, breach |
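The table does not say exactly how bad_minutes and total_minutes combine. One common reading is a sliding window: alert when at least bad_minutes of the last total_minutes one-minute samples breach the threshold. This guide does not confirm that reading, so the sketch below assumes it. The sketch also assumes that each rule sets exactly one of the threshold fields above and that a missing sample does not count as bad. It is an illustration of the semantics, not Last9's evaluator.

```python
import operator
from typing import Dict, Optional, Sequence

# Threshold fields from the Alert Rules table mapped to Python comparisons.
COMPARATORS = {
    "greater_than": operator.gt,
    "greater_than_eq": operator.ge,
    "less_than": operator.lt,
    "less_than_eq": operator.le,
    "equal_to": operator.eq,
    "not_equal": operator.ne,
}


def is_bad(rule: Dict, value: Optional[float]) -> bool:
    """Return True if one per-minute indicator value breaches the rule's threshold."""
    if value is None:
        return False  # assumption: a missing sample is not counted as bad
    fields = [f for f in COMPARATORS if f in rule]
    if len(fields) != 1:  # assumption: exactly one threshold field per rule
        raise ValueError(f"expected exactly one threshold field, found {fields}")
    field = fields[0]
    return COMPARATORS[field](value, rule[field])


def should_alert(rule: Dict, per_minute_values: Sequence[Optional[float]]) -> bool:
    """Alert if at least bad_minutes of the last total_minutes samples are bad."""
    bad_minutes, total_minutes = rule["bad_minutes"], rule["total_minutes"]
    if not 0 < bad_minutes <= total_minutes:
        raise ValueError("expected 0 < bad_minutes <= total_minutes")
    window = per_minute_values[-total_minutes:]
    return sum(is_bad(rule, v) for v in window) >= bad_minutes
```

With the Quick Start rule (less_than: 99.5, bad_minutes: 3, total_minutes: 5), the window [99.9, 99.1, 99.8, 99.0, 98.7] has three bad minutes and would alert. The window [99.9, 99.1, 99.8, 99.6, 98.7] has only two bad minutes and would not.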
Runbook
Field | Type | Unique | Required | Description |
---|---|---|---|---|
link | string | false | required | Runbook link to be included in the alert payload |
Indicators
Field | Type | Unique | Required | Description |
---|---|---|---|---|
name | string | true, uniqueness enforced at entity level | required | Name of the indicator |
query | string | false | required | PromQL query for the indicator |
data_source | string | false | optional | Data Source of the indicator (Last9) |
description | string | false | optional | Description of the indicator |
unit | string | false | optional | Unit of the indicator |
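Before running plan, you could catch simple mistakes with a local pre-check on the parsed model file. The following hypothetical Python sketch checks three things from the tables above. First, it checks the required fields of entities, indicators, and alert rules. Second, it checks that external_ref values are unique and that indicator names are unique within an entity. Third, it checks that each alert rule's indicator is defined on the same entity; that last check is an assumption. It is not the validation l9iac itself performs. It expects the YAML to be already loaded into Python dicts and lists, for example with a YAML library of your choice.

```python
from typing import Dict, List

# Required fields per object, taken from the Entities, Indicators and Alert Rules tables.
REQUIRED = {
    "entity": ("name", "type", "external_ref"),
    "indicator": ("name", "query"),
    "alert_rule": ("name", "indicator", "bad_minutes", "total_minutes"),
}


def _missing(kind: str, item: Dict, where: str) -> List[str]:
    return [f"{where}: missing required field '{f}'" for f in REQUIRED[kind] if f not in item]


def validate_model(model: Dict) -> List[str]:
    """Return problems found in a parsed model file; an empty list means none were found."""
    problems: List[str] = []
    seen_refs = set()
    for i, entity in enumerate(model.get("entities", [])):
        where = f"entities[{i}]"
        problems += _missing("entity", entity, where)
        ref = entity.get("external_ref")
        if ref is not None:
            if ref in seen_refs:
                problems.append(f"{where}: duplicate external_ref '{ref}'")
            seen_refs.add(ref)

        # Indicator names must be unique within an entity.
        indicator_names = set()
        for j, indicator in enumerate(entity.get("indicators", [])):
            iwhere = f"{where}.indicators[{j}]"
            problems += _missing("indicator", indicator, iwhere)
            name = indicator.get("name")
            if name is not None:
                if name in indicator_names:
                    problems.append(f"{iwhere}: duplicate indicator name '{name}'")
                indicator_names.add(name)

        # Assumption: a rule's indicator must be defined on the same entity.
        for k, rule in enumerate(entity.get("alert_rules", [])):
            rwhere = f"{where}.alert_rules[{k}]"
            problems += _missing("alert_rule", rule, rwhere)
            if "indicator" in rule and rule["indicator"] not in indicator_names:
                problems.append(f"{rwhere}: unknown indicator '{rule['indicator']}'")
    return problems
```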
Links
Field | Type | Unique | Required | Description |
---|---|---|---|---|
name | string | false | required | Display name of the link |
url | string | false | required | URL of the link |
Notification Channels
Field | Type | Unique | Required | Description |
---|---|---|---|---|
name | string | false | required | Name of the notification channel |
type | string | false | required | Type of notification channel. Allowed values: Slack, Pagerduty, OpsGenie |
mention | string OR list (string) | false | optional | Only applicable to Slack. The user(s) to tag in the alert message |
severity | string | false | optional | Severity of the alerts sent through this channel. Allowed values: threat, breach |
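The rules in this table are easy to check locally. The following hypothetical Python sketch validates one notification channel entry. It compares type and severity values exactly as spelled in the table; this guide does not say whether l9iac treats them case-insensitively.

```python
from typing import Dict, List

ALLOWED_TYPES = {"Slack", "Pagerduty", "OpsGenie"}
ALLOWED_SEVERITIES = {"threat", "breach"}


def validate_channel(channel: Dict) -> List[str]:
    """Return problems found in one notification channel entry (empty list = OK)."""
    problems: List[str] = []
    for field in ("name", "type"):
        if field not in channel:
            problems.append(f"missing required field '{field}'")
    ctype = channel.get("type")
    if ctype is not None and ctype not in ALLOWED_TYPES:
        problems.append(f"unsupported type '{ctype}'")
    if "mention" in channel:
        if ctype != "Slack":
            problems.append("'mention' is only applicable to Slack channels")
        mention = channel["mention"]
        mentions = [mention] if isinstance(mention, str) else mention
        if not isinstance(mentions, list) or not all(isinstance(m, str) for m in mentions):
            problems.append("'mention' must be a string or a list of strings")
    severity = channel.get("severity")
    if severity is not None and severity not in ALLOWED_SEVERITIES:
        problems.append(f"unsupported severity '{severity}'")
    return problems
```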
Supported Macros by IaC
low_spike (tolerance, metric)
high_spike (tolerance, metric)
decreasing_changepoint (tolerance, metric)
increasing_changepoint (tolerance, metric)
increasing_trend (tolerance, metric)
decreasing_trend (tolerance, metric)
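This guide does not show how a macro is written into an alert rule. The schema reserves the expression field for pattern-based alerts. Assuming macros go into that field as function-call strings, a small hypothetical helper can render them and reject names not in the list above. The call syntax, and whether metric is a PromQL expression or an indicator name, are assumptions to confirm with Last9.

```python
from typing import Union

# Macro names as listed above; each takes (tolerance, metric).
MACROS = {
    "low_spike", "high_spike",
    "decreasing_changepoint", "increasing_changepoint",
    "increasing_trend", "decreasing_trend",
}


def macro_expression(macro: str, tolerance: Union[int, float], metric: str) -> str:
    """Render a macro call string, e.g. for an alert rule's expression field (assumed syntax)."""
    if macro not in MACROS:
        raise ValueError(f"unsupported macro: {macro!r}")
    return f"{macro}({tolerance}, {metric})"
```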
Troubleshooting
Please get in touch with us on Discord or by email if you have any questions.