Set up Kubernetes monitoring using kube-state-metrics (KSM) and Prometheus Agent

A step-by-step guide to ingesting Kubernetes metrics with Prometheus Agent and sending them to Levitate via remote write.

Pre-requisites

  1. Ensure that your kubectl configuration points to the right Kubernetes cluster
  2. Create a Levitate cluster by following the Quick start guide

What is kube-state-metrics (KSM)?

kube-state-metrics (KSM) is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects. It is not focused on the health of the individual Kubernetes components but on the health of the various objects inside, such as deployments, nodes, and pods.

By default, the metrics are exposed as plaintext on the HTTP endpoint /metrics on port 8080. They are designed to be consumed either by Prometheus itself or by any scraper that can scrape a Prometheus client endpoint. You can also open /metrics in a browser to see the raw metrics. Note that the /metrics endpoint reflects the current state of the Kubernetes cluster: when Kubernetes objects are deleted, they no longer appear on the endpoint.
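
Once KSM is running, the raw output can also be inspected from a terminal. The sketch below is illustrative only: metric_names is a hypothetical helper, and the usage comments assume the service is named kube-state-metrics in the kube-system namespace and that curl is installed.

```shell
# Print the distinct metric names found in Prometheus text-format output on stdin.
# Comment lines (# HELP, # TYPE) and blank lines are skipped.
metric_names() {
  awk '!/^#/ && NF { sub(/[{ ].*/, ""); print }' | sort -u
}

# Example usage against a live cluster (run manually; names are assumptions):
#   kubectl port-forward -n kube-system svc/kube-state-metrics 8080:8080 &
#   curl -s http://localhost:8080/metrics | metric_names
```

Listing only the names gives a quick overview of which object types KSM is reporting on before any scraping is configured.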

tip

The documentation for the metrics exposed by KSM can be found here.

Automated installation (Preferred)

Step 1: Copy the installation command

Step 2: Run the installation command

Before running the command, update it to use the write token of the Levitate cluster.

<Your Token Key>

Running the command downloads the manifest YAML into the current working directory. It is strongly recommended that you check the manifest file into Git so that it can be extended later.

You can also follow the video to see the end-to-end setup.

Manual installation

  1. Clone the GitHub repo

    git clone https://github.com/kubernetes/kube-state-metrics.git
  2. Deployment steps

    To deploy this project, run kubectl apply -f examples/standard from the root of the cloned repository. This creates a Kubernetes service and deployment.

    kubectl apply -f examples/standard

    Read more details on deployment here.

  3. Validate corresponding deployment

    kubectl get deployments kube-state-metrics -n kube-system

    This is the sample output that you should see.

    NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
    kube-state-metrics   1/1     1            1           6d1h
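
For a scripted check, the READY column can be parsed instead of read by eye. This is a minimal sketch, not part of the official setup: deployment_ready is a hypothetical helper that expects the default table output of kubectl get deployments for a single deployment on stdin.

```shell
# Succeed (exit 0) when the READY column of the first data row shows all
# desired replicas ready (for example 1/1), and fail otherwise.
deployment_ready() {
  awk 'NR == 2 { split($2, r, "/"); ok = (r[1] == r[2] && r[2] > 0) }
       END { exit ok ? 0 : 1 }'
}

# Example usage against a live cluster (run manually):
#   kubectl get deployments kube-state-metrics -n kube-system | deployment_ready \
#     && echo "kube-state-metrics is ready"
# kubectl also ships a built-in alternative that waits for the rollout:
#   kubectl rollout status deployment/kube-state-metrics -n kube-system
```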

Configure remote write to Levitate

If you already have a running Prometheus setup, add the following scrape configs and remote write setup to your Prometheus config file to send data to Levitate.

# prometheus.yaml

scrape_configs:
  - job_name: "node-exporter"
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - source_labels: [__meta_kubernetes_endpoints_name]
        regex: "node-exporter"
        action: keep

  - job_name: "kubernetes-apiservers"
    kubernetes_sd_configs:
      - role: endpoints
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      - source_labels:
          [
            __meta_kubernetes_namespace,
            __meta_kubernetes_service_name,
            __meta_kubernetes_endpoint_port_name,
          ]
        action: keep
        regex: default;kubernetes;https

  - job_name: "kubernetes-nodes"
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics

  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels:
          [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name

  - job_name: "kube-state-metrics"
    static_configs:
      - targets: ["kube-state-metrics.kube-system.svc.cluster.local:8080"]

  - job_name: "kubernetes-cadvisor"
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

  - job_name: "kubernetes-service-endpoints"
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - source_labels:
          [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels:
          [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels:
          [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name

remote_write:
  - url: <Levitate_cluster_remote_write_url>
    remote_timeout: 60s
    queue_config:
      capacity: 10000
      max_samples_per_send: 3000
      batch_send_deadline: 20s
      min_shards: 4
      max_shards: 200
      min_backoff: 100ms
      max_backoff: 10s
    basic_auth:
      username: <Levitate_Cluster_Id>
      password: <Levitate_Cluster_Write_Token>

  • Replace the cluster value in external_labels, which belongs to the global section of the Prometheus config, with a logical name for the cluster being scraped

global:
  external_labels:
    # TODO - replace xyz.acme.io with a logical name for the cluster being scraped
    # by Prometheus, e.g. prod1.xyz.com
    cluster: "xyz.acme.io"
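
Before reloading Prometheus, the placeholders in the config need real values, and the result is worth validating. The following is a minimal sketch under stated assumptions: fill_levitate_placeholders is a hypothetical helper, the environment variable and output file names are made up for illustration, and promtool (shipped with Prometheus) is assumed to be on the PATH.

```shell
# Fill the three Levitate placeholders in a Prometheus config read on stdin.
# Arguments: <remote_write_url> <cluster_id> <write_token>
# Assumption: none of the values contain "|", "&", or a backslash, which
# sed would treat specially in these substitutions.
fill_levitate_placeholders() {
  sed -e "s|<Levitate_cluster_remote_write_url>|$1|" \
      -e "s|<Levitate_Cluster_Id>|$2|" \
      -e "s|<Levitate_Cluster_Write_Token>|$3|"
}

# Example usage (variable and file names are illustrative):
#   fill_levitate_placeholders "$LEVITATE_URL" "$LEVITATE_CLUSTER_ID" "$LEVITATE_WRITE_TOKEN" \
#     < prometheus.yaml > prometheus.rendered.yaml
#   promtool check config prometheus.rendered.yaml
```

Keeping the placeholders in the checked-in file and rendering the real values at deploy time avoids committing the write token to Git.
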
tip

If you do not have a Prometheus setup, you can set up vmagent instead.

Steps to uninstall the KSM setup

For automated setup

To uninstall the Kubernetes resources created by the automated installation, run kubectl delete with the -f flag pointing to the same YAML file that the installation command downloaded.

kubectl delete -f kube-state-metrics.yml

This command will remove the namespaces, deployments, services, service accounts, and any other resources defined in the kube-state-metrics.yml file.

For manual setup

Delete the created kube-state-metrics objects.

kubectl delete -f examples/standard