Monitoring

Prometheus Support test cases

  1. Install the Prometheus Operator (include a role and service account for it). For example:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-operator
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-operator
subjects:
- kind: ServiceAccount
  name: prometheus-operator
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-operator
  namespace: default
rules:
- apiGroups:
  - extensions
  resources:
  - thirdpartyresources
  verbs: ["*"]
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs: ["*"]
- apiGroups:
  - monitoring.coreos.com
  resources:
  - alertmanagers
  - prometheuses
  - prometheuses/finalizers
  - servicemonitors
  - prometheusrules
  - podmonitors
  verbs: ["*"]
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs: ["*"]
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  verbs: ["*"]
- apiGroups: [""]
  resources:
  - pods
  verbs: ["list", "delete"]
- apiGroups: [""]
  resources:
  - services
  - endpoints
  verbs: ["get", "create", "update"]
- apiGroups: [""]
  resources:
  - nodes
  verbs: ["list", "watch"]
- apiGroups: [""]
  resources:
  - namespaces
  verbs: ["list", "watch"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-operator
  namespace: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prometheus-operator
  name: prometheus-operator
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-operator
  template:
    metadata:
      labels:
        app: prometheus-operator
    spec:
      containers:
        - args:
            - --kubelet-service=kube-system/kubelet
            - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1
          image: quay.io/coreos/prometheus-operator:v0.36.0
          name: prometheus-operator
          ports:
            - containerPort: 8080
              name: http
          resources:
            limits:
              cpu: 200m
              memory: 100Mi
            requests:
              cpu: 100m
              memory: 50Mi
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
      serviceAccountName: prometheus-operator
  2. Install a Service Monitor pointing to the longhorn-backend service by selecting the app: longhorn-manager label. For example:
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: longhorn-backend
      labels:
        team: backend
    spec:
      selector:
        matchLabels:
          app: longhorn-manager
      namespaceSelector:
        matchNames:
        - longhorn-system
      endpoints:
      - port: manager
  3. Install Prometheus (include a role and service account for it). Include the above Service Monitor in the Prometheus config. Expose the Prometheus instance outside the cluster using a service of type NodePort. For example:
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: prometheus
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: prometheus
    rules:
    - apiGroups: [""]
      resources:
      - nodes
      - services
      - endpoints
      - pods
      verbs: ["get", "list", "watch"]
    - apiGroups: [""]
      resources:
      - configmaps
      verbs: ["get"]
    - nonResourceURLs: ["/metrics", "/federate"]
      verbs: ["get"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus
    subjects:
    - kind: ServiceAccount
      name: prometheus
      namespace: default
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: prometheus
    spec:
      serviceAccountName: prometheus
      serviceMonitorSelector:
        matchLabels:
          team: backend
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: prometheus
    spec:
      type: NodePort
      ports:
      - name: web
        port: 9090
        protocol: TCP
        targetPort: web
      selector:
        prometheus: prometheus
  4. Find the prometheus service and access the Prometheus web UI using the node IP and the assigned node port (a quick reachability check is sketched below).
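
The node port assigned to the prometheus service can be read with kubectl get svc prometheus. As a quick reachability check before running the scenarios below, a minimal Python sketch such as the following can hit the Prometheus HTTP API through the NodePort; the node IP and port values are placeholders for whatever the cluster actually exposes.

    # Reachability check for the Prometheus HTTP API behind the NodePort service.
    # NODE_IP and NODE_PORT are placeholders; substitute the real node IP and NodePort.
    import requests

    NODE_IP = "10.0.0.1"   # a cluster node IP
    NODE_PORT = 30904      # the NodePort assigned to the prometheus service

    resp = requests.get(
        f"http://{NODE_IP}:{NODE_PORT}/api/v1/query",
        params={"query": "up"},
        timeout=10,
    )
    resp.raise_for_status()
    body = resp.json()
    assert body["status"] == "success", body
    # Each element of the result vector is one scraped target; the longhorn-manager
    # endpoints should show up here once the Service Monitor has been picked up.
    for sample in body["data"]["result"]:
        print(sample["metric"].get("instance"), sample["value"][1])
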
Test Scenarios

Test Scenario 1: All the Metrics are present

Pre-requisite:

1. The Prometheus setup is done and the Prometheus web UI is accessible.

Test Steps:

1. Go to the Prometheus web UI.
2. Verify the metrics are available.

Expected results:

The below metrics should be available:

1. longhorn_volume_capacity_bytes
2. longhorn_volume_usage_bytes
3. longhorn_node_status
4. longhorn_instance_manager_cpu_requests_millicpu
5. longhorn_instance_manager_cpu_usage_millicpu
6. longhorn_instance_manager_memory_requests_bytes
7. longhorn_instance_manager_memory_usage_bytes
8. longhorn_manager_cpu_usage_millicpu
9. longhorn_manager_memory_usage_bytes
10. longhorn_disk_capacity_bytes
11. longhorn_disk_usage_bytes
12. longhorn_node_capacity_bytes
13. longhorn_node_usage_bytes

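Step 2 of this scenario can also be scripted against the Prometheus HTTP API. A minimal sketch, using the same placeholder node IP and NodePort as above, queries each expected metric name and reports any that return no series:

    # Check that every expected Longhorn metric returns at least one series.
    # The Prometheus URL is a placeholder; substitute the real node IP and NodePort.
    import requests

    EXPECTED_METRICS = [
        "longhorn_volume_capacity_bytes",
        "longhorn_volume_usage_bytes",
        "longhorn_node_status",
        "longhorn_instance_manager_cpu_requests_millicpu",
        "longhorn_instance_manager_cpu_usage_millicpu",
        "longhorn_instance_manager_memory_requests_bytes",
        "longhorn_instance_manager_memory_usage_bytes",
        "longhorn_manager_cpu_usage_millicpu",
        "longhorn_manager_memory_usage_bytes",
        "longhorn_disk_capacity_bytes",
        "longhorn_disk_usage_bytes",
        "longhorn_node_capacity_bytes",
        "longhorn_node_usage_bytes",
    ]

    PROM = "http://10.0.0.1:30904"  # placeholder node IP and NodePort

    def result(metric):
        r = requests.get(f"{PROM}/api/v1/query", params={"query": metric}, timeout=10)
        r.raise_for_status()
        return r.json()["data"]["result"]

    missing = [m for m in EXPECTED_METRICS if not result(m)]
    print("missing metrics:", missing or "none")
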
Test Scenario 2: longhorn_volume_capacity_bytes

Pre-requisite:

1. The Prometheus setup is done and the Prometheus web UI is accessible.

Test Steps:

1. Create 4 volumes of different sizes (2, 3, 4, 5 Gi).
2. Attach the 1st volume to a pod and write 1 Gi of data into it.
3. Attach the 2nd volume to a pod and do not write into it.
4. Leave the 3rd volume in the detached state.
5. Attach the 4th volume to a pod and write 1.5 Gi of data into it. Detach the volume.
6. Go to the Prometheus web UI.
7. Select longhorn_volume_capacity_bytes and execute.

Expected results:

1. All the volumes should be identified by Prometheus.
2. Each volume should show its configured capacity (2, 3, 4 and 5 Gi respectively).

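The same check can be scripted. In current Longhorn releases longhorn_volume_capacity_bytes carries a volume label (verify against your version); the sketch below compares the reported values against the configured sizes, with the volume names and Prometheus URL as placeholder assumptions.

    # Compare reported volume capacity against the configured sizes (scenario 2).
    # Volume names test-vol-1..4, the Prometheus URL and the "volume" label are assumptions.
    import requests

    PROM = "http://10.0.0.1:30904"  # placeholder node IP and NodePort
    GI = 1024 ** 3
    expected = {"test-vol-1": 2 * GI, "test-vol-2": 3 * GI,
                "test-vol-3": 4 * GI, "test-vol-4": 5 * GI}

    r = requests.get(f"{PROM}/api/v1/query",
                     params={"query": "longhorn_volume_capacity_bytes"}, timeout=10)
    r.raise_for_status()
    reported = {s["metric"].get("volume"): float(s["value"][1])
                for s in r.json()["data"]["result"]}

    for vol, size in expected.items():
        got = reported.get(vol)
        print(vol, "ok" if got == size else f"reported {got}, expected {size}")
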
Test Scenario 3: longhorn_volume_usage_bytes

Pre-requisite:

1. The Prometheus setup is done and the Prometheus web UI is accessible.

Test Steps:

1. Create 4 volumes of different sizes (2, 3, 4, 5 Gi).
2. Attach the 1st volume to a pod and write 1 Gi of data into it.
3. Attach the 2nd volume to a pod and do not write into it.
4. Leave the 3rd volume in the detached state.
5. Attach the 4th volume to a pod and write 1.5 Gi of data into it. Detach the volume.
6. Go to the Prometheus web UI.
7. Select longhorn_volume_usage_bytes and execute.

Expected results:

1. All the volumes should be identified by Prometheus.
2. Volume-1 should show 1 Gi.
3. Volume-2 should show 0 Gi.
4. Volume-3 should show 0 Gi.
5. Volume-4 should show 1.5 Gi.

Test Scenario 4: longhorn_node_status

Pre-requisite:

1. The Prometheus setup is done and the Prometheus web UI is accessible.

Test Steps:

1. Power down a node.
2. Disable a node.
3. Add a new node to the cluster.
4. Delete a node from the cluster.
5. Go to the Prometheus web UI.
6. Select longhorn_node_status and execute.

Expected results:

1. All the nodes should be identified by Prometheus, and each node should be shown in 3 rows based on the conditions mountpropagation, ready and schedulable.
2. The correct status should be shown on the Prometheus UI.

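For reference, longhorn_node_status is reported as one series per node per condition, with the condition name in a label and a value of 1 or 0 (label names follow the Longhorn metric docs; verify for your release). A small sketch that groups the series by node:

    # Group longhorn_node_status series by node and list each condition's value (scenario 4).
    # The "node" and "condition" label names and the Prometheus URL are assumptions.
    import collections
    import requests

    PROM = "http://10.0.0.1:30904"  # placeholder node IP and NodePort
    r = requests.get(f"{PROM}/api/v1/query",
                     params={"query": "longhorn_node_status"}, timeout=10)
    r.raise_for_status()

    by_node = collections.defaultdict(dict)
    for s in r.json()["data"]["result"]:
        by_node[s["metric"].get("node")][s["metric"].get("condition")] = s["value"][1]

    for node, conditions in by_node.items():
        # Expect entries for mountpropagation, ready and schedulable, each "1" or "0".
        print(node, conditions)
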
Test Scenario 5: longhorn_instance_manager_cpu_requests_millicpu

Pre-requisite:

1. The Prometheus setup is done and the Prometheus web UI is accessible.

Test Steps:

1. Create a volume and attach it to a pod.
2. Write 1 Gi of data into it.
3. Set multiple recurring backups on the volume.
4. Go to the Prometheus web UI.
5. Select longhorn_instance_manager_cpu_requests_millicpu and execute.

Expected results:

1. The cpu_requests reading should go up for the instance manager of the attached volume.
2. The readings of the other instance managers should not be impacted.

Test Scenario 6: longhorn_instance_manager_cpu_usage_millicpu

Pre-requisite:

1. The Prometheus setup is done and the Prometheus web UI is accessible.

Test Steps:

1. Create a volume and attach it to a pod.
2. Write 1 Gi of data into it.
3. Set multiple recurring backups on the volume.
4. Go to the Prometheus web UI.
5. Select longhorn_instance_manager_cpu_usage_millicpu and execute.

Expected results:

1. The cpu_usage reading should be reported correctly for the instance manager of the attached volume.
2. The readings of the other instance managers should not be impacted.

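Scenarios 5 and 6 can also be looked at together: dividing usage by requests gives a utilization ratio per instance manager. A hedged sketch (the two metrics are assumed to share the same label set, as described in the Longhorn metric docs):

    # Utilization ratio of instance manager CPU usage vs. requests (scenarios 5 and 6).
    # The PromQL division assumes both metrics carry identical label sets.
    import requests

    PROM = "http://10.0.0.1:30904"  # placeholder node IP and NodePort
    expr = ("longhorn_instance_manager_cpu_usage_millicpu"
            " / longhorn_instance_manager_cpu_requests_millicpu")

    r = requests.get(f"{PROM}/api/v1/query", params={"query": expr}, timeout=10)
    r.raise_for_status()
    for s in r.json()["data"]["result"]:
        # The ratio for the attached volume's instance manager should rise during the workload.
        print(s["metric"], "ratio:", s["value"][1])
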
Test Scenario 7: longhorn_instance_manager_memory_requests_bytes

Pre-requisite:

1. The Prometheus setup is done and the Prometheus web UI is accessible.

Test Steps:

1. Create a volume and attach it to a pod.
2. Write 1 Gi of data into it.
3. Set multiple recurring backups on the volume.
4. Go to the Prometheus web UI.
5. Select longhorn_instance_manager_memory_requests_bytes and execute.

Expected results:

1. The memory_requests reading should go up for the instance manager of the attached volume.
2. The readings of the other instance managers should not be impacted.

Test Scenario 8: longhorn_instance_manager_memory_usage_bytes

Pre-requisite:

1. The Prometheus setup is done and the Prometheus web UI is accessible.

Test Steps:

1. Create a volume and attach it to a pod.
2. Write 1 Gi of data into it.
3. Set multiple recurring backups on the volume.
4. Go to the Prometheus web UI.
5. Select longhorn_instance_manager_memory_usage_bytes and execute.

Expected results:

1. The memory_usage reading should go up for the instance manager of the attached volume.
2. The readings of the other instance managers should not be impacted.

Test Scenario 9: longhorn_manager_cpu_usage_millicpu

Pre-requisite:

1. The Prometheus setup is done and the Prometheus web UI is accessible.

Test Steps:

1. Create 3 volumes of different sizes.
2. Attach the 1st volume to a pod and write 1 Gi of data into it.
3. Leave the 2nd volume in the detached state.
4. Attach the 3rd volume to a pod and write 1.5 Gi of data into it. Attach the volume in maintenance mode.
5. Set a recurring backup on the 1st volume.
6. Perform a revert to snapshot on the 3rd volume.
7. Go to the Prometheus web UI.
8. Select longhorn_manager_cpu_usage_millicpu and execute.

Expected results:

1. Monitoring the graph and the console on the Prometheus server, the cpu_usage should go up.

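Instead of watching the graph manually, the same trend can be pulled as a range query over the test window. A sketch, with the Prometheus URL and the 30-minute window as placeholder assumptions:

    # Pull longhorn_manager_cpu_usage_millicpu over the last 30 minutes (scenario 9).
    # An upward trend during the workload steps is the expected outcome.
    import time
    import requests

    PROM = "http://10.0.0.1:30904"  # placeholder node IP and NodePort
    end = time.time()
    r = requests.get(
        f"{PROM}/api/v1/query_range",
        params={
            "query": "longhorn_manager_cpu_usage_millicpu",
            "start": end - 30 * 60,
            "end": end,
            "step": "60s",
        },
        timeout=10,
    )
    r.raise_for_status()
    for series in r.json()["data"]["result"]:
        values = [float(v) for _, v in series["values"]]
        print(series["metric"], "min:", min(values), "max:", max(values))
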
Test Scenario 10: longhorn_manager_memory_usage_bytes

Pre-requisite:

1. The Prometheus setup is done and the Prometheus web UI is accessible.

Test Steps:

1. Create 3 volumes of different sizes.
2. Attach the 1st volume to a pod and write 1 Gi of data into it.
3. Leave the 2nd volume in the detached state.
4. Attach the 3rd volume to a pod and write 1.5 Gi of data into it. Attach the volume in maintenance mode.
5. Set a recurring backup on the 1st volume.
6. Perform a revert to snapshot on the 3rd volume.
7. Try to fill up the disk of a node where longhorn-manager is running.
8. Go to the Prometheus web UI.
9. Select longhorn_manager_memory_usage_bytes and execute.

Expected results:

1. Monitoring the graph and the console on the Prometheus server, the memory_usage should go up.

Test Scenario 11: longhorn_disk_capacity_bytes

Pre-requisite:

1. The Prometheus setup is done and the Prometheus web UI is accessible.

Test Steps:

1. Create volumes and attach them to each node.
2. Add an additional disk to all the nodes (of different sizes).
3. Write into the volumes.
4. Power down a node.
5. Disable a node.
6. Add a new node to the cluster.
7. Delete a node from the cluster.
8. Go to the Prometheus web UI.
9. Select longhorn_disk_capacity_bytes and execute.

Expected results:

1. All the disks should be identified by Prometheus.
2. All the disks should show their correct total size.

Test Scenario 12: longhorn_disk_usage_bytes

Pre-requisite:

1. The Prometheus setup is done and the Prometheus web UI is accessible.

Test Steps:

1. Create volumes and attach them to each node.
2. Add an additional disk to all the nodes (of different sizes).
3. Write into the volumes.
4. Power down a node.
5. Disable a node.
6. Add a new node to the cluster.
7. Delete a node from the cluster.
8. Go to the Prometheus web UI.
9. Select longhorn_disk_usage_bytes and execute.

Expected results:

1. All the disks should be identified by Prometheus.
2. All the disks should show their occupied size.

Test Scenario 13: longhorn_node_capacity_bytes

Pre-requisite:

1. The Prometheus setup is done and the Prometheus web UI is accessible.

Test Steps:

1. Create volumes and attach them to each node.
2. Add an additional disk to all the nodes (of different sizes).
3. Write into the volumes.
4. Power down a node.
5. Disable a node.
6. Add a new node to the cluster.
7. Delete a node from the cluster.
8. Go to the Prometheus web UI.
9. Select longhorn_node_capacity_bytes and execute.

Expected results:

1. All the nodes should be identified by Prometheus.
2. All the nodes should show the total capacity of all the disks available on the node.

Test Scenario 14: longhorn_node_usage_bytes

Pre-requisite:

1. The Prometheus setup is done and the Prometheus web UI is accessible.

Test Steps:

1. Create volumes and attach them to each node.
2. Add an additional disk to all the nodes (of different sizes).
3. Write into the volumes.
4. Power down a node.
5. Disable a node.
6. Add a new node to the cluster.
7. Delete a node from the cluster.
8. Go to the Prometheus web UI.
9. Select longhorn_node_usage_bytes and execute.

Expected results:

1. All the nodes should be identified by Prometheus.
2. All the nodes should show the occupied space across all disks attached to the node.

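As a cross-check across scenarios 13 and 14, a node's reported usage should not exceed its reported capacity. A small sketch comparing the two metrics per node (the Prometheus URL and the "node" label name are assumptions):

    # Cross-check: per node, longhorn_node_usage_bytes should stay within
    # longhorn_node_capacity_bytes. URL and label names are placeholder assumptions.
    import requests

    PROM = "http://10.0.0.1:30904"  # placeholder node IP and NodePort

    def by_node(metric):
        r = requests.get(f"{PROM}/api/v1/query", params={"query": metric}, timeout=10)
        r.raise_for_status()
        return {s["metric"].get("node"): float(s["value"][1])
                for s in r.json()["data"]["result"]}

    capacity = by_node("longhorn_node_capacity_bytes")
    usage = by_node("longhorn_node_usage_bytes")
    for node, cap in capacity.items():
        used = usage.get(node, 0.0)
        print(node, "used:", used, "capacity:", cap,
              "ok" if used <= cap else "EXCEEDS CAPACITY")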

Note: More details can be found at https://longhorn.io/docs/1.2.2/monitoring/
