Prometheus Support test cases
- Install the Prometheus Operator (include a role and service account for it). For example:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-operator
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-operator
subjects:
- kind: ServiceAccount
  name: prometheus-operator
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-operator
  namespace: default
rules:
- apiGroups:
  - extensions
  resources:
  - thirdpartyresources
  verbs: ["*"]
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs: ["*"]
- apiGroups:
  - monitoring.coreos.com
  resources:
  - alertmanagers
  - prometheuses
  - prometheuses/finalizers
  - servicemonitors
  - prometheusrules
  - podmonitors
  verbs: ["*"]
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs: ["*"]
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  verbs: ["*"]
- apiGroups: [""]
  resources:
  - pods
  verbs: ["list", "delete"]
- apiGroups: [""]
  resources:
  - services
  - endpoints
  verbs: ["get", "create", "update"]
- apiGroups: [""]
  resources:
  - nodes
  verbs: ["list", "watch"]
- apiGroups: [""]
  resources:
  - namespaces
  verbs: ["list", "watch"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-operator
  namespace: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prometheus-operator
  name: prometheus-operator
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-operator
  template:
    metadata:
      labels:
        app: prometheus-operator
    spec:
      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1
        image: quay.io/coreos/prometheus-operator:v0.36.0
        name: prometheus-operator
        ports:
        - containerPort: 8080
          name: http
        resources:
          limits:
            cpu: 200m
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 50Mi
        securityContext:
          runAsNonRoot: true
          runAsUser: 65534
      serviceAccountName: prometheus-operator
```
- Install a Service Monitor pointing to the `longhorn-backend` service by selecting the `app: longhorn-manager` label. For example:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: longhorn-backend
  labels:
    team: backend
spec:
  selector:
    matchLabels:
      app: longhorn-manager
  namespaceSelector:
    matchNames:
    - longhorn-system
  endpoints:
  - port: manager
```
- Install Prometheus (include a role and service account for it). Include the above service monitor in the Prometheus config. Expose the Prometheus instance outside the cluster using a service of type NodePort. For example:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics", "/federate"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: backend
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    protocol: TCP
    targetPort: web
  selector:
    prometheus: prometheus
```
- Find the `prometheus` service and access the Prometheus web UI using the node IP and the NodePort (see the sketch below).
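As a quick sanity check after the setup above, the NodePort and web UI URL can be looked up programmatically. The following is a minimal sketch using the Kubernetes Python client; it assumes the `prometheus` Service above was created in the `default` namespace and that a kubeconfig for the test cluster is available locally.

```python
# Minimal sketch (assumes the "prometheus" Service above lives in "default"
# and a kubeconfig for the test cluster is available locally).
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Read the NodePort assigned to the prometheus Service.
svc = v1.read_namespaced_service(name="prometheus", namespace="default")
node_port = svc.spec.ports[0].node_port

# Any reachable node IP works for a NodePort service; use the first node's InternalIP.
node = v1.list_node().items[0]
node_ip = next(a.address for a in node.status.addresses if a.type == "InternalIP")

print(f"Prometheus web UI: http://{node_ip}:{node_port}")
```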
| # | Test Scenario | Test Steps | Expected results |
| --- | --- | --- | --- |
| 1 | All the metrics are present | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Go to the Prometheus web UI.<br>2. Verify the metrics are available. | The below metrics should be available:<br>1. longhorn_volume_capacity_bytes<br>2. longhorn_volume_usage_bytes<br>3. longhorn_node_status<br>4. longhorn_instance_manager_cpu_requests_millicpu<br>5. longhorn_instance_manager_cpu_usage_millicpu<br>6. longhorn_instance_manager_memory_requests_bytes<br>7. longhorn_instance_manager_memory_usage_bytes<br>8. longhorn_manager_cpu_usage_millicpu<br>9. longhorn_manager_memory_usage_bytes<br>10. longhorn_disk_capacity_bytes<br>11. longhorn_disk_usage_bytes<br>12. longhorn_node_capacity_bytes<br>13. longhorn_node_usage_bytes |
| 2 | longhorn_volume_capacity_bytes | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create 4 volumes of different sizes (2, 3, 4, 5 Gi).<br>2. Attach the 1st volume to a pod and write 1 Gi of data into it.<br>3. Attach the 2nd volume to a pod and don't write into it.<br>4. Leave the 3rd volume in the detached state.<br>5. Attach the 4th volume to a pod and write 1.5 Gi of data into it. Detach the volume.<br>6. Go to the Prometheus web UI.<br>7. Select longhorn_volume_capacity_bytes and execute. | 1. All the volumes should be identified by Prometheus.<br>2. Each volume should show its configured capacity (2, 3, 4 and 5 Gi respectively). |
| 3 | longhorn_volume_usage_bytes | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create 4 volumes of different sizes (2, 3, 4, 5 Gi).<br>2. Attach the 1st volume to a pod and write 1 Gi of data into it.<br>3. Attach the 2nd volume to a pod and don't write into it.<br>4. Leave the 3rd volume in the detached state.<br>5. Attach the 4th volume to a pod and write 1.5 Gi of data into it. Detach the volume.<br>6. Go to the Prometheus web UI.<br>7. Select longhorn_volume_usage_bytes and execute. | 1. All the volumes should be identified by Prometheus.<br>2. Volume-1 should show 1 Gi.<br>3. Volume-2 should show 0 Gi.<br>4. Volume-3 should show 0 Gi.<br>5. Volume-4 should show 1.5 Gi. |
| 4 | longhorn_node_status | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Power down a node.<br>2. Disable a node.<br>3. Add a new node to the cluster.<br>4. Delete a node from the cluster.<br>5. Go to the Prometheus web UI.<br>6. Select longhorn_node_status and execute. | 1. All the nodes should be identified by Prometheus, and each node should be shown in 3 rows, one per condition (mountpropagation, ready, schedulable).<br>2. The correct status should be shown on the Prometheus UI. |
| 5 | longhorn_instance_manager_cpu_requests_millicpu | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create a volume and attach it to a pod.<br>2. Write 1 Gi of data into it.<br>3. Set multiple recurring backups on the volume.<br>4. Go to the Prometheus web UI.<br>5. Select longhorn_instance_manager_cpu_requests_millicpu and execute. | 1. The cpu_requests reading should go up for the attached instance manager.<br>2. The readings of the other instance managers should not be impacted. |
| 6 | longhorn_instance_manager_cpu_usage_millicpu | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create a volume and attach it to a pod.<br>2. Write 1 Gi of data into it.<br>3. Set multiple recurring backups on the volume.<br>4. Go to the Prometheus web UI.<br>5. Select longhorn_instance_manager_cpu_usage_millicpu and execute. | 1. The cpu_usage reading should be shown correctly.<br>2. The readings of the other instance managers should not be impacted. |
| 7 | longhorn_instance_manager_memory_requests_bytes | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create a volume and attach it to a pod.<br>2. Write 1 Gi of data into it.<br>3. Set multiple recurring backups on the volume.<br>4. Go to the Prometheus web UI.<br>5. Select longhorn_instance_manager_memory_requests_bytes and execute. | 1. The memory_requests reading should go up for the attached instance manager.<br>2. The readings of the other instance managers should not be impacted. |
| 8 | longhorn_instance_manager_memory_usage_bytes | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create a volume and attach it to a pod.<br>2. Write 1 Gi of data into it.<br>3. Set multiple recurring backups on the volume.<br>4. Go to the Prometheus web UI.<br>5. Select longhorn_instance_manager_memory_usage_bytes and execute. | 1. The memory_usage reading should go up for the attached instance manager.<br>2. The readings of the other instance managers should not be impacted. |
| 9 | longhorn_manager_cpu_usage_millicpu | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create 3 volumes of different sizes.<br>2. Attach the 1st volume to a pod and write 1 Gi of data into it.<br>3. Leave the 2nd volume in the detached state.<br>4. Attach the 3rd volume to a pod and write 1.5 Gi of data into it, then attach the volume in maintenance mode.<br>5. Set a recurring backup on the 1st volume.<br>6. Perform a revert to snapshot on the 3rd volume.<br>7. Go to the Prometheus web UI.<br>8. Select longhorn_manager_cpu_usage_millicpu and execute. | 1. Monitor the graph and the console on the Prometheus server; the cpu_usage should go up. |
| 10 | longhorn_manager_memory_usage_bytes | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create 3 volumes of different sizes.<br>2. Attach the 1st volume to a pod and write 1 Gi of data into it.<br>3. Leave the 2nd volume in the detached state.<br>4. Attach the 3rd volume to a pod and write 1.5 Gi of data into it, then attach the volume in maintenance mode.<br>5. Set a recurring backup on the 1st volume.<br>6. Perform a revert to snapshot on the 3rd volume.<br>7. Try to fill up the disk on a node where longhorn-manager is running.<br>8. Go to the Prometheus web UI.<br>9. Select longhorn_manager_memory_usage_bytes and execute. | 1. Monitor the graph and the console on the Prometheus server; the memory_usage should go up. |
| 11 | longhorn_disk_capacity_bytes | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create volumes and attach them to each node.<br>2. Add an additional disk of a different size to each node.<br>3. Write into the volumes.<br>4. Power down a node.<br>5. Disable a node.<br>6. Add a new node to the cluster.<br>7. Delete a node from the cluster.<br>8. Go to the Prometheus web UI.<br>9. Select longhorn_disk_capacity_bytes and execute. | 1. All the disks should be identified by Prometheus.<br>2. All the disks should show their correct total size. |
| 12 | longhorn_disk_usage_bytes | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create volumes and attach them to each node.<br>2. Add an additional disk of a different size to each node.<br>3. Write into the volumes.<br>4. Power down a node.<br>5. Disable a node.<br>6. Add a new node to the cluster.<br>7. Delete a node from the cluster.<br>8. Go to the Prometheus web UI.<br>9. Select longhorn_disk_usage_bytes and execute. | 1. All the disks should be identified by Prometheus.<br>2. All the disks should show their occupied size. |
| 13 | longhorn_node_capacity_bytes | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create volumes and attach them to each node.<br>2. Add an additional disk of a different size to each node.<br>3. Write into the volumes.<br>4. Power down a node.<br>5. Disable a node.<br>6. Add a new node to the cluster.<br>7. Delete a node from the cluster.<br>8. Go to the Prometheus web UI.<br>9. Select longhorn_node_capacity_bytes and execute. | 1. All the nodes should be identified by Prometheus.<br>2. All the nodes should show the total capacity of all disks available on the node. |
| 14 | longhorn_node_usage_bytes | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create volumes and attach them to each node.<br>2. Add an additional disk of a different size to each node.<br>3. Write into the volumes.<br>4. Power down a node.<br>5. Disable a node.<br>6. Add a new node to the cluster.<br>7. Delete a node from the cluster.<br>8. Go to the Prometheus web UI.<br>9. Select longhorn_node_usage_bytes and execute. | 1. All the nodes should be identified by Prometheus.<br>2. All the nodes should show the occupied space across all disks attached to the node. |
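Where a scenario above says to select a metric and execute, the same check can be scripted against the Prometheus HTTP API instead of the web UI. The snippet below is only a sketch: `NODE_IP` and `NODE_PORT` are placeholders for the values found during setup, and `longhorn_volume_capacity_bytes` is one of the metrics listed in scenario 1.

```python
# Minimal sketch: query one Longhorn metric through the Prometheus HTTP API.
# NODE_IP and NODE_PORT are placeholders for the NodePort endpoint found during setup.
import requests

NODE_IP = "10.0.0.1"   # placeholder: any reachable cluster node IP
NODE_PORT = 30904      # placeholder: NodePort assigned to the prometheus Service

resp = requests.get(
    f"http://{NODE_IP}:{NODE_PORT}/api/v1/query",
    params={"query": "longhorn_volume_capacity_bytes"},
    timeout=10,
)
resp.raise_for_status()

# Each result carries the metric labels (volume name, node, ...) and the value in bytes.
for series in resp.json()["data"]["result"]:
    print(series["metric"].get("volume"), series["value"][1])
```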
Note: More details can be found at https://longhorn.io/docs/1.2.2/monitoring/