Prometheus Support test cases
- Install the Prometheus Operator (include a role and service account for it). For example:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-operator
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-operator
subjects:
- kind: ServiceAccount
  name: prometheus-operator
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-operator
  namespace: default
rules:
- apiGroups:
  - extensions
  resources:
  - thirdpartyresources
  verbs: ["*"]
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs: ["*"]
- apiGroups:
  - monitoring.coreos.com
  resources:
  - alertmanagers
  - prometheuses
  - prometheuses/finalizers
  - servicemonitors
  - prometheusrules
  - podmonitors
  verbs: ["*"]
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs: ["*"]
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  verbs: ["*"]
- apiGroups: [""]
  resources:
  - pods
  verbs: ["list", "delete"]
- apiGroups: [""]
  resources:
  - services
  - endpoints
  verbs: ["get", "create", "update"]
- apiGroups: [""]
  resources:
  - nodes
  verbs: ["list", "watch"]
- apiGroups: [""]
  resources:
  - namespaces
  verbs: ["list", "watch"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-operator
  namespace: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prometheus-operator
  name: prometheus-operator
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-operator
  template:
    metadata:
      labels:
        app: prometheus-operator
    spec:
      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1
        image: quay.io/coreos/prometheus-operator:v0.36.0
        name: prometheus-operator
        ports:
        - containerPort: 8080
          name: http
        resources:
          limits:
            cpu: 200m
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 50Mi
        securityContext:
          runAsNonRoot: true
          runAsUser: 65534
      serviceAccountName: prometheus-operator
```
- Install a Service Monitor pointing to the `longhorn-backend` service by selecting the `app: longhorn-manager` label. For example:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: longhorn-backend
  labels:
    team: backend
spec:
  selector:
    matchLabels:
      app: longhorn-manager
  namespaceSelector:
    matchNames:
    - longhorn-system
  endpoints:
  - port: manager
```
- Install Prometheus (include a role and service account for it). Include the above service monitor in the Prometheus config. Expose the Prometheus instance outside the cluster using a service of type NodePort. For example:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics", "/federate"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: backend
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    protocol: TCP
    targetPort: web
  selector:
    prometheus: prometheus
```
- Find the `prometheus` service and access the Prometheus web UI using the node IP and the NodePort (see the sketch below).
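As a quick sanity check after the setup above, the NodePort and web UI URL can be looked up programmatically. The following is a minimal sketch using the Kubernetes Python client; it assumes the `prometheus` Service above was created in the `default` namespace and that a kubeconfig for the test cluster is available locally.

```python
# Minimal sketch (assumes the "prometheus" Service above lives in "default"
# and a kubeconfig for the test cluster is available locally).
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Read the NodePort assigned to the prometheus Service.
svc = v1.read_namespaced_service(name="prometheus", namespace="default")
node_port = svc.spec.ports[0].node_port

# Any reachable node IP works for a NodePort service; use the first node's InternalIP.
node = v1.list_node().items[0]
node_ip = next(a.address for a in node.status.addresses if a.type == "InternalIP")

print(f"Prometheus web UI: http://{node_ip}:{node_port}")
```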
| # | Test Scenario | Test Steps | Expected results |
| --- | --- | --- | --- |
| 1 | All the metrics are present | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Go to the Prometheus web UI.<br>2. Verify the metrics are available. | The below metrics should be available:<br>1. longhorn_volume_capacity_bytes<br>2. longhorn_volume_usage_bytes<br>3. longhorn_node_status<br>4. longhorn_instance_manager_cpu_requests_millicpu<br>5. longhorn_instance_manager_cpu_usage_millicpu<br>6. longhorn_instance_manager_memory_requests_bytes<br>7. longhorn_instance_manager_memory_usage_bytes<br>8. longhorn_manager_cpu_usage_millicpu<br>9. longhorn_manager_memory_usage_bytes<br>10. longhorn_disk_capacity_bytes<br>11. longhorn_disk_usage_bytes<br>12. longhorn_node_capacity_bytes<br>13. longhorn_node_usage_bytes |
| 2 | longhorn_volume_capacity_bytes | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create 4 volumes of different sizes (2, 3, 4, 5 Gi).<br>2. Attach the 1st volume to a pod and write 1 Gi of data into it.<br>3. Attach the 2nd volume to a pod and don't write into it.<br>4. Leave the 3rd volume in the detached state.<br>5. Attach the 4th volume to a pod and write 1.5 Gi of data into it. Detach the volume.<br>6. Go to the Prometheus web UI.<br>7. Select longhorn_volume_capacity_bytes and execute. | 1. All the volumes should be identified by Prometheus.<br>2. Each volume should show its configured capacity (2, 3, 4 and 5 Gi respectively). |
| 3 | longhorn_volume_usage_bytes | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create 4 volumes of different sizes (2, 3, 4, 5 Gi).<br>2. Attach the 1st volume to a pod and write 1 Gi of data into it.<br>3. Attach the 2nd volume to a pod and don't write into it.<br>4. Leave the 3rd volume in the detached state.<br>5. Attach the 4th volume to a pod and write 1.5 Gi of data into it. Detach the volume.<br>6. Go to the Prometheus web UI.<br>7. Select longhorn_volume_usage_bytes and execute. | 1. All the volumes should be identified by Prometheus.<br>2. Volume-1 should show 1 Gi.<br>3. Volume-2 should show 0 Gi.<br>4. Volume-3 should show 0 Gi.<br>5. Volume-4 should show 1.5 Gi. |
| 4 | longhorn_node_status | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Power down a node.<br>2. Disable a node.<br>3. Add a new node to the cluster.<br>4. Delete a node from the cluster.<br>5. Go to the Prometheus web UI.<br>6. Select longhorn_node_status and execute. | 1. All the nodes should be identified by Prometheus, and each node should be shown in 3 rows, one per condition (mountpropagation, ready, schedulable).<br>2. The correct status should be shown on the Prometheus UI. |
| 5 | longhorn_instance_manager_cpu_requests_millicpu | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create a volume and attach it to a pod.<br>2. Write 1 Gi of data into it.<br>3. Set multiple recurring backups on the volume.<br>4. Go to the Prometheus web UI.<br>5. Select longhorn_instance_manager_cpu_requests_millicpu and execute. | 1. The cpu_requests reading should go up for the attached instance manager.<br>2. The readings of the other instance managers should not be impacted. |
| 6 | longhorn_instance_manager_cpu_usage_millicpu | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create a volume and attach it to a pod.<br>2. Write 1 Gi of data into it.<br>3. Set multiple recurring backups on the volume.<br>4. Go to the Prometheus web UI.<br>5. Select longhorn_instance_manager_cpu_usage_millicpu and execute. | 1. The cpu_usage reading should be shown correctly.<br>2. The readings of the other instance managers should not be impacted. |
| 7 | longhorn_instance_manager_memory_requests_bytes | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create a volume and attach it to a pod.<br>2. Write 1 Gi of data into it.<br>3. Set multiple recurring backups on the volume.<br>4. Go to the Prometheus web UI.<br>5. Select longhorn_instance_manager_memory_requests_bytes and execute. | 1. The memory_requests reading should go up for the attached instance manager.<br>2. The readings of the other instance managers should not be impacted. |
| 8 | longhorn_instance_manager_memory_usage_bytes | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create a volume and attach it to a pod.<br>2. Write 1 Gi of data into it.<br>3. Set multiple recurring backups on the volume.<br>4. Go to the Prometheus web UI.<br>5. Select longhorn_instance_manager_memory_usage_bytes and execute. | 1. The memory_usage reading should go up for the attached instance manager.<br>2. The readings of the other instance managers should not be impacted. |
| 9 | longhorn_manager_cpu_usage_millicpu | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create 3 volumes of different sizes.<br>2. Attach the 1st volume to a pod and write 1 Gi of data into it.<br>3. Leave the 2nd volume in the detached state.<br>4. Attach the 3rd volume to a pod and write 1.5 Gi of data into it, then attach the volume in maintenance mode.<br>5. Set a recurring backup on the 1st volume.<br>6. Perform a revert to snapshot on the 3rd volume.<br>7. Go to the Prometheus web UI.<br>8. Select longhorn_manager_cpu_usage_millicpu and execute. | 1. Monitor the graph and the console on the Prometheus server; the cpu_usage should go up. |
| 10 | longhorn_manager_memory_usage_bytes | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create 3 volumes of different sizes.<br>2. Attach the 1st volume to a pod and write 1 Gi of data into it.<br>3. Leave the 2nd volume in the detached state.<br>4. Attach the 3rd volume to a pod and write 1.5 Gi of data into it, then attach the volume in maintenance mode.<br>5. Set a recurring backup on the 1st volume.<br>6. Perform a revert to snapshot on the 3rd volume.<br>7. Try to fill up the disk on a node where longhorn-manager is running.<br>8. Go to the Prometheus web UI.<br>9. Select longhorn_manager_memory_usage_bytes and execute. | 1. Monitor the graph and the console on the Prometheus server; the memory_usage should go up. |
| 11 | longhorn_disk_capacity_bytes | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create volumes and attach them to each node.<br>2. Add an additional disk of a different size to each node.<br>3. Write into the volumes.<br>4. Power down a node.<br>5. Disable a node.<br>6. Add a new node to the cluster.<br>7. Delete a node from the cluster.<br>8. Go to the Prometheus web UI.<br>9. Select longhorn_disk_capacity_bytes and execute. | 1. All the disks should be identified by Prometheus.<br>2. All the disks should show their correct total size. |
| 12 | longhorn_disk_usage_bytes | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create volumes and attach them to each node.<br>2. Add an additional disk of a different size to each node.<br>3. Write into the volumes.<br>4. Power down a node.<br>5. Disable a node.<br>6. Add a new node to the cluster.<br>7. Delete a node from the cluster.<br>8. Go to the Prometheus web UI.<br>9. Select longhorn_disk_usage_bytes and execute. | 1. All the disks should be identified by Prometheus.<br>2. All the disks should show their occupied size. |
| 13 | longhorn_node_capacity_bytes | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create volumes and attach them to each node.<br>2. Add an additional disk of a different size to each node.<br>3. Write into the volumes.<br>4. Power down a node.<br>5. Disable a node.<br>6. Add a new node to the cluster.<br>7. Delete a node from the cluster.<br>8. Go to the Prometheus web UI.<br>9. Select longhorn_node_capacity_bytes and execute. | 1. All the nodes should be identified by Prometheus.<br>2. All the nodes should show the total capacity of all disks available on the node. |
| 14 | longhorn_node_usage_bytes | Pre-requisite:<br>1. Prometheus setup is done and the Prometheus web UI is accessible.<br>Test Steps:<br>1. Create volumes and attach them to each node.<br>2. Add an additional disk of a different size to each node.<br>3. Write into the volumes.<br>4. Power down a node.<br>5. Disable a node.<br>6. Add a new node to the cluster.<br>7. Delete a node from the cluster.<br>8. Go to the Prometheus web UI.<br>9. Select longhorn_node_usage_bytes and execute. | 1. All the nodes should be identified by Prometheus.<br>2. All the nodes should show the occupied space across all disks attached to the node. |
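Where a scenario above says to select a metric and execute, the same check can be scripted against the Prometheus HTTP API instead of the web UI. The snippet below is only a sketch: `NODE_IP` and `NODE_PORT` are placeholders for the values found during setup, and `longhorn_volume_capacity_bytes` is one of the metrics listed in scenario 1.

```python
# Minimal sketch: query one Longhorn metric through the Prometheus HTTP API.
# NODE_IP and NODE_PORT are placeholders for the NodePort endpoint found during setup.
import requests

NODE_IP = "10.0.0.1"   # placeholder: any reachable cluster node IP
NODE_PORT = 30904      # placeholder: NodePort assigned to the prometheus Service

resp = requests.get(
    f"http://{NODE_IP}:{NODE_PORT}/api/v1/query",
    params={"query": "longhorn_volume_capacity_bytes"},
    timeout=10,
)
resp.raise_for_status()

# Each result carries the metric labels (volume name, node, ...) and the value in bytes.
for series in resp.json()["data"]["result"]:
    print(series["metric"].get("volume"), series["value"][1])
```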
Note: More details can be found at https://longhorn.io/docs/1.2.2/monitoring/