Prometheus support allows users to monitor Longhorn metrics. The details are available at https://longhorn.io/docs/1.1.0/monitoring/
Monitor Longhorn
- Deploy the Prometheus Operator, a ServiceMonitor pointing to longhorn-backend, and Prometheus as mentioned in the doc.
- Create an Ingress pointing to the Prometheus service (see the Ingress sketch after this list).
- Access the Prometheus web UI using the Ingress created in step 2.
- Select metrics from the list below to monitor the Longhorn resources.
- longhorn_volume_actual_size_bytes
- longhorn_volume_capacity_bytes
- longhorn_volume_robustness
- longhorn_volume_state
- longhorn_instance_manager_cpu_requests_millicpu
- longhorn_instance_manager_cpu_usage_millicpu
- longhorn_instance_manager_memory_requests_bytes
- longhorn_instance_manager_memory_usage_bytes
- longhorn_manager_cpu_usage_millicpu
- longhorn_manager_memory_usage_bytes
- longhorn_node_count_total
- longhorn_node_status
- longhorn_node_cpu_capacity_millicpu
- longhorn_node_cpu_usage_millicpu
- longhorn_node_memory_capacity_bytes
- longhorn_node_memory_usage_bytes
- longhorn_node_storage_capacity_bytes
- longhorn_node_storage_reservation_bytes
- longhorn_node_storage_usage_bytes
- longhorn_disk_capacity_bytes
- longhorn_disk_reservation_bytes
- longhorn_disk_usage_bytes
- Deploy workloads that use Longhorn volumes into the cluster. Verify that there is no abnormal data, e.g. volume capacity is 0, CPU usage is over 4000 millicpu, etc.
- Attach a volume to a node. Detach the volume and attach it to a different node. Verify that the volume’s information is reported by at most 1 longhorn-manager at any time.
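A minimal Ingress sketch for the Prometheus UI step above; the hostname, namespace, and the `prometheus` Service name/port are assumptions, so adjust them to match the objects created from the doc:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: monitoring                 # assumed namespace of the Prometheus deployment
spec:
  rules:
  - host: prometheus.example.com        # placeholder hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus            # assumed name of the Prometheus Service
            port:
              number: 9090              # default Prometheus port
```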
Configure Prometheus Alertmanager
- Deploy the Alertmanager as mentioned in the doc.
- Modify the Alertmanager configuration file to set up an email or Slack receiver (see the Alertmanager configuration sketch after this list).
- Deploy a NodePort service to access the Alertmanager web UI as mentioned in the doc.
- Follow the steps from the doc to create a PrometheusRule and configure the Prometheus server (see the PrometheusRule sketch after this list).
- Exceed the threshold set in the PrometheusRule from step 4.
- Verify that the alert message is received via email or Slack.
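A minimal Alertmanager configuration sketch for the email/Slack step; the Slack webhook URL and channel are placeholders, and how the file is loaded (typically a Secret consumed by the Alertmanager) follows the doc:

```yaml
# alertmanager.yaml - routes every alert to a single Slack receiver
global:
  resolve_timeout: 5m
route:
  receiver: slack-notifications
receivers:
- name: slack-notifications
  slack_configs:
  - api_url: https://hooks.slack.com/services/REPLACE/ME   # placeholder incoming-webhook URL
    channel: '#longhorn-alerts'                            # placeholder channel
    send_resolved: true
```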
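A PrometheusRule sketch for the rule-creation step; the rule name, the 90% usage expression, and the labels are illustrative assumptions (the labels must match the ruleSelector of your Prometheus CR):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-longhorn-rules
  namespace: monitoring                 # assumed namespace of the Prometheus deployment
  labels:
    role: alert-rules                   # must match the ruleSelector of the Prometheus CR
spec:
  groups:
  - name: longhorn.rules
    rules:
    - alert: LonghornVolumeUsageHigh
      # fires when a volume's actual size exceeds 90% of its capacity for 5 minutes
      expr: longhorn_volume_actual_size_bytes / longhorn_volume_capacity_bytes > 0.9
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Longhorn volume {{ $labels.volume }} is over 90% used"
```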
Monitor with Grafana
- Create a ConfigMap with the Grafana datasource configuration referring to Prometheus (refer to the doc; see the ConfigMap sketch after this list).
- Deploy Grafana and a service to access its UI.
- Go to the Grafana dashboard and import the prebuilt Longhorn example dashboard.
- Verify the graphs and data are available to monitor.
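A ConfigMap sketch for the Grafana datasource step; the namespace and the in-cluster Prometheus URL are assumptions, so point `url` at whatever Service exposes your Prometheus:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: monitoring                 # assumed namespace where Grafana runs
data:
  prometheus.yaml: |-
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      access: proxy
      url: http://prometheus:9090       # assumed in-cluster Prometheus Service URL
      isDefault: true
```

Mount this ConfigMap into the Grafana pod at /etc/grafana/provisioning/datasources/ so the datasource is provisioned at startup.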
Monitor with Rancher app
- Create a cluster in Rancher. (1 etcd/control plane and 3 worker nodes)
- Deploy Longhorn v1.1.0.
- Enable the monitoring for a project.
- Deploy the ServiceMonitor pointing to longhorn-backend, as shown below.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: longhorn-prometheus-servicemonitor
  namespace: longhorn-system
  labels:
    name: longhorn-prometheus-servicemonitor
spec:
  selector:
    matchLabels:
      app: longhorn-manager
  namespaceSelector:
    matchNames:
    - longhorn-system
  endpoints:
  - port: manager
```
- Access the URL provided by the app to open Prometheus or Grafana.
- Verify the Longhorn metrics are available to monitor.
- Verify that kubelet_volume_* metrics are available if the Rancher 2.5 monitoring app is deployed.
- Import the Longhorn example dashboard. Verify that the graphs look good.
- Set up alerts and alert rules in the Rancher monitoring app. Verify that alerts are working correctly.