Related issues:
https://github.com/longhorn/longhorn/issues/2629
Note:
- This is a refined version that matches the current behavior.
- Use one of the commands below to restart the kubelet, depending on the distribution:
- systemctl restart k3s-agent
- systemctl restart rke2-agent
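
The cases that stop and later start the kubelet use the same unit as the restart commands above. A minimal sketch, assuming the node runs exactly one of `k3s-agent` or `rke2-agent` as a systemd service; the helper name `agent_unit` is hypothetical and only picks the unit name out of a unit listing:

```shell
# Hypothetical helper: read unit names (one per line, first column) from stdin
# and print the agent unit that is present, without the ".service" suffix.
agent_unit() {
  while read -r unit _ || [ -n "$unit" ]; do
    case "$unit" in
      k3s-agent.service|rke2-agent.service)
        echo "${unit%.service}"
        return 0
        ;;
    esac
    unit=""
  done
  # No known agent unit was found in the input.
  return 1
}

# Usage on the node (as root), assuming the agent is currently running:
#   unit=$(systemctl list-units --type=service --no-legend --plain | agent_unit)
#   systemctl restart "$unit"     # immediate restart
#   systemctl stop "$unit"        # begin temporary downtime
#   systemctl start "$unit"       # end temporary downtime
```

The helper reads from stdin so that it can be checked against a saved unit listing without touching the node.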
Case 1: Restart Volume Node Kubelet Immediately
- Create a cluster with 1 etcd/control plane node and 3 worker nodes.
- Deploy Longhorn on the cluster.
- Deploy a statefulSet with a Longhorn volume.
- Write some data into the mount point and compute the md5sum.
- Restart the kubelet on the node where the statefulSet Pod is running.
- The volume should remain healthy.
- Scale the workload down, then scale it back up. Verify that the existing data is correct.
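
The write-and-verify steps can be sketched as two small shell helpers. This is only an illustration, assuming `dd` and `md5sum` are available in the workload container; the function names and the file path in the usage comments are hypothetical:

```shell
# Hypothetical helper: write random data to a file and print its md5 hash.
# $1 = target file, $2 = size in MiB (defaults to 10).
write_test_data() {
  file=$1
  size_mb=${2:-10}
  dd if=/dev/urandom of="$file" bs=1M count="$size_mb" 2>/dev/null
  sync
  md5sum "$file" | cut -d' ' -f1
}

# Hypothetical helper: succeed only if the file still has the recorded hash.
# $1 = target file, $2 = hash recorded before the kubelet restart.
verify_test_data() {
  file=$1
  expected=$2
  actual=$(md5sum "$file" | cut -d' ' -f1)
  [ "$actual" = "$expected" ]
}

# Usage inside the Pod, with /data standing in for the volume mount point:
#   sum=$(write_test_data /data/testfile 10)
#   ... restart the kubelet, scale the workload down and back up ...
#   verify_test_data /data/testfile "$sum" && echo "data intact"
```

Scaling itself can be done with `kubectl scale statefulset <name> --replicas=0` followed by `--replicas=1`, where `<name>` is the statefulSet deployed earlier.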
Case 2: Restart Volume Node Kubelet Immediately on a Single-Node Cluster
- Create a single-node cluster.
- Follow the same steps and expected outcomes as in Case 1.
Case 3: Restart Volume Node Kubelet After Temporary Downtime
- Create a cluster with 1 etcd/control plane node and 3 worker nodes.
- Deploy Longhorn on the cluster.
- Deploy a statefulSet with a Longhorn volume.
- Write some data into the mount point and compute the md5sum.
- Stop the kubelet on the node where the statefulSet Pod is running.
- Observe that the volume status changes:
- The RWO volume becomes unknown.
- The RWX volume becomes degraded.
- Start the kubelet that was stopped in step 5.
- The volume becomes healthy again.
- Scale the workload down, then scale it back up. Verify that the existing data is correct.
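
The status transitions in this case can be watched from the command line rather than the UI. The following is a rough sketch, assuming Longhorn is installed in the `longhorn-system` namespace and that the Volume custom resource reports its health in `.status.robustness`; the helper names and the `POLL_INTERVAL` variable are hypothetical:

```shell
# Hypothetical helper: run a command repeatedly until its output equals the
# expected value. $1 = expected output, $2 = max attempts, rest = command.
wait_for_output() {
  expected=$1
  retries=$2
  shift 2
  i=0
  while [ "$i" -lt "$retries" ]; do
    if [ "$("$@" 2>/dev/null)" = "$expected" ]; then
      return 0
    fi
    i=$((i + 1))
    # Seconds between attempts; defaults to 5 when POLL_INTERVAL is unset.
    sleep "${POLL_INTERVAL:-5}"
  done
  return 1
}

# Hypothetical helper: print the robustness of one Longhorn volume.
# Assumes the longhorn-system namespace and the .status.robustness field.
volume_robustness() {
  kubectl -n longhorn-system get volumes.longhorn.io "$1" \
    -o jsonpath='{.status.robustness}'
}

# Usage, with $VOLUME holding the Longhorn volume name:
#   wait_for_output unknown 60 volume_robustness "$VOLUME"   # RWO, kubelet stopped
#   wait_for_output healthy 60 volume_robustness "$VOLUME"   # kubelet started again
```

Keeping the polling loop separate from the `kubectl` call makes it possible to reuse the loop for other checks, such as the volume state on a single-node cluster.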
Case 4: Restart Volume Node Kubelet After Temporary Downtime on a Single-Node Cluster
- Create a single-node cluster.
- Follow the same steps and expected outcomes as in Case 3, except that in step 6 the RWX volume transitions to a Detached state.