- Set up a cluster of 3 worker nodes
- Install Longhorn and set `Default Replica Count = 2` (because we will turn off one node)
- Create a StatefulSet with 2 pods using the command:
  `kubectl create -f https://raw.githubusercontent.com/longhorn/longhorn/master/examples/statefulset.yaml`
- Create a volume + PV + PVC named `vol1`, then create a Deployment (1 pod) of the default ubuntu image named `shell` that uses the PVC `vol1` mounted under `/mnt/vol1`
- Find the node that hosts one pod of the StatefulSet/Deployment and power off that node
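The setup steps can be scripted roughly as follows. This is a minimal sketch, not the project's official tooling: it assumes Longhorn is installed in the `longhorn-system` namespace and that its settings are exposed as `settings.longhorn.io` resources with a top-level `value` field. The helper names `set_longhorn_setting` and `node_of_pod` are hypothetical.

```shell
# Sketch only. Assumptions: Longhorn lives in the longhorn-system namespace
# and each setting is a settings.longhorn.io object with a "value" field.

# set_longhorn_setting NAME VALUE
# Patch a Longhorn setting, e.g. default-replica-count or
# node-down-pod-deletion-policy.
set_longhorn_setting() {
  kubectl -n longhorn-system patch settings.longhorn.io "$1" \
    --type=merge -p "{\"value\":\"$2\"}"
}

# node_of_pod POD [NAMESPACE]
# Print the name of the node the pod is scheduled on, so that node can be
# powered off in the next step.
node_of_pod() {
  kubectl -n "${2:-default}" get pod "$1" -o jsonpath='{.spec.nodeName}'
}
```

For example, `set_longhorn_setting default-replica-count 2` followed by `node_of_pod <pod-name>` would cover the replica count and the node lookup; the pod name depends on the workload under test.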
StatefulSet
if `NodeDownPodDeletionPolicy` is set to `do-nothing` | `delete-deployment-pod`
- wait till the `pod.deletionTimestamp` has passed
- verify that no replacement pod is generated and the pod is stuck in terminating forever.
if `NodeDownPodDeletionPolicy` is set to `delete-statefulset-pod` | `delete-both-statefulset-and-deployment-pod`
- wait till the pod’s status becomes `terminating` and the `pod.deletionTimestamp` has passed (around 7 minutes)
- verify that the pod is deleted and there is a new running replacement pod.
- verify that you can access/read/write the volume on the new pod
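The StatefulSet checks above can be approximated with a few shell helpers. This is a sketch under assumptions: the function names are hypothetical, the mount directory is passed in by the caller because it depends on the StatefulSet manifest, and timestamps are compared as UTC RFC 3339 strings, which sort lexicographically.

```shell
# pod_deletion_timestamp POD [NAMESPACE]
# Print metadata.deletionTimestamp (empty if the pod is not being deleted).
pod_deletion_timestamp() {
  kubectl -n "${2:-default}" get pod "$1" \
    -o jsonpath='{.metadata.deletionTimestamp}'
}

# timestamp_passed TIMESTAMP [NOW]
# Succeed if TIMESTAMP is non-empty and not later than NOW (default: current
# UTC time). Both values must look like 2024-01-01T00:00:00Z.
timestamp_passed() {
  now="${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  [ -n "$1" ] || return 1
  earliest="$(printf '%s\n%s\n' "$1" "$now" | LC_ALL=C sort | head -n 1)"
  [ "$earliest" = "$1" ]
}

# verify_rw POD DIR [NAMESPACE]
# Write a marker file under DIR inside the pod and read it back.
verify_rw() {
  marker="longhorn-rw-check"
  kubectl -n "${3:-default}" exec "$1" -- \
    sh -c "echo $marker > $2/rw-check && cat $2/rw-check" | grep -qx "$marker"
}
```

A test run might poll `timestamp_passed "$(pod_deletion_timestamp <pod-name>)"` until it succeeds, then call `verify_rw` against the replacement pod.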
Deployment
if `NodeDownPodDeletionPolicy` is set to `do-nothing` | `delete-statefulset-pod`
- wait till the `pod.deletionTimestamp` has passed
- replacement pod will be stuck in `Pending` state forever
- force delete the terminating pod
- wait till replacement pod is running
- verify that you can access `vol1` via the `shell` replacement pod under `/mnt/vol1` once it is in the running state
if `NodeDownPodDeletionPolicy` is set to `delete-deployment-pod` | `delete-both-statefulset-and-deployment-pod`
- wait till the `pod.deletionTimestamp` has passed
- verify that the pod is deleted and there is a new running replacement pod.
- verify that you can access `vol1` via the `shell` replacement pod under `/mnt/vol1`
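For the Deployment cases, the manual force delete and the wait for the replacement pod might look like the sketch below. The helper names are hypothetical, and the label selector (for instance `app=shell`) is an assumption that must match whatever labels the `shell` Deployment actually uses.

```shell
# force_delete_pod POD [NAMESPACE]
# Remove a pod stuck in Terminating without waiting for the kubelet on the
# powered-off node to confirm.
force_delete_pod() {
  kubectl -n "${2:-default}" delete pod "$1" --grace-period=0 --force
}

# wait_for_ready SELECTOR [NAMESPACE] [TIMEOUT]
# Block until every pod matching the label selector reports Ready.
wait_for_ready() {
  kubectl -n "${2:-default}" wait pod -l "$1" \
    --for=condition=Ready --timeout="${3:-600s}"
}

# touch_on_vol1 POD [NAMESPACE]
# Create a probe file on the mounted volume and list the directory.
touch_on_vol1() {
  kubectl -n "${2:-default}" exec "$1" -- \
    sh -c 'touch /mnt/vol1/probe && ls /mnt/vol1'
}
```

Note that `wait_for_ready` matches all pods with the selector, so it only returns once the old terminating pod object is gone (force deleted manually or by Longhorn) and the replacement is ready.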
Other kinds
- Verify that Longhorn never deletes any other pod on the downed node.
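The expectations across all policy values can be summarized as a small lookup. This is only an illustrative encoding of the test cases above, not Longhorn code; the function name `longhorn_force_deletes` and the kind labels `statefulset`, `deployment`, and `other` are hypothetical.

```shell
# longhorn_force_deletes POLICY KIND
# Print "yes" if the test cases expect Longhorn to force delete a pod of the
# given KIND (statefulset | deployment | other) on the downed node under the
# given NodeDownPodDeletionPolicy value, otherwise "no".
longhorn_force_deletes() {
  case "$1:$2" in
    delete-statefulset-pod:statefulset) echo yes ;;
    delete-deployment-pod:deployment) echo yes ;;
    delete-both-statefulset-and-deployment-pod:statefulset) echo yes ;;
    delete-both-statefulset-and-deployment-pod:deployment) echo yes ;;
    # do-nothing, mismatched kinds, and every other pod kind
    *) echo no ;;
  esac
}
```

For instance, `longhorn_force_deletes delete-deployment-pod statefulset` prints `no`, matching the case where the StatefulSet pod stays stuck in terminating.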
Test example
One typical scenario in which the enhancement succeeds is shown below. A node (say `node-x`) goes down, assuming Kubernetes' default settings and that the user allows Longhorn to force delete pods:
Time | Event |
---|---|
0m:00s | node-x goes down and stops sending heartbeats to Kubernetes Node controller |
0m:40s | Kubernetes Node controller reports node-x is `NotReady`. |
5m:40s | Kubernetes Node controller starts evicting pods from node-x using graceful termination (sets `deletionTimestamp` and `deletionGracePeriodSeconds = 10s/30s`) |
5m:50s/6m:10s | Longhorn force deletes the pod of the StatefulSet/Deployment which uses a Longhorn volume |
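The times in the table follow from simple addition, sketched below. The two constants are taken from the table itself (40 seconds until `NotReady`, a further 5 minutes until eviction starts); the variable and function names are hypothetical, and clusters with non-default timings will produce different values.

```shell
# Intervals read off the table above (assumed Kubernetes defaults).
NOT_READY_AFTER=40   # seconds from node failure to NotReady
EVICT_AFTER=300      # further seconds until eviction starts

# fmt SECONDS: print a duration in the table's "Xm:YYs" form.
fmt() {
  printf '%dm:%02ds\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

# force_delete_time GRACE_SECONDS
# Time after node failure at which the pod's deletionTimestamp has passed,
# i.e. the earliest point Longhorn would force delete the pod.
force_delete_time() {
  fmt $(( $NOT_READY_AFTER + $EVICT_AFTER + $1 ))
}

force_delete_time 10   # prints 5m:50s
force_delete_time 30   # prints 6m:10s
```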