Related issue
https://github.com/longhorn/longhorn/issues/6764
Test Orphan Auto Cleanup Settings Replacement
Given A cluster with Longhorn v1.8.x ready
And Enable orphaned replica data auto deletion
$ kubectl -n longhorn-system edit settings.longhorn.io orphan-auto-deletion
And set the value to true
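For a scripted run, the same change can be made non-interactively. This is only a sketch, assuming the setting stores its value in the top-level value field of the Setting CR (the same field the jsonpath check below reads):
# Enable automatic cleanup of orphaned replica data on v1.8.x (non-interactive alternative to kubectl edit)
$ kubectl -n longhorn-system patch settings.longhorn.io orphan-auto-deletion --type=merge -p '{"value":"true"}'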
When Upgrade to Longhorn v1.9.0
Then The new setting orphan-resource-auto-deletion is initialized with replica data enabled
$ kubectl -n longhorn-system get settings.longhorn.io orphan-resource-auto-deletion -o jsonpath='{.value}'
replicaData
And The old setting orphan-auto-deletion is removed
$ kubectl -n longhorn-system get settings.longhorn.io orphan-auto-deletion
Error from server (NotFound): settings.longhorn.io "orphan-auto-deletion" not found
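As an extra sanity check, a filtered listing of the settings should no longer show the old name; this is just the standard get output piped through grep:
# The old orphan-auto-deletion setting should no longer appear in this list
$ kubectl -n longhorn-system get settings.longhorn.io | grep orphan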
Test Orphaned Instance Detection
Given A cluster with Longhorn v1.9.0 ready
And The orphan auto deletion is disabled for all kinds of orphans
$ kubectl -n longhorn-system get settings.longhorn.io orphan-resource-auto-deletion
NAME                            VALUE   AGE
orphan-resource-auto-deletion           2d1h
The value should be empty.
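The empty value can also be confirmed directly, mirroring the jsonpath check used in the upgrade test:
# Prints an empty line when no orphan kinds are enabled for auto deletion
$ kubectl -n longhorn-system get settings.longhorn.io orphan-resource-auto-deletion -o jsonpath='{.value}{"\n"}'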
And Create orphaned engine and replica instances
- Create a volume
- Attach the volume to a node
- There is an engine running on the given node
$ kubectl -n longhorn-system get engine -l "longhornnode=libvirt-ubuntu-k3s-worker1"
NAME         DATA ENGINE   STATE     NODE                         INSTANCEMANAGER                                     IMAGE                               AGE
vol-01-e-0   v1            running   libvirt-ubuntu-k3s-worker1   instance-manager-b87f10b867cec1dca2b814f5e78bcc90   longhornio/longhorn-engine:v1.9.0   24m
- There is a replica running on the given node
$ kubectl -n longhorn-system get replica -l "longhornnode=libvirt-ubuntu-k3s-worker1"
NAME                DATA ENGINE   STATE     NODE                         DISK                                   INSTANCEMANAGER                                     IMAGE                               AGE
vol-01-r-e7106826   v1            running   libvirt-ubuntu-k3s-worker1   a6338052-9852-4c59-b878-bfc4db32b00e   instance-manager-b87f10b867cec1dca2b814f5e78bcc90   longhornio/longhorn-engine:v1.9.0   24m
- Suspend the node to simulate a temporary network outage (see the sketch after this list)
- Wait for the node to become unavailable
$ kubectl -n longhorn-system get lhn libvirt-ubuntu-k3s-worker1
NAME                         READY   ALLOWSCHEDULING   SCHEDULABLE   AGE
libvirt-ubuntu-k3s-worker1   False   true              True          26h
- Detach and delete the volume
- Resume the node to simulate the network recovery
- Wait for the node to become ready
$ kubectl -n longhorn-system get lhn libvirt-ubuntu-k3s-worker1
NAME                         READY   ALLOWSCHEDULING   SCHEDULABLE   AGE
libvirt-ubuntu-k3s-worker1   True    true              True          26h
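The node names above suggest libvirt VMs, so one way to simulate the outage and recovery is to pause and resume the VM from the hypervisor host. This is only a sketch under that assumption; the libvirt domain name may differ from the Kubernetes node name in other environments:
# Suspend the worker VM to cut it off from the cluster (assumed libvirt domain name)
$ virsh suspend libvirt-ubuntu-k3s-worker1
# ...detach and delete the volume while the node is unavailable...
# Resume the worker VM to restore connectivity
$ virsh resume libvirt-ubuntu-k3s-worker1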
Then The engine-instance and replica-instance orphan CRs are created
$ kubectl -n longhorn-system get orphan -l "longhorn.io/orphan-type in (engine-instance,replica-instance)"
NAME TYPE NODE
orphan-d09e9176343f60b2aad39683b1781fd1d9bb29e3e986e36bc9f65d43d91db764 replica-instance libvirt-ubuntu-k3s-worker1
orphan-e3b484438aab83100726528c226b09990e9a0b2d0730f2f3489d14242175f9ca engine-instance libvirt-ubuntu-k3s-worker1
And The spec of engine orphan CR matches the engine instance
$ kubectl -n longhorn-system describe orphan orphan-e3b484438aab83100726528c226b09990e9a0b2d0730f2f3489d14242175f9ca
Name:         orphan-e3b484438aab83100726528c226b09990e9a0b2d0730f2f3489d14242175f9ca
Kind:         Orphan
Spec:
  Data Engine:  v1
  Node ID:      <instance node ID>
  Orphan Type:  engine-instance
  Parameters:
    Instance Manager:  <instance manager ID>
    Instance Name:     orphan-engine-01-e-0
...
- Name: orphan-<sha256>
- Spec.OrphanType: engine-instance
- Spec.NodeID: instance's node ID
- Spec.DataEngine: instance's engine type, v1 or v2
- Spec.Parameters.InstanceName: instance's name
- Spec.Parameters.InstanceManager: instance manager ID
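These fields can also be spot-checked non-interactively. The sketch below assumes the JSON field names spec.orphanType, spec.nodeID, spec.dataEngine, and spec.parameters, inferred from the describe output above; verify them against the Orphan CRD before relying on it:
# Print the key spec fields of the engine orphan on one line (field names are an assumption)
$ kubectl -n longhorn-system get orphan <engine orphan name> \
    -o jsonpath='{.spec.orphanType} {.spec.nodeID} {.spec.dataEngine} {.spec.parameters}{"\n"}'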
And The spec of replica orphan CR matches the replica instance
$ kubectl -n longhorn-system describe orphan orphan-d09e9176343f60b2aad39683b1781fd1d9bb29e3e986e36bc9f65d43d91db764
Name:         orphan-d09e9176343f60b2aad39683b1781fd1d9bb29e3e986e36bc9f65d43d91db764
Spec:
  Data Engine:  v1
  Node ID:      <instance node ID>
  Orphan Type:  replica-instance
  Parameters:
    Instance Manager:  <instance manager ID>
    Instance Name:     orphan-replica-01-r-0
...
- Name: orphan-<sha256>
- Spec.OrphanType: replica-instance
- Spec.NodeID: instance's node ID
- Spec.DataEngine: instance's engine type, v1 or v2
- Spec.Parameters.InstanceName: instance's name
- Spec.Parameters.InstanceManager: instance manager ID
When Delete the instance orphans
$ kubectl -n longhorn-system delete orphan -l "longhorn.io/orphan-type in (engine-instance,replica-instance)"
orphan.longhorn.io "orphan-d09e9176343f60b2aad39683b1781fd1d9bb29e3e986e36bc9f65d43d91db764" deleted
orphan.longhorn.io "orphan-e3b484438aab83100726528c226b09990e9a0b2d0730f2f3489d14242175f9ca" deleted
Then The instances are cleaned up from the instance manager
$ kubectl -n longhorn-system describe instancemanager <instance manager ID>
And the orphaned instances are removed from Status.InstanceEngines and Status.InstanceReplicas.
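A minimal sketch for this check without reading the full describe output, assuming the status fields are serialized as .status.instanceEngines and .status.instanceReplicas:
# Neither map should still contain the orphaned engine or replica (field names are an assumption)
$ kubectl -n longhorn-system get instancemanager <instance manager ID> \
    -o jsonpath='{.status.instanceEngines}{"\n"}{.status.instanceReplicas}{"\n"}'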
Test Orphaned Instance Auto Detection
Given A cluster with Longhorn v1.9.0 ready
And Create orphaned engine and replica instances
- Create a volume
- Attach the volume to a node
- There is an engine running on the given node
$ kubectl -n longhorn-system get engine -l "longhornnode=libvirt-ubuntu-k3s-worker1"
NAME         DATA ENGINE   STATE     NODE                         INSTANCEMANAGER                                     IMAGE                               AGE
vol-01-e-0   v1            running   libvirt-ubuntu-k3s-worker1   instance-manager-b87f10b867cec1dca2b814f5e78bcc90   longhornio/longhorn-engine:v1.9.0   24m
- There is a replica running on the given node
$ kubectl -n longhorn-system get replica -l "longhornnode=libvirt-ubuntu-k3s-worker1"
NAME                DATA ENGINE   STATE     NODE                         DISK                                   INSTANCEMANAGER                                     IMAGE                               AGE
vol-01-r-e7106826   v1            running   libvirt-ubuntu-k3s-worker1   a6338052-9852-4c59-b878-bfc4db32b00e   instance-manager-b87f10b867cec1dca2b814f5e78bcc90   longhornio/longhorn-engine:v1.9.0   24m
- Suspend the node to simulate a temporary network outage
- Wait for the node to become unavailable
$ kubectl -n longhorn-system get lhn libvirt-ubuntu-k3s-worker1
NAME                         READY   ALLOWSCHEDULING   SCHEDULABLE   AGE
libvirt-ubuntu-k3s-worker1   False   true              True          26h
- Detach and delete the volume
- Resume the node to simulate the network recovery
- Wait for the node to become ready
$ kubectl -n longhorn-system get lhn libvirt-ubuntu-k3s-worker1
NAME                         READY   ALLOWSCHEDULING   SCHEDULABLE   AGE
libvirt-ubuntu-k3s-worker1   True    true              True          26h
And The engine-instance and replica-instance orphan CRs are created
$ kubectl -n longhorn-system get orphan -l "longhorn.io/orphan-type in (engine-instance,replica-instance)"
NAME TYPE NODE
orphan-d09e9176343f60b2aad39683b1781fd1d9bb29e3e986e36bc9f65d43d91db764 replica-instance libvirt-ubuntu-k3s-worker1
orphan-e3b484438aab83100726528c226b09990e9a0b2d0730f2f3489d14242175f9ca engine-instance libvirt-ubuntu-k3s-worker1
When Enable orphan auto deletion for engine and replica instances
$ kubectl -n longhorn-system edit settings.longhorn.io orphan-resource-auto-deletion
Set the value to instance
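A non-interactive alternative sketch, again assuming the top-level value field of the setting CR:
# Enable auto deletion for orphaned engine and replica instances
$ kubectl -n longhorn-system patch settings.longhorn.io orphan-resource-auto-deletion --type=merge -p '{"value":"instance"}'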
Then The orphans should be deleted within 90 seconds
$ kubectl -n longhorn-system get orphan
No resources found in longhorn-system namespace.
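One way to bound the wait in a script, as a sketch (kubectl wait behavior for resources that are already gone varies between kubectl versions, so treat a "no matching resources" message as success here):
# Blocks until the instance orphans are gone, or fails after 90 seconds
$ kubectl -n longhorn-system wait --for=delete orphan -l "longhorn.io/orphan-type in (engine-instance,replica-instance)" --timeout=90s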
When Create orphaned engine and replica instances again
- Create a volume
- Attach the volume to a node
- There is an engine running on the given node
$ kubectl -n longhorn-system get engine -l "longhornnode=libvirt-ubuntu-k3s-worker1"
NAME         DATA ENGINE   STATE     NODE                         INSTANCEMANAGER                                     IMAGE                               AGE
vol-01-e-0   v1            running   libvirt-ubuntu-k3s-worker1   instance-manager-b87f10b867cec1dca2b814f5e78bcc90   longhornio/longhorn-engine:v1.9.0   24m
- There is a replica running on the given node
$ kubectl -n longhorn-system get replica -l "longhornnode=libvirt-ubuntu-k3s-worker1"
NAME                DATA ENGINE   STATE     NODE                         DISK                                   INSTANCEMANAGER                                     IMAGE                               AGE
vol-01-r-e7106826   v1            running   libvirt-ubuntu-k3s-worker1   a6338052-9852-4c59-b878-bfc4db32b00e   instance-manager-b87f10b867cec1dca2b814f5e78bcc90   longhornio/longhorn-engine:v1.9.0   24m
- Suspend the node to simulate a temporary network outage
- Wait for the node to become unavailable
$ kubectl -n longhorn-system get lhn libvirt-ubuntu-k3s-worker1
NAME                         READY   ALLOWSCHEDULING   SCHEDULABLE   AGE
libvirt-ubuntu-k3s-worker1   False   true              True          26h
- Detach and delete the volume
- Resume the node to simulate the network recovery
- Wait for the node to become ready
$ kubectl -n longhorn-system get lhn libvirt-ubuntu-k3s-worker1
NAME                         READY   ALLOWSCHEDULING   SCHEDULABLE   AGE
libvirt-ubuntu-k3s-worker1   True    true              True          26h
Then The orphaned instances should be deleted automatically within 90 seconds
$ kubectl -n longhorn-system describe instancemanager <instance manager ID>
And the orphaned instances are removed from Status.InstanceEngines and Status.InstanceReplicas.
And No engine-instance or replica-instance orphan CR exists
$ kubectl -n longhorn-system get orphan -l "longhorn.io/orphan-type in (engine-instance,replica-instance)"
No resources found in longhorn-system namespace.
Test Orphan Deletion When Evicting Node
Given A cluster with Longhorn v1.9.0 ready
And The orphan auto deletion is disabled for all kinds of orphans
$ kubectl -n longhorn-system get settings.longhorn.io orphan-resource-auto-deletion
NAME                            VALUE   AGE
orphan-resource-auto-deletion           2d1h
The value should be empty.
And Create orphaned engine and replica instances again
- Create a volume
- Attach the volume to a node
- There is an engine running on the given node
$ kubectl -n longhorn-system get engine -l "longhornnode=libvirt-ubuntu-k3s-worker1"
NAME         DATA ENGINE   STATE     NODE                         INSTANCEMANAGER                                     IMAGE                               AGE
vol-01-e-0   v1            running   libvirt-ubuntu-k3s-worker1   instance-manager-b87f10b867cec1dca2b814f5e78bcc90   longhornio/longhorn-engine:v1.9.0   24m
- There is a replica running on the given node
$ kubectl -n longhorn-system get replica -l "longhornnode=libvirt-ubuntu-k3s-worker1"
NAME                DATA ENGINE   STATE     NODE                         DISK                                   INSTANCEMANAGER                                     IMAGE                               AGE
vol-01-r-e7106826   v1            running   libvirt-ubuntu-k3s-worker1   a6338052-9852-4c59-b878-bfc4db32b00e   instance-manager-b87f10b867cec1dca2b814f5e78bcc90   longhornio/longhorn-engine:v1.9.0   24m
- Suspend the node to simulate a temporary network outage
- Wait for the node to become unavailable
$ kubectl -n longhorn-system get lhn libvirt-ubuntu-k3s-worker1
NAME                         READY   ALLOWSCHEDULING   SCHEDULABLE   AGE
libvirt-ubuntu-k3s-worker1   False   true              True          26h
- Detach and delete the volume
- Resume the node to simulate the network recovery
- Wait for the node to become ready
$ kubectl -n longhorn-system get lhn libvirt-ubuntu-k3s-worker1
NAME                         READY   ALLOWSCHEDULING   SCHEDULABLE   AGE
libvirt-ubuntu-k3s-worker1   True    true              True          26h
When Evict the orphaned instances' node
$ kubectl -n longhorn-system edit lhn <node>
- Disable spec.allowScheduling
- Enable spec.evictionRequested
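A non-interactive sketch of the same edit, using the two spec fields named above:
# Request eviction of the orphaned instances' node
$ kubectl -n longhorn-system patch lhn <node> --type=merge -p '{"spec":{"allowScheduling":false,"evictionRequested":true}}'
Swapping the two values in the same patch reverts the node for the Finally step below.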
Then The engine-instance and replica-instance orphan CRs should be deleted within 90 seconds
$ kubectl -n longhorn-system get orphan -l "longhornnode=<node>"
No resources found in longhorn-system namespace.
Finally Cancel node eviction
$ kubectl -n longhorn-system edit lhn <node>
- Enable spec.allowScheduling
- Disable spec.evictionRequested
Test Orphan Instance CR Tracks Node Status
Given A cluster with Longhorn v1.9.0 ready
And The orphan auto deletion is disabled for all kinds of orphans
$ kubectl -n longhorn-system get settings.longhorn.io orphan-resource-auto-deletion
NAME                            VALUE   AGE
orphan-resource-auto-deletion           2d1h
The value should be empty.
And Create orphaned engine and replica instances again
- Create a volume
- Attach the volume to a node
- There is an engine running on the given node
$ kubectl -n longhorn-system get engine -l "longhornnode=libvirt-ubuntu-k3s-worker1"
NAME         DATA ENGINE   STATE     NODE                         INSTANCEMANAGER                                     IMAGE                               AGE
vol-01-e-0   v1            running   libvirt-ubuntu-k3s-worker1   instance-manager-b87f10b867cec1dca2b814f5e78bcc90   longhornio/longhorn-engine:v1.9.0   24m
- There is a replica running on the given node
$ kubectl -n longhorn-system get replica -l "longhornnode=libvirt-ubuntu-k3s-worker1"
NAME                DATA ENGINE   STATE     NODE                         DISK                                   INSTANCEMANAGER                                     IMAGE                               AGE
vol-01-r-e7106826   v1            running   libvirt-ubuntu-k3s-worker1   a6338052-9852-4c59-b878-bfc4db32b00e   instance-manager-b87f10b867cec1dca2b814f5e78bcc90   longhornio/longhorn-engine:v1.9.0   24m
- Suspend the node to simulate a temporary network outage
- Wait for the node to become unavailable
$ kubectl -n longhorn-system get lhn libvirt-ubuntu-k3s-worker1
NAME                         READY   ALLOWSCHEDULING   SCHEDULABLE   AGE
libvirt-ubuntu-k3s-worker1   False   true              True          26h
- Detach and delete the volume
- Resume the node to simulate the network recovery
- Wait for the node to become ready
$ kubectl -n longhorn-system get lhn libvirt-ubuntu-k3s-worker1
NAME                         READY   ALLOWSCHEDULING   SCHEDULABLE   AGE
libvirt-ubuntu-k3s-worker1   True    true              True          26h
And The engine-instance and replica-instance orphan CRs are created
$ kubectl -n longhorn-system get orphan -l "longhorn.io/orphan-type in (engine-instance,replica-instance)"
NAME TYPE NODE
orphan-d09e9176343f60b2aad39683b1781fd1d9bb29e3e986e36bc9f65d43d91db764 replica-instance libvirt-ubuntu-k3s-worker1
orphan-e3b484438aab83100726528c226b09990e9a0b2d0730f2f3489d14242175f9ca engine-instance libvirt-ubuntu-k3s-worker1
When The orphaned instances' node disconnects from the cluster
- Suspend the node to simulate a temporary network outage
- Wait for the node to become unavailable
$ kubectl -n longhorn-system get lhn libvirt-ubuntu-k3s-worker1
NAME                         READY   ALLOWSCHEDULING   SCHEDULABLE   AGE
libvirt-ubuntu-k3s-worker1   False   true              True          26h
Then All orphans are removed from the cluster
$ kubectl -n longhorn-system get orphan -l "longhorn.io/orphan-type in (engine-instance,replica-instance)"
No resources found in longhorn-system namespace.
Finally Resume all nodes, and delete the workload, the volume, and any remaining orphan CRs
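A cleanup sketch under the libvirt assumption above; the workload and volume should be removed however they were created:
# Resume any worker VM that is still suspended (assumed libvirt domain name)
$ virsh resume libvirt-ubuntu-k3s-worker1
# Remove any orphan CRs left behind by the tests
$ kubectl -n longhorn-system delete orphan --all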