Reusing failed replica for rebuilding
Longhorn upgrade with node down and removal
- Launch Longhorn v1.0.x
- Create and attach a volume, then write data to the volume.
- Directly remove one Kubernetes node, and shut down a different node.
- Wait for the related replicas to fail. Then record replica.Spec.DiskID for the failed replicas.
- Upgrade to Longhorn master
- Verify the Longhorn node related to the removed node is gone.
- Verify that replica.Spec.DiskID of the replica on the down node is updated and that the same field of the replica on the gone node is unchanged. Also verify that replica.Spec.DataPath becomes empty for all replicas.
- Remove all unscheduled replicas.
- Power on the down node. Wait for the failed replica on the down node to be reused.
- Wait for a new replica to be replenished and become available.
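The DiskID and DataPath checks in the steps above can be scripted once the field values have been recorded. The following is only a minimal sketch: the function name check_disk_ids and the dictionary layout (replica name mapped to "node", "diskID", and "dataPath" entries) are hypothetical, and how the values are collected from the cluster is left to the tester.

```python
def check_disk_ids(before, after, down_node, gone_node):
    """Compare replica fields recorded before and after the upgrade.

    All names and the data layout here are illustrative assumptions:
      before: {replica_name: {"node": str, "diskID": str}}
      after:  {replica_name: {"diskID": str, "dataPath": str}}
    Returns a list of problem descriptions; an empty list means every check passed.
    """
    problems = []
    for name, new in after.items():
        # DataPath is expected to be empty for every replica after the upgrade.
        if new.get("dataPath"):
            problems.append(f"{name}: DataPath is not empty")

        old = before.get(name)
        if old is None:
            continue  # this replica was not recorded before the upgrade

        changed = new.get("diskID") != old["diskID"]
        # The replica on the powered-off node should get an updated DiskID.
        if old["node"] == down_node and not changed:
            problems.append(f"{name}: DiskID on the down node was not updated")
        # The replica on the removed node should keep its recorded DiskID.
        if old["node"] == gone_node and changed:
            problems.append(f"{name}: DiskID on the gone node changed")
    return problems
```

Returning a list of problems instead of raising on the first mismatch lets a single run report every replica that deviates from the expected state.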
Replica not available for reuse after disk migration
- Deploy Longhorn v1.1.0
- Create and attach a volume, then write data to the volume.
- Directly remove a Kubernetes node which has a replica on it.
- Wait for the related replica to fail.
- Verify the Longhorn node related to the removed node is gone.
- SSH to the node and corrupt the replica folder or make it read-only.
- Add the node to the cluster again.
- Verify that a new replica is rebuilt and becomes available.
- Verify the data of the replica.
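One common way to carry out the final data check is to record a checksum right after writing the data and compare it with a checksum computed after the rebuild. The sketch below assumes the volume is mounted on a host where the test file can be read directly. The helper names file_sha256 and data_is_intact are hypothetical and not part of Longhorn.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large test files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def data_is_intact(path, expected_checksum):
    """Return True if the file still matches the checksum recorded after writing."""
    return file_sha256(path) == expected_checksum
```

In this approach, file_sha256 would be called once after the write step to record the expected value, and data_is_intact would be called after the new replica becomes available.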