Module tests.test_node
Functions
def check_all_replicas_evict_state(client, volume_name, expect_state)
def check_node_auto_evict_state(client, target_node, expect_state)
def check_replica_evict_state(client, volume_name, node, expect_state)
def drain_node(core_api, node)
def get_all_replica_name(client, volume_name)
def get_replica_detail(replica_name)
-
Get all replica information by this function.
def make_replica_on_specific_node(client, volume_name, node)
def random_disk_path()
def reset_default_disk_label()
def reset_disk_and_tag_annotations()
def reset_disk_settings()
def test_auto_detach_volume_when_node_is_cordoned(client, core_api, volume_name)
-
Test auto detach volume when node is cordoned
- Set detach-manually-attached-volumes-when-cordoned to false.
- Create a volume and attach it to the node through the API (manually).
- Cordon the node.
- Set detach-manually-attached-volumes-when-cordoned to true.
- The volume will be detached automatically.
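For reference, a minimal sketch of the pattern these steps describe, assuming the Longhorn Python client (client) and the Kubernetes CoreV1Api (core_api) fixtures the test receives; the setting name comes from the steps above, the helper name is hypothetical.

    def cordon_and_enable_auto_detach(client, core_api, node_name):
        # Start with the setting disabled, as in the first step.
        setting = client.by_id_setting(
            "detach-manually-attached-volumes-when-cordoned")
        client.update(setting, value="false")

        # Cordon the node through the Kubernetes API (kubectl cordon equivalent).
        core_api.patch_node(node_name, {"spec": {"unschedulable": True}})

        # Flip the setting to true; Longhorn is then expected to detach the
        # manually attached volume on the cordoned node.
        setting = client.by_id_setting(
            "detach-manually-attached-volumes-when-cordoned")
        client.update(setting, value="true")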
def test_disable_scheduling_on_cordoned_node(client, core_api, reset_default_disk_label, reset_disk_and_tag_annotations, reset_disk_settings)
-
Test replica scheduler: schedule replicas based on the Disable Scheduling On Cordoned Node setting
- Set Disable Scheduling On Cordoned Node to true.
- Set Replica Soft Anti-Affinity to false.
- Cordon one node.
- Create a volume with 3 replicas.
- Set Disable Scheduling On Cordoned Node to false.
  - The scheduler should then automatically create the three replicas that failed to schedule in the previous step.
- Attach this volume, write data to it, and check the data.
- Delete the test volume.
def test_disk_eviction_with_node_level_soft_anti_affinity_disabled(client, volume_name, request, settings_reset, reset_disk_settings)
-
Steps:
- Disable the setting Replica Node Level Soft Anti-Affinity.
- Create a volume. Make sure there is a replica on each worker node.
- Write some data to the volume.
- Add a new schedulable disk to node-1.
- Disable scheduling and enable eviction for the old disk on node-1.
- Verify that the replica on the old disk moves to the new disk.
- Set the replica count to 1 and delete the replicas on the other 2 nodes. Verify the data from the volume.
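A rough sketch of the disk-level eviction step, assuming the node objects returned by the Longhorn Python client expose a diskUpdate action and that disks carry path, allowScheduling, storageReserved, and evictionRequested fields; the disk names are placeholders.

    def evict_old_disk(client, node_name, old_disk_name, new_disk_path):
        node = client.by_id_node(node_name)
        disks = dict(node.disks)

        # Keep the old disk, but stop scheduling to it and request eviction.
        disks[old_disk_name] = {
            "path": disks[old_disk_name].path,
            "allowScheduling": False,
            "evictionRequested": True,
            "storageReserved": disks[old_disk_name].storageReserved,
        }

        # Add the new, schedulable disk.
        disks["new-disk"] = {
            "path": new_disk_path,
            "allowScheduling": True,
            "evictionRequested": False,
            "storageReserved": 0,
        }

        # diskUpdate replaces the node's disk map in one call.
        return node.diskUpdate(disks=disks)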
def test_disk_migration(client)
-
- Disable the node soft anti-affinity.
- Create a new host disk.
- Disable the default disk and add the extra disk with scheduling enabled for the current node.
- Launch a Longhorn volume with 1 replica. Then verify the only replica is scheduled to the new disk.
- Write random data to the volume then verify the data.
- Detach the volume.
- Unmount then remount the disk to another path. (disk migration)
- Create another Longhorn disk based on the migrated path.
- Verify the Longhorn disk state.
- The Longhorn disk added before the migration should become "unschedulable".
- The Longhorn disk created after the migration should become "schedulable".
- Verify the replica DiskID and the path are updated.
- Attach the volume. Then verify the state and the data.
def test_do_not_react_to_brief_kubelet_restart()
-
Test that the node controller ignores Ready == False due to KubeletNotReady for ten seconds before reacting.
Repeat the following five times:
- Verify status.conditions[type == Ready] == True for the Longhorn node we are running on.
- Kill the kubelet process (e.g. pkill kubelet).
- Verify status.conditions[type == Ready] != False for the Longhorn node we are running on at any point for at least ten seconds.
def test_drain_with_block_for_eviction_failure(client, core_api, volume_name, make_deployment_with_pvc)
-
Test drain never completes with node-drain-policy block-for-eviction
- Set node-drain-policy to block-for-eviction.
- Create a volume.
- Ensure (through soft anti-affinity, high replica count, and/or not enough disks) that an evicted replica of the volume cannot be scheduled elsewhere.
- Write data to the volume.
- Drain a node one of the volume's replicas is scheduled to.
- While the drain is ongoing:
  - Verify that node.status.autoEvicting == true.
  - Verify that replica.spec.evictionRequested == true.
- Verify the drain never completes.
- Stop the drain, then check that the volume is healthy and the data is correct.
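A compressed sketch of the drain-and-observe flow, reusing this module's own drain_node, check_node_auto_evict_state, check_replica_evict_state, and wait_drain_complete helpers; the timeout is illustrative and the wrapper function is hypothetical.

    import concurrent.futures

    def drain_and_expect_block(client, core_api, drain_target, volume_name):
        # Kick off the drain in the background so Longhorn state can be
        # observed while the drain is still in progress.
        executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = executor.submit(drain_node, core_api, drain_target)

        # node.status.autoEvicting == true on the drained node.
        check_node_auto_evict_state(client, drain_target, True)

        # replica.spec.evictionRequested == true for the replica on that node.
        check_replica_evict_state(client, volume_name, drain_target, True)

        # With block-for-eviction and nowhere to reschedule the replica,
        # the drain must not complete; the last argument expects failure.
        wait_drain_complete(future, 90, False)

        # The test then stops the drain (uncordons the node) and verifies
        # the volume is healthy and its data intact.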
def test_drain_with_block_for_eviction_if_contains_last_replica_success(client, core_api, make_deployment_with_pvc)
-
Test drain completes after evicting replicas with node-drain-policy block-for-eviction-if-contains-last-replica
- Set node-drain-policy to block-for-eviction-if-contains-last-replica.
- Create one volume with a single replica and another volume with three replicas.
- Ensure (through soft anti-affinity, low replica count, and/or enough disks) that evicted replicas of both volumes can be scheduled elsewhere.
- Write data to the volumes.
- Drain a node both volumes have a replica scheduled to.
- While the drain is ongoing:
  - Verify that the volume with three replicas becomes degraded.
  - Verify that node.status.autoEvicting == true.
  - Optionally verify that replica.spec.evictionRequested == true on the replica for the volume that only has one.
  - Optionally verify that replica.spec.evictionRequested == false on the replica for the volume that has three.
- Verify the drain completes.
- Uncordon the node.
- Verify the replica for the volume with one replica has moved to a different node.
- Verify the replica for the volume with three replicas has not moved.
- Verify that node.status.autoEvicting == false.
- Verify that replica.spec.evictionRequested == false on all replicas.
- Verify the data in both volumes.
def test_drain_with_block_for_eviction_success(client, core_api, volume_name, make_deployment_with_pvc)
-
Test drain completes after evicting a replica with node-drain-policy block-for-eviction
- Set node-drain-policy to block-for-eviction.
- Create a volume.
- Ensure (through soft anti-affinity, low replica count, and/or enough disks) that an evicted replica of the volume can be scheduled elsewhere.
- Write data to the volume.
- Drain a node one of the volume's replicas is scheduled to.
- While the drain is ongoing:
  - Verify that node.status.autoEvicting == true.
  - Optionally verify that replica.spec.evictionRequested == true.
- Verify the drain completes.
- Uncordon the node.
- Verify the replica on the drained node has moved to a different one.
- Verify that node.status.autoEvicting == false.
- Verify that replica.spec.evictionRequested == false.
- Verify the volume's data.
def test_node_config_annotation(client, core_api, reset_default_disk_label, reset_disk_and_tag_annotations, reset_disk_settings)
-
Test node feature: default disks/node configuration
- Set node 0 label and annotation.
- Set node 1 label but with an invalid annotation (invalid path and tag).
- Clean up the disks on nodes 0 and 1.
  - The initial default disk will not be recreated.
- Enable the setting create default disk labeled nodes.
- Wait for the node tag to update on node 0.
- Verify node 0 has the correct disk and tags set.
- Verify node 1 has no disk or tag.
- Update node 1's label and tag to be valid.
- Verify node 1 now has the correct disk and tags set.
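The label and annotation setup in these steps is done through the Kubernetes API; a hedged sketch follows, using the documented Longhorn label node.longhorn.io/create-default-disk=config and the node.longhorn.io/default-disks-config and node.longhorn.io/default-node-tags annotations. The disk path and tag values are placeholders.

    import json

    def set_node_config_annotation(core_api, node_name):
        disks_config = [{
            "path": "/var/lib/longhorn",   # placeholder disk path
            "allowScheduling": True,
        }]
        body = {
            "metadata": {
                "labels": {
                    "node.longhorn.io/create-default-disk": "config",
                },
                "annotations": {
                    "node.longhorn.io/default-disks-config":
                        json.dumps(disks_config),
                    "node.longhorn.io/default-node-tags":
                        json.dumps(["fast", "storage"]),
                },
            },
        }
        core_api.patch_node(node_name, body)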
def test_node_config_annotation_invalid(client, core_api, reset_default_disk_label, reset_disk_and_tag_annotations, reset_disk_settings)
-
Test invalid node annotations for default disks/node configuration
Case 1: The invalid disk annotation shouldn't interfere with the node controller.
- Set an invalid disk annotation.
- The node tag or disks won't be updated.
- Create a new disk. It will be updated by the node controller.
Case 2: The existing node disks remain unchanged even if the annotation is corrected.
- Set a valid disk annotation but set allowScheduling to false, etc.
- Make sure the current disk won't change.
Case 3: The correct annotation should be applied after cleaning up all disks.
- Delete all the disks on the node.
- Wait for the config from the disk annotation to be applied.
Case 4: The invalid tag annotation shouldn't interfere with the node controller.
- Clean up the node annotation and remove the node disks/tags.
- Set an invalid tag annotation.
- The disk and tag configuration will not be applied.
- Disks and tags can still be updated on the node.
Case 5: The existing node remains unchanged even if the tag annotation is fixed up.
- With existing tags, change the tag annotation.
- It won't change the current node's tags.
Case 6: After cleaning up all node tags, the correct annotation should be applied.
- Clean up the current tags.
- New tags from the node annotation should be applied.
Case 7: The same disk name in the annotation shouldn't interfere with the node controller.
- Create one disk for the node.
- Set the same name in the annotation, set the label, and enable "Create Default Disk on Labeled Nodes" in settings.
- The node tag or disks won't be updated.
def test_node_config_annotation_missing(client, core_api, reset_default_disk_label, reset_disk_and_tag_annotations, reset_disk_settings)
-
Test node labeled for configuration but no annotation
- Set setting create default disk labeled nodes to true.
- Set the config label on node 0 but leave the annotation empty.
- Verify that disk updates work.
- Verify that tag updates work.
- Verify that using the tag annotation for configuration works.
- After removing the tag annotation, verify that unsetting the node tag works fine.
- Set the tag annotation again. Verify the node is updated with the tag.
def test_node_controller_sync_disk_state(client)
-
Test node controller to sync disk state
- Set setting StorageMinimalAvailablePercentage to 100.
  - All the disks will become unschedulable.
- Restore setting StorageMinimalAvailablePercentage to the previous value.
  - All the disks will become schedulable.
def test_node_controller_sync_storage_available(client)
-
Test node controller sync storage available correctly
- Create a host disk test_disk on the current node.
- Write 1MiB of data to the disk, and run sync.
- Verify the disk's storageAvailable is updated to account for the file.
def test_node_controller_sync_storage_scheduled(client)
-
Test node controller sync storage scheduled correctly
- Wait until no disk has anything scheduled.
- Create a volume with "number of nodes" replicas.
- Confirm that each disk now has "volume size" scheduled.
- Confirm every disk is still schedulable.
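A small sketch of the scheduled-storage check, assuming the node objects returned by the Longhorn client expose a disks map whose entries carry a storageScheduled field; treat the field names as assumptions.

    def check_storage_scheduled(client, volume_size):
        # After creating a volume with one replica per node, every disk
        # that received a replica should report the volume size as
        # scheduled storage.
        for node in client.list_node():
            for _, disk in node.disks.items():
                assert disk.storageScheduled == volume_size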
def test_node_default_disk_added_back_with_extra_disk_unmounted(client)
-
[Node] Test adding the default disk back while an extra disk is unmounted on the node
- Clean up all disks on node 1.
- Recreate the default disk with "allowScheduling" disabled for node 1.
- Create a Longhorn volume and attach it to node 1.
- Use the Longhorn volume as an extra host disk and enable "allowScheduling" of the default disk for node 1.
- Verify all disks on node 1 are "Schedulable".
- Delete the default disk on node 1.
- Unmount the extra disk on node 1, and wait for it to become "Unschedulable".
- Create and add the default disk back on node 1.
- Wait and verify the default disk becomes "Schedulable".
- Mount the extra disk back on node 1.
- Wait and verify this extra disk becomes "Schedulable".
- Delete the host disk extra_disk.
def test_node_default_disk_labeled(client, core_api, random_disk_path, reset_default_disk_label, reset_disk_settings)
-
Test node feature: create default Disk according to the node label
Makes sure the created Disk matches the Default Data Path Setting.
- Add labels to nodes 0 and 1; don't add a label to node 2.
- Remove all the disks on nodes 1 and 2.
  - The initial default disk will not be recreated.
- Set setting default disk path to a random disk path.
- Set setting create default disk labeled node to true.
- Check node 0. It should still use the previous default disk path, because we didn't remove the disk from node 0.
- Check node 1. A new disk should be created at the random disk path.
- Check node 2. There should still be no disks.
def test_node_disk_update(client)
-
Test update node disks
The test will use Longhorn to create disks on the node.
- Get the current node.
- Try to delete all the disks. It should fail because scheduling is enabled.
- Create two disks, disk1 and disk2, and attach them to the current node.
- Add the two disks to the current node.
- Verify the two extra disks have been added to the node.
- Disable the two disks' scheduling, and set StorageReserved.
- Update the two disks.
- Validate all the disks' properties.
- Delete the other two disks. Validate that deletion works.
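A hedged sketch of the disable-scheduling/StorageReserved step, assuming attribute updates on the returned disk objects are accepted by diskUpdate; the 1 GiB reservation and the validation flow are illustrative.

    GIB = 1024 ** 3

    def reserve_and_validate(client, node_name, disk_names):
        node = client.by_id_node(node_name)
        disks = node.disks
        for name in disk_names:
            disks[name].allowScheduling = False
            disks[name].storageReserved = GIB
        node.diskUpdate(disks=disks)

        # Read the node back and validate the updated properties.
        node = client.by_id_node(node_name)
        for name in disk_names:
            assert node.disks[name].allowScheduling is False
            assert node.disks[name].storageReserved == GIB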
def test_node_eviction(client, core_api, csi_pv, pvc, pod_make, volume_name)
-
Test node eviction (assuming this is a 3-node cluster)
Case: node 1, 3 to node 1, 2 eviction
- Disable scheduling on node 2.
- Create pv, pvc, pod with a volume of 2 replicas.
- Write some data and get the checksum.
- Set 'Eviction Requested' to 'false' and enable scheduling on node 2.
- Set 'Eviction Requested' to 'true' and disable scheduling on node 3.
- Check volume 'healthy' and wait for replicas running on nodes 1 and 2.
- Check the volume data checksum.
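The 'Eviction Requested' toggles map to updates on the Longhorn node object; a minimal sketch, assuming client.update(node, ...) accepts allowScheduling and evictionRequested keyword arguments (named after the node fields referenced above).

    def move_eviction(client, evict_node_name, restore_node_name):
        # Stop scheduling to the node being evicted and request eviction.
        evict_node = client.by_id_node(evict_node_name)
        client.update(evict_node, allowScheduling=False, evictionRequested=True)

        # Cancel eviction and allow scheduling again on the other node.
        restore_node = client.by_id_node(restore_node_name)
        client.update(restore_node, allowScheduling=True, evictionRequested=False)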
def test_node_eviction_multiple_volume(client, core_api, csi_pv, pvc, pod_make, volume_name)
-
Test node eviction (assuming this is a 3-node cluster)
- Disable scheduling on node 1.
- Create pv, pvc, pod with volume 1 of 2 replicas.
- Write some data to volume 1 and get the checksum.
- Create pv, pvc, pod with volume 2 of 2 replicas.
- Write some data to volume 2 and get the checksum.
- Set 'Eviction Requested' to 'true' and disable scheduling on node 2.
- Set 'Eviction Requested' to 'false' and enable scheduling on node 1.
- Check volume 'healthy' and wait for replicas running on nodes 1 and 3.
- Delete the pods to detach volumes 1 and 2.
- Set 'Eviction Requested' to 'false' and enable scheduling on node 2.
- Set 'Eviction Requested' to 'true' and disable scheduling on node 1.
- Wait for replicas running on nodes 2 and 3.
- Create pod 1 and pod 2. Volumes 1 and 2 will be automatically attached.
- Check volume 'healthy', and replicas running on nodes 2 and 3.
- Check the volume data checksum for volumes 1 and 2.
def test_node_eviction_no_schedulable_node(client, core_api, csi_pv, pvc, pod_make, volume_name, settings_reset)
-
Test node eviction (assuming this is a 3-node cluster)
- Disable scheduling on node 3.
- Create pv, pvc, pod with a volume of 2 replicas.
- Write some data and get the checksum.
- Disable scheduling and set 'Eviction Requested' to 'true' on node 1.
- The volume should fail to schedule a new replica.
- Set 'Eviction Requested' to 'false' to cancel node 1 eviction.
- Check that the replica has the same hostID.
- Check the volume data checksum.
def test_node_eviction_soft_anti_affinity(client, core_api, csi_pv, pvc, pod_make, volume_name, settings_reset)
-
Test node eviction (assuming this is a 3-node cluster)
Case #1: node 1, 2 to node 2 eviction
- Disable scheduling on node 3.
- Create pv, pvc, pod with a volume of 2 replicas.
- Write some data and get the checksum.
- Set 'Eviction Requested' to 'true' and disable scheduling on node 1.
- Set 'Replica Node Level Soft Anti-Affinity' to 'true'.
- Check volume 'healthy' and wait for replicas running on node 2.
Case #2: node 2 to node 1, 3 eviction
- Enable scheduling on nodes 1 and 3.
- Set 'Replica Node Level Soft Anti-Affinity' to 'false'.
- Set 'Eviction Requested' to 'true' and disable scheduling on node 2.
- Check volume 'healthy' and wait for replicas running on nodes 1 and 3.
- Check the volume data checksum.
def test_node_umount_disk(client)
-
[Node] Test umount and delete the extra disk on the node
- Create a host disk and attach it to the current node.
- Disable the existing disk's scheduling on the current node.
- Add the disk to the current node.
- Wait for the node to recognize the disk.
- Create a volume with "number of nodes" replicas.
- Umount the disk from the host.
- Verify the disk READY condition becomes false.
  - Maximum and available storage become zero.
  - No change to storage scheduled and storage reserved.
- Try to delete the extra disk. It should fail because scheduling must be disabled first.
- Update the other disk on the node to allow scheduling. Disable scheduling for the extra disk.
- Mount the disk back.
- Verify the disk READY condition becomes true, along with the other states.
- Umount and delete the disk.
def test_replica_datapath_cleanup(client)
-
Test replica data path cleanup
Test prerequisites:
- Enable the Replica Node Level Soft Anti-Affinity setting.
- Create host disk extra_disk and add it to the current node.
- Disable all the disks except for the ones on the current node.
- Create a volume with 5 replicas (soft anti-affinity on).
  - To make sure both the default disk and the extra disk can have at least one replica.
  - Currently we don't have anti-affinity for disks on the same node.
- Verify the data paths for the replicas are created.
- Delete the volume.
- Verify the data paths for the replicas are deleted.
def test_replica_scheduler_exceed_over_provisioning(client)
-
Test replica scheduler: exceeding the overprovisioning parameter
- Set setting overprovisioning to 100 (default).
- Update every disk to have only 1G available for scheduling.
- Try to schedule a volume of 2G. The volume's scheduled condition should be false.
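The overprovisioning knob is a regular Longhorn setting; a minimal sketch, assuming the setting ID storage-over-provisioning-percentage backs the 'overprovisioning' name used in the steps.

    def set_over_provisioning(client, percentage):
        setting = client.by_id_setting("storage-over-provisioning-percentage")
        return client.update(setting, value=str(percentage))

    # In the spirit of the steps above:
    #   set_over_provisioning(client, 100)  # default
    #   ...limit each disk to 1G, create a 2G volume, expect scheduled == false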
def test_replica_scheduler_just_under_over_provisioning(client)
-
Test replica scheduler: just under the overprovisioning parameter
- Set setting overprovisioning to 100 (default).
- Get the maximum size of all the disks.
- Create a volume using maximum_size - 2MiB as the volume size.
- The volume's scheduled condition should be true.
- Make sure every replica landed on a different node's default disk.
def test_replica_scheduler_large_volume_fit_small_disk(client)
-
Test replica scheduler: do not schedule a large volume to a small disk
- Create a host disk small_disk and attach it to the current node.
- Create a new large volume.
- Verify the volume wasn't scheduled on the small_disk.
def test_replica_scheduler_no_disks(client)
-
Test replica scheduler with no disks available
- Delete all the disks on all the nodes.
- Create a volume.
- Wait for the volume condition scheduled to be false.
def test_replica_scheduler_rebuild_restore_is_too_big(set_random_backupstore, client)
-
Test replica scheduler: a rebuild/restore can be too big to fit a disk
- Create a small host disk with SIZE and add it to the current node.
- Create a volume with size SIZE.
- Disable all scheduling except for the small disk.
- Write data of size SIZE * 0.9 to the volume and make a backup.
- Create a restored volume with 1 replica from the backup.
  - Verify the restored volume cannot be scheduled since the existing data cannot fit in the small disk.
- Delete a replica of the volume.
  - Verify the volume reports scheduled = false because no suitable disk can be found for rebuilding the replica, since the replica with the existing data cannot fit in the small disk.
- Enable scheduling for the other disks, and disable scheduling for the small disk.
- Verify the volume reports scheduled = true. And verify the data.
- Clean up the volume.
- Verify the restored volume reports scheduled = true.
- Wait for the restored volume to complete restoration, then check the data.
def test_replica_scheduler_too_large_volume_fit_any_disks(client)
-
Test replica scheduler: volume is too large to fit any disks
- Disable all default disks on all nodes by setting storageReserved to the maximum size.
- Create a volume.
- Verify the volume's scheduled condition is false.
- Reduce storageReserved on all the disks to just enough for one replica.
- The volume should automatically change its scheduled condition to true.
- Attach the volume.
- Make sure every replica landed on a different node's default disk.
def test_replica_scheduler_update_minimal_available(client)
-
Test replica scheduler: update the setting minimal available
- Set setting minimal available to 100% (meaning nothing can be scheduled).
- Verify that every disk's schedulable condition becomes false.
- Create a volume. Verify it's unschedulable.
- Set setting minimal available back to the default value.
- The disks should become schedulable now.
- The volume should be scheduled now.
- Attach the volume.
- Make sure every replica landed on a different node's default disk.
def test_replica_scheduler_update_over_provisioning(client)
-
Test replica scheduler: update the overprovisioning setting
- Set setting overprovisioning to 0 (disable all scheduling).
- Create a new volume. Verify the volume's scheduled condition is false.
- Set setting overprovisioning to 200%.
- Verify the volume's scheduled condition now becomes true.
- Attach the volume.
- Make sure every replica landed on a different node's default disk.
def test_update_node(client)
-
Test update node scheduling
- Get the list of nodes.
- Update scheduling to false for the current node.
- Read it back to verify.
- Update scheduling to true for the current node.
- Read it back to verify.
def wait_drain_complete(future, timeout, copmpleted=True)
-
Wait for a concurrent.futures object to complete within a given duration.
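A minimal sketch of such a wait with concurrent.futures; only the helper's listed signature is known, so the behavior below is an assumption rather than a description of the real helper.

    import concurrent.futures

    def wait_future(future, timeout, expect_completed=True):
        try:
            future.result(timeout=timeout)  # re-raises any exception from the task
            finished = True
        except concurrent.futures.TimeoutError:
            finished = False
        assert finished == expect_completed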