Module tests.test_node

Functions

def check_all_replicas_evict_state(client, volume_name, expect_state)
def check_all_replicas_evict_state(client, volume_name, expect_state): # NOQA
    volume = client.by_id_volume(volume_name)
    for replica in volume.replicas:
        replica_info = get_replica_detail(replica.name)
        eviction_requested = replica_info["spec"]["evictionRequested"]
        assert eviction_requested is expect_state
def check_node_auto_evict_state(client, target_node, expect_state)
def check_node_auto_evict_state(client, target_node, expect_state): # NOQA
    def get_specific_node(client, target_node):
        nodes = client.list_node()
        for node in nodes:
            if node.id == target_node.id:
                return node

    for i in range(RETRY_COUNTS):
        node = get_specific_node(client, target_node)
        if node.autoEvicting is expect_state:
            break
        time.sleep(RETRY_INTERVAL)
    assert node.autoEvicting is expect_state
def check_replica_evict_state(client, volume_name, node, expect_state)
def check_replica_evict_state(client, volume_name, node, expect_state): # NOQA
    volume = client.by_id_volume(volume_name)
    replica_name = None
    for replica in volume.replicas:
        if replica.hostId == node.id:
            replica_name = replica.name
            break
    assert replica_name, f"Cannot find a {volume_name} replica on {node.id}"

    for i in range(RETRY_COUNTS):
        replica_info = get_replica_detail(replica_name)
        eviction_requested = replica_info["spec"]["evictionRequested"]
        if eviction_requested is expect_state:
            return
        time.sleep(RETRY_INTERVAL)

    assert eviction_requested is expect_state
def drain_node(core_api, node)
def drain_node(core_api, node): # NOQA    
    set_node_cordon(core_api, node.id, True)

    command = [
        "kubectl",
        "drain",
        node.id,
        "--ignore-daemonsets",
        "--delete-emptydir-data",
        "--grace-period=-1",
        "--force"
    ]

    subprocess.run(command, check=True)
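
The drain tests below submit drain_node to a ThreadPoolExecutor so that they can assert on intermediate state while `kubectl drain` is still blocking. A minimal sketch of that pattern, assuming the usual test context (core_api, a target node picked by the test, and the shared wait_drain_complete helper from the test utilities):

from concurrent.futures import ThreadPoolExecutor

# Run the blocking kubectl drain in the background.
executor = ThreadPoolExecutor(max_workers=5)
future = executor.submit(drain_node, core_api, evict_source_node)

# ...assert on intermediate state here, e.g. node.autoEvicting or
# replica.spec.evictionRequested, while the drain is in progress...

# wait_drain_complete(future, timeout) is assumed to come from the shared
# test utilities; it waits for the drain to finish (or time out).
wait_drain_complete(future, 60)
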
def get_all_replica_name(client, volume_name)
def get_all_replica_name(client, volume_name): # NOQA
    volume_replicas = []
    volume = client.by_id_volume(volume_name)
    for replica in volume.replicas:
        volume_replicas.append(replica.name)

    return volume_replicas
def get_default_disk_path()
def get_default_disk_path():
    if DATA_ENGINE == "v2":
        default_disk_path = BLOCK_DEV_PATH
    else:
        default_disk_path = DEFAULT_DISK_PATH
    return default_disk_path
def get_replica_detail(replica_name)
def get_replica_detail(replica_name):
    """
    Get allreplica information by this function
    """
    command = ["kubectl", "get",
               "replicas.longhorn.io",
               "-n",
               "longhorn-system",
               replica_name,
               "-o",
               "yaml"]
    output = subprocess.check_output(command, text=True)
    replica_info = yaml.safe_load(output)
    return replica_info

Get detailed information for a replica by name via kubectl
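
The returned dict mirrors the replica custom resource (metadata/spec/status), which is how the eviction helpers above read `spec.evictionRequested`. A minimal usage sketch (the replica name below is a placeholder, not a real replica):

# "example-replica-name" is a placeholder replica name.
replica_info = get_replica_detail("example-replica-name")
eviction_requested = replica_info["spec"]["evictionRequested"]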

def make_replica_on_specific_node(client, volume_name, node)
Expand source code
def make_replica_on_specific_node(client, volume_name, node): # NOQA
    volume = client.by_id_volume(volume_name)
    volume.updateReplicaCount(replicaCount=1)
    for replica in volume.replicas:
        if replica.hostId != node.id:
            volume.replicaRemove(name=replica.name)
    wait_for_volume_replica_count(client, volume_name, 1)
def random_disk_path()
@pytest.fixture
def random_disk_path():
    return f"{DEFAULT_DISK_PATH.rstrip('/')}-" + \
      "".join(choice(ascii_lowercase + digits) for _ in range(6))
def reset_default_disk_label()
@pytest.yield_fixture
def reset_default_disk_label():
    k8sapi = get_core_api_client()
    lhapi = get_longhorn_api_client()
    nodes = lhapi.list_node()
    for node in nodes:
        k8sapi.patch_node(node.id, {
            "metadata": {
                "labels": {
                    CREATE_DEFAULT_DISK_LABEL: None
                }
            }
        })

    yield

    k8sapi = get_core_api_client()
    lhapi = get_longhorn_api_client()
    nodes = lhapi.list_node()
    for node in nodes:
        k8sapi.patch_node(node.id, {
            "metadata": {
                "labels": {
                    CREATE_DEFAULT_DISK_LABEL: None
                }
            }
        })
def reset_disk_and_tag_annotations()
@pytest.yield_fixture
def reset_disk_and_tag_annotations():
    k8sapi = get_core_api_client()
    lhapi = get_longhorn_api_client()
    nodes = lhapi.list_node()
    for node in nodes:
        k8sapi.patch_node(node.id, {
            "metadata": {
                "annotations": {
                    DEFAULT_DISK_CONFIG_ANNOTATION: None,
                    DEFAULT_NODE_TAG_ANNOTATION: None,
                }
            }
        })

    yield

    k8sapi = get_core_api_client()
    lhapi = get_longhorn_api_client()
    nodes = lhapi.list_node()
    for node in nodes:
        k8sapi.patch_node(node.id, {
            "metadata": {
                "annotations": {
                    DEFAULT_DISK_CONFIG_ANNOTATION: None,
                    DEFAULT_NODE_TAG_ANNOTATION: None,
                }
            }
        })
def reset_disk_settings()
@pytest.yield_fixture
def reset_disk_settings():
    api = get_longhorn_api_client()
    setting = api.by_id_setting(SETTING_CREATE_DEFAULT_DISK_LABELED_NODES)
    api.update(setting, value="false")
    setting = api.by_id_setting(SETTING_DEFAULT_DATA_PATH)
    api.update(setting, value=DEFAULT_DISK_PATH)

    yield

    api = get_longhorn_api_client()
    setting = api.by_id_setting(SETTING_CREATE_DEFAULT_DISK_LABELED_NODES)
    api.update(setting, value="false")
    setting = api.by_id_setting(SETTING_DEFAULT_DATA_PATH)
    api.update(setting, value=DEFAULT_DISK_PATH)
def test_auto_detach_volume_when_node_is_cordoned(client, core_api, volume_name)
@pytest.mark.v2_volume_test  # NOQA
@pytest.mark.node  # NOQA
def test_auto_detach_volume_when_node_is_cordoned(client, core_api, volume_name):  # NOQA
    """
    Test auto detach volume when node is cordoned

    1. Set `detach-manually-attached-volumes-when-cordoned` to `false`.
    2. Create a volume and attach it to the node through the API (manually).
    3. Cordon the node.
    4. Set `detach-manually-attached-volumes-when-cordoned` to `true`.
    5. Volume will be detached automatically.
    """

    # Set `Detach Manually Attached Volumes When Cordoned` to false
    detach_manually_attached_volumes_when_cordoned = \
        client.by_id_setting(
            SETTING_DETACH_MANUALLY_ATTACHED_VOLUMES_WHEN_CORDONED)
    client.update(detach_manually_attached_volumes_when_cordoned,
                  value="false")

    # Create a volume
    volume = client.create_volume(name=volume_name,
                                  size=SIZE,
                                  numberOfReplicas=3,
                                  dataEngine=DATA_ENGINE)
    volume = common.wait_for_volume_detached(client,
                                             volume_name)
    assert volume.restoreRequired is False

    # Attach to the node
    host_id = get_self_host_id()
    volume.attach(hostId=host_id)
    volume = common.wait_for_volume_healthy(client, volume_name)
    assert volume.restoreRequired is False

    # Cordon the node
    set_node_cordon(core_api, host_id, True)

    # Volume is still attached for a while
    time.sleep(NODE_UPDATE_WAIT_INTERVAL)
    volume = common.wait_for_volume_healthy(client, volume_name)
    assert volume.restoreRequired is False

    # Set `Detach Manually Attached Volumes When Cordoned` to true
    client.update(detach_manually_attached_volumes_when_cordoned, value="true")

    # Volume should be detached
    volume = common.wait_for_volume_detached(client, volume_name)
    assert volume.restoreRequired is False

    # Delete the Volume
    client.delete(volume)
    common.wait_for_volume_delete(client, volume_name)

    volumes = client.list_volume().data
    assert len(volumes) == 0

Test auto detach volume when node is cordoned

  1. Set detach-manually-attached-volumes-when-cordoned to false.
  2. Create a volume and attach it to the node through the API (manually).
  3. Cordon the node.
  4. Set detach-manually-attached-volumes-when-cordoned to true.
  5. Volume will be detached automatically.
def test_disable_scheduling_on_cordoned_node(client, core_api, reset_default_disk_label, reset_disk_and_tag_annotations, reset_disk_settings)
@pytest.mark.v2_volume_test   # NOQA
@pytest.mark.node  # NOQA
def test_disable_scheduling_on_cordoned_node(client,  # NOQA
                                             core_api,  # NOQA
                                             reset_default_disk_label,  # NOQA
                                             reset_disk_and_tag_annotations,  # NOQA
                                             reset_disk_settings):  # NOQA
    """
    Test replica scheduler: schedule replica based on
    `Disable Scheduling On Cordoned Node` setting

    1. Set `Disable Scheduling On Cordoned Node` to true.
    2. Set `Replica Soft Anti-Affinity` to false.
    3. Set cordon on one node.
    4. Create a volume with 3 replicas.
    5. Set `Disable Scheduling On Cordoned Node` to false.
    6. The scheduler should then automatically create the three replicas
       that previously failed to schedule.
    7. Attach this volume, write data to it and check the data.
    8. Delete the test volume.
    """
    # Set `Disable Scheduling On Cordoned Node` to true
    disable_scheduling_on_cordoned_node_setting = \
        client.by_id_setting(SETTING_DISABLE_SCHEDULING_ON_CORDONED_NODE)
    client.update(disable_scheduling_on_cordoned_node_setting, value="true")

    # Set `Replica Node Level Soft Anti-Affinity` to false
    node_soft_anti_affinity_setting = \
        client.by_id_setting(SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY)
    client.update(node_soft_anti_affinity_setting, value="false")

    # Get one node
    nodes = client.list_node()
    node = nodes[0]

    # Set cordon on node
    set_node_cordon(core_api, node.name, True)

    node = wait_for_node_schedulable_condition(
        get_longhorn_api_client(), node.name)

    # Node schedulable condition should be false
    assert node.conditions[NODE_CONDITION_SCHEDULABLE]["status"] ==  \
        "False"
    assert node.conditions[NODE_CONDITION_SCHEDULABLE]["reason"] == \
        "KubernetesNodeCordoned"

    # Create a volume
    vol_name = common.generate_volume_name()
    client.create_volume(name=vol_name, size=SIZE,
                         numberOfReplicas=len(nodes),
                         dataEngine=DATA_ENGINE)
    common.wait_for_volume_detached(client, vol_name)

    # Set uncordon on node
    set_node_cordon(core_api, node.name, False)

    # Node schedulable condition should be true
    node = wait_for_node_schedulable_condition(
        get_longhorn_api_client(), node.name)
    assert node.conditions[NODE_CONDITION_SCHEDULABLE]["status"] == "True"

    # Created volume schedulable condition change to true
    volume = common.wait_for_volume_condition_scheduled(client, vol_name,
                                                        "status",
                                                        CONDITION_STATUS_TRUE)
    volume = common.wait_for_volume_detached(client, vol_name)
    assert volume.state == "detached"
    assert volume.created != ""

    # Attach the volume and write data then check it.
    host_id = get_self_host_id()
    volume.attach(hostId=host_id)
    volume = common.wait_for_volume_healthy(client, vol_name)

    assert len(volume.replicas) == 3

    data = common.write_volume_random_data(volume)
    common.check_volume_data(volume, data)

    # Cleanup volume
    cleanup_volume_by_name(client, vol_name)

Test replica scheduler: schedule replica based on Disable Scheduling On Cordoned Node setting

  1. Set Disable Scheduling On Cordoned Node to true.
  2. Set Replica Soft Anti-Affinity to false.
  3. Set cordon on one node.
  4. Create a volume with 3 replicas.
  5. Set Disable Scheduling On Cordoned Node to false.
  6. The scheduler should then automatically create the three replicas that previously failed to schedule.
  7. Attach this volume, write data to it and check the data.
  8. Delete the test volume.
def test_disk_eviction_with_node_level_soft_anti_affinity_disabled(client, volume_name, request, settings_reset, reset_disk_settings)
def test_disk_eviction_with_node_level_soft_anti_affinity_disabled(client, # NOQA
                                                                   volume_name, # NOQA
                                                                   request, # NOQA
                                                                   settings_reset, # NOQA
                                                                   reset_disk_settings): # NOQA
    """
    Steps:

    1. Disable the setting `Replica Node Level Soft Anti-affinity`
    2. Create a volume. Make sure there is a replica on each worker node.
    3. Write some data to the volume.
    4. Add a new schedulable disk to node-1.
    5. Disable the scheduling and enable eviction for the old disk on node-1.
    6. Verify that the replica on the old disk moves to the new disk.
    7. Set the replica count to 1 and delete the replicas on the other 2
       nodes. Verify the data from the volume.
    """
    # Step 1
    node_soft_anti_affinity_setting = \
        client.by_id_setting(SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY)
    client.update(node_soft_anti_affinity_setting, value="false")

    # Step 2
    nodes = client.list_node()
    volume = client.create_volume(name=volume_name,
                                  size=SIZE, numberOfReplicas=len(nodes))
    common.wait_for_volume_detached(client, volume_name)

    lht_hostId = get_self_host_id()
    volume.attach(hostId=lht_hostId)
    volume = common.wait_for_volume_healthy(client, volume_name)

    # Step 3
    data = common.write_volume_random_data(volume)
    common.check_volume_data(volume, data)

    # Step 4
    node = client.by_id_node(lht_hostId)
    test_disk_path = create_host_disk(client, "vol-test", str(Gi), lht_hostId)
    test_disk = {"path": test_disk_path, "allowScheduling": True}

    update_disks = get_update_disks(node.disks)
    update_disks["test-disk"] = test_disk
    node = update_node_disks(client, node.name, disks=update_disks,
                             retry=True)
    node = common.wait_for_disk_update(client, lht_hostId, len(update_disks))
    assert len(node.disks) == len(update_disks)

    # Step 5
    for disk in update_disks.values():
        if disk["path"] != test_disk_path:
            disk.allowScheduling = False
            disk.evictionRequested = True

    node = update_node_disks(client, node.name, disks=update_disks, retry=True)

    # Step 6
    replica_path = test_disk_path + '/replicas'
    assert os.path.isdir(replica_path)

    # Since https://github.com/longhorn/longhorn-manager/pull/2138, the node
    # controller is responsible for triggering replica eviction. If the timing
    # of the node controller and node monitor are off, the node controller
    # may take extra time to do so. Wait for evidence eviction is in progress
    # before proceeding.
    wait_for_volume_replica_count(client, volume.name,
                                  volume.numberOfReplicas + 1)

    for i in range(common.RETRY_COMMAND_COUNT):
        if len(os.listdir(replica_path)) > 0:
            break
        time.sleep(common.RETRY_EXEC_INTERVAL)
    assert len(os.listdir(replica_path)) > 0
    volume = wait_for_volume_healthy(client, volume_name)

    # Step 7
    replica_count = 1
    volume.updateReplicaCount(replicaCount=replica_count)
    volume = wait_for_volume_healthy(client, volume_name)

    done = False
    for i in range(RETRY_COUNTS):
        other_replicas_deleted = True
        new_replica_available = False
        volume = client.by_id_volume(volume_name)
        for r in volume.replicas:
            if r.hostId != lht_hostId:
                try:
                    volume.replicaRemove(name=r.name)
                except Exception:
                    other_replicas_deleted = False
                    break
            else:
                if r.diskPath == test_disk_path and is_replica_available(r):
                    new_replica_available = True

        if other_replicas_deleted and \
                len(volume.replicas) == 1 and new_replica_available:
            done = True
            break
        time.sleep(RETRY_INTERVAL)
    assert done

    common.check_volume_data(volume, data)

    # Remove volumes so that no volume is mounted on the extra disk
    def finalizer():
        common.cleanup_all_volumes(client)

    request.addfinalizer(finalizer)

Steps:

  1. Disable the setting Replica Node Level Soft Anti-affinity
  2. Create a volume. Make sure there is a replica on each worker node.
  3. Write some data to the volume.
  4. Add a new schedulable disk to node-1.
  5. Disable the scheduling and enable eviction for the old disk on node-1.
  6. Verify that the replica on the old disk moves to the new disk.
  7. Set the replica count to 1 and delete the replicas on the other 2 nodes. Verify the data from the volume.
def test_disk_migration(client)
@pytest.mark.node  # NOQA
def test_disk_migration(client):  # NOQA
    """
    1. Disable the node soft anti-affinity.
    2. Create a new host disk.
    3. Disable the default disk and add the extra disk with scheduling enabled
       for the current node.
    4. Launch a Longhorn volume with 1 replica.
       Then verify the only replica is scheduled to the new disk.
    5. Write random data to the volume then verify the data.
    6. Detach the volume.
    7. Unmount then remount the disk to another path. (disk migration)
    8. Create another Longhorn disk based on the migrated path.
    9. Verify the Longhorn disk state.
       - The Longhorn disk added before the migration should
         become "unschedulable".
       - The Longhorn disk created after the migration should
         become "schedulable".
    10. Verify the replica DiskID and the path are updated.
    11. Attach the volume. Then verify the state and the data.
    """
    setting = client.by_id_setting(SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY)
    client.update(setting, value="false")

    lht_hostId = get_self_host_id()

    node = client.by_id_node(lht_hostId)
    update_disks = get_update_disks(node.disks)
    for fsid, disk in iter(update_disks.items()):
        disk.allowScheduling = False
        update_disks[fsid] = disk
    update_node_disks(client, node.name, disks=update_disks, retry=True)
    disk_vol_name = 'vol-disk'
    extra_disk_name = "extra-disk"
    extra_disk_path = create_host_disk(
        client, disk_vol_name, str(Gi), lht_hostId)
    extra_disk_manifest = \
        {"path": extra_disk_path, "allowScheduling": True, "tags": ["extra"]}
    update_disks[extra_disk_name] = extra_disk_manifest
    update_node_disks(client, node.name, disks=update_disks, retry=True)
    node = common.wait_for_disk_update(client, lht_hostId,
                                       len(update_disks))
    assert len(node.disks) == len(update_disks)
    node = wait_for_disk_conditions(
        client, node.name,
        extra_disk_name, DISK_CONDITION_SCHEDULABLE, CONDITION_STATUS_TRUE)
    extra_disk = node.disks[extra_disk_name]

    vol_name = common.generate_volume_name()
    client.create_volume(name=vol_name, size=SIZE,
                         numberOfReplicas=1, diskSelector=["extra"])
    common.wait_for_volume_condition_scheduled(
        client, vol_name, "status", CONDITION_STATUS_TRUE)
    volume = common.wait_for_volume_detached(client, vol_name)
    volume.attach(hostId=lht_hostId)
    volume = common.wait_for_volume_healthy(client, vol_name)

    assert len(volume.replicas) == 1
    assert volume.replicas[0].running
    assert volume.replicas[0].hostId == lht_hostId
    assert volume.replicas[0].diskID == extra_disk.diskUUID
    assert volume.replicas[0].diskPath == extra_disk.path

    data = common.write_volume_random_data(volume)
    common.check_volume_data(volume, data)

    volume.detach()
    volume = common.wait_for_volume_detached(client, vol_name)

    # Mount the volume disk to another path
    common.cleanup_host_disk(extra_disk_path)

    migrated_disk_path = os.path.join(
        DIRECTORY_PATH, disk_vol_name+"-migrated")
    dev = get_volume_endpoint(client.by_id_volume(disk_vol_name))
    common.mount_disk(dev, migrated_disk_path)

    node = client.by_id_node(lht_hostId)
    update_disks = get_update_disks(node.disks)
    migrated_disk_name = "migrated-disk"
    migrated_disk_manifest = \
        {"path": migrated_disk_path, "allowScheduling": True,
         "tags": ["extra"]}
    update_disks[migrated_disk_name] = migrated_disk_manifest
    update_node_disks(client, node.name, disks=update_disks, retry=True)
    node = common.wait_for_disk_update(client, lht_hostId,
                                       len(update_disks))
    assert len(node.disks) == len(update_disks)
    wait_for_disk_conditions(
        client, node.name,
        extra_disk_name, DISK_CONDITION_SCHEDULABLE, CONDITION_STATUS_FALSE)
    node = wait_for_disk_conditions(
        client, node.name,
        migrated_disk_name, DISK_CONDITION_SCHEDULABLE, CONDITION_STATUS_TRUE)
    migrated_disk = node.disks[migrated_disk_name]
    assert migrated_disk.diskUUID == extra_disk.diskUUID

    replica_migrated = False
    for i in range(RETRY_COUNTS):
        volume = client.by_id_volume(vol_name)
        assert len(volume.replicas) == 1
        replica = volume.replicas[0]
        assert replica.hostId == lht_hostId
        if replica.diskID == migrated_disk.diskUUID and \
                replica.diskPath == migrated_disk.path:
            replica_migrated = True
            break
        time.sleep(RETRY_INTERVAL)
    assert replica_migrated

    volume.attach(hostId=lht_hostId)
    volume = common.wait_for_volume_healthy(client, vol_name)
    common.check_volume_data(volume, data)

    cleanup_volume_by_name(client, vol_name)

  1. Disable the node soft anti-affinity.
  2. Create a new host disk.
  3. Disable the default disk and add the extra disk with scheduling enabled for the current node.
  4. Launch a Longhorn volume with 1 replica. Then verify the only replica is scheduled to the new disk.
  5. Write random data to the volume then verify the data.
  6. Detach the volume.
  7. Unmount then remount the disk to another path. (disk migration)
  8. Create another Longhorn disk based on the migrated path.
  9. Verify the Longhorn disk state.
     - The Longhorn disk added before the migration should become "unschedulable".
     - The Longhorn disk created after the migration should become "schedulable".
  10. Verify the replica DiskID and the path are updated.
  11. Attach the volume. Then verify the state and the data.
def test_do_not_react_to_brief_kubelet_restart()
@pytest.mark.skip(reason="TODO")  # NOQA
def test_do_not_react_to_brief_kubelet_restart():
    """
    Test the node controller ignores Ready == False due to KubeletNotReady for
    ten seconds before reacting.

    Repeat the following five times:
    1. Verify status.conditions[type == Ready] == True for the Longhorn node we
       are running on.
    2. Kill the kubelet process (e.g. `pkill kubelet`).
    3. Verify status.conditions[type == Ready] != False for the Longhorn node
       we are running on at any point for at least ten seconds.
    """

Test the node controller ignores Ready == False due to KubeletNotReady for ten seconds before reacting.

Repeat the following five times:

  1. Verify status.conditions[type == Ready] == True for the Longhorn node we are running on.
  2. Kill the kubelet process (e.g. pkill kubelet).
  3. Verify status.conditions[type == Ready] != False for the Longhorn node we are running on at any point for at least ten seconds.
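
A minimal sketch of how this could be implemented, assuming the usual client fixture, that the test process can kill the host kubelet (e.g. via `pkill kubelet`), and that the Longhorn node conditions are keyed by type as elsewhere in this module; the `_ready_status` helper is illustrative, not part of the module:

import subprocess
import time

def _ready_status(client, node_id):  # hypothetical helper for this sketch
    node = client.by_id_node(node_id)
    return node.conditions["Ready"]["status"]

node_id = get_self_host_id()
for _ in range(5):
    assert _ready_status(client, node_id) == "True"

    # Kill the kubelet; the host is expected to restart it shortly.
    subprocess.run(["pkill", "kubelet"], check=False)

    # The node controller should tolerate the brief KubeletNotReady, so the
    # Ready condition must not flip to False within the next ten seconds.
    deadline = time.time() + 10
    while time.time() < deadline:
        assert _ready_status(client, node_id) != "False"
        time.sleep(1)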

def test_drain_with_block_for_eviction_failure(client, core_api, volume_name, make_deployment_with_pvc)
@pytest.mark.v2_volume_test  # NOQA
def test_drain_with_block_for_eviction_failure(client, # NOQA
                                               core_api, # NOQA
                                               volume_name, # NOQA
                                               make_deployment_with_pvc): # NOQA
    """
    Test drain never completes with node-drain-policy block-for-eviction

    1. Set `node-drain-policy` to `block-for-eviction`.
    2. Create a volume.
    3. Ensure (through soft anti-affinity, high replica count, and/or not
       enough disks) that an evicted replica of the volume cannot be scheduled
       elsewhere.
    4. Write data to the volume.
    5. Drain a node one of the volume's replicas is scheduled to.
    6. While the drain is ongoing:
       - Verify that `node.status.autoEvicting == true`.
       - Verify that `replica.spec.evictionRequested == true`.
    7. Verify the drain never completes.
    8. Stop the drain, then check that the volume is healthy and the data
       is correct.
    """
    host_id = get_self_host_id()
    nodes = client.list_node()
    evict_nodes = [node for node in nodes if node.id != host_id][:2]
    evict_source_node = evict_nodes[0]

    # Step 1
    setting = client.by_id_setting(
        SETTING_NODE_DRAIN_POLICY)
    client.update(setting, value="block-for-eviction")

    # Step 2, 3, 4
    volume, pod, checksum, _ = create_deployment_and_write_data(client,
                                                                core_api,
                                                                make_deployment_with_pvc, # NOQA
                                                                volume_name,
                                                                str(1 * Gi),
                                                                3,
                                                                DATA_SIZE_IN_MB_3, host_id) # NOQA

    # Step 5
    executor = ThreadPoolExecutor(max_workers=5)
    future = executor.submit(drain_node, core_api, evict_source_node)

    # Step 6
    check_replica_evict_state(client, volume_name, evict_source_node, True)
    check_node_auto_evict_state(client, evict_source_node, True)

    # Step 7
    wait_drain_complete(future, 90, False)

    # Step 8
    set_node_cordon(core_api, evict_source_node.id, False)
    wait_for_volume_healthy(client, volume_name)
    data_path = '/data/test'
    test_data_checksum = get_pod_data_md5sum(core_api,
                                             pod,
                                             data_path)
    assert checksum == test_data_checksum

Test drain never completes with node-drain-policy block-for-eviction

  1. Set node-drain-policy to block-for-eviction.
  2. Create a volume.
  3. Ensure (through soft anti-affinity, high replica count, and/or not enough disks) that an evicted replica of the volume cannot be scheduled elsewhere.
  4. Write data to the volume.
  5. Drain a node one of the volume's replicas is scheduled to.
  6. While the drain is ongoing:
     - Verify that node.status.autoEvicting == true.
     - Verify that replica.spec.evictionRequested == true.
  7. Verify the drain never completes.
  8. Stop the drain, then check that the volume is healthy and the data is correct.
def test_drain_with_block_for_eviction_if_contains_last_replica_success(client, core_api, make_deployment_with_pvc)
def test_drain_with_block_for_eviction_if_contains_last_replica_success(client, # NOQA
                                                                        core_api, # NOQA
                                                                        make_deployment_with_pvc): # NOQA
    """
    Test drain completes after evicting replicas with node-drain-policy
    block-for-eviction-if-contains-last-replica

    1. Set `node-drain-policy` to
       `block-for-eviction-if-contains-last-replica`.
    2. Create one volume with a single replica and another volume with three
       replicas.
    3. Ensure (through soft anti-affinity, low replica count, and/or enough
    disks) that evicted replicas of both volumes can be scheduled elsewhere.
    4. Write data to the volumes.
    5. Drain a node both volumes have a replica scheduled to.
    6. While the drain is ongoing:
       - Verify that the volume with three replicas becomes degraded.
       - Verify that `node.status.autoEvicting == true`.
       - Optionally verify that `replica.spec.evictionRequested == true` on the
         replica for the volume that only has one.
       - Optionally verify that `replica.spec.evictionRequested == false` on
         the replica for the volume that has three.
    7. Verify the drain completes.
    8. Uncordon the node.
    9. Verify the replica for the volume with one replica has moved to a
       different node.
    10. Verify the replica for the volume with three replicas has not moved.
    11. Verify that `node.status.autoEvicting == false`.
    12. Verify that `replica.spec.evictionRequested == false` on all replicas.
    13. Verify the data in both volumes.
    """
    host_id = get_self_host_id()
    nodes = client.list_node()
    evict_nodes = [node for node in nodes if node.id != host_id][:2]
    evict_source_node = evict_nodes[0]
    # Create extra disk on current node
    node = client.by_id_node(host_id)
    disks = node.disks

    disk_volume_name = 'vol-disk'
    disk_volume = client.create_volume(name=disk_volume_name,
                                       size=str(2 * Gi),
                                       numberOfReplicas=1,
                                       dataLocality="strict-local")
    disk_volume = wait_for_volume_detached(client, disk_volume_name)

    disk_volume.attach(hostId=host_id)
    disk_volume = wait_for_volume_healthy(client, disk_volume_name)
    disk_path = prepare_host_disk(get_volume_endpoint(disk_volume),
                                  disk_volume_name)
    disk = {"path": disk_path, "allowScheduling": True}

    update_disk = get_update_disks(disks)
    update_disk["disk1"] = disk

    node = update_node_disks(client, node.name, disks=update_disk, retry=True)
    node = wait_for_disk_update(client, host_id, len(update_disk))
    assert len(node.disks) == len(update_disk)

    # Step 1
    setting = client.by_id_setting(
        SETTING_NODE_DRAIN_POLICY)
    client.update(setting, value="block-for-eviction-if-contains-last-replica")

    # Step 2, 3
    volume1_name = "vol-1"
    volume2_name = "vol-2"
    volume1, pod1, checksum1, _ = create_deployment_and_write_data(client,
                                                                   core_api,
                                                                   make_deployment_with_pvc, # NOQA
                                                                   volume1_name, # NOQA
                                                                   str(1 * Gi),
                                                                   3,
                                                                   DATA_SIZE_IN_MB_3, # NOQA
                                                                   host_id) # NOQA
    volume2, pod2, checksum2, _ = create_deployment_and_write_data(client,
                                                                   core_api,
                                                                   make_deployment_with_pvc,  # NOQA
                                                                   volume2_name, # NOQA
                                                                   str(1 * Gi),
                                                                   3,
                                                                   DATA_SIZE_IN_MB_3, # NOQA
                                                                   host_id) # NOQA
    # Reduce volume 1 to a single replica located on evict_source_node
    make_replica_on_specific_node(client, volume1_name, evict_source_node)
    volume2_replicas = get_all_replica_name(client, volume2_name)

    # Step 5
    executor = ThreadPoolExecutor(max_workers=5)
    future = executor.submit(drain_node, core_api, evict_source_node)

    # Step 6
    check_replica_evict_state(client, volume1_name, evict_source_node, True)
    check_node_auto_evict_state(client, evict_source_node, True)

    volume2 = wait_for_volume_degraded(client, volume2_name)
    check_all_replicas_evict_state(client, volume2_name, False)

    # Step 7
    wait_drain_complete(future, 60)

    # Step 8
    set_node_cordon(core_api, evict_source_node.id, False)

    # Step 9
    volume1 = client.by_id_volume(volume1_name)
    wait_for_volume_replica_count(client, volume1_name, 1)
    for replica in volume1.replicas:
        assert replica.hostId != evict_source_node.id

    # Step 10
    # Verify volume2 replicas not moved by check replica name
    # stored before the node drain
    volume2 = wait_for_volume_healthy(client, volume2_name)
    for replica in volume2.replicas:
        assert replica.name in volume2_replicas

    # Step 11
    check_node_auto_evict_state(client, evict_source_node, False)

    # Step 12
    check_all_replicas_evict_state(client, volume1_name, False)
    check_all_replicas_evict_state(client, volume2_name, False)

    # Step 13
    data_path = '/data/test'
    test_data_checksum1 = get_pod_data_md5sum(core_api,
                                              pod1,
                                              data_path)
    assert checksum1 == test_data_checksum1

    test_data_checksum2 = get_pod_data_md5sum(core_api,
                                              pod2,
                                              data_path)
    assert checksum2 == test_data_checksum2

Test drain completes after evicting replicas with node-drain-policy block-for-eviction-if-contains-last-replica

  1. Set node-drain-policy to block-for-eviction-if-contains-last-replica.
  2. Create one volume with a single replica and another volume with three replicas.
  3. Ensure (through soft anti-affinity, low replica count, and/or enough disks) that evicted replicas of both volumes can be scheduled elsewhere.
  4. Write data to the volumes.
  5. Drain a node both volumes have a replica scheduled to.
  6. While the drain is ongoing:
     - Verify that the volume with three replicas becomes degraded.
     - Verify that node.status.autoEvicting == true.
     - Optionally verify that replica.spec.evictionRequested == true on the replica for the volume that only has one.
     - Optionally verify that replica.spec.evictionRequested == false on the replica for the volume that has three.
  7. Verify the drain completes.
  8. Uncordon the node.
  9. Verify the replica for the volume with one replica has moved to a different node.
  10. Verify the replica for the volume with three replicas has not moved.
  11. Verify that node.status.autoEvicting == false.
  12. Verify that replica.spec.evictionRequested == false on all replicas.
  13. Verify the data in both volumes.
def test_drain_with_block_for_eviction_success(client, core_api, volume_name, make_deployment_with_pvc)
@pytest.mark.v2_volume_test  # NOQA
def test_drain_with_block_for_eviction_success(client, # NOQA
                                               core_api, # NOQA
                                               volume_name, # NOQA
                                               make_deployment_with_pvc): # NOQA
    """
    Test drain completes after evicting replica with node-drain-policy
    block-for-eviction

    1. Set `node-drain-policy` to `block-for-eviction`.
    2. Create a volume.
    3. Ensure (through soft anti-affinity, low replica count, and/or enough
       disks) that an evicted replica of the volume can be scheduled elsewhere.
    4. Write data to the volume.
    5. Drain a node one of the volume's replicas is scheduled to.
    6. While the drain is ongoing:
       - Verify that `node.status.autoEvicting == true`.
       - Optionally verify that `replica.spec.evictionRequested == true`.
    7. Verify the drain completes.
    8. Uncordon the node.
    9. Verify the replica on the drained node has moved to a different one.
    10. Verify that `node.status.autoEvicting == false`.
    11. Verify that `replica.spec.evictionRequested == false`.
    12. Verify the volume's data.
    """
    host_id = get_self_host_id()
    nodes = client.list_node()
    evict_nodes = [node for node in nodes if node.id != host_id][:2]
    evict_source_node = evict_nodes[0]
    evict_target_node = evict_nodes[1]

    # Step 1
    setting = client.by_id_setting(
        SETTING_NODE_DRAIN_POLICY)
    client.update(setting, value="block-for-eviction")

    # Step 2, 3, 4
    volume, pod, checksum, _ = create_deployment_and_write_data(client,
                                                                core_api,
                                                                make_deployment_with_pvc, # NOQA
                                                                volume_name,
                                                                str(1 * Gi),
                                                                3,
                                                                DATA_SIZE_IN_MB_3, host_id) # NOQA

    # Make replica not locate on eviction target node
    volume.updateReplicaCount(replicaCount=2)
    for replica in volume.replicas:
        if replica.hostId == evict_target_node.id:
            volume.replicaRemove(name=replica.name)
            break

    wait_for_volume_replica_count(client, volume_name, 2)

    # Step 5
    # drain eviction source node
    executor = ThreadPoolExecutor(max_workers=5)
    future = executor.submit(drain_node, core_api, evict_source_node)

    # Step 6
    check_replica_evict_state(client, volume_name, evict_source_node, True)
    check_node_auto_evict_state(client, evict_source_node, True)

    # Step 7
    wait_drain_complete(future, RETRY_COUNTS)
    wait_for_volume_replica_count(client, volume_name, 2)

    # Step 8
    set_node_cordon(core_api, evict_source_node.id, False)

    # Step 9
    volume = wait_for_volume_healthy(client, volume_name)
    assert len(volume.replicas) == 2
    for replica in volume.replicas:
        assert replica.hostId != evict_source_node.id

    # Step 10
    check_node_auto_evict_state(client, evict_source_node, False)

    # Step 11
    check_replica_evict_state(client, volume_name, evict_target_node, False)

    # Step 12
    data_path = '/data/test'
    test_data_checksum = get_pod_data_md5sum(core_api,
                                             pod,
                                             data_path)
    assert checksum == test_data_checksum

Test drain completes after evicting replica with node-drain-policy block-for-eviction

  1. Set node-drain-policy to block-for-eviction.
  2. Create a volume.
  3. Ensure (through soft anti-affinity, low replica count, and/or enough disks) that an evicted replica of the volume can be scheduled elsewhere.
  4. Write data to the volume.
  5. Drain a node one of the volume's replicas is scheduled to.
  6. While the drain is ongoing:
     - Verify that node.status.autoEvicting == true.
     - Optionally verify that replica.spec.evictionRequested == true.
  7. Verify the drain completes.
  8. Uncordon the node.
  9. Verify the replica on the drained node has moved to a different one.
  10. Verify that node.status.autoEvicting == false.
  11. Verify that replica.spec.evictionRequested == false.
  12. Verify the volume's data.
def test_node_config_annotation(client, core_api, reset_default_disk_label, reset_disk_and_tag_annotations, reset_disk_settings)
@pytest.mark.node  # NOQA
def test_node_config_annotation(client, core_api, reset_default_disk_label, reset_disk_and_tag_annotations, reset_disk_settings):  # NOQA
    """
    Test node feature: default disks/node configuration

    1. Set node 0 label and annotation.
    2. Set node 1 label but with invalid annotation (invalid path and tag)
    3. Cleanup disks on node 0 and 1.
        1. The initial default disk will not be recreated.
    4. Enable setting `create default disk labeled nodes`
    5. Wait for node tag to update on node 0.
    6. Verify node 0 has correct disk and tags set.
    7. Verify node 1 has no disk or tag.
    8. Update node 1's disk and tag annotations to be valid
    9. Verify node 1 now has the correct disk and tags set
    """
    nodes = client.list_node().data
    assert len(nodes) >= 3

    node0 = nodes[0].id
    core_api.patch_node(node0, {
        "metadata": {
            "labels": {
                CREATE_DEFAULT_DISK_LABEL:
                    CREATE_DEFAULT_DISK_LABEL_VALUE_CONFIG
            },
            "annotations": {
                DEFAULT_DISK_CONFIG_ANNOTATION:
                    '[{"path":"' + DEFAULT_DISK_PATH +
                    '","allowScheduling":true,' +
                    '"storageReserved":1024,"tags":["ssd","fast"]}]',
                DEFAULT_NODE_TAG_ANNOTATION: '["tag2","tag2","tag1"]',
            }
        }
    })

    node1 = nodes[1].id
    core_api.patch_node(node1, {
        "metadata": {
            "labels": {
                CREATE_DEFAULT_DISK_LABEL:
                    CREATE_DEFAULT_DISK_LABEL_VALUE_CONFIG
            },
            "annotations": {
                DEFAULT_DISK_CONFIG_ANNOTATION:
                    '[{"path":"/invalid-path","allowScheduling":false,' +
                    '"storageReserved":1024,"tags":["ssd","fast"]}]',
                DEFAULT_NODE_TAG_ANNOTATION: '["tag1",",.*invalid-tag"]',
            }
        }
    })

    # Longhorn will not automatically recreate the default disk if setting
    # `create-default-disk-labeled-nodes` is disabled
    cleanup_node_disks(client, node0)
    cleanup_node_disks(client, node1)

    setting = client.by_id_setting(SETTING_CREATE_DEFAULT_DISK_LABELED_NODES)
    client.update(setting, value="true")

    wait_for_node_tag_update(client, node0, ["tag1", "tag2"])
    node = wait_for_disk_update(client, node0, 1)
    for _, disk in iter(node.disks.items()):
        assert disk.path == DEFAULT_DISK_PATH
        assert disk.allowScheduling is True
        assert disk.storageReserved == 1024
        assert set(disk.tags) == {"ssd", "fast"}
        break

    node = client.by_id_node(node1)
    assert len(node.disks) == 0
    assert not node.tags

    core_api.patch_node(node1, {
        "metadata": {
            "annotations": {
                DEFAULT_DISK_CONFIG_ANNOTATION:
                    '[{"path":"' + DEFAULT_DISK_PATH +
                    '","allowScheduling":true,' +
                    '"storageReserved":2048,"tags":["hdd","slow"]}]',
                DEFAULT_NODE_TAG_ANNOTATION: '["tag1","tag3"]',
            }
        }
    })

    wait_for_node_tag_update(client, node1, ["tag1", "tag3"])
    node = wait_for_disk_update(client, node1, 1)
    for _, disk in iter(node.disks.items()):
        assert disk.path == DEFAULT_DISK_PATH
        assert disk.allowScheduling is True
        assert disk.storageReserved == 2048
        assert set(disk.tags) == {"hdd", "slow"}
        break

Test node feature: default disks/node configuration

  1. Set node 0 label and annotation.
  2. Set node 1 label but with invalid annotation (invalid path and tag)
  3. Cleanup disks on node 0 and 1.
    1. The initial default disk will not be recreated.
  4. Enable setting create default disk labeled nodes
  5. Wait for node tag to update on node 0.
  6. Verify node 0 has correct disk and tags set.
  7. Verify node 1 has no disk or tag.
  8. Update node 1's disk and tag annotations to be valid
  9. Verify node 1 now has the correct disk and tags set
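
For reference, the disk-config and node-tag annotation values patched in this test are plain JSON strings; a small sketch that builds the same values with json.dumps instead of string concatenation (functionally equivalent to the code above, using the same constants and the node0 ID from the test):

import json

# Same disk spec as the annotation assembled by string concatenation above.
disk_config = json.dumps([{
    "path": DEFAULT_DISK_PATH,
    "allowScheduling": True,
    "storageReserved": 1024,
    "tags": ["ssd", "fast"],
}])

core_api.patch_node(node0, {
    "metadata": {
        "labels": {
            CREATE_DEFAULT_DISK_LABEL: CREATE_DEFAULT_DISK_LABEL_VALUE_CONFIG
        },
        "annotations": {
            DEFAULT_DISK_CONFIG_ANNOTATION: disk_config,
            DEFAULT_NODE_TAG_ANNOTATION: json.dumps(["tag2", "tag2", "tag1"]),
        }
    }
})
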
def test_node_config_annotation_invalid(client, core_api, reset_default_disk_label, reset_disk_and_tag_annotations, reset_disk_settings)
@pytest.mark.node  # NOQA
def test_node_config_annotation_invalid(client, core_api, reset_default_disk_label, reset_disk_and_tag_annotations, reset_disk_settings):  # NOQA
    """
    Test invalid node annotations for default disks/node configuration


    Case1: The invalid disk annotation shouldn't interfere with the node
    controller.

    1. Set invalid disk annotation
    2. The node tag or disks won't be updated
    3. Create a new disk. It will be updated by the node controller.


    Case2: The existing node disks remain unchanged even if the annotation is
    corrected.

    1. Set valid disk annotation but set `allowScheduling` to false, etc.
    2. Make sure the current disk won't change

    Case3: the correct annotation should be applied after cleaning up all disks

    1. Delete all the disks on the node
    2. Wait for the config from disk annotation applied

    Case4: The invalid tag annotation shouldn't interfere with the node
    controller.

    1. Cleanup the node annotation and remove the node disks/tags
    2. Set invalid tag annotation
    3. Disk and tags configuration will not be applied
    4. Disk and tags can still be updated on the node

    Case5: The existing node tags remain unchanged even if the tag annotation
    is fixed up.

    1. With existing tags, change tag annotation.
    2. It won't change the current node's tag

    Case6: Clean up all node tags then the correct annotation should be applied

    1. Clean the current tags
    2. New tags from node annotation should be applied

    Case7: The same disk name in the annotation shouldn't interfere with the
    node controller.

    1. Create one disk for the node
    2. Set the same name in the annotation, set the label, and enable
       "Create Default Disk on Labeled Nodes" in settings.
    3. The node tag or disks won't be updated.
    """
    setting = client.by_id_setting(SETTING_CREATE_DEFAULT_DISK_LABELED_NODES)
    client.update(setting, value="true")

    nodes = client.list_node().data
    node_name = nodes[0].id

    # Case1: The invalid disk annotation shouldn't
    # interfere with the node controller.

    # Case1.1: Multiple paths on the same filesystem make an invalid disk
    # annotation; Longhorn shouldn't apply it.

    # make a clean condition for test to start.
    cleanup_node_disks(client, node_name)

    # patch label and annotations to the node.
    host_dirs = [
        os.path.join(DEFAULT_DISK_PATH, "engine-binaries"),
        os.path.join(DEFAULT_DISK_PATH, "replicas")
    ]
    core_api.patch_node(node_name, {
        "metadata": {
            "labels": {
                CREATE_DEFAULT_DISK_LABEL:
                    CREATE_DEFAULT_DISK_LABEL_VALUE_CONFIG
            },
            "annotations": {
                DEFAULT_DISK_CONFIG_ANNOTATION:
                    '[{"path":"' + host_dirs[0] + '",' +
                    '"allowScheduling":false,' +
                    '"storageReserved":1024,' +
                    '"name":"' + os.path.basename(host_dirs[0]) + '"},' +
                    '{"path":"' + host_dirs[1] + '",' +
                    '"allowScheduling":false,' +
                    '"storageReserved": 1024,' +
                    '"name":"' + os.path.basename(host_dirs[1]) + '"}]'
            }
        }
    })

    # Longhorn shouldn't apply the invalid disk annotation.
    time.sleep(NODE_UPDATE_WAIT_INTERVAL)
    node = client.by_id_node(node_name)
    assert len(node.disks) == 0
    assert not node.tags

    # Case1.2: Invalid disk path annotation shouldn't be applied to Longhorn.
    cleanup_node_disks(client, node_name)
    core_api.patch_node(node_name, {
        "metadata": {
            "labels": {
                CREATE_DEFAULT_DISK_LABEL:
                    CREATE_DEFAULT_DISK_LABEL_VALUE_CONFIG
            },
            "annotations": {
                DEFAULT_DISK_CONFIG_ANNOTATION:
                    '[{"path":"/invalid-path","allowScheduling":false,' +
                    '"storageReserved":1024,"tags":["ssd","fast"]}]',
            }
        }
    })
    # Longhorn shouldn't apply the invalid disk annotation.
    time.sleep(NODE_UPDATE_WAIT_INTERVAL)
    node = client.by_id_node(node_name)
    assert len(node.disks) == 0
    assert not node.tags

    # Case1.3: Disk and tag update should work fine even if there is
    # invalid disk annotation.
    disk = {"default-disk": {"path": DEFAULT_DISK_PATH,
            "allowScheduling": True}}
    update_node_disks(client, node.name, disks=disk, retry=True)
    node = wait_for_disk_update(client, node_name, 1)
    assert len(node.disks) == 1
    for fsid, disk in iter(node.disks.items()):
        assert disk.path == DEFAULT_DISK_PATH
        assert disk.allowScheduling is True
        assert disk.storageReserved == 0
        assert disk.diskUUID != ""
        assert not disk.tags
        break
    disk_uuid = disk.diskUUID
    set_node_tags(client, node, tags=["tag1", "tag2"], retry=True)
    wait_for_node_tag_update(client, node_name, ["tag1", "tag2"])

    # Case2: The existing node disks keep unchanged
    # even if the annotation is corrected.
    core_api.patch_node(node_name, {
        "metadata": {
            "annotations": {
                DEFAULT_DISK_CONFIG_ANNOTATION:
                    '[{"path":"' + DEFAULT_DISK_PATH +
                    '","allowScheduling":false,' +
                    '"storageReserved":2048,"tags":["hdd","slow"]}]',
            }
        }
    })
    time.sleep(NODE_UPDATE_WAIT_INTERVAL)
    node = client.by_id_node(node_name)
    assert len(node.disks) == 1
    for _, disk in iter(node.disks.items()):
        assert disk.path == DEFAULT_DISK_PATH
        assert disk.allowScheduling is True
        assert disk.storageReserved == 0
        assert not disk.tags
        break

    # Case3: the correct annotation should be applied
    # after cleaning up all disks
    node = client.by_id_node(node_name)
    disks = node.disks
    for _, disk in iter(disks.items()):
        disk.allowScheduling = False
    update_disks = get_update_disks(disks)
    node = client.by_id_node(node_name)
    update_node_disks(client, node.name, disks=update_disks, retry=True)
    update_node_disks(client, node.name, disks={}, retry=True)
    node = wait_for_disk_uuid(client, node_name, disk_uuid)
    for _, disk in iter(node.disks.items()):
        assert disk.path == DEFAULT_DISK_PATH
        assert disk.allowScheduling is False
        assert disk.storageReserved == 2048
        assert set(disk.tags) == {"hdd", "slow"}
        break

    # do cleanup then test the invalid tag annotation.
    core_api.patch_node(node_name, {
        "metadata": {
            "annotations": {
                DEFAULT_DISK_CONFIG_ANNOTATION: None,
            }
        }
    })
    node = cleanup_node_disks(client, node_name)
    set_node_tags(client, node, tags=[], retry=True)
    wait_for_node_tag_update(client, node_name, [])

    # Case4: The invalid tag annotation shouldn't
    # interfere with the node controller.
    core_api.patch_node(node_name, {
        "metadata": {
            "annotations": {
                DEFAULT_NODE_TAG_ANNOTATION: '[",.*invalid-tag"]',
            }
        }
    })
    # Case4.1: Longhorn shouldn't apply the invalid tag annotation.
    time.sleep(NODE_UPDATE_WAIT_INTERVAL)
    node = client.by_id_node(node_name)
    assert len(node.disks) == 0
    assert not node.tags

    # Case4.2: Disk and tag update should work fine even if there is
    # invalid tag annotation.
    disk = {"default-disk": {"path": DEFAULT_DISK_PATH,
            "allowScheduling": True, "storageReserved": 1024}}
    update_node_disks(client, node.name, disks=disk, retry=True)
    node = wait_for_disk_update(client, node_name, 1)
    assert len(node.disks) == 1
    for _, disk in iter(node.disks.items()):
        assert disk.path == DEFAULT_DISK_PATH
        assert disk.allowScheduling is True
        assert disk.storageReserved == 1024
        assert not disk.tags
        break
    set_node_tags(client, node, tags=["tag3", "tag4"], retry=True)
    wait_for_node_tag_update(client, node_name, ["tag3", "tag4"])

    # Case5: The existing node keep unchanged
    # even if the tag annotation is fixed up.
    core_api.patch_node(node_name, {
        "metadata": {
            "annotations": {
                DEFAULT_NODE_TAG_ANNOTATION: '["storage"]',
            }
        }
    })
    time.sleep(NODE_UPDATE_WAIT_INTERVAL)
    node = client.by_id_node(node_name)
    assert set(node.tags) == {"tag3", "tag4"}

    # Case6: Clean up all node tags
    # then the correct annotation should be applied.
    set_node_tags(client, node, tags=[], retry=True)
    wait_for_node_tag_update(client, node_name, ["storage"])

    # Case7: The same disk name in the annotation shouldn't interfere with
    # the node controller

    # clean up any existing disk and create one disk for node
    lht_hostId = get_self_host_id()
    cleanup_node_disks(client, lht_hostId)
    node = client.by_id_node(lht_hostId)
    disks = node.disks
    disk_path1 = create_host_disk(client, 'vol-disk-1',
                                  str(Gi), lht_hostId)

    # patch label and annotations to the node
    core_api.patch_node(lht_hostId, {
        "metadata": {
            "labels": {
                CREATE_DEFAULT_DISK_LABEL:
                    CREATE_DEFAULT_DISK_LABEL_VALUE_CONFIG
            },
            "annotations": {
                DEFAULT_DISK_CONFIG_ANNOTATION:
                    '[{"path":"' + DEFAULT_DISK_PATH +
                    '","allowScheduling":true,' +
                    '"storageReserved": 1024,"tags": ["ssd", "fast"],' +
                    '"name":"same-name"},' +
                    '{"path":"' + disk_path1 +
                    '","allowScheduling":true,' +
                    '"storageReserved":1024,"name":"same-name"}]'
            }
        }
    })

    # same disk name shouldn't be applied to Longhorn.
    time.sleep(NODE_UPDATE_WAIT_INTERVAL)
    node = client.by_id_node(lht_hostId)
    assert len(node.disks) == 0

    # do cleanup.
    cleanup_host_disks(client, 'vol-disk-1')

Test invalid node annotations for default disks/node configuration

Case1: The invalid disk annotation shouldn't interfere with the node controller.

  1. Set invalid disk annotation
  2. The node tag or disks won't be updated
  3. Create a new disk. It will be updated by the node controller.

Case2: The existing node disks remain unchanged even if the annotation is corrected.

  1. Set valid disk annotation but set allowScheduling to false, etc.
  2. Make sure the current disk won't change

Case3: the correct annotation should be applied after cleaning up all disks

  1. Delete all the disks on the node
  2. Wait for the config from disk annotation applied

Case4: The invalid tag annotation shouldn't interfere with the node controller.

  1. Cleanup the node annotation and remove the node disks/tags
  2. Set invalid tag annotation
  3. Disk and tags configuration will not be applied
  4. Disk and tags can still be updated on the node

Case5: The existing node tags remain unchanged even if the tag annotation is fixed up.

  1. With existing tags, change tag annotation.
  2. It won't change the current node's tag

Case6: Clean up all node tags then the correct annotation should be applied

  1. Clean the current tags
  2. New tags from node annotation should be applied

Case7: The same disk name in the annotation shouldn't interfere with the node controller.

  1. Create one disk for the node
  2. Set the same name in the annotation, set the label, and enable "Create Default Disk on Labeled Nodes" in settings.
  3. The node tag or disks won't be updated.

def test_node_config_annotation_missing(client, core_api, reset_default_disk_label, reset_disk_and_tag_annotations, reset_disk_settings)
@pytest.mark.node  # NOQA
def test_node_config_annotation_missing(client, core_api, reset_default_disk_label, reset_disk_and_tag_annotations, reset_disk_settings):  # NOQA
    """
    Test node labeled for configuration but no annotation

    1. Set setting `create default disk labeled nodes` to true
    2. Set the config label on node 0 but leave annotation empty
    3. Verify disk update works.
    4. Verify tag update works
    5. Verify using tag annotation for configuration works.
    6. After removing the tag annotation, verify that unsetting node tags
       works fine.
    7. Set tag annotation again. Verify node updated for the tag.
    """
    setting = client.by_id_setting(SETTING_CREATE_DEFAULT_DISK_LABELED_NODES)
    client.update(setting, value="true")

    nodes = client.list_node().data
    node_name = nodes[0].id

    # the label is set but there is no annotation.
    core_api.patch_node(node_name, {
        "metadata": {
            "labels": {
                CREATE_DEFAULT_DISK_LABEL:
                    CREATE_DEFAULT_DISK_LABEL_VALUE_CONFIG
            },
        }
    })

    # Case1: Disk update should work fine
    node = client.by_id_node(node_name)
    assert len(node.disks) == 1
    update_disks = {}
    for name, disk in iter(node.disks.items()):
        disk.allowScheduling = False
        disk.storageReserved = 0
        disk.tags = ["original"]
        update_disks[name] = disk
    update_node_disks(client, node.name, disks=update_disks, retry=True)
    node = wait_for_disk_status(client, node_name, name,
                                "storageReserved", 0)
    assert len(node.disks) == 1
    assert disk.allowScheduling is False
    assert disk.storageReserved == 0
    assert set(disk.tags) == {"original"}

    # Case2: Tag update with disk set should work fine
    set_node_tags(client, node, tags=["tag0"], retry=True)
    wait_for_node_tag_update(client, node_name, ["tag0"])
    set_node_tags(client, node, tags=[], retry=True)
    wait_for_node_tag_update(client, node_name, [])

    # Case3: The tag annotation with disk set should work fine
    core_api.patch_node(node_name, {
        "metadata": {
            "annotations": {
                DEFAULT_NODE_TAG_ANNOTATION: '["tag1"]',
            }
        }
    })
    wait_for_node_tag_update(client, node_name, ["tag1"])
    core_api.patch_node(node_name, {
        "metadata": {
            "annotations": {
                DEFAULT_NODE_TAG_ANNOTATION: None,
            }
        }
    })
    set_node_tags(client, node, tags=[], retry=True)
    wait_for_node_tag_update(client, node_name, [])

    # Case4: Tag update with disk unset should work fine
    cleanup_node_disks(client, node_name)
    set_node_tags(client, node, tags=["tag2"], retry=True)
    wait_for_node_tag_update(client, node_name, ["tag2"])
    set_node_tags(client, node, tags=[], retry=True)
    wait_for_node_tag_update(client, node_name, [])

    # Case5: The tag annotation with disk unset should work fine
    core_api.patch_node(node_name, {
        "metadata": {
            "annotations": {
                DEFAULT_NODE_TAG_ANNOTATION: '["tag3"]',
            }
        }
    })
    wait_for_node_tag_update(client, node_name, ["tag3"])

Test node labeled for configuration but no annotation

  1. Set setting create default disk labeled nodes to true
  2. Set the config label on node 0 but leave annotation empty
  3. Verify disk update works.
  4. Verify tag update works
  5. Verify using tag annotation for configuration works.
  6. After removing the tag annotation, verify unsetting the node tag works fine.
  7. Set tag annotation again. Verify node updated for the tag.
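
The tag annotation exercised in steps 5-7 is just a JSON list of tag strings on the Kubernetes node object, and it is removed by patching the annotation value to None. A minimal sketch follows, assuming the annotation key literal below matches the DEFAULT_NODE_TAG_ANNOTATION constant these tests import:

import json

# Assumed literal mirroring the constant imported from common.
DEFAULT_NODE_TAG_ANNOTATION = "node.longhorn.io/default-node-tags"


def set_node_tag_annotation(core_api, node_name, tags):
    """Set (or clear, when tags is None) the default node-tag annotation."""
    value = json.dumps(tags) if tags is not None else None
    core_api.patch_node(node_name, {
        "metadata": {
            "annotations": {
                DEFAULT_NODE_TAG_ANNOTATION: value
            }
        }
    })


# set_node_tag_annotation(core_api, node_name, ["tag1"])  # apply tag config
# set_node_tag_annotation(core_api, node_name, None)      # remove annotation
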
def test_node_controller_sync_disk_state(client)
Expand source code
@pytest.mark.v2_volume_test   # NOQA
@pytest.mark.coretest   # NOQA
@pytest.mark.node  # NOQA
def test_node_controller_sync_disk_state(client):  # NOQA
    """
    Test node controller to sync disk state

    1. Set setting `StorageMinimalAvailablePercentage` to 100
    2. All the disks will become `unschedulable`.
    3. Restore setting `StorageMinimalAvailablePercentage` to previous
    4. All the disks will become `schedulable`.
    """
    # update StorageMinimalAvailablePercentage to test Disk State
    setting = client.by_id_setting(
        SETTING_STORAGE_MINIMAL_AVAILABLE_PERCENTAGE)
    old_minimal_available_percentage = setting.value
    setting = client.update(setting, value="100")
    assert setting.value == "100"
    nodes = client.list_node()
    # wait for node controller to update disk state
    for node in nodes:
        disks = node.disks
        for fsid, disk in iter(disks.items()):
            wait_for_disk_conditions(client, node.name,
                                     fsid, DISK_CONDITION_SCHEDULABLE,
                                     CONDITION_STATUS_FALSE)

    nodes = client.list_node()
    for node in nodes:
        disks = node.disks
        for fsid, disk in iter(disks.items()):
            conditions = disk.conditions
            assert conditions[DISK_CONDITION_SCHEDULABLE]["status"] == \
                CONDITION_STATUS_FALSE

    setting = client.update(setting, value=old_minimal_available_percentage)
    assert setting.value == old_minimal_available_percentage
    # wait for node controller to update disk state
    nodes = client.list_node()
    for node in nodes:
        disks = node.disks
        for fsid, disk in iter(disks.items()):
            wait_for_disk_conditions(client, node.name,
                                     fsid, DISK_CONDITION_SCHEDULABLE,
                                     CONDITION_STATUS_TRUE)

Test node controller to sync disk state

  1. Set setting StorageMinimalAvailablePercentage to 100
  2. All the disks will become unschedulable.
  3. Restore setting StorageMinimalAvailablePercentage to previous
  4. All the disks will become schedulable.
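
The save/flip/restore pattern around StorageMinimalAvailablePercentage above recurs in several of these tests; here is a small sketch of wrapping it in a context manager, assuming only the by_id_setting/update client calls already used in the test:

from contextlib import contextmanager


@contextmanager
def temporary_setting(client, setting_name, value):
    """Update a Longhorn setting and restore its previous value on exit."""
    setting = client.by_id_setting(setting_name)
    old_value = setting.value
    setting = client.update(setting, value=value)
    try:
        yield setting
    finally:
        setting = client.by_id_setting(setting_name)
        client.update(setting, value=old_value)


# with temporary_setting(client,
#                        SETTING_STORAGE_MINIMAL_AVAILABLE_PERCENTAGE, "100"):
#     ...  # every disk should report Schedulable == False in here
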
def test_node_controller_sync_storage_available(client)
Expand source code
@pytest.mark.v2_volume_test   # NOQA
@pytest.mark.coretest   # NOQA
@pytest.mark.node  # NOQA
@pytest.mark.mountdisk  # NOQA
def test_node_controller_sync_storage_available(client):  # NOQA
    """
    Test node controller sync storage available correctly

    1. Create a host disk `test_disk` on the current node
    2. Write 1MiB data to the disk, and run `sync`
    3. Verify the disk `storageAvailable` will update to include the file
    """
    lht_hostId = get_self_host_id()
    # create a disk to test storageAvailable
    node = client.by_id_node(lht_hostId)
    test_disk_path = create_host_disk(client, "vol-test", SIZE, lht_hostId)
    test_disk = {"path": test_disk_path, "allowScheduling": True}
    update_disks = get_update_disks(node.disks)
    update_disks["test-disk"] = test_disk
    node = update_node_disks(client, node.name, disks=update_disks,
                             retry=True)
    node = common.wait_for_disk_update(client, lht_hostId, len(update_disks))
    assert len(node.disks) == len(update_disks)

    # write specified byte data into disk
    test_file_path = os.path.join(test_disk_path, TEST_FILE)
    if os.path.exists(test_file_path):
        os.remove(test_file_path)
    cmd = ['dd', 'if=/dev/zero', 'of=' + test_file_path, 'bs=1M', 'count=1']
    subprocess.check_call(cmd)
    subprocess.check_call(['sync', test_file_path])
    node = client.by_id_node(lht_hostId)
    disks = node.disks
    # wait for node controller update disk status
    expect_disk = {}
    for fsid, disk in iter(disks.items()):
        if disk.path == test_disk_path:
            node = wait_for_disk_storage_available(client, lht_hostId,
                                                   fsid, test_disk_path)
            expect_disk = node.disks[fsid]
            break

    free, total = common.get_host_disk_size(test_disk_path)
    assert expect_disk.storageAvailable == free

    os.remove(test_file_path)
    # cleanup test disks
    node = client.by_id_node(lht_hostId)
    disks = node.disks
    wait_fsid = ''
    for fsid, disk in iter(disks.items()):
        if disk.path == test_disk_path:
            wait_fsid = fsid
            disk.allowScheduling = False

    update_disks = get_update_disks(disks)
    node = update_node_disks(client, node.name, disks=update_disks,
                             retry=True)
    node = wait_for_disk_status(client, lht_hostId, wait_fsid,
                                "allowScheduling", False)
    disks = node.disks
    for fsid, disk in iter(disks.items()):
        if disk.path == test_disk_path:
            disks.pop(fsid)
            break
    update_disks = get_update_disks(disks)
    node = update_node_disks(client, node.name, disks=update_disks,
                             retry=True)
    node = wait_for_disk_update(client, lht_hostId, len(update_disks))
    assert len(node.disks) == len(update_disks)
    cleanup_host_disks(client, 'vol-test')

Test node controller sync storage available correctly

  1. Create a host disk test_disk on the current node
  2. Write 1MiB data to the disk, and run sync
  3. Verify the disk storageAvailable will update to include the file
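
Step 3 boils down to comparing the node's reported storageAvailable with the free bytes of the host filesystem once the write is flushed. Below is a sketch of that check using os.statvfs; the real test relies on common.get_host_disk_size, which is assumed here to report an equivalent (free, total) pair:

import os
import subprocess


def host_free_and_total(path):
    """Return (free, total) bytes of the filesystem containing path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize, st.f_blocks * st.f_frsize


def write_and_sync(disk_path, name="test", mib=1):
    """Write a small file with dd and flush it so usage becomes visible."""
    test_file = os.path.join(disk_path, name)
    subprocess.check_call(["dd", "if=/dev/zero", "of=" + test_file,
                           "bs=1M", "count=" + str(mib)])
    subprocess.check_call(["sync", test_file])
    return test_file
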
def test_node_controller_sync_storage_scheduled(client)
Expand source code
@pytest.mark.v2_volume_test   # NOQA
@pytest.mark.node  # NOQA
def test_node_controller_sync_storage_scheduled(client):  # NOQA
    """
    Test node controller sync storage scheduled correctly

    1. Wait until no disk has anything scheduled
    2. Create a volume with "number of nodes" replicas
    3. Confirm that each disks now has "volume size" scheduled
    4. Confirm every disks are still schedulable.
    """
    lht_hostId = get_self_host_id()
    nodes = client.list_node()
    for node in nodes:
        for fsid, disk in iter(node.disks.items()):
            # wait for node controller update disk status
            wait_for_disk_status(client, node.name, fsid,
                                 "storageScheduled", 0)

    # create a volume and test update StorageScheduled of each node
    vol_name = common.generate_volume_name()
    volume = create_volume(client, vol_name, str(SMALL_DISK_SIZE),
                           lht_hostId, len(nodes))
    replicas = volume.replicas
    for replica in replicas:
        id = replica.hostId
        assert id != ""
        assert replica.running

    # wait for node controller to update disk status
    for node in nodes:
        disks = node.disks
        for fsid, disk in iter(disks.items()):
            if disk.path == get_default_disk_path():
                wait_for_disk_status(client, node.name, fsid,
                                     "storageScheduled", SMALL_DISK_SIZE)

    nodes = client.list_node()
    for node in nodes:
        disks = node.disks
        for replica in replicas:
            disk_found = False
            if replica.hostId == node.name:
                for _, disk in iter(disks.items()):
                    if replica.diskID == disk.diskUUID:
                        disk_found = True
                        conditions = disk.conditions
                        assert disk.storageScheduled == SMALL_DISK_SIZE
                        assert \
                            conditions[DISK_CONDITION_SCHEDULABLE]["status"] \
                            == CONDITION_STATUS_TRUE
                        break
                assert disk_found

    # clean volumes
    cleanup_volume_by_name(client, vol_name)

Test node controller sync storage scheduled correctly

  1. Wait until no disk has anything scheduled
  2. Create a volume with "number of nodes" replicas
  3. Confirm that each disk now has "volume size" scheduled
  4. Confirm every disk is still schedulable.
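
The storageScheduled value waited for in step 3 is simply the volume size multiplied by the number of that volume's replicas placed on the disk. A sketch of computing the expected value from the volume object, assuming volume.size is the byte-count string returned by the API:

def expected_storage_scheduled(volume, node_name, disk_uuid):
    """Sum the scheduled bytes of the volume's replicas that landed on
    the disk identified by disk_uuid on node_name."""
    total = 0
    for replica in volume.replicas:
        if replica.hostId == node_name and replica.diskID == disk_uuid:
            total += int(volume.size)
    return total
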
def test_node_default_disk_added_back_with_extra_disk_unmounted(client)
Expand source code
@pytest.mark.node  # NOQA
@pytest.mark.mountdisk # NOQA
def test_node_default_disk_added_back_with_extra_disk_unmounted(client):  # NOQA
    """
    [Node] Test adding default disk back with extra disk is unmounted
    on the node

    1. Clean up all disks on node 1.
    2. Recreate the default disk with "allowScheduling" disabled for
       node 1.
    3. Create a Longhorn volume and attach it to node 1.
    4. Use the Longhorn volume as an extra host disk and
       enable "allowScheduling" of the default disk for node 1.
    5. Verify all disks on node 1 are "Schedulable".
    6. Delete the default disk on node 1.
    7. Unmount the extra disk on node 1.
       And wait for it becoming "Unschedulable".
    8. Create and add the default disk back on node 1.
    9. Wait and verify the default disk should become "Schedulable".
    10. Mount extra disk back on node 1.
    11. Wait and verify this extra disk should become "Schedulable".
    12. Delete the host disk `extra_disk`.
    """
    lht_hostId = get_self_host_id()
    cleanup_node_disks(client, lht_hostId)
    node = client.by_id_node(lht_hostId)

    # Create a default disk with `allowScheduling` disabled
    # so that there is no volume replica using this disk later.
    default_disk = {"default-disk":
                    {"path": DEFAULT_DISK_PATH,
                     "allowScheduling": False,
                     "storageReserved": SMALL_DISK_SIZE}}
    node = update_node_disks(client, node.name, default_disk, retry=True)
    node = wait_for_disk_update(client, node.name, 1)
    wait_for_disk_status(client, node.name, "default-disk",
                         "allowScheduling", False)
    assert len(node.disks) == 1

    # Create a volume and attached it to this node.
    # This volume will be used as an extra host disk later.
    extra_disk_volume_name = 'extra-disk'
    extra_disk_path = create_host_disk(client, extra_disk_volume_name,
                                       str(Gi), lht_hostId)
    extra_disk = {"path": extra_disk_path, "allowScheduling": True}

    update_disk = get_update_disks(node.disks)
    update_disk["default-disk"].allowScheduling = True
    update_disk["extra-disk"] = extra_disk
    node = update_node_disks(client, node.name, disks=update_disk, retry=True)
    node = common.wait_for_disk_update(client, lht_hostId,
                                       len(update_disk))
    assert len(node.disks) == len(update_disk)

    # Make sure all disks are schedulable
    for name, disk in node.disks.items():
        wait_for_disk_conditions(client, node.name,
                                 name, DISK_CONDITION_SCHEDULABLE,
                                 CONDITION_STATUS_TRUE)

    # Delete default disk
    update_disk = get_update_disks(node.disks)
    update_disk["default-disk"].allowScheduling = False
    node = update_node_disks(client, node.name, disks=update_disk, retry=True)
    node = common.wait_for_disk_update(client, lht_hostId,
                                       len(update_disk))
    remain_disk = {}
    for name, disk in node.disks.items():
        if disk.path == extra_disk_path:
            remain_disk[name] = disk
    node = update_node_disks(client, node.name, disks=remain_disk, retry=True)
    node = wait_for_disk_update(client, lht_hostId,
                                len(remain_disk))
    assert len(node.disks) == len(remain_disk)

    # Umount the extra disk and wait for unschedulable condition
    common.umount_disk(extra_disk_path)
    for name, disk in node.disks.items():
        wait_for_disk_conditions(client, node.name,
                                 name, DISK_CONDITION_SCHEDULABLE,
                                 CONDITION_STATUS_FALSE)

    # Add default disk back and wait for schedulable condition
    default_disk = {"path": DEFAULT_DISK_PATH, "allowScheduling": True,
                    "storageReserved": SMALL_DISK_SIZE}
    update_disk = get_update_disks(node.disks)
    update_disk["default-disk"] = default_disk
    node = update_node_disks(client, node.name, disks=update_disk, retry=True)
    node = common.wait_for_disk_update(client, lht_hostId,
                                       len(update_disk))
    for name, disk in node.disks.items():
        if disk.path == DEFAULT_DISK_PATH:
            wait_for_disk_conditions(client, node.name,
                                     name, DISK_CONDITION_SCHEDULABLE,
                                     CONDITION_STATUS_TRUE)

    # Mount extra disk back
    disk_volume = client.by_id_volume(extra_disk_volume_name)
    dev = get_volume_endpoint(disk_volume)
    common.mount_disk(dev, extra_disk_path)

    # Check all the disks should be at schedulable condition
    node = update_node_disks(client, node.name, disks=update_disk, retry=True)
    node = common.wait_for_disk_update(client, lht_hostId,
                                       len(update_disk))
    for name, disk in node.disks.items():
        wait_for_disk_conditions(client, node.name,
                                 name, DISK_CONDITION_SCHEDULABLE,
                                 CONDITION_STATUS_TRUE)

    # Remove extra disk.
    update_disk = get_update_disks(node.disks)
    update_disk[extra_disk_volume_name].allowScheduling = False
    node = update_node_disks(client, node.name, disks=update_disk, retry=True)
    node = common.wait_for_disk_update(client, lht_hostId,
                                       len(update_disk))

    update_disk = get_update_disks(node.disks)
    remain_disk = {}
    for name, disk in update_disk.items():
        if disk.path != extra_disk_path:
            remain_disk[name] = disk
    node = update_node_disks(client, node.name, disks=remain_disk, retry=True)
    node = wait_for_disk_update(client, lht_hostId,
                                len(remain_disk))

    cleanup_host_disks(client, extra_disk_volume_name)

[Node] Test adding the default disk back while the extra disk is unmounted on the node

  1. Clean up all disks on node 1.
  2. Recreate the default disk with "allowScheduling" disabled for node 1.
  3. Create a Longhorn volume and attach it to node 1.
  4. Use the Longhorn volume as an extra host disk and enable "allowScheduling" of the default disk for node 1.
  5. Verify all disks on node 1 are "Schedulable".
  6. Delete the default disk on node 1.
  7. Unmount the extra disk on node 1 and wait for it to become "Unschedulable".
  8. Create and add the default disk back on node 1.
  9. Wait and verify the default disk should become "Schedulable".
  10. Mount extra disk back on node 1.
  11. Wait and verify this extra disk should become "Schedulable".
  12. Delete the host disk extra_disk.
def test_node_default_disk_labeled(client, core_api, random_disk_path, reset_default_disk_label, reset_disk_settings)
Expand source code
@pytest.mark.node  # NOQA
def test_node_default_disk_labeled(client, core_api, random_disk_path,  reset_default_disk_label,  reset_disk_settings):  # NOQA
    """
    Test node feature: create default Disk according to the node label

    Makes sure the created Disk matches the Default Data Path Setting.

    1. Add labels to node 0 and 1, don't add label to node 2.
    2. Remove all the disks on node 1 and 2.
        1. The initial default disk will not be recreated.
    3. Set setting `default disk path` to a random disk path.
    4. Set setting `create default disk labeled node` to true.
    5. Check node 0. It should still use the previous default disk path.
        1. Due to we didn't remove the disk from node 0.
    6. Check node 1. A new disk should be created at the random disk path.
    7. Check node 2. There is still no disks
    """
    # Set up cases.
    cases = {
        "disk_exists": None,
        "labeled": None,
        "unlabeled": None
    }
    nodes = client.list_node().data
    assert len(nodes) >= 3

    node = nodes[0]
    cases["disk_exists"] = node.id
    core_api.patch_node(node.id, {
        "metadata": {
            "labels": {
                CREATE_DEFAULT_DISK_LABEL: "true"
            }
        }
    })

    node = nodes[1]
    cases["labeled"] = node.id
    core_api.patch_node(node.id, {
        "metadata": {
            "labels": {
                CREATE_DEFAULT_DISK_LABEL: "true"
            }
        }
    })
    cleanup_node_disks(client, node.id)

    node = nodes[2]
    cases["unlabeled"] = node.id
    cleanup_node_disks(client, node.id)

    # Set disk creation and path Settings.
    setting = client.by_id_setting(SETTING_DEFAULT_DATA_PATH)
    client.update(setting, value=random_disk_path)
    setting = client.by_id_setting(SETTING_CREATE_DEFAULT_DISK_LABELED_NODES)
    client.update(setting, value="true")
    wait_for_disk_update(client, cases["labeled"], 1)

    # Check each case.
    node = client.by_id_node(cases["disk_exists"])
    assert len(node.disks) == 1
    assert node.disks[list(node.disks)[0]].path == \
        DEFAULT_DISK_PATH

    node = client.by_id_node(cases["labeled"])
    assert len(node.disks) == 1
    assert node.disks[list(node.disks)[0]].path == \
        random_disk_path

    # Remove the Disk from the Node used for this test case so we can have the
    # fixtures clean up after.
    setting = client.by_id_setting(SETTING_CREATE_DEFAULT_DISK_LABELED_NODES)
    client.update(setting, value="false")
    cleanup_node_disks(client, node.id)

    node = client.by_id_node(cases["unlabeled"])
    assert len(node.disks) == 0

Test node feature: create default Disk according to the node label

Makes sure the created Disk matches the Default Data Path Setting.

  1. Add labels to node 0 and 1, don't add label to node 2.
  2. Remove all the disks on node 1 and 2.
    1. The initial default disk will not be recreated.
  3. Set setting default disk path to a random disk path.
  4. Set setting create default disk labeled node to true.
  5. Check node 0. It should still use the previous default disk path.
    1. Because we didn't remove the disk from node 0.
  6. Check node 1. A new disk should be created at the random disk path.
  7. Check node 2. There should still be no disks.
def test_node_disk_update(client)
Expand source code
@pytest.mark.v2_volume_test   # NOQA
@pytest.mark.coretest   # NOQA
@pytest.mark.node  # NOQA
@pytest.mark.mountdisk # NOQA
def test_node_disk_update(client):  # NOQA
    """
    Test update node disks

    The test will use Longhorn to create disks on the node.

    1. Get the current node
    2. Try to delete all the disks. It should fail due to scheduling is enabled
    3. Create two disks `disk1` and `disk2`, attach them to the current node.
    4. Add two disks to the current node.
    5. Verify two extra disks have been added to the node
    6. Disable the two disks' scheduling, and set StorageReserved
    7. Update the two disks.
    8. Validate all the disks properties.
    9. Delete other two disks. Validate deletion works.
    """
    lht_hostId = get_self_host_id()
    node = client.by_id_node(lht_hostId)
    disks = node.disks

    # test delete disk exception
    with pytest.raises(Exception) as e:
        update_node_disks(client, node.name, disks={}, retry=True)
    assert "disable the disk" in str(e.value)

    # create multiple disks for node
    node = client.by_id_node(lht_hostId)
    disks = node.disks
    disk_path1 = create_host_disk(client, 'vol-disk-1',
                                  str(Gi), lht_hostId)
    disk1 = {"path": disk_path1, "allowScheduling": True}
    disk_path2 = create_host_disk(client, 'vol-disk-2',
                                  str(Gi), lht_hostId)
    disk2 = {"path": disk_path2, "allowScheduling": True}

    update_disk = get_update_disks(disks)
    # add new disk for node
    update_disk["disk1"] = disk1
    update_disk["disk2"] = disk2

    # save disks to node
    node = update_node_disks(client, node.name, disks=update_disk, retry=True)
    node = common.wait_for_disk_update(client, lht_hostId,
                                       len(update_disk))
    assert len(node.disks) == len(update_disk)
    node = client.by_id_node(lht_hostId)
    assert len(node.disks) == len(update_disk)

    # update disk
    disks = node.disks
    update_disk = get_update_disks(disks)
    for disk in update_disk.values():
        # keep default disk for other tests
        if disk.path == disk_path1 or disk.path == disk_path2:
            disk.allowScheduling = False
            disk.storageReserved = SMALL_DISK_SIZE
    node = update_node_disks(client, node.name, disks=update_disk, retry=True)
    disks = node.disks
    # wait for node controller to update disk status
    for name, disk in iter(disks.items()):
        if disk.path == disk_path1 or disk.path == disk_path2:
            wait_for_disk_status(client, lht_hostId, name,
                                 "allowScheduling", False)
            wait_for_disk_status(client, lht_hostId, name,
                                 "storageReserved", SMALL_DISK_SIZE)
            wait_for_disk_storage_available(client, lht_hostId, name,
                                            disk_path1)

    node = client.by_id_node(lht_hostId)
    disks = node.disks
    for key, disk in iter(disks.items()):
        if disk.path == disk_path1:
            assert not disk.allowScheduling
            assert disk.storageReserved == SMALL_DISK_SIZE
            assert disk.storageScheduled == 0
            free, total = common.get_host_disk_size(disk_path1)
            assert disk.storageMaximum == total
            assert disk.storageAvailable == free
        elif disk.path == disk_path2:
            assert not disk.allowScheduling
            assert disk.storageReserved == SMALL_DISK_SIZE
            assert disk.storageScheduled == 0
            free, total = common.get_host_disk_size(disk_path2)
            assert disk.storageMaximum == total
            assert disk.storageAvailable == free

    # delete other disks, just remain default disk
    update_disk = get_update_disks(disks)
    remain_disk = {}
    for name, disk in update_disk.items():
        if disk.path != disk_path1 and disk.path != disk_path2:
            remain_disk[name] = disk
    node = update_node_disks(client, node.name, disks=remain_disk, retry=True)
    node = wait_for_disk_update(client, lht_hostId,
                                len(remain_disk))
    assert len(node.disks) == len(remain_disk)
    # cleanup disks
    cleanup_host_disks(client, 'vol-disk-1', 'vol-disk-2')

Test update node disks

The test will use Longhorn to create disks on the node.

  1. Get the current node
  2. Try to delete all the disks. It should fail because scheduling is enabled
  3. Create two disks disk1 and disk2, attach them to the current node.
  4. Add two disks to the current node.
  5. Verify two extra disks have been added to the node
  6. Disable the two disks' scheduling, and set StorageReserved
  7. Update the two disks.
  8. Validate all the disks properties.
  9. Delete the other two disks. Validate deletion works.
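
Step 2 fails because Longhorn refuses to drop a disk that still allows scheduling, which is why steps 6 and 9 disable scheduling before removal. A condensed sketch of that ordering using the same helpers as the test, omitting the intermediate wait_for_disk_status calls the real test performs:

def remove_disk(client, node_name, disk_path):
    """Disable scheduling on the disk at disk_path, then drop it from
    the node (disable-before-delete, as required by Longhorn)."""
    node = client.by_id_node(node_name)
    update_disks = get_update_disks(node.disks)
    for disk in update_disks.values():
        if disk.path == disk_path:
            disk.allowScheduling = False
    node = update_node_disks(client, node_name, disks=update_disks,
                             retry=True)

    remaining = {name: disk
                 for name, disk in get_update_disks(node.disks).items()
                 if disk.path != disk_path}
    return update_node_disks(client, node_name, disks=remaining, retry=True)
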
def test_node_eviction(client, core_api, csi_pv, pvc, pod_make, volume_name)
Expand source code
@pytest.mark.v2_volume_test  # NOQA
def test_node_eviction(client, core_api, csi_pv, pvc, pod_make, volume_name): # NOQA
    """
    Test node eviction (assuming this is a 3 nodes cluster)

    Case: node 1, 3 to node 1, 2 eviction
    1. Disable scheduling on node 2.
    2. Create pv, pvc, pod with volume of 2 replicas.
    3. Write some data and get the checksum.
    4. Set 'Eviction Requested' to 'false' and enable scheduling on node 2.
    5. Set 'Eviction Requested' to 'true' and disable scheduling on node 3.
    6. Check volume 'healthy' and wait for replicas running on node 1 and 2.
    7. Check volume data checksum.
    """
    nodes = client.list_node()
    node1 = nodes[0]
    node2 = nodes[1]
    node3 = nodes[2]

    # schedule replicas to node 1, 3
    set_node_scheduling(client, node2, allowScheduling=False, retry=True)

    data_path = "/data/test"
    pod_name, _, _, created_md5sum = \
        common.prepare_pod_with_data_in_mb(client, core_api, csi_pv,
                                           pvc, pod_make, volume_name,
                                           num_of_replicas=2,
                                           data_path=data_path)

    common.wait_for_replica_scheduled(client, volume_name,
                                      to_nodes=[node1.name, node3.name])

    # replica now running on node 1, 3
    # enable node 2
    set_node_scheduling_eviction(client,
                                 node2,
                                 allowScheduling=True,
                                 evictionRequested=False,
                                 retry=True)
    # disable node 3 to have replica schedule to node 1, 2
    set_node_scheduling_eviction(client,
                                 node3,
                                 allowScheduling=False,
                                 evictionRequested=True,
                                 retry=True)

    common.wait_for_replica_scheduled(client, volume_name,
                                      to_nodes=[node1.name, node2.name])

    expect_md5sum = get_pod_data_md5sum(core_api, pod_name, data_path)
    assert expect_md5sum == created_md5sum

Test node eviction (assuming this is a 3 nodes cluster)

Case: node 1, 3 to node 1, 2 eviction

  1. Disable scheduling on node 2.
  2. Create pv, pvc, pod with volume of 2 replicas.
  3. Write some data and get the checksum.
  4. Set 'Eviction Requested' to 'false' and enable scheduling on node 2.
  5. Set 'Eviction Requested' to 'true' and disable scheduling on node 3.
  6. Check volume 'healthy' and wait for replicas running on node 1 and 2.
  7. Check volume data checksum.
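
A quick way to assert where the replicas ended up after each eviction step is to read the volume back and collect the replica host IDs; a minimal sketch using only the client calls already shown above:

def replica_nodes(client, volume_name):
    """Return the set of node IDs currently hosting the volume's replicas."""
    volume = client.by_id_volume(volume_name)
    return {replica.hostId for replica in volume.replicas}


# After the eviction settles, for example:
# assert replica_nodes(client, volume_name) == {node1.name, node2.name}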

def test_node_eviction_multiple_volume(client, core_api, csi_pv, pvc, pod_make, volume_name)
Expand source code
@pytest.mark.v2_volume_test  # NOQA
def test_node_eviction_multiple_volume(client, core_api, csi_pv, pvc, pod_make, volume_name): # NOQA
    """
    Test node eviction (assuming this is a 3 nodes cluster)

    1. Disable scheduling on node 1.
    2. Create pv, pvc, pod with volume 1 of 2 replicas.
    3. Write some data to volume 1 and get the checksum.
    2. Create pv, pvc, pod with volume 2 of 2 replicas.
    3. Write some data to volume 2 and get the checksum.
    4. Set 'Eviction Requested' to 'true' and disable scheduling on node 2.
    5. Set 'Eviction Requested' to 'false' and enable scheduling on node 1.
    6. Check volume 'healthy' and wait for replicas running on node 1 and 3.
    7. delete pods to detach volume 1 and 2.
    8. Set 'Eviction Requested' to 'false' and enable scheduling on node 2.
    9. Set 'Eviction Requested' to 'true' and disable scheduling on node 1.
    10. Wait for replicas running on node 2 and 3.
    11. Create pod 1 and pod 2. Volume 1 and 2 will be automatically
        attached.
    12. Check volume 'healthy', and replicas running on node 2 and 3.
    13. Check volume data checksum for volume 1 and 2.
    """
    nodes = client.list_node()

    node1 = nodes[0]
    node2 = nodes[1]
    node3 = nodes[2]

    # schedule replicas to node 2, 3
    set_node_scheduling(client, node1, allowScheduling=False, retry=True)

    data_path = "/data/test"

    # create volume 1
    volume1_name = volume_name + "-1"
    pod1_name, _, pvc1_name, created_md5sum1 = \
        common.prepare_pod_with_data_in_mb(client, core_api, csi_pv,
                                           pvc, pod_make, volume1_name,
                                           num_of_replicas=2,
                                           data_path=data_path)

    common.wait_for_replica_scheduled(client, volume1_name,
                                      to_nodes=[node2.name, node3.name])

    # create volume 2
    volume2_name = volume_name + "-2"
    pod2_name, _, pvc2_name, created_md5sum2 = \
        common.prepare_pod_with_data_in_mb(client, core_api, csi_pv,
                                           pvc, pod_make, volume2_name,
                                           volume_size=str(500 * Mi),
                                           num_of_replicas=2,
                                           data_size_in_mb=DATA_SIZE_IN_MB_2,
                                           data_path=data_path)

    common.wait_for_replica_scheduled(client, volume2_name,
                                      to_nodes=[node2.name, node3.name])

    # replica running on node 2, 3
    # disable node 2
    set_node_scheduling_eviction(client,
                                 node2,
                                 allowScheduling=False,
                                 evictionRequested=True,
                                 retry=True)
    # enable node 1 to have scheduled to 1, 3
    set_node_scheduling(client, node1, allowScheduling=True, retry=True)

    common.wait_for_replica_scheduled(client, volume1_name,
                                      to_nodes=[node1.name, node3.name])
    common.wait_for_replica_scheduled(client, volume2_name,
                                      to_nodes=[node1.name, node3.name])

    delete_and_wait_pod(core_api, pod1_name)
    delete_and_wait_pod(core_api, pod2_name)

    wait_for_volume_detached(client, volume1_name)
    wait_for_volume_detached(client, volume2_name)

    # replica running on node 1, 3
    # enable node 2
    set_node_scheduling_eviction(client,
                                 node2,
                                 allowScheduling=True,
                                 evictionRequested=False,
                                 retry=True)
    # disable node 1 to schedule to node 2, 3
    set_node_scheduling_eviction(client,
                                 node1,
                                 allowScheduling=False,
                                 evictionRequested=True,
                                 retry=True)

    common.wait_for_replica_scheduled(client, volume1_name,
                                      to_nodes=[node2.name, node3.name],
                                      chk_vol_healthy=False,
                                      chk_replica_running=False)
    common.wait_for_replica_scheduled(client, volume2_name,
                                      to_nodes=[node2.name, node3.name],
                                      chk_vol_healthy=False,
                                      chk_replica_running=False)

    pod1 = pod_make(name=pod1_name)
    pod1['spec']['volumes'] = [common.create_pvc_spec(pvc1_name)]

    pod2 = pod_make(name=pod2_name)
    pod2['spec']['volumes'] = [common.create_pvc_spec(pvc2_name)]

    create_and_wait_pod(core_api, pod1)
    create_and_wait_pod(core_api, pod2)

    common.wait_for_replica_scheduled(client, volume1_name,
                                      to_nodes=[node2.name, node3.name])
    common.wait_for_replica_scheduled(client, volume2_name,
                                      to_nodes=[node2.name, node3.name])

    expect_md5sum = get_pod_data_md5sum(core_api, pod1_name, data_path)
    assert expect_md5sum == created_md5sum1

    expect_md5sum = get_pod_data_md5sum(core_api, pod2_name, data_path)
    assert expect_md5sum == created_md5sum2

Test node eviction (assuming this is a 3 nodes cluster)

  1. Disable scheduling on node 1.
  2. Create pv, pvc, pod with volume 1 of 2 replicas.
  3. Write some data to volume 1 and get the checksum.
  4. Create pv, pvc, pod with volume 2 of 2 replicas.
  5. Write some data to volume 2 and get the checksum.
  6. Set 'Eviction Requested' to 'true' and disable scheduling on node 2.
  7. Set 'Eviction Requested' to 'false' and enable scheduling on node 1.
  8. Check volume 'healthy' and wait for replicas running on node 1 and 3.
  9. Delete pods to detach volume 1 and 2.
  10. Set 'Eviction Requested' to 'false' and enable scheduling on node 2.
  11. Set 'Eviction Requested' to 'true' and disable scheduling on node 1.
  12. Wait for replicas running on node 2 and 3.
  13. Create pod 1 and pod 2. Volume 1 and 2 will be automatically attached.
  14. Check volume 'healthy', and replicas running on node 2 and 3.
  15. Check volume data checksum for volume 1 and 2.
def test_node_eviction_no_schedulable_node(client, core_api, csi_pv, pvc, pod_make, volume_name, settings_reset)
Expand source code
@pytest.mark.v2_volume_test  # NOQA
def test_node_eviction_no_schedulable_node(client, core_api, csi_pv, pvc, pod_make, volume_name, settings_reset): # NOQA
    """
    Test node eviction (assuming this is a 3 nodes cluster)

    1. Disable scheduling on node 3.
    2. Create pv, pvc, pod with volume of 2 replicas.
    3. Write some data and get the checksum.
    4. Disable scheduling and set 'Eviction Requested' to 'true' on node 1.
    5. Volume should be failed to schedule new replica.
    6. Set 'Eviction Requested' to 'false' to cancel node 1 eviction.
    7. Check replica has the same hostID.
    8. Check volume data checksum.
    """
    nodes = client.list_node()
    node1 = nodes[0]
    node3 = nodes[2]

    # schedule replicas to node 1, 2
    set_node_scheduling(client, node3, allowScheduling=False, retry=True)

    data_path = "/data/test"
    pod_name, _, _, created_md5sum = \
        common.prepare_pod_with_data_in_mb(client, core_api, csi_pv,
                                           pvc, pod_make, volume_name,
                                           data_path=data_path,
                                           num_of_replicas=2)

    volume = wait_for_volume_healthy(client, volume_name)
    assert len(volume.replicas) == 2

    created_replicas = {}
    for r in volume.replicas:
        assert r.mode == "RW"
        assert r.running is True
        created_replicas[r.name] = r["hostId"]

    set_node_scheduling_eviction(client,
                                 node1,
                                 allowScheduling=False,
                                 evictionRequested=True,
                                 retry=True)
    # wait_for_volume_replica_count asserts (raising AssertionError) when
    # the expected count is never reached, which is the desired outcome:
    # no new replica should be created.
    new_replica_created = True
    try:
        wait_for_volume_replica_count(client, volume_name, 3)
    except AssertionError:
        new_replica_created = False
    assert not new_replica_created, "no new replica should be created"

    wait_scheduling_failure(client, volume_name)

    set_node_scheduling_eviction(client,
                                 node1,
                                 allowScheduling=False,
                                 evictionRequested=False,
                                 retry=True)

    volume = client.by_id_volume(volume_name)
    for r in volume.replicas:
        if r.name in created_replicas:
            assert r.running is True
            assert r.mode == "RW"
            assert r.hostId == created_replicas[r.name]
        else:
            assert False

    expect_md5sum = get_pod_data_md5sum(core_api, pod_name, data_path)
    assert expect_md5sum == created_md5sum

Test node eviction (assuming this is a 3 nodes cluster)

  1. Disable scheduling on node 3.
  2. Create pv, pvc, pod with volume of 2 replicas.
  3. Write some data and get the checksum.
  4. Disable scheduling and set 'Eviction Requested' to 'true' on node 1.
  5. The volume should fail to schedule a new replica.
  6. Set 'Eviction Requested' to 'false' to cancel node 1 eviction.
  7. Check replica has the same hostID.
  8. Check volume data checksum.
def test_node_eviction_soft_anti_affinity(client, core_api, csi_pv, pvc, pod_make, volume_name, settings_reset)
Expand source code
@pytest.mark.v2_volume_test  # NOQA
def test_node_eviction_soft_anti_affinity(client, core_api, csi_pv, pvc, pod_make, volume_name, settings_reset): # NOQA
    """
    Test node eviction (assuming this is a 3 nodes cluster)

    Case #1: node 1,2 to node 2 eviction
    1. Disable scheduling on node 3.
    2. Create pv, pvc, pod with volume of 2 replicas.
    3. Write some data and get the checksum.
    7. Set 'Eviction Requested' to 'true' and disable scheduling on node 1.
    8. Set 'Replica Node Level Soft Anti-Affinity' to 'true'.
    9. Check volume 'healthy' and wait for replicas running on node 2
    Case #2: node 2 to node 1, 3 eviction
    10. Enable scheduling on node 1 and 3.
    11. Set 'Replica Node Level Soft Anti-Affinity' to 'false'.
    12. Set 'Eviction Requested' to 'true' and disable scheduling on node 2.
    13. Check volume 'healthy' and wait for replicas running on node 1 and 3.
    14. Check volume data checksum.
    """
    nodes = client.list_node()
    node1 = nodes[0]
    node2 = nodes[1]
    node3 = nodes[2]

    # schedule replicas to node 1, 2
    set_node_scheduling(client, node3, allowScheduling=False, retry=True)

    data_path = "/data/test"
    pod_name, _, _, created_md5sum = \
        common.prepare_pod_with_data_in_mb(client, core_api, csi_pv,
                                           pvc, pod_make, volume_name,
                                           num_of_replicas=2,
                                           data_path=data_path)

    common.wait_for_replica_scheduled(client, volume_name,
                                      to_nodes=[node1.name, node2.name])

    set_node_scheduling_eviction(client,
                                 node1,
                                 allowScheduling=False,
                                 evictionRequested=True,
                                 retry=True)

    # enable anti-affinity allow the 2 replicas running on node 2
    replica_node_soft_anti_affinity_setting = \
        client.by_id_setting(SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY)
    client.update(replica_node_soft_anti_affinity_setting, value="true")

    common.wait_for_replica_scheduled(client, volume_name,
                                      to_nodes=[node2.name],
                                      anti_affinity=True)

    # replicas now all running on node 2, enable schedule on node 3
    set_node_scheduling(client, node3, allowScheduling=True, retry=True)
    # replicas now all running on node 2, enable schedule on node 1
    set_node_scheduling(client, node1, allowScheduling=True, retry=True)

    replica_node_soft_anti_affinity_setting = \
        client.by_id_setting(SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY)
    client.update(replica_node_soft_anti_affinity_setting, value="false")

    # replica now all running on node 2, disable node 2 schedule to
    # schedule to node 1, 3
    set_node_scheduling_eviction(client,
                                 node2,
                                 allowScheduling=False,
                                 evictionRequested=True,
                                 retry=True)
    common.wait_for_volume_healthy(client, volume_name)

    common.wait_for_replica_scheduled(client, volume_name,
                                      to_nodes=[node1.name, node3.name],
                                      chk_vol_healthy=False)

    expect_md5sum = get_pod_data_md5sum(core_api, pod_name, data_path)
    assert expect_md5sum == created_md5sum

Test node eviction (assuming this is a 3 nodes cluster)

Case #1: node 1, 2 to node 2 eviction

  1. Disable scheduling on node 3.
  2. Create pv, pvc, pod with volume of 2 replicas.
  3. Write some data and get the checksum.
  4. Set 'Eviction Requested' to 'true' and disable scheduling on node 1.
  5. Set 'Replica Node Level Soft Anti-Affinity' to 'true'.
  6. Check volume 'healthy' and wait for replicas running on node 2.

Case #2: node 2 to node 1, 3 eviction

  7. Enable scheduling on node 1 and 3.
  8. Set 'Replica Node Level Soft Anti-Affinity' to 'false'.
  9. Set 'Eviction Requested' to 'true' and disable scheduling on node 2.
  10. Check volume 'healthy' and wait for replicas running on node 1 and 3.
  11. Check volume data checksum.
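
The anti-affinity toggles in steps 5 and 8 are plain settings updates; a minimal sketch, reusing the SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY constant imported by these tests:

def set_replica_node_soft_anti_affinity(client, enabled):
    """Enable or disable 'Replica Node Level Soft Anti-Affinity'."""
    setting = client.by_id_setting(SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY)
    return client.update(setting, value="true" if enabled else "false")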

def test_node_umount_disk(client)
Expand source code
@pytest.mark.node  # NOQA
@pytest.mark.mountdisk  # NOQA
def test_node_umount_disk(client):  # NOQA
    """
    [Node] Test umount and delete the extra disk on the node

    1. Create host disk and attach it to the current node
    2. Disable the existing disk's scheduling on the current node
    3. Add the disk to the current node
    4. Wait for node to recognize the disk
    5. Create a volume with "number of nodes" replicas
    6. Umount the disk from the host
    7. Verify the disk `READY` condition become false.
        1. Maximum and available storage become zero.
        2. No change to storage scheduled and storage reserved.
    8. Try to delete the extra disk, it should fail due to need to disable
    scheduling first
    9. Update the other disk on the node to be allow scheduling. Disable the
    scheduling for the extra disk
    10. Mount the disk back
    11. Verify the disk `READY` condition become true, and other states
    12. Umount and delete the disk.
    """

    # create test disks for node
    disk_volume_name = 'vol-disk-1'
    lht_hostId = get_self_host_id()
    node = client.by_id_node(lht_hostId)
    disks = node.disks
    disk_path1 = create_host_disk(client, disk_volume_name,
                                  str(Gi), lht_hostId)
    disk1 = {"path": disk_path1, "allowScheduling": True,
             "storageReserved": SMALL_DISK_SIZE}

    update_disk = get_update_disks(disks)
    for disk in update_disk.values():
        disk.allowScheduling = False
    # add new disk for node
    update_disk["disk1"] = disk1
    # save disks to node
    node = update_node_disks(client, node.name, disks=update_disk, retry=True)
    node = common.wait_for_disk_update(client, lht_hostId,
                                       len(update_disk))
    assert len(node.disks) == len(update_disk)
    node = client.by_id_node(lht_hostId)
    assert len(node.disks) == len(update_disk)

    disks = node.disks
    # wait for node controller to update disk status
    for name, disk in iter(disks.items()):
        if disk.path == disk_path1:
            wait_for_disk_status(client, lht_hostId, name,
                                 "allowScheduling", True)
            wait_for_disk_status(client, lht_hostId, name,
                                 "storageReserved", SMALL_DISK_SIZE)
            _, total = common.get_host_disk_size(disk_path1)
            wait_for_disk_status(client, lht_hostId, name,
                                 "storageMaximum", total)
            wait_for_disk_storage_available(client, lht_hostId, name,
                                            disk_path1)

    node = client.by_id_node(lht_hostId)
    disks = node.disks
    for key, disk in iter(disks.items()):
        if disk.path == disk_path1:
            assert disk.allowScheduling
            assert disk.storageReserved == SMALL_DISK_SIZE
            assert disk.storageScheduled == 0
            free, total = common.get_host_disk_size(disk_path1)
            assert disk.storageMaximum == total
            conditions = disk.conditions
            assert conditions[DISK_CONDITION_READY]["status"] == \
                CONDITION_STATUS_TRUE
            assert conditions[DISK_CONDITION_SCHEDULABLE]["status"] == \
                CONDITION_STATUS_TRUE
        else:
            assert not disk.allowScheduling

    # create a volume
    nodes = client.list_node()
    vol_name = common.generate_volume_name()
    volume = create_volume(client, vol_name, str(SMALL_DISK_SIZE),
                           lht_hostId, len(nodes))
    replicas = volume.replicas
    for replica in replicas:
        id = replica.hostId
        assert id != ""
        assert replica.running
        if id == lht_hostId:
            assert replica.dataPath.startswith(disk_path1)

    # umount the disk
    mount_path = os.path.join(DIRECTORY_PATH, disk_volume_name)
    # After longhorn refactor, umount_disk will fail with
    # `target is busy` error from Linux as replica is using
    # this mount path for storing its files.
    # As a workaround, we use the `-l` flag, which lazily unmounts
    # active mount destinations.
    common.lazy_umount_disk(mount_path)

    # wait for update node status
    node = client.by_id_node(lht_hostId)
    disks = node.disks
    for fsid, disk in iter(disks.items()):
        if disk.path == disk_path1:
            wait_for_disk_status(client, lht_hostId,
                                 fsid, "storageMaximum", 0)
            wait_for_disk_status(client, lht_hostId,
                                 fsid, "storageScheduled", 0)
            wait_for_disk_conditions(client, lht_hostId, fsid,
                                     DISK_CONDITION_READY,
                                     CONDITION_STATUS_FALSE)

    # check result
    node = client.by_id_node(lht_hostId)
    disks = node.disks
    update_disks = {}
    for fsid, disk in iter(disks.items()):
        if disk.path == disk_path1:
            assert disk.allowScheduling
            assert disk.storageMaximum == 0
            assert disk.storageAvailable == 0
            assert disk.storageReserved == SMALL_DISK_SIZE
            assert disk.storageScheduled == 0
            conditions = disk.conditions
            assert conditions[DISK_CONDITION_READY]["status"] == \
                CONDITION_STATUS_FALSE
            assert conditions[DISK_CONDITION_SCHEDULABLE]["status"] == \
                CONDITION_STATUS_FALSE
        else:
            conditions = disk.conditions
            assert conditions[DISK_CONDITION_READY]["status"] == \
                CONDITION_STATUS_TRUE
            assert conditions[DISK_CONDITION_SCHEDULABLE]["status"] == \
                CONDITION_STATUS_TRUE
            update_disks[fsid] = disk

    # delete umount disk exception
    with pytest.raises(Exception) as e:
        update_node_disks(client, node.name, disks=update_disks, retry=True)
    assert "disable the disk" in str(e.value)

    # update other disks
    disks = node.disks
    for fsid, disk in iter(disks.items()):
        if disk.path == disk_path1:
            disk.allowScheduling = False
        else:
            disk.allowScheduling = True
    test_update = get_update_disks(disks)
    node = update_node_disks(client, node.name, disks=test_update, retry=True)
    disks = node.disks
    for fsid, disk in iter(disks.items()):
        if disk.path != disk_path1:
            wait_for_disk_status(client, lht_hostId,
                                 fsid, "allowScheduling", True)
    node = client.by_id_node(lht_hostId)
    disks = node.disks
    for fsid, disk in iter(disks.items()):
        if disk.path != disk_path1:
            assert disk.allowScheduling

    # mount the disk back
    mount_path = os.path.join(DIRECTORY_PATH, disk_volume_name)
    disk_volume = client.by_id_volume(disk_volume_name)
    dev = get_volume_endpoint(disk_volume)
    common.mount_disk(dev, mount_path)

    # wait for update node status
    node = client.by_id_node(lht_hostId)
    disks = node.disks
    for fsid, disk in iter(disks.items()):
        if disk.path == disk_path1:
            wait_for_disk_status(client, lht_hostId,
                                 fsid, "allowScheduling", False)
            wait_for_disk_conditions(client, lht_hostId, fsid,
                                     DISK_CONDITION_READY,
                                     CONDITION_STATUS_TRUE)

    # check result
    node = client.by_id_node(lht_hostId)
    disks = node.disks
    for fsid, disk in iter(disks.items()):
        if disk.path == disk_path1:
            free, total = common.get_host_disk_size(disk_path1)
            assert not disk.allowScheduling
            assert disk.storageMaximum == total
            assert disk.storageAvailable == free
            assert disk.storageReserved == SMALL_DISK_SIZE
            assert disk.storageScheduled == SMALL_DISK_SIZE
            conditions = disk.conditions
            assert conditions[DISK_CONDITION_READY]["status"] == \
                CONDITION_STATUS_TRUE
            assert conditions[DISK_CONDITION_SCHEDULABLE]["status"] == \
                CONDITION_STATUS_TRUE
        else:
            conditions = disk.conditions
            assert conditions[DISK_CONDITION_READY]["status"] == \
                CONDITION_STATUS_TRUE
            assert conditions[DISK_CONDITION_SCHEDULABLE]["status"] == \
                CONDITION_STATUS_TRUE

    # delete volume and umount disk
    cleanup_volume_by_name(client, vol_name)
    mount_path = os.path.join(DIRECTORY_PATH, disk_volume_name)
    common.umount_disk(mount_path)

    # wait for update node status
    node = client.by_id_node(lht_hostId)
    disks = node.disks
    for fsid, disk in iter(disks.items()):
        if disk.path == disk_path1:
            wait_for_disk_status(client, lht_hostId,
                                 fsid, "allowScheduling", False)
            wait_for_disk_status(client, lht_hostId,
                                 fsid, "storageScheduled", 0)
            wait_for_disk_status(client, lht_hostId,
                                 fsid, "storageMaximum", 0)

    # test delete the umount disk
    node = client.by_id_node(lht_hostId)
    update_node_disks(client, node.name, disks=update_disks, retry=True)
    node = common.wait_for_disk_update(client, lht_hostId,
                                       len(update_disks))
    assert len(node.disks) == len(update_disks)
    cmd = ['rm', '-r', mount_path]
    subprocess.check_call(cmd)

[Node] Test umount and delete the extra disk on the node

  1. Create host disk and attach it to the current node
  2. Disable the existing disk's scheduling on the current node
  3. Add the disk to the current node
  4. Wait for node to recognize the disk
  5. Create a volume with "number of nodes" replicas
  6. Umount the disk from the host
  7. Verify the disk READY condition becomes false.
    1. Maximum and available storage become zero.
    2. No change to storage scheduled and storage reserved.
  8. Try to delete the extra disk; it should fail because scheduling must be disabled first
  9. Update the other disk on the node to allow scheduling. Disable scheduling for the extra disk
  10. Mount the disk back
  11. Verify the disk READY condition becomes true, and check the other states
  12. Umount and delete the disk.
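
Steps 6 and 10 unmount and remount the extra disk via the Longhorn volume's block device endpoint. The helpers live in common; the following rough sketch shows what they are assumed to wrap, including the lazy variant mentioned in the code comment above:

import os
import subprocess


def umount_disk(mount_path, lazy=False):
    """Unmount a host disk. lazy=True issues `umount -l`, which detaches
    the mount even while a replica still holds files open on it."""
    cmd = ["umount", mount_path]
    if lazy:
        cmd.insert(1, "-l")
    subprocess.check_call(cmd)


def mount_disk(dev, mount_path):
    """Mount the volume's block device back onto its mount path."""
    os.makedirs(mount_path, exist_ok=True)
    subprocess.check_call(["mount", dev, mount_path])
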
def test_replica_datapath_cleanup(client)
Expand source code
@pytest.mark.coretest   # NOQA
@pytest.mark.node  # NOQA
@pytest.mark.mountdisk # NOQA
def test_replica_datapath_cleanup(client):  # NOQA
    """
    Test replicas data path cleanup

    Test prerequisites:
      - Enable Replica Node Level Soft Anti-Affinity setting

    1. Create host disk `extra_disk` and add it to the current node.
    2. Disable all the disks except for the ones on the current node.
    3. Create a volume with 5 replicas (soft anti-affinity on)
        1. To make sure both default disk and extra disk can have one replica
        2. Current we don't have anti-affinity for disks on the same node
    4. Verify the data path for replicas are created.
    5. Delete the volume.
    6. Verify the data path for replicas are deleted.
    """
    nodes = client.list_node()
    lht_hostId = get_self_host_id()

    # set soft antiaffinity setting to true
    replica_node_soft_anti_affinity_setting = \
        client.by_id_setting(SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY)
    client.update(replica_node_soft_anti_affinity_setting, value="true")

    node = client.by_id_node(lht_hostId)
    extra_disk_path = create_host_disk(client, "extra-disk",
                                       "10G", lht_hostId)
    extra_disk = {"path": extra_disk_path, "allowScheduling": True}
    update_disks = get_update_disks(node.disks)
    update_disks["extra-disk"] = extra_disk
    node = update_node_disks(client, node.name, disks=update_disks,
                             retry=True)
    node = common.wait_for_disk_update(client, lht_hostId,
                                       len(update_disks))
    assert len(node.disks) == len(update_disks)

    extra_disk_fsid = ""
    for fsid, disk in iter(node.disks.items()):
        if disk.path == extra_disk_path:
            extra_disk_fsid = fsid
            break

    for node in nodes:
        # disable all the disks except the ones on the current node
        if node.name == lht_hostId:
            continue
        for fsid, disk in iter(node.disks.items()):
            disk.allowScheduling = False
        update_disks = get_update_disks(node.disks)
        update_node_disks(client, node.name, disks=update_disks, retry=True)
        for fsid in node.disks:
            wait_for_disk_status(client, node.name, fsid,
                                 "allowScheduling", False)

    vol_name = common.generate_volume_name()
    # more replicas, make sure both default and extra disk will get one
    volume = create_volume(client, vol_name, str(Gi), lht_hostId, 5)
    data_paths = []
    for replica in volume.replicas:
        data_paths.append(replica.dataPath)

    # data path should exist now
    for data_path in data_paths:
        assert os.listdir(data_path)

    cleanup_volume_by_name(client, vol_name)

    # data path should be gone due to the cleanup of replica
    for data_path in data_paths:
        try:
            os.listdir(data_path)
            raise AssertionError(f"data path {data_path} should be gone")
        except FileNotFoundError:
            pass

    node = client.by_id_node(lht_hostId)
    disks = node.disks
    disk = disks[extra_disk_fsid]
    disk.allowScheduling = False
    update_disks = get_update_disks(disks)
    node = update_node_disks(client, node.name, disks=update_disks,
                             retry=True)
    node = wait_for_disk_status(client, lht_hostId,
                                extra_disk_fsid,
                                "allowScheduling", False)
    wait_for_disk_status(client, lht_hostId, extra_disk_fsid,
                         "storageScheduled", 0)

    disks = node.disks
    disk = disks[extra_disk_fsid]
    assert not disk.allowScheduling
    disks.pop(extra_disk_fsid)
    update_disks = get_update_disks(disks)
    update_node_disks(client, node.name, disks=update_disks, retry=True)
    node = common.wait_for_disk_update(client, lht_hostId,
                                       len(update_disks))

    cleanup_host_disks(client, 'extra-disk')

Test replicas data path cleanup

Test prerequisites:
  - Enable Replica Node Level Soft Anti-Affinity setting

  1. Create host disk extra_disk and add it to the current node.
  2. Disable all the disks except for the ones on the current node.
  3. Create a volume with 5 replicas (soft anti-affinity on)
    1. To make sure both default disk and extra disk can have one replica
    2. Currently we don't have anti-affinity for disks on the same node
  4. Verify the data paths for the replicas are created.
  5. Delete the volume.
  6. Verify the data paths for the replicas are deleted.
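The test asserts only that every replica data path exists; the stated intent that both the default disk and the extra disk receive a replica is not checked directly. A minimal sketch of how that could be verified, assuming `volume`, `extra_disk_path`, and the module constant DEFAULT_DISK_PATH from the test body, and assuming the two disk paths do not nest:

from collections import defaultdict

def group_replica_paths_by_disk(volume, disk_paths):
    # illustrative helper: map each known disk path to the replica data
    # paths located under it
    grouped = defaultdict(list)
    for replica in volume.replicas:
        for disk_path in disk_paths:
            if replica.dataPath.startswith(disk_path):
                grouped[disk_path].append(replica.dataPath)
    return grouped

# grouped = group_replica_paths_by_disk(volume,
#                                       [DEFAULT_DISK_PATH, extra_disk_path])
# assert all(grouped.values()), "each disk should hold at least one replica"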
def test_replica_scheduler_exceed_over_provisioning(client)
Expand source code
@pytest.mark.v2_volume_test   # NOQA
@pytest.mark.node  # NOQA
def test_replica_scheduler_exceed_over_provisioning(client):  # NOQA
    """
    Test replica scheduler: exceeding overprovisioning parameter

    1. Set setting `overprovisioning` to 100 (default)
    2. Update every disk to leave 1G available for scheduling
    3. Try to schedule a volume of 2G. Volume scheduled condition should be
    false
    """
    # test that a volume exceeding the overprovisioning limit cannot be scheduled
    nodes = client.list_node()
    for node in nodes:
        disks = node.disks
        for fsid, disk in iter(disks.items()):
            disk.storageReserved = \
                disk.storageMaximum - 1*Gi
        update_disks = get_update_disks(disks)
        node = update_node_disks(client, node.name, disks=update_disks,
                                 retry=True)
        disks = node.disks
        for fsid, disk in iter(disks.items()):
            wait_for_disk_status(client, node.name,
                                 fsid, "storageReserved",
                                 disk.storageMaximum - 1*Gi)

    vol_name = common.generate_volume_name()
    volume = client.create_volume(name=vol_name,
                                  size=str(2*Gi),
                                  numberOfReplicas=len(nodes),
                                  dataEngine=DATA_ENGINE)
    volume = common.wait_for_volume_condition_scheduled(client, vol_name,
                                                        "status",
                                                        CONDITION_STATUS_FALSE)
    client.delete(volume)
    common.wait_for_volume_delete(client, vol_name)

Test replica scheduler: exceeding overprovisioning parameter

  1. Set setting overprovisioning to 100 (default)
  2. Update every disk to leave 1G available for scheduling
  3. Try to schedule a volume of 2G. Volume scheduled condition should be false
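To make the arithmetic concrete: with the overprovisioning percentage at its default of 100, a disk may carry at most as much replica size as it has unreserved space, so 1G left for scheduling cannot take a 2G replica. An illustrative sketch of that per-disk check (not Longhorn's scheduler code), using hypothetical disk sizes:

Gi = 1024 ** 3

def disk_can_take(volume_size, storage_maximum, storage_reserved,
                  storage_scheduled, over_provisioning_pct=100):
    # space the scheduler may still hand out on this disk
    budget = (storage_maximum - storage_reserved) * over_provisioning_pct // 100
    return storage_scheduled + volume_size <= budget

# 1Gi left for scheduling on a hypothetical 10Gi disk: a 2Gi replica does not fit
assert not disk_can_take(2 * Gi, storage_maximum=10 * Gi,
                         storage_reserved=9 * Gi, storage_scheduled=0)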
def test_replica_scheduler_just_under_over_provisioning(client)
Expand source code
@pytest.mark.v2_volume_test   # NOQA
@pytest.mark.node  # NOQA
def test_replica_scheduler_just_under_over_provisioning(client):  # NOQA
    """
    Test replica scheduler: just under overprovisioning parameter

    1. Set setting `overprovisioning` to 100 (default)
    2. Get the maximum size of all the disks
    3. Create a volume using maximum_size - 2MiB as the volume size.
    4. Volume scheduled condition should be true.
    5. Make sure every replica landed on a different node's default disk.
    """
    lht_hostId = get_self_host_id()
    nodes = client.list_node()
    expect_node_disk = {}
    max_size_array = []
    for node in nodes:
        disks = node.disks
        for _, disk in iter(disks.items()):
            if disk.path == get_default_disk_path():
                expect_disk = disk
                expect_node_disk[node.name] = expect_disk
                max_size_array.append(disk.storageMaximum)
            disk.storageReserved = 0
            update_disks = get_update_disks(disks)
            node = update_node_disks(client, node.name, disks=update_disks,
                                     retry=True)
            disks = node.disks
            for fsid, disk in iter(disks.items()):
                if disk.path == get_default_disk_path():
                    wait_for_disk_status(client, node.name,
                                         fsid, "storageReserved", 0)

    # the volume size is rounded up by 2MiB, so back off by one granule
    max_size = min(max_size_array) - 2 * 1024 * 1024
    # test that a volume just under the overprovisioning limit can be scheduled
    vol_name = common.generate_volume_name()
    volume = client.create_volume(name=vol_name,
                                  size=str(max_size),
                                  numberOfReplicas=len(nodes),
                                  dataEngine=DATA_ENGINE)
    volume = common.wait_for_volume_condition_scheduled(client, vol_name,
                                                        "status",
                                                        CONDITION_STATUS_TRUE)
    volume = common.wait_for_volume_detached(client, vol_name)
    assert volume.state == "detached"
    assert volume.created != ""

    volume.attach(hostId=lht_hostId)
    volume = common.wait_for_volume_healthy(client, vol_name)
    nodes = client.list_node()
    node_hosts = []
    for node in nodes:
        node_hosts.append(node.name)
    # check that all replicas are scheduled to the default disk
    for replica in volume.replicas:
        id = replica.hostId
        assert id != ""
        assert replica.running
        expect_disk = expect_node_disk[id]
        assert replica.diskID == expect_disk.diskUUID
        assert expect_disk.path in replica.dataPath
        node_hosts = list(filter(lambda x: x != id, node_hosts))
    assert len(node_hosts) == 0

    # clean volume and disk
    cleanup_volume_by_name(client, vol_name)

Test replica scheduler: just under overprovisioning parameter

  1. Set setting overprovisioning to 100 (default)
  2. Get the maximum size of all the disks
  3. Create a volume using maximum_size - 2MiB as the volume size.
  4. Volume scheduled condition should be true.
  5. Make sure every replica landed on a different node's default disk.
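The only subtle step is the size choice: Longhorn rounds the requested size up, so the test backs off by one 2MiB granule from the smallest default disk's storageMaximum to stay just under the limit. A tiny worked example with a hypothetical 10Gi disk:

MiB = 1024 ** 2
Gi = 1024 ** 3

smallest_disk_maximum = 10 * Gi            # hypothetical smallest storageMaximum
max_size = smallest_disk_maximum - 2 * MiB

# even after the request is rounded up by one 2MiB granule, the volume
# still fits on the smallest disk
assert max_size + 2 * MiB <= smallest_disk_maximum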
def test_replica_scheduler_large_volume_fit_small_disk(client)
Expand source code
@pytest.mark.node  # NOQA
@pytest.mark.mountdisk # NOQA
def test_replica_scheduler_large_volume_fit_small_disk(client):  # NOQA
    """
    Test replica scheduler: do not schedule a large volume to a small disk

    1. Create a host disk `small_disk` and attach it to the current node.
    2. Create a new large volume.
    3. Verify the volume wasn't scheduled on the `small_disk`.
    """
    nodes = client.list_node()
    # create a small size disk on current node
    lht_hostId = get_self_host_id()
    node = client.by_id_node(lht_hostId)
    small_disk_path = create_host_disk(client, "vol-small",
                                       SIZE, lht_hostId)
    small_disk = {"path": small_disk_path, "allowScheduling": True}
    update_disks = get_update_disks(node.disks)
    update_disks["small-disks"] = small_disk
    node = update_node_disks(client, node.name, disks=update_disks,
                             retry=True)
    node = common.wait_for_disk_update(client, lht_hostId,
                                       len(update_disks))
    assert len(node.disks) == len(update_disks)

    unexpected_disk = {}
    for fsid, disk in iter(node.disks.items()):
        if disk.path == small_disk_path:
            unexpected_disk["fsid"] = fsid
            unexpected_disk["path"] = disk["path"]
            break

    # volume is too large to fit on the small disk on the current node
    vol_name = common.generate_volume_name()
    volume = create_volume(client,
                           vol_name,
                           str(Gi),
                           lht_hostId,
                           len(nodes))

    nodes = client.list_node()
    node_hosts = []
    for node in nodes:
        node_hosts.append(node.name)

    # check replica on current node shouldn't schedule to small disk
    for replica in volume.replicas:
        id = replica.hostId
        assert id != ""
        assert replica.running
        if id == lht_hostId:
            assert replica.diskID != unexpected_disk["fsid"]
            assert replica.dataPath != unexpected_disk["path"]
        node_hosts = list(filter(lambda x: x != id, node_hosts))

    assert len(node_hosts) == 0

    cleanup_volume_by_name(client, vol_name)

    # cleanup test disks
    node = client.by_id_node(lht_hostId)
    disks = node.disks
    disk = disks[unexpected_disk["fsid"]]
    disk.allowScheduling = False
    update_disks = get_update_disks(disks)
    node = update_node_disks(client, node.name, disks=update_disks,
                             retry=True)
    node = wait_for_disk_status(client, lht_hostId,
                                unexpected_disk["fsid"],
                                "allowScheduling", False)
    disks = node.disks
    disk = disks[unexpected_disk["fsid"]]
    assert not disk.allowScheduling
    disks.pop(unexpected_disk["fsid"])
    update_disks = get_update_disks(disks)
    update_node_disks(client, node.name, disks=update_disks, retry=True)
    cleanup_host_disks(client, 'vol-small')

Test replica scheduler: do not schedule a large volume to a small disk

  1. Create a host disk small_disk and attach it to the current node.
  2. Create a new large volume.
  3. Verify the volume wasn't scheduled on the small_disk.
def test_replica_scheduler_no_disks(client)
Expand source code
@pytest.mark.v2_volume_test   # NOQA
@pytest.mark.coretest   # NOQA
@pytest.mark.node  # NOQA
def test_replica_scheduler_no_disks(client):  # NOQA
    """
    Test replica scheduler with no disks available

    1. Delete all the disks on all the nodes
    2. Create a volume.
    3. Wait for volume condition `scheduled` to be false.
    """
    nodes = client.list_node()
    # delete all disks on each node
    for node in nodes:
        disks = node.disks
        # set allowScheduling to false
        for name, disk in iter(disks.items()):
            disk.allowScheduling = False
        update_disks = get_update_disks(disks)
        node = update_node_disks(client, node.name, disks=update_disks,
                                 retry=True)
        for name, disk in iter(node.disks.items()):
            # wait for node controller update disk status
            wait_for_disk_status(client, node.name, name,
                                 "allowScheduling", False)
            wait_for_disk_status(client, node.name, name,
                                 "storageScheduled", 0)

        node = client.by_id_node(node.name)
        for name, disk in iter(node.disks.items()):
            assert not disk.allowScheduling
        node = update_node_disks(client, node.name, disks={}, retry=True)
        node = common.wait_for_disk_update(client, node.name, 0)
        assert len(node.disks) == 0

    # test that no disk can fit the volume
    vol_name = common.generate_volume_name()
    volume = client.create_volume(name=vol_name,
                                  size=SIZE, numberOfReplicas=len(nodes),
                                  dataEngine=DATA_ENGINE)
    volume = common.wait_for_volume_condition_scheduled(client, vol_name,
                                                        "status",
                                                        CONDITION_STATUS_FALSE)
    client.delete(volume)
    common.wait_for_volume_delete(client, vol_name)

Test replica scheduler with no disks available

  1. Delete all the disks on all the nodes
  2. Create a volume.
  3. Wait for volume condition scheduled to be false.
def test_replica_scheduler_rebuild_restore_is_too_big(set_random_backupstore, client)
Expand source code
@pytest.mark.node  # NOQA
def test_replica_scheduler_rebuild_restore_is_too_big(set_random_backupstore, client):  # NOQA
    """
    Test replica scheduler: rebuild/restore can be too big to fit a disk

    1. Create a small host disk with `SIZE` and add it to the current node.
    2. Create a volume with size `SIZE`.
    3. Disable all scheduling except for the small disk.
    4. Write data of size `SIZE * 0.9` to the volume and make a backup
    5. Create a restored volume with 1 replica from backup.
        1. Verify the restored volume cannot be scheduled since the existing
        data cannot fit in the small disk
    6. Delete a replica of the volume.
        1. Verify the volume reports `scheduled = false` because no suitable
        disk can be found for rebuilding the replica, since the replica with
        the existing data cannot fit in the small disk
    7. Enable the scheduling for other disks, disable scheduling for small disk
    8. Verify the volume reports `scheduled = true`. And verify the data.
    9. Cleanup the volume.
    10. Verify the restored volume reports `scheduled = true`.
    11. Wait for the restored volume to complete restoration, then check data.

    """
    nodes = client.list_node()
    lht_hostId = get_self_host_id()
    node = client.by_id_node(lht_hostId)
    small_disk_path = create_host_disk(client, "vol-small",
                                       SIZE, lht_hostId)
    small_disk = {"path": small_disk_path, "allowScheduling": False}
    update_disks = get_update_disks(node.disks)
    update_disks["small-disk"] = small_disk
    node = update_node_disks(client, node.name, disks=update_disks,
                             retry=True)
    node = common.wait_for_disk_update(client, lht_hostId,
                                       len(update_disks))
    assert len(node.disks) == len(update_disks)

    # volume is same size as the small disk
    volume_size = SIZE
    vol_name = common.generate_volume_name()
    client.create_volume(name=vol_name, size=str(volume_size),
                         numberOfReplicas=len(nodes))
    volume = common.wait_for_volume_condition_scheduled(client, vol_name,
                                                        "status",
                                                        CONDITION_STATUS_TRUE)
    volume = common.wait_for_volume_detached(client, vol_name)

    volume.attach(hostId=lht_hostId)
    volume = common.wait_for_volume_healthy(client, vol_name)

    # disable all the scheduling except for the small disk
    nodes = client.list_node()
    for node in nodes:
        disks = node.disks
        for fsid, disk in iter(disks.items()):
            if disk.path == DEFAULT_DISK_PATH:
                disk.allowScheduling = False
            elif disk.path == small_disk_path:
                disk.allowScheduling = True
        update_disks = get_update_disks(disks)
        update_node_disks(client, node.name, disks=update_disks, retry=True)

    data = {'len': int(int(SIZE) * 0.9), 'pos': 0}
    data['content'] = common.generate_random_data(data['len'])
    _, b, _, _ = common.create_backup(client, vol_name, data)

    # the restored volume cannot be scheduled
    restore_name = common.generate_volume_name()
    client.create_volume(name=restore_name, size=SIZE,
                         numberOfReplicas=1,
                         fromBackup=b.url)
    r_vol = common.wait_for_volume_condition_scheduled(client, restore_name,
                                                       "status",
                                                       CONDITION_STATUS_FALSE)

    # cannot schedule because all disks except the small disk are disabled,
    # and the small disk won't have enough space after taking the replica
    volume = volume.replicaRemove(name=volume.replicas[0].name)
    volume = common.wait_for_volume_condition_scheduled(client, vol_name,
                                                        "status",
                                                        CONDITION_STATUS_FALSE)

    # enable the scheduling
    nodes = client.list_node()
    for node in nodes:
        disks = node.disks
        for fsid, disk in iter(disks.items()):
            if disk.path == DEFAULT_DISK_PATH:
                disk.allowScheduling = True
            elif disk.path == small_disk_path:
                disk.allowScheduling = False
        update_disks = get_update_disks(disks)
        update_node_disks(client, node.name, disks=update_disks, retry=True)

    volume = common.wait_for_volume_condition_scheduled(client, vol_name,
                                                        "status",
                                                        CONDITION_STATUS_TRUE)

    common.check_volume_data(volume, data, check_checksum=False)

    cleanup_volume_by_name(client, vol_name)

    r_vol = common.wait_for_volume_condition_scheduled(client, restore_name,
                                                       "status",
                                                       CONDITION_STATUS_TRUE)
    r_vol = common.wait_for_volume_restoration_completed(client, restore_name)
    r_vol = common.wait_for_volume_detached(client, restore_name)
    r_vol.attach(hostId=lht_hostId)
    r_vol = common.wait_for_volume_healthy(client, restore_name)

    common.check_volume_data(r_vol, data, check_checksum=False)

    cleanup_volume_by_name(client, restore_name)

    # cleanup test disks
    node = client.by_id_node(lht_hostId)
    disks = node.disks
    update_disks = {}
    for name, disk in iter(disks.items()):
        if disk.path != small_disk_path:
            update_disks[name] = disk
    update_node_disks(client, node.name, disks=update_disks, retry=True)

    node = common.wait_for_disk_update(client, lht_hostId,
                                       len(update_disks))
    cleanup_host_disks(client, 'vol-small')

Test replica scheduler: rebuild/restore can be too big to fit a disk

  1. Create a small host disk with SIZE and add it to the current node.
  2. Create a volume with size SIZE.
  3. Disable all scheduling except for the small disk.
  4. Write data of size SIZE * 0.9 to the volume and make a backup
  5. Create a restored volume with 1 replica from backup.
    1. Verify the restored volume cannot be scheduled since the existing data cannot fit in the small disk
  6. Delete a replica of the volume.
    1. Verify the volume reports scheduled = false because no suitable disk can be found for rebuilding the replica, since the replica with the existing data cannot fit in the small disk
  7. Enable the scheduling for other disks, disable scheduling for small disk
  8. Verify the volume reports scheduled = true. And verify the data.
  9. Cleanup the volume.
  10. Verify the restored volume reports scheduled = true.
  11. Wait for the restored volume to complete restoration, then check data.
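Steps 5 and 6 rest on the same space argument, sketched here with hypothetical numbers (the real SIZE constant lives in tests.common): a replica that must carry the existing data needs roughly SIZE * 0.9 of space, while the small disk, whose raw capacity equals SIZE, offers less than that once filesystem overhead and reserved storage are deducted.

Gi = 1024 ** 3

SIZE = 1 * Gi                  # hypothetical; see tests.common for the real value
data_size = int(SIZE * 0.9)    # data written before the backup

# hypothetical deductions on the small disk
fs_overhead = int(SIZE * 0.05)
storage_reserved = int(SIZE * 0.1)
schedulable = SIZE - fs_overhead - storage_reserved

# neither the restored replica nor a rebuilt replica fits on the small disk
assert data_size > schedulable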
def test_replica_scheduler_too_large_volume_fit_any_disks(client)
Expand source code
@pytest.mark.v2_volume_test   # NOQA
@pytest.mark.node  # NOQA
def test_replica_scheduler_too_large_volume_fit_any_disks(client):  # NOQA
    """
    Test replica scheduler: volume is too large to fit any disks

    1. Disable all default disks on all nodes by setting storageReserved to
    maximum size
    2. Create volume.
    3. Verify the volume scheduled condition is false.
    4. Reduce the storageReserved on all the disks to just enough for one
    replica.
    5. The volume should automatically change scheduled condition to true
    6. Attach the volume.
    7. Make sure every replica landed on a different node's default disk.
    """
    nodes = client.list_node()
    lht_hostId = get_self_host_id()
    expect_node_disk = {}
    for node in nodes:
        disks = node.disks
        for _, disk in iter(disks.items()):
            if disk.path == get_default_disk_path():
                expect_disk = disk
                expect_node_disk[node.name] = expect_disk
            disk.storageReserved = disk.storageMaximum
        update_disks = get_update_disks(disks)
        update_node_disks(client, node.name, disks=update_disks, retry=True)

    # volume is too large to fit on any disk
    volume_size = 4 * Gi
    vol_name = common.generate_volume_name()
    client.create_volume(name=vol_name, size=str(volume_size),
                         numberOfReplicas=len(nodes),
                         dataEngine=DATA_ENGINE)
    volume = common.wait_for_volume_condition_scheduled(client, vol_name,
                                                        "status",
                                                        CONDITION_STATUS_FALSE)

    # Reduce StorageReserved of each default disk so that each node can fit
    # only one replica.
    needed_for_scheduling = int(
        volume_size * 1.5 * 100 /
        int(DEFAULT_STORAGE_OVER_PROVISIONING_PERCENTAGE))
    nodes = client.list_node()
    for node in nodes:
        disks = node.disks
        update_disks = get_update_disks(disks)
        for disk in update_disks.values():
            disk.storageReserved = \
                disk.storageMaximum - needed_for_scheduling
        node = update_node_disks(client, node.name, disks=update_disks,
                                 retry=True)
        disks = node.disks
        for name, disk in iter(disks.items()):
            wait_for_disk_status(client, node.name,
                                 name, "storageReserved",
                                 disk.storageMaximum-needed_for_scheduling)

    # check volume status
    volume = common.wait_for_volume_condition_scheduled(client, vol_name,
                                                        "status",
                                                        CONDITION_STATUS_TRUE)
    volume = common.wait_for_volume_detached(client, vol_name)
    assert volume.state == "detached"
    assert volume.created != ""

    volume.attach(hostId=lht_hostId)
    volume = common.wait_for_volume_healthy(client, vol_name)
    nodes = client.list_node()
    node_hosts = []
    for node in nodes:
        node_hosts.append(node.name)
    # check that all replicas are scheduled to the default disk
    for replica in volume.replicas:
        id = replica.hostId
        assert id != ""
        assert replica.running
        expect_disk = expect_node_disk[id]
        assert replica.diskID == expect_disk.diskUUID
        assert expect_disk.path in replica.dataPath
        node_hosts = list(filter(lambda x: x != id, node_hosts))
    assert len(node_hosts) == 0

    # clean volume and disk
    cleanup_volume_by_name(client, vol_name)

Test replica scheduler: volume is too large to fit any disks

  1. Disable all default disks on all nodes by setting storageReserved to maximum size
  2. Create volume.
  3. Verify the volume scheduled condition is false.
  4. Reduce the storageReserved on all the disks to just enough for one replica.
  5. The volume should automatically change scheduled condition to true
  6. Attach the volume.
  7. Make sure every replica landed on a different node's default disk.
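The headroom calculation in the middle of the test is worth spelling out. Assuming the default overprovisioning percentage is 100 (as the earlier overprovisioning tests state), the amount left schedulable on each default disk works out to 1.5x the volume size:

Gi = 1024 ** 3

volume_size = 4 * Gi
over_provisioning_pct = 100   # assumed DEFAULT_STORAGE_OVER_PROVISIONING_PERCENTAGE

# leave 1.5x the volume size schedulable so exactly one replica (plus
# headroom) fits on each node's default disk
needed_for_scheduling = int(volume_size * 1.5 * 100 / over_provisioning_pct)
assert needed_for_scheduling == 6 * Gi

# storageReserved on each disk is then set to
# storageMaximum - needed_for_scheduling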
def test_replica_scheduler_update_minimal_available(client)
Expand source code
@pytest.mark.v2_volume_test   # NOQA
@pytest.mark.node  # NOQA
def test_replica_scheduler_update_minimal_available(client):  # NOQA
    """
    Test replica scheduler: update setting `minimal available`

    1. Set setting `minimal available` to 100% (meaning nothing can be scheduled)
    2. Verify for all disks' schedulable condition to become false.
    3. Create a volume. Verify it's unschedulable.
    4. Set setting `minimal available` back to default setting
    5. Disk should become schedulable now.
    6. Volume should be scheduled now.
    7. Attach the volume.
    8. Make sure every replica landed on a different node's default disk.
    """
    minimal_available_setting = client.by_id_setting(
        SETTING_STORAGE_MINIMAL_AVAILABLE_PERCENTAGE)
    old_minimal_setting = minimal_available_setting.value

    nodes = client.list_node()
    expect_node_disk = {}
    for node in nodes:
        disks = node.disks
        for _, disk in iter(disks.items()):
            if disk.path == get_default_disk_path():
                expect_disk = disk
                expect_node_disk[node.name] = expect_disk

    # set storage minimal available percentage to 100
    # to test that no replica can be scheduled
    minimal_available_setting = client.update(minimal_available_setting,
                                              value="100")
    # wait for disks state
    nodes = client.list_node()
    for node in nodes:
        disks = node.disks
        for fsid, disk in iter(disks.items()):
            wait_for_disk_conditions(client, node.name,
                                     fsid, DISK_CONDITION_SCHEDULABLE,
                                     CONDITION_STATUS_FALSE)

    lht_hostId = get_self_host_id()
    vol_name = common.generate_volume_name()
    volume = client.create_volume(name=vol_name,
                                  size=SIZE, numberOfReplicas=len(nodes),
                                  dataEngine=DATA_ENGINE)
    volume = common.wait_for_volume_condition_scheduled(client, vol_name,
                                                        "status",
                                                        CONDITION_STATUS_FALSE)

    # set storage minimal available percentage back to the default value (10)
    minimal_available_setting = client.update(minimal_available_setting,
                                              value=old_minimal_setting)
    # wait for disks state
    nodes = client.list_node()
    for node in nodes:
        disks = node.disks
        for fsid, disk in iter(disks.items()):
            wait_for_disk_conditions(client, node.name,
                                     fsid, DISK_CONDITION_SCHEDULABLE,
                                     CONDITION_STATUS_TRUE)
    # check volume status
    volume = common.wait_for_volume_condition_scheduled(client, vol_name,
                                                        "status",
                                                        CONDITION_STATUS_TRUE)
    volume = common.wait_for_volume_detached(client, vol_name)
    assert volume.state == "detached"
    assert volume.created != ""

    volume.attach(hostId=lht_hostId)
    volume = common.wait_for_volume_healthy(client, vol_name)
    nodes = client.list_node()
    node_hosts = []
    for node in nodes:
        node_hosts.append(node.name)
    # check that all replicas are scheduled to the default disk
    for replica in volume.replicas:
        id = replica.hostId
        assert id != ""
        assert replica.running
        expect_disk = expect_node_disk[id]
        assert replica.diskID == expect_disk.diskUUID
        assert expect_disk.path in replica.dataPath
        node_hosts = list(filter(lambda x: x != id, node_hosts))
    assert len(node_hosts) == 0

    # clean volume and disk
    cleanup_volume_by_name(client, vol_name)

Test replica scheduler: update setting minimal available

  1. Set setting minimal available to 100% (meaning nothing can be scheduled)
  2. Verify for all disks' schedulable condition to become false.
  3. Create a volume. Verify it's unschedulable.
  4. Set setting minimal available back to default setting
  5. Disk should become schedulable now.
  6. Volume should be scheduled now.
  7. Attach the volume.
  8. Make sure every replica landed on a different node's default disk.
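The gate this test toggles can be pictured as a simple threshold on each disk's free space. This is an illustrative sketch, not Longhorn's implementation; the 10% default matches the comment in the source above, and the disk sizes are hypothetical:

Gi = 1024 ** 3

def disk_schedulable(storage_available, storage_maximum, minimal_available_pct):
    # a disk stays schedulable only while its free space exceeds the
    # configured fraction of its total capacity
    return storage_available > storage_maximum * minimal_available_pct / 100

# at 100% no disk can ever satisfy the threshold ...
assert not disk_schedulable(8 * Gi, 10 * Gi, minimal_available_pct=100)
# ... while the 10% default leaves the same disk schedulable
assert disk_schedulable(8 * Gi, 10 * Gi, minimal_available_pct=10)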
def test_replica_scheduler_update_over_provisioning(client)
Expand source code
@pytest.mark.v2_volume_test   # NOQA
@pytest.mark.node  # NOQA
def test_replica_scheduler_update_over_provisioning(client):  # NOQA
    """
    Test replica scheduler: update overprovisioning setting

    1. Set setting `overprovisioning` to 0. (disable all scheduling)
    2. Create a new volume. Verify volume's `scheduled` condition is false.
    3. Set setting `overprovisioning` to 200%.
    4. Verify volume's `scheduled` condition now becomes true.
    5. Attach the volume.
    6. Make sure every replica landed on a different node's default disk.
    """
    nodes = client.list_node()
    lht_hostId = get_self_host_id()
    expect_node_disk = {}
    for node in nodes:
        disks = node.disks
        for _, disk in iter(disks.items()):
            if disk.path == get_default_disk_path():
                expect_disk = disk
                expect_node_disk[node.name] = expect_disk

    # set storage over provisioning percentage to 0
    # to test that no replica can be scheduled
    update_setting(client, SETTING_STORAGE_OVER_PROVISIONING_PERCENTAGE, "0")

    vol_name = common.generate_volume_name()
    volume = client.create_volume(name=vol_name,
                                  size=SIZE, numberOfReplicas=len(nodes),
                                  dataEngine=DATA_ENGINE)
    volume = common.wait_for_volume_condition_scheduled(client, vol_name,
                                                        "status",
                                                        CONDITION_STATUS_FALSE)

    # set storage over provisioning percentage to 200
    update_setting(client, SETTING_STORAGE_OVER_PROVISIONING_PERCENTAGE, "200")

    # check volume status
    volume = common.wait_for_volume_condition_scheduled(client, vol_name,
                                                        "status",
                                                        CONDITION_STATUS_TRUE)
    volume = common.wait_for_volume_detached(client, vol_name)
    assert volume.state == "detached"
    assert volume.created != ""

    volume.attach(hostId=lht_hostId)
    volume = common.wait_for_volume_healthy(client, vol_name)

    node_hosts = []
    for node in nodes:
        node_hosts.append(node.name)
    # check that all replicas are scheduled to the default disk
    for replica in volume.replicas:
        id = replica.hostId
        assert id != ""
        assert replica.running
        expect_disk = expect_node_disk[id]
        assert replica.diskID == expect_disk.diskUUID
        assert expect_disk.path in replica.dataPath
        node_hosts = list(filter(lambda x: x != id, node_hosts))
    assert len(node_hosts) == 0

    # clean volume and disk
    cleanup_volume_by_name(client, vol_name)

Test replica scheduler: update overprovisioning setting

  1. Set setting overprovisioning to 0. (disable all scheduling)
  2. Create a new volume. Verify volume's scheduled condition is false.
  3. Set setting overprovisioning to 200%.
  4. Verify volume's scheduled condition now becomes true.
  5. Attach the volume.
  6. Make sure every replica landed on a different node's default disk.
def test_update_node(client)
Expand source code
@pytest.mark.coretest   # NOQA
@pytest.mark.node  # NOQA
def test_update_node(client):  # NOQA
    """
    Test update node scheduling

    1. Get list of nodes
    2. Update scheduling to false for current node
    3. Read back to verify
    4. Update scheduling to true for current node
    5. Read back to verify
    """
    # test node update
    nodes = client.list_node()
    assert len(nodes) > 0

    lht_hostId = get_self_host_id()
    node = client.by_id_node(lht_hostId)
    node = set_node_scheduling(client, node, allowScheduling=False, retry=True)
    node = common.wait_for_node_update(client, lht_hostId,
                                       "allowScheduling", False)
    assert not node.allowScheduling
    node = client.by_id_node(lht_hostId)
    assert not node.allowScheduling

    node = set_node_scheduling(client, node, allowScheduling=True, retry=True)
    node = common.wait_for_node_update(client, lht_hostId,
                                       "allowScheduling", True)
    assert node.allowScheduling
    node = client.by_id_node(lht_hostId)
    assert node.allowScheduling

Test update node scheduling

  1. Get list of nodes
  2. Update scheduling to false for current node
  3. Read back to verify
  4. Update scheduling to true for current node
  5. Read back to verify
def wait_drain_complete(future, timeout, completed=True)
Expand source code
def wait_drain_complete(future, timeout, completed=True):
    """
    Wait for a concurrent.futures object to complete within a duration
    """
    def stop_drain_process():
        """
        Neither future.cancel() nor executor.shutdown(wait=False) can actually
        stop the drain process, so kill the kubectl drain command directly.
        """
        command = ["pkill", "-f", "kubectl drain"]
        subprocess.check_output(command, text=True)

    thread_timeout = timeout
    drain_complete = False
    try:
        # on Python >= 3.11 concurrent.futures.TimeoutError is an alias of
        # the builtin TimeoutError caught below
        future.result(timeout=thread_timeout)
        drain_complete = True
    except TimeoutError:
        print("drain node thread exceeded timeout ({})s".format(thread_timeout))
        stop_drain_process()
    finally:
        assert drain_complete is completed

Wait for a concurrent.futures object to complete within a duration
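A usage sketch showing how drain_node and wait_drain_complete fit together: the drain runs in a worker thread so the caller can bound how long it may take. core_api, node, and the 90-second timeout are placeholders, not values from this module.

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(drain_node, core_api, node)
    # expect the drain to finish within the timeout; pass completed=False
    # when the test expects the drain to be blocked instead
    wait_drain_complete(future, timeout=90)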