Module tests.test_scheduling

Functions

def best_effort_data_locality_test(client, volume_name)
def best_effort_data_locality_test(client, volume_name):  # NOQA
    number_of_replicas = 1
    volume = client.create_volume(name=volume_name,
                                  size=str(1 * Gi),
                                  numberOfReplicas=number_of_replicas,
                                  dataLocality="best-effort")

    # replica should stay unscheduled until volume is attached
    for _ in range(10):
        time.sleep(RETRY_INTERVAL)
        volume = client.by_id_volume(volume_name)
        assert len(volume.replicas) == number_of_replicas
        assert volume.replicas[0]['hostId'] == ""
        assert not volume.replicas[0].running

    self_node = get_self_host_id()
    # attach volume to the self_node
    volume.attach(hostId=self_node)
    volume = wait_for_volume_healthy(client, volume_name)
    # wait for replica to be on the self_node
    for _ in range(30):
        volume = client.by_id_volume(volume_name)
        assert len(volume.replicas) == number_of_replicas
        # fail if replica is scheduled to another node
        assert (volume.replicas[0]['hostId'] == "" or
                volume.replicas[0]['hostId'] == self_node)
        if volume.replicas[0]['hostId'] == self_node:
            break
        time.sleep(RETRY_INTERVAL)

    assert volume.replicas[0]['hostId'] == self_node
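
The two fixed-iteration loops above follow the poll-and-assert pattern used throughout this module. A minimal sketch of that pattern as a reusable helper, assuming the module's RETRY_INTERVAL constant and the same Longhorn client; the helper name wait_for_replica_host is hypothetical:

import time  # already imported at module scope

def wait_for_replica_host(client, volume_name, expected_host,
                          retries=30, interval=RETRY_INTERVAL):
    # Poll until the volume's single replica lands on the expected host,
    # failing fast if it is ever scheduled to a different node.
    for _ in range(retries):
        volume = client.by_id_volume(volume_name)
        host = volume.replicas[0]['hostId']
        assert host in ("", expected_host), \
            f"replica unexpectedly scheduled to {host}"
        if host == expected_host:
            return volume
        time.sleep(interval)
    raise AssertionError(f"replica never scheduled to {expected_host}")
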
def get_host_replica(volume, host_id)
def get_host_replica(volume, host_id):
    """
    Get the replica of the volume that is running on the test host. Trigger a
    failed assertion if it can't be found.
    :param volume: The volume to get the replica from.
    :param host_id: The ID of the test host.
    :return: The replica hosted on the test host.
    """
    host_replica = None
    for i in volume.replicas:
        if i.hostId == host_id:
            host_replica = i
    assert host_replica is not None
    return host_replica

Get the replica of the volume that is running on the test host. Trigger a failed assertion if it can't be found.

:param volume: The volume to get the replica from.
:param host_id: The ID of the test host.
:return: The replica hosted on the test host.
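
A typical call site, mirroring how the hard anti-affinity tests below use this helper: attach the volume to the current host, then remove the local replica.

host_id = get_self_host_id()
volume.attach(hostId=host_id)
volume = wait_for_volume_healthy(client, volume_name)

# Remove the replica that lives on the test host.
host_replica = get_host_replica(volume, host_id)
volume.replicaRemove(name=host_replica.name)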

def prepare_for_affinity_tests(client, volume_name, request)
def prepare_for_affinity_tests(client, volume_name, request): # NOQA
    """
    Used by 'test_global_disk_soft_anti_affinity' and
    'test_volume_disk_soft_anti_affinity'; both share the same
    preparation steps below:

    Given
    - One node has three disks
    - The three disks have very different sizes
    - Only two disks are available for scheduling
    - No other node is available for scheduling
    """
    def finalizer():
        volume = client.by_id_volume(volume_name)
        volume.detach(hostId=lht_hostId)
        wait_for_volume_detached(client, volume_name)
        client.delete(volume)
        wait_for_volume_delete(client, volume.name)
        cleanup_host_disks(client, 'vol-disk-1', 'vol-disk-2')
    request.addfinalizer(finalizer)

    # Preparation
    lht_hostId = get_self_host_id()
    node = client.by_id_node(lht_hostId)
    disks = node.disks
    disk_path1 = create_host_disk(client, 'vol-disk-1',
                                  str(2 * Gi), lht_hostId)
    disk1 = {"path": disk_path1, "allowScheduling": True}
    disk_path2 = create_host_disk(client, 'vol-disk-2',
                                  str(4 * Gi), lht_hostId)
    disk2 = {"path": disk_path2, "allowScheduling": False}

    update_disk = get_update_disks(disks)
    update_disk["disk1"] = disk1
    update_disk["disk2"] = disk2

    node = update_node_disks(client, node.name, disks=update_disk, retry=True)
    node = wait_for_disk_update(client, lht_hostId, len(update_disk))
    assert len(node.disks) == len(update_disk)

    # Make only current node schedulable
    nodes = client.list_node()
    for node in nodes:
        if node.id != lht_hostId:
            set_node_scheduling(client, node, allowScheduling=False)

    return disk_path1, disk_path2

Used by 'test_global_disk_soft_anti_affinity' and 'test_volume_disk_soft_anti_affinity'; both share the same preparation steps below:

Given
- One node has three disks
- The three disks have very different sizes
- Only two disks are available for scheduling
- No other node is available for scheduling
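
A minimal sketch of how the two affinity tests consume this helper; request is the standard pytest fixture-request object, and the returned paths identify the 2Gi and 4Gi disks created above (this mirrors the call in test_global_disk_soft_anti_affinity):

# Cleanup is registered on the pytest request finalizer inside the helper.
disk_path1, disk_path2 = prepare_for_affinity_tests(client,
                                                    volume_name,
                                                    request)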

def replica_auto_balance_with_data_locality_test(client, volume_name)
def replica_auto_balance_with_data_locality_test(client, volume_name):  # NOQA
    common.cleanup_all_volumes(client)

    common.update_setting(client,
                          SETTING_REPLICA_AUTO_BALANCE, "best-effort")

    self_node = get_self_host_id()
    number_of_replicas = 1
    volume = client.create_volume(name=volume_name,
                                  size=str(200 * Mi),
                                  numberOfReplicas=number_of_replicas,
                                  dataLocality="best-effort",
                                  dataEngine=DATA_ENGINE)
    volume = common.wait_for_volume_detached(client, volume_name)

    volume.attach(hostId=self_node)
    volume = wait_for_volume_healthy(client, volume_name)

    # wait for replica to be on the self_node
    is_rebuilt = False
    for _ in range(30):
        volume = client.by_id_volume(volume_name)
        if len(volume.replicas) == number_of_replicas and \
                volume.replicas[0]['hostId'] == self_node:
            break
        try:
            if len(volume.replicas) == number_of_replicas + 1:
                is_rebuilt = True
            assert len(volume.replicas) == number_of_replicas
            assert volume.replicas[0]['hostId'] == self_node
            assert volume[VOLUME_FIELD_ROBUSTNESS] == VOLUME_ROBUSTNESS_HEALTHY
            break
        except AssertionError:
            # Break after the first rebuild settles. Without this break the
            # rebuild could happen multiple times, and one of the rebuilt
            # replica names could eventually end up alphabetically smaller,
            # masking the looping issue this test checks for.
            if is_rebuilt and len(volume.replicas) == number_of_replicas:
                break
            time.sleep(RETRY_INTERVAL)

    assert len(volume.replicas) == number_of_replicas, \
        f"Unexpected replica count for volume {volume_name}.\n"
    assert volume.replicas[0]['hostId'] == self_node, \
        f"Unexpected replica host ID for volume {volume_name}.\n"
    assert volume[VOLUME_FIELD_ROBUSTNESS] == VOLUME_ROBUSTNESS_HEALTHY

    # loop to assert there should be no more replica rebuilds
    for _ in range(15):
        time.sleep(RETRY_INTERVAL)

        volume = client.by_id_volume(volume_name)
        assert len(volume.replicas) == number_of_replicas, \
            f"Not expecting scheduling for volume {volume_name}.\n"
        assert volume.replicas[0]['hostId'] == self_node, \
            f"Unexpected replica host ID for volume {volume_name}.\n" \
            f"Expect={self_node}\n" \
            f"Got={volume.replicas[0]['hostId']}\n"
def reset_settings()
@pytest.yield_fixture(autouse=True)
def reset_settings():
    yield
    client = get_longhorn_api_client()  # NOQA
    host_id = get_self_host_id()
    node = client.by_id_node(host_id)
    node = set_node_scheduling(client, node, allowScheduling=True)
    setting = client.by_id_setting(SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY)
    client.update(setting, value="true")
def test_allow_empty_disk_selector_volume_setting(client, volume_name)
@pytest.mark.v2_volume_test  # NOQA
def test_allow_empty_disk_selector_volume_setting(client, volume_name): # NOQA
    """
    Test the global setting allow-empty-disk-selector-volume

    If true, a replica of a volume without a disk selector
    can be scheduled on disks with tags.

    If false, a replica of a volume without a disk selector
    can not be scheduled on disks with tags.

    Setup
    - Prepare 3 nodes each with one disk
    - Add `AVAIL` tag to every disk
    - Set allow-empty-disk-selector-volume to `false`

    When
    - Create a Volume with 3 replicas without tag

    Then
    - All replicas can not be scheduled to the disks on the nodes

    When
    - Remove `AVAIL` tag from one of the nodes
    - Set allow-empty-disk-selector-volume to `true`

    Then
    - Wait for a while for the controller to resync the volume,
      all replicas can be scheduled to the disks on the nodes
    """
    # Preparation
    nodes = client.list_node()
    for node in nodes:
        disks = get_update_disks(node.disks)
        disks[list(disks)[0]].tags = ["AVAIL"]
        update_node_disks(client, node.name, disks=disks)

    update_setting(client, SETTING_ALLOW_EMPTY_DISK_SELECTOR_VOLUME, "false")

    # Check volume can not be scheduled
    client.create_volume(name=volume_name, size=SIZE,
                         dataEngine=DATA_ENGINE)
    volume = wait_for_volume_detached(client, volume_name)

    volume = client.by_id_volume(volume.name)
    volume = wait_for_volume_condition_scheduled(client, volume_name,
                                                 "status",
                                                 CONDITION_STATUS_FALSE)

    # Remove tag from current node
    host_id = get_self_host_id()
    node = client.by_id_node(host_id)
    disks = get_update_disks(node.disks)
    disks[list(disks)[0]].tags = []
    update_node_disks(client, node.name, disks=disks)
    update_setting(client, SETTING_ALLOW_EMPTY_DISK_SELECTOR_VOLUME, "true")

    # Volume can be scheduled
    volume = wait_for_volume_condition_scheduled(client, volume_name, "status",
                                                 CONDITION_STATUS_TRUE)
    assert volume.ready

    # All replicas are scheduled to disks on nodes
    volume.attach(hostId=host_id)
    volume = wait_for_volume_healthy(client, volume_name)

Test the global setting allow-empty-disk-selector-volume

If true, a replica of a volume without a disk selector can be scheduled on disks with tags.

If false, a replica of a volume without a disk selector can not be scheduled on disks with tags.

Setup
- Prepare 3 nodes each with one disk
- Add AVAIL tag to every disk
- Set allow-empty-disk-selector-volume to false

When
- Create a Volume with 3 replicas without tag

Then
- All replicas can not be scheduled to the disks on the nodes

When
- Remove AVAIL tag from one of the nodes
- Set allow-empty-disk-selector-volume to true

Then
- Wait for a while for the controller to resync the volume, all replicas can be scheduled to the disks on the nodes
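
For contrast with the selector-less volume above: a volume that does declare a disk selector pins its replicas to matching tagged disks regardless of this setting. A hedged sketch, assuming the diskSelector parameter of the Longhorn client's create_volume (the nodeSelector analogue appears in test_data_locality_basic below):

# A volume that explicitly selects AVAIL-tagged disks; the
# allow-empty-disk-selector-volume setting does not apply to it.
client.create_volume(name="tagged-vol", size=SIZE,
                     numberOfReplicas=3,
                     diskSelector=["AVAIL"],
                     dataEngine=DATA_ENGINE)
wait_for_volume_detached(client, "tagged-vol")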

def test_allow_empty_node_selector_volume_setting(client, volume_name)
@pytest.mark.v2_volume_test  # NOQA
def test_allow_empty_node_selector_volume_setting(client, volume_name): # NOQA
    """
    Test the global setting allow-empty-node-selector-volume

    If true, a replica of a volume without a node selector
    can be scheduled on nodes with tags.

    If false, a replica of a volume without a node selector
    can not be scheduled on nodes with tags.

    Setup
    - Prepare 3 nodes
    - Add `AVAIL` tag to nodes
    - Set allow-empty-node-selector-volume to `false`

    When
    - Create a Volume with 3 replicas without tag

    Then
    - All replicas can not be scheduled to the nodes

    When
    - Remove `AVAIL` tag from one of the nodes
    - Set allow-empty-node-selector-volume to `true`

    Then
    - Wait for a while for the controller to resync the volume,
      all replicas can be scheduled to the nodes
    """
    # Setup
    node_tag = ["AVAIL"]
    for node in client.list_node():
        set_node_tags(client, node, tags=node_tag, retry=True)
        wait_for_node_tag_update(client, node.name, node_tag)

    update_setting(client, SETTING_ALLOW_EMPTY_NODE_SELECTOR_VOLUME, "false")

    # Check volume can not be scheduled
    client.create_volume(name=volume_name, size=SIZE,
                         dataEngine=DATA_ENGINE)
    volume = wait_for_volume_detached(client, volume_name)

    volume = client.by_id_volume(volume.name)
    volume = wait_for_volume_condition_scheduled(client, volume_name,
                                                 "status",
                                                 CONDITION_STATUS_FALSE)

    # Remove tag from 1 node and set setting allow-empty-node-selector-volume
    # to true
    node = client.by_id_node(get_self_host_id())
    set_node_tags(client, node, tags=[], retry=True)
    update_setting(client, SETTING_ALLOW_EMPTY_NODE_SELECTOR_VOLUME, "true")

    # Volume can be scheduled
    volume = wait_for_volume_condition_scheduled(client, volume_name, "status",
                                                 CONDITION_STATUS_TRUE)
    assert volume.ready

    # All replicas are scheduled to nodes
    volume.attach(hostId=get_self_host_id())
    volume = wait_for_volume_healthy(client, volume_name)

Test the global setting allow-empty-node-selector-volume

If true, a replica of a volume without a node selector can be scheduled on nodes with tags.

If false, a replica of a volume without a node selector can not be scheduled on nodes with tags.

Setup
- Prepare 3 nodes
- Add AVAIL tag to nodes
- Set allow-empty-node-selector-volume to false

When
- Create a Volume with 3 replicas without tag

Then
- All replicas can not be scheduled to the nodes

When
- Remove AVAIL tag from one of the nodes
- Set allow-empty-node-selector-volume to true

Then
- Wait for a while for the controller to resync the volume, all replicas can be scheduled to the nodes
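
For contrast: a volume that declares a node selector (the same nodeSelector parameter used by test_data_locality_basic below) is pinned to tagged nodes regardless of this setting.

# A volume that explicitly selects AVAIL-tagged nodes; the
# allow-empty-node-selector-volume setting does not apply to it.
client.create_volume(name="tagged-vol", size=SIZE,
                     numberOfReplicas=3,
                     nodeSelector=["AVAIL"],
                     dataEngine=DATA_ENGINE)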

def test_best_effort_data_locality(client, volume_name)
def test_best_effort_data_locality(client, volume_name):  # NOQA
    """
    Scenario: replica should be scheduled only after volume is attached,
              and it should be scheduled to the local node.
            - volume data locality is set to `best-effort`
            - volume has 1 replica

    Issue: https://github.com/longhorn/longhorn/issues/11007
    """
    # Repeat test for 10 times to make sure replica
    # is scheduled to the local node.
    for i in range(10):
        best_effort_data_locality_test(client, f'{volume_name}-{i}')

Scenario: replica should be scheduled only after volume is attached, and it should be scheduled to the local node.
- volume data locality is set to best-effort
- volume has 1 replica

Issue: https://github.com/longhorn/longhorn/issues/11007

def test_data_locality_basic(client, core_api, volume_name, pod, settings_reset)
@pytest.mark.v2_volume_test  # NOQA
def test_data_locality_basic(client, core_api, volume_name, pod, settings_reset):  # NOQA
    """
    Test data locality basic feature

    Context:

    Data Locality feature gives users the option to keep a local
    replica on the same node as the consuming pod.
    Longhorn currently supports 2 modes:
    - disabled: Longhorn does not try to keep a local replica
    - best-effort: Longhorn tries to keep a local replica

    See manual tests at:
    https://github.com/longhorn/longhorn/issues/1045#issuecomment-680706283

    Steps:

    Case 1: Test that Longhorn builds a local replica on the engine node

    1. Create a volume(1) with 1 replica and dataLocality set to disabled
    2. Find the node where the replica is located.
       Let's call this node replica-node
    3. Attach the volume to a node different from replica-node.
       Let's call this node engine-node
    4. Write 200MB data to volume(1)
    5. Use a retry loop to verify that Longhorn does not create
       a replica on the engine-node
    6. Update dataLocality to best-effort for volume(1)
    7. Use a retry loop to verify that Longhorn creates and rebuilds
       a replica on the engine-node and removes the other replica
    8. Detach the volume(1) and attach it to a different node.
       Let's call the new node new-engine-node and the old
       node old-engine-node
    9. Wait for volume(1) to finish attaching
    10. Use a retry loop to verify that Longhorn creates and rebuilds
       a replica on the new-engine-node and removes the replica on
       old-engine-node

    Case 2: Test that Longhorn prioritizes deleting replicas on the same node

    1. Add the tag AVAIL to node-1 and node-2
    2. Set node soft anti-affinity to `true`.
    3. Create a volume(2) with 3 replicas and dataLocality set to best-effort
    4. Use a retry loop to verify that all 3 replicas are on node-1 and
        node-2, no replica is on node-3
    5. Attach volume(2) to node-3
    6. Use a retry loop to verify that there is no replica on node-3 and
        we can still read/write to volume(2)
    7. Find the node which contains 2 replicas.
        Let's call this node most-replica-node
    8. Set the replica count to 2 for volume(2)
    9. Verify that Longhorn removes one replica from most-replica-node

    Case 3: Test that the volume is not corrupted if there is an unexpected
    detachment while building the local replica

    1. Remove the tag AVAIL from node-1 and node-2.
       Set node soft anti-affinity to `false`.
    2. Create a volume(3) with 1 replica and dataLocality set to best-effort
    3. Attach volume(3) to node-3.
    4. Use a retry loop to verify that volume(3) has only 1 replica on node-3
    5. Write 2GB data to volume(3)
    6. Detach volume(3)
    7. Attach volume(3) to node-1
    8. Use a retry loop to:
        Wait until volume(3) finishes attaching.
        Wait until Longhorn starts rebuilding a replica on node-1
        Immediately detach volume(3)
    9. Verify that the replica on node-1 is in ERR state.
    10. Attach volume(3) to node-1
    11. Wait until volume(3) finishes attaching.
    12. Use a retry loop to verify that Longhorn cleans up the ERR replica,
        rebuilds a new replica on node-1, and removes the replica on node-3

    Case 4: Make sure a failed-to-schedule local replica doesn't block the
    creation of other replicas.

    1. Disable scheduling for node-3
    2. Create a vol with 1 replica, `dataLocality = best-effort`.
        The replica is scheduled on a node (say node-1)
    3. Attach vol to node-3. There is a fail-to-schedule
        replica with Spec.HardNodeAffinity=node-3
    4. Increase numberOfReplica to 3. Verify that the replica set contains:
        one on node-1, one on node-2,  one failed replica
        with Spec.HardNodeAffinity=node-3.
    5. Decrease numberOfReplica to 2. Verify that the replica set contains:
        one on node-1, one on node-2,  one failed replica
        with Spec.HardNodeAffinity=node-3.
    6. Decrease numberOfReplica to 1. Verify that the replica set contains:
        one on node-1 or node-2,  one failed replica
        with Spec.HardNodeAffinity=node-3.
    7. Decrease numberOfReplica to 2. Verify that the replica set contains:
        one on node-1, one on node-2, one failed replica
        with Spec.HardNodeAffinity=node-3.
    8. Turn off data locality by setting `dataLocality=disabled` for the vol.
        Verify that the replica set contains: one on node-1, one on node-2

    9. clean up
    """

    # Case 1: Test that Longhorn builds a local replica on the engine node

    nodes = client.list_node()

    default_data_locality_setting = \
        client.by_id_setting(SETTING_DEFAULT_DATA_LOCALITY)
    try:
        client.update(default_data_locality_setting, value="disabled")
    except Exception as e:
        print("Exception when update Default Data Locality setting",
              default_data_locality_setting, e)

    volume1_name = volume_name + "-1"
    volume1_size = str(500 * Mi)
    volume1_data_path = "/data/test"
    pv1_name = volume1_name + "-pv"
    pvc1_name = volume1_name + "-pvc"
    pod1_name = volume1_name + "-pod"
    pod1 = pod

    pod1['metadata']['name'] = pod1_name

    volume1 = create_and_check_volume(client, volume1_name,
                                      num_of_replicas=1,
                                      size=volume1_size)

    volume1 = client.by_id_volume(volume1_name)
    create_pv_for_volume(client, core_api, volume1, pv1_name)
    create_pvc_for_volume(client, core_api, volume1, pvc1_name)

    volume1 = client.by_id_volume(volume1_name)
    volume1_replica_node = volume1.replicas[0]['hostId']

    volume1_attached_node = None
    for node in nodes:
        if node.name != volume1_replica_node:
            volume1_attached_node = node.name
            break

    assert volume1_attached_node is not None

    pod1['spec']['volumes'] = [{
        "name": "pod-data",
        "persistentVolumeClaim": {
            "claimName": pvc1_name
        }
    }]

    pod1['spec']['nodeSelector'] = \
        {"kubernetes.io/hostname": volume1_attached_node}
    create_and_wait_pod(core_api, pod1)

    write_pod_volume_random_data(core_api, pod1_name,
                                 volume1_data_path, DATA_SIZE_IN_MB_2)

    for i in range(10):
        volume1 = client.by_id_volume(volume1_name)
        assert len(volume1.replicas) == 1
        assert volume1.replicas[0]['hostId'] != volume1_attached_node
        time.sleep(1)

    volume1 = client.by_id_volume(volume1_name)
    volume1.updateDataLocality(dataLocality="best-effort")

    for _ in range(RETRY_COUNTS):
        volume1 = client.by_id_volume(volume1_name)
        assert volume1[VOLUME_FIELD_ROBUSTNESS] == VOLUME_ROBUSTNESS_HEALTHY
        if len(volume1.replicas) == 1 and \
                volume1.replicas[0]['hostId'] == volume1_attached_node:
            break
        time.sleep(RETRY_INTERVAL)
    assert len(volume1.replicas) == 1
    assert volume1.replicas[0]['hostId'] == volume1_attached_node

    delete_and_wait_pod(core_api, pod1_name)
    volume1 = wait_for_volume_detached(client, volume1_name)

    volume1_replica_node = volume1.replicas[0]['hostId']

    volume1_attached_node = None
    for node in nodes:
        if node.name != volume1_replica_node:
            volume1_attached_node = node.name
            break

    assert volume1_attached_node is not None

    pod1['spec']['nodeSelector'] = \
        {"kubernetes.io/hostname": volume1_attached_node}
    create_and_wait_pod(core_api, pod1)
    for _ in range(RETRY_COUNTS):
        volume1 = client.by_id_volume(volume1_name)
        assert volume1[VOLUME_FIELD_ROBUSTNESS] == VOLUME_ROBUSTNESS_HEALTHY
        if len(volume1.replicas) == 1 and \
                volume1.replicas[0]['hostId'] == volume1_attached_node:
            break
        time.sleep(RETRY_INTERVAL)
    assert len(volume1.replicas) == 1
    assert volume1.replicas[0]['hostId'] == volume1_attached_node
    delete_and_wait_pod(core_api, pod1_name)
    wait_for_volume_detached(client, volume1_name)

    # Case 2: Test that Longhorn prioritizes deleting replicas on the same node

    node1 = nodes[0]
    node2 = nodes[1]
    node3 = nodes[2]

    client.update(node1, allowScheduling=True, tags=["AVAIL"])
    client.update(node2, allowScheduling=True, tags=["AVAIL"])

    replica_node_soft_anti_affinity_setting = \
        client.by_id_setting(SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY)
    try:
        client.update(replica_node_soft_anti_affinity_setting,
                      value="true")
    except Exception as e:
        print("Exception when update "
              "Replica Node Level Soft Anti-Affinity setting",
              replica_node_soft_anti_affinity_setting, e)

    volume2_name = volume_name + "-2"
    volume2_size = str(500 * Mi)
    pv2_name = volume2_name + "-pv"
    pvc2_name = volume2_name + "-pvc"
    pod2_name = volume2_name + "-pod"
    pod2 = pod

    pod2['metadata']['name'] = pod2_name

    volume2 = client.create_volume(name=volume2_name,
                                   size=volume2_size,
                                   numberOfReplicas=3,
                                   nodeSelector=["AVAIL"],
                                   dataLocality="best-effort",
                                   dataEngine=DATA_ENGINE)

    volume2 = wait_for_volume_detached(client, volume2_name)
    volume2 = client.by_id_volume(volume2_name)
    create_pv_for_volume(client, core_api, volume2, pv2_name)
    create_pvc_for_volume(client, core_api, volume2, pvc2_name)

    volume2 = client.by_id_volume(volume2_name)

    pod2['spec']['volumes'] = [{
        "name": "pod-data",
        "persistentVolumeClaim": {
            "claimName": pvc2_name
        }
    }]

    pod2['spec']['nodeSelector'] = {"kubernetes.io/hostname": node3.name}
    create_and_wait_pod(core_api, pod2)

    volume2 = wait_for_volume_healthy(client, volume2_name)

    for replica in volume2.replicas:
        assert replica["hostId"] != node3.name

    volume2.updateReplicaCount(replicaCount=2)

    # 2 Healthy replicas and 1 replica failed to schedule
    # The failed to schedule replica is the local replica on node3
    volume2 = wait_for_volume_replica_count(client, volume2_name, 3)
    volume2 = client.by_id_volume(volume2_name)

    volume2_healthy_replicas = []
    for replica in volume2.replicas:
        if replica.running is True:
            volume2_healthy_replicas.append(replica)

    assert len(volume2_healthy_replicas) == 2

    volume2_rep1 = volume2_healthy_replicas[0]
    volume2_rep2 = volume2_healthy_replicas[1]
    assert volume2_rep1["hostId"] != volume2_rep2["hostId"]
    delete_and_wait_pod(core_api, pod2_name)
    wait_for_volume_detached(client, volume2_name)

    # Case 3: Test that the volume is not corrupted if there is an unexpected
    # detachment while building the local replica

    client.update(node1, allowScheduling=True, tags=[])
    client.update(node2, allowScheduling=True, tags=[])

    replica_node_soft_anti_affinity_setting = \
        client.by_id_setting(SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY)
    try:
        client.update(replica_node_soft_anti_affinity_setting,
                      value="false")
    except Exception as e:
        print("Exception when update "
              "Replica Node Level Soft Anti-Affinity setting",
              replica_node_soft_anti_affinity_setting, e)

    volume3_name = volume_name + "-3"
    volume3_size = str(4 * Gi)
    volume3_data_path = "/data/test"
    pv3_name = volume3_name + "-pv"
    pvc3_name = volume3_name + "-pvc"
    pod3_name = volume3_name + "-pod"
    pod3 = pod

    pod3['metadata']['name'] = pod3_name

    volume3 = client.create_volume(name=volume3_name,
                                   size=volume3_size,
                                   numberOfReplicas=1,
                                   dataEngine=DATA_ENGINE)

    volume3 = wait_for_volume_detached(client, volume3_name)
    volume3 = client.by_id_volume(volume3_name)
    create_pv_for_volume(client, core_api, volume3, pv3_name)
    create_pvc_for_volume(client, core_api, volume3, pvc3_name)

    volume3 = client.by_id_volume(volume3_name)

    pod3['spec']['volumes'] = [{
        "name": "pod-data",
        "persistentVolumeClaim": {
            "claimName": pvc3_name
        }
    }]

    pod3['spec']['nodeSelector'] = {"kubernetes.io/hostname": node3.name}
    create_and_wait_pod(core_api, pod3)
    volume3 = wait_for_volume_healthy(client, volume3_name)

    write_pod_volume_random_data(core_api, pod3_name,
                                 volume3_data_path, 2 * Gi)

    volume3.updateDataLocality(dataLocality="best-effort")
    volume3 = client.by_id_volume(volume3_name)

    if volume3.replicas[0]['hostId'] != node3.name:
        wait_for_rebuild_start(client, volume3_name)
        volume3 = client.by_id_volume(volume3_name)
        assert len(volume3.replicas) == 2
        wait_for_rebuild_complete(client, volume3_name)

    volume3 = wait_for_volume_replica_count(client, volume3_name, 1)
    assert volume3.replicas[0]["hostId"] == node3.name

    delete_and_wait_pod(core_api, pod3_name)
    wait_for_volume_detached(client, volume3_name)

    pod3['spec']['nodeSelector'] = {"kubernetes.io/hostname": node1.name}
    create_and_wait_pod(core_api, pod3)

    wait_for_rebuild_start(client, volume3_name)
    crash_engine_process_with_sigkill(client, core_api, volume3_name)
    delete_and_wait_pod(core_api, pod3_name)
    wait_for_volume_detached(client, volume3_name)
    volume3 = client.by_id_volume(volume3_name)
    assert len(volume3.replicas) == 1
    assert volume3.replicas[0]["hostId"] == node3.name, \
        f"expect the only replica of volume3 {volume3} is on {node3.name}"

    create_and_wait_pod(core_api, pod3)
    wait_for_rebuild_start(client, volume3_name)
    volume3 = client.by_id_volume(volume3_name)
    assert len(volume3.replicas) == 2
    wait_for_rebuild_complete(client, volume3_name)

    # Wait for deletion of extra replica
    volume3 = wait_for_volume_replica_count(client, volume3_name, 1)
    assert volume3.replicas[0]["hostId"] == node1.name
    assert volume3.replicas[0]["mode"] == "RW"
    assert volume3.replicas[0]["running"] is True

    delete_and_wait_pod(core_api, pod3_name)
    wait_for_volume_detached(client, volume3_name)

    # Case 4: Make sure a failed-to-schedule local replica doesn't block the
    # creation of other replicas.

    replica_node_soft_anti_affinity_setting = \
        client.by_id_setting(SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY)
    try:
        client.update(replica_node_soft_anti_affinity_setting,
                      value="false")
    except Exception as e:
        print("Exception when update "
              "Replica Node Level Soft Anti-Affinity setting",
              replica_node_soft_anti_affinity_setting, e)

    client.update(node3, allowScheduling=False)

    volume4_name = volume_name + "-4"
    volume4_size = str(1 * Gi)

    volume4 = client.create_volume(name=volume4_name,
                                   size=volume4_size,
                                   numberOfReplicas=1,
                                   dataLocality="best-effort",
                                   dataEngine=DATA_ENGINE)

    volume4 = wait_for_volume_detached(client, volume4_name)
    volume4 = client.by_id_volume(volume4_name)

    volume4_replica_name = volume4.replicas[0]["name"]

    volume4.attach(hostId=node3.name)

    wait_for_volume_healthy(client, volume4_name)

    volume4 = client.by_id_volume(volume4_name)
    assert len(volume4.replicas) == 2

    for replica in volume4.replicas:
        if replica["name"] == volume4_replica_name:
            assert replica["running"] is True
            assert replica["mode"] == "RW"
        else:
            assert replica["running"] is False
            assert replica["mode"] == ""

    assert volume4.conditions.Scheduled.reason == \
        "LocalReplicaSchedulingFailure"

    volume4 = volume4.updateReplicaCount(replicaCount=3)

    volume4 = wait_for_volume_degraded(client, volume4_name)

    v4_node1_replica_count = 0
    v4_node2_replica_count = 0
    v4_failed_replica_count = 0

    for replica in volume4.replicas:
        if replica["hostId"] == node1.name:
            v4_node1_replica_count += 1
        elif replica["hostId"] == node2.name:
            v4_node2_replica_count += 1
        elif replica["hostId"] == "":
            v4_failed_replica_count += 1

    assert v4_node1_replica_count == 1
    assert v4_node2_replica_count == 1
    assert v4_failed_replica_count > 0

    volume4 = volume4.updateReplicaCount(replicaCount=2)

    volume4 = wait_for_volume_replica_count(client, volume4_name, 3)

    v4_node1_replica_count = 0
    v4_node2_replica_count = 0
    v4_failed_replica_count = 0

    for replica in volume4.replicas:
        if replica["hostId"] == node1.name:
            v4_node1_replica_count += 1
        elif replica["hostId"] == node2.name:
            v4_node2_replica_count += 1
        elif replica["hostId"] == "":
            v4_failed_replica_count += 1

    assert v4_node1_replica_count == 1
    assert v4_node2_replica_count == 1
    assert v4_failed_replica_count > 0

    volume4 = volume4.updateReplicaCount(replicaCount=1)

    volume4 = wait_for_volume_replica_count(client, volume4_name, 2)

    v4_node1_replica_count = 0
    v4_node2_replica_count = 0
    v4_failed_replica_count = 0

    for replica in volume4.replicas:
        if replica["hostId"] == node1.name:
            v4_node1_replica_count += 1
        elif replica["hostId"] == node2.name:
            v4_node2_replica_count += 1
        elif replica["hostId"] == "":
            v4_failed_replica_count += 1

    assert v4_node1_replica_count + v4_node2_replica_count == 1
    assert v4_failed_replica_count == 1

    volume4 = volume4.updateDataLocality(dataLocality="disabled")
    volume4 = volume4.updateReplicaCount(replicaCount=2)

    running_replica_count = 0
    for _ in range(RETRY_COUNTS):
        volume4 = client.by_id_volume(volume4_name)
        running_replica_count = 0
        for r in volume4.replicas:
            if r.failedAt == "" and r.running is True:
                running_replica_count += 1
        if running_replica_count == 2:
            break
        time.sleep(RETRY_INTERVAL)
    assert running_replica_count == 2

    v4_node1_replica_count = 0
    v4_node2_replica_count = 0
    v4_node3_replica_count = 0

    for replica in volume4.replicas:
        wait_for_replica_running(client, volume4_name, replica["name"])
        if replica["hostId"] == node1.name:
            v4_node1_replica_count += 1
        elif replica["hostId"] == node2.name:
            v4_node2_replica_count += 1
        elif replica["hostId"] == node3.name:
            v4_node3_replica_count += 1
    assert v4_node1_replica_count == 1
    assert v4_node2_replica_count == 1
    assert v4_node3_replica_count == 0

Test data locality basic feature

Context:

Data Locality feature gives users the option to keep a local replica on the same node as the consuming pod. Longhorn currently supports 2 modes:
- disabled: Longhorn does not try to keep a local replica
- best-effort: Longhorn tries to keep a local replica

See manual tests at: https://github.com/longhorn/longhorn/issues/1045#issuecomment-680706283

Steps:

Case 1: Test that Longhorn builds a local replica on the engine node

  1. Create a volume(1) with 1 replica and dataLocality set to disabled
  2. Find the node where the replica is located. Let's call this node replica-node
  3. Attach the volume to a node different from replica-node. Let's call this node engine-node
  4. Write 200MB data to volume(1)
  5. Use a retry loop to verify that Longhorn does not create a replica on the engine-node
  6. Update dataLocality to best-effort for volume(1)
  7. Use a retry loop to verify that Longhorn creates and rebuilds a replica on the engine-node and removes the other replica
  8. Detach the volume(1) and attach it to a different node. Let's call the new node new-engine-node and the old node old-engine-node
  9. Wait for volume(1) to finish attaching
  10. Use a retry loop to verify that Longhorn creates and rebuilds a replica on the new-engine-node and removes the replica on old-engine-node

Case 2: Test that Longhorn prioritizes deleting replicas on the same node

  1. Add the tag AVAIL to node-1 and node-2
  2. Set node soft anti-affinity to true.
  3. Create a volume(2) with 3 replicas and dataLocality set to best-effort
  4. Use a retry loop to verify that all 3 replicas are on node-1 and node-2, no replica is on node-3
  5. Attach volume(2) to node-3
  6. Use a retry loop to verify that there is no replica on node-3 and we can still read/write to volume(2)
  7. Find the node which contains 2 replicas. Let's call this node most-replica-node
  8. Set the replica count to 2 for volume(2)
  9. Verify that Longhorn removes one replica from most-replica-node

Case 3: Test that the volume is not corrupted if there is an unexpected detachment while building the local replica

  1. Remove the tag AVAIL from node-1 and node-2. Set node soft anti-affinity to false.
  2. Create a volume(3) with 1 replica and dataLocality set to best-effort
  3. Attach volume(3) to node-3.
  4. Use a retry loop to verify that volume(3) has only 1 replica on node-3
  5. Write 2GB data to volume(3)
  6. Detach volume(3)
  7. Attach volume(3) to node-1
  8. Use a retry loop to: Wait until volume(3) finishes attaching. Wait until Longhorn starts rebuilding a replica on node-1. Immediately detach volume(3)
  9. Verify that the replica on node-1 is in ERR state.
  10. Attach volume(3) to node-1
  11. Wait until volume(3) finishes attaching.
  12. Use a retry loop to verify that Longhorn cleans up the ERR replica, rebuilds a new replica on node-1, and removes the replica on node-3

Case 4: Make sure a failed-to-schedule local replica doesn't block the creation of other replicas.

  1. Disable scheduling for node-3
  2. Create a vol with 1 replica, dataLocality = best-effort. The replica is scheduled on a node (say node-1)
  3. Attach vol to node-3. There is a fail-to-schedule replica with Spec.HardNodeAffinity=node-3
  4. Increase numberOfReplica to 3. Verify that the replica set contains: one on node-1, one on node-2, one failed replica with Spec.HardNodeAffinity=node-3.
  5. Decrease numberOfReplica to 2. Verify that the replica set contains: one on node-1, one on node-2, one failed replica with Spec.HardNodeAffinity=node-3.
  6. Decrease numberOfReplica to 1. Verify that the replica set contains: one on node-1 or node-2, one failed replica with Spec.HardNodeAffinity=node-3.
  7. Decrease numberOfReplica to 2. Verify that the replica set contains: one on node-1, one on node-2, one failed replica with Spec.HardNodeAffinity=node-3.
  8. Turn off data locality by setting dataLocality=disabled for the vol. Verify that the replica set contains: one on node-1, one on node-2

  9. clean up
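
Across all four cases the test pivots on two volume mutations shown in the source above; isolated here for reference:

# Switch an existing volume between data-locality modes.
volume = client.by_id_volume(volume_name)
volume.updateDataLocality(dataLocality="best-effort")  # or "disabled"

# Grow or shrink the desired replica set; Longhorn rebuilds or prunes
# replicas to converge on the new count.
volume = volume.updateReplicaCount(replicaCount=2)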

def test_data_locality_strict_local_node_affinity(client, core_api, apps_api, storage_class, statefulset, request)
@pytest.mark.v2_volume_test  # NOQA
def test_data_locality_strict_local_node_affinity(client, core_api, apps_api, storage_class, statefulset, request):  # NOQA
    """
    Scenario: data-locality (strict-local) should schedule Pod to the same node

    Issue: https://github.com/longhorn/longhorn/issues/5448

    Given a StorageClass (lh-test) has dataLocality (strict-local)
    And the StorageClass (lh-test) has numberOfReplicas (1)
    And the StorageClass (lh-test) exists

    And a StatefulSet (test-1) has StorageClass (lh-test)
    And the StatefulSet (test-1) created.
    And the StatefulSet (test-1) all Pods are in state (healthy).
    And the StatefulSet (test-1) is deleted.
    And the StatefulSet (test-1) PVC exists.

    And a StatefulSet (test-2) has StorageClass (lh-test)
    And the StatefulSet (test-2) has replicas (2)
    And the StatefulSet (test-2) created.
    And the StatefulSet (test-2) all Pods are in state (healthy).

    When the StatefulSet (test-1) created.
    Then the StatefulSet (test-1) all Pods are in state (healthy).
    """

    def wait_for_statefulset_pods_healthy(statefulset):
        pod_list = get_statefulset_pod_info(core_api, statefulset)
        wait_for_pods_volume_state(
            client, pod_list,
            VOLUME_FIELD_ROBUSTNESS,
            VOLUME_ROBUSTNESS_HEALTHY,
            retry_counts=int(60 / RETRY_INTERVAL)
        )

    storage_class["parameters"]["dataLocality"] = "strict-local"
    storage_class["parameters"]["numberOfReplicas"] = "1"
    create_storage_class(storage_class)

    statefulset["metadata"]["name"] = "test-1"
    spec = statefulset['spec']
    spec['replicas'] = 1
    spec['volumeClaimTemplates'][0]['spec']['storageClassName'] = \
        storage_class['metadata']['name']
    statefulset['spec'] = spec
    create_and_wait_statefulset(statefulset)

    pod_list = get_statefulset_pod_info(core_api, statefulset)

    wait_for_statefulset_pods_healthy(statefulset)

    delete_statefulset(apps_api, statefulset)
    for sts_pod in pod_list:
        wait_delete_pod(core_api, sts_pod['pod_uid'])

    statefulset_2 = copy.deepcopy(statefulset)
    statefulset_2["metadata"]["name"] = "test-2"
    statefulset_2['spec']['replicas'] = 2
    create_and_wait_statefulset(statefulset_2)

    def finalizer():
        client = get_longhorn_api_client()
        delete_and_wait_statefulset(core_api, client, statefulset_2)
    request.addfinalizer(finalizer)

    wait_for_statefulset_pods_healthy(statefulset_2)

    create_and_wait_statefulset(statefulset)
    wait_for_statefulset_pods_healthy(statefulset)

Scenario: data-locality (strict-local) should schedule Pod to the same node

Issue: https://github.com/longhorn/longhorn/issues/5448

Given a StorageClass (lh-test) has dataLocality (strict-local)
And the StorageClass (lh-test) has numberOfReplicas (1)
And the StorageClass (lh-test) exists

And a StatefulSet (test-1) has StorageClass (lh-test)
And the StatefulSet (test-1) created.
And the StatefulSet (test-1) all Pods are in state (healthy).
And the StatefulSet (test-1) is deleted.
And the StatefulSet (test-1) PVC exists.

And a StatefulSet (test-2) has StorageClass (lh-test)
And the StatefulSet (test-2) has replicas (2)
And the StatefulSet (test-2) created.
And the StatefulSet (test-2) all Pods are in state (healthy).

When the StatefulSet (test-1) created.
Then the StatefulSet (test-1) all Pods are in state (healthy).
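
The StorageClass wiring that drives this scenario, extracted from the test body; storage_class is the fixture dict used above:

# Every volume provisioned by this class keeps its single replica
# on the node that consumes it.
storage_class["parameters"]["dataLocality"] = "strict-local"
storage_class["parameters"]["numberOfReplicas"] = "1"
create_storage_class(storage_class)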

def test_global_disk_soft_anti_affinity(client, volume_name, request)
def test_global_disk_soft_anti_affinity(client, volume_name, request): # NOQA
    """
    1. When Replica Disk Soft Anti-Affinity is false, it should be impossible
       to schedule replicas to the same disk.
    2. When Replica Disk Soft Anti-Affinity is true, it should be possible to
       schedule replicas to the same disk.
    3. Whether or not Replica Disk Soft Anti-Affinity is true or false, the
       scheduler should prioritize scheduling replicas to different disks.

    Given
    - One node has three disks
    - The three disks have very different sizes
    - Only two disks are available for scheduling
    - No other node is available for scheduling

    When
    - Global Replica Node Level Soft Anti-Affinity is true
    - Global Replica Zone Level Soft Anti-Affinity is true
    - Global Replica Disk Level Soft Anti-Affinity is false
    - Create a volume with three replicas and a size such that all replicas
      could fit on the largest disk and still leave it with the most available
      space
    - Attach the volume to the schedulable node

    Then
    - Verify the volume is in a degraded state
    - Verify only two of the three replicas are healthy
    - Verify the remaining replica doesn't have a spec.nodeID

    When
    - Change the global Replica Disk Level Soft Anti-Affinity to true

    Then
    - Verify the volume is in a healthy state
    - Verify all three replicas are healthy (two replicas have the same
      spec.diskID)

    When
    - Enable scheduling on the third disk
    - Delete one of the two replicas with the same spec.diskID

    Then
    - Verify the volume is in a healthy state
    - Verify all three replicas are healthy
    - Verify all three replicas have a different spec.diskID
    """
    # Preparation
    disk_path1, disk_path2 = prepare_for_affinity_tests(client,
                                                        volume_name,
                                                        request)

    # Test start
    update_setting(client, SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY, "true")
    update_setting(client, SETTING_REPLICA_ZONE_SOFT_ANTI_AFFINITY, "true")
    update_setting(client, SETTING_REPLICA_DISK_SOFT_ANTI_AFFINITY, "false")

    lht_hostId = get_self_host_id()
    client.create_volume(name=volume_name, size=str(500*Mi),
                         dataEngine=DATA_ENGINE)
    volume = wait_for_volume_detached(client, volume_name)
    volume.attach(hostId=lht_hostId)
    volume = wait_for_volume_degraded(client, volume_name)

    num_running = 0
    for replica in volume.replicas:
        if replica.running:
            num_running += 1
        else:
            assert replica.hostId == ""

    assert num_running == 2

    # After setting SETTING_REPLICA_DISK_SOFT_ANTI_AFFINITY to true,
    # replicas can be scheduled on the same disk, so the volume becomes healthy
    update_setting(client, SETTING_REPLICA_DISK_SOFT_ANTI_AFFINITY, "true")

    volume = wait_for_volume_healthy(client, volume_name)

    node = client.by_id_node(lht_hostId)
    disks = node.disks
    for fsid, disk in iter(disks.items()):
        if disk.path == disk_path2:
            disk.allowScheduling = True

    # Enable disk2
    update_disks = get_update_disks(disks)
    update_node_disks(client, node.name, disks=update_disks, retry=True)

    # Delete one of the two replicas with the same diskID
    disk_id = []
    for replica in volume.replicas:
        if replica.diskID not in disk_id:
            disk_id.append(replica.diskID)
        else:
            volume.replicaRemove(name=replica.name)

    volume = wait_for_volume_degraded(client, volume_name)
    volume = wait_for_volume_healthy(client, volume_name)

    # Replicas should be located on 3 different disks on the current node
    disk_id.clear()
    for replica in volume.replicas:
        assert replica.diskID not in disk_id
        disk_id.append(replica.diskID)
  1. When Replica Disk Soft Anti-Affinity is false, it should be impossible to schedule replicas to the same disk.
  2. When Replica Disk Soft Anti-Affinity is true, it should be possible to schedule replicas to the same disk.
  3. Whether or not Replica Disk Soft Anti-Affinity is true or false, the scheduler should prioritize scheduling replicas to different disks.

Given
- One node has three disks
- The three disks have very different sizes
- Only two disks are available for scheduling
- No other node is available for scheduling

When
- Global Replica Node Level Soft Anti-Affinity is true
- Global Replica Zone Level Soft Anti-Affinity is true
- Global Replica Disk Level Soft Anti-Affinity is false
- Create a volume with three replicas and a size such that all replicas could fit on the largest disk and still leave it with the most available space
- Attach the volume to the schedulable node

Then
- Verify the volume is in a degraded state
- Verify only two of the three replicas are healthy
- Verify the remaining replica doesn't have a spec.nodeID

When
- Change the global Replica Disk Level Soft Anti-Affinity to true

Then
- Verify the volume is in a healthy state
- Verify all three replicas are healthy (two replicas have the same spec.diskID)

When
- Enable scheduling on the third disk
- Delete one of the two replicas with the same spec.diskID

Then
- Verify the volume is in a healthy state
- Verify all three replicas are healthy
- Verify all three replicas have a different spec.diskID
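
The final verification amounts to checking that the replica diskIDs are pairwise distinct; an equivalent, more compact form of the loop at the end of the source:

disk_ids = [replica.diskID for replica in volume.replicas]
# Pairwise-distinct diskIDs mean every replica sits on its own disk.
assert len(set(disk_ids)) == len(disk_ids)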

def test_hard_anti_affinity_detach(client, volume_name)
@pytest.mark.v2_volume_test  # NOQA
def test_hard_anti_affinity_detach(client, volume_name):  # NOQA
    """
    Test that volumes with Hard Anti-Affinity are still able to detach and
    reattach to a node properly, even in degraded state.

    1. Create a volume and attach to the current node
    2. Generate and write `data` to the volume.
    3. Set `soft anti-affinity` to false
    4. Disable current node's scheduling.
    5. Remove the replica on the current node
        1. Verify volume will be in degraded state.
        2. Verify volume reports condition `scheduled == false`
    6. Detach the volume.
    7. Verify that volume only have 2 replicas
        1. Unhealthy replica will be removed upon detach.
    8. Attach the volume again.
        1. Verify volume will be in degraded state.
        2. Verify volume reports condition `scheduled == false`
        3. Verify only two replicas of volume are healthy.
    9. Check volume `data`
    """
    volume = create_and_check_volume(client, volume_name)
    host_id = get_self_host_id()
    volume.attach(hostId=host_id)
    volume = wait_for_volume_healthy(client, volume_name)
    assert len(volume.replicas) == 3

    data = write_volume_random_data(volume)
    setting = client.by_id_setting(SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY)
    client.update(setting, value="false")
    node = client.by_id_node(host_id)
    set_node_scheduling(client, node, allowScheduling=False)
    host_replica = get_host_replica(volume, host_id)

    volume.replicaRemove(name=host_replica.name)
    volume = wait_for_volume_degraded(client, volume_name)
    wait_scheduling_failure(client, volume_name)
    volume.detach()
    volume = wait_for_volume_detached(client, volume_name)
    assert len(volume.replicas) == 2

    volume.attach(hostId=host_id)
    # Make sure we're still not getting another successful replica.
    volume = wait_for_volume_degraded(client, volume_name)
    wait_scheduling_failure(client, volume_name)
    assert sum([1 for replica in volume.replicas if replica.running and
                replica.mode == "RW"]) == 2
    assert len(volume.replicas) == 2
    check_volume_data(volume, data)

    cleanup_volume(client, volume)

Test that volumes with Hard Anti-Affinity are still able to detach and reattach to a node properly, even in degraded state.

  1. Create a volume and attach to the current node
  2. Generate and write data to the volume.
  3. Set soft anti-affinity to false
  4. Disable current node's scheduling.
  5. Remove the replica on the current node
    1. Verify volume will be in degraded state.
    2. Verify volume reports condition scheduled == false
  6. Detach the volume.
  7. Verify that volume only have 2 replicas
    1. Unhealthy replica will be removed upon detach.
  8. Attach the volume again.
    1. Verify volume will be in degraded state.
    2. Verify volume reports condition scheduled == false
    3. Verify only two replicas of volume are healthy.
  9. Check volume data
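
The healthy-replica count in step 8.3 is an idiom shared by several tests in this module; as a hypothetical helper:

def count_healthy_replicas(volume):
    # A replica is healthy when its process is running in read-write mode.
    return sum(1 for replica in volume.replicas
               if replica.running and replica.mode == "RW")

assert count_healthy_replicas(volume) == 2
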
def test_hard_anti_affinity_live_rebuild(client, volume_name)
@pytest.mark.v2_volume_test  # NOQA
def test_hard_anti_affinity_live_rebuild(client, volume_name):  # NOQA
    """
    Test that volumes with Hard Anti-Affinity can build new replicas live once
    a valid node is available.

    If no nodes without existing replicas are available, the volume should
    remain in "Degraded" state. However, once one is available, the replica
    should now be scheduled successfully, with the volume returning to
    "Healthy" state.

    1. Create a volume and attach to the current node
    2. Generate and write `data` to the volume.
    3. Set `soft anti-affinity` to false
    4. Disable current node's scheduling.
    5. Remove the replica on the current node
        1. Verify volume will be in degraded state.
        2. Verify volume reports condition `scheduled == false`
    6. Enable the current node's scheduling
    7. Wait for volume to start rebuilding and become healthy again
    8. Check volume `data`
    """
    volume = create_and_check_volume(client, volume_name)
    host_id = get_self_host_id()
    volume.attach(hostId=host_id)
    volume = wait_for_volume_healthy(client, volume_name)
    assert len(volume.replicas) == 3

    data = write_volume_random_data(volume)
    setting = client.by_id_setting(SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY)
    client.update(setting, value="false")
    node = client.by_id_node(host_id)
    set_node_scheduling(client, node, allowScheduling=False)
    replica_names = map(lambda replica: replica.name, volume.replicas)
    host_replica = get_host_replica(volume, host_id)

    volume.replicaRemove(name=host_replica.name)
    wait_for_volume_degraded(client, volume_name)
    wait_scheduling_failure(client, volume_name)
    # Allow scheduling on host node again
    set_node_scheduling(client, node, allowScheduling=True)
    wait_new_replica_ready(client, volume_name, replica_names)
    volume = wait_for_volume_healthy(client, volume_name)
    assert len(volume.replicas) == 3
    check_volume_data(volume, data)

    cleanup_volume(client, volume)

Test that volumes with Hard Anti-Affinity can build new replicas live once a valid node is available.

If no nodes without existing replicas are available, the volume should remain in "Degraded" state. However, once one is available, the replica should now be scheduled successfully, with the volume returning to "Healthy" state.

  1. Create a volume and attach to the current node
  2. Generate and write data to the volume.
  3. Set soft anti-affinity to false
  4. Disable current node's scheduling.
  5. Remove the replica on the current node
    1. Verify volume will be in degraded state.
    2. Verify volume reports condition scheduled == false
  6. Enable the current node's scheduling
  7. Wait for volume to start rebuilding and become healthy again
  8. Check volume data
def test_hard_anti_affinity_offline_rebuild(client, volume_name)
@pytest.mark.v2_volume_test  # NOQA
def test_hard_anti_affinity_offline_rebuild(client, volume_name):  # NOQA
    """
    Test that volumes with Hard Anti-Affinity can build new replicas during
    the attaching process once a valid node is available.

    Once a new replica has been built as part of the attaching process, the
    volume should be Healthy again.

    1. Create a volume and attach to the current node
    2. Generate and write `data` to the volume.
    3. Set `soft anti-affinity` to false
    4. Disable current node's scheduling.
    5. Remove the replica on the current node
        1. Verify volume will be in degraded state.
        2. Verify volume reports condition `scheduled == false`
    6. Detach the volume.
    7. Enable current node's scheduling.
    8. Attach the volume again.
    9. Wait for volume to become healthy with 3 replicas
    10. Check volume `data`
    """
    volume = create_and_check_volume(client, volume_name)
    host_id = get_self_host_id()
    volume.attach(hostId=host_id)
    volume = wait_for_volume_healthy(client, volume_name)
    assert len(volume.replicas) == 3

    data = write_volume_random_data(volume)
    setting = client.by_id_setting(SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY)
    client.update(setting, value="false")
    node = client.by_id_node(host_id)
    set_node_scheduling(client, node, allowScheduling=False)
    replica_names = map(lambda replica: replica.name, volume.replicas)
    host_replica = get_host_replica(volume, host_id)

    volume.replicaRemove(name=host_replica.name)
    volume = wait_for_volume_degraded(client, volume_name)
    wait_scheduling_failure(client, volume_name)
    volume.detach()
    volume = wait_for_volume_detached(client, volume_name)
    set_node_scheduling(client, node, allowScheduling=True)
    volume.attach(hostId=host_id)
    wait_new_replica_ready(client, volume_name, replica_names)
    volume = wait_for_volume_healthy(client, volume_name)
    assert len(volume.replicas) == 3
    check_volume_data(volume, data)

    cleanup_volume(client, volume)

Test that volumes with Hard Anti-Affinity can build new replicas during the attaching process once a valid node is available.

Once a new replica has been built as part of the attaching process, the volume should be Healthy again.

  1. Create a volume and attach to the current node
  2. Generate and write data to the volume.
  3. Set soft anti-affinity to false
  4. Disable current node's scheduling.
  5. Remove the replica on the current node
    1. Verify volume will be in degraded state.
    2. Verify volume reports condition scheduled == false
  6. Detach the volume.
  7. Enable current node's scheduling.
  8. Attach the volume again.
  9. Wait for volume to become healthy with 3 replicas
  10. Check volume data
def test_hard_anti_affinity_scheduling(client, volume_name)
@pytest.mark.v2_volume_test  # NOQA
def test_hard_anti_affinity_scheduling(client, volume_name):  # NOQA
    """
    Test that volumes with Hard Anti-Affinity work as expected.

    With Hard Anti-Affinity, scheduling on nodes with existing replicas should
    be forbidden, resulting in "Degraded" state.

    1. Create a volume and attach to the current node
    2. Generate and write `data` to the volume.
    3. Set `soft anti-affinity` to false
    4. Disable current node's scheduling.
    5. Remove the replica on the current node
        1. Verify volume will be in degraded state.
        2. Verify volume reports condition `scheduled == false`
        3. Verify only two replicas of volume are healthy.
    6. Check volume `data`
    """
    volume = create_and_check_volume(client, volume_name)
    host_id = get_self_host_id()
    volume.attach(hostId=host_id)
    volume = wait_for_volume_healthy(client, volume_name)
    assert len(volume.replicas) == 3

    data = write_volume_random_data(volume)
    setting = client.by_id_setting(SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY)
    client.update(setting, value="false")
    node = client.by_id_node(host_id)
    set_node_scheduling(client, node, allowScheduling=False)
    host_replica = get_host_replica(volume, host_id)

    volume.replicaRemove(name=host_replica.name)
    # Instead of waiting for timeout and lengthening the tests a significant
    # amount we can make sure the scheduling isn't working by making sure the
    # volume becomes Degraded and reports a scheduling error.
    wait_for_volume_degraded(client, volume_name)
    wait_scheduling_failure(client, volume_name)
    # While three replicas should exist to meet the volume's request, only
    # two of those replicas should actually be healthy.
    volume = client.by_id_volume(volume_name)
    assert sum([1 for replica in volume.replicas if replica.running and
                replica.mode == "RW"]) == 2

    # Two replicas in total should exist.
    assert len(volume.replicas) == 2
    check_volume_data(volume, data)

    cleanup_volume(client, volume)

Test that volumes with Hard Anti-Affinity work as expected.

With Hard Anti-Affinity, scheduling on nodes with existing replicas should be forbidden, resulting in "Degraded" state.

  1. Create a volume and attach to the current node
  2. Generate and write data to the volume.
  3. Set soft anti-affinity to false
  4. Disable current node's scheduling.
  5. Remove the replica on the current node
    1. Verify volume will be in degraded state.
    2. Verify volume reports condition scheduled == false
    3. Verify only two replicas of volume are healthy.
  6. Check volume data
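
The contrast between the hard and soft anti-affinity tests above reduces to a single scheduling predicate. The sketch below is illustrative only, not Longhorn's actual scheduler logic: with hard anti-affinity a node that already hosts a replica of the volume is rejected outright, while soft anti-affinity merely tolerates it as a fallback.

def can_schedule(node, existing_replica_nodes, soft_anti_affinity):
    """Illustrative predicate: may a new replica land on `node`?"""
    if node not in existing_replica_nodes:
        return True
    # The node already hosts a replica of this volume.
    return soft_anti_affinity  # hard anti-affinity forbids reuse

# With soft anti-affinity "false" and all other nodes unschedulable, the
# replacement replica cannot be placed and the volume stays Degraded.
assert can_schedule("node-1", {"node-1"}, soft_anti_affinity=False) is False
assert can_schedule("node-1", {"node-1"}, soft_anti_affinity=True) is True
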
def test_replica_auto_balance_disabled_volume_spec_enabled(client, volume_name)
Expand source code
def test_replica_auto_balance_disabled_volume_spec_enabled(client, volume_name):  # NOQA
    """
    Scenario: replica should auto-balance individual volume when
              global setting `replica-auto-balance` is `disabled` and
              volume spec `replicaAutoBalance` is `least-effort`.

    Given set `replica-soft-anti-affinity` to `true`.
    And set `replica-auto-balance` to `disabled`.
    And disable scheduling for node-2.
        disable scheduling for node-3.
    And create volume-1 with 3 replicas.
        create volume-2 with 3 replicas.
    And set volume-2 spec `replicaAutoBalance` to `least-effort`.
    And attach volume-1 to self-node.
        attach volume-2 to self-node.
    And wait for volume-1 to be healthy.
        wait for volume-2 to be healthy.
    And count volume-1 replicas running on each node.
    And 3 replicas running on node-1.
        0 replicas running on node-2.
        0 replicas running on node-3.
    And count volume-2 replicas running on each node.
    And 3 replicas running on node-1.
        0 replicas running on node-2.
        0 replicas running on node-3.
    And write some data to volume-1.
        write some data to volume-2.

    When enable scheduling for node-2.
         enable scheduling for node-3.

    Then count volume-1 replicas running on each node.
    And 3 replicas running on node-1.
        0 replicas running on node-2.
        0 replicas running on node-3.
    And count volume-2 replicas running on each node.
    And 1 replica running on node-1.
        1 replica running on node-2.
        1 replica running on node-3.
    And volume-1 data should be the same as written.
    And volume-2 data should be the same as written.
    """
    common.update_setting(client,
                          SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY, "true")
    common.update_setting(client,
                          SETTING_REPLICA_AUTO_BALANCE, "disabled")

    n1, n2, n3 = client.list_node()
    client.update(n2, allowScheduling=False)
    client.update(n3, allowScheduling=False)

    n_replicas = 3
    v1_name = volume_name + "-1"
    v2_name = volume_name + "-2"

    v1 = create_and_check_volume(client, v1_name,
                                 num_of_replicas=n_replicas)
    v2 = create_and_check_volume(client, v2_name,
                                 num_of_replicas=n_replicas)

    v2.updateReplicaAutoBalance(ReplicaAutoBalance="least-effort")
    common.wait_for_volume_replica_auto_balance_update(
        client, v2_name, "least-effort"
    )

    self_node = get_self_host_id()
    v1.attach(hostId=self_node)
    v2.attach(hostId=self_node)

    v1 = wait_for_volume_healthy(client, v1_name)
    v2 = wait_for_volume_healthy(client, v2_name)

    for _ in range(RETRY_COUNTS):
        v1n1_r_count = common.get_host_replica_count(
            client, v1_name, n1.name, chk_running=True)
        v1n2_r_count = common.get_host_replica_count(
            client, v1_name, n2.name, chk_running=True)
        v1n3_r_count = common.get_host_replica_count(
            client, v1_name, n3.name, chk_running=True)

        if v1n1_r_count == n_replicas and v1n2_r_count == v1n3_r_count == 0:
            break
        time.sleep(RETRY_INTERVAL)
    assert v1n1_r_count == n_replicas
    assert v1n2_r_count == 0
    assert v1n3_r_count == 0

    for _ in range(RETRY_COUNTS):
        v2n1_r_count = common.get_host_replica_count(
            client, v2_name, n1.name, chk_running=True)
        v2n2_r_count = common.get_host_replica_count(
            client, v2_name, n2.name, chk_running=True)
        v2n3_r_count = common.get_host_replica_count(
            client, v2_name, n3.name, chk_running=True)

        if v2n1_r_count == n_replicas and v2n2_r_count == v2n3_r_count == 0:
            break
        time.sleep(RETRY_INTERVAL)
    assert v2n1_r_count == n_replicas
    assert v2n2_r_count == 0
    assert v2n3_r_count == 0

    d1 = write_volume_random_data(v1)
    d2 = write_volume_random_data(v2)
    check_volume_data(v1, d1)
    check_volume_data(v2, d2)

    client.update(n2, allowScheduling=True, evictionRequested=False)
    client.update(n3, allowScheduling=True, evictionRequested=False)

    assert v1n1_r_count == common.get_host_replica_count(
        client, v1_name, n1.name, chk_running=True)
    assert v1n2_r_count == common.get_host_replica_count(
        client, v1_name, n2.name, chk_running=True)
    assert v1n3_r_count == common.get_host_replica_count(
        client, v1_name, n3.name, chk_running=True)

    for _ in range(RETRY_COUNTS):
        v2n1_r_count = common.get_host_replica_count(
            client, v2_name, n1.name, chk_running=True)
        v2n2_r_count = common.get_host_replica_count(
            client, v2_name, n2.name, chk_running=True)
        v2n3_r_count = common.get_host_replica_count(
            client, v2_name, n3.name, chk_running=True)

        if v2n1_r_count == v2n2_r_count == v2n3_r_count == 1:
            break
        time.sleep(RETRY_INTERVAL)
    assert v2n1_r_count == 1
    assert v2n2_r_count == 1
    assert v2n3_r_count == 1

    check_volume_data(v1, d1)
    check_volume_data(v2, d2)

Scenario: replica should auto-balance individual volume when global setting replica-auto-balance is disabled and volume spec replicaAutoBalance is least-effort.

Given set replica-soft-anti-affinity to true. And set replica-auto-balance to disabled. And disable scheduling for node-2. disable scheduling for node-3. And create volume-1 with 3 replicas. create volume-2 with 3 replicas. And set volume-2 spec replicaAutoBalance to least-effort. And attach volume-1 to self-node. attach volume-2 to self-node. And wait for volume-1 to be healthy. wait for volume-2 to be healthy. And count volume-1 replicas running on each node. And 3 replicas running on node-1. 0 replicas running on node-2. 0 replicas running on node-3. And count volume-2 replicas running on each node. And 3 replicas running on node-1. 0 replicas running on node-2. 0 replicas running on node-3. And write some data to volume-1. write some data to volume-2.

When enable scheduling for node-2. enable scheduling for node-3.

Then count volume-1 replicas running on each node. And 3 replicas running on node-1. 0 replicas running on node-2. 0 replicas running on node-3. And count volume-2 replicas running on each node. And 1 replica running on node-1. 1 replica running on node-2. 1 replica running on node-3. And volume-1 data should be the same as written. And volume-2 data should be the same as written.
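
All of the replica-count checks in this module follow the same poll-until-stable pattern seen above. A generic, hypothetical version of that loop (retry_counts and retry_interval stand in for the module's RETRY_COUNTS and RETRY_INTERVAL constants):

import time

def poll_until(predicate, retry_counts=300, retry_interval=1):
    """Call predicate() until it returns True or retries run out."""
    for _ in range(retry_counts):
        if predicate():
            return True
        time.sleep(retry_interval)
    return False

# Usage sketch: wait for volume-2 to settle at one replica per node, e.g.
# assert poll_until(lambda: all(count(n) == 1 for n in (n1, n2, n3)))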

def test_replica_auto_balance_disk_in_pressure(client, core_api, apps_api, volume_name, statefulset, storage_class)
Expand source code
def test_replica_auto_balance_disk_in_pressure(client, core_api, apps_api, volume_name, statefulset, storage_class):  # NOQA
    """
    Scenario: Test replica auto balance disk in pressure

    Description: This test simulates a scenario where a disk reaches a certain
                 pressure threshold (80%), triggering the replica auto balance
                 to rebuild the replicas to another disk with enough available
                 space. Replicas should not be rebuilt at the same time.

    Issue: https://github.com/longhorn/longhorn/issues/4105

    Given setting "replica-soft-anti-affinity" is "false"
    And setting "replica-auto-balance-disk-pressure-percentage" is "80"
    And new 1Gi disk 1 is created on self node
        new 1Gi disk 2 is created on self node
    And disk scheduling is disabled for disk 2 on self node
        disk scheduling is disabled for default disk on self node
    And node scheduling is disabled for all nodes except self node
    And new storageclass is created with `numberOfReplicas: 1`
    And statefulset 0 is created with 1 replicaset
        statefulset 1 is created with 1 replicaset
        statefulset 2 is created with 1 replicaset
    And all statefulset volume replicas are scheduled on disk 1
    And data is written to all statefulset volumes until disk 1 is pressured
    And disk 1 pressure exceeds the threshold (80%)

    When enable disk scheduling for disk 2 on self node
    And update setting "replica-auto-balance" to "best-effort"

    Then at least 1 replica should be rebuilt on disk 2
    And at least 1 replica should not be rebuilt on disk 2
    And disk 1 should be below disk pressure threshold (80%)
    And all statefulset volume data should be intact
    """
    # Set the "replica-soft-anti-affinity" to "false".
    update_setting(client, SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY, "false")

    # Set the "replica-auto-balance-disk-pressure-percentage" to 80%.
    disk_pressure_percentage = 80
    update_setting(client,
                   SETTING_REPLICA_AUTO_BALANCE_DISK_PRESSURE_PERCENTAGE,
                   str(disk_pressure_percentage))

    self_host_id = get_self_host_id()
    disk_size = Gi
    disk1_name = "test-disk1"
    disk2_name = "test-disk2"

    def _create_disk_on_self_host(client, disk_name, allow_scheduling):
        node = client.by_id_node(self_host_id)
        update_disks = get_update_disks(node.disks)
        disk_path = create_host_disk(client, disk_name,
                                     str(disk_size), self_host_id)
        update_disks[disk_name] = {"path": disk_path,
                                   "allowScheduling": allow_scheduling}
        update_node_disks(client, node.name, disks=update_disks, retry=True)
        node = wait_for_disk_update(client, self_host_id, len(update_disks))
        assert len(node.disks) == len(update_disks)

    def _set_host_disk_allow_scheduling(client, disk_name):
        node = client.by_id_node(self_host_id)
        update_disks = get_update_disks(node.disks)

        for name, disk in update_disks.items():
            if name == disk_name:
                disk.allowScheduling = True

        update_node_disks(client, node.name, disks=update_disks, retry=True)
        node = wait_for_disk_update(client, self_host_id, len(update_disks))
        assert len(node.disks) == len(update_disks)

    def _disable_default_disk_on_self_host(client):
        node = client.by_id_node(self_host_id)
        update_disks = get_update_disks(node.disks)

        for _, disk in update_disks.items():
            if disk.path == DEFAULT_DISK_PATH:
                disk.allowScheduling = False

        update_node_disks(client, node.name, disks=update_disks, retry=True)
        node = wait_for_disk_update(client, self_host_id, len(update_disks))
        assert len(node.disks) == len(update_disks)

    def _disable_node_scheduling_except_self_host(client):
        nodes = client.list_node()
        for node in nodes:
            if node.name != self_host_id:
                client.update(node, allowScheduling=False)
                wait_for_node_update(client, node.id, "allowScheduling", False)

    def _get_disk_storage_available(client, disk_name):
        node = client.by_id_node(self_host_id)
        for name, disk in node.disks.items():
            if name == disk_name:
                return disk.storageAvailable
        assert False, f"Cannot find disk {disk_name} on node {self_host_id}."

    def _create_statefulset(statefulset, statefulset_name,
                            storage_class, volume_size, replicas):
        statefulset['metadata']['name']\
            = statefulset['spec']['selector']['matchLabels']['app']\
            = statefulset['spec']['serviceName']\
            = statefulset['spec']['template']['metadata']['labels']['app']\
            = statefulset_name

        statefulset['spec']['replicas'] = replicas

        volume_claim_template = statefulset['spec']['volumeClaimTemplates'][0]
        volume_claim_template['spec']['storageClassName']\
            = storage_class['metadata']['name']
        volume_claim_template['spec']['resources']['requests']['storage']\
            = size_to_string(volume_size)

        create_and_wait_statefulset(statefulset)

    def _wait_for_storage_usage_reach_disk_pressured_percentage(
            client, node_id, disk_name, disk_pressure_percentage,
            is_freed=False):
        node = client.by_id_node(node_id)

        expected_available_percentage = 100 - disk_pressure_percentage

        for _ in range(RETRY_COUNTS):
            time.sleep(RETRY_INTERVAL)

            node = client.by_id_node(node_id)

            check_disk = None
            for name, disk in node.disks.items():
                if name == disk_name:
                    check_disk = disk
                    break

            assert check_disk is not None, \
                f"Cannot find disk {disk_name} on node {node_id}."

            if check_disk.storageAvailable == check_disk.storageScheduled:
                continue

            actual_available_percentage = \
                check_disk.storageAvailable / check_disk.storageMaximum * 100

            if not is_freed:
                if actual_available_percentage < expected_available_percentage:
                    print(f"{disk_name} in pressure: "
                          f"{actual_available_percentage}% available")
                    return
            else:
                if actual_available_percentage > expected_available_percentage:
                    print(f"{disk_name} not in pressure: "
                          f"{actual_available_percentage}% available")
                    return

        condition = "is not" if not is_freed else "is"
        assert False, \
            f"Disk {disk_name} {condition} in pressure. " \
            f"Expected below {expected_available_percentage}% available. " \
            f"Actual {actual_available_percentage}% available."

    def _cleanup(client, volume_names, statefulset_names):
        for _volume_name in volume_names:
            cleanup_volume_by_name(client, _volume_name)

        for _statefulset_name in statefulset_names:
            _statefulset = {
                'metadata': {
                    'name': _statefulset_name,
                    'namespace': 'default'
                }
            }
            delete_statefulset(apps_api, _statefulset)

        for _volume_name in volume_names:
            wait_for_volume_delete(client, _volume_name)

        cleanup_disks_on_node(client, self_host_id, disk1_name, disk2_name)

    # Create new disks and disable scheduling on disk 2.
    _create_disk_on_self_host(client, disk1_name, allow_scheduling=True)
    _create_disk_on_self_host(client, disk2_name, allow_scheduling=False)

    # Disable scheduling on default disk of self host and other nodes.
    _disable_default_disk_on_self_host(client)
    _disable_node_scheduling_except_self_host(client)

    # Create new storage class with "numberOfReplicas" set to 1.
    storage_class['parameters']['numberOfReplicas'] = "1"
    create_storage_class(storage_class)

    # Expect 3 volumes to be created later, each with 1 replica.
    expected_replica_count = 3

    # Calculate the size of each volume based on the available storage of
    # disk 1.
    disk_storage_available = _get_disk_storage_available(client, disk1_name)
    volume_size = int(disk_storage_available/expected_replica_count)

    # Create 3 statefulsets with 1 replica each.
    statefulset_names = [f'sts-{i}' for i in range(expected_replica_count)]
    for _statefulset_name in statefulset_names:
        _create_statefulset(statefulset, _statefulset_name,
                            storage_class, volume_size,
                            replicas=1)

    # Verify that all volumes are created with 1 replica.
    claims = core_api.list_namespaced_persistent_volume_claim(
        namespace='default')
    volume_names = [claim.spec.volume_name for claim in claims.items]
    assert len(volume_names) == expected_replica_count, \
        f"Expected {expected_replica_count} volumes."

    # Verify that all replicas are scheduled on disk 1.
    node = client.by_id_node(self_host_id)
    disks = node.disks
    scheduled_disk_name = ""
    for _volume_name in volume_names:
        for name, disk in disks.items():
            for scheduled_replica, _ in disk.scheduledReplica.items():
                if scheduled_replica.startswith(_volume_name):
                    scheduled_disk_name = name
                    break
        assert scheduled_disk_name == disk1_name, \
            f"Replica scheduled on wrong disk {scheduled_disk_name}."

    # Get the node disk of disk 1.
    scheduled_disk = None
    for name, disk in node.disks.items():
        if name == scheduled_disk_name:
            scheduled_disk = disk
            break
    assert scheduled_disk is not None, \
        f"Failed to get node disk {scheduled_disk_name}."

    storage_maximum = scheduled_disk.storageMaximum
    storage_available = scheduled_disk.storageAvailable

    # Calculate the total data size to add to simulate disk pressure.
    target_disk_size_usage = \
        storage_maximum * (disk_pressure_percentage + 1) / 100
    current_usage = storage_maximum - storage_available
    target_data_size = target_disk_size_usage - current_usage

    # Calculate the data size to write to each volume.
    data_size = math.ceil(target_data_size / len(volume_names))
    assert data_size != 0, \
        f"Failed to get data size for disk {scheduled_disk_name}."

    data_size_mb = math.ceil(data_size / 1024 / 1024)
    pod_md5sums = {}

    # Write data to each volume and get the MD5 checksum.
    for _statefulset_name in statefulset_names:
        pod_name = _statefulset_name + '-0'
        write_pod_volume_random_data(core_api, pod_name, "/data/test",
                                     data_size_mb)

        md5sum = get_pod_data_md5sum(core_api, pod_name, "/data/test")

        pod_md5sums[pod_name] = md5sum

    # Wait for storage usage to reach the disk pressure percentage.
    _wait_for_storage_usage_reach_disk_pressured_percentage(
        client, self_host_id, disk1_name, disk_pressure_percentage)

    # Allow scheduling on disk2.
    _set_host_disk_allow_scheduling(client, disk2_name)

    # Set "replica-auto-balance" to "best-effort" to trigger rebuild.
    update_setting(client, SETTING_REPLICA_AUTO_BALANCE, "best-effort")

    # Wait for the volume to be rebuilt.
    actual_rebuilt, actual_not_rebuilt = 0, 0
    for _volume_name in volume_names:
        try:
            wait_for_volume_replica_rebuilt_on_same_node_different_disk(
                client, self_host_id, _volume_name, scheduled_disk_name
            )
            actual_rebuilt += 1
        except AssertionError:
            print(f"Volume {_volume_name} not rebuilt")
            actual_not_rebuilt += 1
    # There can be a delay up to 30 seconds for the disk storage usage to
    # reflect after a replica is removed from disk 1. This can cause an
    # additional replica to be rebuilt on disk 2 before the node controller's
    # disk monitor detects the space change on disk 1.
    assert actual_rebuilt >= 1, \
        f"Expected at least 1 volume replica rebuilt.\n"\
        f"Actual {actual_rebuilt} volume replica rebuilt."
    assert actual_not_rebuilt >= 1, \
        f"Expected at least 1 volume replica not rebuilt.\n"\
        f"Actual {actual_not_rebuilt} volume replica not rebuilt."

    _wait_for_storage_usage_reach_disk_pressured_percentage(
        client, self_host_id, disk1_name, disk_pressure_percentage,
        is_freed=True
    )

    # Verify data integrity by checking MD5 checksum.
    for pod_name, md5sum in pod_md5sums.items():
        current_md5sum = get_pod_data_md5sum(core_api, pod_name, "/data/test")
        assert md5sum == current_md5sum, \
            f"Data in pod {pod_name} doesn't match."

    _cleanup(client, volume_names, statefulset_names)

Scenario: Test replica auto balance disk in pressure

Description: This test simulates a scenario where a disk reaches a certain pressure threshold (80%), triggering the replica auto balance to rebuild the replicas to another disk with enough available space. Replicas should not be rebuilt at the same time.

Issue: https://github.com/longhorn/longhorn/issues/4105

Given setting "replica-soft-anti-affinity" is "false" And setting "replica-auto-balance-disk-pressure-percentage" is "80" And new 1Gi disk 1 is created on self node new 1Gi disk 2 is created on self node And disk scheduling is disabled for disk 2 on self node disk scheduling is disabled for default disk on self node And node scheduling is disabled for all nodes except self node And new storageclass is created with numberOfReplicas: 1 And statefulset 0 is created with 1 replicaset statefulset 1 is created with 1 replicaset statefulset 2 is created with 1 replicaset And all statefulset volume replicas are scheduled on disk 1 And data is written to all statefulset volumes until disk 1 is pressured And disk 1 pressure exceeds the threshold (80%)

When enable disk scheduling for disk 2 on self node And update setting "replica-auto-balance" to "best-effort"

Then at least 1 replica should be rebuilt on disk 2 And at least 1 replica should not be rebuilt on disk 2 And disk 1 should be below the disk pressure threshold (80%) And all statefulset volume data should be intact
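
The pressure condition itself is simple arithmetic on the disk fields the test reads (storageAvailable, storageMaximum). A minimal standalone check, assuming the 80% threshold used above:

def is_disk_pressured(storage_available, storage_maximum, threshold_pct=80):
    """True when the disk's used space exceeds the pressure threshold."""
    available_pct = storage_available / storage_maximum * 100
    return available_pct < (100 - threshold_pct)

# A 1Gi disk with 150Mi free is ~14.6% available, i.e. more than 80% used:
assert is_disk_pressured(150 * 1024**2, 1024**3)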

def test_replica_auto_balance_node_best_effort(client, volume_name)
Expand source code
@pytest.mark.v2_volume_test  # NOQA
@pytest.mark.flaky(reruns=3)
def test_replica_auto_balance_node_best_effort(client, volume_name):  # NOQA
    """
    Scenario: replica auto-balance nodes with `best_effort`.

    Given set `replica-soft-anti-affinity` to `true`.
    And set `replica-auto-balance` to `best_effort`.
    And disable scheduling for node-2.
        disable scheduling for node-3.
    And create a volume with 6 replicas.
    And attach the volume to self-node.
    And wait for the volume to be healthy.
    And write some data to the volume.
    And count replicas running on each node.
    And 6 replicas running on node-1.
        0 replicas running on node-2.
        0 replicas running on node-3.

    When enable scheduling for node-2.
    And count replicas running on each node.
    Then 3 replicas running on node-1.
         3 replicas running on node-2.
         0 replicas running on node-3.
    And loop 3 times, waiting 5 seconds each, and count replicas on each node.
        To ensure no additional scheduling is happening.
        3 replicas running on node-1.
        3 replicas running on node-2.
        0 replicas running on node-3.

    When enable scheduling for node-3.
    And count replicas running on each node.
    Then 2 replicas running on node-1.
         2 replicas running on node-2.
         2 replicas running on node-3.
    And loop 3 times, waiting 5 seconds each, and count replicas on each node.
        To ensure no additional scheduling is happening.
        2 replicas running on node-1.
        2 replicas running on node-2.
        2 replicas running on node-3.

    When check the volume data.
    And volume data should be the same as written.
    """
    common.update_setting(client,
                          SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY, "true")
    common.update_setting(client,
                          SETTING_REPLICA_AUTO_BALANCE, "best-effort")

    n1, n2, n3 = client.list_node()
    client.update(n2, allowScheduling=False)
    client.update(n3, allowScheduling=False)

    n_replicas = 6
    volume = create_and_check_volume(client, volume_name,
                                     num_of_replicas=n_replicas)
    volume.attach(hostId=get_self_host_id())
    volume = wait_for_volume_healthy(client, volume_name)

    data = write_volume_random_data(volume)
    check_volume_data(volume, data)

    for _ in range(RETRY_COUNTS):
        n1_r_count = common.get_host_replica_count(
            client, volume_name, n1.name, chk_running=True)
        n2_r_count = common.get_host_replica_count(
            client, volume_name, n2.name, chk_running=True)
        n3_r_count = common.get_host_replica_count(
            client, volume_name, n3.name, chk_running=True)

        if n1_r_count == 6 and n2_r_count == n3_r_count == 0:
            break
        time.sleep(RETRY_INTERVAL)
    assert n1_r_count == 6
    assert n2_r_count == 0
    assert n3_r_count == 0

    client.update(n2, allowScheduling=True, evictionRequested=False)
    for _ in range(RETRY_COUNTS):
        n1_r_count = common.get_host_replica_count(
            client, volume_name, n1.name, chk_running=True)
        n2_r_count = common.get_host_replica_count(
            client, volume_name, n2.name, chk_running=True)
        n3_r_count = common.get_host_replica_count(
            client, volume_name, n3.name, chk_running=True)

        if n1_r_count == 3 and n2_r_count == 3 and n3_r_count == 0:
            break
        time.sleep(RETRY_INTERVAL)
    assert n1_r_count == 3
    assert n2_r_count == 3
    assert n3_r_count == 0

    # loop 3 times and each to wait 5 seconds to ensure there is no
    # re-scheduling happening.
    for _ in range(3):
        time.sleep(5)
        assert n1_r_count == common.get_host_replica_count(
            client, volume_name, n1.name, chk_running=True)
        assert n2_r_count == common.get_host_replica_count(
            client, volume_name, n2.name, chk_running=True)
        assert n3_r_count == common.get_host_replica_count(
            client, volume_name, n3.name, chk_running=True)

    client.update(n3, allowScheduling=True, evictionRequested=False)
    for _ in range(RETRY_COUNTS):
        n1_r_count = common.get_host_replica_count(
            client, volume_name, n1.name, chk_running=True)
        n2_r_count = common.get_host_replica_count(
            client, volume_name, n2.name, chk_running=True)
        n3_r_count = common.get_host_replica_count(
            client, volume_name, n3.name, chk_running=True)

        if n1_r_count == n2_r_count == n3_r_count == 2:
            break
        time.sleep(RETRY_INTERVAL_LONG)
    assert n1_r_count == 2
    assert n2_r_count == 2
    assert n3_r_count == 2

    # loop 3 times and each to wait 5 seconds to ensure there is no
    # re-scheduling happening.
    for _ in range(3):
        time.sleep(5)
        assert n1_r_count == common.get_host_replica_count(
            client, volume_name, n1.name, chk_running=True)
        assert n2_r_count == common.get_host_replica_count(
            client, volume_name, n2.name, chk_running=True)
        assert n3_r_count == common.get_host_replica_count(
            client, volume_name, n3.name, chk_running=True)

    volume = client.by_id_volume(volume_name)
    check_volume_data(volume, data)

Scenario: replica auto-balance nodes with best_effort.

Given set replica-soft-anti-affinity to true. And set replica-auto-balance to best_effort. And disable scheduling for node-2. disable scheduling for node-3. And create a volume with 6 replicas. And attach the volume to self-node. And wait for the volume to be healthy. And write some data to the volume. And count replicas running on each node. And 6 replicas running on node-1. 0 replicas running on node-2. 0 replicas running on node-3.

When enable scheduling for node-2. And count replicas running on each node. Then 3 replicas running on node-1. 3 replicas running on node-2. 0 replicas running on node-3. And loop 3 times, waiting 5 seconds each, and count replicas on each node. To ensure no additional scheduling is happening. 3 replicas running on node-1. 3 replicas running on node-2. 0 replicas running on node-3.

When enable scheduling for node-3. And count replicas running on each node. Then 2 replicas running on node-1. 2 replicas running on node-2. 2 replicas running on node-3. And loop 3 times, waiting 5 seconds each, and count replicas on each node. To ensure no additional scheduling is happening. 2 replicas running on node-1. 2 replicas running on node-2. 2 replicas running on node-3.

When check the volume data. And volume data should be the same as written.
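
best_effort aims for an even spread of replicas across every schedulable node, which is why the expected counts progress 6/0/0 -> 3/3/0 -> 2/2/2 as nodes are re-enabled. A hedged sketch of the target distribution, not Longhorn's actual balancing code:

def even_spread(n_replicas, nodes):
    """Distribute replicas as evenly as possible across nodes."""
    base, extra = divmod(n_replicas, len(nodes))
    return {node: base + (1 if i < extra else 0)
            for i, node in enumerate(nodes)}

assert even_spread(6, ["n1", "n2"]) == {"n1": 3, "n2": 3}
assert even_spread(6, ["n1", "n2", "n3"]) == {"n1": 2, "n2": 2, "n3": 2}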

def test_replica_auto_balance_node_least_effort(client, volume_name)
Expand source code
@pytest.mark.v2_volume_test  # NOQA
def test_replica_auto_balance_node_least_effort(client, volume_name):  # NOQA
    """
    Scenario: replica auto-balance nodes with `least_effort`.

    Given set `replica-soft-anti-affinity` to `true`.
    And set `replica-auto-balance` to `least_effort`.
    And disable scheduling for node-2.
        disable scheduling for node-3.
    And create a volume with 6 replicas.
    And attach the volume to self-node.
    And wait for the volume to be healthy.
    And write some data to the volume.
    And count replicas running on each node.
    And 6 replicas running on node-1.
        0 replicas running on node-2.
        0 replicas running on node-3.

    When enable scheduling for node-2.
    Then count replicas running on each node.
    And node-1 replica count != node-2 replica count.
        node-2 replica count != 0.
        node-3 replica count == 0.
    And loop 3 times, waiting 5 seconds each, and count replicas on each node.
        To ensure no additional scheduling is happening.
        The number of replicas running should be the same.

    When enable scheduling for node-3.
    And count replicas running on each node.
    And node-1 replica count != node-3 replica count.
        node-2 replica count != 0.
        node-3 replica count != 0.
    And loop 3 times, waiting 5 seconds each, and count replicas on each node.
        To ensure no additional scheduling is happening.
        The number of replicas running should be the same.

    When check the volume data.
    And volume data should be the same as written.
    """
    common.update_setting(client,
                          SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY, "true")
    common.update_setting(client,
                          SETTING_REPLICA_AUTO_BALANCE, "least-effort")

    n1, n2, n3 = client.list_node()
    client.update(n2, allowScheduling=False)
    client.update(n3, allowScheduling=False)

    n_replicas = 6
    volume = create_and_check_volume(client, volume_name,
                                     num_of_replicas=n_replicas)
    volume.attach(hostId=get_self_host_id())
    volume = wait_for_volume_healthy(client, volume_name)

    data = write_volume_random_data(volume)
    check_volume_data(volume, data)

    for _ in range(RETRY_COUNTS):
        n1_r_count = common.get_host_replica_count(
            client, volume_name, n1.name, chk_running=True)
        n2_r_count = common.get_host_replica_count(
            client, volume_name, n2.name, chk_running=False)
        n3_r_count = common.get_host_replica_count(
            client, volume_name, n3.name, chk_running=False)

        if n1_r_count == 6 and n2_r_count == n3_r_count == 0:
            break
        time.sleep(RETRY_INTERVAL)
    assert n1_r_count == 6
    assert n2_r_count == 0
    assert n3_r_count == 0

    client.update(n2, allowScheduling=True, evictionRequested=False)
    for _ in range(RETRY_COUNTS):
        n1_r_count = common.get_host_replica_count(
            client, volume_name, n1.name, chk_running=True)
        n2_r_count = common.get_host_replica_count(
            client, volume_name, n2.name, chk_running=True)
        n3_r_count = common.get_host_replica_count(
            client, volume_name, n3.name, chk_running=False)

        all_r_count = n1_r_count + n2_r_count + n3_r_count
        if n2_r_count != 0 and all_r_count == n_replicas:
            break
        time.sleep(RETRY_INTERVAL)
    assert n1_r_count != n2_r_count
    assert n2_r_count != 0
    assert n3_r_count == 0

    # loop 3 times and each to wait 5 seconds to ensure there is no
    # re-scheduling happening.
    for _ in range(3):
        time.sleep(5)
        assert n1_r_count == common.get_host_replica_count(
            client, volume_name, n1.name, chk_running=True)
        assert n2_r_count == common.get_host_replica_count(
            client, volume_name, n2.name, chk_running=True)
        assert n3_r_count == common.get_host_replica_count(
            client, volume_name, n3.name, chk_running=True)

    client.update(n3, allowScheduling=True, evictionRequested=False)
    for _ in range(RETRY_COUNTS):
        n1_r_count = common.get_host_replica_count(
            client, volume_name, n1.name, chk_running=True)
        n2_r_count = common.get_host_replica_count(
            client, volume_name, n2.name, chk_running=True)
        n3_r_count = common.get_host_replica_count(
            client, volume_name, n3.name, chk_running=True)

        all_r_count = n1_r_count + n2_r_count + n3_r_count
        if n3_r_count != 0 and all_r_count == n_replicas:
            break
        time.sleep(RETRY_INTERVAL)
    assert n1_r_count != n3_r_count
    assert n2_r_count != 0
    assert n3_r_count != 0

    # loop 3 times and each to wait 5 seconds to ensure there is no
    # re-scheduling happening.
    for _ in range(3):
        time.sleep(5)
        assert n1_r_count == common.get_host_replica_count(
            client, volume_name, n1.name, chk_running=True)
        assert n2_r_count == common.get_host_replica_count(
            client, volume_name, n2.name, chk_running=True)
        assert n3_r_count == common.get_host_replica_count(
            client, volume_name, n3.name, chk_running=True)

    volume = client.by_id_volume(volume_name)
    check_volume_data(volume, data)

Scenario: replica auto-balance nodes with least_effort.

Given set replica-soft-anti-affinity to true. And set replica-auto-balance to least_effort. And disable scheduling for node-2. disable scheduling for node-3. And create a volume with 6 replicas. And attach the volume to self-node. And wait for the volume to be healthy. And write some data to the volume. And count replicas running on each node. And 6 replicas running on node-1. 0 replicas running on node-2. 0 replicas running on node-3.

When enable scheduling for node-2. Then count replicas running on each node. And node-1 replica count != node-2 replica count. node-2 replica count != 0. node-3 replica count == 0. And loop 3 times, waiting 5 seconds each, and count replicas on each node. To ensure no additional scheduling is happening. The number of replicas running should be the same.

When enable scheduling for node-3. And count replicas running on each node. And node-1 replica count != node-3 replica count. node-2 replica count != 0. node-3 replica count != 0. And loop 3 times, waiting 5 seconds each, and count replicas on each node. To ensure no additional scheduling is happening. The number of replicas running should be the same.

When check the volume data. And volume data should be the same as written.
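
least_effort stops rebalancing as soon as no schedulable node is left without a replica, which is why this test only asserts that the counts are non-zero and unequal rather than perfectly even. An illustrative "balanced enough" check under that assumption:

def is_balanced_least_effort(counts):
    """least_effort is satisfied once every schedulable node holds a replica."""
    return all(count > 0 for count in counts.values())

assert not is_balanced_least_effort({"n1": 6, "n2": 0, "n3": 0})
assert is_balanced_least_effort({"n1": 4, "n2": 1, "n3": 1})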

def test_replica_auto_balance_with_data_locality(client, volume_name)
Expand source code
@pytest.mark.skip(reason="corner case") # NOQA
def test_replica_auto_balance_with_data_locality(client, volume_name):  # NOQA
    """
    Scenario: replica auto-balance should not cause rebuild loop.
              - replica auto-balance set to `best-effort`
              - volume data locality set to `best-effort`
              - volume has 1 replica

    Issue: https://github.com/longhorn/longhorn/issues/4761

    Given no existing volume in the cluster.
    And set `replica-auto-balance` to `best-effort`.
    And create a volume:
        - set data locality to `best-effort`
        - 1 replica

    When attach the volume to self-node.
    And wait for the volume to be healthy.
    Then the only volume replica should be already on the self-node or
         get rebuilt one time onto the self-node.
    And volume have 1 replica only and it should be on the self-node.
         - check 15 times with 1 second wait interval

    When repeat the test for 10 times.
    Then should pass.
    """
    # Repeat tests since there is a possibility that we might miss this.
    # Because when the replica is built onto the correct node the first time,
    # there will be no rebuild by data locality, hence we will not see the
    # loop.
    for i in range(10):
        replica_auto_balance_with_data_locality_test(
            client, f'{volume_name}-{i}'
        )

Scenario: replica auto-balance should not cause rebuild loop. - replica auto-balance set to best-effort - volume data locality set to best-effort - volume has 1 replica

Issue: https://github.com/longhorn/longhorn/issues/4761

Given no existing volume in the cluster. And set replica-auto-balance to best-effort. And create a volume: - set data locality to best-effort - 1 replica

When attach the volume to self-node. And wait for the volume to be healthy. Then the only volume replica should be already on the self-node or get rebuilt one time onto the self-node. And volume have 1 replica only and it should be on the self-node. - check 15 times with 1 second wait interval

When repeat the test for 10 times. Then should pass.

def test_replica_rebuild_per_volume_limit(client, core_api, storage_class, sts_name, statefulset)
Expand source code
@pytest.mark.v2_volume_test  # NOQA
def test_replica_rebuild_per_volume_limit(client, core_api, storage_class, sts_name, statefulset):  # NOQA
    """
    Test that the volume always has only one replica scheduled for rebuild

    1. Set soft anti-affinity to `true`.
    2. Create a volume with 1 replica.
    3. Attach the volume and write a few hundred MB of data to it.
    4. Scale the volume replica count to 5.
    5. Constantly check the volume replica list to make sure there is only
       1 replica in WO state.
    6. Wait for the volume to complete rebuilding. Then remove 4 of the 5
       replicas.
    7. Monitor the volume replica list again.
    8. Once the rebuild completes again, verify the data checksum.
    """
    replica_soft_anti_affinity_setting = \
        client.by_id_setting(SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY)
    client.update(replica_soft_anti_affinity_setting, value="true")

    data_path = '/data/test'
    storage_class['parameters']['numberOfReplicas'] = "1"
    vol_name, pod_name, md5sum = \
        common.prepare_statefulset_with_data_in_mb(
            client, core_api, statefulset, sts_name, storage_class,
            data_path=data_path, data_size_in_mb=DATA_SIZE_IN_MB_2)

    # Scale the volume replica to 5
    r_count = 5
    vol = client.by_id_volume(vol_name)
    vol.updateReplicaCount(replicaCount=r_count)

    vol = common.wait_for_volume_replicas_mode(client, vol_name, 'RW',
                                               replica_count=r_count)
    wait_for_volume_healthy(client, vol_name)

    # Delete 4 volume replicas
    del vol.replicas[0]
    for r in vol.replicas:
        vol.replicaRemove(name=r.name)

    r_count = 1
    common.wait_for_volume_replicas_mode(client, vol_name, 'RW',
                                         replica_count=r_count)

    assert md5sum == common.get_pod_data_md5sum(core_api, pod_name, data_path)

Test that the volume always has only one replica scheduled for rebuild

  1. Set soft anti-affinity to true.
  2. Create a volume with 1 replica.
  3. Attach the volume and write a few hundred MB of data to it.
  4. Scale the volume replica count to 5.
  5. Constantly check the volume replica list to make sure there is only 1 replica in WO state.
  6. Wait for the volume to complete rebuilding. Then remove 4 of the 5 replicas.
  7. Monitor the volume replica list again.
  8. Once the rebuild completes again, verify the data checksum.
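
The invariant this test monitors is that at most one replica per volume is ever in WO (write-only, i.e. rebuilding) mode. A hypothetical helper expressing that check, with replicas assumed to expose a mode field as in the tests above:

def assert_single_rebuild(replicas):
    """At most one replica may be rebuilding ('WO') at any moment."""
    rebuilding = [r for r in replicas if r["mode"] == "WO"]
    assert len(rebuilding) <= 1, f"{len(rebuilding)} replicas rebuilding"

assert_single_rebuild([{"mode": "RW"}, {"mode": "WO"}, {"mode": "RW"}])
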
def test_replica_schedule_to_disk_with_most_usable_storage(client, volume_name, request)
Expand source code
def test_replica_schedule_to_disk_with_most_usable_storage(client, volume_name, request):  # NOQA
    """
    Scenario: test replica schedule to disk with the most usable storage

    Given default disk 3/4 storage is reserved on the current node.
    And disk-1 with 1/4 of default disk space + 10 Gi.
    And add disk-1 to the current node.

    When create and attach volume.

    Then the volume replica on the current node is scheduled to disk-1.
         volume replicas not on the current node are scheduled to the default disk.
    """
    default_disk_available = 0
    self_host_id = get_self_host_id()
    cleanup_node_disks(client, self_host_id)
    node = client.by_id_node(self_host_id)
    disks = node.disks
    update_disks = get_update_disks(disks)
    for disk in update_disks.values():
        if disk.path != DEFAULT_DISK_PATH:
            continue

        default_disk_available = int(disk.storageMaximum/4)
        disk.storageReserved = disk.storageMaximum-default_disk_available
        break

    node = update_node_disks(client, node.name, disks=update_disks,
                             retry=True)
    disks = node.disks
    for name, disk in iter(disks.items()):
        wait_for_disk_status(client, node.name,
                             name, "storageReserved",
                             disk.storageMaximum-default_disk_available)

    disk_path = create_host_disk(client, 'vol-disk-1',
                                 str(10 * Gi + default_disk_available),
                                 node.name)
    disk = {"path": disk_path, "allowScheduling": True}
    update_disks = get_update_disks(disks)
    update_disks["disk1"] = disk
    node = update_node_disks(client, node.name, disks=update_disks,
                             retry=True)

    node = common.wait_for_disk_update(client, node.name,
                                       len(update_disks))
    assert len(node.disks) == len(update_disks)

    volume = create_and_check_volume(client, volume_name)
    volume.attach(hostId=get_self_host_id())
    volume = wait_for_volume_healthy(client, volume_name)

    expect_scheduled_disk = {}
    nodes = client.list_node()
    for node in nodes:
        for _, disk in iter(node.disks.items()):
            if node.name == self_host_id and disk.path != DEFAULT_DISK_PATH:
                expect_scheduled_disk[node.name] = disk
            elif node.name != self_host_id and disk.path == DEFAULT_DISK_PATH:
                expect_scheduled_disk[node.name] = disk

    volume = client.by_id_volume(volume_name)
    for replica in volume.replicas:
        hostId = replica.hostId
        assert replica.diskID == expect_scheduled_disk[hostId].diskUUID

Scenario: test replica schedule to disk with the most usable storage

Given default disk 3/4 storage is reserved on the current node. And disk-1 with 1/4 of default disk space + 10 Gi. And add disk-1 to the current node.

When create and attach volume.

Then the volume replica on the current node is scheduled to disk-1. Volume replicas not on the current node are scheduled to the default disk.
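
The disk sizes are chosen so disk-1 always offers the most usable storage on the current node. Working through the arithmetic with an example default disk size (the 40 Gi figure below is an assumption for illustration):

Gi = 2**30

default_maximum = 40 * Gi                # hypothetical default disk size
default_usable = default_maximum // 4    # 3/4 of it is reserved
disk1_usable = default_usable + 10 * Gi  # disk-1 = 1/4 of default + 10 Gi

# disk-1 wins by exactly 10 Gi regardless of the default disk's size,
# so the replica on the current node must land on disk-1.
assert disk1_usable - default_usable == 10 * Gi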

def test_soft_anti_affinity_detach(client, volume_name)
Expand source code
@pytest.mark.v2_volume_test  # NOQA
def test_soft_anti_affinity_detach(client, volume_name):  # NOQA
    """
    Test that volumes with Soft Anti-Affinity can detach and reattach to a
    node properly.

    1. Create a volume and attach to the current node.
    2. Generate and write `data` to the volume
    3. Set `soft anti-affinity` to true
    4. Disable current node's scheduling.
    5. Remove the replica on the current node
    6. Wait for the new replica to be rebuilt
    7. Detach the volume.
    8. Verify there are 3 replicas
    9. Attach the volume again. Verify there are still 3 replicas
    10. Verify the `data`.
    """
    volume = create_and_check_volume(client, volume_name)
    host_id = get_self_host_id()
    volume.attach(hostId=host_id)
    volume = wait_for_volume_healthy(client, volume_name)
    assert len(volume.replicas) == 3

    data = write_volume_random_data(volume)
    setting = client.by_id_setting(SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY)
    client.update(setting, value="true")
    node = client.by_id_node(host_id)
    set_node_scheduling(client, node, allowScheduling=False)
    replica_names = list(map(lambda replica: replica.name, volume.replicas))
    host_replica = get_host_replica(volume, host_id)

    volume.replicaRemove(name=host_replica.name)
    wait_new_replica_ready(client, volume_name, replica_names)
    volume = wait_for_volume_healthy(client, volume_name)
    volume.detach()
    volume = wait_for_volume_detached(client, volume_name)
    assert len(volume.replicas) == 3

    volume.attach(hostId=host_id)
    volume = wait_for_volume_healthy(client, volume_name)
    assert len(volume.replicas) == 3
    check_volume_data(volume, data)

    cleanup_volume(client, volume)

Test that volumes with Soft Anti-Affinity can detach and reattach to a node properly.

  1. Create a volume and attach to the current node.
  2. Generate and write data to the volume
  3. Set soft anti-affinity to true
  4. Disable current node's scheduling.
  5. Remove the replica on the current node
  6. Wait for the new replica to be rebuilt
  7. Detach the volume.
  8. Verify there are 3 replicas
  9. Attach the volume again. Verify there are still 3 replicas
  10. Verify the data.
def test_soft_anti_affinity_scheduling(client, volume_name)
Expand source code
@pytest.mark.v2_volume_test  # NOQA
def test_soft_anti_affinity_scheduling(client, volume_name):  # NOQA
    """
    Test that volumes with Soft Anti-Affinity work as expected.

    With Soft Anti-Affinity, a new replica should still be scheduled on a node
    with an existing replica, which will result in "Healthy" state but limited
    redundancy.

    1. Create a volume and attach to the current node
    2. Generate and write `data` to the volume.
    3. Set `soft anti-affinity` to true
    4. Disable current node's scheduling.
    5. Remove the replica on the current node
    6. Wait for the volume to complete rebuild. Volume should have 3 replicas.
    7. Verify `data`
    """
    volume = create_and_check_volume(client, volume_name)
    host_id = get_self_host_id()
    volume.attach(hostId=host_id)
    volume = wait_for_volume_healthy(client, volume_name)
    assert len(volume.replicas) == 3

    data = write_volume_random_data(volume)
    setting = client.by_id_setting(SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY)
    client.update(setting, value="true")
    node = client.by_id_node(host_id)
    node = set_node_scheduling(client, node, allowScheduling=False)
    replica_names = list(map(lambda replica: replica.name, volume.replicas))
    host_replica = get_host_replica(volume, host_id)

    volume.replicaRemove(name=host_replica.name)
    wait_new_replica_ready(client, volume_name, replica_names)
    volume = wait_for_volume_healthy(client, volume_name)
    assert len(volume.replicas) == 3
    check_volume_data(volume, data)

    cleanup_volume(client, volume)

Test that volumes with Soft Anti-Affinity work as expected.

With Soft Anti-Affinity, a new replica should still be scheduled on a node with an existing replica, which will result in "Healthy" state but limited redundancy.

  1. Create a volume and attach to the current node
  2. Generate and write data to the volume.
  3. Set soft anti-affinity to true
  4. Disable current node's scheduling.
  5. Remove the replica on the current node
  6. Wait for the volume to complete rebuild. Volume should have 3 replicas.
  7. Verify data
def test_soft_anti_affinity_scheduling_volume_disable(client, volume_name)
Expand source code
def test_soft_anti_affinity_scheduling_volume_disable(client, volume_name): # NOQA
    """
    Test that the global setting will be overridden
    if the volume disables Soft Anti-Affinity

    With Soft Anti-Affinity disabled,
    scheduling on nodes with existing replicas should be forbidden,
    resulting in "Degraded" state.

    Setup
    - Enable Soft Anti-Affinity in global setting

    Given
    - Create a volume with replicaSoftAntiAffinity=disabled in the spec
    - Attach to the current node and Generate and write `data` to the volume

    When
    - Disable current node's scheduling.
    - Remove the replica on the current node

    Then
    - Verify volume will be in degraded state.
    - Verify volume reports condition `scheduled == false`
    - Verify only two replicas of the volume are healthy.
    - Check volume `data`
    """
    update_setting(client, SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY, "true")
    host_id = get_self_host_id()

    client.create_volume(name=volume_name,
                         size=str(1 * Gi),
                         numberOfReplicas=3,
                         backingImage="",
                         frontend=VOLUME_FRONTEND_BLOCKDEV,
                         snapshotDataIntegrity="ignored",
                         replicaSoftAntiAffinity="disabled",
                         dataEngine=DATA_ENGINE)

    volume = wait_for_volume_detached(client, volume_name)
    volume.attach(hostId=host_id)
    volume = wait_for_volume_healthy(client, volume_name)
    data = write_volume_random_data(volume)

    node = client.by_id_node(host_id)
    node = set_node_scheduling(client, node, allowScheduling=False)

    delete_replica_on_test_node(client, volume_name)
    wait_for_volume_degraded(client, volume_name)
    wait_scheduling_failure(client, volume_name)

    for _ in range(RETRY_COUNTS_SHORT):
        volume = client.by_id_volume(volume_name)
        assert volume.robustness == VOLUME_ROBUSTNESS_DEGRADED
        assert volume.conditions.Scheduled.status == "False"

        healthy_replica_count = 0
        for replica in volume.replicas:
            if replica.running is True:
                healthy_replica_count = healthy_replica_count + 1
        assert healthy_replica_count == 2

        time.sleep(RETRY_INTERVAL)

    check_volume_data(volume, data)

Test that the global setting will be overridden if the volume disables Soft Anti-Affinity

With Soft Anti-Affinity disabled, scheduling on nodes with existing replicas should be forbidden, resulting in "Degraded" state.

Setup - Enable Soft Anti-Affinity in global setting

Given - Create a volume with replicaSoftAntiAffinity=disabled in the spec - Attach to the current node and Generate and write data to the volume

When - Disable current node's scheduling. - Remove the replica on the current node

Then - Verify volume will be in degraded state. - Verify volume reports condition scheduled == false - Verify only two replicas of the volume are healthy. - Check volume data

def test_soft_anti_affinity_scheduling_volume_enable(client, volume_name)
Expand source code
def test_soft_anti_affinity_scheduling_volume_enable(client, volume_name): # NOQA
    """
    Test that the global setting will be overridden
    if the volume enables Soft Anti-Affinity

    With Soft Anti-Affinity, a new replica should still be scheduled on a node
    with an existing replica, which will result in "Healthy" state but limited
    redundancy.

    Setup
    - Disable Soft Anti-Affinity in global setting

    Given
    - Create a volume with replicaSoftAntiAffinity=enabled in the spec
    - Attach to the current node and Generate and write `data` to the volume

    When
    - Disable current node's scheduling.
    - Remove the replica on the current node

    Then
    - Wait for the volume to complete rebuild. Volume should have 3 replicas.
    - Verify `data`
    """
    update_setting(client, SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY, "false")
    host_id = get_self_host_id()

    client.create_volume(name=volume_name,
                         size=str(1 * Gi),
                         numberOfReplicas=3,
                         backingImage="",
                         frontend=VOLUME_FRONTEND_BLOCKDEV,
                         snapshotDataIntegrity="ignored",
                         replicaSoftAntiAffinity="enabled",
                         dataEngine=DATA_ENGINE)

    volume = wait_for_volume_detached(client, volume_name)
    volume.attach(hostId=host_id)
    volume = wait_for_volume_healthy(client, volume_name)
    data = write_volume_random_data(volume)

    node = client.by_id_node(host_id)
    node = set_node_scheduling(client, node, allowScheduling=False)

    delete_replica_on_test_node(client, volume_name)
    wait_for_volume_degraded(client, volume_name)
    volume = wait_for_volume_healthy(client, volume_name)

    check_volume_data(volume, data)

Test that the global setting will be overridden if the volume enables Soft Anti-Affinity

With Soft Anti-Affinity, a new replica should still be scheduled on a node with an existing replica, which will result in "Healthy" state but limited redundancy.

Setup - Disable Soft Anti-Affinity in global setting

Given - Create a volume with replicaSoftAntiAffinity=enabled in the spec - Attach to the current node and Generate and write data to the volume

When - Disable current node's scheduling. - Remove the replica on the current node

Then - Wait for the volume to complete rebuild. Volume should have 3 replicas. - Verify data
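
Both override tests rely on the same precedence rule: an explicit per-volume replicaSoftAntiAffinity of "enabled" or "disabled" beats the global setting, and a volume that does not set it is assumed here to fall back to the global value. A minimal sketch of that resolution:

def effective_soft_anti_affinity(volume_spec, global_setting):
    """Volume-level 'enabled'/'disabled' overrides the global boolean."""
    if volume_spec == "enabled":
        return True
    if volume_spec == "disabled":
        return False
    return global_setting  # assumed fallback for an unset volume spec

assert effective_soft_anti_affinity("disabled", True) is False
assert effective_soft_anti_affinity("enabled", False) is True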

def test_storage_capacity_aware_pod_scheduling(client, core_api, storage_class, statefulset)
Expand source code
@pytest.mark.csi  # NOQA
def test_storage_capacity_aware_pod_scheduling(client, core_api, storage_class, statefulset):  # NOQA
    """
    Test that kube-scheduler is aware of storage capacity available on each
    node when scheduling pods using a StorageClass with volumeBindingMode set
    to 'WaitForFirstConsumer'.

    1. Reduce the schedulable storage on all nodes (except the current node)
       to 4Gi.
    2. Create a new StorageClass with volumeBindingMode set
       to 'WaitForFirstConsumer'.
    3. Deploy a StatefulSet with 3 replicas, each requesting a 5Gi PVC using
       the StorageClass from step 2.
    4. Verify that all pods are scheduled onto the current node (since it’s the
       only one with sufficient storage).
    """

    lht_hostId = get_self_host_id()
    nodes = client.list_node()
    for node in nodes:
        if node.id != lht_hostId:
            disks = node.disks
            for _, disk in disks.items():
                disk.storageReserved = disk.storageMaximum - 4 * Gi
            update_disks = get_update_disks(disks)
            update_node_disks(client, node.name, disks=update_disks,
                              retry=True)

    sc_name = 'longhorn-wait-for-first-consumer'
    storage_class['metadata']['name'] = sc_name
    storage_class['volumeBindingMode'] = 'WaitForFirstConsumer'
    storage_class['parameters']['numberOfReplicas'] = '1'
    create_storage_class(storage_class)

    statefulset['spec']['replicas'] = 3
    volume_claim_template = statefulset['spec']['volumeClaimTemplates'][0]
    volume_claim_template['spec']['storageClassName'] = sc_name
    volume_claim_template['spec']['resources']['requests']['storage'] = '5Gi'
    create_and_wait_statefulset(statefulset)

    pod_namespace = statefulset["metadata"]["namespace"]
    for i in range(statefulset['spec']['replicas']):
        pod_name = statefulset["metadata"]["name"] + '-' + str(i)
        sts_pod = core_api.read_namespaced_pod(
            name=pod_name, namespace=pod_namespace)
        assert sts_pod.spec.node_name == lht_hostId

Test that kube-scheduler is aware of storage capacity available on each node when scheduling pods using a StorageClass with volumeBindingMode set to 'WaitForFirstConsumer'.

  1. Reduce the schedulable storage on all nodes (except the current node) to 4Gi.
  2. Create a new StorageClass with volumeBindingMode set to 'WaitForFirstConsumer'.
  3. Deploy a StatefulSet with 3 replicas, each requesting a 5Gi PVC using the StorageClass from step 2.
  4. Verify that all pods are scheduled onto the current node (since it’s the only one with sufficient storage).
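A back-of-the-envelope sketch of the capacity arithmetic behind steps 1-4, using the module's Gi constant and a hypothetical 100Gi disk, and ignoring over-provisioning and space already in use:

    storage_maximum = 100 * Gi
    storage_reserved = storage_maximum - 4 * Gi  # as set in step 1
    schedulable = storage_maximum - storage_reserved
    assert schedulable == 4 * Gi  # less than the 5Gi PVC request

Since every other node advertises only 4Gi of schedulable capacity, kube-scheduler with WaitForFirstConsumer binding can place the 5Gi PVCs, and therefore the pods, only on the current node.
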
def test_volume_disk_soft_anti_affinity(client, volume_name, request)
Expand source code
def test_volume_disk_soft_anti_affinity(client, volume_name, request): # NOQA
    """
    1. When Replica Disk Soft Anti-Affinity is disabled, it should be
       impossible to schedule replicas to the same disk.
    2. When Replica Disk Soft Anti-Affinity is enabled, it should be possible
       to schedule replicas to the same disk.
    3. Whether or not Replica Disk Soft Anti-Affinity is enabled or disabled,
       the scheduler should prioritize scheduling replicas to different disks.

    Given
    - One node has three disks
    - The three disks have very different sizes
    - Only two disks are available for scheduling
    - No other node is available for scheduling

    When
    - Global Replica Node Level Soft Anti-Affinity is true
    - Global Replica Zone Level Soft Anti-Affinity is true
    - Create a volume with three replicas, a size such that all replicas could
      fit on the largest disk and still leave it with the most available space,
      and spec.replicaDiskSoftAntiAffinity = disabled
    - Attach the volume to the schedulable node

    Then
    - Verify the volume is in a degraded state
    - Verify only two of the three replicas are healthy
    - Verify the remaining replica doesn't have a spec.nodeID

    When
    - Change the volume's spec.replicaDiskSoftAntiAffinity to enabled

    Then
    - Verify the volume is in a healthy state
    - Verify all three replicas are healthy (two replicas have the same
      spec.diskID)

    When
    - Enable scheduling on the third disk
    - Delete one of the two replicas with the same spec.diskID

    Then
    - Verify the volume is in a healthy state
    - Verify all three replicas are healthy
    - Verify all three replicas have a different diskID
    """
    # Preparation
    disk_path1, disk_path2 = prepare_for_affinity_tests(client,
                                                        volume_name,
                                                        request)

    # Test start
    update_setting(client, SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY, "true")
    update_setting(client, SETTING_REPLICA_ZONE_SOFT_ANTI_AFFINITY, "true")

    lht_hostId = get_self_host_id()
    # Three replicas, but only two disks are schedulable at this point
    client.create_volume(name=volume_name, size=str(500*Mi),
                         numberOfReplicas=3,
                         replicaDiskSoftAntiAffinity="disabled",
                         dataEngine=DATA_ENGINE)
    volume = wait_for_volume_detached(client, volume_name)
    assert volume.replicaDiskSoftAntiAffinity == "disabled"

    volume.attach(hostId=lht_hostId)
    volume = wait_for_volume_degraded(client, volume_name)

    num_running = 0
    for replica in volume.replicas:
        if replica.running:
            num_running += 1
        else:
            assert replica.hostId == ""

    assert num_running == 2

    # After updating replicaDiskSoftAntiAffinity to enabled, replicas can be
    # scheduled on the same disk, so the volume becomes healthy
    volume = volume.updateReplicaDiskSoftAntiAffinity(
             replicaDiskSoftAntiAffinity="enabled")
    assert volume.replicaDiskSoftAntiAffinity == "enabled"

    volume = wait_for_volume_healthy(client, volume_name)

    disk_id = []
    for replica in volume.replicas:
        if replica.diskID not in disk_id:
            disk_id.append(replica.diskID)

    assert len(disk_id) == 2

    # Enable scheduling on the third disk (disk2)
    node = client.by_id_node(lht_hostId)
    disks = node.disks
    for _, disk in disks.items():
        if disk.path == disk_path2:
            disk.allowScheduling = True

    update_disks = get_update_disks(disks)
    update_node_disks(client, node.name, disks=update_disks, retry=True)

    # Delete one of the two replicas with the same diskID
    disk_id.clear()
    for replica in volume.replicas:
        if replica.diskID not in disk_id:
            disk_id.append(replica.diskID)
        else:
            volume.replicaRemove(name=replica.name)

    volume = wait_for_volume_degraded(client, volume_name)
    volume = wait_for_volume_healthy(client, volume_name)

    # Replicas should be located on 3 different disks on the current node
    disk_id.clear()
    for replica in volume.replicas:
        assert replica.diskID not in disk_id
        disk_id.append(replica.diskID)
  1. When Replica Disk Soft Anti-Affinity is disabled, it should be impossible to schedule replicas to the same disk.
  2. When Replica Disk Soft Anti-Affinity is enabled, it should be possible to schedule replicas to the same disk.
  3. Whether or not Replica Disk Soft Anti-Affinity is enabled or disabled, the scheduler should prioritize scheduling replicas to different disks.

Given
- One node has three disks
- The three disks have very different sizes
- Only two disks are available for scheduling
- No other node is available for scheduling

When
- Global Replica Node Level Soft Anti-Affinity is true
- Global Replica Zone Level Soft Anti-Affinity is true
- Create a volume with three replicas, a size such that all replicas could fit on the largest disk and still leave it with the most available space, and spec.replicaDiskSoftAntiAffinity = disabled
- Attach the volume to the schedulable node

Then
- Verify the volume is in a degraded state
- Verify only two of the three replicas are healthy
- Verify the remaining replica doesn't have a spec.nodeID

When
- Change the volume's spec.replicaDiskSoftAntiAffinity to enabled

Then
- Verify the volume is in a healthy state
- Verify all three replicas are healthy (two replicas have the same spec.diskID)

When
- Enable scheduling on the third disk
- Delete one of the two replicas with the same spec.diskID

Then
- Verify the volume is in a healthy state
- Verify all three replicas are healthy
- Verify all three replicas have a different diskID (a helper sketch for this check follows)
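The test collects distinct diskIDs with the same loop three times; a hypothetical helper (not part of this module) would capture the pattern:

    def distinct_disk_ids(volume):
        # The set of diskIDs across the volume's replicas; comparing its
        # size with the expected disk count replaces the repeated loop
        return {replica.diskID for replica in volume.replicas}

With it, the final verification reduces to assert len(distinct_disk_ids(volume)) == 3.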

def wait_new_replica_ready(client, volume_name, replica_names)
Expand source code
def wait_new_replica_ready(client, volume_name, replica_names):  # NOQA
    """
    Wait for a new replica to be found on the specified volume. Trigger a
    failed assertion if one can't be found.
    :param client: The Longhorn client to use in the request.
    :param volume_name: The name of the volume.
    :param replica_names: The list of names of the volume's old replicas.
    """
    new_replica_ready = False
    wait_for_rebuild_complete(client, volume_name)
    for _ in range(RETRY_COUNTS):
        v = client.by_id_volume(volume_name)
        for r in v.replicas:
            if r["name"] not in replica_names and r["running"] and \
                    r["mode"] == "RW":
                new_replica_ready = True
                break
        if new_replica_ready:
            break
        sleep(RETRY_INTERVAL)
    assert new_replica_ready

Wait for a new replica to be found on the specified volume. Trigger a failed assertion if one can't be found. :param client: The Longhorn client to use in the request. :param volume_name: The name of the volume. :param replica_names: The list of names of the volume's old replicas.
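A hedged usage sketch for wait_new_replica_ready, assuming a healthy attached volume under test:

    volume = client.by_id_volume(volume_name)
    # Snapshot the current replica names before triggering a rebuild
    old_replica_names = [r["name"] for r in volume.replicas]
    volume.replicaRemove(name=volume.replicas[0].name)
    # Returns once a replica outside old_replica_names is running in RW mode
    wait_new_replica_ready(client, volume_name, old_replica_names)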