Module tests.test_settings
Functions
def check_priority_class(pod, priority_class=None)
def check_tolerations_set(current_toleration_list, expected_tolerations, chk_removed_tolerations=[])
def check_workload_update(core_api, apps_api, count)
def config_map_with_value(configmap_name, setting_names, setting_values)
def guaranteed_instance_manager_cpu_setting_check(client, core_api, instance_managers, state, desire, cpu_val)
-
We check whether the instance managers are in the desired state with the correct setting. desire reflects the state we are looking for: if desire is True, we expect each instance manager's state to match state; otherwise we expect it to differ, e.g. 'Pending', 'OutOfCPU', and 'Terminating' all count as 'Not Running'.
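As an illustration of that contract, here is a minimal polling sketch (not the actual test helper) of the desire flag; get_states is a hypothetical callable standing in for the real instance-manager lookup:

    import time

    def check_state(get_states, state, desire, timeout=120):
        # get_states() is assumed to return the current state string of
        # every instance manager, e.g. ['Running', 'Pending', ...].
        for _ in range(timeout):
            states = get_states()
            if desire:
                done = all(s == state for s in states)  # all must match
            else:
                # All must differ from `state`; 'Pending', 'OutOfCPU' and
                # 'Terminating' all count as "not Running" here.
                done = all(s != state for s in states)
            if done:
                return states
            time.sleep(1)
        raise AssertionError("instance managers never reached the desired state")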
def init_longhorn_default_setting_configmap(core_api, client)
def retry_setting_update(client, setting_name, setting_value)
def setting_concurrent_volume_backup_restore_limit_concurrent_restoring_test(client, volname, is_DR_volumes=False)
-
Given Setting concurrent-volume-backup-restore-per-node-limit is 2. And Volume (for backup) created. And Volume (for backup) has a backup with some data.
When Create volumes (num_node * setting value * 3) from the backup.
Then Number of restoring volumes per node should match the expected count, depending on whether they are normal volumes or DR volumes (see the counting sketch below).
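A minimal sketch of the per-node counting that the Then step implies, assuming Longhorn volume objects exposing a restoreRequired flag and a controller hostId (field names are an assumption based on how these tests read volume state):

    from collections import Counter

    def restoring_volumes_per_node(volumes):
        # Group in-progress restores by the node running the volume's
        # controller (engine).
        counts = Counter()
        for v in volumes:
            if v.restoreRequired and v.controllers:
                counts[v.controllers[0].hostId] += 1
        return counts

    # For normal volumes every per-node count should stay <= the setting
    # value (2 here); DR volumes are exempt and may exceed it.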
def test_instance_manager_cpu_reservation(client, core_api)
-
Test if the CPU requests of instance manager pods are controlled by the settings and the node specs correctly.
- On node 1, set node.instanceManagerCPURequest to 150.
  --> The IM pods on this node will be restarted, and the CPU requests of these IM pods match the above milli value.
- Change the new setting Guaranteed Instance Manager CPU to 10, then wait for all IM pods except for the pods on node 1 to restart.
  --> The CPU requests of the restarted IM pods equal the new setting value multiplied by the kube node allocatable CPU.
- Set the new setting to 0.
  --> All IM pods except for the pods on node 1 will be restarted without CPU requests.
- Set the fields on node 1 to 0.
  --> The IM pods on node 1 will be restarted without CPU requests.
- Set the new setting to a value smaller than 40, then wait for all IM pods to restart.
  --> The CPU requests of all IM pods equal the new setting value multiplied by the kube node allocatable CPU.
- Set the new setting to a value greater than 40.
  --> The setting update should fail.
- Create a volume and verify everything works as normal.
Note: use a fixture to restore the setting to its original state. The expected CPU arithmetic is sketched below.
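A worked sketch of that arithmetic: the guaranteed-CPU setting is treated here as a percentage of the node's allocatable CPU, while node.instanceManagerCPURequest is an absolute millicpu override; the percentage interpretation is an assumption consistent with the 0-40 bounds above.

    def expected_im_cpu_request_milli(allocatable_milli, setting_value):
        # Setting value 10 on a node with 4000m allocatable CPU should
        # yield 4000 * 10 / 100 = 400m requested per IM pod.
        return int(allocatable_milli * setting_value / 100)

    assert expected_im_cpu_request_milli(4000, 10) == 400
    # The per-node field wins when set, e.g. 150 -> a flat 150m request.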
def test_setting_backing_image_auto_cleanup(client, core_api, volume_name)
-
Test that the Backing Image Cleanup Wait Interval setting works correctly.
The default value of setting BackingImageCleanupWaitInterval is 60.
- Clean up the backing image work directory so that the current case won't be interfered with by previous tests.
- Create a backing image.
- Create multiple volumes using the backing image.
- Attach all volumes, then:
  - Wait for all volumes to become running.
  - Verify the correct data in all volumes.
  - Verify the backing image disk status map.
  - Verify the only backing image file in each disk is reused by multiple replicas. The backing image file path is <Data path>/<The backing image name>/backing
- Unschedule the test node to guarantee that when a replica is removed from the test node, no new replica can be rebuilt on it.
- Remove all replicas in one disk. Wait for 50 seconds. Then verify nothing changes in the backing image disk state map (before the cleanup wait interval has passed).
- Modify BackingImageCleanupWaitInterval to a small value. Then verify:
  - The download state of the disk containing no replica becomes terminating first, and the entry will be removed from the map later.
  - The related backing image file is removed.
  - The download state of the other disks stays unchanged. All volumes still work fine.
- Delete all volumes. Verify that only 1 entry remains in the backing image disk map (the polling sketch below shows this check).
- Delete the backing image.
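A minimal polling sketch for those disk-map assertions; get_disk_state_map is a hypothetical callable returning the backing image's disk-UUID-to-file-state map:

    import time

    def wait_for_disk_map_entries(get_disk_state_map, expected_count,
                                  timeout=120):
        # e.g. expected_count=1 after deleting all volumes (step above).
        for _ in range(timeout):
            disk_map = get_disk_state_map()
            if len(disk_map) == expected_count:
                return disk_map
            time.sleep(1)
        raise AssertionError("backing image disk map never converged")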
def test_setting_backup_target_update_via_configmap(core_api, request)
-
Test the backup target setting via configmap
1. Initialize the longhorn-default-setting configmap
2. Update the longhorn-default-setting configmap with a new backup-target value (see the sketch below)
3. Verify the updated settings
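A minimal sketch of step 2 with the official kubernetes Python client, assuming the longhorn-system namespace and the default-setting.yaml data key used by the longhorn-default-setting configmap; the backup target URL is a placeholder:

    from kubernetes import client, config

    config.load_kube_config()
    core_api = client.CoreV1Api()

    body = client.V1ConfigMap(
        metadata=client.V1ObjectMeta(name="longhorn-default-setting"),
        data={"default-setting.yaml":
              "backup-target: s3://backupbucket@us-east-1/"},
    )
    # Patch the existing configmap; Longhorn reconciles the new default
    # setting values from it.
    core_api.patch_namespaced_config_map(
        name="longhorn-default-setting",
        namespace="longhorn-system",
        body=body)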
def test_setting_concurrent_rebuild_limit(client, core_api, volume_name)
-
Test if setting Concurrent Replica Rebuild Per Node Limit works correctly.
The default setting value is 0, which means no limit.
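Both cases below toggle this limit through the Longhorn API client provided by the fixtures; a minimal sketch of that update, assuming the by_id_setting/update client interface these tests use elsewhere:

    def set_rebuild_limit(client, value):
        setting = client.by_id_setting(
            "concurrent-replica-rebuild-per-node-limit")
        # Longhorn setting values are strings, even for numeric limits.
        return client.update(setting, value=str(value))

    # e.g. set_rebuild_limit(client, 1) before deleting replicas, then
    # set_rebuild_limit(client, 0) to lift the limit mid-rebuild.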
Case 1 - the setting will limit the rebuilding correctly:
1. Set ConcurrentReplicaRebuildPerNodeLimit to 1.
2. Create 2 volumes, then attach both volumes.
3. Write a large amount of data into both volumes, so that the rebuilding will take a while.
4. Delete one replica of volume 1, then the replica on the same node for volume 2, to trigger (concurrent) rebuilding.
5. Verify the new replica of volume 2 won't be started until the volume 1 rebuilding completes, and that it is started immediately once the 1st rebuilding is done.
6. Wait for the rebuilding to complete, then repeat step 4.
7. Set ConcurrentReplicaRebuildPerNodeLimit to 0 or 2 while the volume 1 rebuilding is still in progress. Then the new replica of volume 2 will be started immediately, before the 1st rebuilding is done.
8. Wait for the rebuilding to complete, then repeat step 4.
9. Set ConcurrentReplicaRebuildPerNodeLimit to 1.
10. Crash the replica process of volume 1 while the rebuilding is in progress. Then the rebuilding of volume 2 will be started, and the rebuilding of volume 1 will wait for volume 2 to become healthy. (There is no need to clean up the above 2 volumes.)
Case 2 - the setting won't intervene in normal attachment:
1. Set ConcurrentReplicaRebuildPerNodeLimit to 1.
2. Make volume 1 attached and healthy while volume 2 is detached.
3. Delete one replica of volume 1 to trigger the rebuilding.
4. Attach then detach volume 2. The attachment/detachment should succeed even if the rebuilding of volume 1 is still in progress.
def test_setting_concurrent_volume_backup_restore_limit(set_random_backupstore, client, volume_name)
-
Scenario: setting Concurrent Volume Backup Restore Limit should limit the concurrent volume backup restoring
Issue: https://github.com/longhorn/longhorn/issues/4558
Given/When see: setting_concurrent_volume_backup_restore_limit_concurrent_restoring_test
Then Number of restoring volumes per node does not exceed the setting value.
def test_setting_concurrent_volume_backup_restore_limit_should_not_effect_dr_volumes(set_random_backupstore, client, volume_name)
-
Scenario: setting Concurrent Volume Backup Restore Limit should not affect DR volumes
Issue: https://github.com/longhorn/longhorn/issues/4558
Given/When see: setting_concurrent_volume_backup_restore_limit_concurrent_restoring_test
Then Number of restoring volumes can exceed the setting value.
def test_setting_priority_class(core_api, apps_api, scheduling_api, priority_class, volume_name)
-
Test that the Priority Class setting is validated and utilized correctly.
- Verify that the name of a non-existent Priority Class cannot be used for the Setting.
- Create a new Priority Class in Kubernetes.
- Create and attach a Volume.
- Verify that the Priority Class Setting can be updated with an attached volume.
- Generate and write data1.
- Detach the Volume.
- Update the Priority Class Setting to the new Priority Class.
- Wait for all the Longhorn system components to restart with the new Priority Class.
- Verify that UI, manager, and driver deployer don't have the Priority Class.
- Attach the Volume and verify data1.
- Generate and write data2.
- Unset the Priority Class Setting.
- Wait for all the Longhorn system components to restart without the Priority Class.
- Verify that UI, manager, and driver deployer don't have the Priority Class.
- Attach the Volume and verify data2.
- Generate and write data3.
Note: system components are workloads other than UI, manager, driver deployer
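A minimal sketch of the per-pod assertion behind check_priority_class, assuming pods come from the kubernetes Python client (V1Pod); the function name here is illustrative:

    def pod_has_priority_class(pod, priority_class_name=None):
        # With the setting unset, Longhorn pods should carry no priority
        # class; otherwise spec.priority_class_name must match exactly.
        if priority_class_name is None:
            return pod.spec.priority_class_name is None
        return pod.spec.priority_class_name == priority_class_name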
def test_setting_replica_count_update_via_configmap(core_api, request)
-
Test the default-replica-count setting via configmap
1. Get the default-replica-count value
2. Initialize the longhorn-default-setting configmap
3. Verify default-replica-count is not changed
4. Update the longhorn-default-setting configmap with a new default-replica-count value
5. Verify the updated settings (see the polling sketch below)
6. Update the default-replica-count setting CR with the old value
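Steps 3 and 5 come down to polling the Setting until the configmap change is (or is not) reconciled; a minimal sketch in the spirit of wait_for_setting_updated, assuming the Longhorn client's by_id_setting interface:

    import time

    def wait_for_setting_value(client, name, expected_value, timeout=120):
        for _ in range(timeout):
            if client.by_id_setting(name).value == expected_value:
                return
            time.sleep(1)
        raise AssertionError(
            "setting %s never became %s" % (name, expected_value))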
def test_setting_toleration()
-
Test toleration setting
- Set taint-toleration to "key1=value1:NoSchedule; key2:InvalidEffect".
- Verify the request fails.
- Create a volume and attach it.
- Set taint-toleration to "key1=value1:NoSchedule; key2:NoExecute".
- Verify that the toleration setting can be updated while a volume is attached.
- Generate and write data1 into the volume.
- Detach the volume.
- Set taint-toleration to "key1=value1:NoSchedule; key2:NoExecute".
- Wait for all the Longhorn system components to restart with the new toleration.
- Verify that UI, manager, and driver deployer don't restart and don't have the new toleration.
- Attach the volume again and verify data1.
- Generate and write data2 to the volume.
- Detach the volume.
- Clean the toleration setting.
- Wait for all the Longhorn system components to restart with no toleration.
- Attach the volume and validate data2.
- Generate and write data3 to the volume.
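The taint-toleration value follows a "key[=value]:Effect; ..." list format; a hedged sketch of how such a string maps onto Kubernetes toleration fields (this mirrors the expected shape, not Longhorn's actual parser):

    def parse_toleration_setting(value):
        tolerations = []
        for item in filter(None, (s.strip() for s in value.split(";"))):
            kv, effect = item.rsplit(":", 1)
            key, _, val = kv.partition("=")
            tolerations.append({
                "key": key,
                "value": val or None,
                # No "=value" part means the toleration matches on key
                # existence alone.
                "operator": "Equal" if val else "Exists",
                "effect": effect,
            })
        return tolerations

    # "key1=value1:NoSchedule; key2:NoExecute" -> two tolerations, the
    # first with operator Equal, the second with operator Exists.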
def test_setting_toleration_extra(core_api, apps_api)
-
Steps:
1. Set Kubernetes Taint Toleration to: ex.com/foobar:NoExecute;ex.com/foobar:NoSchedule.
2. Verify that all system components have the 2 tolerations ex.com/foobar:NoExecute; ex.com/foobar:NoSchedule. Verify that UI, manager, and driver deployer don't restart and don't have the tolerations.
3. Set Kubernetes Taint Toleration to: node-role.kubernetes.io/controlplane=true:NoSchedule.
4. Verify that all system components have the toleration node-role.kubernetes.io/controlplane=true:NoSchedule, and don't have the 2 tolerations ex.com/foobar:NoExecute;ex.com/foobar:NoSchedule. Verify that UI, manager, and driver deployer don't restart and don't have the tolerations.
5. Set Kubernetes Taint Toleration to the special value ":".
6. Verify that all system components have the toleration with operator: Exists and all other fields of the toleration empty (see the example below). Verify that all system components don't have the toleration node-role.kubernetes.io/controlplane=true:NoSchedule. Verify that UI, manager, and driver deployer don't restart and don't have the toleration.
7. Clear Kubernetes Taint Toleration.
Note: system components are workloads other than UI, manager, driver deployer
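Reusing the parse_toleration_setting sketch from the previous test's notes, the special ":" value of step 5 reduces to a single catch-all toleration, matching the operator: Exists expectation in step 6:

    # Empty key and effect, operator Exists: tolerates every taint.
    assert parse_toleration_setting(":") == [
        {"key": "", "value": None, "operator": "Exists", "effect": ""},
    ]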
def test_setting_update_with_invalid_value_via_configmap(core_api, request)
-
Test the default settings update with an invalid value via configmap
1. Create an attached volume
2. Initialize the longhorn-default-setting configmap containing valid and invalid settings
3. Update the longhorn-default-setting configmap with invalid settings. The invalid setting SETTING_TAINT_TOLERATION will be ignored when there is an attached volume.
4. Validate the default settings values.
def test_setting_v1_data_engine(client, request)
-
Test that the v1 data engine setting works correctly.
1. Create a volume and attach it.
2. Set the v1 data engine setting to false. The setting should be rejected.
3. Detach the volume.
4. Set the v1 data engine setting to false again. The setting should be accepted. Then, attach the volume. The volume is unable to attach.
5. Set the v1 data engine setting to true. The setting should be accepted.
6. Attach the volume.
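A minimal sketch of the accept/reject checks in steps 2 and 4, assuming the same Longhorn client interface as the other tests here; the setting id "v1-data-engine" is inferred from the test name:

    def try_disable_v1_data_engine(client):
        setting = client.by_id_setting("v1-data-engine")
        try:
            client.update(setting, value="false")
            return True       # accepted: no attached v1 volumes
        except Exception:
            return False      # rejected while a v1 volume is attached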
def update_settings_via_configmap(core_api, client, setting_names, setting_values, request)
def validate_settings(core_api, client, setting_names, setting_values)
def wait_for_longhorn_node_ready()
def wait_for_priority_class_update(core_api, apps_api, count, priority_class=None)
def wait_for_setting_updated(client, name, expected_value)
def wait_for_toleration_update(core_api, apps_api, count, expected_tolerations, chk_removed_tolerations=[])