Module tests.test_settings

Functions

def check_priority_class(pod, priority_class=None)
def check_tolerations_set(current_toleration_list, expected_tolerations, chk_removed_tolerations=[])
def check_workload_update(core_api, apps_api, count)
def config_map_with_value(configmap_name, setting_names, setting_values)
def guaranteed_instance_manager_cpu_setting_check(client, core_api, instance_managers, state, desire, cpu_val)

Check that the instance managers are in the desired state with the correct setting. The desire flag reflects the state we are looking for: if desire is True, the instance manager state must equal state; otherwise it must differ from state. For example, 'Pending', 'OutofCPU', and 'Terminating' all count as 'Not Running'.
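
A minimal sketch of the logic this helper encodes, assuming the longhorn client and kubernetes core API conventions used throughout this suite; the by_id_instance_manager lookup, the pod naming, and the 'longhorn-system' namespace are assumptions, not the module's actual implementation:

    def check_instance_managers(client, core_api, instance_managers,
                                state, desire, cpu_val):
        # Sketch only: compare each instance manager's current state
        # against the expected one, per the desire flag described above.
        for im in instance_managers:
            im = client.by_id_instance_manager(im.name)  # assumed lookup
            if desire:
                assert im.currentState == state
            else:
                assert im.currentState != state
        if desire and state == "running":
            # When running, the pod's CPU request should match cpu_val (m).
            for im in instance_managers:
                pod = core_api.read_namespaced_pod(
                    name=im.name, namespace="longhorn-system")
                requests = pod.spec.containers[0].resources.requests or {}
                assert requests.get("cpu") == str(cpu_val) + "m"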

def init_longhorn_default_setting_configmap(core_api, client)
def retry_setting_update(client, setting_name, setting_value)
def setting_concurrent_volume_backup_restore_limit_concurrent_restoring_test(client, volname, is_DR_volumes=False)

Given Setting concurrent-volume-backup-restore-per-node-limit is 2. And Volume (for backup) created. And Volume (for backup) has a backup with some data.

When Create some volumes (num_node * setting value * 3) from the backup.

Then Number of restoring volumes per node should be as expected, depending on whether they are normal volumes or DR volumes.
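
A sketch of the expectation this helper encodes, using the longhorn client calls common in these tests (by_id_setting, by_id_volume); the helper name and the restoreRequired/controllers fields are best-effort assumptions:

    def assert_restore_limit_respected(client, volumes, is_DR_volumes=False):
        limit = int(client.by_id_setting(
            "concurrent-volume-backup-restore-per-node-limit").value)
        restoring_per_node = {}
        for v in volumes:
            v = client.by_id_volume(v.name)
            if v.restoreRequired:  # volume is still restoring
                node = v.controllers[0].hostId
                restoring_per_node[node] = restoring_per_node.get(node, 0) + 1
        # DR volumes keep polling the backup target, so they are exempt
        # from the limit; only normal restores are capped per node.
        if not is_DR_volumes:
            for count in restoring_per_node.values():
                assert count <= limit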

def test_instance_manager_cpu_reservation(client, core_api)

Test if the CPU requests of instance manager pods are controlled by the settings and the node specs correctly.

  1. On node 1, set node.instanceManagerCPURequest to 150. --> The IM pods on this node will be restarted, and the CPU requests of these IM pods match the above milli value.
  2. Change the new setting Guaranteed Instance Manager CPU to 10, then wait for all IM pods except for the pods on node 1 to restart. --> The CPU requests of the restarted IM pods equal the new setting value multiplied by the kube node allocatable CPU.
  3. Set the new setting to 0. --> All IM pods except for the pods on node 1 will be restarted without CPU requests.
  4. Set the field on node 1 to 0. --> The IM pods on node 1 will be restarted without CPU requests.
  5. Set the new setting to a value smaller than 40. Then wait for all IM pods to restart. --> The CPU requests of all IM pods equal the new setting value multiplied by the kube node allocatable CPU.
  6. Set the new setting to a value greater than 40. --> The setting update should fail.
  7. Create a volume and verify everything works as normal.

Note: use a fixture to restore the setting to its original state.
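
A minimal sketch of steps 1 and 2, following the client.update pattern the longhorn test client uses for nodes and settings; the instanceManagerCPURequest field and the guaranteed-instance-manager-cpu setting name mirror Longhorn's but should be treated as assumptions here:

    def set_im_cpu_requests(client, node_name):
        # Step 1: per-node override, in millicores.
        node = client.by_id_node(node_name)
        client.update(node, allowScheduling=node.allowScheduling,
                      instanceManagerCPURequest=150)
        # Step 2: global setting, as a percentage of node allocatable CPU.
        setting = client.by_id_setting("guaranteed-instance-manager-cpu")
        client.update(setting, value="10")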

def test_setting_backing_image_auto_cleanup(client, core_api, volume_name)

Test that the Backing Image Cleanup Wait Interval setting works correctly.

The default value of setting BackingImageCleanupWaitInterval is 60.

  1. Clean up the backing image work directory so that the current case won't be interfered with by previous tests.
  2. Create a backing image.
  3. Create multiple volumes using the backing image.
  4. Attach all volumes. Then:
    1. Wait for all volumes to become running.
    2. Verify the correct data in all volumes.
    3. Verify the backing image disk status map.
    4. Verify the only backing image file in each disk is reused by multiple replicas. The backing image file path is <Data path>/<The backing image name>/backing
  5. Unschedule the test node to guarantee that when a replica is removed from the test node, no new replica can be rebuilt there.
  6. Remove all replicas in one disk. Wait for 50 seconds. Then verify nothing changes in the backing image disk state map (before the cleanup wait interval has passed).
  7. Modify BackingImageCleanupWaitInterval to a small value. Then verify:
    1. The download state of the disk containing no replica becomes terminating first, and the entry is removed from the map later.
    2. The related backing image file is removed.
    3. The download state of the other disks remains unchanged. All volumes still work fine.
  8. Delete all volumes. Verify that only 1 entry will remain in the backing image disk map.
  9. Delete the backing image.
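
A sketch of the step 7 verification, assuming the longhorn client's by_id_backing_image call and a diskFileStatusMap field on the backing image object; both the field name and the retry counts are assumptions:

    import time

    def wait_for_disk_entry_cleanup(client, bi_name, removed_disk_uuid,
                                    retries=60, interval=2):
        # Poll until the entry for the replica-less disk disappears from
        # the backing image disk map; other entries must stay intact.
        for _ in range(retries):
            bi = client.by_id_backing_image(bi_name)
            if removed_disk_uuid not in bi.diskFileStatusMap:
                return bi
            time.sleep(interval)
        raise AssertionError("backing image disk entry was not cleaned up")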

def test_setting_backup_target_update_via_configmap(core_api, request)

Test the backup target setting via configmap.

  1. Initialize the longhorn-default-setting configmap.
  2. Update the longhorn-default-setting configmap with a new backup-target value.
  3. Verify the updated settings.
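
A sketch of the configmap update in step 2, using the standard kubernetes python client; the longhorn-default-setting name and the default-setting.yaml data key follow Longhorn's defaults, while the backup target URL is a placeholder:

    from kubernetes import client as k8s, config

    config.load_kube_config()
    core_api = k8s.CoreV1Api()

    # The default settings live as YAML under the "default-setting.yaml" key.
    body = {"data": {
        "default-setting.yaml": "backup-target: s3://backupbucket@us-east-1/"
    }}
    core_api.patch_namespaced_config_map(
        name="longhorn-default-setting",
        namespace="longhorn-system",
        body=body)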

def test_setting_concurrent_rebuild_limit(client, core_api, volume_name)

Test if setting Concurrent Replica Rebuild Per Node Limit works correctly.

The default setting value is 0, which means no limit.

Case 1 - the setting will limit the rebuilding correctly:

  1. Set ConcurrentReplicaRebuildPerNodeLimit to 1.
  2. Create 2 volumes, then attach both volumes.
  3. Write a large amount of data into both volumes, so that the rebuilding will take a while.
  4. Delete one replica for volume 1, then the replica on the same node for volume 2, to trigger (concurrent) rebuilding.
  5. Verify the new replica of volume 2 won't be started until volume 1's rebuilding completes, and that it is started immediately once the 1st rebuilding is done.
  6. Wait for the rebuilding to complete, then repeat step 4.
  7. Set ConcurrentReplicaRebuildPerNodeLimit to 0 or 2 while the volume 1 rebuilding is still in progress. Then the new replica of volume 2 will be started immediately, before the 1st rebuilding is done.
  8. Wait for the rebuilding to complete, then repeat step 4.
  9. Set ConcurrentReplicaRebuildPerNodeLimit to 1.
  10. Crash the replica process of volume 1 while the rebuilding is in progress. Then the rebuilding of volume 2 will be started, and the rebuilding of volume 1 will wait for volume 2 to become healthy.

(There is no need to clean up the above 2 volumes.)

Case 2 - the setting won't intervene in normal attachment:

  1. Set ConcurrentReplicaRebuildPerNodeLimit to 1.
  2. Make volume 1 attached and healthy while volume 2 is detached.
  3. Delete one replica for volume 1 to trigger the rebuilding.
  4. Attach then detach volume 2. The attachment/detachment should succeed even if the rebuilding of volume 1 is still in progress.
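
A sketch of the rebuild trigger in Case 1, step 4, using the volume.replicaRemove action the longhorn test client exposes; the helper name is illustrative:

    def delete_replica_on_node(client, volume_name, node_id):
        # Removing the replica on a specific node forces Longhorn to
        # rebuild it, which the per-node limit then serializes.
        volume = client.by_id_volume(volume_name)
        for r in volume.replicas:
            if r.hostId == node_id:
                volume.replicaRemove(name=r.name)
                return r.name
        raise AssertionError("no replica found on node " + node_id)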

def test_setting_concurrent_volume_backup_restore_limit(set_random_backupstore, client, volume_name)

Scenario: setting Concurrent Volume Backup Restore Limit should limit the concurrent volume backup restoring

Issue: https://github.com/longhorn/longhorn/issues/4558

Given/When see: setting_concurrent_volume_backup_restore_limit_concurrent_restoring_test

Then Number of restoring volumes per node should not exceed the setting value.

def test_setting_concurrent_volume_backup_restore_limit_should_not_effect_dr_volumes(set_random_backupstore, client, volume_name)

Scenario: setting Concurrent Volume Backup Restore Limit should not affect DR volumes

Issue: https://github.com/longhorn/longhorn/issues/4558

Given/When see: setting_concurrent_volume_backup_restore_limit_concurrent_restoring_test

Then Number of restoring volumes can exceed the setting value.

def test_setting_priority_class(core_api, apps_api, scheduling_api, priority_class, volume_name)

Test that the Priority Class setting is validated and utilized correctly.

  1. Verify that the name of a non-existent Priority Class cannot be used for the Setting.
  2. Create a new Priority Class in Kubernetes.
  3. Create and attach a Volume.
  4. Verify that the Priority Class Setting can be updated with an attached volume.
  5. Generate and write data1.
  6. Detach the Volume.
  7. Update the Priority Class Setting to the new Priority Class.
  8. Wait for all the Longhorn system components to restart with the new Priority Class.
  9. Verify that UI, manager, and driver deployer don't have the Priority Class.
  10. Attach the Volume and verify data1.
  11. Generate and write data2.
  12. Unset the Priority Class Setting.
  13. Wait for all the Longhorn system components to restart without a Priority Class.
  14. Verify that UI, manager, and driver deployer don't have the Priority Class.
  15. Attach the Volume and verify data2.
  16. Generate and write data3.

Note: system components are workloads other than UI, manager, driver deployer
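
A sketch of the per-pod check behind check_priority_class (listed above), using the kubernetes python client; the instance-manager pod prefix and the example class name are placeholders:

    def check_priority_class(pod, priority_class=None):
        # With the setting applied, system pods carry the class name;
        # with it unset, priority_class_name should be empty again.
        if priority_class:
            return pod.spec.priority_class_name == priority_class
        return not pod.spec.priority_class_name

    # Usage sketch: system component pods only (UI, manager, and driver
    # deployer are expected to keep no Priority Class).
    pods = core_api.list_namespaced_pod("longhorn-system").items
    im_pods = [p for p in pods
               if p.metadata.name.startswith("instance-manager")]
    assert all(check_priority_class(p, "example-priority-class")
               for p in im_pods)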

def test_setting_replica_count_update_via_configmap(core_api, request)

Test the default-replica-count setting via configmap.

  1. Get the default-replica-count value.
  2. Initialize the longhorn-default-setting configmap.
  3. Verify default-replica-count is not changed.
  4. Update the longhorn-default-setting configmap with a new default-replica-count value.
  5. Verify the updated settings.
  6. Update the default-replica-count setting CR with the old value.
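
A sketch of the wait_for_setting_updated helper listed at the end of this module, assuming the longhorn client's by_id_setting call; the retry counts are illustrative:

    import time

    def wait_for_setting_updated(client, name, expected_value,
                                 retries=120, interval=1):
        # Poll the setting until the controller has propagated the
        # configmap change, or give up after `retries` attempts.
        for _ in range(retries):
            if client.by_id_setting(name).value == expected_value:
                return True
            time.sleep(interval)
        return False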

def test_setting_toleration()

Test toleration setting

  1. Set taint-toleration to "key1=value1:NoSchedule; key2:InvalidEffect".
  2. Verify the request fails.
  3. Create a volume and attach it.
  4. Set taint-toleration to "key1=value1:NoSchedule; key2:NoExecute".
  5. Verify that the toleration setting can be updated while a volume is attached.
  6. Generate and write data1 into the volume.
  7. Detach the volume.
  8. Set taint-toleration to "key1=value1:NoSchedule; key2:NoExecute".
  9. Wait for all the Longhorn system components to restart with new toleration.
  10. Verify that UI, manager, and driver deployer don't restart and don't have the new toleration.
  11. Attach the volume again and verify the volume data1.
  12. Generate and write data2 to the volume.
  13. Detach the volume.
  14. Clean the toleration setting.
  15. Wait for all the Longhorn system components to restart with no toleration.
  16. Attach the volume and validate data2.
  17. Generate and write data3 to the volume.
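
A sketch of how a taint-toleration setting string like the one in step 4 decomposes into Kubernetes toleration fields; this parser is illustrative, not the module's helper (note how the special value ":" used by test_setting_toleration_extra yields an Exists toleration with empty fields):

    def parse_toleration_setting(value):
        # "key1=value1:NoSchedule; key2:NoExecute" -> list of dicts
        tolerations = []
        for item in [s.strip() for s in value.split(";") if s.strip()]:
            key_value, effect = item.split(":")
            if "=" in key_value:
                key, val = key_value.split("=")
                tolerations.append({"key": key, "value": val,
                                    "operator": "Equal", "effect": effect})
            else:
                tolerations.append({"key": key_value, "value": None,
                                    "operator": "Exists", "effect": effect})
        return tolerations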

def test_setting_toleration_extra(core_api, apps_api)

Steps:

  1. Set Kubernetes Taint Toleration to: ex.com/foobar:NoExecute;ex.com/foobar:NoSchedule.
  2. Verify that all system components have the 2 tolerations ex.com/foobar:NoExecute and ex.com/foobar:NoSchedule. Verify that UI, manager, and driver deployer don't restart and don't have the tolerations.
  3. Set Kubernetes Taint Toleration to: node-role.kubernetes.io/controlplane=true:NoSchedule.
  4. Verify that all system components have the toleration node-role.kubernetes.io/controlplane=true:NoSchedule, and don't have the 2 tolerations ex.com/foobar:NoExecute and ex.com/foobar:NoSchedule. Verify that UI, manager, and driver deployer don't restart and don't have the toleration.
  5. Set Kubernetes Taint Toleration to the special value ":".
  6. Verify that all system components have the toleration with operator: Exists and the other fields of the toleration empty. Verify that all system components don't have the toleration node-role.kubernetes.io/controlplane=true:NoSchedule. Verify that UI, manager, and driver deployer don't restart and don't have the toleration.
  7. Clear Kubernetes Taint Toleration.

Note: system components are workloads other than UI, manager, driver deployer
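
A sketch of the workload-side verification behind check_tolerations_set (signature above), plus a usage line against a system component deployment fetched with the kubernetes apps API; csi-provisioner as the example workload is an assumption:

    def check_tolerations_set(current_toleration_list, expected_tolerations,
                              chk_removed_tolerations=[]):
        # Compare (key, value, effect) triples; expected ones must all be
        # present, removed ones must all be gone.
        current = [(t.key, t.value, t.effect)
                   for t in (current_toleration_list or [])]
        for t in expected_tolerations:
            assert (t["key"], t["value"], t["effect"]) in current
        for t in chk_removed_tolerations:
            assert (t["key"], t["value"], t["effect"]) not in current

    expected = [{"key": "key1", "value": "value1",
                 "operator": "Equal", "effect": "NoSchedule"}]
    dp = apps_api.read_namespaced_deployment(
        "csi-provisioner", "longhorn-system")
    check_tolerations_set(dp.spec.template.spec.tolerations, expected)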

def test_setting_update_with_invalid_value_via_configmap(core_api, request)

Test the default settings update with an invalid value via configmap.

  1. Create an attached volume.
  2. Initialize the longhorn-default-setting configmap containing valid and invalid settings.
  3. Update the longhorn-default-setting configmap with invalid settings. The invalid setting SETTING_TAINT_TOLERATION will be ignored when there is an attached volume.
  4. Validate the default settings values.

def test_setting_v1_data_engine(client, request)

Test that the v1 data engine setting works correctly.

  1. Create a volume and attach it.
  2. Set the v1 data engine setting to false. The setting should be rejected.
  3. Detach the volume.
  4. Set the v1 data engine setting to false again. The setting should be accepted. Then attach the volume. The volume is unable to attach.
  5. Set the v1 data engine setting to true. The setting should be accepted.
  6. Attach the volume.
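
A sketch of the rejection check in step 2, following the update-and-expect-error pattern common in these tests; the v1-data-engine setting name matches Longhorn's, while the error message fragment is an assumption:

    import pytest

    def assert_v1_engine_disable_rejected(client):
        # While a v1 volume is attached, disabling the engine must fail.
        setting = client.by_id_setting("v1-data-engine")
        with pytest.raises(Exception) as e:
            client.update(setting, value="false")
        assert "attached" in str(e.value).lower()  # assumed message fragment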

def update_settings_via_configmap(core_api, client, setting_names, setting_values, request)
def validate_settings(core_api, client, setting_names, setting_values)
def wait_for_longhorn_node_ready()
def wait_for_priority_class_update(core_api, apps_api, count, priority_class=None)
def wait_for_setting_updated(client, name, expected_value)
def wait_for_toleration_update(core_api, apps_api, count, expected_tolerations, chk_removed_tolerations=[])