Module tests.test_basic
Functions
def backup_failed_cleanup(client, core_api, volume_name, volume_size, failed_backup_ttl='3')
-
Set up the failed backup cleanup
def backup_labels_test(client, random_labels, volume_name, size='16777216', backing_image='')
def backup_test(client, volume_name, size, backing_image='', compression_method='lz4')
def backupstore_test(client, host_id, volname, size, compression_method)
def check_volume_and_snapshot_after_corrupting_volume_metadata_file(client, core_api, volume_name, pod, test_pod_name, data_path1, data_md5sum1, data_path2, snap)
-
Test volume I/O and take/delete a snapshot
def prepare_data_volume_metafile(client, core_api, volume_name, csi_pv, pvc, pod, pod_make, data_path, test_writing_data=False, writing_data_path='/data/writing_data_file')
-
Prepare volume and snapshot for volume metafile testing
Setup:
- Create a pod using Longhorn volume
- Write some data to the volume then get the md5sum
- Create a snapshot
- Delete the pod and wait for the volume to detach
- Pick a replica on this host and get the replica data path
def prepare_space_usage_for_rebuilding_only_volume(client)
-
- Create a 7Gi volume and attach to the node.
- Make a filesystem then mount this volume.
- Add this volume as a disk of the node, and disable the scheduling for the default disk.
def restore_inc_test(client, core_api, volume_name, pod)
def snapshot_prune_and_coalesce_simultaneously(client, volume_name, backing_image)
def snapshot_prune_test(client, volume_name, backing_image)
def snapshot_test(client, volume_name, backing_image)
def test_allow_volume_creation_with_degraded_availability(client, volume_name)
-
Test Allow Volume Creation with Degraded Availability (API)
Requirement:
1. Set `allow-volume-creation-with-degraded-availability` to true.
2. Set `node-level-soft-anti-affinity` to false.
Steps:
(degraded availability)
1. Disable scheduling for node 2 and 3.
2. Create a volume with three replicas.
    1. The volume should be `ready` after creation and `Scheduled` is true.
    2. One replica schedules successfully; the other two replicas fail to schedule.
3. Enable the scheduling of node 2.
    1. One additional replica of the volume becomes scheduled.
    2. The other replica still fails to schedule.
    3. The Scheduled condition is still true.
4. Attach the volume.
    1. After the volume is attached, the Scheduled condition becomes false.
5. Write data to the volume.
6. Detach the volume.
    1. The Scheduled condition should become true.
7. Reattach the volume to verify the data.
    1. The Scheduled condition should become false.
8. Enable the scheduling for node 3.
9. Wait for the Scheduled condition to become true.
10. Detach and reattach the volume to verify the data.
def test_allow_volume_creation_with_degraded_availability_dr(set_random_backupstore, client, core_api, volume_name, csi_pv, pvc, pod, pod_make)
-
Test Allow Volume Creation with Degraded Availability (Restore)
Requirement:
1. Set `allow-volume-creation-with-degraded-availability` to true.
2. Set `node-level-soft-anti-affinity` to false.
3. Create a backup of 800MB.
Steps:
(DR volume)
1. Disable scheduling for node 2 and 3.
2. Create a DR volume from the backup with 3 replicas.
    1. The Scheduled condition is false.
    2. Only the replica on node 1 becomes scheduled.
3. Enable scheduling for node 2 and 3.
    1. Replicas are scheduled to node 1, 2, and 3 successfully.
    2. Wait for the restore progress to complete.
    3. The Scheduled condition becomes true.
4. Activate and attach the volume, then verify the data.
def test_allow_volume_creation_with_degraded_availability_error(client, volume_name)
-
Test Allow Volume Creation with Degraded Availability (API)
Requirement:
1. Set `allow-volume-creation-with-degraded-availability` to true.
2. Set `node-level-soft-anti-affinity` to false.
Steps:
(no availability)
1. Disable scheduling on all nodes.
2. Create a volume with three replicas.
    1. The volume should be NotReady after creation.
    2. The Scheduled condition should become false.
3. Attaching the volume should result in an error.
4. Enable one node's scheduling.
    1. The volume should become Ready soon.
    2. The Scheduled condition should become true.
5. Attach the volume. Write data. Detach and reattach to verify the data.
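The repeated Scheduled-condition checks in these degraded-availability tests boil down to polling the volume object until the condition flips. A minimal sketch, assuming the suite's longhorn `client` fixture and dict-style access to `volume.conditions` (an assumption about the API shape, for illustration only):

    import time

    def wait_for_scheduled_condition(client, volume_name, expected, timeout=120):
        # Poll the volume until its Scheduled condition matches `expected`
        # ("True"/"False"), or fail after `timeout` seconds.
        deadline = time.time() + timeout
        while time.time() < deadline:
            volume = client.by_id_volume(volume_name)
            if volume.conditions["Scheduled"]["status"] == expected:
                return volume
            time.sleep(2)
        raise AssertionError(f"Scheduled condition never became {expected}")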
def test_allow_volume_creation_with_degraded_availability_restore(set_random_backupstore, client, core_api, volume_name, csi_pv, pvc, pod, pod_make)
-
Test Allow Volume Creation with Degraded Availability (Restore)
Requirement:
1. Set `allow-volume-creation-with-degraded-availability` to true.
2. Set `node-level-soft-anti-affinity` to false.
3. Set `replica-replenishment-wait-interval` to 0.
4. Create a backup of 800MB.
Steps:
(restore)
1. Disable scheduling for node 2 and 3.
2. Restore a volume with 3 replicas.
    1. The Scheduled condition is true.
    2. Only the replica on node 1 becomes scheduled.
3. Enable scheduling for node 2.
4. Wait for the restore to complete and the volume to detach automatically. Then check that the Scheduled condition is still true.
5. Attach and wait for the volume.
    1. 2 replicas are successfully scheduled to node 1 and 2; 1 replica cannot be created because node 3 is unschedulable.
    2. The Scheduled condition becomes false.
    3. Verify the data.
def test_attach_without_frontend(client, volume_name)
-
Test attach in maintenance mode (without frontend)
- Create a volume and attach it to the current node with the frontend enabled
- Check the volume has `blockdev`
- Write `snap1_data` into the volume and create snapshot `snap1`
- Write more random data into the volume and create another snapshot
- Detach the volume and reattach it with the frontend disabled
- Check the volume still has `blockdev` as frontend but no endpoint
- Revert back to `snap1`
- Detach and reattach the volume with the frontend enabled
- Check the volume contains data `snap1_data`
def test_aws_iam_role_arn(client, core_api)
-
Test AWS IAM Role ARN
- Set backup target to S3
- Check the longhorn manager and aio instance manager Pods do not have the 'iam.amazonaws.com/role' annotation
- Add AWS_IAM_ROLE_ARN to the secret
- Check the longhorn manager and aio instance manager Pods have the 'iam.amazonaws.com/role' annotation matching AWS_IAM_ROLE_ARN in the secret
- Update AWS_IAM_ROLE_ARN in the secret
- Check the longhorn manager and aio instance manager Pods have the 'iam.amazonaws.com/role' annotation matching the updated AWS_IAM_ROLE_ARN
- Remove AWS_IAM_ROLE_ARN from the secret
- Check the longhorn manager and aio instance manager Pods do not have the 'iam.amazonaws.com/role' annotation
def test_backup(set_random_backupstore, client, volume_name)
-
Test basic backup
Setup:
- Create a volume and attach to the current node
- Run the test for all the available backupstores.
Steps:
- Create a backup of volume
- Restore the backup to a new volume
- Attach the new volume and make sure the data is the same as the old one
- Detach the volume and delete the backup
- Wait for the restored volume's `lastBackup` to be cleaned (due to the backup removal)
- Delete the volume
def test_backup_block_deletion(set_random_backupstore, client, core_api, volume_name)
-
Test backup block deletion
Context:
We want to make sure that we only delete unreferenced backup blocks. We also don't want to delete blocks while other backups are in progress: since we don't yet know which blocks the in-progress backup requires, deleting blocks could lead to a faulty backup.
Setup:
- Setup minio as S3 backupstore
Steps:
- Create a volume and attach to the current node
- Write 4 MB to the beginning of the volume (2 x 2MB backup blocks)
- Create backup(1) of the volume
- Overwrite the first of the backup blocks of data on the volume
- Create backup(2) of the volume
- Overwrite the first of the backup blocks of data on the volume
- Create backup(3) of the volume
- Verify backup block count == 4; assert volume["DataStored"] == str(BLOCK_SIZE * expected_count); assert the count of *.blk files for that volume == expected_count
- Create an artificial in-progress backup.cfg file: json.dumps({"Name": name, "VolumeName": volume, "CreatedTime": ""})
- Delete backup(2)
- Verify backup block count == 4 (because of the in progress backup)
- Delete the artificial in progress backup.cfg file
- Delete backup(1)
- Verify backup block count == 2
- Delete backup(3)
- Verify backup block count == 0
- Delete the backup volume
- Cleanup the volume
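The artificial in-progress backup marker used above is just a minimal backup.cfg dropped into the backupstore; an empty CreatedTime is what makes it look in-progress. A minimal sketch, assuming a hypothetical locally accessible `backup_dir` for the volume (the real test writes through the backupstore helpers):

    import json
    import os

    def create_in_progress_backup_cfg(backup_dir, volume_name, backup_name):
        # An empty CreatedTime marks the backup as still in progress, so
        # backup block cleanup is skipped for this volume.
        cfg = {"Name": backup_name, "VolumeName": volume_name, "CreatedTime": ""}
        os.makedirs(backup_dir, exist_ok=True)
        with open(os.path.join(backup_dir, f"backup_{backup_name}.cfg"), "w") as f:
            f.write(json.dumps(cfg))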
def test_backup_failed_disable_auto_cleanup(set_random_backupstore, client, core_api, volume_name)
-
Test that a failed backup is not deleted when auto-cleanup is disabled.
- Set the default setting `backupstore-poll-interval` to 60 (seconds)
- Set the default setting `failed-backup-ttl` to 0
- Create a volume and attach it to the current node
- Create an empty backup for creating the backup volume
- Write some data to the volume
- Create a backup of the volume
- Crash all replicas
- Wait and check if the backup failed
- Wait and check that the backup was not deleted
- Cleanup
def test_backup_failed_enable_auto_cleanup(set_random_backupstore, client, core_api, volume_name)
-
Test that a failed backup is automatically deleted.
- Set the default setting `backupstore-poll-interval` to 60 (seconds)
- Set the default setting `failed-backup-ttl` to 3 (minutes)
- Create a volume and attach it to the current node
- Create an empty backup for creating the backup volume
- Write some data to the volume
- Create a backup of the volume
- Crash all replicas
- Wait and check if the backup failed
- Wait and check that the backup was deleted automatically
- Cleanup
def test_backup_labels(set_random_backupstore, client, random_labels, volume_name)
-
Test that the proper Labels are applied when creating a Backup manually.
- Create a volume
- Run the following steps on all backupstores
- Create a backup with some random labels
- Get backup from backupstore, verify the labels are set on the backups
def test_backup_lock_creation_during_deletion(set_random_backupstore, client, core_api, volume_name, csi_pv, pvc, pod_make)
-
Test backup locks
Context:
Test the locking mechanism that utilizes the backupstore to prevent concurrent operations: prevent backup creation during backup deletion.
Steps:
1. Create a volume, then create the corresponding PV, PVC and Pod.
2. Wait for the pod to be running and the volume healthy.
3. Write data (DATA_SIZE_IN_MB_4) to the pod volume and get the md5sum.
4. Take a backup.
5. Wait for the backup to be completed.
6. Delete the backup.
7. Create another backup of the same volume.
8. The newly created backup should fail because there is a deletion lock.
9. Wait for the first backup to be deleted.
10. Create another backup of the same volume.
11. Wait for the backup to be completed.
def test_backup_lock_deletion_during_backup(set_random_backupstore, client, core_api, volume_name, csi_pv, pvc, pod_make)
-
Test backup locks
Context:
Test the locking mechanism that utilizes the backupstore to prevent concurrent operations: prevent backup deletion while a backup is in progress.
Steps:
1. Create a volume, then create the corresponding PV, PVC and Pod.
2. Wait for the pod to be running and the volume healthy.
3. Write data to the pod volume and get the md5sum.
4. Take a backup.
5. Wait for the backup to be completed.
6. Write more data into the volume and compute the md5sum.
7. Take another backup of the volume.
8. While the backup is in progress, delete the older backup.
9. Wait for the in-progress backup creation to be completed.
10. Check the backup store; there should be 1 backup.
11. Restore the latest backup.
12. Wait for the restoration to be completed. Assert the md5sum from step 6.
def test_backup_lock_deletion_during_restoration(set_random_backupstore, client, core_api, volume_name, csi_pv, pvc, pod_make)
-
Test backup locks
Context:
Test the locking mechanism that utilizes the backupstore to prevent concurrent operations: prevent backup deletion during backup restoration.
Steps:
1. Create a volume, then create the corresponding PV, PVC and Pod.
2. Wait for the pod to be running and the volume healthy.
3. Write data to the pod volume and get the md5sum.
4. Take a backup.
5. Wait for the backup to be completed.
6. Start restoring the created backup.
7. Wait for the restoration to be in progress.
8. Delete the backup from the backup store.
9. Wait for the restoration to be completed.
10. Assert the data from the restored volume with the md5sum.
11. Assert the backup count in the backup store is 0.
def test_backup_lock_restoration_during_deletion(set_random_backupstore, client, core_api, volume_name, csi_pv, pvc, pod_make)
-
Test backup locks
Context:
Test the locking mechanism that utilizes the backupstore to prevent concurrent operations: prevent backup restoration during backup deletion.
Steps:
1. Create a volume, then create the corresponding PV, PVC and Pod.
2. Wait for the pod to be running and the volume healthy.
3. Write data to the pod volume and get the md5sum.
4. Take a backup.
5. Wait for the backup to be completed.
6. Write more data (1.5 Gi) to the volume and take another backup.
7. Wait for the 2nd backup to be completed.
8. Delete the 2nd backup.
9. Without waiting for the backup deletion to complete, restore the 1st backup from the backup store.
10. Verify the restored volume becomes faulted.
11. Wait for the 2nd backup deletion, then assert the backup count in the backup store is 1.
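Conceptually, the four lock tests above exercise one compatibility rule: a deletion lock excludes backup creation and restoration, while backup creation and restoration can coexist. A toy sketch of that rule (illustrative only, not Longhorn's actual lock implementation, which stores lock files in the backupstore):

    # Conceptual lock kinds, mirroring the scenarios above.
    DELETE, BACKUP, RESTORE = "delete", "backup", "restore"

    def locks_conflict(held, requested):
        # Deletion conflicts with every other kind of operation; backup
        # creation and restoration may share the backupstore.
        return DELETE in (held, requested) and held != requested

    assert locks_conflict(DELETE, BACKUP)       # creation blocked during deletion
    assert locks_conflict(BACKUP, DELETE)       # deletion blocked while a backup runs
    assert not locks_conflict(BACKUP, RESTORE)  # creation and restoration coexist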
def test_backup_metadata_deletion(set_random_backupstore, client, core_api, volume_name)
-
Test backup metadata deletion
Context:
We want to be able to delete the metadata (.cfg) files, even if they are corrupt or in a bad state (missing volume.cfg).
Setup:
- Setup minio as S3 backupstore
- Cleanup backupstore
Steps:
- Create volume(1,2) and attach to the current node
- write some data to volume(1,2)
- Create backup(1,2) of volume(1,2)
- request a backup list
- verify backup list contains no error messages for volume(1,2)
- verify backup list contains backup(1,2) information for volume(1,2)
- delete backup(1) of volume(1,2)
- request a backup list
- verify backup list contains no error messages for volume(1,2)
- verify backup list only contains backup(2) information for volume(1,2)
- delete volume.cfg of volume(2)
- request backup volume deletion for volume(2)
- verify that volume(2) has been deleted in the backupstore.
- request a backup list
- verify backup list only contains volume(1) and no errors
- verify backup list only contains backup(2) information for volume(1)
- delete backup volume(1)
- verify that volume(1) has been deleted in the backupstore.
- cleanup
def test_backup_status_for_unavailable_replicas(set_random_backupstore, client, core_api, volume_name)
-
Test backup status for unavailable replicas
Context:
We want to make sure that during backup creation, once the responsible replica is gone, the backup enters the Error state with an error message.
Setup:
- Create a volume and attach to the current node
- Run the test for all the available backupstores
Steps:
- Create a backup of volume
- Find the replica for that backup
- Disable scheduling on the node of that replica
- Delete the replica
- Verify backup status with Error state and with an error message
- Create a new backup
- Verify new backup was successful
- Cleanup (delete backups, delete volume)
def test_backup_volume_list(set_random_backupstore, client, core_api)
-
Test backup volume list
Context:
We want to make sure that an error when listing a single backup volume does not stop us from listing all the other backup volumes. Otherwise a single faulty backup can block the retrieval of all known backup volumes.
Setup:
1. Setup minio as S3 backupstore
Steps:
1. Create volume(1,2) and attach to the current node
2. Write some data to volume(1,2)
3. Create backup(1) of volume(1,2)
4. Request a backup list
5. Verify the backup list contains no error messages for volume(1,2)
6. Verify the backup list contains backup(1) for volume(1,2)
7. Place a file named "backup_1234@failure.cfg" into the backups folder of volume(1)
8. Request a backup list
9. Verify the backup list contains no error messages for volume(1,2)
10. Verify the backup list contains backup(1) for volume(1,2)
11. Delete backup volumes(1 & 2)
12. Cleanup
def test_backup_volume_restore_with_access_mode(core_api, set_random_backupstore, client, access_mode, overridden_restored_access_mode)
-
Test backing up a volume with its access mode, then restoring a volume with either the original or an overridden access mode.
- Prepare a healthy volume
- Create a backup of the volume
- Restore a volume from the backup w/o specifying the access mode => Validate the access mode is the same as the original volume's
- Restore a volume from the backup w/ specifying the access mode => Validate the access mode is the same as the specified one
def test_backuptarget_available_during_engine_image_not_ready(client, apps_api)
-
Test backup target available during engine image not ready
- Set backup target URL to S3 and NFS respectively
- Set poll interval to 0 and 300 respectively
- Scale down the engine image DaemonSet
- Check the engine image is in the deploying state
- Configure the backup target while the engine image is not ready
- Check backup target status.available=false
- Scale up the engine image DaemonSet
- Check backup target status.available=true
- Reset backup target setting
- Check backup target status.available=false
def test_backuptarget_invalid(apps_api, client, core_api, backupstore_invalid, make_deployment_with_pvc, pvc_name, request, volume_name)
-
Related issue: https://github.com/longhorn/longhorn/issues/1249
This test case does not cover the UI test mentioned in the related issue's test steps.
Setup:
- Give an incorrect value to the backup target.
Given:
- Create a volume, attach it to a workload, and write data into the volume.
When:
- Create a backup via a manifest YAML file.
Then:
- The backup fails and the backup state is Error.
- The backup target is unavailable, with an explanatory condition.
def test_cleanup_system_generated_snapshots(client, core_api, volume_name, csi_pv, pvc, pod_make)
-
Test Cleanup System Generated Snapshots
- Enable the 'Auto Cleanup System Generated Snapshot' setting.
- Create a volume and attach it to a node.
- Write some data to the volume and get the checksum of the data.
- Delete a random replica to trigger rebuilding, which creates a system generated snapshot.
- Repeat the previous step 3 times, and make sure only one snapshot is left.
- Check the data with the saved checksum.
def test_default_storage_class_syncup(core_api, request)
-
Steps:
1. Record the current Longhorn-StorageClass-related ConfigMap `longhorn-storageclass`.
2. Modify the default Longhorn StorageClass `longhorn`, e.g., update `reclaimPolicy` from `Delete` to `Retain`.
3. Verify that the change is reverted immediately and the manifest is the same as the record in ConfigMap `longhorn-storageclass`.
4. Delete the default Longhorn StorageClass `longhorn`.
5. Verify that the StorageClass is recreated immediately with the same manifest as the record in ConfigMap `longhorn-storageclass`.
6. Modify the content of ConfigMap `longhorn-storageclass`.
7. Verify that the modifications are applied to the default Longhorn StorageClass `longhorn` immediately.
8. Revert the modifications of the ConfigMap, then wait for the StorageClass to sync up.
def test_delete_backup_during_restoring_volume(set_random_backupstore, client)
-
Test deleting a backup during volume restoration
Context:
The volume robustness should become faulted if the backup is deleted while the volume is being restored.
Given:
- Create volume v1 and attach it to a node
- Write 150M of data to volume v1
When:
- Create a backup of volume v1 and wait for the backup to complete
- Restore a volume v2 from the volume v1 backup
- Delete the backup immediately
Then:
- Volume v2 "robustness" should be "faulted"
- The "status" of the volume restore condition should be "False", and the "reason" should be "RestoreFailure"
def test_deleting_backup_volume(set_random_backupstore, client, volume_name)
-
Test deleting backup volumes
- Create volume and create backup
- Delete the backup and make sure it's gone in the backupstore
def test_dr_volume_activated_with_failed_replica(set_random_backupstore, client, core_api, volume_name)
-
Test DR volume activated with a failed replica
Context:
Make sure that DR volume could be activated as long as there is a ready replica.
Steps:
- Create a volume and attach to a node.
- Write some data to the volume and create a backup.
- Create a DR volume from the backup.
- Disable the replica rebuilding.
- Enable the setting `allow-volume-creation-with-degraded-availability`.
- Fail one replica.
- Activate the DR volume.
- Enable the replica rebuilding.
- Attach the volume to a node.
- Check if data is correct.
def test_dr_volume_with_backup_and_backup_volume_deleted(set_random_backupstore, client, core_api, volume_name)
-
Test that a DR volume can be activated after deleting all backups.
Context:
We want to make sure that a DR volume can be activated after deleting some/all backups or the backup volume.
Steps:
- Create a volume and attach to the current node.
- Write 4 MB to the beginning of the volume (2 x 2MB backup blocks).
- Create backup(0) then backup(1) for the volume.
- Verify backup block count == 4.
- Create DR volume(1) and DR volume(2) from backup(1).
- Verify the DR volumes' last backup is backup(1).
- Delete backup(1).
- Verify backup block count == 2.
- Verify the DR volumes' last backup becomes backup(0).
- Activate and verify DR volume(1) data is data(0).
- Delete backup(0).
- Verify backup block count == 0.
- Verify the DR volumes' last backup is empty.
- Delete the backup volume.
- Activate and verify DR volume(2) data is data(0).
def test_dr_volume_with_backup_block_deletion(set_random_backupstore, client, core_api, volume_name)
-
Test the DR volume's last backup after block deletion.
Context:
We want to make sure that when a block is deleted, the DR volume picks up the correct last backup.
Steps:
- Create a volume and attach to the current node.
- Write 4 MB to the beginning of the volume (2 x 2MB backup blocks).
- Create backup(0) of the volume.
- Overwrite backup(0)'s 1st block of data on the volume. (Since backup(0) contains 2 blocks of data, the updated data is data1["content"] + data0["content"][BACKUP_BLOCK_SIZE:])
- Create backup(1) of the volume.
- Verify backup block count == 3.
- Create DR volume from backup(1).
- Verify the DR volume's last backup is backup(1).
- Delete backup(1).
- Verify backup block count == 2.
- Verify the DR volume's last backup is backup(0).
- Overwrite backup(0)'s 1st block of data on the volume. (Since backup(0) contains 2 blocks of data, the updated data is data2["content"] + data0["content"][BACKUP_BLOCK_SIZE:])
- Create backup(2) of the volume.
- Verify the DR volume's last backup is backup(2).
- Activate and verify the DR volume data is data2["content"] + data0["content"][BACKUP_BLOCK_SIZE:].
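The expected content after overwriting the first backup block can be written down directly. A minimal sketch, assuming `data0`/`data2` are dicts carrying a "content" byte string as in the steps above, and that BACKUP_BLOCK_SIZE is the 2MB backup block size these tests use:

    BACKUP_BLOCK_SIZE = 2 * 1024 * 1024  # 2MB backup blocks, per the steps above

    def expected_content(new_first_block, original):
        # The first backup block is replaced by the new write; everything
        # past the first block keeps the original data.
        return new_first_block["content"] + original["content"][BACKUP_BLOCK_SIZE:]

    # e.g. the data verified after activating the DR volume:
    # expected = expected_content(data2, data0)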
def test_dr_volume_with_backup_block_deletion_abort_during_backup_in_progress(set_random_backupstore, client, core_api, volume_name)
-
Test the DR volume's last backup after an aborted block deletion. The abort sets the last backup to empty.
Context:
We want to make sure that when the block deletion for the last backup is aborted by operations such as in-progress backups, the DR volume still picks up the correct last backup.
Steps:
- Create a volume and attach to the current node.
- Write 4 MB to the beginning of the volume (2 x 2MB backup blocks).
- Create backup(0) of the volume.
- Overwrite backup(0)'s 1st block of data on the volume. (Since backup(0) contains 2 blocks of data, the updated data is data1["content"] + data0["content"][BACKUP_BLOCK_SIZE:])
- Create backup(1) of the volume.
- Verify backup block count == 3.
- Create DR volume from backup(1).
- Verify the DR volume's last backup is backup(1).
- Create an artificial in-progress backup.cfg file. This cfg file will convince the longhorn manager that there is a backup being created, so all subsequent backup block cleanup will be skipped.
- Delete backup(1).
- Verify backup block count == 3 (because of the in-progress backup).
- Verify the DR volume's last backup is empty.
- Delete the artificial in-progress backup.cfg file.
- Overwrite backup(0)'s 1st block of data on the volume. (Since backup(0) contains 2 blocks of data, the updated data is data2["content"] + data0["content"][BACKUP_BLOCK_SIZE:])
- Create backup(2) of the volume.
- Verify the DR volume's last backup is backup(2).
- Activate and verify the DR volume data is data2["content"] + data0["content"][BACKUP_BLOCK_SIZE:].
def test_engine_image_daemonset_restart(client, apps_api, volume_name)
-
Test restarting engine image daemonset
- Get the default engine image
- Create a volume and attach to the current node
- Write random data to the volume and create a snapshot
- Delete the engine image daemonset
- Engine image daemonset should be recreated
- In the meantime, validate the volume data to prove it's still functional
- Wait for the engine image to become `ready` again
- Check the volume data again
- Write some data and create a new snapshot, since creating a snapshot uses the engine image binary
- Check the volume data again
def test_expand_pvc_with_size_round_up(client, core_api, volume_name)
-
Test expanding a Longhorn volume via PVC
- Create a Longhorn volume, PV, and PVC with size '1Gi'
- Attach, write data, and detach
- Expand the volume size to 2000000000 (2G) and check that the size rounds up to 2000683008 (1908Mi)
- Attach, write data, and detach
- Expand the volume size to '2Gi' and check that the size is 2147483648
- Attach, write data, and detach
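The round-up values above (and in test_expansion_with_size_round_up below) are consistent with rounding the requested size up to the next MiB boundary. A minimal sketch of that arithmetic, inferred from the numbers in these docstrings rather than quoted from Longhorn's code:

    MIB = 1024 * 1024

    def round_up_to_mib(size_bytes):
        # Ceiling-divide by 1 MiB, then scale back up to bytes.
        return -(-size_bytes // MIB) * MIB

    assert round_up_to_mib(2000000000) == 2000683008   # 1908 MiB
    assert round_up_to_mib(2 * 1024**3) == 2147483648  # already aligned: 2 GiB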
def test_expansion_basic(client, volume_name)
-
Test volume expansion using Longhorn API
- Create volume and attach to the current node
- Generate data `snap1_data` and write it to the volume
- Create snapshot `snap1`
- Online expand the volume
- Verify the volume has been expanded
- Generate data `snap2_data` and write it to the volume
- Create snapshot `snap2`
- Generate data `snap3_data` and write it after the original size
- Create snapshot `snap3` and verify `snap3_data` with its location
- Detach and reattach the volume
- Verify the volume is still expanded, and `snap3_data` remains valid
- Detach the volume
- Reattach the volume in maintenance mode
- Revert to `snap2` and detach
- Attach the volume and check data `snap2_data`
- Generate `snap4_data` and write it after the original size
- Create snapshot `snap4` and verify `snap4_data`
- Detach the volume and revert to `snap1`
- Validate `snap1_data`
TODO: Add offline expansion
def test_expansion_canceling(client, core_api, volume_name, pod, pvc, storage_class)
-
Test expansion canceling
- Create a volume, then create the corresponding PV, PVC and Pod.
- Generate `test_data` and write it to the pod
- Create an empty directory at the expansion snapshot tmp meta file path so that the following offline expansion will fail
- Delete the pod and wait for volume detachment
- Try offline expansion via Longhorn API
- Wait for expansion failure then use Longhorn API to cancel it
- Create a new pod and validate the volume content
- Create an empty directory at the expansion snapshot tmp meta file path so that the following online expansion will fail
- Try online expansion via Longhorn API
- Wait for expansion failure then use Longhorn API to cancel it
- Validate the volume content again, then re-write random data to the pod
- Retry online expansion, then verify the expansion done via Longhorn API
- Validate the volume content, then check if data writing looks fine
- Clean up pod, PVC, and PV
def test_expansion_with_scheduling_failure(client, core_api, volume_name, pod, pvc, storage_class)
-
Test if the running volume with scheduling failure can be expanded after the detachment.
Prerequisite: Setting "soft anti-affinity" is false.
- Create a volume, then create the corresponding PV, PVC and Pod.
- Wait for the pod running and the volume healthy.
- Write data to the pod volume and get the md5sum.
- Disable scheduling for a node that contains a running replica.
- Crash the volume's replica on the scheduling-disabled node, then delete the failed replica so that it won't be reused.
- Wait for the scheduling failure.
- Verify:
  7.1. `volume.ready == True`.
  7.2. `volume.conditions[scheduled].status == False`.
  7.3. The volume is Degraded.
  7.4. The new replica cannot be created.
- Write more data to the volume and get the md5sum
- Delete the pod and wait for the volume to detach.
- Verify:
  10.1. `volume.ready == True`.
  10.2. `volume.conditions[scheduled].status == True`.
- Expand the volume and wait for the expansion to succeed.
- Verify there is no rebuild replica after the expansion.
- Recreate a new pod for the volume and wait for the pod running.
- Validate the volume content.
- Verify the expanded part can be read/written correctly.
- Enable the node scheduling.
- Wait for the volume rebuild to succeed.
- Verify the data written in the expanded part.
- Clean up pod, PVC, and PV.
Note that steps 1 to 10 are identical to those of test_running_volume_with_scheduling_failure().
def test_expansion_with_size_round_up(client, core_api, volume_name)
-
Test expanding a Longhorn volume
- Create and attach a Longhorn volume with size '1Gi'.
- Write data, then offline expand the volume size to 2000000000 (2G).
- Check that the size rounds up to 2000683008 and verify the written data.
- Write data, then online expand the volume size to '2Gi'.
- Check that the size rounds up to 2147483648 and verify the written data.
def test_filesystem_trim(client, fs_type)
-
Test the filesystem in the volume can be trimmed correctly.
- Create a volume with the option `unmapMarkSnapChainRemoved` enabled, then attach it to the current node.
- Make a filesystem, write file0 into the fs, calculate the checksum, then take snap0.
- Write file21 and calculate the checksum. Then take snap21.
- Unmount then reattach the volume without a frontend. Revert the volume to snap0.
- Reattach and mount the volume.
- Write file11. Then take snap11.
- Write file12. Then take snap12.
- Write file13. Then remove file0, file11, file12, and file13. Verify the snapshots and the volume head size are not shrunk.
- Do a filesystem trim (via the Longhorn API or the command line; see the sketch below).
  Verify that:
  - snap11 and snap12 are marked as removed.
  - snap11, snap12, and the volume head size are shrunk.
- Disable the option `unmapMarkSnapChainRemoved` for the volume.
- Write file14. Then take snap14.
- Write file15. Then remove file14 and file15.
  Verify that:
  - snap14 is not marked as removed and its size is not changed.
  - the volume head size is shrunk.
- Unmount and reattach the volume. Then revert to snap21.
- Reattach and mount the volume. Verify file0 and file21.
- Cleanup.
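A minimal sketch of the command-line trim path referenced above, assuming a hypothetical mount point for the attached volume (the test may equally trim via the Longhorn API):

    import subprocess

    def trim_filesystem(mount_point):
        # fstrim asks the filesystem to discard unused blocks; Longhorn can
        # then shrink the volume head and any snapshots marked as removed.
        subprocess.check_call(["fstrim", mount_point])

    # e.g. trim_filesystem("/mnt/longhorn-vol")  # hypothetical mount point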
def test_hosts(client)
-
Check node name and IP
def test_listing_backup_volume(client, backing_image='')
-
Test listing backup volumes
- Create three volumes: `volume1/2/3`
- Setup an NFS backupstore, since we can manipulate its content easily
- Create multiple snapshots for all three volumes
- Rename `volume1`'s `volume.cfg` to `volume.cfg.tmp` in the backupstore
- List backup volumes. Make sure `volume1` errors out but the other two are found
- Restore `volume1`'s `volume.cfg`.
- Make sure backup volume `volume1` can now be found
- Delete backups for `volume1/2`, make sure they cannot be found later
- Corrupt a backup.cfg on volume3
- Check that the backup is listed with the other backups of volume3
- Verify that the corrupted backup has Messages of type error
- Check that backup inspection for the previously corrupted backup fails
- Delete backups for `volume3`, make sure they cannot be found later
def test_multiple_volumes_creation_with_degraded_availability(set_random_backupstore, client, core_api, apps_api, storage_class, statefulset)
-
Scenario: verify multiple volumes with degraded availability can be created, attached, detached, and deleted at nearly the same time.
Given a new StorageClass created with `numberOfReplicas=5`.
When set `allow-volume-creation-with-degraded-availability` to `True`.
And deploy this StatefulSet: https://github.com/longhorn/longhorn/issues/2073#issuecomment-742948726
Then all 10 volumes are healthy in 1 minute.
When delete the StatefulSet.
Then all 10 volumes are detached in 1 minute.
When find and delete the PVCs of the 10 volumes.
Then all 10 volumes are deleted in 1 minute.
def test_pvc_storage_class_name_from_backup_volume(set_random_backupstore, core_api, client, volume_name, pvc_name, pvc, pod_make, storage_class)
-
Test that the storageClassName of the restored volume's PV/PVC comes from the backup volume
Given:
- Create a new StorageClass:

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: longhorn-test
    provisioner: driver.longhorn.io
    allowVolumeExpansion: true
    reclaimPolicy: Delete
    volumeBindingMode: Immediate
    parameters:
      numberOfReplicas: "3"

- Create a PVC that uses this StorageClass:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: test-pvc
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: longhorn-test
      resources:
        requests:
          storage: 300Mi

- Attach the volume and write some data
When:
- Back up the volume
Then:
- The backup volume's status.storageClassName should be longhorn-test
When:
- Restore the backup to a new volume
- Create PV/PVC from the new volume with the "create new PVC" option
Then:
- The new PVC's storageClassName should still be longhorn-test
- Verify the restored data is the same as the original
def test_restore_basic(set_random_backupstore, client, core_api, volume_name, pod)
-
Steps:
1. Create a volume and attach it to a pod.
2. Write some data into the volume and compute the checksum m1.
3. Create a backup b1.
4. Write some more data into the volume and compute the checksum m2.
5. Create a backup b2.
6. Delete all the data from the volume.
7. Write some more data into the volume and compute the checksum m3.
8. Create a backup b3.
9. Restore backup b1 and verify the data with m1.
10. Restore backup b2 and verify the data with m1 and m2.
11. Restore backup b3 and verify the data with m3.
12. Delete backup b2.
13. Restore backup b3 and verify the data with m3.
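The checksums m1/m2/m3 above are plain md5 digests of the data in the volume. A minimal sketch, assuming a hypothetical file path inside the mounted volume:

    import hashlib

    def md5sum(path, chunk_size=1 << 20):
        # Stream the file in 1 MiB chunks so large files don't exhaust memory.
        digest = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # e.g. m1 = md5sum("/data/test_file")  # hypothetical data path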
def test_restore_inc(set_random_backupstore, client, core_api, volume_name, pod)
-
Test restore from disaster recovery volume (incremental restore)
Run test against all the backupstores
- Create a volume and attach it to the current node
- Generate `data0`, write it to the volume, and make a backup `backup0`
- Create three DR (standby) volumes from the backup: `sb_volume0/1/2`
- Wait for all three DR volumes to start the initial restoration
- Verify the DR volumes' `lastBackup` is `backup0`
- Verify snapshot/pv/pvc creation and changing the backup target are not allowed as long as the DR volumes exist
- Activate standby `sb_volume0` and attach it to check the volume data
- Generate `data1`, write it to the original volume, and create `backup1`
- Make sure `sb_volume1`'s `lastBackup` field has been updated to `backup1`
- Wait for `sb_volume1` to finish the incremental restoration, then activate it
- Attach and check `sb_volume1`'s data
- Generate `data2`, write it to the original volume, and create `backup2`
- Make sure `sb_volume2`'s `lastBackup` field has been updated to `backup2`
- Wait for `sb_volume2` to finish the incremental restoration, then activate it
- Attach and check `sb_volume2`'s data
- Create PV, PVC and Pod to use `sb_volume2`, check PV/PVC/POD are good

FIXME: Step 16 works because the disk will be treated as an unformatted disk
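Activating a standby (DR) volume, as done for `sb_volume0/1/2` above, is a single call once restoration has caught up. A minimal sketch, assuming the suite's longhorn `client` and that the volume resource exposes an `activate` action taking a frontend (an assumption about the client API shape):

    def activate_standby(client, volume_name, frontend="blockdev"):
        volume = client.by_id_volume(volume_name)
        assert volume.standby, "only DR (standby) volumes can be activated"
        # Converts the DR volume into a normal volume with the given frontend.
        volume.activate(frontend=frontend)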
def test_restore_inc_with_offline_expansion(set_random_backupstore, client, core_api, volume_name, pod)
-
Test restore from disaster recovery volume with volume offline expansion
Run the test against a random backupstore
- Create a volume and attach it to the current node
- Generate `data0`, write it to the volume, and make a backup `backup0`
- Create three DR (standby) volumes from the backup: `dr_volume0/1/2`
- Wait for all three DR volumes to start the initial restoration
- Verify the DR volumes' `lastBackup` is `backup0`
- Verify snapshot/pv/pvc creation and changing the backup target are not allowed as long as the DR volumes exist
- Activate standby `dr_volume0` and attach it to check the volume data
- Expand the original volume. Make sure the expansion is successful.
- Generate `data1`, write it to the original volume, and create `backup1`
- Make sure `dr_volume1`'s `lastBackup` field has been updated to `backup1`
- Activate `dr_volume1` and check data `data0` and `data1`
- Generate `data2` and write it to the original volume after the original SIZE
- Create `backup2`
- Wait for `dr_volume2` to finish the expansion and show `backup2` as the latest backup
- Activate `dr_volume2` and verify `data2`
- Detach `dr_volume2`
- Create PV, PVC and Pod to use `dr_volume2`, check PV/PVC/POD are good

FIXME: Step 16 works because the disk will be treated as an unformatted disk
def test_running_volume_with_scheduling_failure(client, core_api, volume_name, pod)
-
Test if the running volume still works fine when there is a replica with a scheduling failure
Prerequisites: the setting "soft anti-affinity" is false and the setting "replica-replenishment-wait-interval" is 0.
- Create a volume, then create the corresponding PV, PVC and Pod.
- Wait for the pod running and the volume healthy.
- Write data to the pod volume and get the md5sum.
- Disable scheduling for a node that contains a running replica.
- Crash the volume's replica on the scheduling-disabled node.
- Wait for the scheduling failure.
- Verify:
  7.1. `volume.ready == True`.
  7.2. `volume.conditions[scheduled].status == False`.
  7.3. The volume is Degraded.
  7.4. The new replica cannot be created.
- Write more data to the volume and get the md5sum
- Delete the pod and wait for the volume to detach.
- Verify:
  10.1. `volume.ready == True`.
  10.2. `volume.conditions[scheduled].status == True`.
- Recreate a new pod for the volume and wait for the pod running.
- Validate the volume content, then check if data writing looks fine.
- Clean up pod, PVC, and PV.
def test_setting_default_replica_count(client, volume_name)
-
Test the `Default Replica Count` setting
- Set the default replica count in the global settings to 5
- Create a volume without specifying the replica count
- The volume should have 5 replicas (instead of the previous default of 3)
def test_settings(client)
-
Check input for settings
def test_snapshot(client, volume_name, backing_image='')
-
Test snapshot operations
- Create a volume and attach it to the node
- Create the empty snapshot `snap1`
- Generate and write data `snap2_data`, then create `snap2`
- Generate and write data `snap3_data`, then create `snap3`
- List snapshots. Validate the snapshot chain relationship
- Mark `snap3` as removed. Make sure the volume's data didn't change
- List snapshots. Make sure `snap3` is marked as removed
- Detach and reattach the volume in maintenance mode.
- Make sure the volume frontend is still `blockdev` but disabled
- Revert to `snap2`
- Detach and reattach the volume with the frontend enabled
- Make sure the volume's data is `snap2_data`
- List snapshots. Make sure `volume-head` is now `snap2`'s child
- Delete `snap1` and `snap2`
- Purge the snapshots.
- List snapshots. Make sure `snap1` and `snap3` are gone and `snap2` is marked as removed.
- Check the volume data, make sure it's still `snap2_data`.
def test_snapshot_prune(client, volume_name, backing_image='')
-
Test that removing the snapshot directly behind the volume head triggers snapshot pruning. Snapshot pruning means removing from the snapshot the part that overlaps with the volume head content.
- Create a volume and attach it to the node
- Generate and write data `snap1_data`, then create `snap1`
- Generate and write data `snap2_data` with the same offset.
- Mark `snap1` as removed. Make sure the volume's data didn't change, but all data of `snap1` will be pruned.
- Detach and expand the volume, then wait for the expansion to finish. This will implicitly create a new snapshot `snap2`.
- Attach the volume. Make sure there is a system snapshot with the old size.
- Generate and write data `snap3_data`, which partially overlaps with `snap2_data`, plus one extra data chunk in the expanded part.
- Mark `snap2` as removed, then do a snapshot purge. Make sure the volume's data didn't change, but the overlapping part of `snap2` will be pruned.
- Create `snap3`.
- Do a snapshot purge for the volume. Make sure `snap2` will be removed.
- Generate and write data `snap4_data`, which has no overlap with `snap3_data`.
- Mark `snap3` as removed. Make sure the volume's data didn't change, and there is no change to `snap3`.
- Create `snap4`.
- Generate and write data `snap5_data`, then create `snap5`.
- Detach and reattach the volume in maintenance mode.
- Make sure the volume frontend is still `blockdev` but disabled
- Revert to `snap4`
- Detach and reattach the volume with the frontend enabled
- Make sure the volume's data is correct.
- List snapshots. Make sure `volume-head` is now `snap4`'s child
def test_snapshot_prune_and_coalesce_simultaneously(client, volume_name, backing_image='')
-
Test that the pruning of the snapshot directly behind the volume head is handled after all snapshot coalescing is done.
- Create a volume and attach it to the node
- Generate and write the 1st data chunk `snap1_data`, then create `snap1`
- Generate and write the 2nd data chunk `snap2_data`, then create `snap2`
- Generate and write the 3rd data chunk `snap3_data`, then create `snap3`
- Generate and write the 4th data chunk `snap4_data`, then create `snap4`
- Overwrite all existing data chunks in the volume head.
- Mark all snapshots as `Removed`, then start a snapshot purge and wait for it to complete.
- List snapshots. Make sure there are only 2 snapshots left: `volume-head` and `snap4`, and `snap4` is an empty snapshot.
- Make sure the volume's data is correct.
def test_space_usage_for_rebuilding_only_volume(client, volume_name, request)
-
Test case: the normal scenario
1. Prepare a 7Gi volume as a node disk.
2. Create a new volume with a 3Gi spec size.
3. Write 3Gi of data (using `dd`) to the volume.
4. Take a snapshot, then mark this snapshot as Removed. (This snapshot won't be deleted immediately.)
5. Write 3Gi of data (using `dd`) to the volume again.
6. Delete a random replica to trigger rebuilding.
7. Wait for the rebuilding to complete, and verify the volume's actual size won't be greater than 2x the volume spec size.
8. Delete the volume.
def test_space_usage_for_rebuilding_only_volume_worst_scenario(client, volume_name, request)
-
Test case: the worst scenario
1. Prepare a 7Gi volume as a node disk.
2. Create a new volume with a 2Gi spec size.
3. Write 2Gi of data (using `dd`) to the volume.
4. Take a snapshot, then mark this snapshot as Removed. (This snapshot won't be deleted immediately.)
5. Write 2Gi of data (using `dd`) to the volume again.
6. Delete a random replica to trigger the rebuilding.
7. Write 2Gi of data once the rebuilding is triggered (the new replica is created).
8. Wait for the rebuilding to complete, and verify the volume's actual size won't be greater than 3x the volume spec size.
9. Delete the volume.
def test_storage_class_from_backup(set_random_backupstore, volume_name, pvc_name, storage_class, client, core_api, pod_make)
-
Test restore backup using StorageClass
- Create a volume and PV/PVC/POD
- Write `test_data` into the pod
- Create a snapshot and back it up. Get the backup URL
- Create a new StorageClass `longhorn-from-backup` and set the backup URL
- Use `longhorn-from-backup` to create a new PVC
- Wait for the volume to be created and complete the restoration
- Create the pod using the PVC. Verify the data
def test_volume_backup_and_restore_with_gzip_compression_method(client, set_random_backupstore, volume_name)
-
Scenario: test volume backup and restore with different compression methods
Issue: https://github.com/longhorn/longhorn/issues/5189
Given the setting Backup Compression Method is "gzip"
And the backup concurrent limit is "4"
And the restore concurrent limit is "4"
When create a volume and attach it to the current node
And get the volume's details
Then verify the volume's compression method is "gzip"
Then create a backup of the volume
And write random data to the volume
Then restore the backup to a new volume
And attach the new volume and verify the data integrity
And detach the volume and delete the backup
And wait for the restored volume's `lastBackup` to be cleaned (due to the backup removal)
And delete the volume
def test_volume_backup_and_restore_with_lz4_compression_method(client, set_random_backupstore, volume_name)
-
Scenario: test volume backup and restore with different compression methods
Issue: https://github.com/longhorn/longhorn/issues/5189
Given the setting Backup Compression Method is "lz4"
And the backup concurrent limit is "4"
And the restore concurrent limit is "4"
When create a volume and attach it to the current node
And get the volume's details
Then verify the volume's compression method is "lz4"
Then create a backup of the volume
And write random data to the volume
Then restore the backup to a new volume
And attach the new volume and verify the data integrity
Then detach the volume and delete the backup
And wait for the restored volume's `lastBackup` to be cleaned (due to the backup removal)
And delete the volume
def test_volume_backup_and_restore_with_none_compression_method(client, set_random_backupstore, volume_name)
-
Scenario: test volume backup and restore with different compression methods
Issue: https://github.com/longhorn/longhorn/issues/5189
Given the setting Backup Compression Method is "none"
And the backup concurrent limit is "4"
And the restore concurrent limit is "4"
When create a volume and attach it to the current node
And get the volume's details
Then verify the volume's compression method is "none"
Then create a backup of the volume
And write random data to the volume
Then restore the backup to a new volume
And attach the new volume and verify the data integrity
And detach the volume and delete the backup
And wait for the restored volume's `lastBackup` to be cleaned (due to the backup removal)
And delete the volume
def test_volume_basic(client, volume_name)
-
Test basic volume operations:
- Check volume name and parameter
- Create a volume and attach to the current node, then check volume states
- Check soft anti-affinity rule
- Write then read back to check volume data
def test_volume_iscsi_basic(client, volume_name)
-
Test basic volume operations with iscsi frontend
- Create and attach a volume with iscsi frontend
- Check the volume endpoint and connect it using the iscsi initiator on the node.
- Write then read back volume data for validation
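A minimal sketch of the iSCSI connection step, assuming a hypothetical endpoint of the form iscsi://<ip>:3260/<iqn>/1 reported by the volume, and that open-iscsi's iscsiadm is installed on the node:

    import subprocess

    def iscsi_login(ip, iqn, port=3260):
        portal = f"{ip}:{port}"
        # Discover the target on the portal, then log in; the volume then
        # shows up as a local block device under /dev.
        subprocess.check_call(
            ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", portal])
        subprocess.check_call(
            ["iscsiadm", "-m", "node", "-T", iqn, "-p", portal, "--login"])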
def test_volume_metafile_deleted(client, core_api, volume_name, csi_pv, pvc, pod, pod_make)
-
Scenario:
Test volume should still work when the volume meta file is removed in the replica data path.
Steps:
- Delete the volume meta file in this replica data path
- Recreate the pod and wait for the volume attached
- Check if the volume is Healthy after the volume attached
- Check volume data
- Check if the volume still works fine by r/w data and creating/removing snapshots
def test_volume_metafile_deleted_when_writing_data(client, core_api, volume_name, csi_pv, pvc, pod, pod_make)
-
Scenario:
While writing data, test volume should still work when the volume meta file is deleted in the replica data path.
Steps:
- Create a pod using Longhorn volume
- Delete the volume meta file in this replica data path
- Recreate the pod and wait for the volume attached
- Check if the volume is Healthy after the volume attached
- Check volume data
- Check if the volume still works fine by r/w data and creating/removing snapshots
def test_volume_metafile_empty(client, core_api, volume_name, csi_pv, pvc, pod, pod_make)
-
Scenario:
Test volume should still work when there is an invalid volume meta file in the replica data path.
Steps:
- Remove the content of the volume meta file in this replica data path
- Recreate the pod and wait for the volume attached
- Check if the volume is Healthy after the volume attached
- Check volume data
- Check if the volume still works fine by r/w data and creating/removing snapshots
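A minimal sketch of the two meta-file fault injections used by these tests, assuming the replica's meta file is named volume.meta inside a hypothetical replica data path such as /var/lib/longhorn/replicas/<volume>-<id> (both names are assumptions for illustration):

    import os

    def delete_volume_meta(replica_data_path):
        # test_volume_metafile_deleted: remove the meta file entirely.
        os.remove(os.path.join(replica_data_path, "volume.meta"))

    def empty_volume_meta(replica_data_path):
        # test_volume_metafile_empty: truncate the meta file to zero bytes,
        # leaving an invalid (empty) file in place.
        open(os.path.join(replica_data_path, "volume.meta"), "w").close()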
def test_volume_multinode(client, volume_name)
-
Test the volume can be attached on multiple nodes
- Create one volume
- Attach it on every node once, verify the state, then detach it
def test_volume_scheduling_failure(client, volume_name)
-
Test failing to schedule by disabling scheduling for all nodes
Also test that a volume with failed scheduling cannot be attached
- Disable `allowScheduling` for all nodes
- Create a volume.
- Verify the volume condition `Scheduled` is false
- Verify the volume is not ready for workloads
- Verify attaching the volume results in an error
- Enable `allowScheduling` for all nodes
- The volume should be automatically scheduled (the condition becomes true)
- The volume can be attached now
def test_volume_toomanysnapshots_condition(client, core_api, volume_name)
-
Test Volume TooManySnapshots Condition
- Create a volume and attach it to a node.
- Check the 'TooManySnapshots' condition is False.
- Write data to this volume while taking 101 snapshots.
- Check the 'TooManySnapshots' condition is True.
- Take one more snapshot to make sure snapshotting still works.
- Delete 2 snapshots, and check the 'TooManySnapshots' condition is False.
def test_volume_update_replica_count(client, volume_name)
-
Test updating volume's replica count
- Create a volume with 2 replicas
- Attach the volume
- Increase the replica to 3.
- Volume will become degraded and start rebuilding
- Wait for rebuilding to complete
- Update the replica count to 2. The volume should remain healthy
- Remove 1 replica, so there will be 2 replicas in the volume
- Verify the volume is still healthy
The volume should always be healthy, even with only 2 replicas.
def test_workload_with_fsgroup(core_api, statefulset)
-
- Deploy a StatefulSet workload that uses a Longhorn volume and has securityContext set:

    securityContext:
      runAsUser: 1000
      runAsGroup: 1000
      fsGroup: 1000

  See https://github.com/longhorn/longhorn/issues/2964#issuecomment-910117570 for an example.
- Wait for the workload pod to be running
- Exec into the workload pod, cd into the mount point of the volume.
- Verify that the mount point has the correct filesystem permissions (e.g., running `ls -l` on the mount point should return permissions in the format *rw*)
- Verify that we can read/write files.
def volume_basic_test(client, volume_name, backing_image='')
def volume_iscsi_basic_test(client, volume_name, backing_image='')
def volume_rw_test(dev)