Module tests.test_basic

Functions

def backup_failed_cleanup(client, core_api, volume_name, volume_size, failed_backup_ttl='3')

Set up the failed backup cleanup.

def backup_labels_test(client, random_labels, volume_name, size='16777216', backing_image='')
def backup_status_for_unavailable_replicas_test(client, volume_name, size, backing_image='')
def backup_test(client, volume_name, size, backing_image='', compression_method='lz4')
def backupstore_test(client, host_id, volname, size, compression_method)
def check_volume_and_snapshot_after_corrupting_volume_metadata_file(client, core_api, volume_name, pod, test_pod_name, data_path1, data_md5sum1, data_path2, snap)

Test volume I/O and take/delete a snapshot

def prepare_data_volume_metafile(client, core_api, volume_name, csi_pv, pvc, pod, pod_make, data_path, test_writing_data=False, writing_data_path='/data/writing_data_file')

Prepare volume and snapshot for volume metafile testing

Setup:

  1. Create a pod using Longhorn volume
  2. Write some data to the volume then get the md5sum
  3. Create a snapshot
  4. Delete the pod and wait for the volume to detach
  5. Pick up a replica on this host and get the replica data path
def prepare_space_usage_for_rebuilding_only_volume(client)
  1. Create a 7Gi volume and attach to the node.
  2. Make a filesystem then mount this volume.
  3. Add this volume as a disk of the node, and disable scheduling for the default disk. (See the filesystem sketch below.)
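
Steps 2 and 3 come down to standard mkfs/mount plumbing against the volume's block device plus a Longhorn disk update; a minimal sketch of the filesystem part, assuming subprocess calls on the node (device and mount paths are illustrative):

```python
import subprocess

# Minimal sketch of steps 2-3's filesystem part: make a filesystem on the
# attached volume's block device and mount it. The device path follows the
# usual /dev/longhorn/<volume> convention; both paths are illustrative.
dev = "/dev/longhorn/rebuild-disk-vol"  # hypothetical volume name
mnt = "/mnt/rebuild-disk"               # hypothetical mount point

subprocess.run(["mkfs.ext4", dev], check=True)
subprocess.run(["mkdir", "-p", mnt], check=True)
subprocess.run(["mount", dev, mnt], check=True)
```
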
def restore_inc_test(client, core_api, volume_name, pod)
def snapshot_prune_and_coalesce_simultaneously(client, volume_name, backing_image)
def snapshot_prune_test(client, volume_name, backing_image)
def snapshot_test(client, volume_name, backing_image)
def test_allow_volume_creation_with_degraded_availability(client, volume_name)

Test Allow Volume Creation with Degraded Availability (API)

Requirements:

  1. Set allow-volume-creation-with-degraded-availability to true.
  2. Set node-level-soft-anti-affinity to false.

Steps (degraded availability):

  1. Disable scheduling for nodes 2 and 3.
  2. Create a volume with three replicas.
    1. The volume should be ready after creation, and the Scheduled condition is true.
    2. One replica is scheduled successfully; the other two fail to schedule.
  3. Enable the scheduling of node 2.
    1. One additional replica of the volume becomes scheduled.
    2. The remaining replica still fails to schedule.
    3. The Scheduled condition is still true.
  4. Attach the volume.
    1. After the volume is attached, the Scheduled condition becomes false.
  5. Write data to the volume.
  6. Detach the volume.
    1. The Scheduled condition should become true.
  7. Reattach the volume to verify the data.
    1. The Scheduled condition should become false.
  8. Enable the scheduling for node 3.
  9. Wait for the Scheduled condition to become true.
  10. Detach and reattach the volume to verify the data.
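
The Scheduled-condition checks that recur in the steps above typically reduce to a couple of client assertions; a minimal sketch, assuming the Longhorn Python client used throughout this suite (the condition-access pattern is illustrative):

```python
# Minimal sketch of the Scheduled-condition checks, assuming the Longhorn
# Python client used by this suite; the attribute-access pattern on
# volume.conditions is illustrative.
volume = client.by_id_volume(volume_name)

assert volume.conditions.scheduled.status == "True"

# Count how many replicas actually landed on a node (steps 2-3).
scheduled_replicas = [r for r in volume.replicas if r.hostId != ""]
assert len(scheduled_replicas) >= 1
```
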

def test_allow_volume_creation_with_degraded_availability_dr(set_random_backupstore, client, core_api, volume_name, csi_pv, pvc, pod, pod_make)

Test Allow Volume Creation with Degraded Availability (Restore)

Requirements:

  1. Set allow-volume-creation-with-degraded-availability to true.
  2. Set node-level-soft-anti-affinity to false.
  3. Create a backup of 800MB.

Steps (DR volume):

  1. Disable scheduling for nodes 2 and 3.
  2. Create a DR volume from the backup with 3 replicas.
    1. The Scheduled condition is false.
    2. Only the replica on node 1 becomes scheduled.
  3. Enable scheduling for nodes 2 and 3.
    1. Replicas are scheduled to nodes 1, 2, and 3 successfully.
    2. Wait for the restore progress to complete.
    3. The Scheduled condition becomes true.
  4. Activate and attach the volume, then verify the data.

def test_allow_volume_creation_with_degraded_availability_error(client, volume_name)

Test Allow Volume Creation with Degraded Availability (API)

Requirements:

  1. Set allow-volume-creation-with-degraded-availability to true.
  2. Set node-level-soft-anti-affinity to false.

Steps (no availability):

  1. Disable scheduling on all nodes.
  2. Create a volume with three replicas.
    1. The volume should be NotReady after creation.
    2. The Scheduled condition should become false.
  3. Attaching the volume should result in an error.
  4. Enable scheduling on one node.
    1. The volume should become Ready soon.
    2. The Scheduled condition should become true.
  5. Attach the volume and write data. Detach and reattach to verify the data.

def test_allow_volume_creation_with_degraded_availability_restore(set_random_backupstore, client, core_api, volume_name, csi_pv, pvc, pod, pod_make)

Test Allow Volume Creation with Degraded Availability (Restore)

Requirements:

  1. Set allow-volume-creation-with-degraded-availability to true.
  2. Set node-level-soft-anti-affinity to false.
  3. Set replica-replenishment-wait-interval to 0.
  4. Create a backup of 800MB.

Steps (restore):

  1. Disable scheduling for nodes 2 and 3.
  2. Restore a volume with 3 replicas.
    1. The Scheduled condition is true.
    2. Only the replica on node 1 becomes scheduled.
  3. Enable scheduling for node 2.
  4. Wait for the restore to complete and the volume to detach automatically. Then check that the Scheduled condition is still true.
  5. Attach and wait for the volume.
    1. 2 replicas are successfully scheduled to nodes 1 and 2; 1 replica cannot be created because node 3 is unschedulable.
    2. The Scheduled condition becomes false.
    3. Verify the data.

def test_attach_without_frontend(client, volume_name)

Test attach in maintenance mode (without frontend)

  1. Create a volume and attach to the current node with enabled frontend
  2. Check volume has blockdev
  3. Write snap1_data into volume and create snapshot snap1
  4. Write more random data into the volume and create another snapshot
  5. Detach the volume and reattach with disabled frontend
  6. Check volume still has blockdev as frontend but no endpoint
  7. Revert back to snap1
  8. Detach and reattach the volume with enabled frontend
  9. Check volume contains data snap1_data
def test_aws_iam_role_arn(client, core_api)

Test AWS IAM Role ARN

  1. Set backup target to S3
  2. Check that the longhorn-manager and aio instance-manager Pods have no 'iam.amazonaws.com/role' annotation
  3. Add AWS_IAM_ROLE_ARN to the secret
  4. Check that those Pods have the 'iam.amazonaws.com/role' annotation matching AWS_IAM_ROLE_ARN in the secret
  5. Update AWS_IAM_ROLE_ARN in the secret
  6. Check that the annotation on those Pods is updated to match the new AWS_IAM_ROLE_ARN
  7. Remove AWS_IAM_ROLE_ARN from the secret
  8. Check that the 'iam.amazonaws.com/role' annotation is removed from those Pods
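
Adding or updating the ARN in the secret (steps 3 and 5) is a small patch through the Kubernetes client; a minimal sketch (the secret name, namespace, and ARN value are illustrative, not the suite's actual fixtures):

```python
import base64

# Minimal sketch of steps 3/5, assuming `core_api` is a
# kubernetes.client.CoreV1Api instance as elsewhere in this suite.
# Secret name, namespace, and ARN value are all illustrative.
role_arn = "arn:aws:iam::123456789012:role/example-role"
patch = {"data": {
    "AWS_IAM_ROLE_ARN": base64.b64encode(role_arn.encode()).decode(),
}}
core_api.patch_namespaced_secret("aws-secret", "longhorn-system", patch)
```
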
def test_backup(set_random_backupstore, client, volume_name)

Test basic backup

Setup:

  1. Create a volume and attach to the current node
  2. Run the test for all the available backupstores.

Steps:

  1. Create a backup of volume
  2. Restore the backup to a new volume
  3. Attach the new volume and make sure the data is the same as the old one
  4. Detach the volume and delete the backup.
  5. Wait for the restored volume's lastBackup to be cleaned (because the backup was removed)
  6. Delete the volume
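
A condensed sketch of the backup-then-restore flow above, assuming the Longhorn Python client used throughout this suite (all waiting/polling helpers are elided; the iteration pattern over backupList() is illustrative):

```python
# Condensed sketch of the backup/restore flow, assuming the Longhorn Python
# client used by this suite; waiting/polling helpers are elided.
volume = client.by_id_volume(volume_name)

snap = volume.snapshotCreate()
volume.snapshotBackup(name=snap.name)          # step 1: back up the snapshot

# Find the finished backup by its source snapshot (iteration pattern is
# illustrative; the suite wraps this lookup in a helper).
backup_volume = client.by_id_backupVolume(volume_name)
backup = next(b for b in backup_volume.backupList()
              if b.snapshotName == snap.name)

# Step 2: restore the backup into a brand-new volume via its URL.
client.create_volume(name="restored-demo", size=volume.size,
                     numberOfReplicas=3, fromBackup=backup.url)
```
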
def test_backup_block_deletion(set_random_backupstore, client, core_api, volume_name)

Test backup block deletion

Context:

We want to make sure that we only delete unreferenced backup blocks, and that we don't delete blocks while other backups are in progress. The reason for this is that we don't yet know which blocks are required by the in-progress backup, so deleting blocks could lead to a faulty backup.

Setup:

  1. Setup minio as S3 backupstore

Steps:

  1. Create a volume and attach to the current node
  2. Write 4 MB to the beginning of the volume (2 x 2MB backup blocks)
  3. Create backup(1) of the volume
  4. Overwrite the first of the backup blocks of data on the volume
  5. Create backup(2) of the volume
  6. Overwrite the first of the backup blocks of data on the volume
  7. Create backup(3) of the volume
  8. Verify the backup block count == 4:
    1. assert volume["DataStored"] == str(BLOCK_SIZE * expected_count)
    2. assert the count of *.blk files for that volume == expected_count
  9. Create an artificial in-progress backup.cfg file containing json.dumps({"Name": name, "VolumeName": volume, "CreatedTime": ""}) (see the sketch after this list)
  10. Delete backup(2)
  11. Verify backup block count == 4 (because of the in progress backup)
  12. Delete the artificial in progress backup.cfg file
  13. Delete backup(1)
  14. Verify backup block count == 2
  15. Delete backup(3)
  16. Verify backup block count == 0
  17. Delete the backup volume
  18. Cleanup the volume
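
Step 9's artificial in-progress backup.cfg could be staged with something like the following sketch; the cfg content comes from the step itself, while the file placement is an assumption (in the real test it is written into the backup volume's backups folder in the S3/NFS backupstore):

```python
import json

# Sketch of the artificial "in progress" backup.cfg content from step 9.
# In the real test this file is placed into the backup volume's backups/
# folder in the backupstore; here we only build and save the file locally.
volume_name = "test-vol"       # hypothetical volume name
name = "backup-artificial"     # hypothetical backup name
cfg = json.dumps({"Name": name, "VolumeName": volume_name, "CreatedTime": ""})

with open(f"backup_{name}.cfg", "w") as f:
    f.write(cfg)
```
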
def test_backup_failed_disable_auto_cleanup(set_random_backupstore, client, core_api, volume_name)

Test that a failed backup is not automatically deleted when auto-cleanup is disabled.

  1. Set the default setting backupstore-poll-interval to 60 (seconds)
  2. Set the default setting failed-backup-ttl to 0
  3. Create a volume and attach to the current node
  4. Create an empty backup to create the backup volume
  5. Write some data to the volume
  6. Create a backup of the volume
  7. Crash all replicas
  8. Wait and check if the backup failed
  9. Wait and check that the backup was not deleted.
  10. Cleanup
def test_backup_failed_enable_auto_cleanup(set_random_backupstore, client, core_api, volume_name)

Test that a failed backup is automatically deleted.

  1. Set the default setting backupstore-poll-interval to 60 (seconds)
  2. Set the default setting failed-backup-ttl to 3 (minutes)
  3. Create a volume and attach to the current node
  4. Create an empty backup to create the backup volume
  5. Write some data to the volume
  6. Create a backup of the volume
  7. Crash all replicas
  8. Wait and check if the backup failed
  9. Wait and check if the backup was deleted automatically
  10. Cleanup
def test_backup_labels(set_random_backupstore, client, random_labels, volume_name)

Test that the proper Labels are applied when creating a Backup manually.

  1. Create a volume
  2. Run the following steps on all backupstores
  3. Create a backup with some random labels
  4. Get backup from backupstore, verify the labels are set on the backups
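
Step 3's labeled backup is a single client call; a minimal sketch, assuming the Longhorn Python client used by this suite (label keys and values are illustrative):

```python
# Minimal sketch of step 3: back up a snapshot with labels attached,
# assuming the Longhorn Python client. Label keys/values are illustrative.
volume = client.by_id_volume(volume_name)
snap = volume.snapshotCreate()
volume.snapshotBackup(name=snap.name,
                      labels={"team": "storage", "run": "nightly"})
```
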
def test_backup_lock_creation_during_deletion(set_random_backupstore, client, core_api, volume_name, csi_pv, pvc, pod_make)

Test backup locks: prevent backup creation during backup deletion.

Context: Test the locking mechanism that utilizes the backupstore to prevent concurrent operations.

Steps:

  1. Create a volume, then create the corresponding PV, PVC and Pod.
  2. Wait for the pod running and the volume healthy.
  3. Write data (DATA_SIZE_IN_MB_4) to the pod volume and get the md5sum.
  4. Take a backup.
  5. Wait for the backup to be completed.
  6. Delete the backup.
  7. Create another backup of the same volume.
  8. The newly created backup should fail because there is a deletion lock.
  9. Wait for the first backup to be deleted.
  10. Create another backup of the same volume.
  11. Wait for the backup to be completed.

def test_backup_lock_deletion_during_backup(set_random_backupstore, client, core_api, volume_name, csi_pv, pvc, pod_make)

Test backup locks: prevent backup deletion while a backup is in progress.

Context: Test the locking mechanism that utilizes the backupstore to prevent concurrent operations.

Steps:

  1. Create a volume, then create the corresponding PV, PVC and Pod.
  2. Wait for the pod running and the volume healthy.
  3. Write data to the pod volume and get the md5sum.
  4. Take a backup.
  5. Wait for the backup to be completed.
  6. Write more data into the volume and compute the md5sum.
  7. Take another backup of the volume.
  8. While the backup is in progress, delete the older backup.
  9. Wait for the in-progress backup creation to be completed.
  10. Check the backup store; there should be 1 backup.
  11. Restore the latest backup.
  12. Wait for the restoration to be completed. Assert the md5sum from step 6.

def test_backup_lock_deletion_during_restoration(set_random_backupstore, client, core_api, volume_name, csi_pv, pvc, pod_make)

Test backup locks: prevent backup deletion during backup restoration.

Context: Test the locking mechanism that utilizes the backupstore to prevent concurrent operations.

Steps:

  1. Create a volume, then create the corresponding PV, PVC and Pod.
  2. Wait for the pod running and the volume healthy.
  3. Write data to the pod volume and get the md5sum.
  4. Take a backup.
  5. Wait for the backup to be completed.
  6. Start restoring the created backup.
  7. Wait for the restoration to be in progress.
  8. Delete the backup from the backup store.
  9. Wait for the restoration to be completed.
  10. Assert the data of the restored volume with the md5sum.
  11. Assert that the backup count in the backup store is 0.

def test_backup_lock_restoration_during_deletion(set_random_backupstore, client, core_api, volume_name, csi_pv, pvc, pod_make)

Test backup locks: prevent backup restoration during backup deletion.

Context: Test the locking mechanism that utilizes the backupstore to prevent concurrent operations.

Steps:

  1. Create a volume, then create the corresponding PV, PVC and Pod.
  2. Wait for the pod running and the volume healthy.
  3. Write data to the pod volume and get the md5sum.
  4. Take a backup.
  5. Wait for the backup to be completed.
  6. Write more data (1.5 Gi) to the volume and take another backup.
  7. Wait for the 2nd backup to be completed.
  8. Delete the 2nd backup.
  9. Without waiting for the backup deletion to complete, restore the 1st backup from the backup store.
  10. Verify the restored volume becomes faulted.
  11. Wait for the 2nd backup deletion and assert that the backup count in the backup store is 1.

def test_backup_metadata_deletion(set_random_backupstore, client, core_api, volume_name)

Test backup metadata deletion

Context:

We want to be able to delete the metadata (.cfg) files, even if they are corrupt or in a bad state (missing volume.cfg).

Setup:

  1. Setup minio as S3 backupstore
  2. Cleanup backupstore

Steps:

  1. Create volume(1,2) and attach to the current node
  2. Write some data to volume(1,2)
  3. Create backup(1,2) of volume(1,2)
  4. Request a backup list
  5. Verify the backup list contains no error messages for volume(1,2)
  6. Verify the backup list contains backup(1,2) information for volume(1,2)
  7. Delete backup(1) of volume(1,2)
  8. Request a backup list
  9. Verify the backup list contains no error messages for volume(1,2)
  10. Verify the backup list only contains backup(2) information for volume(1,2)
  11. Delete volume.cfg of volume(2)
  12. Request backup volume deletion for volume(2)
  13. Verify that volume(2) has been deleted in the backupstore
  14. Request a backup list
  15. Verify the backup list only contains volume(1) and no errors
  16. Verify the backup list only contains backup(2) information for volume(1)
  17. Delete backup volume(1)
  18. Verify that volume(1) has been deleted in the backupstore
  19. Cleanup
def test_backup_status_for_unavailable_replicas(set_random_backupstore, client, volume_name)

Test backup status for unavailable replicas

Context:

We want to make sure that during backup creation, once the responsible replica is gone, the backup enters the Error state with an error message.

Setup:

  1. Create a volume and attach to the current node
  2. Run the test for all the available backupstores

Steps:

  1. Create a backup of volume
  2. Find the replica for that backup
  3. Disable scheduling on the node of that replica
  4. Delete the replica
  5. Verify backup status with Error state and with an error message
  6. Create a new backup
  7. Verify new backup was successful
  8. Cleanup (delete backups, delete volume)
def test_backup_volume_list(set_random_backupstore, client, core_api)

Test backup volume list.

Context:

We want to make sure that an error when listing a single backup volume does not stop us from listing all the other backup volumes. Otherwise a single faulty backup could block the retrieval of all known backup volumes.

Setup:

  1. Setup minio as S3 backupstore

Steps:

  1. Create volume(1,2) and attach to the current node
  2. Write some data to volume(1,2)
  3. Create backup(1) of volume(1,2)
  4. Request a backup list
  5. Verify the backup list contains no error messages for volume(1,2)
  6. Verify the backup list contains backup(1) for volume(1,2)
  7. Place a file named "backup_1234@failure.cfg" into the backups folder of volume(1)
  8. Request a backup list
  9. Verify the backup list contains no error messages for volume(1,2)
  10. Verify the backup list contains backup(1) for volume(1,2)
  11. Delete backup volumes (1 & 2)
  12. Cleanup

def test_backup_volume_restore_with_access_mode(core_api, set_random_backupstore, client, access_mode, overridden_restored_access_mode)

Test backing up a volume with its access mode, then restoring a volume with either the original access mode or an overridden one.

  1. Prepare a healthy volume
  2. Create a backup for the volume
  3. Restore a volume from the backup without specifying the access mode => validate the access mode is the same as the volume's
  4. Restore a volume from the backup with the access mode specified => validate the access mode is the same as the specified one
def test_backuptarget_available_during_engine_image_not_ready(client, apps_api)

Test backup target available during engine image not ready

  1. Set backup target URL to S3 and NFS respectively
  2. Set poll interval to 0 and 300 respectively
  3. Scale down the engine image DaemonSet
  4. Check engine image in deploying state
  5. Configure the backup target while the engine image is in a not-ready state
  6. Check backup target status.available=false
  7. Scale up the engine image DaemonSet
  8. Check backup target status.available=true
  9. Reset backup target setting
  10. Check backup target status.available=false
def test_backuptarget_invalid(apps_api, client, core_api, backupstore_invalid, make_deployment_with_pvc, pvc_name, request, volume_name)

Related issue : https://github.com/longhorn/longhorn/issues/1249

This test case does not cover the UI test mentioned in the related issue's test steps.

Setup: Give an incorrect value to the backup target.

Given: Create a volume, attach it to a workload, and write data into the volume.

When: Create a backup via a manifest YAML file.

Then: The backup fails and the backup state is Error. The backup target becomes unavailable with an explanatory condition.

def test_cleanup_system_generated_snapshots(client, core_api, volume_name, csi_pv, pvc, pod_make)

Test Cleanup System Generated Snapshots

  1. Enable the setting 'Auto Cleanup System Generated Snapshot'.
  2. Create a volume and attach it to a node.
  3. Write some data to the volume and get the checksum of the data.
  4. Delete a random replica to trigger a system generated snapshot.
  5. Repeat Step 3 three times, and make sure only one snapshot is left.
  6. Check the data with the saved checksum.
def test_default_storage_class_syncup(core_api, request)

Steps:

  1. Record the current Longhorn-StorageClass-related ConfigMap longhorn-storageclass.
  2. Modify the default Longhorn StorageClass longhorn, e.g., update reclaimPolicy from Delete to Retain.
  3. Verify that the change is reverted immediately and the manifest is the same as the record in ConfigMap longhorn-storageclass.
  4. Delete the default Longhorn StorageClass longhorn.
  5. Verify that the StorageClass is recreated immediately with the manifest the same as the record in ConfigMap longhorn-storageclass.
  6. Modify the content of ConfigMap longhorn-storageclass.
  7. Verify that the modifications are applied to the default Longhorn StorageClass longhorn immediately.
  8. Revert the modifications of the ConfigMap, then wait for the StorageClass to sync up.
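
Steps 1 and 6 revolve around the longhorn-storageclass ConfigMap; reading the recorded manifest back might look like this sketch, assuming the Kubernetes Python client and the usual longhorn-system namespace (the storageclass.yaml data key is an assumption about the ConfigMap layout):

```python
import yaml
from kubernetes import client as k8s_client, config

# Sketch of reading the StorageClass manifest recorded in the ConfigMap.
# The namespace and the "storageclass.yaml" data key are assumptions.
config.load_kube_config()
core_api = k8s_client.CoreV1Api()

cm = core_api.read_namespaced_config_map("longhorn-storageclass",
                                         "longhorn-system")
recorded = yaml.safe_load(cm.data["storageclass.yaml"])
print(recorded["reclaimPolicy"])  # e.g. "Delete"
```
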

def test_delete_backup_during_restoring_volume(set_random_backupstore, client)

Test delete backup during restoring volume

Context:

The volume's robustness should become faulted if the backup is deleted while the volume is being restored.

  1. Given: Create volume v1, attach it to a node, and write 150M of data to volume v1.
  2. When: Create a backup of volume v1, wait for the backup to complete, restore a new volume v2 from the backup, and delete the backup immediately.
  3. Then: Volume v2's "robustness" should be "faulted", the "status" of the volume restore condition should be "False", and the "reason" of the volume restore condition should be "RestoreFailure".
def test_deleting_backup_volume(set_random_backupstore, client, volume_name)

Test deleting backup volumes

  1. Create volume and create backup
  2. Delete the backup and make sure it's gone in the backupstore
def test_dr_volume_activated_with_failed_replica(set_random_backupstore, client, core_api, volume_name)

Test DR volume activated with a failed replica

Context:

Make sure that DR volume could be activated as long as there is a ready replica.

Steps:

  1. Create a volume and attach to a node.
  2. Create a backup of the volume with writing some data.
  3. Create a DR volume from the backup.
  4. Disable the replica rebuilding.
  5. Enable the setting allow-volume-creation-with-degraded-availability
  6. Make a replica failed.
  7. Activate the DR volume.
  8. Enable the replica rebuilding.
  9. Attach the volume to a node.
  10. Check if data is correct.
def test_dr_volume_with_backup_and_backup_volume_deleted(set_random_backupstore, client, core_api, volume_name)

Test DR volume can be activated after delete all backups.

Context:

We want to make sure that a DR volume can be activated after deleting some/all backups or the backup volume.

Steps:

  1. Create a volume and attach to the current node.
  2. Write 4 MB to the beginning of the volume (2 x 2MB backup blocks).
  3. Create backup(0) then backup(1) for the volume.
  4. Verify backup block count == 4.
  5. Create DR volume(1) and DR volume(2) from backup(1).
  6. Verify DR volumes last backup is backup(1).
  7. Delete backup(1).
  8. Verify backup block count == 2.
  9. Verify DR volumes last backup becomes backup(0).
  10. Activate and verify DR volume(1) data is data(0).
  11. Delete backup(0).
  12. Verify backup block count == 0.
  13. Verify DR volume last backup is empty.
  14. Delete the backup volume.
  15. Activate and verify DR volume data is data(0).
def test_dr_volume_with_backup_block_deletion(set_random_backupstore, client, core_api, volume_name)

Test DR volume last backup after block deletion.

Context:

We want to make sure that when a block is deleted, the DR volume picks up the correct last backup.

Steps:

  1. Create a volume and attach to the current node.
  2. Write 4 MB to the beginning of the volume (2 x 2MB backup blocks).
  3. Create backup(0) of the volume.
  4. Overwrite backup(0) 1st blocks of data on the volume. (Since backup(0) contains 2 blocks of data, the updated data is data1["content"] + data0["content"][BACKUP_BLOCK_SIZE:])
  5. Create backup(1) of the volume.
  6. Verify backup block count == 3.
  7. Create DR volume from backup(1).
  8. Verify DR volume last backup is backup(1).
  9. Delete backup(1).
  10. Verify backup block count == 2.
  11. Verify DR volume last backup is backup(0).
  12. Overwrite backup(0) 1st blocks of data on the volume. (Since backup(0) contains 2 blocks of data, the updated data is data2["content"] + data0["content"][BACKUP_BLOCK_SIZE:])
  13. Create backup(2) of the volume.
  14. Verify DR volume last backup is backup(2).
  15. Activate and verify DR volume data is data2["content"] + data0["content"][BACKUP_BLOCK_SIZE:].
def test_dr_volume_with_backup_block_deletion_abort_during_backup_in_progress(set_random_backupstore, client, core_api, volume_name)

Test the DR volume's last backup after an aborted block deletion, which temporarily sets the last backup to empty.

Context:

We want to make sure that when the block deletion for the last backup is aborted by operations such as backups in progress, the DR volume will still pick up the correct last backup.

Steps:

  1. Create a volume and attach to the current node.
  2. Write 4 MB to the beginning of the volume (2 x 2MB backup blocks).
  3. Create backup(0) of the volume.
  4. Overwrite backup(0) 1st blocks of data on the volume. (Since backup(0) contains 2 blocks of data, the updated data is data1["content"] + data0["content"][BACKUP_BLOCK_SIZE:])
  5. Create backup(1) of the volume.
  6. Verify backup block count == 3.
  7. Create DR volume from backup(1).
  8. Verify DR volume last backup is backup(1).
  9. Create an artificial in progress backup.cfg file. This cfg file will convince the longhorn manager that there is a backup being created. Then all subsequent backup block cleanup will be skipped.
  10. Delete backup(1).
  11. Verify backup block count == 3 (because of the in progress backup).
  12. Verify DR volume last backup is empty.
  13. Delete the artificial in progress backup.cfg file.
  14. Overwrite backup(0) 1st blocks of data on the volume. (Since backup(0) contains 2 blocks of data, the updated data is data2["content"] + data0["content"][BACKUP_BLOCK_SIZE:])
  15. Create backup(2) of the volume.
  16. Verify DR volume last backup is backup(2).
  17. Activate and verify DR volume data is data2["content"] + data0["content"][BACKUP_BLOCK_SIZE:].
def test_engine_image_daemonset_restart(client, apps_api, volume_name)

Test restarting engine image daemonset

  1. Get the default engine image
  2. Create a volume and attach to the current node
  3. Write random data to the volume and create a snapshot
  4. Delete the engine image daemonset
  5. Engine image daemonset should be recreated
  6. In the meantime, validate the volume data to prove it's still functional
  7. Wait for the engine image to become ready again
  8. Check the volume data again.
  9. Write some data and create a new snapshot.
    1. Creating a snapshot requires the engine image binary.
  10. Check the volume data again
def test_expand_pvc_with_size_round_up(client, core_api, volume_name)

Test expanding a Longhorn volume with a PVC

  1. Create LHV,PV,PVC with size '1Gi'
  2. Attach, write data, and detach
  3. Expand the volume size to '2000000000/2G' and check that the size rounds up to '2000683008/1908Mi' (see the arithmetic check after this list)
  4. Attach, write data, and detach
  5. Expand the volume size to '2Gi' and check that the size is '2147483648'
  6. Attach, write data, and detach
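
The round-up in step 3 is consistent with ceiling the requested size to a 1 MiB boundary; a quick arithmetic check (the MiB-granularity rule is inferred from the numbers above, not a statement of Longhorn's documented algorithm):

```python
import math

# Check that 2000000000 bytes rounds up to 2000683008 (1908 MiB), and that
# 2Gi is already aligned. The 1 MiB rounding rule is an inference from the
# step 3/5 numbers, not Longhorn's documented algorithm.
MIB = 1 << 20

def round_up_to_mib(size_bytes: int) -> int:
    return math.ceil(size_bytes / MIB) * MIB

assert round_up_to_mib(2_000_000_000) == 2_000_683_008   # 1908 * MIB
assert round_up_to_mib(2 * 1024**3) == 2_147_483_648     # already aligned
```
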
def test_expansion_basic(client, volume_name)

Test volume expansion using Longhorn API

  1. Create volume and attach to the current node
  2. Generate data snap1_data and write it to the volume
  3. Create snapshot snap1
  4. Online expand the volume
  5. Verify the volume has been expanded
  6. Generate data snap2_data and write it to the volume
  7. Create snapshot snap2
  8. Generate data snap3_data and write it after the original size
  9. Create snapshot snap3 and verify snap3_data at its location
  10. Detach and reattach the volume.
  11. Verify the volume is still expanded, and snap3_data remain valid
  12. Detach the volume.
  13. Reattach the volume in maintenance mode
  14. Revert to snap2 and detach.
  15. Attach the volume and check data snap2_data
  16. Generate snap4_data and write it after the original size
  17. Create snapshot snap4 and verify snap4_data.
  18. Detach the volume and revert to snap1
  19. Validate snap1_data

TODO: Add offline expansion

def test_expansion_canceling(client, core_api, volume_name, pod, pvc, storage_class)

Test expansion canceling

  1. Create a volume, then create the corresponding PV, PVC and Pod.
  2. Generate test_data and write to the pod
  3. Create an empty directory at the expansion snapshot's temporary meta file path so that the following offline expansion will fail
  4. Delete the pod and wait for volume detachment
  5. Try offline expansion via Longhorn API
  6. Wait for expansion failure then use Longhorn API to cancel it
  7. Create a new pod and validate the volume content
  8. Create an empty directory at the expansion snapshot's temporary meta file path so that the following online expansion will fail
  9. Try online expansion via Longhorn API
  10. Wait for expansion failure then use Longhorn API to cancel it
  11. Validate the volume content again, then re-write random data to the pod
  12. Retry online expansion, then verify the expansion done via Longhorn API
  13. Validate the volume content, then check if data writing looks fine
  14. Clean up pod, PVC, and PV
def test_expansion_with_scheduling_failure(client, core_api, volume_name, pod, pvc, storage_class)

Test that a running volume with a scheduling failure can be expanded after detachment.

Prerequisite: Setting "soft anti-affinity" is false.

  1. Create a volume, then create the corresponding PV, PVC and Pod.
  2. Wait for the pod running and the volume healthy.
  3. Write data to the pod volume and get the md5sum.
  4. Disable scheduling for a node that contains a running replica.
  5. Crash the replica on the scheduling-disabled node for the volume, then delete the failed replica so that it won't be reused.
  6. Wait for the scheduling failure.
  7. Verify:
    1. volume.ready == True
    2. volume.conditions[scheduled].status == False
    3. the volume is Degraded
    4. the new replica cannot be created
  8. Write more data to the volume and get the md5sum
  9. Delete the pod and wait for the volume to detach.
  10. Verify:
    1. volume.ready == True
    2. volume.conditions[scheduled].status == True
  11. Expand the volume and wait for the expansion to succeed.
  12. Verify there is no rebuild replica after the expansion.
  13. Recreate a new pod for the volume and wait for the pod running.
  14. Validate the volume content.
  15. Verify the expanded part can be read/written correctly.
  16. Enable the node scheduling.
  17. Wait for the volume rebuild to succeed.
  18. Verify the data written in the expanded part.
  19. Clean up pod, PVC, and PV.

Note that steps 1 through 10 are identical to those of test_running_volume_with_scheduling_failure().

def test_expansion_with_size_round_up(client, core_api, volume_name)

Test expanding a Longhorn volume (offline and online)

  1. Create and attach longhorn volume with size '1Gi'.
  2. Write data, and offline expand volume size to '2000000000/2G'.
  3. Check if size round up '2000683008' and the written data.
  4. Write data, and online expand volume size to '2Gi'.
  5. Check if size round up '2147483648' and the written data.
def test_filesystem_trim(client, fs_type)

Test the filesystem in the volume can be trimmed correctly.

  1. Create a volume with option unmapMarkSnapChainRemoved enabled, then attach to the current node.
  2. Make a filesystem and write file0 into the fs, calculate the checksum, then take snap0.
  3. Write file21 and calculate the checksum. Then take snap21.
  4. Unmount then reattach the volume without frontend. Revert the volume to snap0.
  5. Reattach and mount the volume.
  6. Write file11. Then take snap11.
  7. Write file12. Then take snap12.
  8. Write file13. Then remove file0, file11, file12, and file13. Verify the snapshots and volume head size are not shrunk.
  9. Do filesystem trim (via Longhorn API or cmdline). Verify that:
    1. snap11 and snap12 are marked as removed.
    2. snap11, snap12, and volume head size are shrunk.
  10. Disable option unmapMarkSnapChainRemoved for the volume.
  11. Write file14. Then take snap14.
  12. Write file15. Then remove file14 and file15. Verify that:
    1. snap14 is not marked as removed and its size is not changed.
    2. volume head size is shrunk.
  13. Unmount and reattach the volume. Then revert to snap21.
  14. Reattach and mount the volume. Verify the file0 and file21.
  15. Cleanup.
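
Step 9's trim can be driven from the host once the filesystem is mounted; a minimal sketch using the standard fstrim tool (the mount point is illustrative, and the suite may instead trigger trimming through the Longhorn API):

```python
import subprocess

# Minimal sketch of step 9: trim the mounted filesystem from the command
# line. The mount point is illustrative; the test may instead use the
# Longhorn API's trim action.
mount_point = "/mnt/trim-demo"  # hypothetical mount point
subprocess.run(["fstrim", "-v", mount_point], check=True)
```
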
def test_hosts(client)

Check node name and IP

def test_listing_backup_volume(client, backing_image='')

Test listing backup volumes

  1. Create three volumes: volume1/2/3
  2. Setup NFS backupstore since we can manipulate the content easily
  3. Create multiple snapshots for all three volumes
  4. Rename volume1's volume.cfg to volume.cfg.tmp in backupstore
  5. List the backup volumes. Make sure volume1 errors out but the other two are found
  6. Restore volume1's volume.cfg.
  7. Make sure now backup volume volume1 can be found
  8. Delete backups for volume1/2, make sure they cannot be found later
  9. Corrupt a backup.cfg on volume3
  10. Check that the backup is listed with the other backups of volume3
  11. Verify that the corrupted backup has Messages of type error
  12. Check that backup inspection for the previously corrupted backup fails
  13. Delete backups for volume3, make sure they cannot be found later
def test_multiple_volumes_creation_with_degraded_availability(set_random_backupstore, client, core_api, apps_api, storage_class, statefulset)

Scenario: verify multiple volumes with degraded availability can be created, attached, detached, and deleted at nearly the same time.

Given a new StorageClass created with numberOfReplicas=5.

When allow-volume-creation-with-degraded-availability is set to True and this StatefulSet is deployed: https://github.com/longhorn/longhorn/issues/2073#issuecomment-742948726. Then all 10 volumes are healthy within 1 minute.

When the StatefulSet is deleted, then all 10 volumes are detached within 1 minute.

When the PVCs of the 10 volumes are found and deleted, then all 10 volumes are deleted within 1 minute.

def test_pvc_storage_class_name_from_backup_volume(set_random_backupstore, core_api, client, volume_name, pvc_name, pvc, pod_make, storage_class)

Test that the storageClassName of the restored volume's PV/PVC comes from the backup volume

Given: Create a new StorageClass:

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: longhorn-test
    provisioner: driver.longhorn.io
    allowVolumeExpansion: true
    reclaimPolicy: Delete
    volumeBindingMode: Immediate
    parameters:
      numberOfReplicas: "3"

Create a PVC that uses this StorageClass:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: test-pvc
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: longhorn-test
      resources:
        requests:
          storage: 300Mi

Attach the volume and write some data.

When: Back up the volume.

Then: The backup volume's status.storageClassName should be longhorn-test.

When: Restore the backup to a new volume, then create a PV/PVC from the new volume with the create-new-PVC option.

Then: The new PVC's storageClassName should still be longhorn-test. Verify the restored data is the same as the original.

def test_restore_basic(set_random_backupstore, client, core_api, volume_name, pod)

Steps:

  1. Create a volume and attach it to a pod.
  2. Write some data into the volume and compute the checksum m1.
  3. Create a backup b1.
  4. Write some more data into the volume and compute the checksum m2.
  5. Create a backup b2.
  6. Delete all the data from the volume.
  7. Write some more data into the volume and compute the checksum m3.
  8. Create a backup b3.
  9. Restore backup b1 and verify the data with m1.
  10. Restore backup b2 and verify the data with m1 and m2.
  11. Restore backup b3 and verify the data with m3.
  12. Delete the backup b2.
  13. Restore the backup b3 and verify the data with m3.

def test_restore_inc(set_random_backupstore, client, core_api, volume_name, pod)

Test restore from disaster recovery volume (incremental restore)

Run test against all the backupstores

  1. Create a volume and attach to the current node
  2. Generate data0, write to the volume, make a backup backup0
  3. Create three DR(standby) volumes from the backup: sb_volume0/1/2
  4. Wait for all three DR volumes to start the initial restoration
  5. Verify the DR volumes' lastBackup is backup0
  6. Verify that creating snapshots/PVs/PVCs and changing the backup target are not allowed as long as the DR volume exists
  7. Activate standby sb_volume0 and attach it to check the volume data
  8. Generate data1 and write to the original volume and create backup1
  9. Make sure sb_volume1's lastBackup field has been updated to backup1
  10. Wait for sb_volume1 to finish incremental restoration then activate
  11. Attach and check sb_volume1's data
  12. Generate data2 and write to the original volume and create backup2
  13. Make sure sb_volume2's lastBackup field has been updated to backup2
  14. Wait for sb_volume2 to finish incremental restoration then activate
  15. Attach and check sb_volume2's data
  16. Create PV, PVC and Pod to use sb_volume2, check PV/PVC/POD are good

FIXME: Step 16 works because the disk will be treated as an unformatted disk

def test_restore_inc_with_offline_expansion(set_random_backupstore, client, core_api, volume_name, pod)

Test restore from disaster recovery volume with volume offline expansion

Run the test against a random backupstore

  1. Create a volume and attach to the current node
  2. Generate data0, write to the volume, make a backup backup0
  3. Create three DR(standby) volumes from the backup: dr_volume0/1/2
  4. Wait for all three DR volumes to start the initial restoration
  5. Verify the DR volumes' lastBackup is backup0
  6. Verify that creating snapshots/PVs/PVCs and changing the backup target are not allowed as long as the DR volume exists
  7. Activate standby dr_volume0 and attach it to check the volume data
  8. Expand the original volume. Make sure the expansion is successful.
  9. Generate data1 and write to the original volume and create backup1
  10. Make sure dr_volume1's lastBackup field has been updated to backup1
  11. Activate dr_volume1 and check data data0 and data1
  12. Generate data2 and write to the original volume after original SIZE
  13. Create backup2
  14. Wait for dr_volume2 to finish the expansion and show backup2 as the latest backup
  15. Activate dr_volume2 and verify data2
  16. Detach dr_volume2
  17. Create PV, PVC and Pod to use dr_volume2, check PV/PVC/POD are good

FIXME: Step 16 works because the disk will be treated as an unformatted disk

def test_running_volume_with_scheduling_failure(client, core_api, volume_name, pod)

Test that the running volume still works fine when there is a replica with a scheduling failure

Prerequisite: Setting "soft anti-affinity" is false. Setting "replica-replenishment-wait-interval" is 0

  1. Create a volume, then create the corresponding PV, PVC and Pod.
  2. Wait for the pod running and the volume healthy.
  3. Write data to the pod volume and get the md5sum.
  4. Disable scheduling for a node that contains a running replica.
  5. Crash the replica on the scheduling-disabled node for the volume.
  6. Wait for the scheduling failure.
  7. Verify:
    1. volume.ready == True
    2. volume.conditions[scheduled].status == False
    3. the volume is Degraded
    4. the new replica cannot be created
  8. Write more data to the volume and get the md5sum
  9. Delete the pod and wait for the volume to detach.
  10. Verify:
    1. volume.ready == True
    2. volume.conditions[scheduled].status == True
  11. Recreate a new pod for the volume and wait for the pod running.
  12. Validate the volume content, then check if data writing looks fine.
  13. Clean up pod, PVC, and PV.
def test_setting_default_replica_count(client, volume_name)

Test Default Replica Count setting

  1. Set default replica count in the global settings to 5
  2. Create a volume without specifying the replica count
  3. The volume should have 5 replicas (instead of the previous default 3)
def test_settings(client)

Check input for settings

def test_snapshot(client, volume_name, backing_image='')

Test snapshot operations

  1. Create a volume and attach to the node
  2. Create the empty snapshot snap1
  3. Generate and write data snap2_data, then create snap2
  4. Generate and write data snap3_data, then create snap3
  5. List snapshot. Validate the snapshot chain relationship
  6. Mark snap3 as removed. Make sure volume's data didn't change
  7. List snapshot. Make sure snap3 is marked as removed
  8. Detach and reattach the volume in maintenance mode.
  9. Make sure the volume frontend is still blockdev but disabled
  10. Revert to snap2
  11. Detach and reattach the volume with frontend enabled
  12. Make sure volume's data is snap2_data
  13. List snapshot. Make sure volume-head is now snap2's child
  14. Delete snap1 and snap2
  15. Purge the snapshot.
  16. List the snapshot, make sure snap1 and snap3 are gone. snap2 is marked as removed.
  17. Check volume data, make sure it's still snap2_data.
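
The snapshot operations exercised above map onto individual Longhorn client actions; a condensed sketch, assuming the Longhorn Python client used by this suite (data writes and all waiting/validation steps are elided):

```python
# Condensed sketch of the snapshot actions above, assuming the Longhorn
# Python client from this suite; data writes and waits are elided.
volume = client.by_id_volume(volume_name)

snap1 = volume.snapshotCreate()          # step 2: empty snapshot
snap2 = volume.snapshotCreate()          # step 3 (after writing snap2_data)
snap3 = volume.snapshotCreate()          # step 4 (after writing snap3_data)

volume.snapshotDelete(name=snap3.name)   # step 6: mark snap3 as removed

# Steps 8-10: reverting requires the maintenance-mode (frontend-disabled)
# attach first.
volume.snapshotRevert(name=snap2.name)

volume.snapshotDelete(name=snap1.name)   # step 14
volume.snapshotDelete(name=snap2.name)
volume.snapshotPurge()                   # step 15
```
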
def test_snapshot_prune(client, volume_name, backing_image='')

Test that removing the snapshot directly behind the volume head triggers snapshot pruning. Snapshot pruning means removing from the snapshot the parts that overlap with the volume head's content.

  1. Create a volume and attach to the node
  2. Generate and write data snap1_data, then create snap1
  3. Generate and write data snap2_data with the same offset.
  4. Mark snap1 as removed. Make sure volume's data didn't change. But all data of the snap1 will be pruned.
  5. Detach and expand the volume, then wait for the expansion done. This will implicitly create a new snapshot snap2.
  6. Attach the volume. Make sure there is a system snapshot with the old size.
  7. Generate and write data snap3_data which is partially overlapped with snap2_data, plus one extra data chunk in the expanded part.
  8. Mark snap2 as removed then do snapshot purge. Make sure volume's data didn't change. But the overlapping part of snap2 will be pruned.
  9. Create snap3.
  10. Do snapshot purge for the volume. Make sure snap2 will be removed.
  11. Generate and write data snap4_data which has no overlapping with snap3_data.
  12. Mark snap3 as removed. Make sure volume's data didn't change. But there is no change for snap3.
  13. Create snap4.
  14. Generate and write data snap5_data, then create snap5.
  15. Detach and reattach the volume in maintenance mode.
  16. Make sure the volume frontend is still blockdev but disabled
  17. Revert to snap4
  18. Detach and reattach the volume with frontend enabled
  19. Make sure volume's data is correct.
  20. List snapshot. Make sure volume-head is now snap4's child
def test_snapshot_prune_and_coalesce_simultaneously(client, volume_name, backing_image='')

Test that pruning the snapshot directly behind the volume head is handled only after all snapshot coalescing is done.

  1. Create a volume and attach to the node
  2. Generate and write 1st data chunk snap1_data, then create snap1
  3. Generate and write 2nd data chunk snap2_data, then create snap2
  4. Generate and write 3rd data chunk snap3_data, then create snap3
  5. Generate and write 4th data chunk snap4_data, then create snap4
  6. Overwrite all existing data chunks in the volume head.
  7. Mark all snapshots as Removed, then start snapshot purge and wait for complete.
  8. List snapshot. Make sure there are only 2 snapshots left: volume-head and snap4. And snap4 is an empty snapshot.
  9. Make sure volume's data is correct.
def test_space_usage_for_rebuilding_only_volume(client, volume_name, request)

Test case: the normal scenario.

  1. Prepare a 7Gi volume as a node disk.
  2. Create a new volume with a 3Gi spec size.
  3. Write 3Gi of data (using dd) to the volume.
  4. Take a snapshot, then mark this snapshot as Removed. (This snapshot won't be deleted immediately.)
  5. Write 3Gi of data (using dd) to the volume again.
  6. Delete a random replica to trigger rebuilding.
  7. Wait for the rebuilding to complete, and verify the volume's actual size is no greater than 2x its spec size.
  8. Delete the volume.

def test_space_usage_for_rebuilding_only_volume_worst_scenario(client, volume_name, request)

Test case: the worst scenario.

  1. Prepare a 7Gi volume as a node disk.
  2. Create a new volume with a 2Gi spec size.
  3. Write 2Gi of data (using dd) to the volume.
  4. Take a snapshot, then mark this snapshot as Removed. (This snapshot won't be deleted immediately.)
  5. Write 2Gi of data (using dd) to the volume again.
  6. Delete a random replica to trigger rebuilding.
  7. Write 2Gi of data once the rebuilding is triggered (the new replica is created).
  8. Wait for the rebuilding to complete, and verify the volume's actual size is no greater than 3x its spec size.
  9. Delete the volume.

def test_storage_class_from_backup(set_random_backupstore, volume_name, pvc_name, storage_class, client, core_api, pod_make)

Test restore backup using StorageClass

  1. Create volume and PV/PVC/POD
  2. Write test_data into pod
  3. Create a snapshot and back it up. Get the backup URL
  4. Create a new StorageClass longhorn-from-backup and set backup URL.
  5. Use longhorn-from-backup to create a new PVC
  6. Wait for the volume to be created and complete the restoration.
  7. Create the pod using the PVC. Verify the data
def test_volume_backup_and_restore_with_gzip_compression_method(client, set_random_backupstore, volume_name)

Scenario: test volume backup and restore with the "gzip" compression method

Issue: https://github.com/longhorn/longhorn/issues/5189

Given setup Backup Compression Method is "gzip" And setup backup concurrent limit is "4" And setup restore concurrent limit is "4"

When create a volume and attach to the current node And get the volume's details Then verify the volume's compression method is "gzip"

Then Create a backup of volume And Write volume random data Then restore the backup to a new volume And Attach the new volume and verify the data integrity And Detach the volume and delete the backup And Wait for the restored volume's lastBackup to be cleaned (due to remove the backup) And Delete the volume

def test_volume_backup_and_restore_with_lz4_compression_method(client, set_random_backupstore, volume_name)

Scenario: test volume backup and restore with the "lz4" compression method

Issue: https://github.com/longhorn/longhorn/issues/5189

Given setup Backup Compression Method is "lz4" And setup backup concurrent limit is "4" And setup restore concurrent limit is "4"

When create a volume and attach to the current node And get the volume's details Then verify the volume's compression method is "lz4"

Then Create a backup of volume And Write volume random data Then restore the backup to a new volume And Attach the new volume and verify the data integrity Then Detach the volume and delete the backup And Wait for the restored volume's lastBackup to be cleaned (due to remove the backup) And Delete the volume

def test_volume_backup_and_restore_with_none_compression_method(client, set_random_backupstore, volume_name)

Scenario: test volume backup and restore with the "none" compression method

Issue: https://github.com/longhorn/longhorn/issues/5189

Given setup Backup Compression Method is "none" And setup backup concurrent limit is "4" And setup restore concurrent limit is "4"

When create a volume and attach to the current node And get the volume's details Then verify the volume's compression method is "none"

Then Create a backup of volume And Write volume random data Then restore the backup to a new volume And Attach the new volume and verify the data integrity And Detach the volume and delete the backup And Wait for the restored volume's lastBackup to be cleaned (due to remove the backup) And Delete the volume

def test_volume_basic(client, volume_name)

Test basic volume operations:

  1. Check volume name and parameter
  2. Create a volume and attach to the current node, then check volume states
  3. Check soft anti-affinity rule
  4. Write then read back to check volume data
def test_volume_iscsi_basic(client, volume_name)

Test basic volume operations with iscsi frontend

  1. Create and attach a volume with iscsi frontend
  2. Check the volume endpoint and connect it using the iscsi initiator on the node.
  3. Write then read back volume data for validation
def test_volume_metafile_deleted(client, core_api, volume_name, csi_pv, pvc, pod, pod_make)

Scenario:

Test volume should still work when the volume meta file is removed in the replica data path.

Steps:

  1. Delete volume meta file in this replica data path
  2. Recreate the pod and wait for the volume attached
  3. Check if the volume is Healthy after the volume attached
  4. Check volume data
  5. Check if the volume still works fine by r/w data and creating/removing snapshots
def test_volume_metafile_deleted_when_writing_data(client, core_api, volume_name, csi_pv, pvc, pod, pod_make)

Scenario:

While writing data, test volume should still work when the volume meta file is deleted in the replica data path.

Steps:

  1. Create a pod using Longhorn volume
  2. Delete volume meta file in this replica data path
  3. Recreate the pod and wait for the volume attached
  4. Check if the volume is Healthy after the volume attached
  5. Check volume data
  6. Check if the volume still works fine by r/w data and creating/removing snapshots
def test_volume_metafile_empty(client, core_api, volume_name, csi_pv, pvc, pod, pod_make)

Scenario:

Test volume should still work when there is an invalid volume meta file in the replica data path.

Steps:

  1. Remove the content of the volume meta file in this replica data path
  2. Recreate the pod and wait for the volume attached
  3. Check if the volume is Healthy after the volume attached
  4. Check volume data
  5. Check if the volume still works fine by r/w data and creating/removing snapshots
def test_volume_multinode(client, volume_name)

Test the volume can be attached on multiple nodes

  1. Create one volume
  2. Attach it on every node once, verify the state, then detach it
def test_volume_scheduling_failure(client, volume_name)

Test fail to schedule by disable scheduling for all the nodes

Also test cannot attach a scheduling failed volume

  1. Disable allowScheduling for all nodes
  2. Create a volume.
  3. Verify the volume condition Scheduled is false
  4. Verify the volume is not ready for workloads
  5. Verify attaching the volume will result in error
  6. Enable allowScheduling for all nodes
  7. Volume should be automatically scheduled (condition become true)
  8. Volume can be attached now
def test_volume_toomanysnapshots_condition(client, core_api, volume_name)

Test Volume TooManySnapshots Condition

  1. Create a volume and attach it to a node.
  2. Check the 'TooManySnapshots' condition is False.
  3. Write data to this volume while taking 101 snapshots.
  4. Check the 'TooManySnapshots' condition is True.
  5. Take one more snapshot to make sure snapshots works fine.
  6. Delete 2 snapshots, and check the 'TooManySnapshots' condition is False.
def test_volume_update_replica_count(client, volume_name)

Test updating volume's replica count

  1. Create a volume with 2 replicas
  2. Attach the volume
  3. Increase the replica to 3.
  4. Volume will become degraded and start rebuilding
  5. Wait for rebuilding to complete
  6. Update the replica count to 2. Volume should remain healthy
  7. Remove 1 replica, so there will be 2 replicas in the volume
  8. Verify the volume is still healthy

Volume should always be healthy even only with 2 replicas.

def test_workload_with_fsgroup(core_api, statefulset)
  1. Deploy a StatefulSet workload that uses a Longhorn volume and has a securityContext set (see https://github.com/longhorn/longhorn/issues/2964#issuecomment-910117570 for an example):

         securityContext:
           runAsUser: 1000
           runAsGroup: 1000
           fsGroup: 1000

  2. Wait for the workload pod to be running
  3. Exec into the workload pod and cd into the mount point of the volume.
  4. Verify that the mount point has the correct filesystem permissions (e.g., running ls -l on the mount point should return permissions in the format *rw*).
  5. Verify that we can read/write files.
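
Step 4's permission check can be scripted around kubectl exec; a minimal sketch (the pod name and mount path are illustrative):

```python
import subprocess

# Sketch of step 4: inspect the volume mount point's permissions from
# inside the workload pod. Pod name and mount path are illustrative.
pod, mount = "fsgroup-demo-0", "/data"
out = subprocess.run(
    ["kubectl", "exec", pod, "--", "ls", "-ld", mount],
    capture_output=True, text=True, check=True,
).stdout
# fsGroup: 1000 should yield group rw permission on the mount point.
assert "rw" in out.split()[0]
```
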
def volume_basic_test(client, volume_name, backing_image='')
def volume_iscsi_basic_test(client, volume_name, backing_image='')
def volume_rw_test(dev)