- Set up a `BackupStore` anywhere (since the cleanup fails at the `Engine` level, any `BackupStore` can be used).
- Add both of the `Engine Images` listed here (a command-line sketch for this setup follows this list):
  - `quay.io/ttpcodes/longhorn-engine:no-cleanup` - `Snapshot` and `Backup` deletion are both set to return an error. If the `Snapshot` part of a `Backup` fails, that will error out first and `Backup` deletion will not be reached.
  - `quay.io/ttpcodes/longhorn-engine:no-cleanup-backup` - Only `Backup` deletion is set to return an error. The `Snapshot` part of a `Backup` should succeed, and the `Backup` deletion will fail.
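Both steps are normally done through the Longhorn UI. As a rough command-line sketch only: the snippet below patches the `backup-target` setting and registers the two test images as `EngineImage` custom resources. The NFS URL is a placeholder for whatever `BackupStore` you set up, the CR names are arbitrary, and the `longhorn.io/v1beta1` API group and `spec.image` field are assumptions about the installed Longhorn version.

```sh
# Sketch only: the Longhorn UI is the usual way to do this.
# The backup target URL is a placeholder; point it at your own BackupStore.
kubectl -n longhorn-system patch settings.longhorn.io backup-target \
  --type merge -p '{"value": "nfs://backupstore.example.com:/opt/backupstore"}'

# Register both test Engine Images as EngineImage custom resources.
# The metadata names are arbitrary; only spec.image matters.
cat <<'EOF' | kubectl apply -f -
apiVersion: longhorn.io/v1beta1
kind: EngineImage
metadata:
  name: ei-no-cleanup
  namespace: longhorn-system
spec:
  image: quay.io/ttpcodes/longhorn-engine:no-cleanup
---
apiVersion: longhorn.io/v1beta1
kind: EngineImage
metadata:
  name: ei-no-cleanup-backup
  namespace: longhorn-system
spec:
  image: quay.io/ttpcodes/longhorn-engine:no-cleanup-backup
EOF
```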
The next steps need to be repeated for each Engine Image (this is to test the code for Snapshots and Backups separately).
- Create a `Volume` and run an `Engine Upgrade` to use one of the above images.
- Attach the `Volume` and create a `Recurring Job` for testing. You can use a configuration that runs once every 3 minutes and only retains one `Backup` (a command-line sketch for this and the `Engine Upgrade` follows the example log output below).
- You should only see one `Snapshot` or `Backup` created per invocation. Once enough `Backups` or `Snapshots` have been created and the `Job` attempts to delete the old ones, you will see something in the logs for the `Pod` for the `Job` similar to the following (as a result of using the provided `Engine Images` that do not have working `Snapshot` or `Backup` deletion):
time="2020-06-08T20:05:10Z" level=warning msg="created snapshot successfully but errored on cleanup for test: error deleting snapshot 'c-c3athc-fd3adb1e': Failed to execute: /var/lib/longhorn/engine-binaries/quay.io-ttpcodes-longhorn-engine-no-cleanup/longhorn [--url 10.42.0.188:10000 snapshot rm c-c3athc-fd3adb1e], output , stderr, time=\"2020-06-08T20:05:10Z\" level=fatal msg=\"stubbed snapshot deletion for testing\"\n, error exit status 1"
The `Job` should nonetheless run successfully according to Kubernetes. This can be verified by using `kubectl -n longhorn-system get cronjobs` to find the `CronJob` backing the `Recurring Job` and `kubectl -n longhorn-system describe cronjob <cronjob-name>` to view its details; the events should show that the `Jobs` it created completed successfully.
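For example (the `CronJob` name below is taken from the sample events and will differ in your setup):

```sh
# List the CronJobs that Longhorn created for the recurring jobs.
kubectl -n longhorn-system get cronjobs

# Describe the CronJob backing the recurring job; sample events are shown below.
kubectl -n longhorn-system describe cronjob test-c-yxam34-c
```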
```
Events:
  Type    Reason            Age    From                Message
  ----    ------            ----   ----                -------
  Normal  SuccessfulCreate  4m50s  cronjob-controller  Created job test-c-yxam34-c-1591652160
  Normal  SawCompletedJob   4m10s  cronjob-controller  Saw completed job: test-c-yxam34-c-1591652160, status: Complete
  Normal  SuccessfulCreate  109s   cronjob-controller  Created job test-c-yxam34-c-1591652340
  Normal  SawCompletedJob   59s    cronjob-controller  Saw completed job: test-c-yxam34-c-1591652340, status: Complete
```
Additional invocations should not be attempted on that `Pod`, so you should never see multiple `Backups` or `Snapshots` being created at the same time.

Note that while the `Engine Images` used to test this fix prevent old `Backups`/`Snapshots` from being deleted, you should still not see multiple `Backups` being created at the same time, even accounting for the extra `Backups` and `Snapshots` that accumulate. You should only see the number of `Backups`/`Snapshots` that matches the `Job` interval (since old `Backups` and `Snapshots` do not get deleted), without any extras.