Introduce new native backup provider (KNIB) #12758

Open
JoaoJandre wants to merge 2 commits into apache:main from scclouds:new-native-backup-provider

@JoaoJandre
Contributor

Description

This PR adds a new native incremental backup provider for KVM. The design document, which goes into the details of the implementation, can be found at https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406622120.

The validation process, which is detailed in the design document, will be added to this PR soon.
The file extraction process will be added in a later PR.

This PR adds a few new APIs:

  • The createNativeBackupOffering API has the following parameters:

    | Parameter | Description | Default Value | Required |
    |---|---|---|---|
    | name | Specifies the name of the offering | - | Yes |
    | compress | Specifies whether the offering supports backup compression | false | No |
    | validate | Specifies whether the offering supports backup validation | false | No |
    | allowQuickRestore | Specifies whether the offering supports quick restore | false | No |
    | allowExtractFile | Specifies whether the offering supports file extraction from backups | false | No |
    | backupchainsize | Backup chain size for backups created with this offering. If this is set, it overrides the backup.chain.size setting | - | No |
    | compressionlibrary | Specifies the compression library for offerings that support compression. Accepted values are zstd and zlib. By default, zstd is used for images that support it. If the image only supports zlib, it will be used regardless of this parameter. | zstd | No |
  • The deleteNativeBackupOffering API has the following parameter:

    | Parameter | Description | Required |
    |---|---|---|
    | id | Identifier of the native backup offering. | Yes |

    A native backup offering can only be removed if it is not currently imported.

  • The listNativeBackupOfferings API has the following parameters:

    | Parameter | Description | Default Value | Required |
    |---|---|---|---|
    | id | Identifier of the offering. | - | No |
    | compress | Lists only offerings that support backup compression. | - | No |
    | validate | Lists only offerings that support backup validation. | - | No |
    | allowQuickRestore | Lists only offerings that support quick restore. | - | No |
    | allowExtractFile | Lists only offerings that support file extraction from backups. | - | No |
    | showRemoved | Also lists offerings that have already been removed. | false | No |

    By default, lists all offerings that have not been removed.

  • The listBackupCompressionJobs API has the following parameters:

    | Parameter | Description |
    |---|---|
    | id | List only the job with the specified ID |
    | backupid | List jobs associated with the specified backup |
    | hostid | List jobs associated with the specified host. When this parameter is provided, the executing parameter is implied |
    | zoneid | List jobs associated with the specified zone |
    | type | List jobs of the specified type. Accepts Starting or Finalizing |
    | executing | List jobs that are currently executing |
    | scheduled | List jobs scheduled to run in the future |
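
As a rough illustration of how the createNativeBackupOffering parameters listed above fit together, the sketch below builds a query string for the call. It assumes the usual CloudStack `command=` query convention; the helper name `build_create_offering_query` is hypothetical, and authentication (API key and signature) and response handling are left out on purpose.

```python
from urllib.parse import urlencode

# Accepted values for compressionlibrary, as stated in the parameter table.
ALLOWED_LIBRARIES = {"zstd", "zlib"}


def build_create_offering_query(name, compress=False, validate=False,
                                allow_quick_restore=False,
                                allow_extract_file=False,
                                backup_chain_size=None,
                                compression_library=None):
    """Build the query string for createNativeBackupOffering (hypothetical helper)."""
    if not name:
        raise ValueError("name is required")
    if compression_library is not None and compression_library not in ALLOWED_LIBRARIES:
        raise ValueError("compressionlibrary must be zstd or zlib")

    params = {
        "command": "createNativeBackupOffering",
        "name": name,
        "compress": str(compress).lower(),
        "validate": str(validate).lower(),
        "allowQuickRestore": str(allow_quick_restore).lower(),
        "allowExtractFile": str(allow_extract_file).lower(),
    }
    # Optional parameters are only sent when set, so server defaults apply otherwise.
    if backup_chain_size is not None:
        params["backupchainsize"] = str(backup_chain_size)
    if compression_library is not None:
        params["compressionlibrary"] = compression_library
    return urlencode(params)


query = build_create_offering_query("gold", compress=True, compression_library="zlib")
print(query)
```

Leaving `backupchainsize` out of the request lets the global backup.chain.size setting apply, which matches the override behaviour described in the table.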

It also adds parameters to the following APIs:

  • The isolated parameter was added to the createBackup and createBackupSchedule APIs
  • The quickRestore parameter was added to the restoreBackup, restoreVolumeFromBackupAndAttachToVM and createVMFromBackup APIs
  • The hostId parameter was added to the restoreBackup and restoreVolumeFromBackupAndAttachToVM APIs, which can only be used by root admins and only when quick restore is true.
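
The hostId restriction above can be read as a small precondition check. The following is a minimal sketch of that rule only, not the code in this PR; the function name and the choice of exception types are assumptions made for illustration.

```python
def check_restore_parameters(quick_restore=False, host_id=None, is_root_admin=False):
    """Reject hostId unless the caller is a root admin and quick restore is enabled.

    Hypothetical helper: it mirrors the rule stated for restoreBackup and
    restoreVolumeFromBackupAndAttachToVM, nothing more.
    """
    if host_id is None:
        return  # nothing to validate when no host is requested
    if not is_root_admin:
        raise PermissionError("hostId can only be used by root admins")
    if not quick_restore:
        raise ValueError("hostId can only be used when quickRestore is true")


# A root admin pinning a quick restore to a host passes the check.
check_restore_parameters(quick_restore=True, host_id="host-1", is_root_admin=True)
```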

New settings were also added:

| Configuration | Description | Default Value |
|---|---|---|
| backup.chain.size | Determines the max size of a backup chain. If cloud admins set it to 1, all the backups will be full backups. With values lower than 1, the backup chain will be unlimited, unless it is stopped by another process. Please note that unlimited backup chains have a higher chance of getting corrupted, as new backups will be dependent on all of the older ones. | 8 |
| knib.timeout | Timeout, in seconds, to execute KNIB commands. After the command times out, the Management Server will still wait for another knib.timeout seconds to receive a response from the Agent. | 43200 |
| backup.compression.task.enabled | Determines whether the task responsible for scheduling compression jobs is active. If not, compression jobs will not run | true |
| backup.compression.max.concurrent.compressions.per.host | Maximum number of concurrent compression jobs per host. Compression finalization jobs ignore this setting | 5 |
| backup.compression.max.job.retries | Maximum number of attempts for executing compression jobs | 2 |
| backup.compression.retry.interval | Interval, in minutes, between attempts to run compression jobs | 60 |
| backup.compression.timeout | Timeout, in seconds, for running compression jobs | 28800 |
| backup.compression.minimum.free.storage | Minimum required available storage to start the backup compression process. This setting accepts a real number that is multiplied by the total size of the backup to determine the necessary available space. By default, the storage must have the same amount of available space as the space occupied by the backup. | 1 |
| backup.compression.coroutines | Number of coroutines used for the compression process; each coroutine has its own thread | 1 |
| backup.compression.rate.limit | Compression rate limit, in MB/s. Values less than 1 disable the limit | 0 |
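
To make the backup.chain.size semantics concrete, here is a minimal sketch of the full-versus-incremental decision as I read it from the setting description. The function names are hypothetical, and events that end a chain early (restores, migrations, detaching from the offering) are not modelled.

```python
def next_backup_is_full(chain_size, backups_in_current_chain):
    """Decide whether the next backup starts a new chain (assumed reading of the setting)."""
    if backups_in_current_chain == 0:
        return True   # nothing to build on yet
    if chain_size < 1:
        return False  # values lower than 1 mean an unlimited chain
    return backups_in_current_chain >= chain_size


def simulate(chain_size, count):
    """Return the kind of each backup when taking `count` backups in a row."""
    kinds, length = [], 0
    for _ in range(count):
        if next_backup_is_full(chain_size, length):
            kinds.append("full")
            length = 1
        else:
            kinds.append("incremental")
            length += 1
    return kinds


print(simulate(3, 4))  # ['full', 'incremental', 'incremental', 'full']
```

With a chain size of 3 the sketch yields one full backup followed by two incrementals before the next full one, and with a chain size of 1 every backup is full.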

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Tests related to disk-only VM snapshots

| N | Test | Result |
|---|---|---|
| 1 | Take disk-only VM snapshot | OK |
| 2 | Take disk-only VM snapshot again | OK |
| 3 | Stop VM, revert to snapshot 2, start VM | Correct deltas found in the VM volume chain |
| 4 | Stop VM, revert to snapshot 1, start VM | Correct deltas found in the VM volume chain |
| 5 | Take disk-only VM snapshot | OK |
| 6 | Remove disk-only VM snapshot 1 | Marked as removed, not removed from storage |
| 7 | Remove disk-only VM snapshot 3 | Merged with current volume |
| 8 | Remove disk-only VM snapshot 2 | Removed, snap 1 merged with the current volume |

Basic tests with backup

Using backup.chain.size=3

| N | Test | Result |
|---|---|---|
| 1 | With the VM stopped, I created a backup (b1) | Full backup created |
| 2 | Started VM, wrote data, created a second backup (b2) | Incremental backup created |
| 3 | Stopped the VM, went back to backup 1, started | OK, VM without data |
| 4 | Stopped the VM, went back to backup 2, started | OK, data from the backup created in test 2 present |
| 5 | Created 4 backups (b3, b4, b5, b6) | b3 was a full, b4 and b5 incremental, b6 full |
| 6 | Removing the last backup (b6) | The delta on the primary was merged with the volume |
| 7 | Removed backups b4 and b5 | They were marked as removed, but not deleted from storage |
| 8 | Batch removing the remaining backups | OK, all removed |
| 9 | Created a new backup | OK |
| 10 | Detached the VM from the offer | Deltas were merged on primary |
| 11 | Removing this last backup | OK |

Interactions with other functionalities

I created a new VM with a root disk and a data disk for the tests below.

| N | Test | Result |
|---|---|---|
| 1 | Took a new backup and migrated the VM | OK |
| 2 | Migrated the VM + one of the volumes | OK, the migrated volume had no delta on the primary, the other volume still had a delta |
| 3 | Took a new backup | For the volume that was not migrated, the backup was incremental; for the migrated volume, it was a full backup |
| 4 | I took 2 backups | OK, they finished normally |
| 5 | Try restoring one of the backups from before the migration | OK |
| 6 | Created file 1, created backup b1 | OK |
| 7 | Created file 2, created VM snap s1 | OK |
| 8 | Created file 3, created VM snap s2 | OK |
| 9 | Created file 4, created backup b2 | OK |
| 10 | Created file 5, created backup b3 | OK |
| 11 | Stopped the VM, restored VM snap s1, started | Files 1 and 2 present |
| 12 | Stopped the VM, restored VM snap s2, started | Files 1, 2 and 3 present |
| 13 | Removed VM snapshots | OK |
| 14 | Restored backup b1, started | File 1 present |
| 15 | Restored backup b2, started | Files 1, 2, 3 and 4 present |
| 16 | Restored backup b3, started | Files 1, 2, 3, 4 and 5 present |
| 17 | Took a new backup b4 | OK |
| 18 | Attached a new volume, wrote data, took a backup b5 | OK |
| 19 | Stopped the VM, restored backup b4, started the VM | The new volume was not affected by the restoration |
| 20 | Detached the volume, restored backup b5 | A new volume was created and attached to the VM, the files were there |
| 21 | Created a backup | OK |
| 22 | Created a volume snapshot | OK |
| 23 | Revert volume snapshot | I verified that the delta on the primary left by the last backup was removed |

Configuration Tests

  • I set the backup.compression.task.enabled setting to false and verified that no new jobs were started. After setting it back to true, I verified that the jobs were executed.
  • I changed the value of the backup.compression.max.concurrent.compressions.per.host setting and verified that the number of jobs running simultaneously on each host respected the value of the setting. I also verified that the value -1 does not limit the number of jobs executed by the host.
  • I verified that the number of retries respects the backup.compression.max.job.retries setting.
  • I verified that the time between retries respects the backup.compression.retry.interval setting.
  • I changed the value of the backup.compression.minimum.free.storage setting and verified that the job failed if there was not enough free space.
  • I changed the value of the backup.compression.coroutines setting and verified that the value was passed to qemu-img.
  • I changed the value of the backup.compression.rate.limit setting and verified that the value was passed to qemu-img.
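
The free-space test above follows from how backup.compression.minimum.free.storage is described: the setting is a multiplier applied to the backup size. A minimal sketch of that check, assuming a greater-than-or-equal comparison and a hypothetical function name, could look like this:

```python
def has_enough_free_space(backup_size_bytes, available_bytes, minimum_free_storage=1.0):
    """Return True if compression may start under the described rule.

    The required space is the backup size multiplied by the setting value.
    Treating "exactly enough" as sufficient is an assumption of this sketch.
    """
    required_bytes = backup_size_bytes * minimum_free_storage
    return available_bytes >= required_bytes


GIB = 1024 ** 3
# With the default multiplier of 1, a 10 GiB backup needs 10 GiB available.
print(has_enough_free_space(10 * GIB, 12 * GIB))       # True
print(has_enough_free_space(10 * GIB, 12 * GIB, 2.0))  # False
```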

Compression Tests

Tests performed with an offering that supports compressed backups

| Test | Result |
|---|---|
| Create full backup | Backup created and compressed |
| Create incremental backup | Backup created and compressed |
| Create 10 backups of the same machine | Backups created sequentially, but compressed in parallel |
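
The parallel compression seen in the last test is bounded by backup.compression.max.concurrent.compressions.per.host. The sketch below shows one way such a per-host limit could be applied when picking jobs to start. It is not the scheduler from this PR: the tuple layout and function name are hypothetical, values below 1 are treated as unlimited (extrapolated from the -1 test), and Finalizing jobs are assumed neither to be limited nor to count towards the limit.

```python
from collections import Counter


def select_jobs_to_start(pending_jobs, running_per_host, max_per_host):
    """Pick the pending jobs that may start now.

    pending_jobs: list of (job_id, host_id, job_type) tuples, where job_type
    is "Starting" or "Finalizing". running_per_host maps host_id to the
    number of compression jobs already running there.
    """
    running = Counter(running_per_host)
    selected = []
    for job_id, host_id, job_type in pending_jobs:
        is_finalizing = job_type == "Finalizing"
        unlimited = max_per_host < 1 or is_finalizing
        if unlimited or running[host_id] < max_per_host:
            selected.append(job_id)
            if not is_finalizing:
                running[host_id] += 1  # reserve a slot on that host
    return selected


pending = [
    (1, "h1", "Starting"),
    (2, "h1", "Starting"),
    (3, "h1", "Finalizing"),
    (4, "h2", "Starting"),
]
print(select_jobs_to_start(pending, {"h1": 1}, max_per_host=2))  # [1, 3, 4]
```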

Tests with restoreVolumeFromBackupAndAttachToVM

| N | Test | Result |
|---|---|---|
| 1 | Restore volume A from VM 1 to the same VM while it is stopped | New volume created, restored, and attached |
| 2 | Restore volume B from VM 1 to the same VM while it is running | New volume created, restored, and attached |
| 3 | Restore volume B from VM 1 to VM 2 while it is running | New volume created, restored, and attached |
| 4 | Restore volume A from VM 1 to VM 2 while it is running, even though the VM was deleted | New volume created, restored, and attached |
| 5 | Restore a volume to a VM using quickrestore | New volume created, attached, and consolidated with the backup |
| 6 | Restore a volume to a stopped VM using quickrestore and specifying the hostId | New volume created, attached, VM started on the specified host, and volume consolidated |

Tests with restoreBackup

| N | Test | Result |
|---|---|---|
| 1 | Restore VM without quickrestore | OK |
| 2 | Restore VM with quickrestore | Volumes restored, VM started, and volumes consolidated |
| 3 | Restore VM with quickrestore and specifying hostId | Volumes restored, VM started on the specified host, and volumes consolidated |
| 4 | Detach a volume from the VM and repeat test 3 | Detached volume duplicated, attached to the VM, restored, and VM started on the host, volumes consolidated |

@codecov

codecov bot commented Mar 6, 2026

Codecov Report

❌ Patch coverage is 4.94580% with 2806 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.84%. Comparing base (608345d) to head (8a94dcb).
⚠️ Report is 8 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...e/wrapper/LibvirtTakeKnibBackupCommandWrapper.java | 0.49% | 202 Missing ⚠️ |
| ...he/cloudstack/backup/BackupCompressionService.java | 0.00% | 201 Missing ⚠️ |
| ...ervisor/kvm/resource/LibvirtComputingResource.java | 4.56% | 188 Missing ⚠️ |
| ...che/cloudstack/backup/NativeBackupServiceImpl.java | 0.00% | 134 Missing ⚠️ |
| ...apache/cloudstack/storage/backup/BackupObject.java | 0.00% | 91 Missing ⚠️ |
| ...napshot/KvmFileBasedStorageVmSnapshotStrategy.java | 0.00% | 82 Missing ⚠️ |
| ...tack/engine/orchestration/StorageOrchestrator.java | 0.00% | 80 Missing ⚠️ |
| ...e/wrapper/LibvirtCompressBackupCommandWrapper.java | 1.40% | 70 Missing ⚠️ |
| ...ache/cloudstack/backup/BackupCompressionJobVO.java | 0.00% | 68 Missing ⚠️ |
| ...cloudstack/backup/dao/NativeBackupJoinDaoImpl.java | 0.00% | 68 Missing ⚠️ |

... and 101 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #12758      +/-   ##
============================================
- Coverage     17.93%   17.84%   -0.09%     
- Complexity    16159    16208      +49     
============================================
  Files          5939     5991      +52     
  Lines        533147   537115    +3968     
  Branches      65237    65588     +351     
============================================
+ Hits          95603    95837     +234     
- Misses       426803   430509    +3706     
- Partials      10741    10769      +28     
Flag Coverage Δ
uitests 3.66% <ø> (-0.01%) ⬇️
unittests 18.93% <4.94%> (-0.11%) ⬇️
