How deduplication works

Deduplication at source

When performing a backup to a deduplicating vault, Acronis Backup & Recovery 10 Agent reads the items being backed up (disk blocks for a disk backup, files for a file backup) and calculates a fingerprint of each item. Such a fingerprint, often called a hash value, uniquely represents the item’s content within the vault.
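As an illustration of fingerprinting, the following sketch splits raw disk data into fixed-size blocks and hashes each one. SHA-256 and the 4096-byte `BLOCK_SIZE` are assumptions chosen for the example; the hash function and block size that the agent actually uses are not stated here. `fingerprint` and `split_into_blocks` are hypothetical helpers.

```python
import hashlib
from typing import List

# Hypothetical block size for illustration; the agent's real block size is not given here.
BLOCK_SIZE = 4096

def fingerprint(item: bytes) -> str:
    """Return a hex digest that stands in for the item's content."""
    # SHA-256 is an assumption: the document does not name the hash function.
    return hashlib.sha256(item).hexdigest()

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Split raw disk data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

disk = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"A" * BLOCK_SIZE
hashes = [fingerprint(block) for block in split_into_blocks(disk)]
print(hashes[0] == hashes[2])  # True: identical blocks share a fingerprint
print(hashes[0] == hashes[1])  # False: different content, different fingerprint
```

Because equal content always yields an equal digest, the first and third blocks can be recognized as duplicates without comparing their bytes directly.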

Before sending the item to the vault, the agent queries the deduplication database to determine whether the item’s hash value is the same as that of an already stored item.

If so, the agent sends only the item’s hash value; otherwise, it sends the item itself.
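The decision described above can be sketched as follows. A plain Python set stands in for the deduplication database, and `Message` is a hypothetical wire format, not the product's protocol. For simplicity, the sketch adds the hash to the set itself to simulate the vault storing a newly sent item.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Set

def fingerprint(item: bytes) -> str:
    # SHA-256 is assumed for illustration only.
    return hashlib.sha256(item).hexdigest()

@dataclass
class Message:
    """What the agent sends for one item (a hypothetical wire format)."""
    hash_value: str
    payload: Optional[bytes]  # None means "hash only, content already stored"

def prepare_item(item: bytes, dedup_db: Set[str]) -> Message:
    """Send only the hash if the vault already holds an item with the same hash."""
    h = fingerprint(item)
    if h in dedup_db:           # query the deduplication database
        return Message(h, None)
    # Simulate the vault storing the new item by recording its hash.
    dedup_db.add(h)
    return Message(h, item)

db: Set[str] = set()
first = prepare_item(b"quarterly-report", db)
second = prepare_item(b"quarterly-report", db)
print(first.payload is not None, second.payload is None)  # True True
```

The second copy of the same content costs only the size of a hash on the wire, which is where deduplication at source saves network traffic.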

Some items, such as encrypted files or disk blocks of a non-standard size, cannot be deduplicated, and the agent always transfers such items to the vault without calculating their hash values. For more information about restrictions on file-level and disk-level deduplication, see Deduplication restrictions.
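A minimal sketch of this bypass, assuming a hypothetical `Item` type and a hypothetical standard block size of 4096 bytes. The `can_deduplicate` predicate covers only the two examples named above (encrypted files and odd-sized disk blocks); the complete list of restrictions is in Deduplication restrictions.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Set, Tuple

STANDARD_BLOCK_SIZE = 4096  # hypothetical standard disk block size

@dataclass
class Item:
    kind: str               # "disk_block" or "file"
    data: bytes
    encrypted: bool = False

def can_deduplicate(item: Item) -> bool:
    """Check the two example restrictions from the text."""
    if item.kind == "file" and item.encrypted:
        return False
    if item.kind == "disk_block" and len(item.data) != STANDARD_BLOCK_SIZE:
        return False
    return True

def transfer(item: Item, dedup_db: Set[str]) -> Tuple[Optional[str], Optional[bytes]]:
    """Return (hash or None, payload or None) describing what the agent sends."""
    if not can_deduplicate(item):
        # No hash is calculated; the item always goes to the vault as-is.
        return (None, item.data)
    h = hashlib.sha256(item.data).hexdigest()
    if h in dedup_db:
        return (h, None)
    dedup_db.add(h)  # simulate the vault storing the new item
    return (h, item.data)
```

An encrypted file is therefore sent in full every time it is backed up, even if its content has not changed.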

Deduplication at target

After backup to a deduplicating vault is completed, the storage node runs the indexing task to deduplicate data in the vault as follows:

  1. It moves the items (disk blocks or files) from the archives to a special file within the vault, storing duplicate items there only once. This file is called the deduplication data store. If there are both disk-level and file-level backups in the vault, there are two separate data stores for them. Items that cannot be deduplicated remain in the archives.
  2. In the archives, it replaces the moved items with the corresponding references to them.
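The two indexing steps can be modeled in memory as shown below. Archives are lists of entries, each data store is a dictionary keyed by hash, and there is one store per backup level ("disk" and "file"). `Item`, `Ref`, and `run_indexing` are hypothetical names, and SHA-256 is an assumed hash function; this is a sketch of the idea, not the storage node's implementation.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Union

@dataclass
class Ref:
    """A reference from an archive to an item in a data store (hypothetical)."""
    store: str       # "disk" or "file": one data store per backup level
    hash_value: str

@dataclass
class Item:
    level: str       # "disk" or "file"
    data: bytes
    dedupable: bool = True

Entry = Union[Item, Ref]

def run_indexing(archives: Dict[str, List[Entry]],
                 stores: Dict[str, Dict[str, bytes]]) -> None:
    """Move items into the data stores, keeping one copy each, and leave references."""
    for entries in archives.values():
        for i, entry in enumerate(entries):
            if isinstance(entry, Ref) or not entry.dedupable:
                continue  # already indexed, or must remain in the archive
            h = hashlib.sha256(entry.data).hexdigest()
            store = stores.setdefault(entry.level, {})
            store.setdefault(h, entry.data)   # step 1: duplicates stored only once
            entries[i] = Ref(entry.level, h)  # step 2: replace item with a reference

archives: Dict[str, List[Entry]] = {
    "mon": [Item("disk", b"A"), Item("disk", b"B")],
    "tue": [Item("disk", b"A"), Item("file", b"A"), Item("file", b"X", dedupable=False)],
}
stores: Dict[str, Dict[str, bytes]] = {}
run_indexing(archives, stores)
print(len(stores["disk"]), len(stores["file"]))  # 2 1
```

The same content `b"A"` appears once in the disk store and once in the file store, because disk-level and file-level backups use separate data stores. The non-deduplicable item stays in its archive unchanged.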

As a result, the vault contains a number of unique, deduplicated items, with each item having one or more references to it from the vault’s archives.

The indexing task may take considerable time to complete. You can see this task’s state in the Tasks view on the management server.

Compacting

After one or more backups or archives have been deleted from the vault, either manually or during cleanup, the vault may contain items that are no longer referenced by any archive. Such items are deleted by the compacting task, a scheduled task performed by the storage node.
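One way to picture how unreferenced items are found is a mark-and-sweep pass: collect every hash still referenced by a remaining archive, then delete everything else from the data store. The document does not say whether the storage node scans references or keeps reference counts, so this scan-based sketch, with its hypothetical `find_orphans` and `compact` helpers, is only an illustration.

```python
from typing import Dict, List, Set

def find_orphans(store: Dict[str, bytes], archives: Dict[str, List[str]]) -> Set[str]:
    """Return hashes in the data store that no remaining archive refers to."""
    referenced = {h for refs in archives.values() for h in refs}
    return set(store) - referenced

def compact(store: Dict[str, bytes], archives: Dict[str, List[str]]) -> int:
    """Delete unreferenced items and return how many were removed."""
    orphans = find_orphans(store, archives)
    for h in orphans:
        del store[h]
    return len(orphans)

store = {"h1": b"A", "h2": b"B", "h3": b"C"}
archives = {"mon": ["h1", "h2"], "tue": ["h1", "h3"]}
del archives["mon"]            # e.g. the Monday archive is removed during cleanup
print(compact(store, archives))  # 1 (only "h2" lost its last reference)
```

Item `h1` survives because the Tuesday archive still refers to it, even though one of its references disappeared.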

By default, the compacting task runs every Sunday night at 03:00. You can re-schedule the task as described in Actions on storage nodes, under “Change the compacting task schedule”. You can also manually start or stop the task from the Tasks view.

Because deleting unused items is resource-intensive, the compacting task performs the deletion only when a sufficient amount of data to delete has accumulated. The threshold is determined by the Compacting Trigger Threshold configuration parameter.
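The threshold logic can be sketched as follows, assuming that the threshold is expressed as the fraction of the data store occupied by unreferenced items. The actual unit and default value of Compacting Trigger Threshold are not given here, and `compacting_pass` is a hypothetical function.

```python
from typing import Dict, Set

def compacting_pass(store: Dict[str, bytes], referenced: Set[str], threshold: float) -> int:
    """Delete unused items only if they make up at least `threshold` of the store.

    Returns the number of bytes freed (0 if the pass was skipped).
    """
    unused = [h for h in store if h not in referenced]
    unused_bytes = sum(len(store[h]) for h in unused)
    total_bytes = sum(len(data) for data in store.values())
    if total_bytes == 0 or unused_bytes / total_bytes < threshold:
        return 0  # not enough garbage yet; skip the expensive deletion
    for h in unused:
        del store[h]
    return unused_bytes

store = {"a": b"x" * 90, "b": b"y" * 10}
print(compacting_pass(store, {"a"}, threshold=0.2))   # 0: only 10% of the store is unused
print(compacting_pass(store, {"a"}, threshold=0.05))  # 10: threshold reached, "b" deleted
```

Batching deletions this way trades some temporarily wasted space for fewer expensive compaction runs.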
