Deduplication ratio

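The deduplication ratio compares the size that archives occupy in a deduplicating vault with the size they would occupy in a non-deduplicating vault. It can be written either as a ratio (non-deduplicated size to deduplicated size) or as a percentage (deduplicated size relative to non-deduplicated size).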

For example, suppose that you are backing up two files with identical content from two machines. If each file is 1 GB, the backups will occupy approximately 2 GB in a non-deduplicating vault but only about 1 GB in a deduplicating vault. This gives a deduplication ratio of 2:1, or 50%.

Conversely, if the two files had different content, the backups would occupy the same space (2 GB) in both non-deduplicating and deduplicating vaults, and the deduplication ratio would be 1:1, or 100%.
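The calculation in these two examples can be sketched in Python. This is a minimal sketch under a simplifying assumption: each whole file is treated as a single deduplication unit and identified by its SHA-256 hash, whereas a real deduplicating vault may work at a finer granularity. The function name `dedup_ratio` is hypothetical.

```python
import hashlib


def dedup_ratio(files):
    """Return (ratio, percent) for a list of file contents given as bytes.

    ratio   = size in a non-deduplicating vault / size in a deduplicating vault
    percent = deduplicated size as a percentage of the non-deduplicated size

    Simplification (assumption): whole files are the unit of deduplication.
    """
    # A non-deduplicating vault stores every copy.
    total = sum(len(data) for data in files)
    # A deduplicating vault stores each distinct content only once;
    # identical content produces the same hash key.
    unique = {}
    for data in files:
        unique[hashlib.sha256(data).hexdigest()] = len(data)
    stored = sum(unique.values())
    if stored == 0:
        raise ValueError("nothing to back up")
    return total / stored, 100.0 * stored / total


same = b"x" * 1024
print(dedup_ratio([same, same]))          # identical content: (2.0, 50.0)
print(dedup_ratio([same, b"y" * 1024]))   # different content: (1.0, 100.0)
```

The two calls mirror the identical-content and different-content cases above, scaled down from gigabytes to kilobytes.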

What ratio to expect

Although the deduplication ratio can be very high in some situations (in the first example, adding more machines with the same file would give ratios of 3:1, 4:1, and so on), a reasonable expectation for a typical environment is a ratio between 1.2:1 and 1.6:1.

As a more realistic example, suppose that you are performing a file-level or disk-level backup of two machines with similar disks. On each machine, the files common to all the machines occupy 50% of disk space (say, 1 GB); the files that are specific to each machine occupy the other 50% (another 1 GB).

In a deduplicating vault, the first machine's backup will occupy 2 GB, while the second machine's backup will add only 1 GB, because the common files are already stored. In a non-deduplicating vault, the backups would occupy 4 GB in total. The deduplication ratio is therefore 4:3, or about 1.33:1.

Similarly, with three machines the ratio becomes 1.5:1 (6 GB to 4 GB), and with four machines it is 1.6:1 (8 GB to 5 GB). As more such machines are backed up to the same vault, the ratio approaches 2:1. In that limit, you could buy, say, a 10-TB storage device instead of a 20-TB one.
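The arithmetic behind these figures can be written as a short formula. Assuming n machines, each with C GB of files common to all machines and U GB of machine-specific files, a non-deduplicating vault stores n(C + U) GB while a deduplicating vault stores C + nU GB. The ratio is therefore n(C + U) / (C + nU), which tends to (C + U) / U as n grows. The Python sketch below uses exact fractions; the function name `similar_machines_ratio` is hypothetical, and the model ignores metadata and compression overhead.

```python
from fractions import Fraction


def similar_machines_ratio(n, common_gb, unique_gb):
    """Deduplication ratio for n machines with identical common files.

    Assumed model: each machine holds common_gb of files shared by all
    machines plus unique_gb of files found only on that machine.
    """
    non_dedup = n * (common_gb + unique_gb)   # every machine stores everything
    dedup = common_gb + n * unique_gb         # common files stored only once
    return Fraction(non_dedup) / Fraction(dedup)


for n in (2, 3, 4):
    # Prints "2 4/3", "3 3/2", "4 8/5" for the 1 GB + 1 GB example.
    print(n, similar_machines_ratio(n, 1, 1))
```

With C = U = 1, the limit (C + U) / U equals 2, which matches the 2:1 figure above.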

The actual capacity reduction depends on numerous factors, such as the type of data being backed up, the backup frequency, and the backup retention period.
