Synchronized data deduplication

  • US 8,930,306 B1
  • Filed: 07/08/2009
  • Issued: 01/06/2015
  • Est. Priority Date: 07/08/2009
  • Status: Active Grant
First Claim

1. A computer-implemented method for performing data deduplication for data used by a plurality of computing systems, the method comprising:

    receiving, over a computer network and at a shared deduplicated storage repository, data from a first computing system of a plurality of computing systems, each of the plurality of computing systems physically separate from the shared deduplicated storage repository and including application software executing thereon, the application software of the first computing system generating the received data;

    with a processing system which comprises computer hardware, is physically separate from the plurality of computing systems, and is located at the shared deduplicated storage repository, performing a data deduplication operation on the received data, the deduplication operation comprising:

    defining a segment of the received data;

    applying an algorithm to the defined data segment to generate a signature for the defined data segment;

    comparing the signature for the defined data segment with one or more signatures stored in a central reference table for one or more previously defined data segments to determine whether the defined segment is already stored in the shared deduplicated storage repository; and

    updating the central reference table to include the signature for the defined data segment and a reference for the defined data segment if the defined data segment is not in the shared deduplicated storage repository;

    subsequent to said performing the data deduplication operation, analyzing data traffic received from the plurality of computing systems;

    based on said analyzing the data traffic, determining at least a second computing system of the plurality of computing systems to which to transmit an updated partial instantiation of the central reference table, the partial instantiation including the signature for the defined data segment;

    transmitting the updated partial instantiation of the central reference table from the shared deduplicated storage repository to the determined second computing system of the plurality of computing systems, such that the partial instantiation of the central reference table local to the second computing system includes the at least one signature and a partial instantiation of the central reference table local to a third computing system of the plurality of computing systems does not include the at least one signature; and

    with the second computing system, subsequent to said transmitting the partial instantiation of the central reference table:

    generating a signature for a first data segment generated by the application software executing on the second computing system, the first data segment matching the defined data segment and scheduled for storage in the shared deduplicated storage repository;

    comparing the signature for the first data segment with one or more signatures stored in the partial instantiation of the central reference table local to the second computing system;

    determining that an entry exists in the partial instantiation of the central reference table local to the second computing system that corresponds to the signature for the first data segment; and

    transmitting the signature for the first data segment over the network from the second computing system to the shared deduplicated storage repository without transmitting the first data segment itself.
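
The sequence of steps recited in the claim can be illustrated with a minimal Python sketch. Only the order of the steps is taken from the claim. The fixed 8-byte segment size, SHA-256 as the signature algorithm, the names `Repository`, `Client`, `select_recipients`, and `partial_table`, and the traffic heuristic (send the table to any system that has sent at least `min_segments` segments) are all assumptions chosen for illustration, not details stated in the patent.

```python
import hashlib
from collections import Counter

SEGMENT_SIZE = 8  # hypothetical fixed segment size, kept tiny for illustration


def signature(segment: bytes) -> str:
    """Generate a signature; SHA-256 is an assumption, the claim names no algorithm."""
    return hashlib.sha256(segment).hexdigest()


class Repository:
    """Shared deduplicated storage repository holding the central reference table."""

    def __init__(self):
        self.central_table = {}   # signature -> reference (index into self.segments)
        self.segments = []        # each unique segment is stored once
        self.traffic = Counter()  # client id -> segments or signatures received
        self.hits = Counter()     # signature -> number of times seen

    def receive_data(self, client_id, data: bytes):
        """Define segments, sign them, compare with the table, update on a miss."""
        refs = []
        for i in range(0, len(data), SEGMENT_SIZE):
            segment = data[i:i + SEGMENT_SIZE]
            sig = signature(segment)
            if sig not in self.central_table:
                self.segments.append(segment)
                self.central_table[sig] = len(self.segments) - 1
            self.traffic[client_id] += 1
            self.hits[sig] += 1
            refs.append(self.central_table[sig])
        return refs

    def receive_signature(self, client_id, sig: str):
        """Accept a signature sent in place of a segment that is already stored."""
        ref = self.central_table[sig]
        self.traffic[client_id] += 1
        self.hits[sig] += 1
        return ref

    def select_recipients(self, min_segments: int):
        """Hypothetical traffic analysis: pick systems with enough observed traffic."""
        return {c for c, n in self.traffic.items() if n >= min_segments}

    def partial_table(self, limit: int):
        """Partial instantiation: the `limit` most frequently seen signatures."""
        return {sig: self.central_table[sig] for sig, _ in self.hits.most_common(limit)}


class Client:
    """A computing system holding a local partial instantiation of the table."""

    def __init__(self, client_id, repo: Repository):
        self.client_id = client_id
        self.repo = repo
        self.local_table = {}
        self.segments_sent = 0
        self.signatures_sent = 0

    def store(self, data: bytes):
        refs = []
        for i in range(0, len(data), SEGMENT_SIZE):
            segment = data[i:i + SEGMENT_SIZE]
            sig = signature(segment)
            if sig in self.local_table:
                # Local hit: transmit only the signature, not the segment itself.
                self.signatures_sent += 1
                refs.append(self.repo.receive_signature(self.client_id, sig))
            else:
                # Local miss: transmit the segment and let the repository deduplicate.
                self.segments_sent += 1
                refs.extend(self.repo.receive_data(self.client_id, segment))
        return refs


repo = Repository()
first, second, third = (Client(name, repo) for name in ("first", "second", "third"))

first.store(b"AAAAAAAABBBBBBBB")  # two new segments reach the repository
second.store(b"CCCCCCCC")         # gives the second system some traffic history

# Traffic analysis: the third system has sent nothing, so it is not selected.
recipients = repo.select_recipients(min_segments=1)
for client in (first, second, third):
    if client.client_id in recipients:
        client.local_table.update(repo.partial_table(limit=10))

second.store(b"AAAAAAAA")  # matches the first system's segment: signature only
third.store(b"AAAAAAAA")   # no local entry: the full segment is transmitted
```

In this sketch the second system sends one signature instead of a segment it has never transmitted before, because the signature originated from the first system's data and arrived through the partial table. The third system, which received no partial table, still sends the full segment, and the repository deduplicates it on arrival.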
