SYNCHRONIZED DATA DEDUPLICATION
Abstract
A system and method for data deduplication is presented. Data received from one or more computing systems is deduplicated, and the results of the deduplication process are stored in a reference table. A representative subset of the reference table is shared among a plurality of computing systems that use the data deduplication repository. The computing systems can use this representative subset to deduplicate data locally before sending it to the repository for storage. Likewise, the subset allows deduplicated data to be returned from the repository to the computing systems. In some cases, the representative subset can be a proper subset, in which only a portion of the reference table is identified and shared among the computing systems, reducing the bandwidth required for reference-table synchronization.
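The local deduplication step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: identifying segments by a SHA-256 digest is an assumption (the abstract does not say how reference-table entries are keyed), and the names `segment_key` and `deduplicate_locally` are hypothetical.

```python
import hashlib


def segment_key(segment: bytes) -> str:
    """Hypothetical reference-table key: the SHA-256 hex digest of the segment."""
    return hashlib.sha256(segment).hexdigest()


def deduplicate_locally(segments, local_keys):
    """Replace segments already listed in the local subset with a bare reference.

    `local_keys` stands in for the representative subset of the reference
    table held by the client. Returns (key, payload) pairs, where payload is
    None when the repository is already known to hold the segment.
    """
    outgoing = []
    for segment in segments:
        key = segment_key(segment)
        if key in local_keys:
            outgoing.append((key, None))      # send the reference only
        else:
            outgoing.append((key, segment))   # send the full segment
    return outgoing


# The client's local subset lists one segment the repository already stores.
local_keys = {segment_key(b"common header")}
outgoing = deduplicate_locally([b"common header", b"unique payload"], local_keys)
```

Only the second segment travels in full; the first is reduced to its key, which is where the bandwidth saving comes from.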
15 Claims
1. (canceled)
2. A computer-implemented data deduplication method, the method comprising:
with one or more computing systems of a shared storage system that maintains a deduplicated data store and that is in networked communication with a plurality of computing systems that are physically separate from the shared storage system and share the deduplicated data store:
determining whether a first data segment included in data generated by an application executing on a first computing system of the plurality of client computing systems is already stored in the shared storage system;
if the first data segment is not already stored in the shared storage system, updating a central reference table of the shared storage system to include an entry corresponding to the first data segment;
determining a first subset of the references in the central reference table for inclusion in a first updated partial instantiation of the central reference table based on one or more of data segment size information and data segment utilization frequency information, the first subset including a reference to the first data segment;
transmitting the first updated partial instantiation of the central reference table from the shared storage system to a second computing system of the plurality of client computing systems such that, subsequent to said transmitting, a partial instantiation of the central reference table local to the second computing system includes the entry corresponding to the first data segment;
determining a second subset of the references in the central reference table for inclusion in a second updated partial instantiation of the central reference table based on one or more of data segment size information and data segment utilization frequency information, the second subset different than the first subset; and
transmitting the second updated partial instantiation of the central reference table to a third computing system of the plurality of client computing systems such that, subsequent to transmission of the first and second updated partial instantiations, a partial instantiation of the central reference table local to the third computing system is different from the partial instantiation of the central reference table local to the second computing system, and does not include the entry corresponding to the first data segment.
Dependent claims: 3, 4, 5, 6, 7, 8
9. A system, comprising:
a shared deduplicated storage repository comprising computer memory; and
a server system including one or more computing devices comprising computer hardware, the server system in networked communication with a plurality of computing systems which are physically separate from the server system, the server system configured to:
determine whether a first data segment included in data generated by an application executing on a first computing system of the plurality of computing systems is already stored in the shared deduplicated storage repository;
if the first data segment is not already stored in the shared deduplicated storage repository, update a central reference table of the shared storage system to include an entry corresponding to the first data segment;
select a first subset of the references in the central reference table for inclusion in a first updated partial instantiation of the central reference table based on one or more of data segment size information and data segment utilization frequency information, the first subset including a reference to the first data segment;
transmit the first updated partial instantiation of the central reference table from the server system to the second computing system of the plurality of client computing systems such that, subsequent to the transmission of the updated partial instantiation of the central reference table, a partial instantiation of the central reference table local to the second computing system includes the entry corresponding to the first data segment;
select a second subset of the references in the central reference table for inclusion in a second updated partial instantiation of the central reference table based on one or more of data segment size information and data segment utilization frequency information, the second subset different than the first subset; and
transmit the second updated partial instantiation of the central reference table to a third computing system of the plurality of computing systems such that, subsequent to transmission of the first and second updated partial instantiations, a partial instantiation of the central reference table local to the third computing system is different from the partial instantiation of the central reference table local to the second computing system, and does not include the entry corresponding to the first data segment.
Dependent claims: 10, 11, 12, 13, 14, 15
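The subset-selection steps recited in the independent claims can be illustrated with a small sketch. The claims say only that each subset is chosen "based on one or more of data segment size information and data segment utilization frequency information"; the specific policy below (rank by use count, optionally filter by a minimum segment size) and every name in it (`CentralReferenceTable`, `record`, `select_subset`, `max_entries`, `min_size`) are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CentralReferenceTable:
    """Hypothetical central table: segment key -> size, plus use counts."""
    sizes: dict = field(default_factory=dict)
    uses: Counter = field(default_factory=Counter)

    def record(self, key: str, size: int) -> bool:
        """Add an entry if the segment is new; count the use either way."""
        is_new = key not in self.sizes
        if is_new:
            self.sizes[key] = size
        self.uses[key] += 1
        return is_new

    def select_subset(self, max_entries: int, min_size: int = 0) -> dict:
        """Pick entries for one client's partial instantiation.

        Assumed policy: keep segments of at least `min_size` bytes, then
        take the `max_entries` most frequently used (ties broken by key).
        """
        eligible = [k for k, s in self.sizes.items() if s >= min_size]
        eligible.sort(key=lambda k: (-self.uses[k], k))
        return {k: self.sizes[k] for k in eligible[:max_entries]}


table = CentralReferenceTable()
for key, size in [("seg-a", 4096), ("seg-a", 4096), ("seg-b", 16384),
                  ("seg-c", 32768), ("seg-c", 32768), ("seg-c", 32768)]:
    table.record(key, size)

# Two clients with different (assumed) policies receive different subsets.
subset_for_second = table.select_subset(max_entries=2)
subset_for_third = table.select_subset(max_entries=2, min_size=8192)
```

Here "seg-a" plays the role of the first data segment: it appears in the subset sent to the second computing system but is filtered out of the subset sent to the third, so the two local partial instantiations differ.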
Specification