RECEIVER-SIDE DATA DEDUPLICATION IN DATA SYSTEMS
Abstract
Methods and apparatus for receiving uploaded data from a sender at a receiver are described. A data deduplication technique may reduce the bandwidth used in uploading data from the sender to the receiver. In the technique, the receiver, rather than the sender, maintains a fingerprint dictionary for previously uploaded data. When a sender has additional data to upload, the sender extracts fingerprints for units of the data and sends the fingerprints to the receiver. The receiver checks its fingerprint dictionary to determine which data units need to be uploaded and notifies the sender of the identified units; the sender then sends the identified units of data to the receiver. The technique may, for example, be applied in virtualized data store systems to reduce bandwidth usage in uploading data.
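The exchange summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the fixed `UNIT_SIZE`, the use of SHA-256 as the fingerprint function, and the names `Receiver`, `units_needed`, and `upload` are all hypothetical and do not come from the source.

```python
import hashlib
from typing import Dict, List

UNIT_SIZE = 4  # hypothetical unit size in bytes, kept tiny for illustration


def fingerprint(unit: bytes) -> str:
    """Fingerprint one data unit (SHA-256 is an assumed choice)."""
    return hashlib.sha256(unit).hexdigest()


def split_units(data: bytes, size: int = UNIT_SIZE) -> List[bytes]:
    """Cut the data into fixed-size units (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class Receiver:
    """Receiver that keeps the fingerprint dictionary on its own side."""

    def __init__(self) -> None:
        self.dictionary: Dict[str, bytes] = {}

    def units_needed(self, fingerprints: List[str]) -> List[int]:
        """Return positions of fingerprints not found in the dictionary."""
        seen = set()
        needed = []
        for position, fp in enumerate(fingerprints):
            # Request each unknown unit once, even if it repeats in the batch.
            if fp not in self.dictionary and fp not in seen:
                seen.add(fp)
                needed.append(position)
        return needed

    def store(self, units: List[bytes]) -> None:
        """Store uploaded units and record their fingerprints."""
        for unit in units:
            self.dictionary[fingerprint(unit)] = unit


def upload(data: bytes, receiver: Receiver) -> int:
    """Sender side: send fingerprints first, then only the requested units.

    Returns the number of payload bytes actually uploaded.
    """
    units = split_units(data)
    fingerprints = [fingerprint(u) for u in units]
    needed = receiver.units_needed(fingerprints)  # receiver's notification
    to_send = [units[i] for i in needed]
    receiver.store(to_send)
    return sum(len(u) for u in to_send)
```

In this sketch, uploading `b"aaaabbbbcccc"` to a fresh receiver transfers all three units, while a later upload of `b"aaaabbbbdddd"` transfers only the one unit the receiver has not seen. The sender never needs its own copy of the dictionary, which is the point of keeping it at the receiver.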
49 Claims
1-29. (canceled)
30. A system, comprising:
  at least one processor; and
  a memory comprising program instructions, wherein the program instructions are executable by the at least one processor to implement a data store service configured to:
    store fingerprints to a fingerprint dictionary, wherein respective fingerprints in the fingerprint dictionary uniquely identify respective data units of data stored at the data store service;
    receive, from a gateway device via a network, at least one fingerprint corresponding to a respective data unit of a data volume, wherein the gateway device is located at a client site remote from the data store service, and wherein the data volume is generated or modified on the gateway device by a client device at the client site;
    search the fingerprint dictionary for the at least one fingerprint to determine whether the fingerprint is in or is not in the fingerprint dictionary, wherein determining that the fingerprint is not in the fingerprint dictionary indicates the corresponding data unit is to be uploaded;
    send, to the gateway device via the network, an indication of one or more data units to be uploaded as determined by said search; and
    receive, from the gateway device via the network, the indicated one or more data units to store at the data store service.
Dependent claims: 31, 32, 33, 34, 35, 36
37. A method, comprising:
  performing, by a data store service implemented on one or more computing devices:
    storing fingerprints for data stored in a data store to a fingerprint dictionary, wherein the data comprises a plurality of data units, and wherein each fingerprint in the fingerprint dictionary uniquely identifies a respective data unit in the data stored in the data store;
    receiving, from a device via a network, one or more fingerprints each corresponding to a different data unit cached at the device;
    searching the fingerprint dictionary for each of the one or more fingerprints received from the device to determine whether the fingerprint is in or is not in the fingerprint dictionary, wherein determining that a fingerprint is not in the fingerprint dictionary indicates a corresponding data unit to be uploaded;
    sending, to the device via the network, an indication of one or more data units to be uploaded as determined by said searching the fingerprint dictionary;
    receiving, from the device via the network, the indicated one or more data units, wherein each received data unit corresponds to a fingerprint that is not in the fingerprint dictionary; and
    storing the one or more data units received from the device to the data store.
Dependent claims: 38, 39, 40, 41, 42, 43
44. A non-transitory computer-accessible storage medium storing program instructions that when executed by one or more computers implement a data store service configured to:
  store fingerprints for data stored in a data store to a fingerprint dictionary, wherein the data comprises a plurality of data units, and wherein each fingerprint in the fingerprint dictionary uniquely identifies a respective data unit in the data stored in the data store;
  receive, from a device via a network, one or more fingerprints each corresponding to a different data unit cached at the device;
  search the fingerprint dictionary for each of the one or more fingerprints received from the device to determine whether the fingerprint is in or is not in the fingerprint dictionary, wherein determining that a fingerprint is not in the fingerprint dictionary indicates a corresponding data unit to be uploaded;
  send, to the device via the network, an indication of one or more data units to be uploaded as determined by said search of the fingerprint dictionary;
  receive, from the device via the network, the indicated one or more data units, wherein each received data unit corresponds to a fingerprint that is not in the fingerprint dictionary; and
  store the one or more data units received from the device to the data store.
Dependent claims: 45, 46, 47, 48, 49
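For readers who find the claim language dense, the receiver-side steps recited in the independent claims (store fingerprints, receive fingerprints, search the dictionary, indicate which units to upload, receive and store those units) might be modeled roughly as follows. This is an illustrative sketch only: the class name `DataStoreService`, the method names, the in-memory `set` and `dict`, and SHA-256 as the fingerprint function are assumptions introduced here, and networking is left out entirely.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def fingerprint(unit: bytes) -> str:
    """Fingerprint a data unit (SHA-256 is an assumed choice)."""
    return hashlib.sha256(unit).hexdigest()


class DataStoreService:
    """Hypothetical in-memory stand-in for the claimed data store service."""

    def __init__(self) -> None:
        # Fingerprint dictionary: one entry per data unit already stored.
        self.fingerprint_dictionary: Set[str] = set()
        # Data store, keyed by fingerprint for simplicity.
        self.data_store: Dict[str, bytes] = {}

    def check_fingerprints(self, fingerprints: Iterable[str]) -> List[str]:
        """Search the dictionary and return the fingerprints that are absent.

        The returned list plays the role of the indication of which
        data units the device should upload.
        """
        missing: List[str] = []
        for fp in fingerprints:
            if fp not in self.fingerprint_dictionary and fp not in missing:
                missing.append(fp)
        return missing

    def receive_units(self, units: Iterable[bytes]) -> int:
        """Store uploaded units and add their fingerprints to the dictionary.

        Units whose fingerprint is already present are skipped.
        Returns the number of units newly stored.
        """
        stored = 0
        for unit in units:
            fp = fingerprint(unit)
            if fp in self.fingerprint_dictionary:
                continue
            self.data_store[fp] = unit
            self.fingerprint_dictionary.add(fp)
            stored += 1
        return stored
```

A device would call `check_fingerprints` with the fingerprints of its cached units, then pass only the units named in the result to `receive_units`. Keeping the dictionary separate from the stored data mirrors the claims' distinction between the fingerprint dictionary and the data store, although a real service would persist both rather than hold them in memory.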
Specification