Reduced Bandwidth Data Uploading in Data Systems
Abstract
Methods and apparatus for uploading data from a sender to a receiver. A data deduplication technique is described that may reduce the bandwidth used in uploading data from the sender to the receiver. In the technique, the receiver, rather than the sender, maintains a fingerprint dictionary for previously uploaded data. When a sender has additional data to be uploaded, the sender extracts fingerprints for units of the data and sends the fingerprints to the receiver. The receiver checks its fingerprint dictionary to determine the data units to be uploaded and notifies the sender of the identified units; the sender then sends the identified units of data to the receiver. The technique may, for example, be applied in virtualized data store systems to reduce bandwidth usage in uploading data.
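The exchange described in the abstract can be illustrated with a minimal in-process sketch. It is not the patented implementation. Everything named here is an assumption made for illustration: SHA-256 as the fingerprint function, fixed-size data units, the `Receiver` and `Sender` classes, and direct method calls standing in for the communications channel. Skipping a fingerprint that repeats within a single batch is also a choice of this sketch, not something the abstract states.

```python
from __future__ import annotations

import hashlib

UNIT_SIZE = 4  # hypothetical unit size, kept tiny so the example is easy to follow


def fingerprint(unit: bytes) -> str:
    """Assumed fingerprint function: SHA-256 hex digest of the data unit."""
    return hashlib.sha256(unit).hexdigest()


def split_units(data: bytes, size: int = UNIT_SIZE) -> list[bytes]:
    """Split data into fixed-size units (the last unit may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class Receiver:
    """Holds the fingerprint dictionary for previously uploaded data."""

    def __init__(self) -> None:
        self.dictionary: dict[str, bytes] = {}  # fingerprint -> stored unit

    def units_to_upload(self, fingerprints: list[str]) -> list[int]:
        """Return the positions of fingerprints not found in the dictionary."""
        needed: list[int] = []
        requested: set[str] = set()  # sketch-only: avoid asking twice per batch
        for index, fp in enumerate(fingerprints):
            if fp not in self.dictionary and fp not in requested:
                requested.add(fp)
                needed.append(index)
        return needed

    def store(self, units: list[bytes]) -> None:
        """Store uploaded units and record their fingerprints."""
        for unit in units:
            self.dictionary[fingerprint(unit)] = unit


class Sender:
    """Fingerprints local data and uploads only the units the receiver asks for."""

    def __init__(self, receiver: Receiver) -> None:
        self.receiver = receiver  # stands in for the communications channel

    def upload(self, data: bytes) -> int:
        units = split_units(data)
        fingerprints = [fingerprint(u) for u in units]         # step 1: fingerprints only
        needed = self.receiver.units_to_upload(fingerprints)   # step 2: receiver lookup
        self.receiver.store([units[i] for i in needed])        # step 3: send missing units
        return len(needed)  # number of units actually transferred
```

In this sketch, calling `Sender.upload` a second time with data that shares units with an earlier upload transfers only the units whose fingerprints the receiver has not yet recorded, which is the bandwidth saving the abstract refers to.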
35 Claims
1. A system, comprising:
one or more devices implementing a data receiver, wherein the data receiver locally stores fingerprints for data to a fingerprint dictionary, wherein the data comprises a plurality of data units, and wherein each fingerprint in the fingerprint dictionary uniquely identifies a respective data unit in the data; and
one or more devices implementing a data sender configured to:
generate fingerprints for a plurality of data units of locally cached data, wherein each fingerprint uniquely identifies a respective data unit in the locally cached data; and
send the fingerprints to the data receiver via a communications channel;
wherein the data receiver is configured to:
search the fingerprint dictionary for the fingerprints received from the data sender to determine if each of the fingerprints is in the fingerprint dictionary or is not in the fingerprint dictionary, wherein determining that a fingerprint is not in the fingerprint dictionary indicates a corresponding data unit to be uploaded; and
send, to the data sender via the communications channel, an indication of one or more data units to be uploaded as determined by said searching the fingerprint dictionary;
wherein the data sender is configured to send, to the data receiver via the communications channel, the indicated one or more data units, wherein only data units corresponding to fingerprints that are not in the fingerprint dictionary are sent to the data receiver.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A device, comprising:
at least one processor; and
a memory comprising program instructions, wherein the program instructions are executable by the at least one processor to:
generate fingerprints for a plurality of data units of locally cached data, wherein each fingerprint uniquely identifies a respective data unit in the locally cached data;
send the fingerprints to a remote data storage service via a network, wherein the remote data storage service maintains a primary data store of the data;
receive, from the remote data storage service via the network, an indication of one or more of the data units that are to be stored to the primary data store; and
send, via the network to the remote data storage service, the one or more data units for storage, by the remote data storage service, to the primary data store.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17
18. A method, comprising:
generating fingerprints for a plurality of data units of locally cached data, wherein each fingerprint uniquely identifies a respective data unit in the locally cached data;
sending the fingerprints to a data service via a communications channel;
receiving, from the data service via the communications channel, an indication of one or more of the data units that are to be uploaded to the data service via the communications channel; and
sending, via the communications channel to the data service, the indicated one or more data units.
Dependent claims: 19, 20, 22, 23, 24, 25
21. The method as recited in claim 21, wherein the one of the plurality of data blocks is marked as dirty to indicate that the locally cached data block has been created or modified and thus requires uploading from the locally cached data to the data service.
26. A non-transitory computer-accessible storage medium storing program instructions computer-executable to implement:
generating fingerprints for a plurality of data units of locally cached data, wherein each fingerprint uniquely identifies a respective data unit in the locally cached data;
sending the fingerprints to a data service via a communications channel;
receiving, from the data service via the communications channel, an indication of one or more of the data units that are to be uploaded to the data service via the communications channel; and
sending, via the communications channel to the data service, the indicated one or more data units.
Dependent claims: 27, 28, 29, 30, 31, 32, 33, 34, 35
Specification