Reduced bandwidth data uploading in data systems
First Claim
1. A system, comprising:
one or more hardware devices configured to implement a data receiver of a network-based virtualized data store service, the network-based virtualized data store service configured to provide remote virtualized data storage services over a network for a plurality of virtualized data store customers of the network-based virtualized data store service, wherein the data receiver locally stores fingerprints for data to a fingerprint dictionary, wherein the data comprises a plurality of data units, and wherein each fingerprint in the fingerprint dictionary uniquely identifies a respective data unit in the data; and
one or more hardware devices configured to implement a given customer site for one of the plurality of the virtualized data store customers, the given customer site comprising:
one or more data clients;
a customer network; and
a data sender configured as a virtualized data store gateway between the given customer site and the network-based virtualized data store service, wherein the data sender locally caches a plurality of data units, and wherein the locally cached data units are configured to be accessed over the customer network by the one or more data clients of the given customer site, the data sender configured to:
generate fingerprints for the plurality of data units of locally cached data, wherein each fingerprint uniquely identifies a respective data unit in the locally cached data; and
send the fingerprints to the data receiver via a communications channel;
wherein the data receiver is configured to:
search the fingerprint dictionary for the fingerprints received from the data sender to determine if each of the fingerprints is in the fingerprint dictionary or is not in the fingerprint dictionary, wherein determining that a fingerprint is not in the fingerprint dictionary indicates a corresponding data unit to be uploaded; and
send, to the data sender via the communications channel, an indication of one or more data units to be uploaded as determined by said search the fingerprint dictionary;
wherein the data sender is configured to send, to the data receiver via the communications channel, the indicated one or more data units, wherein only data units corresponding to fingerprints that are not in the fingerprint dictionary are sent to the data receiver.
Abstract
Methods and apparatus for uploading data from a sender to a receiver. A data deduplication technique is described that may reduce the bandwidth used in uploading data from the sender to the receiver. In the technique, the receiver, rather than the sender, maintains a fingerprint dictionary for previously uploaded data. When a sender has additional data to be uploaded, the sender extracts fingerprints for units of the data and sends the fingerprints to the receiver. The receiver checks its fingerprint dictionary to determine the data units to be uploaded and notifies the sender of the identified units; the sender then sends the identified units of data to the receiver. The technique may, for example, be applied in virtualized data store systems to reduce bandwidth usage in uploading data.
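The exchange described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the use of SHA-256 as the fingerprint function, the in-memory dictionary, and the names `fingerprint`, `Receiver`, `missing`, `store`, and `upload` are all assumptions made for illustration.

```python
import hashlib


def fingerprint(unit: bytes) -> str:
    """Return a fingerprint for one data unit (SHA-256 is an assumption)."""
    return hashlib.sha256(unit).hexdigest()


class Receiver:
    """Hypothetical receiver that keeps the fingerprint dictionary."""

    def __init__(self) -> None:
        # fingerprint -> previously uploaded data unit
        self.dictionary: dict[str, bytes] = {}

    def missing(self, fingerprints: list[str]) -> list[int]:
        """Return positions of fingerprints that are not in the dictionary."""
        return [i for i, fp in enumerate(fingerprints) if fp not in self.dictionary]

    def store(self, units: list[bytes]) -> None:
        """Record uploaded units under their fingerprints."""
        for unit in units:
            self.dictionary[fingerprint(unit)] = unit


def upload(units: list[bytes], receiver: Receiver) -> int:
    """Run one exchange and return the number of data bytes actually sent."""
    fingerprints = [fingerprint(u) for u in units]  # sender -> receiver
    wanted = receiver.missing(fingerprints)         # receiver -> sender
    payload = [units[i] for i in wanted]            # sender -> receiver
    receiver.store(payload)
    return sum(len(u) for u in payload)


receiver = Receiver()
first = upload([b"alpha", b"beta", b"gamma"], receiver)
second = upload([b"alpha", b"beta", b"delta"], receiver)
print(first, second)  # 14 5
```

In the second call only `b"delta"` crosses the channel, because the receiver already holds fingerprints for the other two units. A real system would also have to account for the bandwidth consumed by the fingerprints themselves, which this sketch does not count.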
18 Citations
35 Claims
1. A system, comprising: (recited in full under First Claim above)
Dependent Claims: 2, 3, 4, 5, 6, 7, 8
9. A device, comprising:
at least one processor; and
a memory comprising program instructions, wherein the program instructions are executable by the at least one processor to implement a virtualized data store gateway of a given customer site, the given customer site comprising one or more data clients connected to the virtualized data store gateway via a customer network, the given customer site for a given one of a plurality of virtualized data store customers of a remote network-based virtualized data store service that provides remote storage services over a network for the plurality of virtualized data store customers, the virtualized data store gateway providing a storage gateway between the given customer site and the remote network-based virtualized data store service, the program instructions further executable to cause the virtualized data store gateway to:
locally cache a plurality of data units from the one or more data clients connected to the virtualized data store gateway via the customer network of the given customer of the plurality of virtualized data store customers of the remote network-based virtualized data store service;
provide access to the locally cached customer data units to the one or more data clients of the given customer of the plurality of virtualized data store customers;
generate fingerprints for the plurality of data units of locally cached data units, wherein each fingerprint uniquely identifies a respective data unit in the locally cached data units;
send the fingerprints to the remote network-based virtualized data store service via the network, wherein the remote network-based virtualized data store service maintains a primary data store of the plurality of data units;
receive, from the remote network-based virtualized data store service via the network, an indication of one or more of the data units that are to be stored to the primary data store; and
send, via the network to the remote network-based virtualized data store service, the one or more data units for storage, by the remote network-based virtualized data store service, to the primary data store.
Dependent Claims: 10, 11, 12, 13, 14, 15, 16, 17
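The gateway-side steps recited in claim 9 (cache, provide access, fingerprint, send, receive an indication, upload) can be sketched from the gateway's point of view. This is a sketch under stated assumptions, not the claimed device: the `Gateway` class, its `write`, `read`, `fingerprints`, and `sync` methods, the integer unit ids, the SHA-256 fingerprints, and the stand-in service functions `ask_service` and `send_units` are all hypothetical.

```python
import hashlib


class Gateway:
    """Hypothetical gateway: caches units locally, uploads only requested ones."""

    def __init__(self) -> None:
        self.cache: dict[int, bytes] = {}  # unit id -> unit contents

    def write(self, unit_id: int, data: bytes) -> None:
        """A data client stores a unit in the local cache."""
        self.cache[unit_id] = data

    def read(self, unit_id: int) -> bytes:
        """A data client reads a unit back from the local cache."""
        return self.cache[unit_id]

    def fingerprints(self) -> dict[int, str]:
        """Fingerprint every cached unit (SHA-256 is an assumption)."""
        return {uid: hashlib.sha256(d).hexdigest() for uid, d in self.cache.items()}

    def sync(self, ask_service, send_units) -> list[int]:
        """Send fingerprints, receive the indication, upload the indicated units."""
        requested = ask_service(self.fingerprints())
        send_units({uid: self.cache[uid] for uid in requested})
        return requested


# Stand-in for the remote service: a primary store keyed by fingerprint.
primary_store: dict[str, bytes] = {}


def ask_service(fps: dict[int, str]) -> list[int]:
    """Return ids of units whose fingerprints the service does not hold."""
    return [uid for uid, fp in fps.items() if fp not in primary_store]


def send_units(units: dict[int, bytes]) -> None:
    """Store uploaded units in the primary store."""
    for data in units.values():
        primary_store[hashlib.sha256(data).hexdigest()] = data


gw = Gateway()
gw.write(0, b"block-a")
gw.write(1, b"block-b")
first_sync = gw.sync(ask_service, send_units)
gw.write(1, b"block-c")  # a client overwrites unit 1
second_sync = gw.sync(ask_service, send_units)
print(first_sync, second_sync)  # [0, 1] [1]
```

After the overwrite, only unit 1 is requested again, since the service already holds a fingerprint matching unit 0.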
18. A method, comprising:
providing a virtualized data store gateway between a given customer site of a given customer of a plurality of virtualized data store customers of a remote network-based virtualized data store service that provides remote storage services over a network for the plurality of virtualized data store customers, the given customer site comprising:
one or more data clients connected to a customer network and the virtualized data store gateway, wherein providing the virtualized data store gateway comprises:
locally caching at the virtualized data store gateway, a plurality of data units from the one or more data clients connected to the virtualized data store gateway via the customer network of the given customer of the plurality of virtualized data store customers;
providing access to the locally cached data units to the one or more data clients of the given customer of the plurality of virtualized data store customers;
generating fingerprints for the plurality of data units of locally cached data units, wherein each fingerprint uniquely identifies a respective data unit in the locally cached data units;
sending the fingerprints to the remote network-based virtualized data store service via a communications channel;
receiving, from the remote network-based virtualized data store service via the communications channel, an indication of one or more of the data units that are to be uploaded to the remote network-based virtualized data store service via the communications channel; and
sending, via the communications channel to the remote network-based virtualized data store service, the indicated one or more data units.
Dependent Claims: 19, 20, 21, 22, 23, 24, 25
26. A non-transitory computer-accessible storage medium storing program instructions computer-executable to implement:
providing a virtualized data store gateway between a given customer site of a given customer of a plurality of virtualized data store customers of a remote network-based virtualized data store service that provides remote storage services over a network for the plurality of virtualized data store customers, the given customer site comprising:
the virtualized data store gateway and one or more data clients connected to the virtualized data store gateway via a local network, wherein to provide the virtualized data store gateway the program instructions are further computer-executable to implement:
locally caching at the virtualized data store gateway a plurality of data units from the one or more data clients connected to the virtualized data store gateway via the local network of the given customer of the plurality virtualized data store customers;
providing access to the locally cached data units to the one or more data clients of the given customer of the plurality of virtualized data store customers;
generating fingerprints for the plurality of data units of locally cached data units, wherein each fingerprint uniquely identifies a respective data unit in the locally cached data units;
sending the fingerprints to the remote network-based virtualized data store service via a communications channel;
receiving, from the remote network-based virtualized data store service via the communications channel, an indication of one or more of the data units that are to be uploaded to the remote network-based virtualized data store service via the communications channel; and
sending, via the communications channel to the remote network-based virtualized data store service, the indicated one or more data units.
Dependent Claims: 27, 28, 29, 30, 31, 32, 33, 34, 35
Specification