Client-side deduplication with local chunk caching
Abstract
Techniques and mechanisms described herein facilitate the transmission of a data stream from a client device to a networked storage system. According to various embodiments, a fingerprint for a data chunk may be identified by applying a hash function to the data chunk via a processor. The data chunk may be determined by parsing a data stream at the client device. A determination may be made as to whether the data chunk is stored in a chunk file repository at the client device. A block map update request message including information for updating a block map may be transmitted to a networked storage system via a network. The block map may identify a designated memory location at which the chunk is stored at the networked storage system.
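The parsing and fingerprinting steps summarized in the abstract can be illustrated with a minimal sketch. It assumes fixed-size chunking and SHA-256 as the hash function; the abstract names neither, and the helper names `parse_chunks`, `fingerprint`, and `fingerprint_stream` are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO, Iterator, List, Tuple

# Assumption: fixed-size chunks. The abstract does not specify how the
# data stream is parsed into chunks.
CHUNK_SIZE = 4096


def parse_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks read from the data stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def fingerprint(chunk: bytes) -> str:
    """Identify a fingerprint by applying a hash function to the chunk.

    Assumption: SHA-256 stands in for the unspecified hash function.
    """
    return hashlib.sha256(chunk).hexdigest()


def fingerprint_stream(stream: BinaryIO) -> List[Tuple[str, int]]:
    """Return (fingerprint, length) for every chunk in the stream."""
    return [(fingerprint(c), len(c)) for c in parse_chunks(stream)]


if __name__ == "__main__":
    data = b"a" * CHUNK_SIZE + b"b" * CHUNK_SIZE + b"a" * CHUNK_SIZE
    for fp, length in fingerprint_stream(io.BytesIO(data)):
        print(fp[:16], length)
```

Identical chunks produce identical fingerprints, which is what lets a client recognize a chunk it has already handled without comparing the chunk contents byte by byte.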
15 Claims
1. A method, comprising:
at a client device comprising a processor and memory, identifying a fingerprint for a data chunk by applying a hash function to the data chunk via the processor, the data chunk determined by parsing a data stream at the client device, the data stream designated for storage at a networked storage system, using functions in a custom communications protocol interface, the custom communications protocol interface including a parser and a fingerprinter for facilitating client-side deduplication, the custom communications protocol interface being at the client device and being operable to communicate with at least one other module at the client device via a standard communications protocol, the custom communications protocol interface at the client device being operable to communicate with the networked storage system using at least one non-standard interaction;
determining whether the data chunk is stored in a local chunk cache at the client device;
verifying that the data chunk is correctly identified by comparing a first length of the data chunk determined by parsing the data stream with a second length of the data chunk stored in the local chunk cache; and
in response to a determination that the data chunk is not stored in the local chunk cache, determining whether the data chunk is stored at the networked storage system by transmitting the fingerprint to the networked storage system via the network;
in response to receiving a message from the networked storage system that the data chunk is not stored at the networked storage system, transmitting the data chunk and a block map update request message to the networked storage system via a network, the block map update request message including information to update a block map at the networked storage system, the block map identifying a designated memory location at which the data chunk is to be stored at the networked storage system; and
after the block map update request message is transmitted to the network storage system, storing the data chunk in a local chunk cache.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
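The order of steps recited in claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `FakeStorageSystem` and `DedupClient` are hypothetical names, the storage system is an in-memory stand-in rather than a networked service, SHA-256 stands in for the unspecified hash function, and the block map location is simply a running counter.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeStorageSystem:
    """Hypothetical in-memory stand-in for the networked storage system."""
    chunks: Dict[str, bytes] = field(default_factory=dict)
    block_map: Dict[str, int] = field(default_factory=dict)  # fingerprint -> location

    def has_chunk(self, fp: str) -> bool:
        # Answers the client's fingerprint query.
        return fp in self.chunks

    def store_chunk(self, fp: str, chunk: bytes, block_map_update: Dict[str, int]) -> None:
        # Receives the chunk together with the block map update request.
        self.chunks[fp] = chunk
        self.block_map.update(block_map_update)


@dataclass
class DedupClient:
    """Hypothetical client holding a local chunk cache (fingerprint -> chunk)."""
    server: FakeStorageSystem
    cache: Dict[str, bytes] = field(default_factory=dict)

    def put_chunk(self, chunk: bytes) -> str:
        # Identify the fingerprint (assumption: SHA-256).
        fp = hashlib.sha256(chunk).hexdigest()

        # Check the local chunk cache, then verify the identification by
        # comparing the parsed length with the cached length.
        cached = self.cache.get(fp)
        if cached is not None and len(cached) == len(chunk):
            return "cache-hit"
        # Assumption: a length mismatch is treated like a cache miss.

        # Not in the local cache: ask the storage system by fingerprint.
        if self.server.has_chunk(fp):
            return "server-hit"

        # Storage system lacks the chunk: send chunk plus block map update.
        location = len(self.server.block_map)  # hypothetical location scheme
        self.server.store_chunk(fp, chunk, {fp: location})

        # Only after the update request is sent, store the chunk locally.
        self.cache[fp] = chunk
        return "uploaded"


if __name__ == "__main__":
    server = FakeStorageSystem()
    client = DedupClient(server)
    print(client.put_chunk(b"hello"))              # first sight of this chunk
    print(client.put_chunk(b"hello"))              # now in the local cache
    print(DedupClient(server).put_chunk(b"hello")) # new client, empty cache
```

The claim does not say whether a chunk found at the storage system is then added to the local cache, so the sketch leaves the cache unchanged in that case.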
10. A device comprising:
memory operable to store a fingerprint for a data chunk generated by applying a hash function to the data chunk, the data chunk determined by parsing a data stream at the device, the data stream designated for storage at a networked storage system;
a processor operable to;
determine whether the data chunk is stored in a local chunk cache at the client device;
verify that the data chunk is correctly identified by comparing a first length of the data chunk determined by parsing the data stream with a second length of the data chunk stored in the local chunk cache; and
in response to a determination that the data chunk is not stored in the local chunk cache, determine whether the data chunk is stored at the networked storage system by transmitting the fingerprint to the networked storage system via the network; and
a custom communications protocol interface operable to transmit a the data chunk and block map update request message to the networked storage system via a network in response to receiving a message from the networked storage system that the data chunk is not stored at the networked storage system, the block map update request message including information for updating a block map at the networked storage system, the block map identifying a designated memory location at which the data chunk is stored at the networked storage system, the custom communications protocol interface including a parser and fingerprinter for facilitating client-side deduplication, the custom communications protocol interface being at the client device and being operable to communicate with at least one other module at the client device via a standard communications protocol, the custom communications protocol interface at the client device being operable to communicate with the networked storage system using at least one non-standard interaction,
wherein the processor is further configured to store the data chunk in the local chunk cache, after the block map update request message is transmitted to the network storage system.
Dependent claims: 11, 12, 13
14. One or more non-transitory computer readable media having instructions stored thereon for performing a method, the method comprising:
at a client device comprising a processor and memory, identifying a fingerprint for a data chunk by applying a hash function to the data chunk via a processor, the data chunk determined by parsing a data stream at the client device, the data stream designated for storage at a networked storage system, using functions in a custom communications protocol interface, the custom communications protocol interface including a parser and fingerprinter for facilitating client-side deduplication, the custom communications protocol interface being at the client device and being operable to communicate with at least one other module at the client device via the standard communications protocol, the custom communications protocol interface at the client device being operable to communicate with the networked storage system using at least one non-standard interaction;
determining whether the data chunk is stored in a local chunk cache at the client device;
verifying that the data chunk is correctly identified by comparing a first length of the data chunk determined by parsing the data stream with a second length of the data chunk stored in the local chunk cache;
in response to a determination that the data chunk is not stored in the local chunk cache, determining whether the data chunk is stored at the networked storage system by transmitting the fingerprint to the networked storage system via the network; and
in response to receiving a message from the networked storage system that the data chunk is not stored at the networked storage system, transmitting the data chunk and a block map update request message to the networked storage system via a network, the block map update request message including information to update a block map at the networked storage system, the block map identifying a designated memory location at which the data chunk is to be stored at the networked storage system;
wherein after the block map update request message is transmitted to the network storage system, storing the data chunk in a local chunk cache.
Dependent claims: 15
Specification