Method and apparatus for content-aware and adaptive deduplication
Abstract
A method, a system, an apparatus, and a computer-readable medium for transmission of data across a network are disclosed. The method includes: receiving a data stream; analyzing the received data stream to determine a starting location and an ending location of each zone within the received data stream; based on the starting and ending locations, generating a zone stamp identifying the zone, where the zone stamp includes a sequence of contiguous characters representing at least a portion of data in the zone and the order of characters in the zone stamp corresponds to the order of data in the zone; comparing the zone stamp with another zone stamp of another zone in any data stream received; determining whether the zone is substantially similar to another zone by detecting that the zone stamp is substantially similar to another zone stamp; delta-compressing zones within any data stream received that have been determined to have substantially similar zone stamps, thereby deduplicating those zones; and transmitting the deduplicated zones across the network from one storage location to another storage location.
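The zone-boundary analysis described in the abstract can be illustrated with a minimal sketch. The text above does not say how starting and ending locations are chosen, so the rolling byte sum, the `mask` trigger, and the names `find_zones`, `min_size`, `max_size`, and `window` are all assumptions made for illustration. Only the idea of content-determined boundaries constrained by minimum and maximum zone sizes comes from the claims.

```python
def find_zones(data: bytes, min_size: int = 64, max_size: int = 1024,
               window: int = 16, mask: int = 0x3F) -> list[tuple[int, int]]:
    """Split data into (start, end) zones using a content-defined trigger.

    Hypothetical scheme: a boundary is placed where the sum of the previous
    `window` bytes has all `mask` bits set, but never before `min_size`
    bytes and never after `max_size` bytes from the zone start.
    """
    if not 1 <= window <= min_size <= max_size:
        raise ValueError("require 1 <= window <= min_size <= max_size")

    zones = []
    start, n = 0, len(data)
    while start < n:
        end = min(start + max_size, n)   # hard upper limit for this zone
        cut = end                        # default: cut at the limit
        pos = start + min_size           # earliest allowed boundary
        if pos <= n:
            rolling = sum(data[pos - window:pos])
            while pos < end:
                if rolling & mask == mask:
                    cut = pos            # content-defined boundary found
                    break
                # slide the window one byte to the right
                rolling += data[pos] - data[pos - window]
                pos += 1
        zones.append((start, cut))
        start = cut
    return zones
```

Every zone except possibly the last falls between `min_size` and `max_size` bytes, and because the trigger depends only on nearby content, an insertion early in the stream tends to shift later boundaries along with the data rather than re-cutting every zone.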
79 Claims
1. A method for transmission of data across a network, comprising the steps of:
receiving a data stream comprising a plurality of zones;
analyzing the received data stream to determine a starting location and an ending location of each zone within the received data stream;
based on the starting and ending locations, generating a zone stamp identifying the zone, the zone stamp includes a sequence of contiguous characters representing at least a portion of data in the zone, wherein the order of characters in the zone stamp corresponds to the order of data in the zone, wherein said zone stamp is configured to have a length between a predetermined minimum zone stamp length and a predetermined maximum zone stamp length;
comparing the zone stamp with another zone stamp of another zone in any data stream received;
determining whether the zone is substantially similar to another zone by detecting that the zone stamp is substantially similar to another zone stamp;
delta-compressing zones within any data stream received that have been determined to have substantially similar zone stamps, thereby deduplicating zones having substantially similar zone stamps within the received data stream;
transmitting the deduplicated zones across the network from one storage location to another storage location;
wherein each zone in any data stream received is characterized by a predetermined minimum and maximum zone size and a predetermined minimum and maximum zone stamp length;
wherein zones that are to be delta-compressed have a size greater than the predetermined minimum zone size and less than the predetermined maximum size and a stamp length greater than the predetermined minimum zone stamp length.
Dependent claims: 2-18, 42, 44, 45, 48, 50, 52, 54
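The stamp generation and comparison steps of claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: selecting positions with a BLAKE2b window hash, mapping them to letters A-Z, the length limits 4 and 16, and the 0.8 similarity threshold are all hypothetical, and `difflib.SequenceMatcher` stands in for whatever similarity measure an implementation would use.

```python
import hashlib
from difflib import SequenceMatcher

MIN_STAMP_LEN = 4    # assumed minimum zone stamp length
MAX_STAMP_LEN = 16   # assumed maximum zone stamp length


def zone_stamp(zone: bytes, window: int = 8, modulus: int = 32) -> str:
    """Build an order-preserving stamp for a zone (hypothetical scheme).

    The zone is scanned left to right. Whenever the hash of the current
    window is divisible by `modulus`, one character derived from that hash
    is appended, so characters appear in the same order as the data.
    """
    chars = []
    for i in range(len(zone) - window + 1):
        digest = hashlib.blake2b(zone[i:i + window], digest_size=4).digest()
        value = int.from_bytes(digest, "big")
        if value % modulus == 0:
            chars.append(chr(ord("A") + (value // modulus) % 26))
            if len(chars) == MAX_STAMP_LEN:
                break            # enforce the maximum stamp length
    return "".join(chars)


def stamps_similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two stamps as substantially similar if their ratio is high.

    Stamps shorter than MIN_STAMP_LEN are rejected as too weak to compare.
    """
    if min(len(a), len(b)) < MIN_STAMP_LEN:
        return False
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

For example, `stamps_similar("ABCDEFGH", "ABCDEFGX")` is true (seven of eight characters match in order, a ratio of 0.875), while two stamps with no characters in common are rejected.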
19. A system for transmission of data across a network, comprising:
a processor coupled to a storage system;
said processor is configured to receive a data stream comprising a plurality of zones;
analyze the received data stream to determine a starting location and an ending location of each zone within the received data stream;
based on the starting and ending locations, generate a zone stamp identifying the zone, the zone stamp includes a sequence of contiguous characters representing at least a portion of data in the zone, wherein the order of characters in the zone stamp corresponds to the order of data in the zone, wherein said zone stamp is configured to have a length between a predetermined minimum zone stamp length and a predetermined maximum zone stamp length;
compare the zone stamp with another zone stamp of another zone in any data stream received;
determine whether the zone is substantially similar to another zone by detecting that the zone stamp is substantially similar to another zone stamp;
delta-compress zones within any data stream received that have been determined to have substantially similar zone stamps, thereby deduplicating zones having substantially similar zone stamps within any data stream received;
transmit the deduplicated zones across the network from one storage location to another storage location in the storage system;
wherein each zone in any data stream received is characterized by a predetermined minimum and maximum zone size and a predetermined minimum and maximum zone stamp length;
wherein zones that are to be delta-compressed have a size greater than the predetermined minimum zone size and less than the predetermined maximum size and a stamp length greater than the predetermined minimum zone stamp length.
Dependent claims: 20-36, 43, 46, 47, 49, 51, 53, 55
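The delta-compression step can be illustrated with a small stand-in. The claims do not name a delta algorithm, so this sketch substitutes zlib with a preset dictionary: the similar reference zone is supplied as `zdict`, and the target zone is then encoded mostly as back-references into it. The function names `delta_compress` and `delta_decompress` are hypothetical.

```python
import zlib


def delta_compress(target: bytes, reference: bytes) -> bytes:
    """Encode `target` relative to a similar `reference` zone.

    Stand-in for a real delta encoder: DEFLATE with `reference` as a preset
    dictionary. Only the last 32 KiB of the dictionary is usable, so this
    is suitable for small zones only.
    """
    comp = zlib.compressobj(level=9, zdict=reference)
    return comp.compress(target) + comp.flush()


def delta_decompress(delta: bytes, reference: bytes) -> bytes:
    """Rebuild the target zone from its delta and the same reference zone."""
    decomp = zlib.decompressobj(zdict=reference)
    return decomp.decompress(delta) + decomp.flush()
```

The receiving side must already hold the same reference zone to rebuild the target, which is why a zone is only worth delta-compressing against another zone judged substantially similar.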
37. A method for deduplicating data across a network, comprising the steps of:
receiving a data stream comprising a plurality of zones;
analyzing the received data stream to determine a starting location and an ending location of each zone within the received data stream;
based on the starting and ending locations, generating a zone stamp identifying the zone, the zone stamp includes a sequence of contiguous characters representing at least a portion of data in the zone, wherein the order of characters in the zone stamp corresponds to the order of data in the zone, wherein said zone stamp is configured to have a length between a predetermined minimum zone stamp length and a predetermined maximum zone stamp length;
comparing the zone stamp with another zone stamp of another zone in any data stream received;
determining whether the zone is substantially similar to another zone by detecting that the zone stamp is substantially similar to another zone stamp; and
delta-compressing zones within any data stream received that have been determined to have substantially similar zone stamps, thereby deduplicating zones having substantially similar zone stamps within any data stream received;
wherein each zone in any data stream received is characterized by a predetermined minimum and maximum zone size and a predetermined minimum and maximum zone stamp length;
wherein zones that are to be delta-compressed have a size greater than the predetermined minimum zone size and less than the predetermined maximum size and a stamp length greater than the predetermined minimum zone stamp length.
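The final "wherein" clause amounts to an eligibility test applied before delta compression. A minimal sketch follows, assuming a hypothetical `ZoneLimits` container and illustrative numeric limits. The strict inequalities mirror the claim wording ("greater than", "less than"), and no upper bound on stamp length is checked because that clause does not state one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoneLimits:
    """Predetermined limits (the values used below are illustrative only)."""
    min_zone_size: int
    max_zone_size: int
    min_stamp_len: int
    max_stamp_len: int


def eligible_for_delta(zone_size: int, stamp_len: int,
                       limits: ZoneLimits) -> bool:
    """Return True if a zone meets the size and stamp-length conditions."""
    return (limits.min_zone_size < zone_size < limits.max_zone_size
            and stamp_len > limits.min_stamp_len)


# Hypothetical limits: zones of 1 KiB to 64 KiB, stamps of 4 to 16 characters.
limits = ZoneLimits(min_zone_size=1024, max_zone_size=65536,
                    min_stamp_len=4, max_stamp_len=16)
print(eligible_for_delta(8192, 10, limits))   # a mid-sized zone with a long enough stamp
```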
38. A non-transitory computer-readable medium encoded with computer program instructions for performing a method for transmitting data across a network, comprising the steps of:
receiving a data stream comprising a plurality of zones;
analyzing the received data stream to determine a starting location and an ending location of each zone within the received data stream;
based on the starting and ending locations, generating a zone stamp identifying the zone, the zone stamp includes a sequence of contiguous characters representing at least a portion of data in the zone, wherein the order of characters in the zone stamp corresponds to the order of data in the zone, wherein said zone stamp is configured to have a length between a predetermined minimum zone stamp length and a predetermined maximum zone stamp length;
comparing the zone stamp with another zone stamp of another zone in any data stream received;
determining whether the zone is substantially similar to another zone by detecting that the zone stamp is substantially similar to another zone stamp;
delta-compressing zones within any data stream received that have been determined to have substantially similar zone stamps, thereby deduplicating zones having substantially similar zone stamps within any data stream received;
transmitting the deduplicated zones across the network from one storage location to another storage location;
wherein each zone in any data stream received is characterized by a predetermined minimum and maximum zone size and a predetermined minimum and maximum zone stamp length;
wherein zones that are to be delta-compressed have a size greater than the predetermined minimum zone size and less than the predetermined maximum size and a stamp length greater than the predetermined minimum zone stamp length.
Dependent claims: 56-79
39. A system for storing and transmitting data, comprising:
a processor communicating with a storage location, wherein said processor is configured to receive a data stream comprising a plurality of zones from a plurality of sources;
said processor is configured to analyze the received data stream to determine a starting location and an ending location of each zone within the received data stream;
based on the starting and ending locations, generate a zone stamp identifying the zone, the zone stamp includes a sequence of contiguous characters representing a portion of data in the zone, wherein the order of characters in the zone stamp corresponds to the order of data in the zone, wherein said zone stamp is configured to have a length between a predetermined minimum zone stamp length and a predetermined maximum zone stamp length;
compare the zone stamp with another zone stamp of another zone in any data stream received;
determine whether the zone is substantially similar to another zone by detecting that the zone stamp is substantially similar to another zone stamp;
delta-compress zones within any data stream received that have been determined to have substantially similar zone stamps, thereby deduplicating zones having substantially similar zone stamps within the received data stream;
transmit the deduplicated zones to the storage device for storage;
wherein each zone in any data stream received is characterized by a predetermined minimum and maximum zone size and a predetermined minimum and maximum zone stamp length;
wherein zones that are to be delta-compressed have a size greater than the predetermined minimum zone size and less than the predetermined maximum size and a stamp length greater than the predetermined minimum zone stamp length.
Dependent claims: 40, 41
Specification