System and method for segmenting a data stream
Abstract
A method of limiting redundant storage of data comprises receiving a data stream and partitioning the data stream into a series of data chunks. At least one content hash value for a set of data chunks is generated based on data content of the set of data chunks. One or more data chunks are grouped into a segment with at least one boundary of the segment defined based on an evaluation of content hash values of data chunks. Content hash values of data chunks of the segment are compared to content hash values of data chunks of segments stored on a backup mass storage device. A pointer to a stored data chunk of an existing segment is stored on the backup mass storage device if a content hash value of a data chunk of the segment matches the content hash value of the stored data chunk.
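The partitioning, hashing, and grouping steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The fixed chunk size, the use of SHA-256, the mask-based boundary test, the segment size cap, and all names (`chunk_stream`, `content_hash`, `segment_stream`, `CHUNK_SIZE`, `BOUNDARY_MASK`, `MAX_CHUNKS`) are assumptions made here for illustration; the source only says that a segment boundary is defined by evaluating content hash values.

```python
import hashlib
from typing import Iterator, List, Tuple

CHUNK_SIZE = 4096     # assumed fixed chunk size; the source does not specify one
BOUNDARY_MASK = 0x07  # assumed rule: a chunk closes a segment when hash & mask == 0
MAX_CHUNKS = 64       # assumed cap so a segment cannot grow without bound


def chunk_stream(data: bytes, size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Partition the data stream into a series of fixed-size chunks."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def content_hash(chunk: bytes) -> bytes:
    """Content hash of one chunk (SHA-256 chosen only for illustration)."""
    return hashlib.sha256(chunk).digest()


def segment_stream(data: bytes) -> List[List[Tuple[bytes, bytes]]]:
    """Group (hash, chunk) pairs into segments bounded by a hash-based test."""
    segments: List[List[Tuple[bytes, bytes]]] = []
    current: List[Tuple[bytes, bytes]] = []
    for chunk in chunk_stream(data):
        digest = content_hash(chunk)
        current.append((digest, chunk))
        # The boundary decision looks at the chunk's content hash,
        # not at its position in the stream.
        at_boundary = (digest[-1] & BOUNDARY_MASK) == 0
        if at_boundary or len(current) >= MAX_CHUNKS:
            segments.append(current)
            current = []
    if current:
        segments.append(current)  # trailing chunks form a final, partial segment
    return segments
```

Under these assumptions a segment ends wherever a chunk's hash satisfies the boundary test, so the same run of chunks tends to produce the same segment boundaries regardless of where it sits in the stream.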
39 Citations
15 Claims
1. A method of limiting redundant storage of data for a backup mass storage device, comprising:
receiving a data stream;
partitioning the data stream into a series of data chunks;
generating at least one content hash value for a set of data chunks based on data content of the set of data chunks;
grouping a plurality of data chunks into a particular segment with at least one boundary of the particular segment defined based on an evaluation of content hash values of data chunks;
comparing content hash values of data chunks of the particular segment to content hash values of data chunks of a plurality of segments stored on a backup mass storage device; and
storing a pointer to a stored data chunk of an existing segment of the plurality of segments stored on the backup mass storage device if a content hash value of a data chunk of the particular segment matches the content hash value of the stored data chunk.
Dependent claims: 2, 3, 4, 5, 6
7. A system for limiting redundant storage of data to a backup mass storage device, comprising:
a mass storage device;
a backup mass storage device;
a backup module, operable to receive a data stream from the mass storage device and to transmit a modified data stream to the backup mass storage device;
a hash generating module, operable to generate at least one content hash value for a set of data chunks based on data content of the set of data chunks;
a segmentation module, operable to group a plurality of data chunks into a particular segment while defining at least one boundary of the particular segment based on an evaluation of content hash values of data chunks; and
a comparison module, operable to compare content hash values of data chunks of the particular segment to content hash values of data chunks of a plurality of segments stored on the backup mass storage device;
the backup module operable to store a pointer to a stored data chunk of an existing segment of the plurality of segments stored on the mass storage device if a content hash value of a data chunk of the particular segment matches the content hash value of the stored data chunk.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15
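The comparison and pointer-storing behavior recited in the claims can be illustrated with a small Python sketch. It is a hypothetical in-memory model, not the claimed system. The `BackupStore` class, its `store_segment` and `read_segment` methods, the `(segment id, position)` pointer layout, and the choice of SHA-256 are all assumptions introduced here.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

# Assumed pointer layout: (segment id, chunk position within that segment).
Pointer = Tuple[int, int]
Entry = Union[bytes, Pointer]


@dataclass
class BackupStore:
    """Hypothetical in-memory stand-in for the backup mass storage device."""
    segments: List[List[Entry]] = field(default_factory=list)
    # Maps a content hash to the location where that chunk's bytes are stored.
    index: Dict[bytes, Pointer] = field(default_factory=dict)

    def store_segment(self, chunks: List[bytes]) -> List[Entry]:
        """Store one segment, writing a pointer wherever a hash already matches."""
        segment_id = len(self.segments)
        stored: List[Entry] = []
        for position, chunk in enumerate(chunks):
            digest = hashlib.sha256(chunk).digest()
            existing = self.index.get(digest)
            if existing is not None:
                stored.append(existing)  # hash matched: keep only a pointer
            else:
                stored.append(chunk)     # new content: keep the chunk itself
                self.index[digest] = (segment_id, position)
        self.segments.append(stored)
        return stored

    def read_segment(self, segment_id: int) -> List[bytes]:
        """Rebuild a segment's chunks, following pointers to stored chunks."""
        chunks: List[bytes] = []
        for entry in self.segments[segment_id]:
            if isinstance(entry, tuple):
                seg, pos = entry
                entry = self.segments[seg][pos]  # pointers always target raw bytes
            chunks.append(entry)
        return chunks
```

In this sketch, storing `[b"aaa", b"bbb"]` and then `[b"bbb", b"ccc"]` keeps `b"bbb"` once; the second segment records the pointer `(0, 1)` in its place.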
Specification