SYSTEM AND METHOD FOR SEGMENTING A DATA STREAM
First Claim
1. A method of limiting redundant storage of data for a backup mass storage device, comprising:
receiving a data stream;
partitioning the data stream into a series of data chunks;
generating at least one content hash value for a set of data chunks based on data content of the set of data chunks;
grouping one or more data chunks into a segment with at least one boundary of the segment defined based on an evaluation of content hash values of data chunks;
comparing content hash values of data chunks of the segment to content hash values of data chunks of segments stored on a backup mass storage device; and
storing a pointer to a stored data chunk of an existing segment stored on the backup mass storage device if a content hash value of a data chunk of the segment matches the content hash value of the stored data chunk.
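The partitioning, hashing, and grouping steps of claim 1 can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the claim does not fix a chunk size, a hash function, or a boundary rule, so the fixed-size chunks, the SHA-256 hash, the bit-mask test in is_boundary, and the cap MAX_SEGMENT_CHUNKS are all hypothetical choices made here for demonstration.

```python
import hashlib

# All three constants are hypothetical values chosen for illustration only.
CHUNK_SIZE = 4           # bytes per chunk (tiny, so the example is easy to follow)
BOUNDARY_MASK = 0x3      # a chunk hash whose low bits under this mask are zero ends a segment
MAX_SEGMENT_CHUNKS = 8   # upper bound on chunks per segment


def partition(stream: bytes, size: int = CHUNK_SIZE) -> list:
    """Partition the data stream into a series of data chunks."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]


def content_hash(chunk: bytes) -> str:
    """Generate a content hash value from the chunk's data content."""
    return hashlib.sha256(chunk).hexdigest()


def is_boundary(hash_value: str, mask: int = BOUNDARY_MASK) -> bool:
    """Evaluate a content hash value to decide whether a segment ends here."""
    return int(hash_value, 16) & mask == 0


def group_into_segments(chunks: list) -> list:
    """Group chunks into segments; each segment is a list of (hash, chunk) pairs."""
    segments, current = [], []
    for chunk in chunks:
        h = content_hash(chunk)
        current.append((h, chunk))
        # Close the segment when the hash evaluation marks a boundary,
        # or when the (assumed) size cap is reached.
        if is_boundary(h) or len(current) >= MAX_SEGMENT_CHUNKS:
            segments.append(current)
            current = []
    if current:
        segments.append(current)  # trailing chunks form the final segment
    return segments
```

Because the boundary decision in this sketch depends only on chunk content, the same run of chunks tends to produce the same segment boundaries wherever it appears in a stream, which is what makes later comparison against stored segments useful.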
Abstract
A method of limiting redundant storage of data comprises receiving a data stream and partitioning the data stream into a series of data chunks. At least one content hash value for a set of data chunks is generated based on data content of the set of data chunks. One or more data chunks are grouped into a segment with at least one boundary of the segment defined based on an evaluation of content hash values of data chunks. Content hash values of data chunks of the segment are compared to content hash values of data chunks of segments stored on a backup mass storage device. A pointer to a stored data chunk of an existing segment is stored on the backup mass storage device if a content hash value of a data chunk of the segment matches the content hash value of the stored data chunk.
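The comparison and pointer-storing steps described in the abstract can be sketched as below. This is a minimal model under assumptions, not the patent's implementation: the BackupStore class, its in-memory dictionaries, and the use of the SHA-256 hex digest as the "pointer" are hypothetical stand-ins for a backup mass storage device and its index.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class BackupStore:
    """Hypothetical in-memory stand-in for a backup mass storage device."""
    chunks: dict = field(default_factory=dict)    # content hash -> stored chunk data
    segments: list = field(default_factory=list)  # each segment is a list of pointers

    def store_segment(self, segment_chunks: list) -> int:
        """Store one segment and return how many chunks were newly written."""
        pointers = []
        newly_written = 0
        for chunk in segment_chunks:
            h = hashlib.sha256(chunk).hexdigest()
            # Compare the chunk's content hash with hashes of stored chunks.
            if h not in self.chunks:
                self.chunks[h] = chunk   # no match: the chunk data itself is stored
                newly_written += 1
            # Match or not, the segment records only a pointer (here, the hash key).
            pointers.append(h)
        self.segments.append(pointers)
        return newly_written

    def restore_segment(self, index: int) -> bytes:
        """Rebuild a segment's data by following its pointers."""
        return b"".join(self.chunks[h] for h in self.segments[index])
```

A short usage example: the second segment shares two chunks with the first, so only one new chunk is written.

```python
store = BackupStore()
store.store_segment([b"aaaa", b"bbbb", b"cccc"])   # writes 3 chunks
store.store_segment([b"bbbb", b"cccc", b"dddd"])   # writes 1 chunk, reuses 2
```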
129 Citations
18 Claims
1. A method of limiting redundant storage of data for a backup mass storage device, comprising:
receiving a data stream;
partitioning the data stream into a series of data chunks;
generating at least one content hash value for a set of data chunks based on data content of the set of data chunks;
grouping one or more data chunks into a segment with at least one boundary of the segment defined based on an evaluation of content hash values of data chunks;
comparing content hash values of data chunks of the segment to content hash values of data chunks of segments stored on a backup mass storage device; and
storing a pointer to a stored data chunk of an existing segment stored on the backup mass storage device if a content hash value of a data chunk of the segment matches the content hash value of the stored data chunk.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system for limiting redundant storage of data to a backup mass storage device, comprising:
a mass storage device;
a backup mass storage device;
a backup module, operable to receive a data stream from the mass storage device and to transmit a modified data stream to the backup mass storage device;
a hash generating module, operable to generate at least one content hash value for a set of data chunks based on data content of the set of data chunks;
a segmentation module, operable to group a plurality of data chunks into a segment while defining at least one boundary of the segment based on an evaluation of content hash values of data chunks; and
a comparison module, operable to compare content hash values of data chunks of the segment to content hash values of data chunks of segments stored on the backup mass storage device;
the backup module operable to store a pointer to a stored data chunk of an existing segment stored on the mass storage device if a content hash value of a data chunk of the segment matches the content hash value of the stored data chunk.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
Specification