Methods and Systems For Vectored Data De-Duplication
Abstract
The present invention is directed toward methods and systems for data de-duplication. More particularly, in various embodiments, the present invention provides systems and methods for data de-duplication that may utilize a vectoring method for data de-duplication wherein a stream of data is divided into “data sets” or blocks. For each block, a code, such as a hash or cyclic redundancy code, may be calculated and stored. The first block of the set may be written normally, and its address and hash can be stored and noted. Subsequent block hashes may be compared with previously written block hashes.
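The block-and-code comparison described in the abstract can be sketched roughly as follows. This is an illustrative Python sketch only, not the patented implementation: the fixed 8-byte block size, the function names `split_blocks`, `block_code`, and `find_duplicates`, the choice of SHA-256 or CRC-32 as the code, and the use of a block index as the "address" are all assumptions made for the example.

```python
import hashlib
import zlib

BLOCK_SIZE = 8  # hypothetical block size in bytes; the abstract does not fix one


def split_blocks(stream: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Divide the stream into fixed-size blocks (the last block may be shorter)."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]


def block_code(block: bytes, use_crc: bool = False) -> bytes:
    """Return a de-duplication code: a SHA-256 digest, or a CRC-32 value if requested."""
    if use_crc:
        return zlib.crc32(block).to_bytes(4, "big")
    return hashlib.sha256(block).digest()


def find_duplicates(stream: bytes) -> list[tuple[int, int]]:
    """Return (block_index, first_index) for each block whose code was seen before."""
    first_seen: dict[bytes, int] = {}  # code -> index ("address") of the first block with it
    duplicates = []
    for index, block in enumerate(split_blocks(stream)):
        code = block_code(block)
        if code in first_seen:
            duplicates.append((index, first_seen[code]))
        else:
            first_seen[code] = index  # first occurrence: note its address and code
    return duplicates
```

Comparing codes rather than full blocks keeps the lookup cheap. A short code such as CRC-32 can collide for different blocks, so a real system would presumably need to account for that; this sketch does not.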
15 Claims
1. A method, comprising:
- comparing a de-duplication code for a first block of data received as part of an input stream to a de-duplication code for a previously processed block of data;
- upon determining that the de-duplication code for the first block of data matches the code for the previously processed block of data, storing in an output stream a vector instead of the first block of data,
- where the vector points in the output stream to one of, the previously processed block of data, or another vector,
- where the vector is placed in a location in the output data stream where the first block of data would have been placed, and
- where the vector contains fewer bits than the first block of data, and
- configuring the output stream to receive the next item to be stored after the end of the vector that was stored in the output stream, where the next item is to be processed from the input stream.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
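One possible reading of claim 1 is sketched below in Python. The record layout is an assumption made for illustration and is not taken from the patent: a stored block is a `B` tag, a 4-byte length, and the data, while a vector is a `V` tag followed by an 8-byte offset into the output stream. The names `encode`, `decode`, and `read_item`, the 64-byte block size, and the use of SHA-256 as the de-duplication code are likewise hypothetical.

```python
import hashlib
import struct

BLOCK_SIZE = 64                     # hypothetical; chosen to exceed the vector size
BLOCK_TAG, VECTOR_TAG = b"B", b"V"  # hypothetical one-byte record tags
VECTOR_SIZE = 9                     # 1 tag byte + 8-byte offset


def encode(stream: bytes) -> bytes:
    """Write blocks to an output stream, storing a vector in place of each repeat."""
    out = bytearray()
    seen: dict[bytes, int] = {}  # code -> offset in `out` of the stored block record
    for i in range(0, len(stream), BLOCK_SIZE):
        block = stream[i:i + BLOCK_SIZE]
        code = hashlib.sha256(block).digest()
        offset = seen.get(code)
        # Only substitute a vector when it is smaller than the block it replaces.
        if offset is not None and len(block) > VECTOR_SIZE:
            out += VECTOR_TAG + struct.pack(">Q", offset)
        else:
            seen.setdefault(code, len(out))
            out += BLOCK_TAG + struct.pack(">I", len(block)) + block
        # The next item is appended directly after whatever was just written.
    return bytes(out)


def read_item(out: bytes, pos: int) -> tuple[bytes, int]:
    """Return (block data, position just past the item at pos), following vectors."""
    if out[pos:pos + 1] == BLOCK_TAG:
        (length,) = struct.unpack_from(">I", out, pos + 1)
        start = pos + 5
        return out[start:start + length], start + length
    (target,) = struct.unpack_from(">Q", out, pos + 1)
    data, _ = read_item(out, target)  # the target may be a block or another vector
    return data, pos + VECTOR_SIZE


def decode(out: bytes) -> bytes:
    """Rebuild the original stream by resolving every vector."""
    result = bytearray()
    pos = 0
    while pos < len(out):
        data, pos = read_item(out, pos)
        result += data
    return bytes(result)
```

As a usage example, `decode(encode(data))` should return `data` unchanged. In this sketch the encoder always points a vector at the stored block, but `read_item` also resolves a vector that points at another vector, which the claim permits.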
Specification