Efficient data storage using two level delta resemblance
Abstract
Storage using resemblance of data segments is disclosed. It is determined that a new segment resembles a second prior stored segment, wherein the second prior stored segment is represented as a first stored delta and a first prior stored segment. A second delta between the new segment and the second prior stored segment is determined. A representation of the new segment based at least in part on the second delta is stored.
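The two-level arrangement in the abstract can be illustrated with a minimal sketch. It assumes a simple copy/insert delta format built on Python's difflib; the patent text here does not specify a delta encoding, and the names make_delta, apply_delta and SegmentStore are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    """Encode target as copy/insert operations against base (assumed format)."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))           # reuse bytes from the base
        elif j2 > j1:
            ops.append(("insert", target[j1:j2]))  # carry new bytes literally
        # a pure "delete" contributes nothing to the target
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target from the base and a list of delta operations."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


class SegmentStore:
    """Hypothetical store: a segment is kept in full or as (base id, delta)."""

    def __init__(self):
        self._entries = {}
        self._next_id = 0

    def _add(self, entry) -> int:
        seg_id = self._next_id
        self._entries[seg_id] = entry
        self._next_id += 1
        return seg_id

    def put_full(self, data: bytes) -> int:
        return self._add(("full", data))

    def put_delta(self, base_id: int, data: bytes) -> int:
        # The base may itself be stored as a delta; it is rebuilt first.
        base = self.get(base_id)
        return self._add(("delta", base_id, make_delta(base, data)))

    def get(self, seg_id: int) -> bytes:
        entry = self._entries[seg_id]
        if entry[0] == "full":
            return entry[1]
        _, base_id, ops = entry
        return apply_delta(self.get(base_id), ops)

    def depth(self, seg_id: int) -> int:
        """Number of deltas that must be applied to rebuild the segment."""
        entry = self._entries[seg_id]
        return 0 if entry[0] == "full" else 1 + self.depth(entry[1])


store = SegmentStore()
first_id = store.put_full(b"the quick brown fox jumps over the lazy dog")
# Second prior stored segment: a first delta against the first segment.
second_id = store.put_delta(first_id, b"the quick brown cat jumps over the lazy dog")
# New segment: a second delta against the second prior stored segment.
new_id = store.put_delta(second_id, b"the quick brown cat leaps over the lazy dog!")
```

In this sketch, reading the new segment applies two deltas in sequence, so store.depth(new_id) is 2. A production store would likely bound or flatten such chains; that choice is outside what the text above states.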
21 Claims
1. A system for storage using resemblance of data segments comprising:
a processor configured to:
break up a new input data stream into a plurality of data segments, wherein a new segment comprises one of the plurality of data segments, and wherein the breaking up of the new input data stream is based at least in part on a hash of content of the data stream;
determine that a new segment resembles a second prior stored segment wherein the second prior stored segment is represented as a first stored delta and a first prior stored segment, wherein resemblance is based at least in part on a summary feature set, wherein the summary feature set is determined by
(a) selecting m subsegments of a first segment;
(b) selecting n different functions, and wherein n is greater than 1; and
(c) for function i, wherein i is a value from 1 to n, computing m values f_i(subsegment_j) wherein j is a value from 1 to m; and
(d) selecting a subset of values from the f_i(subsegment_j) values, wherein the f_i(subsegment_j) values encompass all of the values as a result of applying the n different functions to the m subsegments; and
determine a second delta between the new segment and the second prior stored segment; and
store a representation of the new segment based at least in part on the second delta; and
a memory coupled with the processor, wherein the memory is configured to provide the processor with instructions.
Dependent claims: 2-19.
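Steps (a) through (d) of the summary feature set can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the subsegments are taken to be overlapping fixed-width windows, the n functions are salted BLAKE2b hashes, the selected subset is the maximum value per function, and two segments are treated as resembling each other when any feature matches. The names subsegments, feature_function, summary_feature_set and resembles are hypothetical.

```python
import hashlib


def subsegments(segment: bytes, width: int = 8) -> list:
    """(a) Select m subsegments: here, every overlapping window of `width` bytes."""
    count = max(len(segment) - width + 1, 1)
    return [segment[k:k + width] for k in range(count)]


def feature_function(i: int):
    """(b) Build the i-th of n different functions (assumed: salted BLAKE2b)."""
    salt = i.to_bytes(4, "big")

    def f_i(sub: bytes) -> int:
        digest = hashlib.blake2b(sub, digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "big")

    return f_i


def summary_feature_set(segment: bytes, n: int = 4, width: int = 8) -> tuple:
    """(c) Compute f_i(subsegment_j) for all i and j, then
    (d) select a subset: here, the maximum value for each function."""
    subs = subsegments(segment, width)
    features = []
    for i in range(1, n + 1):
        f_i = feature_function(i)
        values = [f_i(sub) for sub in subs]  # m values for function i
        features.append(max(values))
    return tuple(features)


def resembles(features_a: tuple, features_b: tuple) -> bool:
    """Assumed resemblance test: at least one feature in the same position matches."""
    return any(x == y for x, y in zip(features_a, features_b))
```

Selecting the maximum per function means that a small edit usually leaves most features unchanged, because the window that produced each maximum is often untouched. Other selection rules, such as taking a minimum or combining several values, fit the same claim language equally well.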
20. A method for storage using resemblance of data segments, comprising:
breaking up a new input data stream into a plurality of data segments, wherein a new segment comprises one of the plurality of data segments, and wherein the breaking up of the new input data stream is based at least in part on a hash of content of the data stream;
determining that a new segment resembles a second prior stored segment wherein the second prior stored segment is represented as a first stored delta and a first prior stored segment, wherein resemblance is based at least in part on a summary feature set, wherein the summary feature set is determined by
(a) selecting m subsegments of a first segment;
(b) selecting n different functions, and wherein n is greater than 1; and
(c) for function i, wherein i is a value from 1 to n, computing m values f_i(subsegment_j) wherein j is a value from 1 to m; and
(d) selecting a subset of values from the f_i(subsegment_j) values, wherein the f_i(subsegment_j) values encompass all of the values as a result of applying the n different functions to the m subsegments; and
determining a second delta between the new segment and the second prior stored segment; and
storing a representation of the new segment based at least in part on the second delta.
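The first step, breaking up the stream based on a hash of its content, is commonly done with a rolling hash over a small trailing window. The sketch below assumes a polynomial rolling hash, a bit-mask boundary test, and minimum and maximum segment sizes; none of these parameters come from the claims, and split_stream is a hypothetical name.

```python
def split_stream(data: bytes, window: int = 16, mask: int = 0x3F,
                 min_size: int = 32, max_size: int = 1024) -> list:
    """Cut `data` where the hash of the last `window` bytes matches `mask`."""
    base = 257
    mod = (1 << 61) - 1
    top = pow(base, window - 1, mod)  # weight of the byte leaving the window

    segments = []
    start = 0   # index of the first byte of the current segment
    h = 0       # rolling hash of the trailing window
    for pos, byte in enumerate(data):
        if pos >= window:
            # Drop the byte that slides out of the window.
            h = (h - data[pos - window] * top) % mod
        h = (h * base + byte) % mod

        size = pos - start + 1
        window_full = pos + 1 >= window
        at_boundary = window_full and (h & mask) == mask
        if (size >= min_size and at_boundary) or size >= max_size:
            segments.append(data[start:pos + 1])
            start = pos + 1

    if start < len(data):
        segments.append(data[start:])  # trailing partial segment
    return segments
```

Because each boundary test depends only on the bytes in the trailing window, an insertion near the start of a stream typically shifts later boundaries along with the content rather than moving them relative to it, so later segments tend to be cut at the same places as before.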
21. A computer program product for storage using resemblance of data segments, the computer program product being embodied in a non-transitory computer readable medium and comprising computer instructions for:
breaking up a new input data stream into a plurality of data segments, wherein a new segment comprises one of the plurality of data segments, and wherein the breaking up of the new input data stream is based at least in part on a hash of content of the data stream;
determining that a new segment resembles a second prior stored segment wherein the second prior stored segment is represented as a first stored delta and a first prior stored segment, wherein resemblance is based at least in part on a summary feature set, wherein the summary feature set is determined by
(a) selecting m subsegments of a first segment;
(b) selecting n different functions, and wherein n is greater than 1; and
(c) for function i, wherein i is a value from 1 to n, computing m values f_i(subsegment_j) wherein j is a value from 1 to m; and
(d) selecting a subset of values from the f_i(subsegment_j) values, wherein the f_i(subsegment_j) values encompass all of the values from applying the n different functions to the m subsegments; and
determining a second delta between the new segment and the second prior stored segment; and
storing a representation of the new segment based at least in part on the second delta.
Specification