METHOD AND APPARATUS FOR BLOCK SIZE OPTIMIZATION IN DE-DUPLICATION
First Claim
1. A method of determining sizing of chunk portions in data de-duplication, comprising:
chunking input data into a first plurality of data segments each having a first size;
assigning an identifier to each of the first plurality of data segments;
assigning an index to each of said identifiers;
creating a suffix structure and a longest common prefix structure from the indexes;
detecting repeated sequences of indexes and non-repeated indexes from the suffix structure and the longest common prefix structure;
determining a second size based on said detected repeated sequences and non-repeated indexes; and
chunking the input data into a second plurality of data segments each having the second size.
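The first three steps of the claim (chunking, assigning identifiers, assigning indexes) can be illustrated with a minimal Python sketch. The claim does not specify how identifiers are computed or how indexes are numbered, so the SHA-256 digest, the first-seen numbering, and the function names `chunk_fixed`, `assign_identifiers`, and `assign_indexes` are all assumptions made for illustration. The sketch also lets the final chunk be shorter than the first size when the input length is not a multiple of it.

```python
import hashlib


def chunk_fixed(data: bytes, size: int) -> list[bytes]:
    """Split data into consecutive chunks of `size` bytes (last one may be shorter)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [data[i:i + size] for i in range(0, len(data), size)]


def assign_identifiers(chunks: list[bytes]) -> list[str]:
    """Assumed identifier scheme: the SHA-256 hex digest of each chunk."""
    return [hashlib.sha256(c).hexdigest() for c in chunks]


def assign_indexes(identifiers: list[str]) -> list[int]:
    """Map each distinct identifier to a small integer, numbered in first-seen order."""
    table: dict[str, int] = {}
    indexes = []
    for ident in identifiers:
        if ident not in table:
            table[ident] = len(table)
        indexes.append(table[ident])
    return indexes


data = b"ABCDABCDXYZWABCD"
chunks = chunk_fixed(data, 4)          # first size = 4 bytes
indexes = assign_indexes(assign_identifiers(chunks))
print(indexes)                         # identical chunks share an index
```

Replacing long digests with small integers keeps the sequence compact, which is what makes building suffix and longest common prefix structures over it practical.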
Abstract
The invention provides a method and apparatus for determining sizing of chunk portions in data de-duplication. The method chunks input data into segments where each segment has a first size, assigns an identifier to each of the data segments, assigns an index to each of the identifiers, creates a suffix structure and a longest common prefix structure from the indexes, detects repeated sequences of indexes and non-repeated indexes from the suffix structure and the longest common prefix structure, determines a second size based on said detected repeated sequences and non-repeated indexes, and chunks the input data into a second plurality of data segments each having the second size.
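The later steps described in the abstract (building a suffix structure and a longest common prefix structure over the index sequence, then detecting repeated sequences and non-repeated indexes) might be sketched as follows. This is an illustrative reading, not the patented procedure: it assumes the suffix structure is a plain suffix array built by naive sorting, and the rule in `suggest_second_size` (first size multiplied by the longest repeated run of indexes) is a hypothetical heuristic, since this section does not state how the second size is derived.

```python
def suffix_array(seq: list[int]) -> list[int]:
    """Start positions of all suffixes of seq, in lexicographic order (naive build)."""
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def lcp_array(seq: list[int], sa: list[int]) -> list[int]:
    """lcp[k] = length of the common prefix of suffixes sa[k-1] and sa[k]; lcp[0] = 0."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = sa[k - 1], sa[k]
        n = 0
        while a + n < len(seq) and b + n < len(seq) and seq[a + n] == seq[b + n]:
            n += 1
        lcp[k] = n
    return lcp


def non_repeated(seq: list[int], sa: list[int], lcp: list[int]) -> list[int]:
    """Indexes occurring once: their suffix shares no prefix with either neighbour."""
    out = []
    for k, start in enumerate(sa):
        nxt = lcp[k + 1] if k + 1 < len(sa) else 0
        if lcp[k] == 0 and nxt == 0:
            out.append(seq[start])
    return out


def suggest_second_size(seq: list[int], first_size: int) -> int:
    """Hypothetical rule: scale the first size by the longest repeated run of indexes."""
    lcp = lcp_array(seq, suffix_array(seq))
    longest = max(lcp, default=0)
    return first_size * longest if longest > 0 else first_size


seq = [0, 1, 2, 0, 1, 2, 3]            # index sequence from a first chunking pass
sa = suffix_array(seq)
lcp = lcp_array(seq, sa)
print(sa, lcp, non_repeated(seq, sa, lcp), suggest_second_size(seq, 4))
```

In this example the run 0, 1, 2 occurs twice, so the three chunks could be stored as one larger segment, while index 3 never repeats. The naive sort is quadratic or worse in the sequence length; a production implementation would likely use a linear-time suffix array construction, and would also need to decide how to treat overlapping repeats.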
20 Claims
1. A method of determining sizing of chunk portions in data de-duplication, comprising:
chunking input data into a first plurality of data segments each having a first size;
assigning an identifier to each of the first plurality of data segments;
assigning an index to each of said identifiers;
creating a suffix structure and a longest common prefix structure from the indexes;
detecting repeated sequences of indexes and non-repeated indexes from the suffix structure and the longest common prefix structure;
determining a second size based on said detected repeated sequences and non-repeated indexes; and
chunking the input data into a second plurality of data segments each having the second size.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. An apparatus for determining segment size in de-duplication, comprising:
a chunking module configured to segment input data into a first plurality of data segments each having a first size;
an indexing module configured to assign an index to each of the first plurality of data segments,
an identifier module configured to assign an identifier to each of the first plurality of data segments, where the indexing module is further configured to assign an index to each of said identifiers, and create a suffix structure and a longest common prefix structure from the indexes; and
an array processor module configured to detect repeated sequences of indexes and non-repeated indexes from the suffix structure and longest common prefix structure, and to determine a second size based on the detected repeated sequences of indexes and non-repeated indexes, wherein the chunking module further segments the input data into a second plurality of data segments each having the second size.
Dependent claims: 10, 11, 12
13. A computer program product comprising a computer usable medium including a computer readable program, wherein the computer readable program when executed on a computer causes the computer to:
chunk input data into a first plurality of data segments each having a first size;
assign an identifier to each of the first plurality of data segments;
assign an index to each of said identifiers;
create a first structure and a second structure from the indexes;
detect repeated sequences of indexes and non-repeated indexes from the first structure and the second structure;
determine a second size based on said detected repeated sequences of indexes and non-repeated indexes; and
chunk the input data into a second plurality of data segments each having the second size.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
Specification