Content-based segmentation scheme for data compression in storage and transmission including hierarchical segment representation
First Claim
1. A method comprising:
- compressing a data stream by:
segmenting at least portions of the data stream into segments based on content of the data stream;
responsive to a respective segment being a recurring segment, replacing the segment with a corresponding reference; and
responsive to a respective segment not being a recurring segment, creating a reference for the segment and replacing the segment with the created reference; and
storing the compressed data in a storage system.
Abstract
In a coding system, input data within a system is encoded. The input data might include sequences of symbols that repeat in the input data or occur in other input data encoded in the system. The encoding includes determining a target segment size, determining a window size, identifying a fingerprint within a window of symbols at an offset in the input data, determining whether the offset is to be designated as a cut point and segmenting the input data as indicated by the set of cut points. For each segment so identified, the encoder determines whether the segment is to be a referenced segment or an unreferenced segment, replacing the segment data of each referenced segment with a reference label and storing a reference binding in a persistent segment store for each referenced segment, if needed. Hierarchically, the process can be repeated by grouping references into groups, replacing the grouped references with a group label, storing a binding between the grouped references and group label, if one is not already present, and repeating the process. The number of levels of hierarchy can be fixed in advance or it can be determined from the content encoded.
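The encoding steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the SHA-1 window fingerprint, the modulo test for cut points, the 16-byte window, the 64-byte target size, the in-memory dictionaries standing in for the persistent segment store, and the fixed group size of 4 are all assumptions chosen for illustration. The function names (cut_points, encode, decode, group) are hypothetical, and every segment is treated as a referenced segment.

```python
import hashlib

WINDOW = 16  # assumed fingerprint window size, in bytes
TARGET = 64  # assumed target (average) segment size, in bytes


def cut_points(data: bytes, window: int = WINDOW, target: int = TARGET) -> list[int]:
    """Return the offsets that end each segment, chosen from content alone."""
    if not data:
        return []
    cuts = []
    for end in range(window, len(data)):
        # Fingerprint of the window ending at this offset. It is recomputed
        # from scratch for clarity; a rolling hash would avoid the extra work.
        digest = hashlib.sha1(data[end - window:end]).digest()
        fingerprint = int.from_bytes(digest[:4], "big")
        # Assumed cut rule: about one offset in `target` qualifies.
        if fingerprint % target == 0:
            cuts.append(end)
    cuts.append(len(data))  # the final segment always ends at end of input
    return cuts


def encode(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Replace each segment with a reference label, binding new ones in store."""
    labels = []
    start = 0
    for end in cut_points(data):
        segment = data[start:end]
        label = hashlib.sha1(segment).hexdigest()  # assumed label scheme
        store.setdefault(label, segment)  # store the binding only if absent
        labels.append(label)
        start = end
    return labels


def decode(labels: list[str], store: dict[str, bytes]) -> bytes:
    """Rebuild the original data by looking up each reference label."""
    return b"".join(store[label] for label in labels)


def group(labels: list[str], groups: dict[str, tuple[str, ...]],
          fanout: int = 4) -> list[str]:
    """One hierarchy level: replace runs of references with group labels.

    Fixed-size grouping is an assumption made to keep the sketch short.
    """
    top = []
    for i in range(0, len(labels), fanout):
        members = tuple(labels[i:i + fanout])
        group_label = hashlib.sha1("".join(members).encode()).hexdigest()
        groups.setdefault(group_label, members)  # bind only if not present
        top.append(group_label)
    return top
```

Because each cut point depends only on the bytes inside the window, inserting data near the start of a stream shifts the later cut points without changing the segments between them, so those segments map to labels already bound in the store. Calling group again on its own output would add a further level of hierarchy.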
34 Claims
1. A method comprising:
- compressing a data stream by:
segmenting at least portions of the data stream into segments based on content of the data stream;
responsive to a respective segment being a recurring segment, replacing the segment with a corresponding reference; and
responsive to a respective segment not being a recurring segment, creating a reference for the segment and replacing the segment with the created reference; and
- storing the compressed data in a storage system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A system, comprising:
- a segmentation mechanism configured to segment at least portions of a data stream into segments based on content of the data stream;
- a replacement mechanism configured to replace a respective segment with a corresponding reference in response to the segment being a recurring segment; and
- a binding mechanism configured to create a reference for a respective segment and replacing the segment with the created reference in response to the segment not being a recurring segment, thereby compressing the data stream.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
23. A method comprising:
- receiving a file or a portion of a file;
- compressing the file or portion thereof by:
at least partially segmenting the file or portion thereof into segments based on content of the file;
responsive to a respective segment being a recurring segment, replacing the segment with a corresponding reference;
responsive to a respective segment not being a recurring segment, creating a reference for the segment and replacing the segment with the created reference; and
- saving the file or portion thereof to a storage system.
24. A method comprising:
- compressing a file by:
segmenting at least portions of the file into segments based on content of the file;
responsive to a respective segment being a recurring segment, replacing the segment with a corresponding reference; and
responsive to a respective segment not being a recurring segment, creating a reference for the segment and replacing the segment with the created reference; and
- storing the compressed file in a storage system.
Dependent claims: 25, 26, 27, 28, 29, 30, 31, 32, 33, 34
Specification