Content-based segmentation scheme for data compression in storage and transmission including hierarchical segment representation
Abstract
In a coding system, input data is encoded. The input data might include sequences of symbols that repeat in the input data or occur in other input data encoded in the system. The encoding includes determining a target segment size, determining a window size, identifying a fingerprint within a window of symbols at an offset in the input data, determining whether the offset is to be designated as a cut point, and segmenting the input data as indicated by the set of cut points. For each segment so identified, the encoder determines whether the segment is to be a referenced segment or an unreferenced segment, replaces the segment data of each referenced segment with a reference label, and stores a reference binding in a persistent segment store for each referenced segment, if needed. Hierarchically, the process can be repeated by grouping references into groups, replacing the grouped references with a group label, storing a binding between the grouped references and the group label if one is not already present, and repeating the process. The number of levels of hierarchy can be fixed in advance or it can be determined from the content encoded.
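The segmentation step described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the polynomial rolling hash used as the fingerprint, the default `window` and `target` values, and the cut-point test `fp % target == target - 1`. The function names `cut_points` and `segment` are hypothetical.

```python
from typing import List

BASE = 257              # assumed hash base (illustrative choice)
MOD = (1 << 31) - 1     # assumed hash modulus (illustrative choice)


def cut_points(data: bytes, window: int = 16, target: int = 64) -> List[int]:
    """Return the exclusive end offsets of segments chosen from content."""
    top = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    cuts: List[int] = []
    fp = 0
    for i, b in enumerate(data):
        if i >= window:
            # Drop the oldest byte so fp covers only the last `window` bytes.
            fp = (fp - data[i - window] * top) % MOD
        fp = (fp * BASE + b) % MOD
        # Assumed criterion: once the window is full, cut when the
        # fingerprint hits one residue out of `target`, which gives an
        # average segment length near `target` on well-mixed input.
        if i + 1 >= window and fp % target == target - 1:
            cuts.append(i + 1)
    if data and (not cuts or cuts[-1] != len(data)):
        cuts.append(len(data))  # the final segment always ends at the input end
    return cuts


def segment(data: bytes, window: int = 16, target: int = 64) -> List[bytes]:
    """Split data at the cut points; joining the result restores data."""
    out: List[bytes] = []
    start = 0
    for end in cut_points(data, window, target):
        out.append(data[start:end])
        start = end
    return out
```

Because each cut decision depends only on the bytes inside the window, inserting a byte at the front of the input changes the first segment but leaves the later segments byte-for-byte identical, which is what lets repeated content map to already stored segments.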
30 Claims
1. A method for storing data, comprising:
receiving a file from a client at a server;
segmenting the file into one or more segments;
forming a list, comprising:
determining whether each segment is present in a segment store, wherein a segment that is present in the segment store has an assigned reference label;
for each segment present in the segment store, adding to the list the assigned reference label; and
for each segment not present in the segment store, assigning a reference label to the segment, storing the segment and the reference label, and adding the reference label to the list; and
storing an association between the file and the list.
Dependent claims: 2, 3, 4, 5, 6.
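The storing steps of claim 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `SegmentStore`, its attribute names, the use of integer counters as reference labels, and fixed-size segmentation (`seg_size`) are all hypothetical choices made to keep the example short. The claim itself does not prescribe how segments are cut or how labels are formed.

```python
from typing import Dict, List


class SegmentStore:
    """Toy in-memory segment store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.label_of: Dict[bytes, int] = {}    # segment -> reference label
        self.segment_of: Dict[int, bytes] = {}  # reference label -> segment
        self.files: Dict[str, List[int]] = {}   # file name -> list of labels

    def store_file(self, name: str, data: bytes, seg_size: int = 4) -> List[int]:
        labels: List[int] = []
        # Assumption: fixed-size segments stand in for the segmenting step.
        for i in range(0, len(data), seg_size):
            seg = data[i:i + seg_size]
            label = self.label_of.get(seg)
            if label is None:
                # Segment not present: assign a label and store both.
                label = len(self.segment_of)
                self.label_of[seg] = label
                self.segment_of[label] = seg
            # In either case the reference label is added to the list.
            labels.append(label)
        # Store the association between the file and the list.
        self.files[name] = labels
        return labels
```

For example, storing `b"abcdabcdwxyz"` and then `b"wxyzabcd"` in the same store keeps only two distinct segments, since every segment of the second file is already present.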
7. A system for storing data, comprising:
a server for receiving a file from a client;
a segmenter configured to segment the file;
an encoder for forming a list, comprising:
logic configured to determine whether each segment is present in a segment store, wherein a segment that is present in the segment store has an assigned reference label;
for each segment present in the segment store, logic configured to add to the list the assigned reference label; and
for each segment not present in the segment store, logic configured to assign a reference label to the segment, store the segment and the reference label, and add the reference label to the list; and
logic configured to store an association between the file and the list.
Dependent claims: 8, 9, 10, 11, 12.
13. A method for retrieving data, comprising:
receiving a request for a file from a client at a server;
retrieving a list including reference labels to segments of the file, wherein the list is associated with the file;
decoding the list, comprising:
replacing reference labels with segments of the file; and
sending the file from the server to the client.
Dependent claims: 14, 15, 16, 17, 18.
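The decoding step of claim 13 can be sketched in Python as below. The function name `retrieve_file`, the two dictionaries, and the sample contents are hypothetical and exist only to make the example self-contained. Receiving the request and sending the result over a network are left out.

```python
from typing import Dict, List


def retrieve_file(
    name: str,
    files: Dict[str, List[int]],
    segment_of: Dict[int, bytes],
) -> bytes:
    """Rebuild a file by replacing each reference label with its segment."""
    labels = files[name]  # the list associated with the requested file
    return b"".join(segment_of[label] for label in labels)


# Assumed sample state: two stored segments and one file that reuses the first.
segment_of = {0: b"abcd", 1: b"wxyz"}
files = {"a": [0, 0, 1]}
restored = retrieve_file("a", files, segment_of)
```

The list stores label `0` twice, so the repeated segment is held once in the store but appears twice in the reconstructed file.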
19. A system for storing files, comprising:
a front-end file system for receiving from a client a file command for a file, wherein the front-end file system includes a front-end file server;
a back-end storage system including logic to segment the file; and
a segment store for storing the segments;
a network file system interface for sending or receiving contents of the file between the front-end file system and the back-end storage system.
Dependent claims: 20, 21, 22, 23, 24.
25. A method for storing files, comprising:
receiving, at a front-end file system from a client, a file command for a file, wherein the front-end file system includes a front-end file server;
based on the file command, using a network file system interface to transfer contents of the file between the front-end file system and a back-end storage system;
segmenting the file at the back-end storage system; and
maintaining a storage of the segments in a segment store of the back-end storage system.
Dependent claims: 26, 27, 28, 29, 30.
Specification