
Content-based segmentation scheme for data compression in storage and transmission including hierarchical segment representation

  • US 6,667,700 B1
  • Filed: 10/30/2002
  • Issued: 12/23/2003
  • Est. Priority Date: 10/30/2002
  • Status: Expired due to Term
First Claim

1. A method of encoding input data within a system, wherein the input data include sequences of symbols that repeat in the input data or occur in other input data encoded in the system, the method comprising:

  • identifying, within a number of sequential input data symbols defined by an offset and a window size, a fingerprint representation of the number of sequential input data symbols;

    determining, from the fingerprint representation, whether the offset is to be designated as a cut point;

    repeating the above steps of identifying and determining to arrive at a set of cut points;

    segmenting the input data as indicated by the set of cut points;

    for each segment, determining whether the segment is to be a referenced segment or an unreferenced segment;

    for each referenced segment, replacing the segment data of the referenced segment with a reference label;

    for each referenced segment not already present in a persistent segment store, storing a reference binding in the persistent segment store, wherein a reference binding associates a referenced segment's data and its reference label;

    determining whether any sequence of segments is to be grouped as a reference group;

    for each reference group, replacing the references in the group with a group label; and

    for each reference group not already present in the persistent segment store, storing a group reference binding in the persistent segment store, wherein a group reference binding associates a reference group's references with its group label.
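
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything concrete in it is an assumption made for illustration: the window size, the SHA-1-based stand-in fingerprint, the "low bits are zero" cut rule, the minimum length for a referenced segment, the fixed group size, the hash-derived labels, and the use of a plain dictionary as the persistent segment store. Only one level of grouping is shown; the function names (`cut_points`, `segment`, `encode`, `group_refs`, `decode`) are hypothetical.

```python
import hashlib

WINDOW = 16            # window size in symbols (assumed value)
MASK = (1 << 6) - 1    # cut when the low 6 fingerprint bits are zero (assumed rule)
MIN_REF = 8            # shorter segments stay unreferenced (assumed policy)
GROUP = 4              # consecutive references per group (assumed policy)


def fingerprint(window):
    """Stand-in fingerprint: first 4 bytes of the SHA-1 of the window."""
    return int.from_bytes(hashlib.sha1(window).digest()[:4], "big")


def cut_points(data):
    """Offsets whose window fingerprint satisfies the assumed cut rule."""
    return [off for off in range(len(data) - WINDOW + 1)
            if fingerprint(data[off:off + WINDOW]) & MASK == 0]


def segment(data, cuts):
    """Split data at the cut points, dropping empty pieces."""
    bounds = [0] + [c for c in cuts if 0 < c < len(data)] + [len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


def group_refs(items, store):
    """Replace each run of GROUP consecutive references with one group label."""
    out, i = [], 0
    while i < len(items):
        run = items[i:i + GROUP]
        if len(run) == GROUP and all(kind == "ref" for kind, _ in run):
            labels = tuple(label for _, label in run)
            glabel = "G:" + hashlib.sha1("".join(labels).encode()).hexdigest()
            store.setdefault(glabel, labels)   # store group binding only if absent
            out.append(("ref", glabel))
            i += GROUP
        else:
            out.append(items[i])
            i += 1
    return out


def encode(data, store):
    """Encode data as a list of ("raw", bytes) and ("ref", label) items."""
    items = []
    for seg in segment(data, cut_points(data)):
        if len(seg) < MIN_REF:
            items.append(("raw", seg))         # unreferenced: data kept inline
        else:
            label = "S:" + hashlib.sha1(seg).hexdigest()
            store.setdefault(label, seg)       # store binding only if absent
            items.append(("ref", label))
    return group_refs(items, store)


def expand(label, store):
    """Resolve a segment label or group label back to its bytes."""
    bound = store[label]
    if isinstance(bound, bytes):
        return bound
    return b"".join(expand(child, store) for child in bound)


def decode(items, store):
    """Rebuild the original data from encoded items and the store."""
    return b"".join(value if kind == "raw" else expand(value, store)
                    for kind, value in items)
```

Because cut points depend only on the content inside each window, encoding the same data a second time against the same store yields the same labels and adds no new bindings, which is the property the claim relies on for data that repeats or that occurs in other input already encoded in the system.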
