×

Method for compressed data with reduced dictionary sizes by coding value prefixes

  • US 7,609,179 B2
  • Filed: 01/08/2008
  • Issued: 10/27/2009
  • Est. Priority Date: 01/08/2008
  • Status: Active Grant
First Claim
Patent Images

1. A computer-based method to compress a dataset of values from one or more domains, said computer-based method implemented in computer readable program code stored in computer memory, said computer-based method comprising the steps of:

  • a. receiving input data;

    b. identifying a dictionary size budget;

    c. scanning a sample of input data values to identify occurrence frequencies of each value prefix in said sample;

    d. based on said identified dictionary size budget in (b) and said identified occurrence frequencies in (c), choosing a subset of value prefixes for which codewords are to be assigned, wherein a choice of assigning prefix codes to common prefixes associated with a subset of said set of data values is optimized to provide maximum compression for a given constraint on the number of codewords;

    e. building a dictionary and assigning prefix codes to said chosen subset of value prefixes in (d);

    f. rescanning said input data wherein, for each given value v, finding its longest prefix u in said dictionary, chosen in step (e) and outputting a concatenation of codeword(u) and w, wherein w is the suffix of value v, said suffix comprising remaining uncoded bits of said given value, andwherein said optimization is performed via a dynamic programming algorithm, wherein, in said dynamic programming algorithm, best prefixes are chosen based on whether a given node v is a leaf node or whether the given node v is not a leaf node.

View all claims
  • 2 Assignments
Timeline View
Assignment View
    ×
    ×