Enhanced data compression for sparse multidimensional ordered series data

US 9,571,122 B2
Filed: 07/01/2016
Issued: 02/14/2017
Est. Priority Date: 10/07/2014
Status: Active Grant

Abstract

Disclosed are methods and systems for significantly compressing sparse multidimensional ordered series data comprised of indexed data sets, wherein each data set comprises an index, a first variable and a second variable. The methods and systems are particularly suited for compression of data recorded in double precision floating point format.

19 Claims

1. A computer-implemented method of compressing a sparse multidimensional ordered series of spectroscopic data, the method comprising:
- a) receiving the sparse multidimensional ordered series data containing values that fall within a dynamic range of less than 10 orders of magnitude, wherein the data comprise indexed data sets, each indexed data set comprising an index (n), a first variable (x_n) representing a mass to charge ratio (m/z), and a second variable (y_n) representing signal intensity;
  
  b) defining a predictor that calculates each first variable (x_n);
  
  c) assigning an amplitude code word to each y_n;
  
  d) calculating a hop offset value (Δ_n) for each y_n;
  
  e) assigning a hop code word to each Δ_n based on the value of the Δ_n; and
  
  f) generating a compressed output, said compressed output comprising;
  
  i) a decoder legend comprising;
  
  a reverse amplitude code word dictionary associated with y_n; and
  
  a reverse hop code word dictionary associated with Δ_n; and
  
  ii) code word data comprising an amplitude code word and a hop code word for each y_n and each Δ_n.
- - 2. The method of claim 1, wherein the sparse multidimensional ordered series data is in double precision floating point format.
  - 3. The method of claim 1, wherein the sparse multidimensional ordered series data comprises a plurality of indexed x,y pairs.
  - 4. The method of claim 1, wherein the predictor is a global predictor function.
  - 5. The method of claim 4, wherein the global predictor is an n^th order polynomial function.
  - 6. The method of claim 5, wherein the function is g(n)=a₀+a₁*n+a₂*n²+a₃*n³.
  - 7. The method of claim 1, wherein the predictor is a piecewise predictor.
  - 8. The method of claim 1, wherein the predictor is a local predictor.
  - 9. The method of claim 1, wherein the predictor further comprises an error correction mechanism.
  - 10. The method of claim 1, wherein the second variable y_n data is comprised of a sequence of variable amplitude measurements interspaced with intervals of relatively quiet periods during which the y_n data remains moderately constant and primarily dominated by noise.
  - 11. The method of claim 1, wherein the second variable y_n data is comprised of a non-uniform multi-modal distribution of amplitude ranges, where certain amplitude ranges that occur frequently are interspaced with other amplitude ranges that occur much less frequently.
  - 12. The method of claim 11, wherein the second variable y_n data is comprised of a discrete set of observable amplitude ranges interspaced with intervals of amplitude ranges that are not observed in the data.
  - 13. The method of claim 1, wherein assigning an amplitude code word to each y_n comprises:
    - i) generating a hash table for amplitude values;
      
      ii) looking up each of the second variable (y_n) value in turn, wherein if the y_n value is not previously seen, then the y_n value is added to a list of amplitude values and an associated frequency occurrence is set to one, and wherein if the y_n is already present on the list of amplitude values, then the associated frequency occurrence is incremented by one;
      
      iii) sorting the list of amplitude values by their associated frequency occurrence;
      
      iv) assigning a unique amplitude code word to each unique amplitude value in the list of amplitude values, wherein the shortest code words are assigned to the most frequently occurring amplitude values.
  - 14. The method of claim 13, wherein any second variable (y_n) value less than or equal to a baseline threshold is skipped.
  - 15. The method of claim 1, wherein the sparse multidimensional ordered series data describe a non-uniform multi-modal distribution of hop Δ_n ranges, where certain hop ranges that are frequently and considerably more likely to occur are interspaced with other hop ranges that are much less likely to occur.
  - 16. The method of claim 15, wherein the hop offset values are comprised of a discrete set of observable amplitude ranges interspaced with intervals of amplitude ranges that are not observed in the data.
  - 17. The method of claim 1, wherein calculating a hop offset value (Δ_n) for each y_n comprises;
    - i) identifying an initial hop offset value (Δ₀) and entering the Δ₀ into a previous register as a previous peak location;
      
      ii) feeding each index (n) into the previous register subtracting the previous peak location from the index (n) to calculate the hop offset value (Δ_n) and then replacing the previous peak location with the index (n);
      
      iii) repeating step ii) for each index (n) in the sparse multidimensional ordered series data.
  - 18. The method of claim 1, wherein calculating a hop offset value (Δ_n) for each y_n comprises;
    - i) identifying an initial hop offset value (Δ₀) and entering the Δ₀ into a previous register as a previous peak location;
      
      ii) feeding each first variable (x_n) value into the previous register subtracting the previous peak location from the first variable (x_n) value to calculate the hop offset value (Δ_n) and then replacing the previous peak location with the first variable (x_n) value;
      
      iii) repeating step ii) for each first variable (x_n) value in the sparse multidimensional ordered series data.
  - 19. The method of claim 1, wherein assigning a hop code word to each Δ_n based on the value and frequency of the Δ_n comprises;
    - i) generating a hash table for hop offset values;
      
      ii) looking up each hop offset value (Δ_n) value in turn, wherein if the Δ_n value is not previously seen, then the Δ_n value is added to a list of hop values and an associated frequency occurrence is set to one, and wherein if the Δ_n is already present on the list of hop values, then the associated frequency occurrence is incremented by one;
      
      iii) sorting the list of hop values by their associated frequency occurrence;
      
      iv) assigning a unique hop code word to each unique hop value in the list of hop values, wherein the shortest code words are assigned to the most frequently occurring hop values.
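
The steps recited in claims 1, 6, 13, 17 and 19 can be illustrated with a minimal Python sketch. This is not the patented implementation. The function names, the unary-style prefix code (`"1" * rank + "0"`), the tie-breaking rule, the use of `collections.Counter` as the hash table, and the sample data are all assumptions made for illustration. The claims themselves only require that the shortest code words go to the most frequently occurring values, and they do not say how the code words are constructed.

```python
from collections import Counter


def cubic_predictor(coeffs):
    """Global predictor of the claim 6 form: g(n) = a0 + a1*n + a2*n^2 + a3*n^3."""
    a0, a1, a2, a3 = coeffs
    return lambda n: a0 + a1 * n + a2 * n**2 + a3 * n**3


def hop_offsets(indices, initial=0):
    """Claim 17 style: each hop is the index minus the previous peak location."""
    previous = initial  # the "previous register", seeded with the initial value
    deltas = []
    for n in indices:
        deltas.append(n - previous)
        previous = n
    return deltas


def assign_code_words(values):
    """Claims 13/19 style: count occurrences, sort by frequency, and give the
    shortest code words to the most frequent values.

    Assumption: a unary-style prefix code (rank k -> k ones then a zero);
    ties in frequency are broken by value so the result is deterministic.
    """
    counts = Counter(values)  # hash table of value -> frequency
    ranked = sorted(counts, key=lambda v: (-counts[v], v))
    return {v: "1" * rank + "0" for rank, v in enumerate(ranked)}


def compress(points):
    """points: list of (n, y_n) pairs sorted by index n.

    Returns a decoder legend (reverse dictionaries) and a bit string that
    holds one hop code word and one amplitude code word per point.
    """
    deltas = hop_offsets([n for n, _ in points])
    amplitudes = [y for _, y in points]
    hop_codes = assign_code_words(deltas)
    amp_codes = assign_code_words(amplitudes)
    legend = {
        "hop": {code: d for d, code in hop_codes.items()},
        "amplitude": {code: y for y, code in amp_codes.items()},
    }
    bits = "".join(hop_codes[d] + amp_codes[y] for d, y in zip(deltas, amplitudes))
    return legend, bits


def read_code(bits, pos):
    """Read one unary-style code word starting at pos; return (code, next_pos)."""
    end = bits.index("0", pos) + 1
    return bits[pos:end], end


def decompress(legend, bits, predictor, initial=0):
    """Rebuild (n, x_n, y_n) triples; x_n comes from the predictor, not the stream."""
    points, n, pos = [], initial, 0
    while pos < len(bits):
        hop, pos = read_code(bits, pos)
        amp, pos = read_code(bits, pos)
        n += legend["hop"][hop]
        points.append((n, predictor(n), legend["amplitude"][amp]))
    return points


# Hypothetical sample: five peaks, mostly spaced 4 apart with a repeated amplitude.
sample = [(3, 5.0), (7, 2.0), (11, 5.0), (15, 5.0), (40, 9.0)]
g = cubic_predictor((100.0, 0.5, 0.0, 0.0))  # made-up coefficients
legend, bits = compress(sample)
restored = decompress(legend, bits, g)
print(bits)
print(restored)
```

In this sketch the x_n values are never stored: the decoder recomputes them from the predictor, and only the hop and amplitude code words plus the two reverse dictionaries travel in the output. A real encoder would likely pack the bits into bytes and use a more efficient prefix code than the unary one assumed here.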

Current Assignee: Protein Metrics LLC
Original Assignee: Protein Metrics Inc. (Insightful Science LLC)
Inventors: Kletter, Doron
Primary Examiner(s): NGUYEN, LINH V
Application Number: US15/200,494
Publication Number: US 20160315632A1
Time in Patent Office: 228 Days
Field of Search: 341/51, 341/65, 341/67, 341/87, 341/106, 341/107
US Class Current: 1/1
CPC Class Codes:
H03M 7/24   Conversion to or from float...
H03M 7/30   Compression speech analysis...
H03M 7/3084   using adaptive string match...
H03M 7/3088   employing the use of a dict...
H03M 7/40   Conversion to or from varia...
H03M 7/42   using table look-up for the...
H03M 7/46   Conversion to or from run-l...
