Systems and methods for computation of optimal distance bounds on compressed time-series data

US 7,882,126 B2
Filed: 02/07/2008
Issued: 02/01/2011
Est. Priority Date: 02/07/2008
Status: Expired due to Fees

1 Assignment

0 Petitions

Abstract

There are provided a method and a system for computation of optimal distance bounds on compressed time-series data. In a method for similarity search, the method includes the step of transforming sequence data into a compressed sequence represented by top-k coefficients of the sequence data and a sum of the energy of omitted coefficients of the sequence data. The method further includes the step of computing at least one of a lower bound and an upper bound on a distance range between a query sequence and the compressed sequence, given a first and a second constraint. The first constraint is that a sum of squares of the omitted coefficients is less than a sum of the energy of the omitted coefficients. The second constraint is that the energy of the omitted coefficients is less than the energy of a lowest energy one of the top-k coefficients.
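The transforming step can be illustrated with a minimal sketch. It assumes an orthonormal discrete Fourier transform (the Fourier case named in claims 9 and 18), a real-valued input sequence, and selection of the k largest-magnitude coefficients. The helper names `ortho_dft` and `compress_top_k` are hypothetical, and the code is not taken from the patent. A naive O(n^2) DFT keeps it dependency-free.

```python
import cmath
import math


def ortho_dft(x):
    """Naive O(n^2) DFT scaled by 1/sqrt(n) so it is orthonormal (preserves energy)."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def compress_top_k(x, k):
    """Keep the k largest-magnitude coefficients and the total energy of the rest.

    Returns (kept, omitted_energy), where kept maps coefficient index -> complex value.
    """
    coeffs = ortho_dft(x)
    # Rank coefficient indices by magnitude, largest first.
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = {i: coeffs[i] for i in order[:k]}
    # Energy (sum of squared magnitudes) of every coefficient that was dropped.
    omitted_energy = sum(abs(coeffs[i]) ** 2 for i in order[k:])
    return kept, omitted_energy


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 2.0, 1.0, 0.0, -1.0, 0.0]
    kept, omitted_energy = compress_top_k(x, 3)
    print(sorted(kept), omitted_energy)
```

Because the kept coefficients are the largest in magnitude, every omitted coefficient has energy no greater than the weakest kept one, which corresponds to the second constraint described above. By Parseval's theorem, the kept energy plus `omitted_energy` equals the energy of the original sequence.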

5 Citations

18 Claims

1. A method for similarity search, comprising:
   - transforming sequence data into a compressed sequence represented by top-k coefficients of the sequence data and a sum of the energy of omitted coefficients of the sequence data; and
   - computing at least one of a lower bound and an upper bound on a distance range between a query sequence and the compressed sequence, given a first and a second constraint, the first constraint being a sum of squares of the omitted coefficients being less than a sum of the energy of the omitted coefficients, the second constraint being the energy of the omitted coefficients being less than the energy of a lowest energy one of the top-k coefficients, wherein any of the lower bound and the upper bound is substantially identical to an actual distance between the query sequence and the compressed sequence subject to an amount of compression of at least the compressed sequence.
2. The method of claim 1, wherein the query sequence is compressed.
3. The method of claim 2, wherein any of the lower bound and the upper bound is substantially identical to the actual distance between the query sequence and the compressed sequence subject to the amount of compression of the query sequence and the compressed sequence.
4. The method of claim 1, wherein said computing step uses a linear distance function, and is unrestricted with respect to the linear distance function used.
5. The method of claim 1, wherein the linear distance function is Euclidean distance.
6. The method of claim 1, wherein said computing step uses a non-linear distance function that is capable of being bounded by a linear distance.
7. The method of claim 1, wherein the non-linear distance function is at least one of Time-Warping and Longest Common Subsequence.
8. The method of claim 1, wherein the sequence data is transformed using an orthonormal transform.
9. The method of claim 8, wherein the orthonormal transform involves at least one of Fourier components, wavelet components, and principal components of the sequence data.
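The computing step can be sketched for Euclidean distance (claim 5) with an uncompressed query (claim 1). This sketch uses only the first constraint (the total omitted energy e): by Parseval's theorem, D^2 = sum over kept i of |Q_i - X_i|^2 + ||Q_omitted - X_omitted||^2, and the triangle inequality bounds the second term between (||Q_omitted|| - sqrt(e))^2 and (||Q_omitted|| + sqrt(e))^2. The resulting bounds are valid but looser than the optimal bounds the patent describes, which also exploit the second constraint. All function names are hypothetical.

```python
import cmath
import math


def ortho_dft(x):
    """Naive O(n^2) DFT scaled by 1/sqrt(n) so Euclidean distances are preserved."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def compress_top_k(x, k):
    """Keep the k largest-magnitude coefficients and the total energy of the rest."""
    coeffs = ortho_dft(x)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = {i: coeffs[i] for i in order[:k]}
    omitted_energy = sum(abs(coeffs[i]) ** 2 for i in order[k:])
    return kept, omitted_energy


def distance_bounds(query, kept, omitted_energy):
    """Lower/upper bounds on the Euclidean distance between an uncompressed query
    and a sequence known only through (kept, omitted_energy).

    Uses only the total-energy constraint, so the bounds are valid but not optimal.
    """
    q = ortho_dft(query)
    # Exact contribution of the coefficient positions that were kept.
    known = sum(abs(q[i] - c) ** 2 for i, c in kept.items())
    # Norms of the query and of the compressed sequence over the omitted positions.
    q_rest = math.sqrt(sum(abs(q[i]) ** 2 for i in range(len(q)) if i not in kept))
    x_rest = math.sqrt(omitted_energy)
    # Triangle inequality: |q_rest - x_rest| <= ||Q_omitted - X_omitted|| <= q_rest + x_rest.
    lower = math.sqrt(known + (q_rest - x_rest) ** 2)
    upper = math.sqrt(known + (q_rest + x_rest) ** 2)
    return lower, upper


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 2.0, 1.0, 0.0, -1.0, 0.0]
    query = [0.5, 1.5, 3.5, 2.5, 0.5, -0.5, -1.0, 0.5]
    kept, e = compress_top_k(x, 3)
    actual = math.sqrt(sum((a - b) ** 2 for a, b in zip(query, x)))
    print(distance_bounds(query, kept, e), actual)
```

With k equal to the sequence length, the omitted energy is zero and both bounds collapse to the actual distance, which mirrors the claim language that the bounds approach the actual distance as the amount of compression decreases.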

10. A computer readable storage medium comprising a computer readable program for similarity search, wherein the computer readable program when executed on a computer causes the computer to perform the steps of:
   - transforming sequence data into a compressed sequence represented by top-k coefficients of the sequence data and a sum of the energy of omitted coefficients of the sequence data; and
   - computing at least one of a lower bound and an upper bound on a distance range between a query sequence and the compressed sequence, given a first and a second constraint, the first constraint being a sum of squares of the omitted coefficients being less than a sum of the energy of the omitted coefficients, the second constraint being the energy of the omitted coefficients being less than the energy of a lowest energy one of the top-k coefficients, wherein any of the lower bound and the upper bound is substantially identical to an actual distance between the query sequence and the compressed sequence subject to an amount of compression of at least the compressed sequence.
11. The computer readable storage medium of claim 10, wherein the query sequence is compressed.
12. The computer readable storage medium of claim 11, wherein any of the lower bound and the upper bound is substantially identical to the actual distance between the query sequence and the compressed sequence subject to the amount of compression of the query sequence and the compressed sequence.
13. The computer readable storage medium of claim 10, wherein said computing step uses a linear distance function, and is unrestricted with respect to the linear distance function used.
14. The computer readable storage medium of claim 10, wherein the linear distance function is Euclidean distance.
15. The computer readable storage medium of claim 10, wherein said computing step uses a non-linear distance function that is capable of being bounded by a linear distance.
16. The computer readable storage medium of claim 10, wherein the non-linear distance function is at least one of Time-Warping and Longest Common Subsequence.
17. The computer readable storage medium of claim 10, wherein the sequence data is transformed using an orthonormal transform.
18. The computer readable storage medium of claim 17, wherein the orthonormal transform involves at least one of Fourier components, wavelet components, and principal components of the sequence data.
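Claims 6-7 and 15-16 cover non-linear distances that can be bounded by a linear distance. One standard instance, not taken from the patent: for equal-length sequences, unconstrained dynamic time warping (DTW) with a squared point-wise cost is never larger than the Euclidean distance, because the diagonal alignment is one admissible warping path. An upper bound on the Euclidean distance is therefore also an upper bound on this DTW variant; lower bounds do not transfer this way. A minimal sketch, with hypothetical function names:

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def dtw(a, b):
    """Unconstrained DTW with squared point-wise cost; returns sqrt of the min path cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


if __name__ == "__main__":
    a = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0]
    b = [0.0, 0.0, 1.0, 2.0, 3.0, 2.0]
    # dtw(a, b) is 1.0; euclidean(a, b) is sqrt(5), about 2.236.
    print(dtw(a, b), euclidean(a, b))
```

The diagonal path makes the inequality hold by construction; adding a warping window restricts the admissible paths but still contains the diagonal, so the same upper bound applies.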

Current Assignee: International Business Machines SA (International Business Machines Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Vlachos, Michail; Yu, Philip Shi-Lung
Primary Examiner(s): Breene, John E
Assistant Examiner(s): Le, Thu Nguyet T
Application Number: US12/027,294
Publication Number: US 20090204574A1
Time in Patent Office: 1,090 Days
Field of Search: 707/693, 707/705, 707/713, 707/769, 707/999.004, 707/999.101, 707/E17.14
US Class Current: 707/769
CPC Class Codes:
- G06F 16/2474 Sequence data queries, e.g....
- G06F 2216/03 Data mining