Method and Apparatus for Automatic Comparison of Data Sequences

US 20090024555A1
Filed: 12/08/2006
Published: 01/22/2009
Est. Priority Date: 12/09/2005
Status: Active Grant

Abstract

The invention concerns a method and an apparatus for automatic comparison of at least two data sequences, characterized by: an evaluation of a local relationship between any pair of subsequences in two or more sequences; and an evaluation of a global relationship by means of aggregation of the evaluations of said local relationships.

81 Citations

26 Claims

1-13. (canceled)

14. A method for automatic comparison of at least two data sequences comprising the steps of:
- performing an evaluation of a local relationship between any pair of subsequences in two or more sequences; and
- performing an evaluation of a global relationship by aggregation of a plurality of evaluations of said local relationships.
  - 15. The method according to claim 14, wherein the evaluation of the local relationship is performed between a first subsequence in a first data sequence and a second subsequence in a second data sequence, wherein at least one subsequence of the first or second data sequence is missing in the other data sequence.
  - 16. The method according to claim 15, wherein subsequences for evaluation of the local relationship are specified by a subsequence selection mode comprising one of:
    - words, wherein the words are subsequences separated by a given set of delimiters;
    - n-grams, wherein the n-grams are overlapping subsequences of a given length n; and
    - all possible subsequences of two or more sequences.
  - 17. The method according to claim 15, wherein the totality of local and global relationships comprises a measure s for similarity or dissimilarity of two or more sequences.
  - 18. The method according to claim 15, wherein evaluation of the local and global relationships is performed by one of the following data structures or a representation thereof:
    - a hash table or indexed table;
    - a trie or compacted trie;
    - a suffix tree or suffix array; and
    - a generalized suffix tree or generalized suffix array.
  - 19. The method according to claim 15, wherein at least one of the first and second data sequences comprise one or more of symbols, images, text, ASCII characters, genetic data, protein data, bytes, binary data, and tokens as objects for which the local relationship is evaluated.
  - 20. The method according to claim 17, wherein the totality of local and global relationships comprises one of the following similarity or dissimilarity measures s:
    - Manhattan or taxicab distance;
    - Euclidean distance;
    - Minkowski distance;
    - Canberra distance;
    - Chi-Square distance;
    - Chebyshev distance;
    - Geodesic distance;
    - Jensen or symmetric Kullback-Leibler divergence;
    - Position-independent Hamming distance;
    - 1st and 2nd Kulczynski similarity coefficient;
    - Czekanowski or Sorensen-Dice similarity coefficient;
    - Jaccard similarity coefficient;
    - Simpson similarity coefficient;
    - Sokal-Sneath or Anderberg similarity coefficient;
    - Otsuka or Ochiai similarity coefficient; and
    - Braun-Blanquet similarity coefficient.
  - 21. The method according to claim 14, wherein the first and second data sequences (X, Y) each contain a plurality of objects to be detected;
    - a similarity measure s is automatically computed for subsequences in the first and second data sequences; and
    - depending on the similarity measure s, further processing steps are taken.
  - 22. The method according to claim 21, wherein the first and second data sequences (X, Y) comprise data transmitted between computers in a computer network and depending on an on-line computation of the similarity measure s, at least one signal indicating an abnormal data stream or an intrusion is automatically generated.
  - 23. The method according to claim 22, wherein the computers are part of a network for transmission of monetary information.
  - 24. The method according to claim 15, wherein the first and second data sequences comprise one or more of genetic data, data exchanged between computers, text, image data, binary data, and symbols.
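
The subsequence selection modes of claim 16 and a few of the measures listed in claim 20 can be sketched in Python as follows. This is one possible reading and not the patented implementation: it assumes that the local relationship compares the occurrence counts of a single subsequence in the two sequences, and that the global relationship aggregates those local values by a sum, a maximum, or a set ratio. All function names, the default delimiter set, and the default n are hypothetical.

```python
import re
from collections import Counter


def select_words(seq, delimiters=" ,.;"):
    """Selection mode 'words': subsequences separated by a delimiter set."""
    pattern = "[" + re.escape(delimiters) + "]+"
    return Counter(w for w in re.split(pattern, seq) if w)


def select_ngrams(seq, n=3):
    """Selection mode 'n-grams': overlapping subsequences of length n."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def minkowski(cx, cy, p=1):
    """Minkowski distance; p=1 is Manhattan, p=2 is Euclidean."""
    keys = cx.keys() | cy.keys()
    # A subsequence missing from one sequence has count 0 there (claim 15).
    return sum(abs(cx[w] - cy[w]) ** p for w in keys) ** (1.0 / p)


def chebyshev(cx, cy):
    """Chebyshev distance: the largest local count difference."""
    keys = cx.keys() | cy.keys()
    return max((abs(cx[w] - cy[w]) for w in keys), default=0)


def jaccard(cx, cy):
    """Jaccard similarity coefficient on the sets of present subsequences."""
    union = cx.keys() | cy.keys()
    return len(cx.keys() & cy.keys()) / len(union) if union else 1.0
```

For example, `minkowski(select_words("to be or not to be"), select_words("to be, or to do"))` sums three local differences of 1 (for "be", "not", and "do"). Swapping `select_words` for `select_ngrams` changes only which subsequences are compared, not how the measure aggregates them.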

25. An apparatus for the comparison of data sequences comprising:
- means for representing data sequences in a data structure selected from one of:
  - a hash table or indexed table;
  - a trie or compacted trie;
  - a suffix tree or suffix array; and
  - a generalized suffix tree or generalized suffix array;
- means for performing an evaluation of a local relationship between any pair of subsequences in said data sequences;
- means for performing an evaluation of a global relationship by aggregation of a plurality of evaluations of said local relationships; and
- means for computation of a totality of the local and global relationship, wherein at least one of the first and second data sequences comprise one or more of symbols, images, text, ASCII characters, genetic data, protein data, bytes, binary data, and tokens as objects for which the local relationship is evaluated.
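
As an illustration of how one of the data structures named in claim 25 could carry both evaluations, the sketch below stores the n-grams of two sequences in a single trie and aggregates the local count differences in one traversal. It is a minimal sketch under assumptions that do not come from the record: the class `TrieNode`, the functions `insert_ngrams`, `aggregate`, and `trie_distance`, the per-node pair of counters, and the choice of Manhattan distance as the aggregate are all hypothetical.

```python
class TrieNode:
    """Trie node holding one occurrence count per sequence (index 0 and 1)."""

    def __init__(self):
        self.children = {}
        self.counts = [0, 0]


def insert_ngrams(root, seq, which, n=3):
    """Insert every n-gram of seq, counting it under sequence index `which`."""
    for i in range(len(seq) - n + 1):
        node = root
        for symbol in seq[i:i + n]:
            node = node.children.setdefault(symbol, TrieNode())
        node.counts[which] += 1


def aggregate(root):
    """Walk the trie once and sum the local count differences at every node."""
    total, stack = 0, [root]
    while stack:
        node = stack.pop()
        # Inner nodes keep counts [0, 0] and therefore contribute nothing.
        total += abs(node.counts[0] - node.counts[1])
        stack.extend(node.children.values())
    return total


def trie_distance(x, y, n=3):
    """Manhattan distance over n-gram counts, computed via a shared trie."""
    root = TrieNode()
    insert_ngrams(root, x, 0, n)
    insert_ngrams(root, y, 1, n)
    return aggregate(root)
```

Because the trie is keyed on individual symbols, the same code accepts `str` and `bytes` inputs. A hash table keyed on whole n-grams would give the same result with less structure sharing between n-grams that have a common prefix.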

26. A system for processing and analysis of data sequences comprising:
- means for input of data sequences comprising a data structure selected from one of:
  - a hash table or indexed table;
  - a trie or compacted trie;
  - a suffix tree or suffix array; and
  - a generalized suffix tree or generalized suffix array;
- means for comparison of data sequences comprising one of:
  - Manhattan or taxicab distance;
  - Euclidean distance;
  - Minkowski distance;
  - Canberra distance;
  - Chi-Square distance;
  - Chebyshev distance;
  - Geodesic distance;
  - Jensen or symmetric Kullback-Leibler divergence;
  - Position-independent Hamming distance;
  - 1st and 2nd Kulczynski similarity coefficient;
  - Czekanowski or Sorensen-Dice similarity coefficient;
  - Jaccard similarity coefficient;
  - Simpson similarity coefficient;
  - Sokal-Sneath or Anderberg similarity coefficient;
  - Otsuka or Ochiai similarity coefficient; and
  - Braun-Blanquet similarity coefficient;
- means for analysis of data sequences including classification, regression, novelty detection, ranking, clustering, and structural inference; and
- means for reporting of results of the analysis.
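
The input, comparison, analysis, and reporting stages of claim 26 can be pictured with a small novelty-detection sketch. It assumes, without support from the record, that novelty is scored as the Manhattan distance over n-gram counts to the nearest reference sequence and that a fixed threshold decides what is flagged. The names `novelty_score` and `report`, the threshold, and the sample strings are hypothetical.

```python
from collections import Counter


def ngram_counts(seq, n=2):
    """Input stage: represent a sequence as a hash table of n-gram counts."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def manhattan(cx, cy):
    """Comparison stage: Manhattan distance between two count tables."""
    return sum(abs(cx[w] - cy[w]) for w in cx.keys() | cy.keys())


def novelty_score(sample, references, n=2):
    """Analysis stage: distance from sample to its nearest reference."""
    cs = ngram_counts(sample, n)
    return min(manhattan(cs, ngram_counts(r, n)) for r in references)


def report(samples, references, threshold, n=2):
    """Reporting stage: (sample, score, flagged) with flagged = score > threshold."""
    results = []
    for s in samples:
        score = novelty_score(s, references, n)
        results.append((s, score, score > threshold))
    return results
```

With references `["abab", "abba"]` and a threshold of 3, the sample `"abab"` scores 0 and is not flagged, while `"zzzz"` scores 6 and is flagged. A network monitor in the spirit of claim 22 would call `report` on each incoming payload, although the record does not say how such a threshold would be chosen.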

Current Assignee: Fraunhofer Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.
Original Assignee: Fraunhofer Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.
Inventors: Dussel, Patrick; Muller, Klaus-Robert; Rieck, Konrad; Laskov, Pavel

Granted Patent: US 8,271,403 B2
US Class Current: 706/54
CPC Class Codes

G06F 18/22   Matching criteria, e.g. pro...

G06F 7/02   Comparing digital values G0...

G06N 20/10   using kernel methods, e.g. ...

G16B 30/00   ICT specially adapted for s...

G16B 30/10   Sequence alignment; Homolog...

G16B 40/00   ICT specially adapted for b...

G16B 40/20   Supervised data analysis

G16B 40/30   Unsupervised data analysis

H04L 63/1416   Event detection, e.g. attac...

H04L 63/1441   Countermeasures against mal...

H04L 9/3231   Biological data, e.g. finge...

H04L 9/3236   using cryptographic hash fu...
