System and method for identifying critical features in an ordered scale space within a multi-dimensional feature space

US 20050171948A1
Filed: 12/11/2002
Published: 08/04/2005
Est. Priority Date: 12/11/2002
Status: Abandoned Application

11 Assignments

0 Petitions

Abstract

A system and method for identifying critical features in an ordered scale space within a multi-dimensional feature space is described. Features are extracted from a plurality of data collections. Each data collection is characterized by a collection of features semantically related by a grammar. Each feature is normalized, and the frequencies of occurrence and co-occurrence of the feature are determined for each of the data collections. The occurrence frequencies and the co-occurrence frequencies for each of the features are mapped into a set of patterns of occurrence frequencies and a set of patterns of co-occurrence frequencies. The pattern for each data collection is selected, and distance (similarity) measures between the occurrence frequencies in the selected pattern are calculated. The occurrence frequencies are projected onto a one-dimensional document signal in order of relative decreasing similarity using the similarity measures. Wavelet and scaling coefficients are derived from the one-dimensional document signal using multiresolution analysis.
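
The first steps in the abstract (normalizing features, counting occurrence and co-occurrence frequencies per data collection, and arranging one pattern per collection into a document feature matrix) can be sketched in Python. This is an illustrative sketch under stated assumptions, not the applicant's implementation: the function names are hypothetical, normalization is assumed to be lower-casing plus punctuation stripping, and co-occurrence is assumed to mean two distinct features appearing within a small sliding window.

```python
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"

def normalize(feature):
    # Assumed normalization: lower-case and strip surrounding punctuation.
    return feature.lower().strip(PUNCTUATION)

def cooccurrence(features, window=2):
    # Count unordered pairs of distinct features fewer than `window` positions apart.
    counts = Counter()
    for i, a in enumerate(features):
        for b in features[i + 1 : i + window]:
            if a != b:
                counts[tuple(sorted((a, b)))] += 1
    return counts

def frequencies(collections, window=2):
    """Return one occurrence Counter and one co-occurrence Counter per collection."""
    occurrences, cooccurrences = [], []
    for features in collections:
        normalized = [f for f in map(normalize, features) if f]
        occurrences.append(Counter(normalized))
        cooccurrences.append(cooccurrence(normalized, window))
    return occurrences, cooccurrences

def feature_matrix(occurrences):
    """Arrange the occurrence patterns as rows of a document feature matrix."""
    vocabulary = sorted(set().union(*occurrences))
    return vocabulary, [[occ[f] for f in vocabulary] for occ in occurrences]
```

For example, `frequencies([["Wavelet", "signal", "wavelet"], ["signal", "scale"]])` gives the occurrence pattern `{"wavelet": 2, "signal": 1}` for the first collection, and `feature_matrix` arranges the two patterns as rows `[0, 1, 2]` and `[1, 1, 0]` over the sorted vocabulary `["scale", "signal", "wavelet"]`.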

49 Claims

1. A system for identifying critical features in an ordered scale space within a multi-dimensional feature space, comprising:
- a feature analyzer initially processing features, comprising:
  
  a feature extractor extracting the features from a plurality of data collections, each data collection characterized by a collection of features semantically-related by a grammar;
  
  a database manager normalizing each feature and determining frequencies of occurrence and co-occurrences for the features for each of the data collections;
  
  a mapper mapping the occurrence frequencies and the co-occurrence frequencies for each of the features into a set of patterns of occurrence frequencies and a set of patterns of co-occurrence frequencies with one such pattern for each data collection;
  
  an unsupervised classifier selecting the pattern for each data collection and calculating similarity measures between each occurrence frequency in the selected pattern;
  
  a scale space transformation projecting the occurrence frequencies onto a one-dimensional document signal in order of relative decreasing similarity using the similarity measures; and
  
  a critical feature identifier deriving wavelet and scaling coefficients from the one-dimensional document signal.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11):
  - 2. A system according to claim 1, further comprising:
    - a preprocessor preprocessing each of the data collections prior to feature extraction to identify and logically remove non-probative content.
  - 3. A system according to claim 1, further comprising:
    - a database record storing a single occurrence of each feature in normalized form.
  - 4. A system according to claim 1, further comprising:
    - a feature frequency mapping arranging the patterns into a document feature matrix according to the data collection from which the features in each pattern were extracted.
  - 5. A system according to claim 1, further comprising:
    - a similarity module calculating a distance measure between each occurrence frequency as a similarity measure.
  - 6. A system according to claim 5, further comprising:
    - a defined variance bounding each of the similarity measures; and
      
      a cluster module forming the occurrence frequencies into clusters, each cluster comprising at least one of the features with such a similarity measure falling within the variance.
  - 7. A system according to claim 1, further comprising:
    - a pattern module forming each pattern as a vector in a multi-dimensional feature space; and
      
      a projection module projecting the multi-dimensional feature space into the one-dimensional document signal.
  - 8. A system according to claim 7, further comprising:
    - a self-organizing map of the multi-dimensional feature space formed prior to projection.
  - 9. A system according to claim 1, further comprising:
    - a quantizer quantizing the one-dimensional document signal.
  - 10. A system according to claim 9, further comprising:
    - an encoder encoding the quantized one-dimensional document signal.
  - 11. A system according to claim 1, further comprising:
    - wavelet and scaling coefficients generated through a multiresolution analysis of the one-dimensional document signal.
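
The claimed projection of occurrence frequencies onto a one-dimensional document signal "in order of relative decreasing similarity" is not spelled out in the claims, so the following Python sketch makes its own assumptions. Each feature is represented by its column of frequencies in the document feature matrix; cosine similarity is swapped in as the similarity measure (claim 5 also allows a distance measure); the ordering starts from one feature of the most similar pair (the maximal similarity value) and continues in order of decreasing similarity to that seed; and the document signal is the selected row read in that order. All function names are hypothetical.

```python
import math

def cosine(u, v):
    # Cosine similarity; defined as 0.0 when either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def order_by_similarity(matrix):
    """Order the feature columns of a document feature matrix by decreasing similarity."""
    columns = [list(col) for col in zip(*matrix)]
    n = len(columns)
    if n < 2:
        return list(range(n))
    # Seed with one feature of the most similar pair (the maximal similarity value).
    seed, _ = max(
        ((a, b) for a in range(n) for b in range(a + 1, n)),
        key=lambda pair: cosine(columns[pair[0]], columns[pair[1]]),
    )
    # Remaining features follow in order of decreasing similarity to the seed.
    others = [k for k in range(n) if k != seed]
    others.sort(key=lambda k: cosine(columns[seed], columns[k]), reverse=True)
    return [seed] + others

def document_signal(matrix, row, order):
    """Read one document's occurrence frequencies in the given feature order."""
    return [matrix[row][k] for k in order]
```

With the matrix `[[3, 0, 3], [1, 2, 1], [0, 5, 0]]`, features 0 and 2 have identical columns, so the order is `[0, 2, 1]` and the second document's signal becomes `[1, 1, 2]`.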

12. A method for identifying critical features in an ordered scale space within a multi-dimensional feature space, comprising:
- extracting features from a plurality of data collections, each data collection characterized by a collection of features semantically-related by a grammar;
  
  normalizing each feature and determining frequencies of occurrence and co-occurrences for the feature for each of the data collections;
  
  mapping the occurrence frequencies and the co-occurrence frequencies for each of the features into a set of patterns of occurrence frequencies and a set of patterns of co-occurrence frequencies with one such pattern for each data collection;
  
  selecting the pattern for each data collection and calculating similarity measures between each occurrence frequency in the selected pattern;
  
  projecting the occurrence frequencies onto a one-dimensional document signal in order of relative decreasing similarity using the similarity measures; and
  
  deriving wavelet and scaling coefficients from the one-dimensional document signal.
- Dependent claims (13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23):
  - 13. A method according to claim 12, further comprising:
    - preprocessing each of the data collections prior to feature extraction to identify and logically remove non-probative content.
  - 14. A method according to claim 12, further comprising:
    - storing a single occurrence of each feature in normalized form.
  - 15. A method according to claim 12, further comprising:
    - arranging the patterns into a document feature matrix according to the data collection from which the features in each pattern were extracted.
  - 16. A method according to claim 12, further comprising:
    - calculating a distance measure between each occurrence frequency as a similarity measure.
  - 17. A method according to claim 16, further comprising:
    - defining a variance bounding each of the similarity measures; and
      
      forming the occurrence frequencies into clusters, each cluster comprising at least one of the features with such a similarity measure falling within the variance.
  - 18. A method according to claim 12, further comprising:
    - forming each pattern as a vector in a multi-dimensional feature space; and
      
      projecting the multi-dimensional feature space into the one-dimensional document signal.
  - 19. A method according to claim 18, further comprising:
    - generating a self-organizing map of the multi-dimensional feature space prior to projection.
  - 20. A method according to claim 12, further comprising:
    - quantizing the one-dimensional document signal.
  - 21. A method according to claim 20, further comprising:
    - encoding the quantized one-dimensional document signal.
  - 22. A method according to claim 12, further comprising:
    - generating wavelet and scaling coefficients through a multiresolution analysis of the one-dimensional document signal.
  - 23. A computer-readable storage medium for a device holding code for performing the method according to claim 12.
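
Claims 11 and 22 derive the wavelet and scaling coefficients through a multiresolution analysis. The claims shown here do not name a wavelet family; the following Python sketch assumes the orthonormal Haar wavelet, the simplest choice, and zero-pads the signal to a power-of-two length. Each level splits the current approximation into scaling coefficients (pairwise sums) and wavelet coefficients (pairwise differences), each scaled by 1/√2. The function name is hypothetical.

```python
import math

def haar_mra(signal, levels=None):
    """Orthonormal Haar multiresolution analysis of a 1-D signal.

    Returns (scaling, details): the coarsest scaling (approximation)
    coefficients and a list of wavelet (detail) coefficients per level,
    finest level first. The signal is zero-padded to a power-of-two length.
    """
    if not signal:
        raise ValueError("signal must be non-empty")
    n = 1
    while n < len(signal):
        n *= 2
    approx = [float(x) for x in signal] + [0.0] * (n - len(signal))
    r = 1 / math.sqrt(2)
    details = []
    while len(approx) > 1 and (levels is None or len(details) < levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) * r for a, b in pairs])  # wavelet coefficients
        approx = [(a + b) * r for a, b in pairs]         # scaling coefficients
    return approx, details
```

For the signal `[4, 2, 6, 6]`, two levels give a final scaling coefficient of 9 and wavelet coefficients `[√2, 0]` and `[-3]`. Because the transform is orthonormal, the sum of squared coefficients (81 + 2 + 0 + 9 = 92) equals the energy of the signal. Passing `levels=1` stops after the first split.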

24. A system for abstracting semantically latent concepts extracted from a plurality of documents, comprising:
- a concept analyzer extracting terms and phrases from a plurality of documents, each document comprising a collection of terms, phrases and non-probative words, parsing the terms and phrases into concepts and reducing the concepts into a single root word form, and accumulating a frequency of occurrence for each concept;
  
  a map comprising the occurrence frequencies for each of the concepts mapped into a set of patterns of occurrence frequencies, one such pattern per document, arranged in a two-dimensional document feature matrix;
  
  an unsupervised classifier iteratively selecting each pattern from the document feature matrix for each document and calculating similarity measures between each pattern;
  
  a scale space transformation transforming the occurrence frequencies, beginning from a substantially maximal similarity value, into a one-dimensional signal in scaleable vector form ordered in sequence of relative decreasing similarity; and
  
  a critical feature identifier deriving wavelet and scaling coefficients from the one-dimensional scale signal.
- Dependent claims (25, 26, 27, 28, 29):
  - 25. A system according to claim 24, further comprising:
    - a preprocessor preprocessing each of the documents prior to term and phrase extraction to identify and logically remove non-probative words for the documents.
  - 26. A system according to claim 24, further comprising:
    - a variance bounding each of the similarity measures; and
      
      a cluster module calculating, for each concept, a distance measure between each occurrence frequency and building clusters of concepts, each cluster comprising at least one of the concepts with the distance measure falling within the variance.
  - 27. A system according to claim 24, further comprising:
    - a self-organizing map of the occurrence frequencies of each of the concepts.
  - 28. A system according to claim 24, further comprising:
    - a quantizer quantizing the one-dimensional scale signal; and
      
      an encoder encoding the quantized one-dimensional scale signal.
  - 29. A system according to claim 24, further comprising:
    - wavelet and scaling coefficients generated through a multiresolution analysis of the one-dimensional scale signal.
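
The concept analyzer of claims 24 and 25 removes non-probative words, parses terms and phrases into concepts, reduces them to a single root word form, and accumulates a frequency of occurrence for each concept. A minimal Python sketch follows. The stop-word list, the crude suffix-stripping rule standing in for root-form reduction, and treating adjacent term pairs as phrases are all assumptions, as are the function names; a production system would more likely use an established stemmer such as the Porter stemmer.

```python
import re
from collections import Counter

# Hypothetical non-probative (stop) words; a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "or", "the", "to"}
# Hypothetical suffixes stripped to approximate a single root word form.
SUFFIXES = ("ing", "ed", "es", "s")

def root(word):
    # Strip the first matching suffix, keeping at least three characters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def concepts(document):
    """Count root-form terms and adjacent two-term phrases in one document."""
    tokens = re.findall(r"[a-z]+", document.lower())
    counts = Counter()
    run = []  # consecutive probative terms between stop words
    for token in tokens + [None]:  # None flushes the final run
        if token is None or token in STOP_WORDS:
            counts.update(run)
            counts.update(" ".join(pair) for pair in zip(run, run[1:]))
            run = []
        else:
            run.append(root(token))
    return counts
```

For example, `concepts("The wavelets and the scaling of wavelet signals")` counts `wavelet` twice and also yields the phrase `wavelet signal`. The crude rule reduces `scaling` to `scal`, which illustrates why a proper stemmer is preferable.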

30. A method for abstracting semantically latent concepts extracted from a plurality of documents, comprising:
- extracting terms and phrases from a plurality of documents, each document comprising a collection of terms, phrases and non-probative words;
  
  parsing the terms and phrases into concepts and reducing the concepts into a single root word form;
  
  accumulating a frequency of occurrence for each concept;
  
  mapping the occurrence frequencies for each of the concepts into a set of patterns of occurrence frequencies, one such pattern per document, arranged in a two-dimensional document feature matrix;
  
  iteratively selecting each pattern from the document feature matrix for each document and calculating similarity measures between each pattern;
  
  transforming the occurrence frequencies, beginning from a substantially maximal similarity value, into a one-dimensional signal in scaleable vector form ordered in sequence of relative decreasing similarity; and
  
  deriving wavelet and scaling coefficients from the one-dimensional scale signal.
- Dependent claims (31, 32, 33, 34, 35, 36):
  - 31. A method according to claim 30, further comprising:
    - preprocessing each of the documents prior to term and phrase extraction to identify and logically remove non-probative words for the documents.
  - 32. A method according to claim 30, further comprising:
    - defining a variance bounding each of the similarity measures;
      
      for each concept, calculating a distance measure between each occurrence frequency; and
      
      building clusters of concepts, each cluster comprising at least one of the concepts with the distance measure falling within the variance.
  - 33. A method according to claim 30, further comprising:
    - generating a self-organizing map of the occurrence frequencies of each of the concepts.
  - 34. A method according to claim 30, further comprising:
    - quantizing the one-dimensional scale signal; and
      
      encoding the quantized one-dimensional scale signal.
  - 35. A method according to claim 30, further comprising:
    - generating wavelet and scaling coefficients through a multiresolution analysis of the one-dimensional scale signal.
  - 36. A computer-readable storage medium for a device holding code for performing the method according to claim 30.
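
Claims 26 and 32 bound the similarity measures by a variance and build clusters of concepts whose distance measure falls within it. One simple way to realize this, sketched below in Python, is leader clustering: each concept's frequency vector joins the first cluster whose seed lies within the bound, and otherwise seeds a new cluster. Euclidean distance, treating the variance as a plain distance threshold, and the function names are assumptions.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def variance_clusters(vectors, variance):
    """Leader clustering: a concept joins the first cluster whose seed lies within `variance`."""
    clusters = []  # each entry: (seed vector, list of member concept names)
    for name, vec in vectors.items():
        for seed, members in clusters:
            if euclidean(seed, vec) <= variance:
                members.append(name)
                break
        else:
            # No existing cluster is close enough, so this concept seeds a new one.
            clusters.append((vec, [name]))
    return [members for _, members in clusters]
```

With frequency vectors `{"wavelet": [4, 0, 1], "scaling": [3, 0, 1], "genome": [0, 5, 0], "codon": [0, 4, 1]}` and a bound of 1.5, the result is `[["wavelet", "scaling"], ["genome", "codon"]]`. The outcome depends on the order in which concepts are visited, which is a known property of leader clustering.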

37. A system for abstracting semantically latent genetic subsequences extracted from a plurality of genetic sequences, comprising:
- a genetic sequence analyzer extracting genetic subsequences from a plurality of genetic sequences, each genetic sequence comprising a collection of at least one of genetic codes for DNA nucleotides and amino acids, and accumulating a frequency of occurrence for each genetic subsequence for each of the genetic sequences from which the genetic subsequences originated;
  
  a map comprising the occurrence frequencies for each of the genetic subsequences mapped into a set of patterns of occurrence frequencies, one such pattern per genetic sequence, arranged in a two-dimensional genetic subsequence matrix;
  
  an unsupervised classifier iteratively selecting each pattern from the genetic subsequence matrix for each genetic sequence and calculating similarity measures between each occurrence frequency in each selected pattern;
  
  a scale space transformation projecting the occurrence frequencies, beginning from a substantially maximal similarity measure, onto a one-dimensional signal in scaleable vector form ordered in sequence of relative decreasing similarity; and
  
  a critical feature identifier deriving wavelet and scaling coefficients from the one-dimensional scale signal.
- Dependent claims (38, 39, 40, 41, 42):
  - 38. A system according to claim 37, further comprising:
    - a preprocessor preprocessing each of the genetic sequences prior to extraction to identify and logically remove non-probative data from the genetic sequences.
  - 39. A system according to claim 37, further comprising:
    - a variance bounding each of the similarity measures; and
      
      a cluster module calculating, for each genetic subsequence, a distance measure between each occurrence frequency and building clusters of genetic subsequences, each cluster comprising at least one of the genetic subsequences with the distance measure falling within the variance.
  - 40. A system according to claim 37, further comprising:
    - a self-organizing map of the occurrence frequencies of each of the genetic subsequences.
  - 41. A system according to claim 37, further comprising:
    - a quantizer quantizing the one-dimensional scale signal; and
      
      an encoder encoding the quantized one-dimensional scale signal.
  - 42. A system according to claim 37, further comprising:
    - wavelet and scaling coefficients generated through a multiresolution analysis of the one-dimensional scale signal.
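
Claims 28, 34, 41 and 47 quantize and then encode the one-dimensional scale signal without fixing a scheme. The Python sketch below assumes uniform scalar quantization with a fixed step size followed by run-length encoding, which compresses the runs of identical values that coarse quantization tends to produce. Both choices and the function names are hypothetical. Note that Python's built-in `round` rounds exact halves to the nearest even integer.

```python
def quantize(signal, step):
    """Uniform scalar quantization: map each sample to an integer bin index."""
    return [round(x / step) for x in signal]

def run_length_encode(symbols):
    """Encode a sequence as (symbol, run length) pairs."""
    encoded = []
    for s in symbols:
        if encoded and encoded[-1][0] == s:
            encoded[-1][1] += 1  # extend the current run
        else:
            encoded.append([s, 1])  # start a new run
    return [tuple(pair) for pair in encoded]

def run_length_decode(pairs):
    """Invert run_length_encode."""
    return [s for s, n in pairs for _ in range(n)]
```

For example, quantizing `[0.1, 0.2, 2.9, 3.1, 3.0, -0.2]` with a step of 1.0 gives `[0, 0, 3, 3, 3, 0]`, which encodes to `[(0, 2), (3, 3), (0, 1)]`. The encoding step is lossless; all information loss happens in quantization.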

43. A method for abstracting semantically latent genetic subsequences extracted from a plurality of genetic sequences, comprising:
- extracting genetic subsequences from a plurality of genetic sequences, each genetic sequence comprising a collection of at least one of genetic codes for DNA nucleotides and amino acids;
  
  accumulating a frequency of occurrence for each genetic subsequence for each of the genetic sequences from which the genetic subsequences originated;
  
  mapping the occurrence frequencies for each of the genetic subsequences into a set of patterns of occurrence frequencies, one such pattern per genetic sequence, arranged in a two-dimensional genetic subsequence matrix;
  
  iteratively selecting each pattern from the genetic subsequence matrix for each genetic sequence and calculating similarity measures between each occurrence frequency in each selected pattern;
  
  projecting the occurrence frequencies, beginning from a substantially maximal similarity measure, onto a one-dimensional signal in scaleable vector form ordered in sequence of relative decreasing similarity; and
  
  deriving wavelet and scaling coefficients from the one-dimensional scale signal.
- Dependent claims (44, 45, 46, 47, 48, 49):
  - 44. A method according to claim 43, further comprising:
    - preprocessing each of the genetic sequences prior to extraction to identify and logically remove non-probative data from the genetic sequences.
  - 45. A method according to claim 43, further comprising:
    - defining a variance bounding each of the similarity measures;
      
      for each genetic subsequence, calculating a distance measure between each occurrence frequency; and
      
      building clusters of genetic subsequences, each cluster comprising at least one of the genetic subsequences with the distance measure falling within the variance.
  - 46. A method according to claim 43, further comprising:
    - generating a self-organizing map of the occurrence frequencies of each of the genetic subsequences.
  - 47. A method according to claim 43, further comprising:
    - quantizing the one-dimensional scale signal; and
      
      encoding the quantized one-dimensional scale signal.
  - 48. A method according to claim 43, further comprising:
    - generating wavelet and scaling coefficients through a multiresolution analysis of the one-dimensional scale signal.
  - 49. A computer-readable storage medium for a device holding code for performing the method according to claim 43.
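
For the genetic-sequence claims (37 to 49), subsequences can be taken as overlapping fixed-length k-mers whose occurrence frequencies are accumulated per sequence. In the Python sketch below, preprocessing is assumed to mean upper-casing and stripping whitespace and digits (such as line numbering), and k-mers containing symbols outside the chosen alphabet (for example the ambiguity code `N`) are treated as non-probative and skipped. The value of `k`, the nucleotide alphabet default, and the function names are assumptions; an amino-acid alphabet can be passed instead.

```python
import re
from collections import Counter

def clean_sequence(sequence):
    # Assumed preprocessing: upper-case and drop whitespace and digits.
    return re.sub(r"[\s\d]", "", sequence.upper())

def subsequence_counts(sequences, k=3, alphabet=frozenset("ACGT")):
    """Count overlapping length-k subsequences (k-mers) in each sequence."""
    counts = []
    for seq in sequences:
        s = clean_sequence(seq)
        kmers = (s[i : i + k] for i in range(len(s) - k + 1))
        # Skip k-mers containing symbols outside the alphabet.
        counts.append(Counter(m for m in kmers if set(m) <= alphabet))
    return counts
```

For example, `subsequence_counts(["acg tac g", "AANAA"])` cleans the first sequence to `ACGTACG` and finds `ACG` twice among its five 3-mers, while every 3-mer of the second sequence contains `N`, so its pattern is empty. Each returned Counter is one pattern, and the patterns can be arranged into a genetic subsequence matrix in the same way as a document feature matrix.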

Current Assignee
Nuix North America Inc. (Nuix Ltd.)
Original Assignee
Nuix North America Inc. (Nuix Ltd.)
Inventors
Knight, William C.

Application Number
US10/317,438
Publication Number
US 20050171948A1
US Class Current
1/1
CPC Class Codes
G06F 16/313 Selection or weighting of t...
