System and method for identifying critical features in an ordered scale space within a multi-dimensional feature space
Abstract
A system and method for identifying critical features in an ordered scale space within a multi-dimensional feature space are described. Features are extracted from a plurality of data collections. Each data collection is characterized by a collection of features semantically-related by a grammar. Each feature is normalized, and frequencies of occurrence and co-occurrence for the feature are determined for each of the data collections. The occurrence frequencies and the co-occurrence frequencies for each of the features are mapped into a set of patterns of occurrence frequencies and a set of patterns of co-occurrence frequencies. The pattern for each data collection is selected, and distance (similarity) measures between each occurrence frequency in the selected pattern are calculated. The occurrence frequencies are projected onto a one-dimensional document signal in order of relative decreasing similarity using the similarity measures. Wavelet and scaling coefficients are derived from the one-dimensional document signal using multiresolution analysis.
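The final step of the abstract, deriving wavelet and scaling coefficients by multiresolution analysis, can be illustrated with a minimal sketch. The abstract does not name a wavelet family, so the orthonormal Haar wavelet used here is an assumption, and the function names `haar_step` and `haar_multiresolution` and the sample `document_signal` values are hypothetical.

```python
import math


def haar_step(signal):
    """Apply one level of the orthonormal Haar transform to an even-length list.

    Returns (scaling, wavelet): pairwise scaled sums and pairwise scaled differences.
    """
    s = 1.0 / math.sqrt(2.0)
    scaling = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    wavelet = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return scaling, wavelet


def haar_multiresolution(signal):
    """Repeat haar_step on the scaling coefficients until a single one remains.

    Returns (coarsest scaling coefficients, wavelet coefficient lists, finest level first).
    Assumes the signal length is a power of two.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    current = list(signal)
    details = []
    while len(current) > 1:
        current, wavelet = haar_step(current)
        details.append(wavelet)
    return current, details


# Hypothetical one-dimensional document signal, already ordered by decreasing similarity.
document_signal = [8.0, 6.0, 5.0, 5.0, 3.0, 1.0, 1.0, 1.0]
scaling, wavelets = haar_multiresolution(document_signal)
```

In this sketch, large wavelet coefficients mark positions where the ordered signal changes sharply, while the scaling coefficients summarize its coarse trend. Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared signal values.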
49 Claims
1. A system for identifying critical features in an ordered scale space within a multi-dimensional feature space, comprising:
a feature analyzer initially processing features, comprising:
a feature extractor extracting the features from a plurality of data collections, each data collection characterized by a collection of features semantically-related by a grammar;
a database manager normalizing each feature and determining frequencies of occurrence and co-occurrences for the features for each of the data collections;
a mapper mapping the occurrence frequencies and the co-occurrence frequencies for each of the features into a set of patterns of occurrence frequencies and a set of patterns of co-occurrence frequencies with one such pattern for each data collection;
an unsupervised classifier selecting the pattern for each data collection and calculating similarity measures between each occurrence frequency in the selected pattern;
a scale space transformation projecting the occurrence frequencies onto a one-dimensional document signal in order of relative decreasing similarity using the similarity measures; and
a critical feature identifier deriving wavelet and scaling coefficients from the one-dimensional document signal.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11

12. A method for identifying critical features in an ordered scale space within a multi-dimensional feature space, comprising:
extracting features from a plurality of data collections, each data collection characterized by a collection of features semantically-related by a grammar;
normalizing each feature and determining frequencies of occurrence and co-occurrences for the feature for each of the data collections;
mapping the occurrence frequencies and the co-occurrence frequencies for each of the features into a set of patterns of occurrence frequencies and a set of patterns of co-occurrence frequencies with one such pattern for each data collection;
selecting the pattern for each data collection and calculating similarity measures between each occurrence frequency in the selected pattern;
projecting the occurrence frequencies onto a one-dimensional document signal in order of relative decreasing similarity using the similarity measures; and
deriving wavelet and scaling coefficients from the one-dimensional document signal.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23
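The frequency-determination step of claim 12 can be sketched as follows. The claim does not define the window within which two features co-occur, so this sketch assumes that features co-occur when they appear in the same unit (for example, the same sentence) of a data collection. The function name `occurrence_and_cooccurrence` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def occurrence_and_cooccurrence(collection):
    """Count features and feature pairs in one data collection.

    `collection` is a list of units, each unit a list of normalized features.
    Assumption: two features co-occur when they share a unit.
    """
    occurrences = Counter()
    cooccurrences = Counter()
    for unit in collection:
        occurrences.update(unit)
        # Count each unordered pair once per unit, with a canonical (sorted) key.
        for a, b in combinations(sorted(set(unit)), 2):
            cooccurrences[(a, b)] += 1
    return occurrences, cooccurrences


collection = [["wavelet", "signal"], ["signal", "scale", "wavelet"], ["scale"]]
occ, cooc = occurrence_and_cooccurrence(collection)
```

Running this once per data collection yields one pattern of occurrence frequencies and one pattern of co-occurrence frequencies per collection, which is the shape the mapping step expects.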

24. A system for abstracting semantically latent concepts extracted from a plurality of documents, comprising:
a concept analyzer extracting terms and phrases from a plurality of documents, each document comprising a collection of terms, phrases and non-probative words, parsing the terms and phrases into concepts and reducing the concepts into a single root word form, and accumulating a frequency of occurrence for each concept;
a map comprising the occurrence frequencies for each of the concepts mapped into a set of patterns of occurrence frequencies, one such pattern per document, arranged in a two-dimensional document feature matrix;
an unsupervised classifier iteratively selecting each pattern from the document feature matrix for each document and calculating similarity measures between each pattern;
a scale space transformation transforming the occurrence frequencies, beginning from a substantially maximal similarity value, into a one-dimensional signal in scaleable vector form ordered in sequence of relative decreasing similarity; and
a critical feature identifier deriving wavelet and scaling coefficients from the one-dimensional scale signal.
Dependent claims: 25, 26, 27, 28, 29

30. A method for abstracting semantically latent concepts extracted from a plurality of documents, comprising:
extracting terms and phrases from a plurality of documents, each document comprising a collection of terms, phrases and non-probative words;
parsing the terms and phrases into concepts and reducing the concepts into a single root word form;
accumulating a frequency of occurrence for each concept;
mapping the occurrence frequencies for each of the concepts into a set of patterns of occurrence frequencies, one such pattern per document, arranged in a two-dimensional document feature matrix;
iteratively selecting each pattern from the document feature matrix for each document and calculating similarity measures between each pattern;
transforming the occurrence frequencies, beginning from a substantially maximal similarity value, into a one-dimensional signal in scaleable vector form ordered in sequence of relative decreasing similarity; and
deriving wavelet and scaling coefficients from the one-dimensional scale signal.
Dependent claims: 31, 32, 33, 34, 35, 36
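The first five steps of claim 30 can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the stop-word list standing in for non-probative words, the toy stemmer `root_form`, cosine similarity as the similarity measure, and ordering relative to a single seed document. The claim itself does not prescribe any of these choices.

```python
import math
from collections import Counter

# Hypothetical list of non-probative words.
STOP_WORDS = {"the", "a", "of", "and", "to", "in"}


def root_form(term):
    """Toy stemmer (assumption): strip a trailing 's' from longer terms."""
    return term[:-1] if term.endswith("s") and len(term) > 3 else term


def concept_counts(text):
    """Accumulate a frequency of occurrence for each root-form concept."""
    terms = [t for t in text.lower().split() if t not in STOP_WORDS]
    return Counter(root_form(t) for t in terms)


def feature_matrix(documents):
    """Build a two-dimensional document feature matrix, one row per document."""
    counts = [concept_counts(d) for d in documents]
    concepts = sorted(set().union(*counts))
    return concepts, [[c[k] for k in concepts] for c in counts]


def cosine(u, v):
    """Cosine similarity between two frequency patterns (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def order_by_similarity(matrix, seed=0):
    """Return row indices in order of decreasing similarity to the seed row."""
    sims = [cosine(matrix[seed], row) for row in matrix]
    return sorted(range(len(matrix)), key=lambda i: -sims[i]), sims


docs = ["the wavelets of signals", "wavelet and signal scaling", "genes in cells"]
concepts, matrix = feature_matrix(docs)
order, sims = order_by_similarity(matrix)
```

Concatenating the rows of `matrix` in the sequence given by `order` would produce a one-dimensional signal that begins at the maximal similarity value and proceeds in decreasing similarity, ready for wavelet decomposition.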

37. A system for abstracting semantically latent genetic subsequences extracted from a plurality of genetic sequences, comprising:
a genetic sequence analyzer extracting genetic subsequences from a plurality of genetic sequences, each genetic sequence comprising a collection of at least one of genetic codes for DNA nucleotides and amino acids, and accumulating a frequency of occurrence for each genetic subsequence for each of the genetic sequences from which the genetic subsequences originated;
a map comprising the occurrence frequencies for each of the genetic subsequences mapped into a set of patterns of occurrence frequencies, one such pattern per genetic sequence, arranged in a two-dimensional genetic subsequence matrix;
an unsupervised classifier iteratively selecting each pattern from the genetic subsequence matrix for each genetic sequence and calculating similarity measures between each occurrence frequency in each selected pattern;
a scale space transformation projecting the occurrence frequencies, beginning from a substantially maximal similarity measure, onto a one-dimensional signal in scaleable vector form ordered in sequence of relative decreasing similarity; and
a critical feature identifier deriving wavelet and scaling coefficients from the one-dimensional scale signal.
Dependent claims: 38, 39, 40, 41, 42

43. A method for abstracting semantically latent genetic subsequences extracted from a plurality of genetic sequences, comprising:
extracting genetic subsequences from a plurality of genetic sequences, each genetic sequence comprising a collection of at least one of genetic codes for DNA nucleotides and amino acids;
accumulating a frequency of occurrence for each genetic subsequence for each of the genetic sequences from which the genetic subsequences originated;
mapping the occurrence frequencies for each of the genetic subsequences into a set of patterns of occurrence frequencies, one such pattern per genetic sequence, arranged in a two-dimensional genetic subsequence matrix;
iteratively selecting each pattern from the genetic subsequence matrix for each genetic sequence and calculating similarity measures between each occurrence frequency in each selected pattern;
projecting the occurrence frequencies, beginning from a substantially maximal similarity measure, onto a one-dimensional signal in scaleable vector form ordered in sequence of relative decreasing similarity; and
deriving wavelet and scaling coefficients from the one-dimensional scale signal.
Dependent claims: 44, 45, 46, 47, 48, 49
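The extraction, accumulation, and mapping steps of claim 43 can be sketched as follows. The claim does not say how genetic subsequences are delimited, so this sketch assumes overlapping fixed-length subsequences (k-mers) with a hypothetical default of `k=3`. The function names and sample sequences are likewise illustrative.

```python
from collections import Counter


def subsequence_counts(sequence, k=3):
    """Count every overlapping length-k subsequence in one genetic sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def subsequence_matrix(sequences, k=3):
    """Build a two-dimensional genetic subsequence matrix.

    Rows correspond to genetic sequences, columns to the sorted set of
    subsequences observed anywhere in the input.
    """
    counts = [subsequence_counts(s, k) for s in sequences]
    columns = sorted(set().union(*counts))
    return columns, [[c[col] for col in columns] for c in counts]


sequences = ["ACGTACG", "ACGACG"]
columns, matrix = subsequence_matrix(sequences)
```

Each row of `matrix` is one pattern of occurrence frequencies per genetic sequence, which can then be compared and ordered by similarity in the same way as document patterns.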
Specification