Apparatus and method for removing non-discriminatory indices of an indexed dataset
First Claim
1. A data analyzer for use with a pattern classifier to compress a set of indexed data having common characteristics and noise, comprising:
a. means for determining a common characteristic threshold for the indexed data set;
b. means for removing indices having an ensemble statistic higher than the common characteristic threshold value in order to provide a retained dataset, wherein the ensemble statistic is a statistic taken from across a set of spectra;
c. means for calculating the ensemble statistic of each retained index in the retained dataset;
d. means for determining a noise threshold;
e. means for removing indices from the retained dataset wherein the ensemble statistic is lower than a noise threshold value; and
f. means for normalizing the indexed data.
Abstract
The present invention provides a device and method for removing non-discriminatory indices of an indexed dataset using ensemble statistics analysis. The device may include a data removal module (320) for removing non-discriminatory indices. For example, the data removal module (320) may comprise a common characteristic removal module and/or a noise removal module. In addition, the data analyzer (300) may comprise a normalization means (310) for normalizing the indexed data. The method of the present invention comprises the steps of identifying and removing portions of the set of data having insufficient discriminatory power based on ensemble statistics of the set of indexed data. For example, the method may include the steps of identifying and removing common characteristics and/or noise portions of the set of indexed data. In addition, the method may comprise the step of normalizing the indexed data either prior to or after the step of removing portions of the set of data.
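The two removal steps and the normalization step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the ensemble statistic is the per-index variance across spectra (the statistic named in claims 6 and 7), assumes normalization means scaling each spectrum to unit total intensity, and leaves both thresholds to the caller. All function and parameter names are hypothetical.

```python
from statistics import pvariance


def ensemble_variance(spectra):
    """Variance taken across the set of spectra, one value per index."""
    return [pvariance(column) for column in zip(*spectra)]


def normalize(spectra):
    """Scale each spectrum to sum to 1 (an assumed normalization choice)."""
    normalized = []
    for spectrum in spectra:
        total = sum(spectrum)
        # Leave an all-zero spectrum unchanged to avoid dividing by zero.
        normalized.append([v / total for v in spectrum] if total else list(spectrum))
    return normalized


def remove_non_discriminatory(spectra, common_threshold, noise_threshold):
    """Return (kept_indices, reduced_spectra) after both removal steps."""
    # Step 1: drop indices whose ensemble statistic is higher than the
    # common characteristic threshold.
    stats = ensemble_variance(spectra)
    retained = [i for i, v in enumerate(stats) if v <= common_threshold]
    retained_spectra = [[s[i] for i in retained] for s in spectra]

    # Step 2: recompute the statistic on the retained data, mirroring the
    # claim wording; with a per-index variance the values match the first pass.
    retained_stats = ensemble_variance(retained_spectra)

    # Step 3: drop indices whose ensemble statistic is lower than the
    # noise threshold.
    kept = [i for i, v in zip(retained, retained_stats) if v >= noise_threshold]
    return kept, [[s[i] for i in kept] for s in spectra]


# Three toy spectra with four indices each (made-up values).
spectra = [
    [1.0, 5.0, 0.0, 2.0],
    [1.0, 1.0, 0.1, 4.0],
    [1.0, 9.0, 0.0, 6.0],
]
kept, reduced = remove_non_discriminatory(
    spectra, common_threshold=5.0, noise_threshold=0.01
)
```

With these toy values, index 1 is removed by the first threshold and indices 0 and 2 by the second, so only index 3 is kept. Normalization could be applied either before or after the removal, as the abstract allows.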
7 Claims
6. A method for classifying a set of indexed data that includes obtaining a collection of control spectra obtained via mass spectrometry, comprising the steps of:
a. calculating an ensemble statistic at each index in the control spectra obtained via mass spectrometry, wherein the ensemble statistic is a statistic taken from across a set of spectra;
b. identifying those indices at which the ensemble statistic exceeds a first selected threshold;
c. removing the identified indices from all spectra in the set of indexed data to provide a set of compressed indexed data;
d. calculating an ensemble statistic at each index of the compressed indexed data;
e. removing all indices from each compressed spectrum that have an ensemble statistic that is lower than a second selected threshold value to provide a set of reduced indexed data;
f. extracting a feature portion of each of the reduced indexed data to provide a set of feature spectra;
g. classifying the set of feature spectra into clusters; and
wherein the step of calculating the ensemble statistic at each index in the control spectra comprises computing an ensemble variance of the control spectra.
7. A method for classifying a set of indexed data that includes obtaining a set of control spectra obtained via mass spectrometry, comprising the steps of:
a. calculating an ensemble statistic at each index in the control spectra obtained via mass spectrometry, wherein the ensemble statistic is a statistic taken from across a set of spectra;
b. identifying those indices at which the ensemble statistic exceeds a first selected threshold;
c. removing the identified indices from all spectra in the set of indexed data to provide a set of compressed indexed data;
d. calculating an ensemble statistic at each index of the compressed indexed data;
e. removing all indices from each compressed spectrum that have an ensemble statistic that is lower than a second selected threshold value to provide a set of reduced indexed data;
f. extracting a feature portion of each of the reduced indexed data to provide a set of feature spectra;
g. classifying the set of feature spectra into clusters; and
wherein the step of calculating the ensemble statistic at each index of the compressed indexed data comprises computing an ensemble variance of the compressed indexed data.
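Steps a through g of the two method claims can be sketched as one pipeline. This is an illustration under stated assumptions, not the claimed method itself: ensemble variance is used for both statistics (claim 6 specifies it only for the control spectra, claim 7 only for the compressed data), the "feature portion" is taken to be the whole reduced spectrum, and the clustering is a simple greedy leader scheme with a hypothetical `radius` parameter, since the claims name no particular extractor or classifier. All function names and data values are invented for the example.

```python
import math
from statistics import pvariance


def ensemble_variance(spectra):
    """Variance taken across the set of spectra, one value per index."""
    return [pvariance(column) for column in zip(*spectra)]


def select(spectra, indices):
    """Keep only the given indices in every spectrum."""
    return [[s[i] for i in indices] for s in spectra]


def reduce_indices(control, data, first_threshold, second_threshold):
    """Steps a-e: return the indices that survive both removals."""
    # a, b: ensemble variance of the control spectra; flag indices above
    # the first threshold.
    control_stat = ensemble_variance(control)
    compressed_idx = [i for i, v in enumerate(control_stat) if v <= first_threshold]
    # c: remove the flagged indices from all spectra in the data set.
    compressed = select(data, compressed_idx)
    # d, e: ensemble variance of the compressed data; drop indices below
    # the second threshold.
    data_stat = ensemble_variance(compressed)
    return [i for i, v in zip(compressed_idx, data_stat) if v >= second_threshold]


def cluster(features, radius):
    """Step g (assumed scheme): greedy leader clustering by Euclidean distance."""
    leaders, labels = [], []
    for feature in features:
        for k, leader in enumerate(leaders):
            if math.dist(feature, leader) <= radius:
                labels.append(k)
                break
        else:
            # No existing leader is close enough, so start a new cluster.
            leaders.append(feature)
            labels.append(len(leaders) - 1)
    return labels


# Made-up control and sample spectra with four indices each.
control = [
    [0.0, 1.0, 2.0, 1.0],
    [10.0, 1.0, 2.1, 1.0],
    [5.0, 1.0, 1.9, 1.0],
]
data = [
    [3.0, 1.0, 2.0, 1.0],
    [7.0, 1.0, 2.2, 1.1],
    [2.0, 1.0, 9.0, 1.0],
    [8.0, 1.0, 9.1, 1.1],
]
kept = reduce_indices(control, data, first_threshold=1.0, second_threshold=0.01)
features = select(data, kept)  # step f: reduced spectrum used as the feature
labels = cluster(features, radius=1.0)
```

In this toy data, index 0 varies widely among the controls and is removed first; indices 1 and 3 barely vary across the samples and are removed as noise. The remaining index 2 separates the four samples into two clusters.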
Specification