Apparatus and method for removing non-discriminatory indices of an indexed dataset

US 8,010,296 B2
Filed: 12/18/2003
Issued: 08/30/2011
Est. Priority Date: 12/19/2002
Status: Active Grant

Abstract

The present invention provides a device and method for removing non-discriminatory indices of an indexed dataset using ensemble statistics analysis. The device may include a data removal module (320) for removing non-discriminatory indices. For example, the data removal module (320) may comprise a common characteristic removal module and/or a noise removal module. In addition, the data analyzer (300) may comprise a normalization means (310) for normalizing the indexed data. The method of the present invention comprises the steps of identifying and removing portions of the set of data having insufficient discriminatory power based on ensemble statistics of the set of indexed data. For example, the method may include the steps of identifying and removing common characteristics and/or noise portions of the set of indexed data. In addition, the method may comprise the step of normalizing the indexed data either prior to or after the step of removing portions of the set of data.

7 Claims

1. A data analyzer for use with a pattern classifier to compress a set of indexed data having common characteristics and noise, comprising:

a. means for determining a common characteristic threshold for the indexed data set;

b. means for removing indices having an ensemble statistic higher than the common characteristic threshold value in order to provide a retained dataset, wherein the ensemble statistic is a statistic taken from across a set of spectra;

c. means for calculating the ensemble statistic of each retained index in the retained dataset;

d. means for determining a noise threshold;

e. means for removing indices from the retained dataset wherein the ensemble statistic is lower than a noise threshold value; and

f. means for normalizing the indexed data.

2. The data analyzer according to claim 1, wherein the normalization means is configured to process the indexed data prior to processing by the common characteristic threshold means.

3. The data analyzer according to claim 1, wherein the normalization means is configured to process the indexed data after processing by the common characteristic threshold means.

4. The data analyzer according to claim 1, wherein the normalization means comprises means for normalizing a member of the set to the standard deviation of the member.

5. The data analyzer according to claim 1, wherein the normalization means comprises means for normalizing a member of the set to the maximum value of the member.
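
The processing order described in claims 1 to 5 can be illustrated with a short sketch. It is not the patented implementation. The claims do not name the ensemble statistic or say how the two thresholds are determined, so the sketch assumes variance across spectra (the statistic named in claims 6 and 7) and takes both thresholds as plain parameters. The function names `normalize` and `compress` are hypothetical, and each spectrum is assumed to have a nonzero maximum and standard deviation.

```python
import numpy as np

def normalize(spectra, mode="max"):
    """Scale each spectrum (one row per spectrum) by its own maximum
    (claim 5) or its own standard deviation (claim 4)."""
    spectra = np.asarray(spectra, dtype=float)
    if mode == "max":
        scale = spectra.max(axis=1, keepdims=True)
    elif mode == "std":
        scale = spectra.std(axis=1, keepdims=True)
    else:
        raise ValueError("mode must be 'max' or 'std'")
    return spectra / scale  # assumes scale is nonzero for every row

def compress(spectra, common_threshold, noise_threshold, statistic=np.var):
    """Drop indices above the common characteristic threshold, then drop
    retained indices below the noise threshold.

    `statistic` is an assumed choice: any callable accepting `axis=0`.
    Returns the reduced array and the original positions of kept indices.
    """
    spectra = np.asarray(spectra, dtype=float)
    kept = np.arange(spectra.shape[1])

    # Elements a-b: remove indices whose ensemble statistic is too high.
    stat = statistic(spectra, axis=0)
    keep = stat <= common_threshold
    spectra, kept = spectra[:, keep], kept[keep]

    # Elements c-e: recompute on the retained data, remove low values.
    stat = statistic(spectra, axis=0)
    keep = stat >= noise_threshold
    return spectra[:, keep], kept[keep]
```

Calling `normalize` before `compress` corresponds to the ordering in claim 2, and calling it on the output of `compress` corresponds to claim 3.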

6. A method for classifying a set of indexed data that includes obtaining a collection of control spectra obtained via mass spectrometry, comprising the steps of:

a. calculating an ensemble statistic at each index in the control spectra obtained via mass spectrometry, wherein the ensemble statistic is a statistic taken from across a set of spectra;

b. identifying those indices at which the ensemble statistic exceeds a first selected threshold;

c. removing the identified indices from all spectra in the set of indexed data to provide a set of compressed indexed data;

d. calculating an ensemble statistic at each index of the compressed indexed data;

e. removing all indices from each compressed spectrum that have an ensemble statistic that is lower than a second selected threshold value to provide a set of reduced indexed data;

f. extracting a feature portion of each of the reduced indexed data to provide a set of feature spectra;

g. classifying the set of feature spectra into clusters; and

wherein the step of calculating the ensemble statistic at each index in the control spectra comprises computing an ensemble variance of the control spectra.
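
Steps a to e of claim 6 can be sketched as follows. This is an illustration under assumptions, not the claimed method itself. The first statistic is the ensemble variance of the control spectra, as the claim states. Claim 6 does not name the statistic used in step d, so variance is assumed there as well. The function name `reduce_indices` and the threshold parameters are hypothetical, and the control spectra are assumed to share the same index axis as the data.

```python
import numpy as np

def reduce_indices(control, data, first_threshold, second_threshold):
    """Return (reduced_data, kept_positions); rows are spectra,
    columns are indices."""
    control = np.asarray(control, dtype=float)
    data = np.asarray(data, dtype=float)

    # a. Ensemble variance at each index, taken across the control spectra.
    control_var = control.var(axis=0)

    # b. Indices where that variance exceeds the first threshold.
    identified = control_var > first_threshold

    # c. Remove those indices from every spectrum in the data set.
    kept = np.flatnonzero(~identified)
    compressed = data[:, kept]

    # d. Ensemble statistic of the compressed data (variance is assumed).
    compressed_var = compressed.var(axis=0)

    # e. Remove indices whose statistic is lower than the second threshold.
    informative = compressed_var >= second_threshold
    return compressed[:, informative], kept[informative]
```

Returning the kept positions alongside the reduced data lets a caller map each surviving column back to its original index.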

7. A method for classifying a set of indexed data that includes obtaining a set of control spectra obtained via mass spectrometry, comprising the steps of:

a. calculating an ensemble statistic at each index in the control spectra obtained via mass spectrometry, wherein the ensemble statistic is a statistic taken from across a set of spectra;

b. identifying those indices at which the ensemble statistic exceeds a first selected threshold;

c. removing the identified indices from all spectra in the set of indexed data to provide a set of compressed indexed data;

d. calculating an ensemble statistic at each index of the compressed indexed data;

e. removing all indices from each compressed spectrum that have an ensemble statistic that is lower than a second selected threshold value to provide a set of reduced indexed data;

f. extracting a feature portion of each of the reduced indexed data to provide a set of feature spectra;

g. classifying the set of feature spectra into clusters; and

wherein the step of calculating the ensemble statistic at each index of the compressed indexed data comprises computing an ensemble variance of the compressed indexed data.
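
Steps f and g of claims 6 and 7 name feature extraction and clustering without saying which techniques are used. The sketch below swaps in two common stand-ins purely for illustration: a principal component projection computed via SVD for step f, and a plain k-means loop with deterministic farthest-point initialisation for step g. Neither choice comes from the claims, and the names `extract_features` and `cluster` and their parameters are hypothetical.

```python
import numpy as np

def extract_features(reduced, n_components=2):
    """Step f stand-in: project centred spectra onto the leading
    principal components (assumed technique, not from the claims)."""
    reduced = np.asarray(reduced, dtype=float)
    centered = reduced - reduced.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def cluster(features, k=2, n_iter=20):
    """Step g stand-in: k-means with farthest-point initialisation.
    Returns one integer cluster label per spectrum."""
    features = np.asarray(features, dtype=float)

    # Start from the first point, then repeatedly add the point that is
    # farthest from all centres chosen so far.
    centers = [features[0]]
    for _ in range(k - 1):
        dists = np.min(
            [np.sum((features - c) ** 2, axis=1) for c in centers], axis=0
        )
        centers.append(features[np.argmax(dists)])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assign each spectrum to its nearest centre.
        d = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each non-empty centre to the mean of its members.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels
```

The input to `extract_features` is assumed to be the reduced indexed data produced by step e, with one row per spectrum.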

Current Assignee: Drexel University
Original Assignee: Drexel University
Inventors: Hrebien, Leonid; Kam, Moshe; Loo, Lit-Hsin
Primary Examiner(s): Clow, Lori A
Application Number: US10/538,390
Publication Number: US 20070009160A1
Time in Patent Office: 2,812 Days
Field of Search: None
US Class Current: 702/19

CPC Class Codes
G06F 18/10   Pre-processing; Data cleansing
G06F 2218/04   Denoising
G16B 25/00   ICT specially adapted for h...
G16B 25/10   Gene or protein expression ...
G16B 40/00   ICT specially adapted for b...
G16B 40/20   Supervised data analysis
G16B 40/30   Unsupervised data analysis
Y10T 436/24   Nuclear magnetic resonance,...
