Methods and apparatus to integrate systematic data scaling into genetic algorithm-based feature subset selection

US 8,311,310 B2
Filed: 08/02/2007
Issued: 11/13/2012
Est. Priority Date: 08/11/2006
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Methods and apparatus for training a system for developing a process of data mining, false positive reduction, computer-aided detection, computer-aided diagnosis and artificial intelligence are provided. A method includes choosing a training set from a set of training cases using systematic data scaling and creating a classifier based on the training set using a classification method. The classifier yields fewer false positives. The method is suitable for use with a variety of data mining techniques including support vector machines, neural networks and decision trees.

17 Claims

1. A method of improving classification accuracy and reducing false positives in data mining, computer-aided detection, computer-aided diagnosis and artificial intelligence, the method comprising:
- choosing a training set from a set of training cases using systematic data scaling, the training set including one or more training cases for true nodules and one or more training cases for false nodules, the systematic data scaling removing only one or more training cases for false nodules, which are proximate a classification boundary for true and false nodules, from the training set; and
- creating a classifier based on the training set using a classification method, wherein the systematic data scaling method and the classification method produce the classifier, thereby reducing false positives and improving classification accuracy.
- Dependent Claims (2-17)
  - 2. The method according to claim 1 wherein the classifier is selected from the group consisting of:
    - support vector machines, neural networks, and decision trees.
  - 3. The method according to claim 1, the method further comprising evaluating the classifier produced by the classification method based on the training set using a testing set.
  - 4. The method according to claim 1 wherein choosing further comprises removing from the training set false nodules that form Tomek links with true nodules until a threshold is met.
  - 5. The method according to claim 4 wherein the threshold is determined with respect to a downscaling factor, x, such that the number of false nodules remaining in the training set after systematic data scaling is no more than x times the number of true nodules in the training set.
  - 6. The method according to claim 1 wherein the method further comprises validating the classifier with the set of training cases or a subset thereof.
  - 7. A genetic algorithm that when executed implements the method of claim 1.
  - 8. The genetic algorithm according to claim 7 wherein the genetic algorithm is the CHC algorithm.
  - 9. A method of choosing features from a feature pool, the method comprising:
    - providing each of a first genetic algorithm and a second genetic algorithm according to claim 7, wherein the first genetic algorithm is used to determine the best size of the feature set; and
    - fixing the feature set size and using the second genetic algorithm to select the features.
  - 10. The method according to claim 9 wherein in providing the first genetic algorithm, the method further comprises analyzing results using at least one of:
    - number of occurrences of chromosomes representing different feature subset sizes and number of average errors.
  - 11. The method according to claim 10, wherein number of average errors is a number of misclassified lung nodules.
  - 12. A non-transitory computer-readable medium which when executed implements the method of claim 1.
  - 13. An article of manufacture which is an imaging device or a false positive reduction device, wherein the device is a computer that is programmed to analyze image data by implementing the method of claim 1.
  - 14. The article of manufacture according to claim 13, wherein the imaging device is selected from the group consisting of:
    - computed tomography (CT), computed axial tomography (CAT), multi-slice computed tomography (MSCT), body section roentgenography, ultrasound, magnetic resonance imaging (MRI), magnetic resonance tomography (MRT), nuclear magnetic resonance (NMR), X-ray, microscopy, fluoroscopy, tomography, and digital imaging.
  - 15. The article of manufacture according to claim 13, wherein the article of manufacture is a lung nodule CAD system.
  - 16. The method of claim 1, wherein the number of the at least one removed training case for false nodules exceeds a threshold, the threshold determined with respect to a downscaling factor, x, wherein the number of false nodules remaining in the training set after systematic data scaling is no more than x times the number of true nodules in the training set.
  - 17. The method of claim 16, wherein the downscaling factor is the largest factor that maximizes specificity while keeping 100% sensitivity.
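Claims 4, 5, 16 and 17 describe the systematic data scaling only in claim language. The Python sketch below is one possible reading, not the patentee's implementation. It assumes Euclidean distance, labels true nodules 1 and false nodules 0, removes false nodules from the shortest Tomek links first (an assumed ordering), and stops early if no false/true Tomek links remain before the claim 5 threshold (false nodules no more than x times true nodules) is reached. All function names (`nearest_neighbor`, `tomek_links`, `systematic_scaling`, `choose_downscaling_factor`) are hypothetical.

```python
import math
from typing import Iterable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
TRUE_NODULE, FALSE_NODULE = 1, 0  # assumed label encoding


def nearest_neighbor(i: int, points: Sequence[Point]) -> int:
    """Index of the point closest to points[i] (Euclidean), excluding i."""
    best, best_d = -1, math.inf
    for j, p in enumerate(points):
        if j != i:
            d = math.dist(points[i], p)
            if d < best_d:
                best, best_d = j, d
    return best


def tomek_links(points: Sequence[Point], labels: Sequence[int]) -> List[Tuple[int, int, float]]:
    """(false_idx, true_idx, distance) for every false/true pair of mutual
    nearest neighbours, i.e. a Tomek link when there are no distance ties."""
    if len(points) < 2:
        return []
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    links = []
    for i, j in enumerate(nn):
        if labels[i] == FALSE_NODULE and labels[j] == TRUE_NODULE and nn[j] == i:
            links.append((i, j, math.dist(points[i], points[j])))
    return links


def systematic_scaling(
    points: Sequence[Point], labels: Sequence[int], x: float
) -> Tuple[List[Point], List[int]]:
    """Drop false nodules that form Tomek links with true nodules until
    #false <= x * #true, or until no such links remain."""
    points, labels = list(points), list(labels)
    limit = x * labels.count(TRUE_NODULE)
    while labels.count(FALSE_NODULE) > limit:
        links = tomek_links(points, labels)
        if not links:
            break  # threshold unreachable by Tomek-link removal alone
        links.sort(key=lambda link: link[2])  # assumed ordering: shortest links first
        excess = labels.count(FALSE_NODULE) - limit
        drop = set()
        for false_idx, _, _ in links:
            if len(drop) >= excess:
                break
            drop.add(false_idx)
        points = [p for k, p in enumerate(points) if k not in drop]
        labels = [y for k, y in enumerate(labels) if k not in drop]
    return points, labels


def choose_downscaling_factor(results: Iterable[Tuple[float, float, float]]) -> Optional[float]:
    """results: (x, sensitivity, specificity) triples from classifiers trained
    at each candidate x. Returns the largest x that reaches the best
    specificity among runs with 100% sensitivity (claim 17), else None."""
    feasible = [(x, spec) for x, sens, spec in results if sens == 1.0]
    if not feasible:
        return None
    best_spec = max(spec for _, spec in feasible)
    return max(x for x, spec in feasible if spec == best_spec)
```

Removing several linked false nodules in one pass is safe because deleting a point that is not either partner's nearest neighbour leaves the other mutual pairs intact; links are recomputed on the next pass because removals can expose new boundary pairs. How the sensitivity and specificity for each candidate x are obtained (for example, with the testing or validation sets of claims 3 and 6) is left to the caller.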

Current Assignee
Koninklijke Philips Electronics N.V. (Koninklijke Philips N.V.)
Original Assignee
Koninklijke Philips Electronics N.V. (Koninklijke Philips N.V.)
Inventors
Zhao, Luyin; Boroczky, Lilla; Lee, Kwok Pun
Primary Examiner(s)
Ko, Tony
Assistant Examiner(s)
Reilly-Diakun, Jori S

Application Number
US12/377,245
Publication Number
US 20100177943A1
Time in Patent Office
1,930 Days
Field of Search
382/128-134, 382/159-161, 382/155-156
US Class Current
382/133
CPC Class Codes

G06F 18/2111   by using evolutionary compu...

G06N 3/126   Evolutionary algorithms, e....

G06V 10/771   Feature selection, e.g. sel...

G16H 50/20   for computer-aided diagnosi...