STRATIFICATION METHOD FOR OVERCOMING UNBALANCED CASE NUMBERS IN COMPUTER-AIDED LUNG NODULE FALSE POSITIVE REDUCTION

US 20090175514A1
Filed: 11/21/2005
Published: 07/09/2009
Est. Priority Date: 11/19/2004
Status: Abandoned Application

Abstract

A method for computer-aided detection (CAD) and classification of regions of interest detected within high-resolution CT (HRCT) medical image data. The method applies post-CAD machine learning techniques to maximize the specificity and sensitivity with which a region or volume is identified as a nodule or non-nodule. The regions are identified by a CAD process and automatically segmented. A feature pool is identified and extracted from each segmented region and processed by a genetic algorithm (GA) to identify an optimal feature subset, wherein a data stratification method is used to balance the number of cases in the different classes. The subset determined by the GA is used to train a support vector machine (SVM) to classify candidate regions or volumes found within non-training data.

12 Claims

1. A method for computer-assisted detection (CAD) of regions or volumes of interest (“regions”) within medical image data that includes CAD processing to detect and delineate candidate regions, and post-CAD machine learning in a training phase to maximize specificity and reduce the number of false positives reported after processing non-training data, which method includes the steps of:
  
  training a classifier on a set of medical image training data selected to include a number of regions known to be true and known to be false for a ground truth, identifying and segmenting the regions using said CAD processing, extracting features to create a pool of features to qualify the regions, applying a genetic algorithmic processor to the pool of features to determine a minimal sub-set of features for use by a support vector machine (SVM) to identify candidate regions within non-training data with improved specificity, wherein if the medical image training data is unbalanced, implementing a stratification process to the unbalanced data;
  
  detecting, after training, within non-training data, candidate regions;
  
  segmenting the candidate regions identified within the non-training data;
  
  extracting a set of candidate features relating to each segmented candidate region; and
  
  mapping candidate regions into ground truth space based on the set of candidate features with practical specificity in accord with the training process.
  - 2. The method as set forth in claim 1, wherein the step of training further includes determining both the size of the sub-set of features optimized by the GA during training, for each candidate region in the training data, and the actual features comprising the sub-sets.
  - 3. The method as set forth in claim 1, wherein the step of training further includes defining a pool of features identified within each region within the training data as a chromosome, where each gene represents a feature, and where the genetic algorithm initially populates the chromosomes by random selection of features, and iteratively searches for those chromosomes that have higher fitness, wherein the evaluation is repeated for each generation, and using mutation and crossover, generates new and more fit chromosomes during the training phase.
  - 4. The method as set forth in claim 3, wherein the determining includes applying the GA in two phases, including:
    - a.) identifying each chromosome as to both its set of features, and the number of features; and
    - b.) analyzing, for each chromosome, the identified set of features, and the identified number of features, to determine the optimal size of the feature based on the number of occurrences of different chromosomes and the number of average errors.
  - 5. The method as set forth in claim 1, wherein the step of training includes identifying wall pixels utilizing filter masks.
  - 6. The method as set forth in claim 1, wherein if the data is unbalanced such that the number of false nodules is much greater than the number of true nodules, the stratification process chooses a number of false nodules based on a criterion such that the numbers of false nodules and true nodules are balanced.
  - 7. A computer readable medium comprising a set of computer readable instructions, which upon downloading to a general purpose computer, implements a method as set forth in claim 1.
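
The chromosome-based search recited in claim 3 can be illustrated with a minimal sketch. It assumes a chromosome is a bit mask over the feature pool (one gene per feature) and, purely to stay self-contained, scores each chromosome with a nearest-centroid classifier rather than the SVM named in the claims. The function names, the per-feature penalty, the population settings, and the toy data are all hypothetical and do not come from the application.

```python
import random

def make_fitness(samples, labels):
    """Build a fitness function that scores a feature mask (chromosome).

    Assumption: nearest-centroid accuracy stands in for the SVM evaluation.
    Both classes (0 and 1) must be present in `labels`.
    """
    def fitness(mask):
        idx = [i for i, gene in enumerate(mask) if gene]
        if not idx:
            return 0.0
        # Class centroids restricted to the selected features.
        cents = {}
        for cls in (0, 1):
            rows = [s for s, y in zip(samples, labels) if y == cls]
            cents[cls] = [sum(r[i] for r in rows) / len(rows) for i in idx]
        correct = 0
        for s, y in zip(samples, labels):
            dist = {c: sum((s[i] - m) ** 2 for i, m in zip(idx, cents[c]))
                    for c in cents}
            correct += (min(dist, key=dist.get) == y)
        # Hypothetical penalty per selected feature, favouring small subsets.
        return correct / len(samples) - 0.01 * len(idx)
    return fitness

def select_features(fitness, n_features, pop_size=20, generations=30,
                    mutation_rate=0.05, seed=0):
    """Evolve bit-mask chromosomes with crossover and mutation."""
    rng = random.Random(seed)
    # Initial population: random selection of features.
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        children = [ranked[0][:]]  # keep the fittest chromosome unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)        # single-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]                  # bit-flip mutation
            children.append(child)
        pop = children
    return max(pop, key=fitness)

def toy_data(n=40, n_features=6, seed=1):
    """Synthetic data in which only feature 0 separates the two classes."""
    rng = random.Random(seed)
    samples, labels = [], []
    for k in range(n):
        y = k % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] = rng.gauss(6.0 * y, 0.5)
        samples.append(row)
        labels.append(y)
    return samples, labels

samples, labels = toy_data()
best = select_features(make_fitness(samples, labels), n_features=6)
print("selected feature indices:", [i for i, g in enumerate(best) if g])
```

Carrying the fittest chromosome into each new generation is a design choice of this sketch, not something the claims recite; it simply guarantees that the best fitness never decreases between generations.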

8. A system for detecting and identifying regions and/or volumes of interest (“regions”) within medical image data, including a CAD sub-system, and a false positive reduction (FPR) subsystem, for mapping regions to one of two ground truth states with improved specificity thereby minimizing the number of false positives reported by the system, comprising:
  
  a CAD sub-system for identifying and delineating regions of interest detected within image data;
  
  a false positive reduction sub-system in communication with the CAD sub-system, which is first trained on a set of training data, and subsequently operates upon candidate regions within non-training data with improved specificity, comprising:
  
  a feature extractor for extracting a pool of features corresponding to each CAD-delineated candidate region;
  
  a genetic algorithm in communication with the feature extractor to determine an optimal sub-set of features from the pool of features of the CAD-delineated regions used in training; and
  
  a support vector machine (SVM) in communication with the feature extractor and GA, which maps each CAD-delineated candidate region detected in non-training data, post-training, based on the optimal subset of features;
  
  wherein the system is trained on imaging data including candidate regions with known ground truth, by extracting a pool of features from each segmented region, using the GA to identify an optimal sub-set of extracted features in order that the system displays sufficient discriminatory power during operation on non-training data in order to map the candidate regions with improved specificity, and wherein in the case where a total of true positives is outweighed by the number of false positives found in the training set, a stratification sub-system rearranges the training data such that there are approximately equal numbers of true and false positives in the training.
  - 9. The medical image classification system set forth in claim 8, where the CAD subsystem further includes a segmenting sub-system, which provides for reader input during the training to better delineate regions that are used for training.
  - 10. The medical image classification system as set forth in claim 8, wherein the GA operates upon a hierarchical fitness paradigm, in both training and operation on non-training data.
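
The rebalancing described in claim 8, where false positives outnumber true positives in the training set, might look like the sketch below. The claims do not state how the retained false positives are chosen, so uniform random sampling is an assumption here, and the function name and `seed` parameter are hypothetical.

```python
import random

def balance_by_undersampling(true_regions, false_regions, seed=0):
    """Return (true_regions, kept_false) with equal counts when false regions dominate.

    Assumption: the retained false regions are drawn uniformly at random.
    If the false regions do not outnumber the true ones, nothing is dropped.
    """
    trues, falses = list(true_regions), list(false_regions)
    if len(falses) <= len(trues):
        return trues, falses
    rng = random.Random(seed)
    return trues, rng.sample(falses, len(trues))

# Usage: 5 true regions against 50 false regions.
trues, kept = balance_by_undersampling(range(5), range(100, 150))
print(len(trues), len(kept))
```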

11. A method for classifying objects detected within medical imaging data that results in a marked reduction in false positive classifications, comprising the steps of:
- CAD processing to detect and delineate objects present in the medical imaging data;
  
  post-CAD processing to generate a feature set with sufficient discriminatory power such that delineated objects may be classified with maximum specificity;
  
  wherein during a training phase, a set of known training data is CAD-processed to segment objects within the training data, a pool of features extracted/calculated from/for the segmented objects, and machine learning optimizes a sub-set of features from the pool of features, wherein if the training set has an unbalanced number of regions that are true positives and false positives, training is implemented in accord with a stratification process to train using balanced, as distinguished from unbalanced training data and wherein after training, candidate objects delineated by the CAD process are post-CAD processed, including object feature extraction, to classify the objects with high specificity in view of the post-CAD machine learning.

12. A method for training a classifier for the classification of morphologically interesting regions detected within medical imaging data, where the training includes choosing data to train the classifier in accordance with a stratification method, the stratification method comprising:
- separating the pool of false positive regions into N subsets based on region size, such that the Nth subset includes the largest regions subset;
  
  implementing a machine learning process using the Nth subset and all true regions;
  
  generating the classifier based on the machine learning; and
  
  applying the classifier to each of the remaining N−1 subsets.
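
The stratification steps of claim 12 can be sketched as follows. The sketch assumes each region is a dictionary with hypothetical `size` and `features` keys, splits the size-sorted false positives into N nearly equal groups, and uses a nearest-centroid classifier as a stand-in for the machine learning process. The claim does not say what is done with the predictions on the remaining N−1 subsets, so they are simply returned.

```python
def split_by_size(false_regions, n_subsets):
    """Sort false-positive regions by size and cut them into n_subsets groups.

    The last group (the Nth subset) holds the largest regions.
    """
    ordered = sorted(false_regions, key=lambda r: r["size"])
    bounds = [k * len(ordered) // n_subsets for k in range(n_subsets + 1)]
    return [ordered[bounds[k]:bounds[k + 1]] for k in range(n_subsets)]

def train_centroid_classifier(true_regions, false_regions):
    """Stand-in learner (assumption): classify by the nearer class centroid."""
    def centroid(regions):
        dim = len(regions[0]["features"])
        return [sum(r["features"][i] for r in regions) / len(regions)
                for i in range(dim)]
    c_true, c_false = centroid(true_regions), centroid(false_regions)

    def classify(region):
        d_true = sum((x - c) ** 2 for x, c in zip(region["features"], c_true))
        d_false = sum((x - c) ** 2 for x, c in zip(region["features"], c_false))
        return d_true <= d_false  # True means "classified as a true region"
    return classify

def stratified_training(true_regions, false_regions, n_subsets):
    """Train on the Nth subset plus all true regions, then apply to the rest."""
    subsets = split_by_size(false_regions, n_subsets)
    classifier = train_centroid_classifier(true_regions, subsets[-1])
    predictions = [[classifier(r) for r in subset] for subset in subsets[:-1]]
    return classifier, predictions

# Usage with made-up regions: 3 true regions, 12 false regions of sizes 1..12.
trues = [{"size": 5, "features": [1.0, 1.0]} for _ in range(3)]
falses = [{"size": s, "features": [0.0, 0.0]} for s in range(1, 13)]
classifier, predictions = stratified_training(trues, falses, n_subsets=3)
print(predictions)
```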

Current Assignee
Koninklijke Philips Electronics N.V. (Koninklijke Philips N.V.)
Original Assignee
Koninklijke Philips Electronics N.V. (Koninklijke Philips N.V.)
Inventors
Boroczky, Lilla; Zhao, Luyin; Lee, Kwok Pun

Application Number

US11/719,672
Publication Number

US 20090175514A1
Field of Search
US Class Current

382/128
CPC Class Codes

G06F 18/211   Selection of the most signi...

G06F 18/2411   based on the proximity to a...

G06T 2207/30061   Lung

G06T 7/0012   Biomedical image inspection

G06V 10/764   using classification, e.g. ...

G06V 10/771   Feature selection, e.g. sel...
