STRATIFICATION METHOD FOR OVERCOMING UNBALANCED CASE NUMBERS IN COMPUTER-AIDED LUNG NODULE FALSE POSITIVE REDUCTION
Abstract
A method for computer-aided detection (CAD) and classification of regions of interest detected within high-resolution computed tomography (HRCT) medical image data. The method includes post-CAD machine learning techniques applied to maximize the specificity and sensitivity with which a region/volume is identified as a nodule or non-nodule. The regions are identified by a CAD process and automatically segmented. A feature pool is identified and extracted from each segmented region and processed by a genetic algorithm (GA) to identify an optimal feature subset, wherein a data stratification method is used to balance the number of cases in different classes. The subset determined by the GA is used to train a support vector machine (SVM) to classify candidate regions/volumes found within non-training data.
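The two quantities the abstract seeks to maximize can be made concrete with a short calculation. The sketch below is illustrative only; it assumes labels are encoded as 1 for nodule and 0 for non-nodule, and the function name `sensitivity_specificity` is hypothetical.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for binary labels.

    Assumed encoding: 1 = nodule (true region), 0 = non-nodule (false region).
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # nodules reported as nodules
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # nodules missed
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # non-nodules rejected
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives reported
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity
```

Reducing false positives raises specificity, since each false positive removed moves a case from `fp` to `tn`.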
19 Citations
12 Claims
1. A method for computer-assisted detection (CAD) of regions or volumes of interest (“regions”) within medical image data that includes CAD processing to detect and delineate candidate regions, and post-CAD machine learning in a training phase to maximize specificity and reduce the number of false positives reported after processing non-training data, which method includes the steps of:
training a classifier on a set of medical image training data selected to include a number of regions known to be true and known to be false for a ground truth, identifying and segmenting the regions using said CAD processing, extracting features to create a pool of features to qualify the regions, applying a genetic algorithmic processor to the pool of features to determine a minimal sub-set of features for use by a support vector machine (SVM) to identify candidate regions within non-training data with improved specificity, wherein if the medical image training data is unbalanced, implementing a stratification process to the unbalanced data;
detecting, after training, within non-training data, candidate regions;
segmenting the candidate regions identified within the non-training data;
extracting a set of candidate features relating to each segmented candidate region; and
mapping candidate regions into ground truth space based on the set of candidate features with practical specificity in accord with the training process.
(Dependent claims: 2, 3, 4, 5, 6, 7)
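The training phase recited in claim 1 pairs a genetic algorithm with a support vector machine to pick a minimal feature subset. The following is a minimal sketch of that idea, not the claimed implementation. It assumes features arrive as rows of numbers with labels 1 (true region) and 0 (false region), and it swaps in a nearest-centroid rule for the SVM so that it runs with the standard library alone. The names `ga_select`, `fitness`, `centroid_accuracy`, and the `penalty` weight are hypothetical.

```python
import random

def centroid_accuracy(X, y, mask):
    """Stand-in for SVM scoring: training accuracy of a nearest-centroid rule
    restricted to the features where mask[j] == 1. Assumes both classes occur in y."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(r[j] for r in rows) / len(rows) for j in cols]
    c0, c1 = centroid(0), centroid(1)
    correct = 0
    for x, t in zip(X, y):
        d0 = sum((x[j] - c) ** 2 for j, c in zip(cols, c0))
        d1 = sum((x[j] - c) ** 2 for j, c in zip(cols, c1))
        correct += int((1 if d1 < d0 else 0) == t)
    return correct / len(y)

def fitness(X, y, mask, penalty=0.01):
    """Reward accuracy and penalize subset size, which favors minimal subsets."""
    return centroid_accuracy(X, y, mask) - penalty * sum(mask)

def ga_select(X, y, generations=30, pop_size=20, mutation_rate=0.1, seed=0):
    """Evolve binary feature masks and return the fittest one found."""
    rng = random.Random(seed)
    n = len(X[0])
    # Seed the population with the full feature set as a baseline individual.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(X, y, m), reverse=True)
        parents = pop[: pop_size // 2]      # truncation selection keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)       # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mutation_rate.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(X, y, m))
```

Because the fittest half of each generation is carried over unchanged, the returned mask is never worse than the full feature set under this fitness function. In a real pipeline the scoring function would presumably be replaced by cross-validated SVM performance.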
8. A system for detecting and identifying regions and/or volumes of interest (“regions”) within medical image data, including a CAD sub-system and a false positive reduction (FPR) sub-system, for mapping regions to one of two ground truth states with improved specificity, thereby minimizing the number of false positives reported by the system, comprising:
a CAD sub-system for identifying and delineating regions of interest detected within image data;
a false positive reduction sub-system in communication with the CAD sub-system, which is first trained on a set of training data and subsequently operates upon candidate regions within non-training data with improved specificity, comprising:
a feature extractor for extracting a pool of features corresponding to each CAD-delineated candidate region;
a genetic algorithm (GA) in communication with the feature extractor to determine an optimal sub-set of features from the pool of features of the CAD-delineated regions used in training; and
a support vector machine (SVM) in communication with the feature extractor and GA, which maps each CAD-delineated candidate region detected in non-training data, post-training, based on the optimal sub-set of features;
wherein the system is trained on imaging data including candidate regions with known ground truth, by extracting a pool of features from each segmented region, using the GA to identify an optimal sub-set of extracted features in order that the system displays sufficient discriminatory power during operation on non-training data in order to map the candidate regions with improved specificity, and wherein, in the case where a total of true positives is outweighed by the number of false positives found in the training set, a stratification sub-system rearranges the training data such that there are approximately equal numbers of true and false positives in the training.
(Dependent claims: 9, 10)
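Claim 8 requires only that the training data end up with approximately equal numbers of true and false positives; it does not say how the rearrangement is done. As a generic stand-in, and not the stratification recited elsewhere in the claims, the sketch below randomly undersamples the false positives. The function name `balance_by_undersampling` and the `seed` parameter are hypothetical.

```python
import random

def balance_by_undersampling(true_regions, false_regions, seed=0):
    """Return (trues, falses) with the false positives randomly reduced to the
    number of true positives. Random undersampling is an assumption made for
    illustration; inputs are left unmodified."""
    trues, falses = list(true_regions), list(false_regions)
    if len(falses) <= len(trues):
        return trues, falses                # already balanced or true-heavy
    rng = random.Random(seed)
    return trues, rng.sample(falses, len(trues))
```

Random selection discards false positives without regard to their characteristics, which is one reason a size-based stratification may be preferred.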
11. A method for classifying objects detected within medical imaging data that results in a marked reduction in false positive classifications, comprising the steps of:
CAD processing to detect and delineate objects present in the medical imaging data;
post-CAD processing to generate a feature set with sufficient discriminatory power such that delineated objects may be classified with maximum specificity;
wherein during a training phase, a set of known training data is CAD-processed to segment objects within the training data, a pool of features is extracted/calculated from/for the segmented objects, and machine learning optimizes a sub-set of features from the pool of features, wherein if the training set has an unbalanced number of regions that are true positives and false positives, training is implemented in accord with a stratification process to train using balanced, as distinguished from unbalanced, training data, and wherein after training, candidate objects delineated by the CAD process are post-CAD processed, including object feature extraction, to classify the objects with high specificity in view of the post-CAD machine learning.
12. A method for training a classifier for the classification of morphologically interesting regions detected within medical imaging data, where the training includes choosing data to train the classifier in accordance with a stratification method, the stratification method comprising:
separating the pool of false positive regions into N subsets based on region size, such that the Nth subset includes the largest regions;
implementing a machine learning process using the Nth subset and all true regions;
generating the classifier based on the machine learning; and
applying the classifier to each of the remaining N−1 subsets.
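The steps of claim 12 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: regions are represented as dictionaries with a hypothetical `"size"` key, the N subsets are formed as equal-count groups of the size-sorted false positives (the claim says only "based on region size"), and `train` is a hypothetical callable that takes the true regions and the selected false regions and returns a classifier function.

```python
def stratify_by_size(false_regions, n_subsets, size_of=lambda r: r["size"]):
    """Split false-positive regions into n_subsets groups by ascending size,
    so that the last (Nth) subset holds the largest regions."""
    ordered = sorted(false_regions, key=size_of)
    base, extra = divmod(len(ordered), n_subsets)
    subsets, start = [], 0
    for i in range(n_subsets):
        end = start + base + (1 if i < extra else 0)  # spread any remainder
        subsets.append(ordered[start:end])
        start = end
    return subsets

def train_with_stratification(true_regions, false_regions, n_subsets, train,
                              size_of=lambda r: r["size"]):
    """Train on all true regions plus the Nth (largest) false subset, then
    apply the resulting classifier to each of the remaining N-1 subsets."""
    subsets = stratify_by_size(false_regions, n_subsets, size_of)
    largest, remaining = subsets[-1], subsets[:-1]
    classifier = train(true_regions, largest)
    predictions = [[classifier(r) for r in subset] for subset in remaining]
    return classifier, predictions
```

Training on one subset rather than the whole false-positive pool reduces the imbalance by roughly a factor of N, while the remaining N−1 subsets are still used to check how the classifier handles smaller false positives.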
Specification