Method and system for determining the accuracy of DNA base identifications
First Claim
1. A system for determining the quality of predicted DNA base identifications, the system comprising a processor configured to:
receive a training data set, the training data set comprising a plurality of predicted DNA base identifications;
define a group of subsets;
compare the predicted DNA base identifications with actual DNA base identifications for training data within each subset of the group;
determine a sampling characteristic for each subset of the group based on training data within the respective subset; and
determine a quality characterization for predicted DNA base identifications within at least one subset of the group based on the comparison and determined sampling characteristic;
wherein the sampling characteristic comprises a confidence value comprising a binomial proportion confidence interval value; and
detect nucleotides based on the quality characterization.
Abstract
A system for determining the quality of predicted DNA base identifications is disclosed. The system comprises a processor configured to: receive a training data set comprising a plurality of predicted DNA base identifications; define a group of subsets; compare the predicted DNA base identifications with actual DNA base identifications for training data within each subset of the group; determine a sampling characteristic for each subset of the group based on training data within the respective subset; and determine a quality characterization for predicted DNA base identifications within at least one subset of the group based on the comparison and the determined sampling characteristic. The sampling characteristic comprises a confidence value comprising a binomial proportion confidence interval value.
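The steps recited above can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on several assumptions that do not come from the text: subsets are defined by binning a hypothetical per-call predictor `score` in [0, 1), the binomial proportion confidence interval is computed as a Wilson score interval (the text does not say which interval is used), and the quality characterization is expressed on a Phred-like scale derived from the upper bound of the error rate. The names `wilson_interval` and `calibrate` are illustrative only.

```python
import math
from collections import defaultdict


def wilson_interval(errors, total, z=1.96):
    """Wilson score interval for a binomial proportion (assumed interval type)."""
    if total == 0:
        return (0.0, 1.0)
    p = errors / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def calibrate(training, n_bins=4, z=1.96):
    """Build a per-subset quality table from labelled training calls.

    training: iterable of (score, predicted_base, actual_base), where score
    is a hypothetical predictor in [0, 1) used only to define the subsets.
    """
    counts = defaultdict(lambda: [0, 0])  # bin index -> [errors, total]
    for score, predicted, actual in training:
        b = min(int(score * n_bins), n_bins - 1)
        counts[b][1] += 1
        if predicted != actual:  # compare predicted with actual base
            counts[b][0] += 1

    table = {}
    for b, (errors, total) in counts.items():
        lower, upper = wilson_interval(errors, total, z)
        # Assumption: use the interval's upper bound as a conservative error
        # estimate, so sparsely sampled subsets receive lower quality values.
        quality = -10 * math.log10(upper)
        table[b] = {
            "errors": errors,
            "total": total,
            "interval": (lower, upper),
            "quality": quality,
        }
    return table


if __name__ == "__main__":
    # Synthetic data: a low-score subset with many errors, a high-score subset with none.
    data = [(0.1, "A", "C")] * 5 + [(0.1, "A", "A")] * 5 + [(0.9, "G", "G")] * 100
    for b, row in sorted(calibrate(data).items()):
        print(b, row["errors"], row["total"], round(row["quality"], 1))
```

Basing the quality value on the interval bound rather than on the raw error fraction means that two subsets with the same observed error rate can receive different quality values if one of them contains far fewer training calls.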
4 Citations
19 Claims
Specification