Artificial intelligence and global normalization methods for genotyping
Abstract
Described herein are systems and methods for normalizing data without the use of external controls. Also described herein are systems and methods for analyzing cluster data, such as genotyping data, using an artificial neural network.
72 Claims
1. A method of normalizing genetic data for n loci, wherein n is an integer greater than one, comprising:
(a) obtaining genetic data comprising n sets of first and second signal values related in a coordinate system, wherein said first and second signal values are indicative of the levels of a first and second allele, respectively, at n loci;
(b) identifying a set of sweep points in said coordinate system;
(c) identifying a set of control points, said control points comprising at least a subset of said signal values that are proximal to said sweep points;
(d) projecting said control points to a line or curve passing through said sweep points, thereby forming set points;
(e) determining parameters of a registration transformation equation based on said set of control points and said set points; and
(f) transforming said n sets of first and second signal values according to said registration transformation equation and said parameters, thereby normalizing said genetic data.
Dependent claims: 2-28.
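The normalization steps of claim 1 can be illustrated with a minimal NumPy sketch. It rests on assumptions the claim does not specify: sweep points are placed along the two signal axes (where homozygous clusters would ideally lie), control points are the k signal values nearest each sweep point, the "line passing through said sweep points" is taken as the straight line through the first and last sweep point, and the registration transformation is a 2-D affine map fitted by linear least squares. The helper names (`axis_sweep_points`, `project_onto_line`, `normalize`), the parameters `k` and `start_frac`, and the cross-talk matrix in the synthetic example are all hypothetical.

```python
import numpy as np


def axis_sweep_points(max_value, axis, n=5, start_frac=0.3):
    """Hypothetical helper: n sweep points along one signal axis.

    axis=0 places the points on the first-allele (x) axis, axis=1 on the
    second-allele (y) axis. Starting away from the origin avoids the
    low-signal region where all clusters converge.
    """
    sweep = np.zeros((n, 2))
    sweep[:, axis] = np.linspace(start_frac * max_value, max_value, n)
    return sweep


def project_onto_line(points, p0, p1):
    """Orthogonally project points onto the straight line through p0 and p1."""
    direction = (p1 - p0) / np.linalg.norm(p1 - p0)
    offsets = (points - p0) @ direction
    return p0 + np.outer(offsets, direction)


def normalize(points, sweep_sets, k=3):
    """Sketch of steps (b)-(f), assuming an affine registration transformation."""
    controls, set_points = [], []
    for sweep in sweep_sets:
        # (c) control points: the k signal values nearest each sweep point
        dists = np.linalg.norm(points[None, :, :] - sweep[:, None, :], axis=2)
        idx = np.unique(np.argsort(dists, axis=1)[:, :k])
        ctrl = points[idx]
        controls.append(ctrl)
        # (d) set points: control points projected onto the sweep line
        set_points.append(project_onto_line(ctrl, sweep[0], sweep[-1]))
    ctrl = np.vstack(controls)
    target = np.vstack(set_points)
    # (e) fit [x, y, 1] @ params ~= set point by linear least squares
    design = np.hstack([ctrl, np.ones((len(ctrl), 1))])
    params, *_ = np.linalg.lstsq(design, target, rcond=None)
    # (f) apply the fitted transformation to every signal pair
    return np.hstack([points, np.ones((len(points), 1))]) @ params


# Synthetic data: ideal AA, BB and AB clusters distorted by an assumed
# 2x2 cross-talk matrix between the two allele channels.
t = np.linspace(1.0, 3.0, 20)
ideal = np.vstack([np.c_[t, 0 * t], np.c_[0 * t, t], np.c_[t, t]])
crosstalk = np.array([[1.0, 0.15], [0.2, 1.0]])
observed = ideal @ crosstalk.T
sweeps = [axis_sweep_points(observed[:, 0].max(), axis=0),
          axis_sweep_points(observed[:, 1].max(), axis=1)]
normalized = normalize(observed, sweeps)
```

Because the synthetic distortion is itself affine, the fitted transformation undoes it and the homozygous clusters return to the axes; real array data, with noise and non-linear effects, would not be recovered this exactly, which may be why the claim also allows a curve rather than a line.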
29. A system for analysis of genetic data, comprising:
(a) an array reader configured to detect signals from separate locations on an array substrate;
(b) a computer processor configured to receive signal values from said array reader;
(c) a normalization module comprising commands for
  (i) reading said signal values;
  (ii) identifying a set of sweep points for said signal values in a coordinate system;
  (iii) identifying a set of control points, said control points comprising at least a subset of said signal values that are proximal to said sweep points;
  (iv) projecting said control points to a line or curve passing through said sweep points, thereby forming set points;
  (v) determining parameters of a registration transformation equation based on said control points and said set points; and
  (vi) transforming said signal values according to said registration transformation equation and said parameters, thereby providing normalized genetic data; and
(d) a clustering module comprising commands for
  (i) reading said normalized genetic data;
  (ii) comparing fit of said normalized genetic data to each of a plurality of cluster models using an artificial neural network, thereby determining a best fit cluster model; and
  (iii) assigning said signal values to at least one cluster according to said best fit cluster model, wherein if said best fit cluster model contains at least one actual cluster and at least one missing cluster, then using a second artificial neural network to propose a location for said at least one missing cluster.
Dependent claims: 32-46.
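The clustering module's data flow (compare candidate cluster models, pick the best fit, assign points, report missing clusters) can be sketched as below. The claim performs the fit comparison with an artificial neural network whose architecture and training are not given here, so this sketch swaps in a plain penalized-residual score as a stand-in; it is not the claimed method. It further assumes that the candidate models are the non-empty subsets of three genotype clusters placed at fixed normalized angles θ = (2/π)·atan2(B, A) of 0 (AA), 0.5 (AB) and 1 (BB). `CANONICAL_THETA`, the `penalty` value, and all function names are hypothetical.

```python
from itertools import combinations

import numpy as np

# Assumed cluster positions as the normalized angle theta = (2/pi) * atan2(B, A).
CANONICAL_THETA = {"AA": 0.0, "AB": 0.5, "BB": 1.0}


def to_theta(points):
    """Map normalized (A, B) signal pairs to an angle in [0, 1]."""
    return (2.0 / np.pi) * np.arctan2(points[:, 1], points[:, 0])


def candidate_models():
    """Every non-empty subset of the three genotype clusters."""
    names = list(CANONICAL_THETA)
    return [combo for r in (1, 2, 3) for combo in combinations(names, r)]


def model_score(theta, model, penalty=0.01):
    """Stand-in for the neural-network fit comparison (lower is better):
    mean squared distance to the nearest model cluster plus a per-cluster penalty."""
    centers = np.array([CANONICAL_THETA[name] for name in model])
    resid = np.min((theta[:, None] - centers[None, :]) ** 2, axis=1)
    return resid.mean() + penalty * len(model)


def cluster(points):
    """Return the best-fit model, a label per point, and the missing clusters."""
    theta = to_theta(points)
    best = min(candidate_models(), key=lambda m: model_score(theta, m))
    centers = np.array([CANONICAL_THETA[name] for name in best])
    nearest = np.argmin(np.abs(theta[:, None] - centers[None, :]), axis=1)
    labels = [best[i] for i in nearest]
    missing = [name for name in CANONICAL_THETA if name not in best]
    return best, labels, missing


# Usage: one locus showing all three genotypes, one showing only homozygotes.
t = np.linspace(1.0, 3.0, 10)
three = np.vstack([np.c_[t, 0 * t], np.c_[t, t], np.c_[0 * t, t]])
two = np.vstack([np.c_[t, 0 * t], np.c_[0 * t, t]])
best3, labels3, missing3 = cluster(three)
best2, labels2, missing2 = cluster(two)
```

In the second case the AB cluster is reported as missing; per the claim, a second neural network would then propose where that cluster should lie, a step this sketch does not attempt.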
30. A method of determining the alleles present at n loci for an individual, comprising:
(a) obtaining genetic data comprising n sets of first and second signal values related in a coordinate system, wherein said first and second signal values are indicative of the levels of a first and second allele, respectively, at n loci;
(b) identifying a set of sweep points in said coordinate system;
(c) identifying a set of control points, said control points comprising at least a subset of said signal values that are proximal to said sweep points;
(d) projecting said control points to a line or curve passing through said sweep points, thereby forming set points;
(e) determining parameters of a registration transformation equation based on said set of control points and said set points;
(f) transforming said n sets of first and second signal values according to said registration transformation equation and said parameters, thereby normalizing said genetic data;
(g) comparing fit of said normalized genetic data to each of a plurality of cluster models using an artificial neural network, thereby determining a best fit cluster model;
(h) assigning said signal values to at least one cluster according to said best fit cluster model, wherein if said best fit cluster model contains at least one actual cluster and at least one missing cluster, then using a second artificial neural network to propose a location for said at least one missing cluster; and
(i) determining, for an individual, the alleles present at said n loci.
Dependent claims: 31 and 47-72.
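Step (i) turns each locus's cluster assignment into the alleles an individual carries. A minimal sketch, assuming every call is one of the cluster labels AA, AB or BB, where A and B stand for the first and second allele of that locus; the mapping constant, the function name and the allele letters in the example are hypothetical.

```python
# Assumed mapping from a genotype cluster label to indices into the
# (first allele, second allele) pair of a locus.
GENOTYPE_TO_ALLELE_INDEX = {"AA": (0, 0), "AB": (0, 1), "BB": (1, 1)}


def alleles_for_individual(calls, locus_alleles):
    """Return the pair of alleles present at each locus for one individual.

    calls: genotype cluster label per locus, e.g. ["AA", "AB"]
    locus_alleles: (first allele, second allele) per locus, e.g. [("A", "G"), ...]
    """
    if len(calls) != len(locus_alleles):
        raise ValueError("need exactly one genotype call per locus")
    result = []
    for call, pair in zip(calls, locus_alleles):
        result.append(tuple(pair[i] for i in GENOTYPE_TO_ALLELE_INDEX[call]))
    return result


calls = ["AA", "AB", "BB"]
locus_alleles = [("A", "G"), ("C", "T"), ("A", "C")]
print(alleles_for_individual(calls, locus_alleles))
# prints [('A', 'A'), ('C', 'T'), ('C', 'C')]
```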
Specification