ESTIMATION OF PHENOTYPES USING DNA, PEDIGREE, AND HISTORICAL DATA
First Claim
1. A computer-implemented method of predicting a trait of a target individual, comprising:
accessing a set of DNA features of the target individual;
accessing a set of non-DNA features of the target individual;
generating a feature vector that combines the set of DNA features and the set of non-DNA features, the feature vector including a set of numerical values, at least a first one of the numerical values representing one of the DNA features, at least a second one of the numerical values representing one of the non-DNA features; and
inputting the feature vector to a machine learning model to generate a prediction of whether the target individual has the trait.
Abstract
Disclosed are techniques for predicting a trait of an individual and identifying a set of enriched record collections of a genetic community. To predict a trait of an individual, DNA features and non-DNA features of the individual are accessed to generate a feature vector that is inputted into a machine learning model. The machine learning model generates a prediction of the trait. The prediction may be based on an inheritance prediction and/or a community prediction. To identify a set of enriched record collections, individuals belonging to a genetic community are identified and a set of candidate record collections is accessed. A community count and a background count are determined for each candidate record collection. The set of enriched record collections is identified based on a comparison of the community count and the background count. The genetic community may be annotated using the set of enriched record collections.
26 Claims
1. A computer-implemented method of predicting a trait of a target individual, comprising:
accessing a set of DNA features of the target individual;
accessing a set of non-DNA features of the target individual;
generating a feature vector that combines the set of DNA features and the set of non-DNA features, the feature vector including a set of numerical values, at least a first one of the numerical values representing one of the DNA features, at least a second one of the numerical values representing one of the non-DNA features; and
inputting the feature vector to a machine learning model to generate a prediction of whether the target individual has the trait.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
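The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the feature names, the weights, the bias, and the use of a logistic model are all assumptions chosen for illustration, since the claim does not specify a particular machine learning model.

```python
import math

def build_feature_vector(dna_features, non_dna_features, dna_keys, non_dna_keys):
    """Combine DNA and non-DNA features into one list of numerical values.

    Missing features default to 0.0 (an assumption made for this sketch).
    """
    dna_part = [float(dna_features.get(key, 0.0)) for key in dna_keys]
    non_dna_part = [float(non_dna_features.get(key, 0.0)) for key in non_dna_keys]
    return dna_part + non_dna_part

def predict_trait(feature_vector, weights, bias, threshold=0.5):
    """Stand-in model: logistic regression with fixed, hypothetical weights."""
    z = bias + sum(w * x for w, x in zip(weights, feature_vector))
    probability = 1.0 / (1.0 + math.exp(-z))
    return probability, probability >= threshold

# Hypothetical feature names and values, not taken from the patent.
DNA_KEYS = ["snp_a_allele_count", "snp_b_allele_count"]
NON_DNA_KEYS = ["parent_has_trait", "birth_decade_scaled"]

dna = {"snp_a_allele_count": 2, "snp_b_allele_count": 0}
non_dna = {"parent_has_trait": 1, "birth_decade_scaled": 0.5}

vector = build_feature_vector(dna, non_dna, DNA_KEYS, NON_DNA_KEYS)

# Hypothetical parameters; a real model would learn these from training data.
WEIGHTS = [0.8, -0.3, 1.2, 0.1]
BIAS = -2.0

probability, has_trait = predict_trait(vector, WEIGHTS, BIAS)
print(vector, round(probability, 3), has_trait)
```

In this sketch the first two entries of the vector represent DNA features and the last two represent non-DNA features, mirroring the claim's requirement that both kinds of feature appear as numerical values in the same vector.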
10. A computer-implemented method of predicting a trait of a target individual, comprising:
accessing a first dataset representing a family tree of the target individual, the family tree describing relationships among the target individual and related individuals who are related to the target individual;
generating, from the first dataset, a second dataset representing a trait tree, the trait tree comprising nodes representing the target individual and one or more related individuals, each node being connected to at least another node in accordance with the relationships described in the family tree, the nodes comprising (i) a first node representing the target individual, the first node having an unknown value, and (ii) a second node having a known value representing the trait that is known for one of the related individuals represented by the second node;
determining the unknown value of the first node, the unknown value determined at least based on an inheritance probability propagated from the known value of the second node along one or more branches of the trait tree; and
generating a prediction of the trait of the target individual based on a determined value of the first node.
Dependent claims: 11, 12, 13, 14
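The propagation described in claim 10 can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the patent: it propagates only from ancestors to descendants, uses a fixed per-parent transmission probability of 0.5, falls back to a hypothetical background prior for ancestors with no information, and treats the contributions of different parents as independent. All names and numbers are invented for the example.

```python
def build_trait_tree(family_tree, known_traits):
    """Derive a trait tree (second dataset) from a family tree (first dataset).

    family_tree maps each individual to a list of their parents.
    Each node's value is True/False when the trait is known, otherwise None.
    """
    individuals = set(family_tree)
    for parent_list in family_tree.values():
        individuals.update(parent_list)
    return {
        person: {
            "parents": list(family_tree.get(person, [])),
            "value": known_traits.get(person),
        }
        for person in individuals
    }

def trait_probability(node, trait_tree, transmission=0.5, prior=0.1):
    """Propagate inheritance probability down the branches to `node`."""
    entry = trait_tree[node]
    if entry["value"] is not None:
        return 1.0 if entry["value"] else 0.0
    if not entry["parents"]:
        return prior  # no known value and no ancestors: use the assumed prior
    p_not_inherited = 1.0
    for parent in entry["parents"]:
        p_parent = trait_probability(parent, trait_tree, transmission, prior)
        p_not_inherited *= 1.0 - transmission * p_parent
    return 1.0 - p_not_inherited

# Hypothetical family: only the maternal grandmother is known to have the trait.
family_tree = {
    "target": ["mother", "father"],
    "mother": ["grandmother", "grandfather"],
}
known_traits = {"grandmother": True, "grandfather": False, "father": False}

trait_tree = build_trait_tree(family_tree, known_traits)
target_probability = trait_probability("target", trait_tree)
prediction = target_probability >= 0.5
print(target_probability, prediction)
```

Here the known value at the grandmother node is halved along each branch on the way to the target, so a single affected grandparent yields a lower probability for the target than an affected parent would.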
15. A computer-implemented method of predicting a set of enriched record collections for a genetic community among a plurality of genetic communities, comprising:
identifying individuals belonging to the genetic community;
accessing a set of candidate record collections;
determining a community count of each candidate record collection in the set based on how often the candidate record collection is associated with one of the individuals;
determining a background count of each candidate record collection in the set based on how often the candidate record collection is associated with any individual in the plurality of genetic communities; and
identifying the set of enriched record collections of the genetic community based on a comparison of the community count and background count of each candidate record collection.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
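The count comparison in claim 15 can be sketched as below. The claim does not say how the community count and background count are compared, so this example assumes one plausible choice: a collection is treated as enriched when its rate among community members is at least `min_ratio` times its rate among all individuals. The individuals, collection names, and the threshold are hypothetical.

```python
from collections import Counter

def enriched_collections(community_members, all_individuals,
                         records_by_individual, candidates, min_ratio=2.0):
    """Return candidate collections over-represented in one genetic community."""
    if not community_members or not all_individuals:
        return []
    community_count = Counter()
    background_count = Counter()
    for person in all_individuals:
        for collection in records_by_individual.get(person, ()):
            if collection not in candidates:
                continue
            background_count[collection] += 1  # any individual, any community
            if person in community_members:
                community_count[collection] += 1  # members of this community
    enriched = []
    for collection in sorted(candidates):
        if background_count[collection] == 0:
            continue  # never observed, so no rate can be compared
        community_rate = community_count[collection] / len(community_members)
        background_rate = background_count[collection] / len(all_individuals)
        if community_rate / background_rate >= min_ratio:
            enriched.append(collection)
    return enriched

# Hypothetical data: six individuals, two of whom belong to the community.
all_individuals = {"a", "b", "c", "d", "e", "f"}
community_members = {"a", "b"}
records_by_individual = {
    "a": {"Norway Census 1900", "US Census 1940"},
    "b": {"Norway Census 1900"},
    "c": {"US Census 1940"},
    "d": {"US Census 1940"},
    "e": {"US Census 1940"},
}
candidates = {"Norway Census 1900", "US Census 1940"}

result = enriched_collections(community_members, all_individuals,
                              records_by_individual, candidates)
print(result)
```

Comparing rates rather than raw counts keeps a very large collection from appearing enriched merely because it is common everywhere; a statistical test could replace the ratio threshold where sample sizes are small.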
Specification