ESTIMATION OF PHENOTYPES USING DNA, PEDIGREE, AND HISTORICAL DATA

US 20200135296A1
Filed: 10/31/2019
Published: 04/30/2020
Est. Priority Date: 10/31/2018
Status: Active Grant

First Claim

Patent Images

1. A computer-implemented method of predicting a trait of a target individual, comprising:

accessing a set of DNA features of the target individual;

accessing a set of non-DNA features of the target individual;

generating a feature vector that combines the set of DNA features and the set of non-DNA features, the feature vector including a set of numerical values, at least a first one of the numerical values representing one of the DNA features, at least a second one of the numerical values representing one of the non-DNA features; and

inputting the feature vector to a machine learning model to generate a prediction of whether the target individual has the trait.

View all claims

4 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

Disclosed are techniques for predicting a trait of an individual and identifying a set of enriched record collections of a genetic community. To predict a trait of an individual, DNA features and non-DNA features of the individual are accessed to generate a feature vector that is inputted into a machine learning model. The machine learning model generates a prediction of the trait. The prediction may be based on an inheritance prediction and/or a community prediction. To identify a set of enriched record collections, individuals belonging to a genetic community are identified and a set of candidate record collections are accessed. A community count and a background count is determined for each candidate record collection. The set of enriched record collections are identified based on a comparison of the community count and the background count. The genetic community may be annotated using the set of enriched record collections.

4 Citations

26 Claims

1. A computer-implemented method of predicting a trait of a target individual, comprising:
- accessing a set of DNA features of the target individual;
  
  accessing a set of non-DNA features of the target individual;
  
  generating a feature vector that combines the set of DNA features and the set of non-DNA features, the feature vector including a set of numerical values, at least a first one of the numerical values representing one of the DNA features, at least a second one of the numerical values representing one of the non-DNA features; and
  
  inputting the feature vector to a machine learning model to generate a prediction of whether the target individual has the trait.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
- - 2. The computer-implemented method of claim 1, wherein the machine learning model was trained by:
    - initializing weights of the machine learning model with an initial set of values;
      
      accessing DNA features of sample individuals, each of the sample individuals associated with a label indicating whether the sample individual is known to have the trait;
      
      accessing non-DNA features of the sample individuals;
      
      generating, for each sample individual, a sample feature vector that combines the DNA features and non-DNA features of the sample individual;
      
      inputting the sample feature vectors to the machine learning model to generate a prediction of whether each sample individual has the trait; and
      
      updating the weights of the machine learning model based on the predictions and the label associated with each of the sample individuals.
  - 3. The computer-implemented method of claim 2, further comprising:
    - identifying a subset of DNA features of the sample individual that are disproportionately associated with individuals in different genetic communities; and
      
      identifying a subset of non-DNA features based on the subset of DNA features.
  - 4. The computer-implemented method of claim 1, wherein the non-DNA features include one or more of the following:
    - historical records, medical records, survey responses, or digital photographs.
  - 5. The computer-implemented method of claim 1, wherein the set of non-DNA features of the target individual include non-DNA features of related individuals that are genetically related to the target individual.
  - 6. The computer-implemented method of claim 1, further comprising:
    - normalizing the set of DNA features and/or the set of non-DNA features of the target individual.
  - 7. The computer-implemented method of claim 6, wherein normalizing comprises bringing one of the numerical values of a non-DNA feature to a range that is in the same order of magnitude of a numerical value of one of the DNA features.
  - 8. The computer-implemented method of claim 1, wherein at least one of the DNA features in the set is a value of single nucleotide polymorphism (SNP) at a genetic locus of the target individual.
  - 9. The computer-implemented method of claim 1, wherein the individual is a member of a genetic community, and wherein the prediction of whether the target individual has the trait is an initial prediction, the method further comprising:
    - determining a community prediction based on a prevalence of the trait among the genetic community;
      
      updating the initial prediction of whether the target individual has the trait based on the community prediction.

10. A computer-implemented method of predicting a trait of a target individual, comprising:
- accessing a first dataset representing a family tree of the target individual, the family tree describing relationships among the target individual and related individuals who are related to the target individual;
  
  generating, from the first dataset, a second dataset representing a trait tree, the trait tree comprising nodes representing the target individual and one or more related individuals, each node being connected to at least another node in accordance with the relationships described in the family tree, the nodes comprising (i) a first node representing the target individual, the first node having an unknown value, and (ii) a second node having a known value representing the trait that is known for one of the related individuals represented by the second node;
  
  determining the unknown value of the first node, the unknown value determined at least based on an inheritance probability propagated from the known value of the second node along one or more branches of the trait tree; and
  
  generating a prediction of the trait of the target individual based on a determined value of the first node.
- View Dependent Claims (11, 12, 13, 14)
- - 11. The computer-implemented method of claim 10, the method further comprising:
    - identifying a third node in the trait tree representing another related individual, the third node having an unknown value; and
      
      determining the unknown value of the third node, the unknown value determined at least based on the inheritance probability propagated from the known value of the second node along one or more branches of the trait tree and the prediction of the trait of the target individual.
  - 12. The computer-implemented method of claim 10, wherein generating a prediction further comprises:
    - estimating a community trait probability based on a prevalence of the trait among individuals associated with a genetic community of the target individual; and
      
      updating the prediction of the trait based on the community trait probability.
  - 13. The computer-implemented method of claim 10, wherein the determination of the unknown value of the first node is based on a probabilistic model.
  - 14. The computer-implemented method of claim 10, wherein the determination of the unknown value of the first node is based on updating the unknown values of one or more nodes by propagating calculations of inheritance probabilities in backwards and forwards directions along at least one branch of the trait tree.

15. A computer-implemented method of predicting a set of enriched record collections for a genetic community among a plurality of genetic communities, comprising:
- identifying individuals belonging to the genetic community;
  
  accessing a set of candidate record collections;
  
  determining a community count of each candidate record collection in the set based on how often the candidate record collection is associated with one of the individuals;
  
  determining a background count of each candidate record collection in the set based on how often the candidate record collection is associated with any individual in the plurality of genetic communities; and
  
  identifying the set of enriched record collections of the genetic community based on a comparison of the community count and background count of each candidate record collection.
- View Dependent Claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26)
- - 16. The computer-implemented method of claim 15, further comprising:
    - providing an indication that the set of enriched record collections is related to the genetic community.
  - 17. The computer-implemented method of claim 15, wherein the genetic community includes individuals that are genetically matched.
  - 18. The computer-implemented method of claim 15, wherein the genetic community includes individuals that are linked by one or more family trees.
  - 19. The computer-implemented method of claim 15, wherein the genetic community includes individuals that are a part of a carrier population.
  - 20. The computer-implemented method of claim 15, wherein the genetic community includes individuals that are a part of an admixed population.
  - 21. The computer-implemented method of claim 15, further comprising:
    - identifying an enriched characteristic among the individuals belonging to the genetic community.
  - 22. The computer-implemented method of claim 21, wherein identifying the enriched characteristic comprises:
    - determining a community frequency for one or more characteristics, a respective community frequency based on how often a respective characteristic is associated with individuals belonging to the genetic community;
      
      determining a background frequency for each of the one or more characteristics, a respective background frequency based on how often the respective characteristic exhibits in individuals of any of the plurality of genetic communities;
      
      identifying one or more enriched characteristics among the individuals belonging to the genetic community based on a comparison of the community frequency and background frequency of each of the one or more characteristics.
  - 23. The computer-implemented method of claim 22, wherein at least one of the enriched characteristics includes a trait, a phenotype, a historical environment, a location, an occupation, family arrangement, or socioeconomic status.
  - 24. The computer-implemented method of claim 15, further comprising:
    - filtering the set of enriched record collections based on an attribute of the individuals belonging to the genetic community, wherein the attribute includes one or more of;
      
      a gender, an age, a historical era, or a pedigree.
  - 25. The computer-implemented method of claim 15, wherein the set of enriched record collections includes DNA data of the individuals belonging to the genetic community.
  - 26. The computer-implemented method of claim 15, further comprising:
    - generating content items for the genetic community based on the set of enriched record collections.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Ancestry.com DNA LLC (Ancestry.com Incorporated)
Original Assignee
Ancestry.com DNA LLC (Ancestry.com Incorporated)
Inventors
Girshick, Ahna R., Telis, Natalie, Granka, Julie M., Haug Baltzell, Asher Keith, Song, Shiya

Granted Patent

US 10,896,742 B2
Time in Patent Office

Days
Field of Search
US Class Current
CPC Class Codes

G16B 10/00   ICT specially adapted for e...

G16B 20/20   Allele or variant detection...

G16B 20/40   Population genetics; Linkag...

G16B 40/20   Supervised data analysis

G16B 40/30   Unsupervised data analysis

G16B 5/20   Probabilistic models

ESTIMATION OF PHENOTYPES USING DNA, PEDIGREE, AND HISTORICAL DATA

First Claim

4 Assignments

0 Petitions

Accused Products

Abstract

4 Citations

26 Claims

Specification

Use Cases

Quick Links

Others

ESTIMATION OF PHENOTYPES USING DNA, PEDIGREE, AND HISTORICAL DATA

First Claim

4 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

4 Citations

26 Claims

Specification

Subscription Required

Use Cases

Quick Links

Others