Methods and systems for genotyping genetic samples
First Claim
1. A system for genotyping a genetic sample, the system comprising:
a processor; and
a tangible, non-transitory memory storing a plurality of sequence reads corresponding to the genetic sample, and a reference directed acyclic graph (DAG) representing a reference sequence and genetic variation of the reference sequence, wherein the reference DAG comprises a first path corresponding to a first allele and a second path corresponding to a second allele at a first position, wherein the first allele comprises a genetic structural variation;
wherein the memory further comprises instructions that, when executed, cause the processor to:
align the plurality of sequence reads to the reference DAG, wherein the aligning comprises:
comparing a string of symbols corresponding to a sequence read to the first path and the second path;
scoring overlaps between the string of symbols and each of the first path and the second path, wherein a higher score corresponds to a greater amount of overlap; and
identifying an overlap corresponding to the highest score for the sequence read, thereby aligning the sequence read to the reference DAG; and
determine a genotype for the genetic sample based upon the number of sequence reads aligned to the first path and the second path, wherein the determined genotype comprises the genetic structural variation.
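The align-and-count steps recited in claim 1 can be illustrated with a minimal sketch. It rests on several assumptions that are not in the claim: the two paths are flattened into plain strings rather than traversed as a graph, the overlap score is a Smith-Waterman style local-alignment score (the claim does not name a scoring method), the example sequences are invented, and the `min_fraction` cutoff and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical example paths: the first allele carries a deletion
# (a structural variation) relative to the second, reference-like allele.
REFERENCE_PATH = "ACGTTTGACCGGTAAGCTTACG"
DELETION_PATH = "ACGTTTGACTTACG"  # "CCGGTAAG" removed
PATHS = {"deletion": DELETION_PATH, "reference": REFERENCE_PATH}


def overlap_score(read, path, match=1, mismatch=-1, gap=-1):
    """Best local-alignment score of read against path (higher = more overlap)."""
    prev = [0] * (len(path) + 1)
    best = 0
    for r in read:
        curr = [0]
        for j, p in enumerate(path, start=1):
            diag = prev[j - 1] + (match if r == p else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def align_read(read, paths):
    """Return the name of the highest-scoring path, or None if the top scores tie."""
    scores = {name: overlap_score(read, seq) for name, seq in paths.items()}
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) > 1 and ranked[0] == ranked[1]:
        return None  # read does not distinguish the alleles
    return max(scores, key=scores.get)


def call_genotype(reads, paths, min_fraction=0.2):
    """Call a genotype from the number of reads aligned to each path.

    Assumes two candidate paths and a diploid sample; min_fraction is an
    arbitrary illustrative cutoff, not a value taken from the patent.
    """
    assigned = (align_read(read, paths) for read in reads)
    counts = Counter(name for name in assigned if name is not None)
    total = sum(counts.values())
    if total == 0:
        return ()
    alleles = sorted(n for n, c in counts.items() if c / total >= min_fraction)
    if len(alleles) == 1:
        return (alleles[0], alleles[0])  # homozygous call
    return tuple(alleles)  # heterozygous call


reads = ["TTGACTTA"] * 3 + ["ACCGGTAA"] * 3 + ["ACGTTTGA"]
genotype = call_genotype(reads, PATHS)
```

Reads that span the deletion junction score highest on the deletion path, reads from inside the deleted segment score highest on the reference path, and reads from the shared flank tie and are left out of the counts. A production aligner would score reads against the graph itself rather than against enumerated strings.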
Abstract
The invention provides methods and systems for making specific base calls at specific loci using a reference sequence construct, e.g., a directed acyclic graph (DAG) that represents known variants at each locus of the genome. Because the sequence reads are aligned directly to the DAG, the subsequent step of comparing a mutation, vis-à-vis the reference genome, to a table of known mutations can be eliminated. The disclosed methods and systems are notably efficient in dealing with structural variations within a genome or mutations that are within a structural variation.
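As a rough illustration of a reference construct that represents known variants at a locus, the sketch below stores a tiny DAG as node sequences plus outgoing edges and lists the sequence spelled by each path. The node names, sequences, and the `enumerate_paths` helper are all hypothetical and chosen only for illustration; the abstract does not prescribe a data layout.

```python
# Hypothetical toy graph: a shared prefix, two alternative alleles at one
# locus (a single-base variant), and a shared suffix.
NODES = {"n0": "ACG", "n1": "T", "n2": "C", "n3": "GGA"}
EDGES = {"n0": ["n1", "n2"], "n1": ["n3"], "n2": ["n3"], "n3": []}


def enumerate_paths(node, nodes, edges, prefix=""):
    """Yield the sequence spelled by every path from node to a sink node."""
    seq = prefix + nodes[node]
    if not edges[node]:
        yield seq
        return
    for nxt in edges[node]:
        yield from enumerate_paths(nxt, nodes, edges, seq)


allele_sequences = list(enumerate_paths("n0", NODES, EDGES))
```

Each path through the graph corresponds to one allele, so a read that aligns best to a given path is already associated with a known variant. Listing every path is only workable for a toy graph like this one, since the number of paths grows quickly with the number of variant sites.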
19 Claims
1. A system for genotyping a genetic sample, the system comprising:
a processor; and
a tangible, non-transitory memory storing a plurality of sequence reads corresponding to the genetic sample, and a reference directed acyclic graph (DAG) representing a reference sequence and genetic variation of the reference sequence, wherein the reference DAG comprises a first path corresponding to a first allele and a second path corresponding to a second allele at a first position, wherein the first allele comprises a genetic structural variation;
wherein the memory further comprises instructions that, when executed, cause the processor to:
align the plurality of sequence reads to the reference DAG, wherein the aligning comprises:
comparing a string of symbols corresponding to a sequence read to the first path and the second path;
scoring overlaps between the string of symbols and each of the first path and the second path, wherein a higher score corresponds to a greater amount of overlap; and
identifying an overlap corresponding to the highest score for the sequence read, thereby aligning the sequence read to the reference DAG; and
determine a genotype for the genetic sample based upon the number of sequence reads aligned to the first path and the second path, wherein the determined genotype comprises the genetic structural variation.
Dependent claims: 2, 3.
4. A method of genotyping a genetic sample, the method comprising:
using at least one computer hardware processor to perform:
obtaining a plurality of sequence reads corresponding to a genetic sample;
aligning the plurality of sequence reads to a reference directed acyclic graph (DAG) stored in a tangible, non-transitory memory connected to the at least one computer hardware processor, wherein the reference DAG comprises a first path corresponding to a first allele and a second path corresponding to a second allele at a first position, wherein the first allele comprises a genetic structural variation, wherein the aligning comprises:
comparing a string of symbols corresponding to a sequence read to the first path and the second path;
scoring overlaps between the string of symbols and each of the first path and the second path, wherein a higher score corresponds to a greater amount of overlap; and
identifying an overlap corresponding to the highest score for the sequence read, thereby aligning the sequence read to the reference DAG; and
determining a genotype for the genetic sample based upon the number of sequence reads aligned to the first path and the second path, wherein the determined genotype comprises the genetic structural variation.
Dependent claims: 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19.
Specification