Systems and methods for genotyping with graph reference
Abstract
Genomic references are structured as a reference graph that represents diploid genotypes in organisms. A path through a series of connected nodes and edges represents a genetic sequence. Genetic variation within a diploid organism is represented by multiple paths through the reference graph. The graph may be transformed into a traversal graph in which a path represents a diploid genotype. Genetic analysis using the traversal graph allows an organism's diploid genotype to be elucidated, e.g., by mapping sequence reads to the reference graph and scoring paths in the traversal graph based on the mapping to determine the path through the traversal graph that best fits the sequence reads.
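As an illustration of the structure the abstract describes, the following minimal Python sketch builds a toy reference graph in which two alternate paths diverge from one conserved node and converge on another. It then lists the haplotype sequences and the unordered haplotype pairs that a diploid genotype could take. The class name `ReferenceGraph`, its methods, the node labels, and the sequences are all hypothetical. Pairing paths this way is a simplification for illustration, not the traversal-graph construction of the specification.

```python
from collections import defaultdict
from itertools import combinations_with_replacement


class ReferenceGraph:
    """Toy directed acyclic graph whose nodes carry sequence fragments (hypothetical)."""

    def __init__(self):
        self.seq = {}                   # node id -> sequence fragment
        self.succ = defaultdict(list)   # node id -> successor node ids

    def add_node(self, node_id, sequence):
        self.seq[node_id] = sequence

    def add_edge(self, src, dst):
        self.succ[src].append(dst)

    def paths(self, start, end):
        """Yield every path from start to end as a list of node ids."""
        stack = [[start]]
        while stack:
            path = stack.pop()
            if path[-1] == end:
                yield path
                continue
            for nxt in self.succ[path[-1]]:
                stack.append(path + [nxt])

    def spell(self, path):
        """Concatenate the sequence fragments along a path."""
        return "".join(self.seq[n] for n in path)


# Conserved flanks with one variable position: haplotype A carries "A", B carries "G".
g = ReferenceGraph()
g.add_node("left", "ACGT")
g.add_node("A", "A")
g.add_node("B", "G")
g.add_node("right", "TTCA")
for src, dst in [("left", "A"), ("left", "B"), ("A", "right"), ("B", "right")]:
    g.add_edge(src, dst)

haplotypes = sorted(g.spell(p) for p in g.paths("left", "right"))
# Unordered pairs of haplotypes stand in for the diploid genotypes AA, AB, and BB.
genotypes = list(combinations_with_replacement(haplotypes, 2))
print(haplotypes)
print(len(genotypes))
```

With one biallelic site the graph has two paths and therefore three candidate diploid genotypes, which matches the AA, AB, and BB set recited in the claims.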
20 Claims
1. A method of determining a genotype, the method comprising:
providing a reference graph in computer memory, the reference graph representing a plurality of genomic sequences and comprising nodes connected by edges into a plurality of paths, wherein variation across the plurality of genomic sequences is incorporated as divergent paths converging to nodes representing conserved sequence, such that each of the plurality of genomic sequences is represented as a path in the reference graph;
wherein the plurality of paths further comprises a first alternate path representing a first haplotype A at a position and a second alternate path representing a second haplotype B at the position, wherein each of the first and second alternate paths converge to a node representing a conserved sequence;
identifying one or more genotypes corresponding to either or both of the first haplotype A and the second haplotype B, the one or more genotypes comprising a first genotype AA, a second genotype AB, and a third genotype BB;
mapping a plurality of sequence reads from an organism to the reference graph, wherein the mapping comprises looking backwards to predecessor paths to identify an optimal location for a sequence read across multiple alternate paths;
assigning, based on properties of the mapped plurality of sequence reads to the reference graph, scores to each of the identified genotypes, wherein the assigning considers whether a sequence read mapping to the first haplotype A changes the probability of the genotype including the second haplotype B; and
identifying one of the first genotype AA, second genotype AB, or third genotype BB as having a highest score in said assigning step, thereby providing a genotype.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
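The scoring and selection steps of claim 1 can be illustrated with a small Python sketch. It assumes each mapped read has already been reduced to a call of "A" or "B" at the position, and it scores AA, AB, and BB with a simple per-read error model commonly used in genotype calling. The function names, the `error_rate` parameter, and the model itself are assumptions made for illustration; the claim does not specify this particular scoring function.

```python
import math


def score_genotypes(read_calls, error_rate=0.01):
    """Return a log-likelihood score for each of AA, AB, and BB.

    read_calls: list of "A" or "B", one entry per mapped read (assumed input).
    error_rate: assumed probability that a read supports the wrong haplotype.
    """
    # Probability that a single read supports haplotype A under each genotype.
    prob_a = {"AA": 1.0 - error_rate, "AB": 0.5, "BB": error_rate}
    scores = {}
    for genotype, p in prob_a.items():
        total = 0.0
        for call in read_calls:
            total += math.log(p if call == "A" else 1.0 - p)
        scores[genotype] = total
    return scores


def call_genotype(read_calls, error_rate=0.01):
    """Pick the genotype with the highest score."""
    scores = score_genotypes(read_calls, error_rate)
    return max(scores, key=scores.get), scores


# Six reads supporting A and five supporting B favor the heterozygous genotype.
best, scores = call_genotype(["A"] * 6 + ["B"] * 5)
print(best, scores)
```

In this model every read that maps to haplotype A lowers the score of BB relative to AA and AB, which is one simple way a read supporting one haplotype can change the probability of a genotype that includes the other.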
15. A system for determining a genotype, the system comprising a processor coupled to a tangible memory subsystem having stored therein:
a reference graph representing a plurality of genomic sequences and comprising nodes connected by edges into a plurality of paths, wherein variation across the plurality of genomic sequences is incorporated as divergent paths converging to nodes representing conserved sequence, such that each of the plurality of genomic sequences is represented as a path in the reference graph;
wherein the plurality of paths further comprises a first alternate path representing a first haplotype A at a position and a second alternate path representing a second haplotype B at the position, wherein each of the first and second alternate paths converge to a node representing a conserved sequence; and
instructions that when executed by the processor cause the system to:
identify one or more genotypes corresponding to either or both of the first haplotype A and the second haplotype B, the one or more genotypes comprising a first genotype AA, a second genotype AB, and a third genotype BB;
map a plurality of sequence reads from an organism to the reference graph, wherein the mapping comprises looking backwards to predecessor paths to identify an optimal location for a sequence read across multiple alternate paths;
assign, based on properties of the mapped plurality of sequence reads to the reference graph, scores to each of the identified genotypes, wherein the assigning considers whether a sequence read mapping to the first haplotype A changes the probability of the genotype including the second haplotype B; and
identify one of the first genotype AA, second genotype AB, or third genotype BB as having a highest score in said assigning step, thereby providing a genotype.
Dependent claims: 16, 17, 18, 19, 20
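The mapping step recited in claims 1 and 15 looks backwards to predecessor paths. One common way to realize that idea is a dynamic-programming alignment over a directed acyclic graph, in which the first base of each node consults the final alignment row of every predecessor node. The Python sketch below follows that reading. The function `align_to_graph`, its scoring parameters, and the toy graph are hypothetical, and the sketch is not presented as the alignment procedure of the specification.

```python
def align_to_graph(read, nodes, preds, match=1, mismatch=-1, gap=-2):
    """Align a whole read to the best-scoring location in a small DAG.

    nodes: dict of node id -> non-empty sequence, inserted in topological order.
    preds: dict of node id -> list of predecessor node ids.
    Returns (best score, tuple of node ids the best alignment passes through).
    """
    m = len(read)
    last_row = {}                       # node id -> DP row at the node's final base
    best = (float("-inf"), ())
    for node_id, seq in nodes.items():
        # Look backwards: collect the final rows of all predecessor nodes.
        prev_rows = [last_row[p] for p in preds.get(node_id, [])]
        if not prev_rows:
            # Source node: only read bases inserted before the graph precede it.
            prev_rows = [[(j * gap, ()) for j in range(m + 1)]]
        for base in seq:
            row = [(0, ())]             # the alignment may start anywhere in the graph
            for j in range(1, m + 1):
                sub = match if read[j - 1] == base else mismatch
                cands = []
                for prev in prev_rows:
                    s, path = prev[j - 1]
                    cands.append((s + sub, path))   # match or mismatch
                    s, path = prev[j]
                    cands.append((s + gap, path))   # graph base skipped
                s, path = row[j - 1]
                cands.append((s + gap, path))       # extra base in the read
                score, path = max(cands, key=lambda c: c[0])
                if not path or path[-1] != node_id:
                    path = path + (node_id,)
                row.append((score, path))
            prev_rows = [row]           # inside a node, the predecessor is the previous base
            if row[m][0] > best[0]:
                best = row[m]
        last_row[node_id] = row
    return best


# Two alternate single-base paths (A and B) between conserved flanks.
nodes = {"left": "ACGT", "A": "A", "B": "G", "right": "TTCA"}
preds = {"A": ["left"], "B": ["left"], "right": ["A", "B"]}
print(align_to_graph("CGTGTTC", nodes, preds))
print(align_to_graph("GTATT", nodes, preds))
```

Because the first base of `right` compares candidates from both the `A` and `B` rows, a read carrying the G allele is placed on the path through `B` and a read carrying the A allele on the path through `A`, without aligning the read to each path separately.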
Specification