Systems and methods for mitochondrial analysis
Abstract
The invention provides methods of analyzing an individual's mtDNA by transforming available reference sequences into a directed graph that compactly represents all the information without duplication and comparing sequence reads from the mtDNA to the graph to identify the individual or describe their mtDNA. A directed graph can represent all of the genetic variation found among the mitochondrial genomes of a number of reference organisms while providing a single article to which sequence reads can be aligned or compared. Thus any sequence read or other sequence fragment can be compared, in a single operation, to the article that represents all of the reference mitochondrial sequences.
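As a rough illustration of how reference sequences might be collapsed into such a graph, the following Python sketch merges columns on which all references agree into one common node and creates alternate nodes where they differ. It assumes the references are already aligned to the same length with no gaps, and the names `Node`, `build_graph` and `paths` are hypothetical, not taken from the source. Python object references stand in for the memory pointers described in the claims. With several variable sites, a graph built this way also admits recombinant paths that are not among the inputs.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: nodes are compared by identity, not by content
class Node:
    seq: str                                         # sequence string held by this object
    successors: list = field(default_factory=list)   # references to adjacent objects


def build_graph(aligned):
    """Build a DAG from equal-length, gap-free aligned strings; return the start nodes."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be pre-aligned to the same length")

    def conserved(i):
        return len({s[i] for s in aligned}) == 1

    # Split the columns into maximal runs that are all conserved or all variable.
    runs, start = [], 0
    for i in range(1, length + 1):
        if i == length or conserved(i) != conserved(start):
            runs.append((start, i))
            start = i

    # One node per distinct substring in each run: a single common node where
    # the references agree, alternate nodes where they differ.
    layers = [
        {text: Node(text) for text in dict.fromkeys(s[a:b] for s in aligned)}
        for a, b in runs
    ]

    # Link consecutive runs so that every reference is a path through the graph.
    for s in aligned:
        for k in range(len(runs) - 1):
            (a, b), (c, d) = runs[k], runs[k + 1]
            src, dst = layers[k][s[a:b]], layers[k + 1][s[c:d]]
            if dst not in src.successors:
                src.successors.append(dst)
    return list(layers[0].values())


def paths(node):
    """Yield every sequence spelled by a path that starts at `node`."""
    if not node.successors:
        yield node.seq
    for nxt in node.successors:
        for tail in paths(nxt):
            yield node.seq + tail


refs = ["ACGTTA", "ACGATA", "ACGTTA"]   # toy strings, not real mtDNA
roots = build_graph(refs)
print(sorted(p for r in roots for p in paths(r)))   # ['ACGATA', 'ACGTTA']
```

The three toy references collapse into four nodes (`ACG`, `T`, `A`, `TA`), so the shared flanking sequence is stored only once.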
20 Claims
1. A method for analyzing a mitochondrial genome from an organism, the method comprising using at least one hardware processor connected to a tangible memory subsystem to perform:
creating, in the tangible memory subsystem, a mitochondrial DNA (mtDNA) reference graph representing a plurality of mitochondrial sequences, the mtDNA reference graph comprising a directed acyclic graph (DAG) comprising a plurality of vertices stored as objects in the tangible memory subsystem, wherein sequence strings of the plurality of mitochondrial sequences that match each other when aligned are each represented by a single common object and sequence strings that vary are represented as alternate objects, wherein at least one sequence string comprises a plurality of symbols, and wherein each object is stored in the tangible memory subsystem as a sequence string and a list of one or more pointers to adjacent objects, wherein each pointer identifies a physical location in the tangible memory subsystem at which an adjacent object is stored, such that the objects are linked to represent each of the mitochondrial sequences as a path through the mtDNA reference graph;
obtaining a plurality of sequence reads from a biological sample previously obtained from a subject;
aligning the plurality of sequence reads to paths through the mtDNA reference graph, wherein the aligning comprises calculating match scores between sequence reads in the plurality of sequence reads and sequence strings associated with vertices in the plurality of vertices, and looking backwards at each vertex, having one or more predecessor vertices, to the predecessor vertices if and only if a symbol comprises the first symbol of the sequence string associated with its vertex to select a path based on its score; and
providing a report that identifies one or more of the mitochondrial sequences that aligned to the plurality of sequence reads.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
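The aligning step of claim 1 can be pictured with a small dynamic-programming sketch in Python. It is one possible reading of the claim language, not the patented implementation: a Smith-Waterman style local alignment in which a symbol inside a vertex's string looks back only at the preceding symbol of the same string, while the first symbol of a string looks back at the final column of every predecessor vertex. The scoring constants, the `Vertex` class and the `align` function are assumptions made for illustration, and the caller is assumed to pass the vertices in topological order.

```python
from dataclasses import dataclass, field

MATCH, MISMATCH, GAP = 2, -1, -2   # illustrative scores, not taken from the source


@dataclass(eq=False)
class Vertex:
    name: str
    seq: str                                    # non-empty sequence string
    preds: list = field(default_factory=list)   # references to predecessor vertices


def align(read, topo_order):
    """Return (best local score, names of the vertices on the best-scoring path)."""
    m = len(read)
    empty = [(0, ())] * (m + 1)
    last_col = {}            # vertex -> DP column after its final symbol
    best = (0, ())
    for v in topo_order:
        col = None
        for j, sym in enumerate(v.seq):
            if j > 0:
                prev_cols = [col]                            # inside the string
            elif v.preds:
                prev_cols = [last_col[p] for p in v.preds]   # first symbol: look back
            else:
                prev_cols = [empty]                          # source vertex
            new = [(0, ())]
            for i in range(1, m + 1):
                sub = MATCH if read[i - 1] == sym else MISMATCH
                cands = [(0, ())]                            # local alignment restart
                for p in prev_cols:
                    cands.append((p[i - 1][0] + sub, p[i - 1][1]))   # match or mismatch
                    cands.append((p[i][0] + GAP, p[i][1]))           # skip a graph symbol
                cands.append((new[i - 1][0] + GAP, new[i - 1][1]))   # skip a read symbol
                score, path = max(cands, key=lambda c: c[0])
                if score > 0 and path[-1:] != (v.name,):
                    path += (v.name,)
                new.append((score, path))
                if score > best[0]:
                    best = (score, path)
            col = new
        last_col[v] = col
    return best


# Toy graph: ACG -> (T | A) -> TA
v1, v2, v3, v4 = Vertex("v1", "ACG"), Vertex("v2", "T"), Vertex("v3", "A"), Vertex("v4", "TA")
v2.preds.append(v1)
v3.preds.append(v1)
v4.preds.extend([v2, v3])
print(align("CGATA", [v1, v2, v3, v4]))   # (10, ('v1', 'v3', 'v4'))
```

Storing the vertex path alongside each score keeps the sketch short; a practical aligner would more likely store back-pointers and recover the path with a traceback.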
13. A method of detecting mitochondrial heteroplasmy in a subject, the method comprising using at least one hardware processor connected to a tangible memory subsystem to perform:
creating, in the tangible memory subsystem, a mitochondrial DNA (mtDNA) reference graph representing a plurality of known variations in the mitochondrial genome, the mtDNA reference graph comprising a directed acyclic graph (DAG), in which each of the known variations is associated with a path through the DAG, the DAG comprising a plurality of objects, wherein each object is stored in the tangible memory subsystem as a sequence string and a list of one or more pointers to adjacent objects, wherein each sequence string represents mtDNA nucleotide sequence information and at least one sequence string comprises a plurality of symbols, and wherein each pointer identifies a physical location in the tangible memory subsystem at which an adjacent object is stored, such that the objects are linked to represent a plurality of mitochondrial genomes as a path through the mtDNA reference graph;
obtaining a plurality of sequence reads from a biological sample previously obtained from a subject;
aligning the plurality of sequence reads to the mtDNA reference graph, the aligning comprising finding a position on the DAG for a sequence read based on the sequence read and sequence strings associated with objects in the plurality of objects by calculating match scores between the sequence read and the sequence strings, and looking backwards at each object, having one or more predecessor objects, to the predecessor objects if and only if a symbol comprises the first symbol of the sequence string associated with its object to select a path based on its score; and
identifying, based on the aligned sequence reads, at least one position in the mtDNA reference graph in which the aligned plurality of sequence reads align to alternate paths.
Dependent claims: 14, 15, 16
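The final step of claim 13, finding positions where reads from one sample align to alternate paths, might be sketched in Python as follows. The sketch assumes each aligned read has already been reduced to the tuple of vertex names it traverses, and that the alternate vertices at each variable site are known. The function name `heteroplasmic_sites` and the `min_fraction` threshold are hypothetical, since the claim does not state how much read support an alternate path needs.

```python
from collections import Counter


def heteroplasmic_sites(read_paths, variable_sites, min_fraction=0.1):
    """Return {site: {vertex: read fraction}} for sites with two or more supported alternates.

    read_paths     -- iterable of tuples of vertex names, one tuple per aligned read
    variable_sites -- {site label: set of alternate vertex names at that site}
    min_fraction   -- assumed cutoff for counting an alternate as supported
    """
    calls = {}
    for site, alternates in variable_sites.items():
        counts = Counter(v for path in read_paths for v in path if v in alternates)
        total = sum(counts.values())
        if total == 0:
            continue   # no read covers this site
        supported = {v: n / total for v, n in counts.items() if n / total >= min_fraction}
        if len(supported) >= 2:
            calls[site] = supported
    return calls


# Toy data: 7 reads take vertex v2, 3 take v3, and 1 stops before the branch.
reads = [("v1", "v2", "v4")] * 7 + [("v1", "v3", "v4")] * 3 + [("v1",)]
print(heteroplasmic_sites(reads, {"site_1": {"v2", "v3"}}))
```

A cutoff of this kind is one simple way to keep sequencing errors from being reported as heteroplasmy; the right value depends on read depth and error rate.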
17. A method of identifying an unknown individual, comprising using at least one hardware processor connected to a tangible memory subsystem to perform:
creating, in the tangible memory subsystem, a mitochondrial DNA (mtDNA) reference graph representing a plurality of known variations in the mitochondrial genome, the mtDNA reference graph comprising a directed acyclic graph (DAG), in which each of the known variations is associated with a path through the DAG, the DAG comprising a plurality of objects, wherein each object is stored in the tangible memory subsystem, as a sequence string and a list of one or more pointers to adjacent objects, wherein each sequence string represents mtDNA nucleotide sequence information and at least one sequence string comprises a plurality of symbols, and wherein each pointer identifies a physical location in the tangible memory subsystem at which an adjacent object is stored, such that the objects are linked to represent a plurality of mitochondrial genomes as a plurality of paths through the mtDNA reference graph;
obtaining a plurality of sequence reads from a biological sample previously obtained from a subject;
aligning the plurality of sequence reads to the mtDNA reference graph, the aligning comprising finding a position on the DAG for a sequence read based on the sequence read and sequence strings associated with objects in the plurality of objects by calculating match scores between the sequence read and the sequence strings, and looking backwards at each object, having one or more predecessor objects, to the predecessor objects if and only if a symbol comprises the first symbol of the sequence string associated with its object to select a path based on its score; and
determining, based on the aligned sequence reads, the identity of the unknown subject.
Dependent claims: 18, 19, 20
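Claim 17 does not say how identity is determined from the aligned reads. One simple possibility, sketched below in Python purely as an assumption, is to score each known individual by the fraction of reads whose vertex path is a contiguous piece of that individual's path through the graph, and to report the top scorer only when it is unambiguous. The names `is_subpath`, `identify`, `individual_A` and `individual_B` are hypothetical.

```python
def is_subpath(read_path, ref_path):
    """True if read_path occurs as a contiguous run of vertices inside ref_path."""
    n = len(read_path)
    return any(
        tuple(ref_path[i:i + n]) == tuple(read_path)
        for i in range(len(ref_path) - n + 1)
    )


def identify(read_paths, known_paths):
    """Return (name, fraction of consistent reads) for the best match, or None.

    read_paths  -- list of tuples of vertex names, one tuple per aligned read
    known_paths -- {individual name: tuple of vertex names along their genome path}
    """
    if not read_paths:
        return None
    scores = {
        name: sum(is_subpath(p, ref) for p in read_paths) / len(read_paths)
        for name, ref in known_paths.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return None   # nothing matches
    if len(ranked) > 1 and ranked[1][1] == ranked[0][1]:
        return None   # tie: the reads do not discriminate between candidates
    return ranked[0]


known = {
    "individual_A": ("v1", "v2", "v4"),
    "individual_B": ("v1", "v3", "v4"),
}
reads = [("v1", "v3"), ("v3", "v4"), ("v1",), ("v4",)]
print(identify(reads, known))   # ('individual_B', 1.0)
```

Reads that cover only shared vertices, such as `("v1",)`, are consistent with every candidate, so only reads crossing a variable site separate one individual from another.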
Specification