Systems and methods for analyzing sequence data
Abstract
The invention provides methods for comparing one set of genetic sequences to another without discarding any information within either set. A set of genetic sequences is represented as a directed acyclic graph (DAG), avoiding any unwarranted reduction to a linear data structure. The invention provides a way to align one sequence DAG to another to produce an alignment that can itself be stored as a DAG. DAG-to-DAG alignment is a natural choice wherever a set of genomic information consisting of more than one string needs to be compared to a non-linear reference. For example, a subpopulation DAG could be compared to a population DAG in order to compare the genetic features of that subpopulation with those of the population.
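As a minimal sketch of the representation described above (the class and method names are hypothetical, not taken from the specification), a set of sequences that differ at one position can share their common segments, with each source-to-sink path spelling one member of the set:

```python
from collections import defaultdict


class SequenceDAG:
    """Hypothetical minimal sequence DAG (not the specification's implementation):
    each node carries a nucleotide string, and each source-to-sink path spells
    one sequence of the represented set."""

    def __init__(self):
        self.labels = {}               # node id -> nucleotide string
        self.succ = defaultdict(list)  # node id -> successor node ids
        self.has_pred = set()          # nodes with at least one incoming edge

    def add_node(self, node_id, seq):
        self.labels[node_id] = seq

    def add_edge(self, a, b):
        self.succ[a].append(b)
        self.has_pred.add(b)

    def sequences(self):
        """Spell out the sequence of every source-to-sink path (depth-first)."""
        out = []

        def walk(node, prefix):
            prefix += self.labels[node]
            if not self.succ[node]:
                out.append(prefix)
            for nxt in self.succ[node]:
                walk(nxt, prefix)

        for node in self.labels:
            if node not in self.has_pred:
                walk(node, "")
        return out


# Two haplotypes differing by a single T/C substitution share "ACG" and "GA".
dag = SequenceDAG()
for node_id, seq in [("n1", "ACG"), ("n2", "T"), ("n3", "C"), ("n4", "GA")]:
    dag.add_node(node_id, seq)
for a, b in [("n1", "n2"), ("n1", "n3"), ("n2", "n4"), ("n3", "n4")]:
    dag.add_edge(a, b)
print(dag.sequences())  # ['ACGTGA', 'ACGCGA']
```

Enumerating paths recovers every member sequence, but the number of paths can grow exponentially with the number of variant sites, which is one reason to compare such sets graph-to-graph rather than by listing each sequence.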
19 Claims
1. A method for genomic analysis, the method comprising:
representing a plurality of nucleic acids from a population of individuals as a reference directed acyclic graph (DAG) stored in a non-transitory memory, wherein the reference DAG includes nodes connected by edges in which at least one node includes a string of a plurality of nucleotide characters corresponding to a nucleotide sequence found within the plurality of nucleic acids;
obtaining a second DAG representing a second plurality of nucleic acids, the second plurality of nucleic acids comprising nucleic acids from one or more individuals;
determining, using a processor coupled to the non-transitory memory, an alignment between the second DAG and the reference DAG; and
creating, from the alignment, an aligned DAG comprising an aligned combination of the reference DAG and the second DAG.
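The steps of claim 1 can be illustrated with a small sketch. The claim does not fix a scoring scheme or alignment algorithm, so the code below assumes a simple global alignment with linear gap penalties (the match, mismatch and gap values are arbitrary illustrative choices). It expands each string-labelled node into single-character vertices, runs a dynamic program over pairs of vertices in topological order to find the best-scoring pair of source-to-sink paths, and then builds an aligned DAG by collapsing aligned identical characters into shared vertices. The function names `expand`, `align_dags` and `merge`, and the `(labels, edges, order)` graph encoding, are hypothetical; this is a sketch under those assumptions, not the patented implementation.

```python
from itertools import product

NEG = float("-inf")


def expand(labels, edges, order):
    """Hypothetical helper: split each string-labelled node into a chain of
    one-character vertices. Vertex 0 is a virtual start with no character.
    Nodes are assumed to be listed in topological order, so every predecessor
    index is smaller than the index of the vertex it precedes."""
    chars, preds, first, last = [""], [[]], {}, {}
    for node in order:
        for i, c in enumerate(labels[node]):
            chars.append(c)
            preds.append([len(chars) - 2] if i else [])  # chain within the node
            if i == 0:
                first[node] = len(chars) - 1
        last[node] = len(chars) - 1
    targets = {b for _, b in edges}
    for a, b in edges:
        preds[first[b]].append(last[a])  # last char of a -> first char of b
    for node in order:
        if node not in targets:
            preds[first[node]].append(0)  # source nodes hang off the virtual start
    has_succ = {p for ps in preds for p in ps}
    sinks = [v for v in range(1, len(chars)) if v not in has_succ]
    return chars, preds, sinks


def align_dags(g1, g2, match=1, mismatch=-1, gap=-2):
    """Globally align the best-scoring source-to-sink path pair of two DAGs.
    Returns (score, pairs, (chars1, preds1), (chars2, preds2)); each pair is a
    (u, v) vertex pair, with None standing for a gap."""
    c1, p1, s1 = expand(*g1)
    c2, p2, s2 = expand(*g2)
    score = [[NEG] * len(c2) for _ in c1]
    back = [[None] * len(c2) for _ in c1]
    score[0][0] = 0
    # Lexicographic (u, v) order is a topological order of the product graph.
    for u, v in product(range(len(c1)), range(len(c2))):
        cands = []
        if u and v:  # align c1[u] with c2[v]
            s = match if c1[u] == c2[v] else mismatch
            cands += [(score[a][b] + s, (a, b)) for a, b in product(p1[u], p2[v])]
        if u:  # c1[u] against a gap
            cands += [(score[a][v] + gap, (a, v)) for a in p1[u]]
        if v:  # c2[v] against a gap
            cands += [(score[u][b] + gap, (u, b)) for b in p2[v]]
        if cands:
            score[u][v], back[u][v] = max(cands)
    best, (u, v) = max((score[x][y], (x, y)) for x in s1 for y in s2)
    pairs = []
    while (u, v) != (0, 0):  # trace back to the shared virtual start
        a, b = back[u][v]
        pairs.append((u if a != u else None, v if b != v else None))
        u, v = a, b
    return best, pairs[::-1], (c1, p1), (c2, p2)


def merge(pairs, g1, g2):
    """Build an aligned DAG: the union of both character graphs in which every
    aligned pair of identical characters is collapsed into one shared vertex."""
    (c1, p1), (c2, p2) = g1, g2
    ids = {("A", u): ("A", u) for u in range(len(c1))}
    ids.update({("B", v): ("B", v) for v in range(len(c2))})
    ids[("B", 0)] = ("A", 0)  # share the virtual start
    for u, v in pairs:
        if u is not None and v is not None and c1[u] == c2[v]:
            ids[("B", v)] = ("A", u)
    label = {ids[("A", u)]: c for u, c in enumerate(c1)}
    label.update({ids[("B", v)]: c for v, c in enumerate(c2)})
    edges = {(ids[("A", a)], ids[("A", u)]) for u in range(len(c1)) for a in p1[u]}
    edges |= {(ids[("B", b)], ids[("B", v)]) for v in range(len(c2)) for b in p2[v]}
    return label, edges
```

For example, aligning a single-path sample DAG spelling `ACGCGA` against a reference DAG with a T/C branch (`ACG`, then `T` or `C`, then `GA`) follows the `C` branch, and the merged graph adds no new vertices or edges because every sample character collapses onto a reference vertex. Because matched pairs increase in topological index in both graphs, collapsing them cannot create a cycle. The dynamic program visits every pair of character vertices, so its cost grows with the product of the two graphs' sizes; a practical system would likely need banding or seeding, which this sketch omits.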
Specification