Methods and systems for detecting sequence variants
First Claim
1. A system for identifying a mutation in proximity to a structural variation in a sequence, the system comprising:
- a processor coupled to a memory; and
- a reference directed acyclic graph (DAG) stored in the memory, wherein the reference DAG represents at least two known reference sequences and includes alternative sequences at a first position in the known reference sequences, wherein at least one of the two alternative sequences is a structural variation, wherein the system is operable to:
  - obtain a plurality of sequence reads,
  - align each sequence read to the alternative sequences in the reference DAG and determine a location on the reference DAG where an alignment score for that sequence read is optimized, and
  - identify a mutation with at least one of the sequence reads, wherein the mutation is aligned to the reference DAG within 100 bp of the structural variation.
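The three operations recited in the claim (obtain reads, align each read to the reference DAG at its best-scoring location, report mutations within 100 bp of the structural variation) can be illustrated with a small sketch. This is not the patented alignment method: it substitutes brute-force path enumeration and ungapped match/mismatch scoring for graph-aware dynamic programming. The names `RefDAG`, `align_read`, `mutations_near_sv`, the scoring values, and the toy sequences are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RefDAG:
    seqs: dict      # node id -> sequence carried by that node
    edges: dict     # node id -> list of successor node ids
    sv_nodes: set   # node ids that represent a structural variation

    def paths(self, node):
        """Yield every path (list of node ids) from `node` to a sink."""
        successors = self.edges.get(node, [])
        if not successors:
            yield [node]
            return
        for nxt in successors:
            for rest in self.paths(nxt):
                yield [node] + rest


def spell(dag, path):
    """Concatenate the node sequences along a path."""
    return "".join(dag.seqs[n] for n in path)


def align_read(dag, start, read, match=1, mismatch=-1):
    """Return (score, path, offset) with the highest ungapped score.

    Assumption: brute force over all paths and offsets, usable only
    on toy graphs; indels inside the read are not modelled.
    """
    best = None
    for path in dag.paths(start):
        seq = spell(dag, path)
        for off in range(len(seq) - len(read) + 1):
            window = seq[off:off + len(read)]
            score = sum(match if a == b else mismatch
                        for a, b in zip(read, window))
            if best is None or score > best[0]:
                best = (score, path, off)
    return best


def mutations_near_sv(dag, path, off, read, window=100):
    """List mismatches lying within `window` bp of an SV node.

    Each hit is (path position, reference base, read base, distance).
    Assumption: SV nodes on the path carry a non-empty sequence.
    """
    spans, pos = [], 0
    for n in path:
        end = pos + len(dag.seqs[n])
        if n in dag.sv_nodes:
            spans.append((pos, end))  # half-open span in path coordinates
        pos = end
    seq = spell(dag, path)
    hits = []
    for i, base in enumerate(read):
        p = off + i
        if base == seq[p]:
            continue
        for s, e in spans:
            dist = 0 if s <= p < e else min(abs(p - s), abs(p - (e - 1)))
            if dist <= window:
                hits.append((p, seq[p], base, dist))
                break
    return hits


# Toy graph: an 8 bp insertion ("ins") is the structural variation.
dag = RefDAG(
    seqs={"left": "ACGTTGCA", "ins": "GGGGCCCC", "right": "TTAGGCAT"},
    edges={"left": ["ins", "right"], "ins": ["right"]},
    sv_nodes={"ins"},
)
read = "CAGGGGCCCCTTCGGC"  # spans the insertion, one substitution after it
score, path, off = align_read(dag, "left", read)
hits = mutations_near_sv(dag, path, off, read)
```

In this toy case the read aligns best to the path that passes through the insertion, and the single substitution it carries is reported because it lies 3 bp from the insertion. A practical implementation would score gaps and avoid enumerating paths explicitly.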
Abstract
The invention provides methods for identifying rare variants near a structural variation in a genetic sequence, for example, in a nucleic acid sample taken from a subject. The invention additionally includes methods for aligning reads (e.g., nucleic acid reads) to a reference sequence construct accounting for the structural variation, methods for building a reference sequence construct accounting for the structural variation or the structural variation and the rare variant, and systems that use the alignment methods to identify rare variants. The method is scalable, and can be used to align millions of reads to a construct thousands of bases long, or longer.
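As an illustration of building a reference sequence construct that accounts for a structural variation, the sketch below turns a linear reference plus a list of known variants into a small DAG with one reference branch and one alternative branch per variant. The function names, the `(start, end, alt)` variant tuple, and the node naming scheme are assumptions made for this example and are not taken from the specification.

```python
def build_reference_dag(reference, variants):
    """Build a DAG from a linear reference and known variants.

    variants: list of (start, end, alt) tuples, meaning that
    reference[start:end] may be replaced by alt. They are assumed
    to be non-overlapping. Returns (seqs, edges, alt_nodes, start_node).
    """
    seqs, edges, alt_nodes = {}, {}, set()

    def add(name, seq):
        seqs[name] = seq
        edges[name] = []
        return name

    tails = []        # nodes still waiting for a successor
    start_node = None
    cursor = 0
    for i, (start, end, alt) in enumerate(sorted(variants)):
        # Shared sequence between the previous variant and this one.
        flank = add(f"ref{i}", reference[cursor:start])
        if start_node is None:
            start_node = flank
        for t in tails:
            edges[t].append(flank)
        # Two parallel branches: reference allele and alternative allele.
        ref_branch = add(f"ref{i}_allele", reference[start:end])
        alt_branch = add(f"alt{i}", alt)
        alt_nodes.add(alt_branch)
        edges[flank] += [ref_branch, alt_branch]
        tails = [ref_branch, alt_branch]
        cursor = end
    last = add("ref_end", reference[cursor:])
    if start_node is None:
        start_node = last
    for t in tails:
        edges[t].append(last)
    return seqs, edges, alt_nodes, start_node


def spell_paths(seqs, edges, node):
    """Return the sequence spelled by every path from `node` to the sink."""
    if not edges[node]:
        return [seqs[node]]
    return [seqs[node] + tail
            for nxt in edges[node]
            for tail in spell_paths(seqs, edges, nxt)]


reference = "ACGTTGCATTAGGCAT"

# An 8 bp insertion at position 8 (empty reference allele).
ins = build_reference_dag(reference, [(8, 8, "GGGGCCCC")])
ins_haplotypes = spell_paths(ins[0], ins[1], ins[3])

# An 8 bp deletion of reference[4:12] (empty alternative allele).
dele = build_reference_dag(reference, [(4, 12, "")])
del_haplotypes = spell_paths(dele[0], dele[1], dele[3])
```

Each graph spells both the unmodified reference and the sequence carrying the structural variation, so a read from either haplotype has a path it can match. Empty-sequence nodes are used here to keep the construction uniform; a leaner design would replace them with direct edges.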
14 Claims
Claims 2 to 14 depend on claim 1.
Specification