Methods and systems for detecting sequence variants
First Claim
1. A method for identifying a mutation in proximity to a structural variation in a sequence, the method comprising the steps of:
obtaining a plurality of nucleic acid sequence reads, wherein at least one nucleic acid read comprises a mutation;
comparing said reads to a reference sequence construct, wherein said reference sequence construct is stored in computer memory as a directed acyclic graph comprising at least two alternative sequences at a position in the reference sequence construct, one of which is a structural variation;
scoring sequence overlaps for each nucleic acid read against the reference sequence construct;
aligning each read to a location on the construct such that the score for each read is maximized; and
identifying the mutation as being aligned within 100 bp or fewer of the structural variation.
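The claimed steps can be illustrated with a small sketch. It makes several assumptions that do not come from the claim: the graph is stored per base as a Python list already in topological order, the scoring scheme (match +2, mismatch -1, gap -2) is hypothetical, and every base of an inserted structural variant shares the breakpoint coordinate. Alignment uses a Smith-Waterman-style local alignment generalized to a DAG, a standard technique chosen here for illustration rather than the specification's own scoring. The names `Node`, `add_chain`, `align`, and `mutations_near_sv` are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical scoring scheme; the claim does not fix particular values.
MATCH, MISMATCH, GAP = 2, -1, -2


@dataclass
class Node:
    base: str                 # one nucleotide per node
    pos: int                  # backbone coordinate (assumed convention)
    is_sv: bool = False       # True for nodes on the structural-variant branch
    preds: list = field(default_factory=list)  # predecessor node indices


def add_chain(nodes, seq, start_pos, pred, is_sv=False):
    """Append a linear run of nodes after `pred`; return the last index.
    SV nodes all share the breakpoint coordinate `start_pos`."""
    for k, base in enumerate(seq):
        pos = start_pos if is_sv else start_pos + k
        nodes.append(Node(base, pos, is_sv, [] if pred is None else [pred]))
        pred = len(nodes) - 1
    return pred


def align(read, nodes):
    """Local alignment of `read` to a DAG whose node list is in topological
    order. Returns the maximal score and the (read_index, node_index) pairs
    of the aligned (diagonal) steps on the best-scoring path."""
    n, m = len(read), len(nodes)
    H = [[0] * m for _ in range(n + 1)]
    back = [[None] * m for _ in range(n + 1)]
    best = (0, 0, 0)  # (score, i, v)
    for i in range(1, n + 1):
        for v, node in enumerate(nodes):
            s = MATCH if read[i - 1] == node.base else MISMATCH
            options = [(0, None), (s, ("diag", None))]               # local start
            options += [(H[i - 1][u] + s, ("diag", u)) for u in node.preds]
            options.append((H[i - 1][v] + GAP, ("up", v)))          # base only in read
            options += [(H[i][u] + GAP, ("left", u)) for u in node.preds]  # base only in graph
            H[i][v], back[i][v] = max(options, key=lambda t: t[0])
            if H[i][v] > best[0]:
                best = (H[i][v], i, v)
    score, i, v = best
    pairs = []
    while i > 0 and back[i][v] is not None:  # trace back from the best cell
        kind, u = back[i][v]
        if kind == "diag":
            pairs.append((i - 1, v))
            if u is None:
                break
            i, v = i - 1, u
        elif kind == "up":
            i -= 1
        else:
            v = u
    return score, pairs[::-1]


def mutations_near_sv(read, nodes, pairs, sv_pos, window=100):
    """Report mismatches whose backbone coordinate lies within `window` bp of the SV."""
    return [(nodes[v].pos, nodes[v].base, read[r])
            for r, v in pairs
            if read[r] != nodes[v].base and abs(nodes[v].pos - sv_pos) <= window]


# Toy construct: backbone with an optional 4-bp insertion (the SV) at position 8.
nodes = []
left = add_chain(nodes, "ACGTTGCA", 0, None)          # backbone positions 0-7
ins = add_chain(nodes, "GGGG", 8, left, is_sv=True)   # insertion branch
first_right = len(nodes)
add_chain(nodes, "TCAGGA", 8, left)                    # backbone positions 8-13
nodes[first_right].preds.append(ins)                   # insertion rejoins the backbone

read = "TGCAGGGGTCTGG"  # spans the insertion and carries an A>T change at position 10
score, pairs = align(read, nodes)
print(score, mutations_near_sv(read, nodes, pairs, sv_pos=8))  # 23 [(10, 'A', 'T')]
```

Because the read contains the inserted bases, its best-scoring path runs through the SV branch, and the single mismatch is reported two bases from the breakpoint, well inside the claimed 100 bp window. Against a linear reference lacking the insertion, the same read would score poorly and the nearby mutation could be missed.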
Abstract
The invention provides methods for identifying rare variants near a structural variation in a genetic sequence, for example, in a nucleic acid sample taken from a subject. The invention additionally includes methods for aligning reads (e.g., nucleic acid reads) to a reference sequence construct that accounts for the structural variation, methods for building a reference sequence construct that accounts for the structural variation alone or for both the structural variation and the rare variant, and systems that use the alignment methods to identify rare variants. The method is scalable and can be used to align millions of reads to a construct that is thousands of bases long or longer.
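Building a construct that accounts for a structural variation and a rare variant can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the specification: variants are given as hypothetical `(pos, ref_allele, alt_allele)` tuples with 0-based, non-overlapping positions, each node holds a sequence label, and an empty label stands for the "skip" branch of a deletion. The helper names `build_construct` and `haplotypes` are hypothetical.

```python
def build_construct(reference, variants):
    """Build a DAG from a linear reference plus non-overlapping variants.
    Returns (labels, edges): labels[i] is node i's sequence, edges are (from, to)."""
    labels, edges = [], []

    def add(seq, preds):
        labels.append(seq)
        idx = len(labels) - 1
        edges.extend((p, idx) for p in preds)
        return idx

    tails, cursor = [], 0  # tails: nodes that the next node must follow
    for pos, ref, alt in sorted(variants):
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"ref allele mismatch at {pos}")
        if pos > cursor:  # shared backbone segment before the variant
            tails = [add(reference[cursor:pos], tails)]
        tails = [add(ref, tails), add(alt, tails)]  # two alternative branches
        cursor = pos + len(ref)
    if cursor < len(reference):
        tails = [add(reference[cursor:], tails)]
    return labels, edges


def haplotypes(labels, edges):
    """Yield every source-to-sink sequence the construct represents."""
    succ = {i: [] for i in range(len(labels))}
    has_pred = set()
    for a, b in edges:
        succ[a].append(b)
        has_pred.add(b)

    def walk(i):
        if not succ[i]:
            yield labels[i]
        for j in succ[i]:
            for rest in walk(j):
                yield labels[i] + rest

    for s in range(len(labels)):
        if s not in has_pred:
            yield from walk(s)


# A 6-bp deletion (the structural variation) and a nearby C>T rare variant.
labels, edges = build_construct("ACGTACGTACGT", [(2, "GTACGT", ""), (9, "C", "T")])
print(sorted(haplotypes(labels, edges)))
# ['ACACGT', 'ACATGT', 'ACGTACGTACGT', 'ACGTACGTATGT']
```

The construct encodes the reference and all three alternatives without storing each haplotype separately; path enumeration is used here only to show what the graph represents and grows exponentially with the number of variants, so an aligner would traverse the graph directly instead.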
84 Citations
15 Claims
Specification