System and methods for indel identification using short read sequencing
Abstract
Systems, methods, and analytical approaches are described for short read sequence assembly and for the detection of insertions and deletions (indels) relative to a reference genome. A method suitable for software implementation is presented in which indels may be identified in a computationally efficient manner.
61 Citations
20 Claims
1. A computer implemented method of nucleic acid sequence analysis, comprising:
- receiving first nucleic acid sequence information comprising one or more mate pair sequences, wherein mate pair sequences comprise a first non-overlapping pairwise sequence and a second non-overlapping pairwise sequence separated by an intervening sequence length;
- receiving second nucleic acid sequence information comprising at least one reference sequence;
- performing a computer assisted mapping operation for the mate pair sequences in which the first non-overlapping pairwise sequence and the second non-overlapping pairwise sequence for a respective mate pair are aligned to the at least one reference sequence using a processor by the steps of:
  - performing a first mapping operation using a processor to align the first non-overlapping pairwise sequence of the mate pair sequences to the at least one reference sequence with a first selected mismatch constraint,
  - identifying mate pair sequences having first non-overlapping pairwise sequences which are aligned to the at least one reference sequence while satisfying the selected mismatch constraint,
  - designating a window region within the at least one reference sequence for the identified mate pair sequences based on the alignment of the first non-overlapping pairwise sequence to the at least one reference sequence,
  - performing a second mapping operation using a processor to align the second non-overlapping pairwise sequence to the window region of the reference sequence with a second selected mismatch constraint,
  - identifying mate pair sequences with first and second non-overlapping pairwise sequences that have mapped to the at least one reference sequence following performing the first and second mapping operations; and,
- outputting the results of the mapping operations.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system for nucleic acid sequence analysis, comprising:
- a data analysis component implemented on a computing device configured to:
  - receive first nucleic acid sequence information for one or more mate pair sequences, wherein mate pair sequences comprise a first non-overlapping pairwise sequence and a second non-overlapping pairwise sequence separated by an intervening sequence length and further configured to receive second nucleic acid sequence information for at least one reference sequence;
  - perform a mapping operation for the mate pair sequences in which the first non-overlapping pairwise sequence and the second non-overlapping pairwise sequence for a respective mate pair are aligned to the at least one reference sequence by the steps of:
    - performing a first mapping operation aligning the first non-overlapping pairwise sequence of the mate pair sequences to the at least one reference sequence with a first selected mismatch constraint,
    - identifying mate pair sequences having first non-overlapping pairwise sequences which are aligned to the at least one reference sequence while satisfying the selected mismatch constraint,
    - designating a window region within the at least one reference sequence for the identified mate pair sequences based on the alignment of the first non-overlapping pairwise sequence to the at least one reference sequence,
    - performing a second mapping operation to align the second non-overlapping pairwise sequence to the window region of the reference sequence with a second selected mismatch constraint,
    - identify mate pair sequences with first and second non-overlapping pairwise sequences that have mapped to the at least one reference sequence following performing the first and second mapping operations; and,
- a data terminal for displaying the results of the mapping operations generated by the data analysis component to a user.
Dependent claims: 9, 10, 11, 12, 13
14. A non-transitory computer-readable medium, the computer-readable medium being readable to execute a method of nucleic acid sequence analysis, the method comprising:
- receiving first nucleic acid sequence information comprising one or more mate pair sequences, wherein mate pair sequences comprise a first non-overlapping pairwise sequence and a second non-overlapping pairwise sequence separated by an intervening sequence length;
- receiving second nucleic acid sequence information comprising at least one reference sequence;
- performing a computer assisted mapping operation for the mate pair sequences in which the first non-overlapping pairwise sequence and the second non-overlapping pairwise sequence for a respective mate pair are aligned to the at least one reference sequence by the steps of:
  - performing a first mapping operation aligning the first non-overlapping pairwise sequence of mate pair sequences to the at least one reference sequence with a first selected mismatch constraint,
  - identifying mate pair sequences having first non-overlapping pairwise sequences which are aligned to the at least one reference sequence while satisfying the selected mismatch constraint,
  - designating a window region within the at least one reference sequence for the mate pair sequences based on the alignment of the first non-overlapping pairwise sequence to the at least one reference sequence,
  - performing a second mapping operation to align the second non-overlapping pairwise sequence to the window region of the reference sequence with a second selected mismatch constraint,
  - identifying mate pair sequences with first and second non-overlapping pairwise sequences that have mapped to the at least one reference sequence following performing the first and second mapping operations; and,
- outputting the results of the mapping operations.
Dependent claims: 15, 16, 17, 18, 19, 20
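The two-stage mapping recited in the independent claims can be illustrated with a minimal Python sketch. It is not the patented implementation. Several simplifying assumptions are made that the claims do not state: alignment is ungapped and scored by Hamming distance, both sequences of a mate pair lie on the same strand with the second downstream of the first, the reference is scanned by brute force, and the window is the expected position of the second sequence widened by a fixed `slack`. All names (`find_hits`, `map_mate_pair`, `insert_len`, `slack`, `mm1`, `mm2`) are hypothetical.

```python
from typing import List, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_hits(read: str, ref: str, max_mismatches: int,
              start: int = 0, end: Optional[int] = None) -> List[int]:
    """Return each offset in ref[start:end] where read aligns (ungapped)
    with at most max_mismatches mismatches."""
    start = max(start, 0)
    end = len(ref) if end is None else min(end, len(ref))
    hits = []
    for pos in range(start, end - len(read) + 1):
        if hamming(read, ref[pos:pos + len(read)]) <= max_mismatches:
            hits.append(pos)
    return hits


def map_mate_pair(first: str, second: str, ref: str,
                  insert_len: int, slack: int,
                  mm1: int, mm2: int) -> List[Tuple[int, int]]:
    """Two-stage mapping: align `first` against the whole reference, then
    align `second` only inside a window derived from each first-stage hit."""
    results = []
    # Stage 1: first sequence against the full reference (first constraint).
    for p1 in find_hits(first, ref, mm1):
        # Assumed window rule: expected start of the second sequence,
        # widened by `slack` bases on either side.
        expected = p1 + len(first) + insert_len
        win_start = expected - slack
        win_end = expected + len(second) + slack
        # Stage 2: second sequence against the window only (second constraint).
        for p2 in find_hits(second, ref, mm2, win_start, win_end):
            results.append((p1, p2))
    return results


# Toy data, invented for illustration only.
ref = "AAAACCCCGGGGTTTTACGTACGTCATGCATG"
first = "CCCCGG"    # exact match at offset 4
second = "CATGGA"   # one mismatch against ref[24:30] == "CATGCA"

print(map_mate_pair(first, second, ref, insert_len=12, slack=3, mm1=0, mm2=1))
# prints [(4, 24)]
```

Restricting the second search to a window keeps the second stage cheap, which is why a looser mismatch constraint can be afforded there. In this sketch the observed gap between the two hits (14 bases) differs from the assumed `insert_len` (12); a downstream step could examine such differences, but that step is outside what the claims above recite.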
Specification