SYSTEM AND METHODS FOR INDEL IDENTIFICATION USING SHORT READ SEQUENCING
Abstract
Systems, methods, and analytical approaches are described for short read sequence assembly and for the detection of insertions and deletions (indels) in a reference genome. A method suitable for software implementation is presented in which indels may be readily identified in a computationally efficient manner.
20 Claims
1. A method of nucleic acid sequence analysis, comprising:
receiving nucleic acid sequence information comprising one or more mate pair sequences, wherein mate pair sequences comprise non-overlapping pairwise sequences separated by an intervening sequence length;
receiving nucleic acid sequence information comprising at least one reference sequence;
performing a mapping operation for each of the one or more mate pair sequences in which the non-overlapping pairwise sequences are aligned to the at least one reference sequence by the steps of:
performing a first mapping operation aligning the non-overlapping pairwise sequences to the at least one reference sequence with a selected mismatch constraint identifying non-overlapping pairwise sequences for which one of the non-overlapping pairwise sequences align to the at least one reference sequence while satisfying the selected mismatch constraint;
performing a second mapping operation designating a window region of the reference sequence to align the non aligned pairwise sequence with a selected mismatch constraint;
identifying non-overlapping pairwise sequences that are successfully mapped following performing the first and second mapping operations; and
outputting the results of the mapping operations.
Dependent claims: 2, 3, 4, 5, 6, 7.
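The two-pass mapping recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the function names (`hamming_hits`, `map_mate_pair`), the parameters `gap_min`, `gap_max`, `first_mm`, and `second_mm`, the brute-force scan, the forward-strand-only orientation, and the choice to keep only the first hit are all assumptions made for illustration. The mismatch constraint is modeled as a Hamming-distance limit, and the window region is derived from an assumed range for the intervening sequence length.

```python
from typing import List, Optional, Tuple


def hamming_hits(read: str, ref: str, max_mismatches: int,
                 start: int = 0, end: Optional[int] = None) -> List[int]:
    """Positions in ref[start:end] where read aligns with <= max_mismatches."""
    end = len(ref) if end is None else min(end, len(ref))
    start = max(start, 0)
    hits = []
    for pos in range(start, end - len(read) + 1):
        mismatches = 0
        for a, b in zip(read, ref[pos:pos + len(read)]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # constraint violated, try the next position
        else:
            hits.append(pos)  # inner loop finished without exceeding the limit
    return hits


def map_mate_pair(tag1: str, tag2: str, ref: str,
                  gap_min: int, gap_max: int,
                  first_mm: int = 0, second_mm: int = 2
                  ) -> Optional[Tuple[int, int]]:
    """Return (tag1 position, tag2 position), or None if the pair is unmapped.

    Assumption: tag1 lies upstream of tag2 on the forward strand, and the
    gap between them is between gap_min and gap_max bases.
    """
    # First mapping operation: both tags against the whole reference.
    hits1 = hamming_hits(tag1, ref, first_mm)
    hits2 = hamming_hits(tag2, ref, first_mm)
    if hits1 and hits2:
        return hits1[0], hits2[0]  # the gap is not re-checked in this sketch

    # Second mapping operation: search only a window implied by the anchor.
    if hits1:
        anchor = hits1[0]
        lo = anchor + len(tag1) + gap_min
        hi = anchor + len(tag1) + gap_max + len(tag2)
        rescued = hamming_hits(tag2, ref, second_mm, lo, hi)
        return (anchor, rescued[0]) if rescued else None
    if hits2:
        anchor = hits2[0]
        lo = anchor - gap_max - len(tag1)
        hi = anchor - gap_min
        rescued = hamming_hits(tag1, ref, second_mm, lo, hi)
        return (rescued[0], anchor) if rescued else None
    return None  # neither tag anchored, so no window can be designated
```

Restricting the second pass to a window is what makes a looser mismatch constraint affordable in this sketch: the relaxed search touches only a few candidate positions near the anchored tag rather than the whole reference.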
8. A system for nucleic acid sequence analysis, comprising:
a data analysis unit configured to:
receive nucleic acid sequence information for one or more mate pair sequences, wherein mate pair sequences comprise non-overlapping pairwise sequences separated by an intervening sequence length and further configured to receive nucleic acid sequence information for at least one reference sequence;
perform a mapping operation for each of the one or more mate pair sequences in which the non-overlapping pairwise sequences are aligned to the at least one reference sequence by the steps of:
performing a first mapping operation aligning the non-overlapping pairwise sequences to the at least one reference sequence with a selected mismatch constraint identifying non-overlapping pairwise sequences for which one of the non-overlapping pairwise sequences align to the at least one reference sequence while satisfying the selected mismatch constraint;
performing a second mapping operation designating a window region of the reference sequence to align the non aligned pairwise sequence with a selected mismatch constraint;
identify non-overlapping pairwise sequences that are successfully mapped following performing the first and second mapping operations; and
a data terminal for displaying the results of the mapping operations generated by the data analysis unit to a user.
Dependent claims: 9, 10, 11, 12, 13.
14. A computer-readable medium, the computer-readable medium being readable to execute a method of nucleic acid sequence analysis, the method comprising:
receiving nucleic acid sequence information comprising one or more mate pair sequences, wherein mate pair sequences comprise non-overlapping pairwise sequences separated by an intervening sequence length;
receiving nucleic acid sequence information comprising at least one reference sequence;
performing a mapping operation for each of the one or more mate pair sequences in which the non-overlapping pairwise sequences are aligned to the at least one reference sequence by the steps of:
performing a first mapping operation aligning the non-overlapping pairwise sequences to the at least one reference sequence with a selected mismatch constraint identifying non-overlapping pairwise sequences for which one of the non-overlapping pairwise sequences align to the at least one reference sequence while satisfying the selected mismatch constraint;
performing a second mapping operation designating a window region of the reference sequence to align the non aligned pairwise sequence with a selected mismatch constraint;
identifying non-overlapping pairwise sequences that are successfully mapped following performing the first and second mapping operations; and
outputting the results of the mapping operations.
Dependent claims: 15, 16, 17, 18, 19, 20.
Specification