Algorithms for sequence determination
Abstract
The present invention is generally directed to methods and systems for determining a consensus sequence from replicate biomolecule sequence data. Its object is to improve the accuracy of consensus sequence determination by providing methods that assimilate replicate sequence reads into a final consensus sequence more accurately than any one-pass sequence analysis system.
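For comparison, the simplest way to assimilate replicate reads is a column-wise majority vote. The sketch below is a baseline, not the claimed method: it assumes the reads are already aligned to equal length, uses `-` as a hypothetical gap character, and drops columns whose majority call is a gap. The claims instead route low-agreement runs to a pattern classification algorithm.

```python
from collections import Counter

GAP = "-"  # assumed gap character in the aligned reads


def majority_consensus(aligned_reads: list[str]) -> str:
    """Column-wise majority vote over equal-length, pre-aligned reads.

    Ties go to the call seen first in the column (Counter.most_common order).
    Columns whose majority call is a gap are omitted from the consensus.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("aligned reads must all have the same length")
    consensus = []
    for column in zip(*aligned_reads):
        call, _count = Counter(column).most_common(1)[0]
        if call != GAP:
            consensus.append(call)
    return "".join(consensus)
```

For example, `majority_consensus(["ACGT-A", "ACGTTA", "AC-T-A"])` yields `"ACGTA"`: the fifth column is mostly gaps and is dropped.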
29 Claims
1. A method of determining a consensus sequence, comprising:
a) providing a plurality of replicate sequence reads;
b) aligning the plurality of replicate sequence reads to generate a multiple sequence alignment;
c) determining a percent of majority calls for each position in a region of interest in the multiple sequence alignment;
d) based upon the results of step c, identifying a set of sequential positions in the region of interest, wherein all positions in the set of sequential positions have a percent of majority calls below a threshold; and
e) using information from the plurality of replicate sequence reads for the set of sequential positions in the region of interest in a pattern classification algorithm to generate a consensus sequence spanning the set of sequential positions, wherein a call by the pattern classification algorithm for a first position in the set of sequential positions is dependent upon a call made for a second position in the set of sequential positions that is made by the pattern classification algorithm, the first position and the second position are adjacent to each other, and at least steps a), b), and e) are performed using a suitably programmed computer.
Dependent claims: 2 to 27.
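Steps c) and d) of claim 1 can be sketched as follows. This assumes the aligned reads have already been trimmed to the region of interest, and the `min_length` parameter and function names are hypothetical illustrations, not terms from the claims. Each position's percent of majority calls is the share of reads agreeing with that column's most common call; runs of consecutive positions below the threshold are reported as half-open intervals.

```python
from collections import Counter


def percent_majority(aligned_reads):
    """Step c): percent of reads that agree with the most common call in each column."""
    n = len(aligned_reads)
    return [100.0 * Counter(col).most_common(1)[0][1] / n for col in zip(*aligned_reads)]


def low_confidence_runs(percents, threshold, min_length=2):
    """Step d): maximal runs of consecutive positions whose percent is below threshold.

    Returns half-open (start, end) index pairs. min_length=2 is a hypothetical
    default reflecting that step e) relates at least two adjacent positions.
    """
    runs, start = [], None
    for i, p in enumerate(percents):
        if p < threshold:
            if start is None:
                start = i  # a new low-confidence run begins
        elif start is not None:
            if i - start >= min_length:
                runs.append((start, i))
            start = None
    # close a run that extends to the last position
    if start is not None and len(percents) - start >= min_length:
        runs.append((start, len(percents)))
    return runs
```

With the four reads `["ACGTA", "ATCTA", "AGATA", "ACCTA"]`, the percents are `[100.0, 50.0, 50.0, 100.0, 100.0]`, and a threshold of 75 flags the run `(1, 3)`, i.e. the second and third positions.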
28. A system for determining a consensus sequence, comprising:
a processor; and
memory storing an application to be executed by the processor, wherein the application comprises instructions for:
a) obtaining a plurality of replicate sequence reads;
b) aligning the plurality of replicate sequence reads to generate a multiple sequence alignment;
c) determining a percent of majority calls for each position in a region of interest in the multiple sequence alignment;
d) based upon the results of the determining c), identifying a set of sequential positions in the region of interest, wherein all positions in the set of sequential positions have a percent of majority calls below a threshold; and
e) using information from the plurality of replicate sequence reads for the set of sequential positions in the region of interest in a pattern classification algorithm to generate a consensus sequence spanning the set of sequential positions, wherein a call by the pattern classification algorithm for a first position in the set of sequential positions is dependent upon a call made for a second position in the set of sequential positions that is made by the pattern classification algorithm, and the first position and the second position are adjacent to each other.
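Step e) requires a pattern classification algorithm whose call at one position depends on the call at the adjacent position. The claims shown here do not name a specific algorithm; one standard technique with exactly that property is a first-order hidden Markov model decoded with the Viterbi algorithm, sketched below as an assumption. The per-read accuracy `p_correct`, the error model, and the `HOMOPOLYMER_TRANSITION` table are hypothetical toy values, and gaps are treated as uninformative.

```python
import math

BASES = "ACGT"

# Hypothetical toy transition table: a base is slightly more likely to be
# followed by the same base (e.g. homopolymer context). Each row sums to 1.
HOMOPOLYMER_TRANSITION = {(a, b): 0.4 if a == b else 0.2 for a in BASES for b in BASES}


def emission_logp(column, base, p_correct=0.9):
    """Log-likelihood of one column's calls if the true base is `base`.

    Assumes each read is independently correct with probability p_correct and
    otherwise shows one of the three other bases uniformly; gaps are skipped.
    """
    logp = 0.0
    for call in column:
        if call == "-":
            continue
        logp += math.log(p_correct if call == base else (1.0 - p_correct) / 3.0)
    return logp


def viterbi_consensus(columns, transition, p_correct=0.9):
    """Most probable base sequence over `columns` under a first-order HMM."""
    if not columns:
        raise ValueError("need at least one column")
    # score[b]: best log-probability of any path that ends in base b here
    score = {b: math.log(0.25) + emission_logp(columns[0], b, p_correct) for b in BASES}
    backpointers = []
    for column in columns[1:]:
        new_score, pointers = {}, {}
        for b in BASES:
            # the call at this position depends on the adjacent previous call
            prev = max(BASES, key=lambda a: score[a] + math.log(transition[(a, b)]))
            new_score[b] = (score[prev] + math.log(transition[(prev, b)])
                            + emission_logp(column, b, p_correct))
            pointers[b] = prev
        score = new_score
        backpointers.append(pointers)
    # trace back from the best-scoring final base
    path = [max(BASES, key=lambda b: score[b])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))
```

In `viterbi_consensus([("A",) * 4, ("A", "A", "C", "C")], HOMOPOLYMER_TRANSITION)` the second column is an A/C tie that a per-column vote cannot resolve; the transition from the confident preceding `A` breaks it, giving `"AA"`.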
29. A non-transitory computer readable storage medium storing an application to be executed by a processor of a computer, wherein the application is for determining a consensus sequence and wherein the application comprises instructions for:
a) obtaining a plurality of replicate sequence reads;
b) aligning the plurality of replicate sequence reads to generate a multiple sequence alignment;
c) determining a percent of majority calls for each position in a region of interest in the multiple sequence alignment;
d) based upon the results of the determining c), identifying a set of sequential positions in the region of interest, wherein all positions in the set of sequential positions have a percent of majority calls below a threshold; and
e) using information from the plurality of replicate sequence reads for the set of sequential positions in the region of interest in a pattern classification algorithm to generate a consensus sequence spanning the set of sequential positions, wherein a call by the pattern classification algorithm for a first position in the set of sequential positions is dependent upon a call made for a second position in the set of sequential positions that is made by the pattern classification algorithm, and the first position and the second position are adjacent to each other.
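Step b), recited in each independent claim, produces a multiple sequence alignment. The claims do not specify the aligner. A common building block is global pairwise alignment (Needleman-Wunsch), which a star-alignment strategy could apply between a chosen center read and every other read; merging those pairwise results into one multiple alignment is not shown. The scoring values below are hypothetical defaults.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    # trace back from the bottom-right cell, following any optimal move
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))
```

For instance, `global_align("ACGTA", "ACTA")` returns `("ACGTA", "AC-TA")`, the unique optimal alignment under these scores (four matches and one gap).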
Specification