Systems and methods for sequence data alignment quality assessment
Abstract
A computer-implemented method for classifying alignments of paired nucleic acid sequence reads is disclosed. A plurality of paired nucleic acid sequence reads is received, wherein each read comprises a first tag and a second tag separated by an insert region. Potential alignments of the first and second tags of each read to a reference sequence are determined, wherein each potential alignment satisfies a minimum threshold mismatch constraint. Potential paired alignments of the first and second tags of each read are identified, wherein a distance between the first and second tags of each potential paired alignment is within an estimated insert size range. An alignment score is calculated for each potential paired alignment based on the distance between the first and second tags and a total number of mismatches for each tag.
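The three computational steps in the abstract (per-tag candidate alignment under a mismatch threshold, pairing within an estimated insert size range, and scoring each pair) can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes ungapped, forward-strand placements found by a brute-force Hamming-distance scan, reads the "minimum threshold mismatch constraint" as "at most `max_mismatches` mismatches", measures tag distance start to start, and uses a hypothetical linear score. The names `candidate_positions`, `paired_alignments`, `alignment_score`, and the weights are illustrative assumptions; a practical aligner would use a reference index rather than scanning every position.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PairedAlignment:
    pos1: int        # 0-based start of the first tag on the reference
    pos2: int        # 0-based start of the second tag on the reference
    mismatches: int  # total mismatches over both tags
    distance: int    # pos2 - pos1 (assumed start-to-start distance)


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def candidate_positions(tag: str, reference: str, max_mismatches: int) -> List[Tuple[int, int]]:
    """Brute-force scan: every ungapped placement with at most max_mismatches mismatches."""
    hits = []
    for pos in range(len(reference) - len(tag) + 1):
        mm = hamming(tag, reference[pos:pos + len(tag)])
        if mm <= max_mismatches:
            hits.append((pos, mm))
    return hits


def paired_alignments(tag1: str, tag2: str, reference: str, max_mismatches: int,
                      min_insert: int, max_insert: int) -> List[PairedAlignment]:
    """Combine per-tag hits, keeping pairs whose distance lies in the insert size range."""
    hits1 = candidate_positions(tag1, reference, max_mismatches)
    hits2 = candidate_positions(tag2, reference, max_mismatches)
    pairs = []
    for p1, m1 in hits1:
        for p2, m2 in hits2:
            distance = p2 - p1
            if min_insert <= distance <= max_insert:
                pairs.append(PairedAlignment(p1, p2, m1 + m2, distance))
    return pairs


def alignment_score(pa: PairedAlignment, expected_insert: int,
                    mismatch_weight: float = 1.0, distance_weight: float = 0.1) -> float:
    """Hypothetical linear penalty (lower is better) from mismatches and insert deviation."""
    return mismatch_weight * pa.mismatches + distance_weight * abs(pa.distance - expected_insert)


if __name__ == "__main__":
    reference = "GATTACATTTTGGGGCCCCGATTACA"
    for pa in paired_alignments("GATTACA", "GGGGCCCC", reference, 0, 0, 20):
        print(pa, alignment_score(pa, expected_insert=11))
```

In this toy reference the first tag `GATTACA` matches at positions 0 and 19, but only the placement at 0 puts the second tag (at position 11) inside the 0 to 20 base insert window, so a single paired alignment survives and is scored.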
13 Claims
1. A method for classifying alignments of paired nucleic acid sequence reads, comprising:
disposing a nucleic acid sample within a sample chamber of a sequencing instrument, the nucleic acid sample comprising a plurality of target nucleic acids, the target nucleic acids including first and second tags, the first tag being derived from a first region of a polynucleotide and the second tag being derived from a second region of the polynucleotide, the first and second tags being separated by an insert region;
detecting, by a detection device, a plurality of signals representative of the sequence of at least one of the target nucleic acids of the nucleic acid sample;
generating, by a computing device comprising a processor and memory, a paired nucleic acid sequence read from the plurality of signals, the paired nucleic acid sequence read including a first read of the first tag and second read of the second tag;
determining, by the computing device, potential alignments for the first and second reads of the paired nucleic acid sequence read to a reference sequence, wherein each potential alignment satisfies a minimum threshold mismatch constraint;
identifying, by the computing device, potential paired alignments of the paired nucleic acid sequence read, wherein a distance between the first and second reads of each potential paired alignment is within an estimated insert size range; and
calculating, by the computing device, an alignment score for each potential paired alignment based on:
a distance between the first and second reads, and a total number of mismatches for the first and second reads.
Dependent claims: 2, 3, 4, 5, 6

7. A system for identifying potential alignments for sequencing reads, comprising:
a sequencing instrument comprising:
a sample processing unit configured to accept a nucleic acid sample, the nucleic acid sample comprising a plurality of target nucleic acids, the target nucleic acids including first and second tags, the first tag being derived from a first region of a polynucleotide and the second tag being derived from a second region of the polynucleotide, the first and second tags being separated by an insert region;
a reagent delivery system configured to provide reagents for sequencing the target nucleic acids to the sample processing unit;
a signal detection unit configured to detect a plurality of signals during sequencing, the signals representative of a sequence of at least one of the target nucleic acids; and
a computer processor in communication with the sequencer, the processor configured to:
generate a paired nucleic acid sequence read from the plurality of signals, the paired nucleic acid sequence read including a first read of the first tag and second read of the second tag,
perform alignments of the paired nucleic acid sequence read to a reference sequence,
calculate a quality value for the respective alignments, the quality value being a function of the distance between the first and second reads, and a total number of mismatches for the first and second reads, and
output each alignment with its associated quality value.
Dependent claims: 8, 9

10. A method for determining possible alignments for sequencing reads, comprising:
disposing a nucleic acid sample within a sample chamber of a sequencing instrument, the nucleic acid sample comprising a plurality of target nucleic acids, the target nucleic acids including first and second tags, the first tag being derived from a first region of a polynucleotide and the second tag being derived from a second region of the polynucleotide, the first and second tags being separated by an insert region;
detecting, by a detection device, a plurality of signals representative of the sequence of at least one of the target nucleic acids of the nucleic acid sample;
generating, by a computing device comprising a processor and memory, a paired nucleic acid sequence read from the plurality of signals, the paired nucleic acid sequence read including a first read of the first tag and second read of the second tag;
performing, by the computing device, alignments of the paired nucleic acid sequence read;
calculating, by the computing device, a quality value for the alignments, the quality value being a function of the distance between the first and second reads, and a total number of mismatches for the first and second reads; and
outputting, by the computing device, respective alignments and associated quality values.
Dependent claims: 11, 12, 13

Specification