Systems and methods for sequence data alignment quality assessment

US 9,268,903 B2
Filed: 07/06/2011
Issued: 02/23/2016
Est. Priority Date: 07/06/2010
Status: Active Grant

Abstract

A computer-implemented method for classifying alignments of paired nucleic acid sequence reads is disclosed. A plurality of paired nucleic acid sequence reads is received, wherein each read comprises a first tag and a second tag separated by an insert region. Potential alignments for the first and second tags of each read to a reference sequence are determined, wherein the potential alignments satisfy a minimum threshold mismatch constraint. Potential paired alignments of the first and second tags of each read are identified, wherein a distance between the first and second tags of each potential paired alignment is within an estimated insert size range. An alignment score is calculated for each potential paired alignment based on a distance between the first and second tags and a total number of mismatches for each tag.
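
The steps summarized in the abstract can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `TagAlignment` record, the `pair_and_score` function, the reading of the mismatch constraint as an upper bound (`max_mismatches`), and in particular the scoring formula, which simply penalizes the total mismatch count plus the normalized deviation of the pair distance from an assumed expected insert size.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TagAlignment:
    position: int    # 0-based start of the tag on the reference
    mismatches: int  # mismatches between the tag and the reference


def pair_and_score(first_hits, second_hits, max_mismatches,
                   insert_min, insert_max, expected_insert):
    """Return (first, second, distance, score) tuples, best score first."""
    # Keep only per-tag alignments that satisfy the mismatch constraint
    # (assumed here to be an upper bound on mismatches).
    first_ok = [a for a in first_hits if a.mismatches <= max_mismatches]
    second_ok = [b for b in second_hits if b.mismatches <= max_mismatches]

    span = max(1, insert_max - insert_min)  # avoid division by zero
    scored = []
    for a, b in product(first_ok, second_ok):
        distance = abs(b.position - a.position)
        # Discard pairs whose distance falls outside the insert size range.
        if not insert_min <= distance <= insert_max:
            continue
        total_mismatches = a.mismatches + b.mismatches
        # Hypothetical score, NOT the patent's formula: fewer mismatches and
        # a distance closer to the expected insert size score higher.
        score = -(total_mismatches + abs(distance - expected_insert) / span)
        scored.append((a, b, distance, score))
    return sorted(scored, key=lambda item: item[3], reverse=True)
```

Sorting by score puts the most plausible pairing first, so a caller can either keep only the top candidate or report all of them with their scores.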

13 Claims

1. A method for classifying alignments of paired nucleic acid sequence reads, comprising:
- disposing a nucleic acid sample within a sample chamber of a sequencing instrument, the nucleic acid sample comprising a plurality of target nucleic acids, the target nucleic acids including first and second tags, the first tag being derived from a first region of a polynucleotide and the second tag being derived from a second region of the polynucleotide, the first and second tags being separated by an insert region;
  
  detecting, by a detection device, a plurality of signals representative of the sequence of at least one of the target nucleic acids of the nucleic acid sample;
  
  generating, by a computing device comprising a processor and memory, a paired nucleic acid sequence read from the plurality of signals, the paired nucleic acid sequence read including a first read of the first tag and second read of the second tag;
  
  determining, by the computing device, potential alignments for the first and second reads of the paired nucleic acid sequence read to a reference sequence, wherein each potential alignment satisfies a minimum threshold mismatch constraint;
  
  identifying, by the computing device, potential paired alignments of the paired nucleic acid sequence read, wherein a distance between the first and second reads of each potential paired alignment is within an estimated insert size range; and
  
  calculating, by the computing device, an alignment score for each potential paired alignment based on:
  
  a distance between the first and second reads, and
  
  a total number of mismatches for the first and second reads.
  - 2. The method for classifying alignments of paired nucleic acid sequence reads, as recited in claim 1, wherein the paired nucleic acid sequence read is a mate-pair read.
  - 3. The method for classifying alignments of paired nucleic acid sequence reads, as recited in claim 1, wherein the paired nucleic acid sequence read is a paired-end read.
  - 4. The method for classifying alignments of paired nucleic acid sequence reads, as recited in claim 1, wherein the estimated insert size range is a standard deviation of a distribution of insert sizes of the insert regions of the aligned paired nucleic acid sequence reads.
  - 5. The method for classifying alignments of paired nucleic acid sequence reads, as recited in claim 1, wherein the calculated alignment score is a function of read alignment length.
  - 6. The method for classifying alignments of paired nucleic acid sequence reads, as recited in claim 1, wherein the calculated alignment score is a function of a total number of possible alignments for each read.
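
Claim 4 ties the estimated insert size range to the standard deviation of the insert size distribution. A minimal Python sketch of one way such a range might be derived is shown below. The function name `estimate_insert_range`, the use of a symmetric window around the mean, and the `num_sd` multiplier are assumptions made for illustration; the claims do not specify them.

```python
import statistics


def estimate_insert_range(observed_distances, num_sd=3.0):
    """Estimate (low, high) insert size bounds from observed pair distances.

    Assumption: the range is mean +/- num_sd sample standard deviations,
    with the lower bound clamped at zero.
    """
    if len(observed_distances) < 2:
        raise ValueError("need at least two observed distances")
    mean = statistics.fmean(observed_distances)
    sd = statistics.stdev(observed_distances)  # sample standard deviation
    low = max(0.0, mean - num_sd * sd)
    high = mean + num_sd * sd
    return low, high
```

In practice the distances would likely come from pairs whose placement is unambiguous, so that the estimate is not skewed by the very alignments it is later used to filter.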

7. A system for identifying potential alignments for sequencing reads, comprising:
- a sequencing instrument comprising:
  
  a sample processing unit configured to accept a nucleic acid sample, the nucleic acid sample comprising a plurality of target nucleic acids, the target nucleic acids including first and second tags, the first tag being derived from a first region of a polynucleotide and the second tag being derived from a second region of the polynucleotide, the first and second tags being separated by an insert region;
  
  a reagent delivery system configured to provide reagents for sequencing the target nucleic acids to the sample processing unit;
  
  a signal detection unit configured to detect a plurality of signals during sequencing, the signals representative of a sequence of at least one of the target nucleic acids; and
  
  a computer processor in communication with the sequencer, the processor configured to:
  
  generate a paired nucleic acid sequence read from the plurality of signals, the paired nucleic acid sequence read including a first read of the first tag and second read of the second tag,
  
  perform alignments of the paired nucleic acid sequence read to a reference sequence,
  
  calculate a quality value for the respective alignments, the quality value being a function of the distance between the first and second reads, and a total number of mismatches for the first and second reads, and
  
  output each alignment with its associated quality value.
  - 8. The system for identifying potential alignments for sequencing reads, as recited in claim 7, wherein aligned paired nucleic acid sequence reads have insert region sizes that fall within an estimated insert size range for the aligned paired nucleic acid sequence reads.
  - 9. The system for identifying potential alignments for sequencing reads, as recited in claim 8, wherein the estimated insert size range is based on a standard deviation value derived from a distribution of insert sizes of the insert regions of the aligned paired nucleic acid sequence reads.

10. A method for determining possible alignments for sequencing reads, comprising:
- disposing a nucleic acid sample within a sample chamber of a sequencing instrument, the nucleic acid sample comprising a plurality of target nucleic acids, the target nucleic acids including first and second tags, the first tag being derived from a first region of a polynucleotide and the second tag being derived from a second region of the polynucleotide, the first and second tags being separated by an insert region;
  
  detecting, by a detection device, a plurality of signals representative of the sequence of at least one of the target nucleic acids of the nucleic acid sample;
  
  generating, by a computing device comprising a processor and memory, a paired nucleic acid sequence read from the plurality of signals, the paired nucleic acid sequence read including a first read of the first tag and second read of the second tag;
  
  performing, by the computing device, alignments of the paired nucleic acid sequence read;
  
  calculating, by the computing device, a quality value for the alignments, the quality value being a function of the distance between the first and second reads, and a total number of mismatches for the first and second reads; and
  
  outputting, by the computing device, respective alignments and associated quality values.
  - 11. The computer-implemented method for determining possible alignments for sequencing reads, as recited in claim 10, wherein the calculated quality value for each alignment is a function of read alignment length.
  - 12. The computer-implemented method for determining possible alignments for sequencing reads, as recited in claim 10, wherein aligned paired nucleic acid sequence reads have insert region sizes that fall within an estimated insert size range for the aligned paired reads.
  - 13. The computer-implemented method for determining possible alignments for sequencing reads, as recited in claim 12, wherein the estimated insert size range is based on a standard deviation value derived from a distribution of insert sizes of the insert regions of the aligned paired nucleic acid sequence reads.
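
Claims 10 to 13 recite outputting each alignment with an associated quality value but do not give a formula for it. As a hedged illustration only, the Python sketch below converts raw per-alignment scores (assumed to already combine pair distance and mismatch count, higher being better) into Phred-scaled values by normalizing across all candidate alignments of the same read pair. This is a common mapping-quality convention, swapped in here as an assumption, and not the patent's stated method; `phred_qualities` and `cap` are hypothetical names.

```python
import math


def phred_qualities(scores, cap=60.0):
    """Map raw alignment scores (higher is better) to Phred-scaled qualities."""
    if not scores:
        return []
    best = max(scores)
    # Subtract the best score before exponentiating for numerical stability.
    weights = [math.exp(s - best) for s in scores]
    total = sum(weights)
    qualities = []
    for w in weights:
        # Probability that this candidate is NOT the true placement, under
        # the assumed model that weights are proportional to likelihoods.
        p_wrong = 1.0 - w / total
        if p_wrong <= 0.0:
            qualities.append(cap)  # a lone candidate gets the capped maximum
        else:
            qualities.append(min(cap, -10.0 * math.log10(p_wrong)))
    return qualities
```

Under this convention a read pair with a single candidate alignment receives the capped maximum, while two equally scored candidates each receive a quality of about 3, reflecting a 50% chance of being wrong.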

Current Assignee: Life Technologies Corporation (Thermo Fisher Scientific Incorporated)
Original Assignee: Life Technologies Corporation (Thermo Fisher Scientific Incorporated)
Inventors: Zhang, Zheng; Utiramerur, Sowmi; Hyland, Fiona
Primary Examiner(s): Brusca, John S
Application Number: US13/177,267
Publication Number: US 20120011086A1
Time in Patent Office: 1,693 Days
US Class Current: 1/1
CPC Class Codes:
G06N 3/126   Evolutionary algorithms, e....
G06N 7/01   Probabilistic graphical mod...
G16B 30/00   ICT specially adapted for s...
G16B 30/10   Sequence alignment; Homolog...
G16B 30/20   Sequence assembly
G16B 40/00   ICT specially adapted for b...
