METHOD AND APPARATUS FOR CALLING SINGLE-NUCLEOTIDE VARIATIONS AND OTHER VARIATIONS
Abstract
Base calls for a target sequence may be identified relative to a reference sequence by using values from sequencing reads at locations satisfying a high-confidence condition to identify base calls at a given location not satisfying the high-confidence condition. The high-confidence condition may relate to the level of coverage by the sequencing reads at a location of the reference sequence. The quality of measurements of the sequencing reads may be incorporated into the base-call process.
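The coverage-based form of the high-confidence condition can be illustrated with a minimal sketch. It assumes reads are already aligned to the reference without gaps and are given as (start, bases) pairs. The names `pileup` and `high_confidence_positions` and the `min_coverage` threshold are hypothetical, chosen for illustration, and are not taken from the patent.

```python
from collections import Counter

def pileup(reads, ref_len):
    """Count observed bases per reference position.

    reads: iterable of (start, bases) pairs, assumed gap-free alignments.
    Returns a list with one Counter (base -> count) per reference position.
    """
    columns = [Counter() for _ in range(ref_len)]
    for start, bases in reads:
        for offset, base in enumerate(bases):
            pos = start + offset
            if 0 <= pos < ref_len:
                columns[pos][base] += 1
    return columns

def high_confidence_positions(columns, min_coverage=4):
    """Positions whose read depth meets the (assumed) coverage threshold."""
    return [i for i, counts in enumerate(columns)
            if sum(counts.values()) >= min_coverage]

# Toy data: an 8-base reference and five overlapping reads.
reference = "ACGTACGT"
reads = [(0, "ACGTAC"), (0, "ACGTA"), (1, "CGTACG"), (2, "GTACGT"), (0, "ACGT")]
columns = pileup(reads, len(reference))
print(high_confidence_positions(columns, min_coverage=4))
```

In this toy data only positions covered by at least four reads pass the condition; the remaining positions are the ones that would need the additional evidence described in the claims.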
20 Claims
1. A method of identifying at least one base call for a target sequence, the method comprising:
accessing the reference sequence, the reference sequence including a plurality of base values that define the reference sequence;
accessing a plurality of sequencing reads, each sequencing read including a plurality of base values for a corresponding portion of the target sequence;
identifying a plurality of high-confidence locations in the sequencing reads, a high-confidence location being identified with a corresponding location in the reference sequence and satisfying a high-confidence condition for using base values of the sequencing reads at the high-confidence location to identify one or more base calls for the target sequence at the high-confidence location; and
identifying one or more base calls for the target sequence at a given location not satisfying the high-confidence condition by using base values of the sequencing reads at the high-confidence locations with base values of the sequencing reads at the given location and a base value of the reference sequence at the given location to identify the one or more base calls for the target sequence at the given location.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
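One way the final step of claim 1 might combine its three inputs is sketched below. This is an assumption for illustration, not the method disclosed in the specification: read data at high-confidence locations is used to estimate a per-base error rate, the reference base at the given location supplies a prior, and the reads at the given location supply the likelihood. The function names, the add-one smoothing, and the `variant_prior` value are all hypothetical.

```python
from collections import Counter
from math import log

BASES = "ACGT"

def estimate_error_rate(columns, high_conf):
    """Estimate a per-base error rate from high-confidence positions.

    Assumption: at a high-confidence position the majority base is correct,
    so every other observed base counts as an error. Add-one smoothing keeps
    the estimate strictly between 0 and 1.
    """
    mismatches = total = 0
    for pos in high_conf:
        counts = columns[pos]
        depth = sum(counts.values())
        total += depth
        mismatches += depth - max(counts.values())
    return (mismatches + 1) / (total + 2)

def call_base(counts, ref_base, error_rate, variant_prior=0.001):
    """Pick the base with the highest log-posterior at a low-confidence position.

    The prior favours the reference base; the likelihood treats each observed
    base as independent, with the error rate learned from high-confidence data.
    """
    best, best_score = None, float("-inf")
    for candidate in BASES:
        prior = 1 - variant_prior if candidate == ref_base else variant_prior / 3
        score = log(prior)
        for base, n in counts.items():
            p = 1 - error_rate if base == candidate else error_rate / 3
            score += n * log(p)
        if score > best_score:
            best, best_score = candidate, score
    return best

# Toy pileup: positions 0 and 1 are deep (high confidence), 2 and 3 are shallow.
toy_columns = [Counter(A=10), Counter(C=9, T=1), Counter(G=2), Counter(G=3)]
err = estimate_error_rate(toy_columns, high_conf=[0, 1])
print(call_base(toy_columns[2], ref_base="A", error_rate=err))
print(call_base(toy_columns[3], ref_base="A", error_rate=err))
```

With these toy numbers, two reads disagreeing with the reference are not enough to overcome the reference prior, while three are. The design choice illustrated is that shallow positions lean on the reference unless the reads, weighted by the error behaviour seen at well-covered positions, argue strongly otherwise.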
12. A non-transitory computer-readable medium that stores a computer program for identifying at least one base call for a target sequence, the computer program including instructions that, when executed by at least one computer, cause the at least one computer to perform operations comprising:
accessing the reference sequence, the reference sequence including a plurality of base values that define the reference sequence;
accessing a plurality of sequencing reads, each sequencing read including a plurality of base values for a corresponding portion of the target sequence;
identifying a plurality of high-confidence locations in the sequencing reads, a high-confidence location being identified with a corresponding location in the reference sequence and satisfying a high-confidence condition for using base values of the sequencing reads at the high-confidence location to identify one or more base calls for the target sequence at the high-confidence location; and
identifying one or more base calls for the target sequence at a given location not satisfying the high-confidence condition by using base values of the sequencing reads at the high-confidence locations with base values of the sequencing reads at the given location and a base value of the reference sequence at the given location to identify the one or more base calls for the target sequence at the given location.
Dependent claims: 13, 14, 15, 16, 17, 18, 19
20. An apparatus to identify at least one base call for a target sequence, the apparatus comprising at least one computer configured to perform operations for computer-implemented modules including:
a first access module to access the reference sequence, the reference sequence including a plurality of base values that define the reference sequence;
a second access module to access a plurality of sequencing reads, each sequencing read including a plurality of base values for a corresponding portion of the target sequence;
a first identification module to identify a plurality of high-confidence locations in the sequencing reads, a high-confidence location being identified with a corresponding location in the reference sequence and satisfying a high-confidence condition for using base values of the sequencing reads at the high-confidence location to identify one or more base calls for the target sequence at the high-confidence location; and
a second identification module to identify one or more base calls for the target sequence at a given location not satisfying the high-confidence condition by using base values of the sequencing reads at the high-confidence locations with base values of the sequencing reads at the given location and a base value of the reference sequence at the given location to identify the one or more base calls for the target sequence at the given location.
Specification