Analyzing genome sequencing information to determine likelihood of co-segregating alleles on haplotypes
1 Assignment
0 Petitions
Abstract
Sequencing information is used to correlate alleles at certain locations with alleles at other locations. Statistical information from the reads of fragments in a sample can be used to determine the phasing of haplotypes and to correct or confirm base calls at those locations. In one example, a confidence value (strength score) is determined for a particular hypothesis, which can include whether two alleles at two particular loci are on the same haplotype, as well as which alleles are on the other haplotype (e.g., for a diploid organism). The strength can include a positive contribution from data that is consistent with the hypothesis and a negative contribution from data that is inconsistent with the hypothesis, and both values can be used in a formula to determine the strength.
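The abstract does not give the formula that combines the two contributions. The Python sketch below shows one possible instance, under assumptions that are ours rather than the patent's: each observation is a pair of alleles seen together at the two loci on a single fragment, each observation is independently swapped with probability `error_rate`, and the strength is the log10 likelihood ratio of the hypothesis against the swapped phasing. Under that model the positive contribution is the number of consistent observations times a per-observation weight, the negative contribution is the number of inconsistent observations times the same weight, and the strength is their difference. The names `count_evidence`, `strength`, and `error_rate` are hypothetical.

```python
import math
from typing import Iterable, Tuple


def count_evidence(
    observations: Iterable[Tuple[str, str]],
    a1: str, a2: str, b1: str, b2: str,
) -> Tuple[int, int]:
    """Split allele-pair observations (allele at locus 1, allele at locus 2)
    into those consistent and inconsistent with the hypothesis that a1 and b1
    share one haplotype (so a2 and b2 share the other, as in a diploid)."""
    consistent = inconsistent = 0
    for pair in observations:
        if pair in ((a1, b1), (a2, b2)):
            consistent += 1
        elif pair in ((a1, b2), (a2, b1)):
            inconsistent += 1
        # Any other pair (e.g. a miscalled third base) is ignored in this sketch.
    return consistent, inconsistent


def strength(consistent: int, inconsistent: int, error_rate: float = 0.01) -> float:
    """Hypothetical strength: log10 likelihood ratio of the hypothesis versus
    the swapped phasing, assuming independent per-observation swap errors."""
    if not 0.0 < error_rate < 0.5:
        raise ValueError("error_rate must be in (0, 0.5)")
    # Each consistent observation is (1 - e) / e times likelier under the
    # hypothesis; each inconsistent one is e / (1 - e) times likelier.
    weight = math.log10((1.0 - error_rate) / error_rate)
    positive = consistent * weight    # positive contribution
    negative = inconsistent * weight  # negative contribution
    return positive - negative
```

For example, eight fragments showing A with C, six showing G with T, and one showing A with T give 14 consistent and 1 inconsistent observations, and a strength of 13 × log10(99), roughly 25.9, at the default error rate. A negative strength would instead favor the swapped phasing.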
16 Citations
26 Claims
1. A method of determining at least part of a genome of an organism from one or more samples, the one or more samples including nucleic acid molecules of the organism, the method comprising:
receiving sequencing information of a plurality of the nucleic acid molecules in the one or more samples;
identifying a plurality of loci of a first chromosome;
computing, with a computer system, a first strength conveying a likelihood that a first allele of a first locus and a second allele of a second locus are on a first 2-allele haplotype of the organism, wherein computing the first strength includes:
determining a first positive contribution to the likelihood based on sequencing information consistent with the first allele and the second allele being on the first 2-allele haplotype;
determining a first negative contribution to the likelihood based on sequencing information inconsistent with the first allele and the second allele being on the first 2-allele haplotype; and
using the first positive and first negative contributions to compute the first strength; and
calculating two haplotypes involving the first locus and the second locus using the first strength.
Dependent claims: 2-24.
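Claim 1 ends by calculating two haplotypes from the strength but does not specify the procedure. One simple possibility, assumed here for illustration, is to chain pairwise strengths between consecutive heterozygous loci: a positive strength places the first-listed alleles of the two loci on the same haplotype (cis), a negative one places them on opposite haplotypes (trans), and a strength whose magnitude falls below a hypothetical `min_strength` threshold ends the current phase block. `phase_chain` and its parameters are illustrative names, not taken from the patent.

```python
from typing import List, Sequence, Tuple

Block = Tuple[List[str], List[str]]  # (haplotype 1 alleles, haplotype 2 alleles)


def phase_chain(
    het_alleles: Sequence[Tuple[str, str]],
    strengths: Sequence[float],
    min_strength: float = 1.0,
) -> List[Block]:
    """Chain pairwise strengths into phase blocks of two haplotypes.

    strengths[i] is assumed to score the hypothesis that the first-listed
    allele of locus i and the first-listed allele of locus i + 1 lie on the
    same haplotype (positive: cis, negative: trans).
    """
    if not het_alleles:
        return []
    if len(strengths) != len(het_alleles) - 1:
        raise ValueError("need one strength per adjacent pair of loci")
    blocks: List[Block] = []
    hap1, hap2 = [het_alleles[0][0]], [het_alleles[0][1]]
    idx = 0  # which allele (0 or 1) haplotype 1 carries at the current locus
    for alleles, s in zip(het_alleles[1:], strengths):
        if abs(s) < min_strength:
            # Linkage too weak: close this block and start a new one.
            blocks.append((hap1, hap2))
            hap1, hap2 = [alleles[0]], [alleles[1]]
            idx = 0
            continue
        if s < 0:
            idx = 1 - idx  # trans: haplotype 1 switches allele index
        hap1.append(alleles[idx])
        hap2.append(alleles[1 - idx])
    blocks.append((hap1, hap2))
    return blocks
```

With alleles ("A", "G"), ("C", "T"), ("G", "A") and strengths [5.0, -4.0], the sketch returns a single block with haplotypes A-C-A and G-T-G. A real implementation would more likely combine many overlapping pairwise strengths rather than only adjacent ones; the chain keeps the example short.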

25. A computer program product comprising a tangible computer readable medium storing a plurality of instructions for controlling a processor to perform an operation for determining at least part of a genome of an organism from one or more samples, the one or more samples including nucleic acid molecules of the organism, the instructions comprising:
receiving sequencing information of a plurality of the nucleic acid molecules in the one or more samples;
identifying a plurality of loci of a first chromosome;
computing a first strength conveying a likelihood that a first allele of a first locus and a second allele of a second locus are on a first 2-allele haplotype of the organism, wherein computing the first strength includes:
determining a first positive contribution to the likelihood based on sequencing information consistent with the first allele and the second allele being on the first 2-allele haplotype;
determining a first negative contribution to the likelihood based on sequencing information inconsistent with the first allele and the second allele being on the first 2-allele haplotype; and
using the first positive and first negative contributions to compute the first strength; and
calculating two haplotypes involving the first locus and the second locus using the first strength.

26. A system for determining at least part of a genome of an organism from one or more samples, the one or more samples including nucleic acid molecules of the organism, the system comprising:
an input for receiving sequencing information of a plurality of the nucleic acid molecules in the one or more samples; and
one or more processors configured to:
identify a plurality of loci of a first chromosome;
compute a first strength conveying a likelihood that a first allele of a first locus and a second allele of a second locus are on a first 2-allele haplotype of the organism, wherein computing the first strength includes:
determining a first positive contribution to the likelihood based on sequencing information consistent with the first allele and the second allele being on the first 2-allele haplotype;
determining a first negative contribution to the likelihood based on sequencing information inconsistent with the first allele and the second allele being on the first 2-allele haplotype; and
using the first positive and first negative contributions to compute the first strength; and
calculate two haplotypes involving the first locus and the second locus using the first strength.
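The claims also recite identifying a plurality of loci without saying how they are chosen. For phasing a diploid genome, only heterozygous loci carry phase information. The sketch below is a hypothetical filter over a simple pileup (a mapping from position to the base calls observed there) that keeps positions with enough depth where the second most common allele reaches a minimum fraction of the calls; `find_het_loci`, `min_depth`, and `min_fraction` are assumed names and thresholds, not values from the patent.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def find_het_loci(
    pileup: Dict[int, Sequence[str]],
    min_depth: int = 10,
    min_fraction: float = 0.2,
) -> List[Tuple[int, str, str]]:
    """Return (position, major allele, minor allele) for candidate
    heterozygous loci, in position order."""
    het: List[Tuple[int, str, str]] = []
    for pos in sorted(pileup):
        calls = pileup[pos]
        depth = len(calls)
        if depth < min_depth:
            continue  # too little coverage to call a genotype
        top_two = Counter(calls).most_common(2)
        if len(top_two) < 2:
            continue  # only one allele observed: treat as homozygous
        (major, _), (minor, minor_count) = top_two
        if minor_count / depth >= min_fraction:
            het.append((pos, major, minor))
    return het
```

The fraction threshold is a crude stand-in for a genotype caller; it is only meant to show where the identified loci would come from before pairwise strengths are computed.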
Specification