SYSTEMS AND METHODS FOR DETERMINING STRUCTURAL VARIATION AND PHASING USING VARIANT CALL DATA

US 20160232291A1
Filed: 02/09/2016
Published: 08/11/2016
Est. Priority Date: 02/09/2015
Status: Active Grant


Abstract

Systems and methods for determining structural variation and phasing using variant call data obtained from nucleic acid of a biological sample are provided. Sequence reads are obtained, each comprising a portion corresponding to a subset of the test nucleic acid and a portion encoding a barcode independent of the sequencing data. Bin information is obtained. Each bin represents a different portion of the sample nucleic acid. Each bin corresponds to a set of sequence reads in a plurality of sets of sequence reads formed from the sequence reads such that each sequence read in a respective set of sequence reads corresponds to a subset of the nucleic acid represented by the bin corresponding to the respective set. Binomial tests identify bin pairs having more sequence reads with the same barcode in common than expected by chance. Probabilistic models determine structural variation likelihood from the sequence reads of these bin pairs.
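
The binning described in the abstract can be illustrated with a minimal sketch. It assumes reads arrive as `(chrom, start, end, barcode)` tuples with 0-based, half-open coordinates and that bins are fixed-width windows; the function name `bin_barcodes` and the tuple layout are illustrative choices, not taken from the patent.

```python
from collections import defaultdict


def bin_barcodes(reads, bin_size):
    """Map each (chrom, bin index) to the set of barcodes seen in that bin.

    A read is assigned to every fixed-width bin it at least partially
    overlaps (hypothetical layout: 0-based, half-open coordinates).
    """
    bins = defaultdict(set)
    for chrom, start, end, barcode in reads:
        first = start // bin_size
        last = (end - 1) // bin_size
        for idx in range(first, last + 1):
            bins[(chrom, idx)].add(barcode)
    return dict(bins)


reads = [
    ("chr1", 100, 250, "BC1"),
    ("chr1", 950, 1100, "BC1"),   # straddles bins 0 and 1
    ("chr1", 1500, 1650, "BC2"),
    ("chr2", 10, 160, "BC1"),
]
bins = bin_barcodes(reads, bin_size=1000)
```

Keeping only the barcode set per bin is enough for the later overlap test; a fuller implementation would also keep the reads themselves so that fragments can be rebuilt.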

53 Claims

1. A method of determining a likelihood of a structural variation occurring in a test nucleic acid obtained from a single biological sample, the method comprising:
- at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors;
  
  (A) obtaining a plurality of sequence reads from a plurality of sequencing reactions in which the test nucleic acid is fragmented, wherein each respective sequence read in the plurality of sequence reads comprises a first portion that corresponds to a subset of the test nucleic acid and a second portion that encodes a respective barcode for the respective sequence read in a plurality of barcodes, and each respective barcode is independent of the sequencing data of the test nucleic acid, and the plurality of sequence reads collectively include the plurality of barcodes;
  
  (B) obtaining bin information for a plurality of bins, wherein each respective bin in the plurality of bins represents a different portion of the test nucleic acid, the bin information identifies, for each respective bin in the plurality of bins, a set of sequence reads in a plurality of sets of sequence reads that are in the plurality of sequence reads, and the respective first portion of each respective sequence read in each respective set of sequence reads in the plurality of sets of sequence reads corresponds to a subset of the test nucleic acid that at least partially overlaps the different portion of the test nucleic acid that is represented by the bin corresponding to the respective set of sequence reads;
  
  (C) identifying, from among the plurality of bins, a first bin and a second bin that correspond to portions of the test nucleic acid that are nonoverlapping, wherein the first bin is represented by a first set of sequence reads in the plurality of sequence reads and the second bin is represented by a second set of sequence reads in the plurality of sequence reads;
  
  (D) determining a first value that represents a numeric probability or likelihood that the number of barcodes common to the first set and the second set is attributable to chance;
  
  (E) responsive to a determination that the first value satisfies a predetermined cut-off value, for each barcode that is common to the first bin and the second bin, obtaining a fragment pair thereby obtaining one or more fragment pairs, each fragment pair in the one or more fragment pairs (i) corresponding to a different barcode that is common to the first bin and the second bin and (ii) consisting of a different first calculated fragment and a different second calculated fragment, wherein, for each respective fragment pair in the one or more fragment pairs;
  
  the different first calculated fragment consists of a respective first subset of sequence reads in the plurality of sequence reads having the barcode corresponding to the respective fragment pair, wherein each sequence read in the respective first subset of sequence reads is within a predefined genetic distance of another sequence read in the respective first subset of sequence reads, the different first calculated fragment of the respective fragment pair originates with a first sequence read having the barcode corresponding to the respective fragment pair in the first bin, and each sequence read in the respective first subset of sequence reads is from the first bin, and the different second calculated fragment consists of a respective second subset of sequence reads in the plurality of sequence reads having the barcode corresponding to the respective fragment pair, wherein each sequence read in the respective second subset of sequence reads is within a predefined genetic distance of another sequence read in the respective second subset of sequence reads, the different second calculated fragment of the respective fragment pair originates with a second sequence read having the barcode corresponding to the respective fragment pair in the second bin, and each sequence read in the respective second subset of sequence reads is from the second bin; and
  
  (F) computing a respective likelihood based upon a probability of occurrence of a first model and a probability of occurrence of a second model regarding the one or more fragment pairs to thereby provide a likelihood of a structural variation in the test nucleic acid, wherein (i) the first model specifies that the respective first calculated fragments and the respective second calculated fragments of the one or more fragment pairs are observed given no structural variation in the target nucleic acid sequence and are part of a common molecule, and (ii) the second model specifies that the respective first calculated fragments and the respective second calculated fragments of the one or more fragment pairs are observed given structural variation in the target nucleic acid sequence.
- - 2. The method of claim 1, wherein the first bin and the second bin are at least 50 kilobases apart on the test nucleic acid.
  - 3. The method of claim 1, wherein the determining (D) uses a binomial test to compute the first value of the form:
    - $p = 1 - P_{\mathrm{Binom}}(n;\, n_1 n_2 / B)$, wherein $p$ is the first value, expressed as a p-value, $n$ is the number of unique barcodes that is found in both the first and second set of sequence reads, $n_1$ is the number of unique barcodes in the first set of sequence reads, $n_2$ is the number of unique barcodes in the second set of sequence reads, and $B$ is the total number of unique barcodes across the plurality of bins.
  - 4. The method of claim 1, wherein the single biological sample is human, the test nucleic acid is the genome of the biological sample, and the first value satisfies the predetermined cut-off value when the first value is 10⁻¹⁴ or less or when the first value is 10⁻¹⁵ or less.
  - 5. The method of claim 1, wherein each bin in the plurality of bins represents at least 20 kilobases of the test nucleic acid, at least 50 kilobases of the test nucleic acid, at least 100 kilobases of the test nucleic acid, at least 250 kilobases of the test nucleic acid, or at least 500 kilobases of the test nucleic acid.
  - 6. The method of claim 1, wherein each respective sequence read in each respective set of sequence reads in the plurality of sequence reads has a respective first portion that corresponds to a subset of the test nucleic acid that fully overlaps the different portion of the test nucleic acid that is represented by the bin corresponding to the respective set of sequence reads.
  - 7. The method of claim 1, wherein the barcode in the second portion of each respective sequence read in the plurality of sequence reads encodes a unique predetermined value selected from the set {1, . . . , 1024}, selected from the set {1, . . . , 4096}, selected from the set {1, . . . , 16384}, selected from the set {1, . . . , 65536}, selected from the set {1, . . . , 262144}, selected from the set {1, . . . , 1048576}, selected from the set {1, . . . , 4194304}, selected from the set {1, . . . , 16777216}, selected from the set {1, . . . , 67108864}, or selected from the set {1, . . . , 1×10¹²}.
  - 8. The method of claim 1, wherein the structural variation is deemed to have occurred, the method further comprising treating a subject that originated the biological sample with a treatment regimen responsive to the structural variation.
  - 9. The method of claim 1, wherein an identity of the first and second bin is determined by the identifying (C) using sparse matrix multiplication of the form:
    - $V = A_1^{T} A_2$, wherein $A_1$ is a first $B \times N_1$ matrix of barcodes that includes the first bin, $A_2$ is a second $B \times N_2$ matrix of barcodes that includes the second bin, $B$ is the number of unique barcodes across the plurality of bins, $N_1$ is the number of bins in $A_1$, $N_2$ is the number of bins in $A_2$, and $A_1^{T}$ is the transpose of matrix $A_1$.
  - 10. The method of claim 1, wherein the computed likelihood in the computing (F) is computed as:
  - 11. The method of claim 10, wherein
    - $P(r_1, r_2, l_1, l_2, d \mid \text{no SV}; a_b) = P(r_1, r_2, l_1, l_2, d \mid \text{SM}, \text{no SV}; a_b)\, P(\text{SM} \mid \text{no SV}) + P(r_1, r_2, l_1, l_2, d \mid \text{DM}, \text{no SV}; a_b)\, P(\text{DM} \mid \text{no SV})$, wherein SM is the hypothesis that the first calculated molecule and the second calculated molecule originated from the same fragment of the test nucleic acid in the plurality of sequencing reactions, DM is the hypothesis that the first calculated molecule and the second calculated molecule originated from different fragments of the test nucleic acid in the plurality of sequencing reactions,
    - $P(r_1, r_2, l_1, l_2, d \mid \text{DM}, \text{no SV}; a_b) = P_{\mathrm{frag}}(r_1, l_1; a_b)\, P_{\mathrm{frag}}(r_2, l_2; a_b)$, wherein $P_{\mathrm{frag}}(r_1, l_1; a_b)$ is the probability of observing $r_1$ reads from a first molecule of unknown length such that the reads span an observed length of $l_1$, and $P_{\mathrm{frag}}(r_2, l_2; a_b)$ is the probability of observing $r_2$ reads from a second molecule of unknown length such that the reads span an observed length of $l_2$.
  - 12. The method of claim 11, wherein $P_{\mathrm{frag}}(r_1, l_1; a_b)$ and $P_{\mathrm{frag}}(r_2, l_2; a_b)$ are each computed as
  - 13. The method of claim 11, wherein $P(r_1, r_2, l_1, l_2, d \mid \text{SM}, \text{no SV}; a_b)$ is computed as
  - 14. The method of claim 10, wherein
    - $P(r_1, r_2, l_1, l_2, d \mid \text{SV}; a_b) = P(r_1, r_2, l_1, l_2, 2d' \mid \text{SM}, \text{no SV}; a_b)\, P(\text{SM} \mid \text{no SV}) + P(r_1, r_2, l_1, l_2, 2d' \mid \text{DM}, \text{no SV}; a_b)\, P(\text{DM} \mid \text{no SV})$, wherein SM is the hypothesis that the first calculated molecule and the second calculated molecule originated from the same fragment of the test nucleic acid in the plurality of sequencing reactions, DM is the hypothesis that the first calculated molecule and the second calculated molecule originated from different fragments of the test nucleic acid in the plurality of sequencing reactions,
    - $P(r_1, r_2, l_1, l_2, 2d' \mid \text{DM}, \text{SV}; a_b) = P_{\mathrm{frag}}(r_1, l_1; a_b)\, P_{\mathrm{frag}}(r_2, l_2; a_b)$, wherein $P_{\mathrm{frag}}(r_1, l_1; a_b)$ is the probability of observing $r_1$ reads from a first molecule of unknown length such that the reads span an observed length of $l_1$, and $P_{\mathrm{frag}}(r_2, l_2; a_b)$ is the probability of observing $r_2$ reads from a second molecule of unknown length such that the reads span an observed length of $l_2$, and $2d'$ is a distance between the first calculated fragment and the second calculated fragment of the respective fragment pair in the test nucleic acid taking into account an estimate of the breakpoints of a structural variation associated with the first calculated molecule and the second calculated molecule.
  - 15. The method of claim 14, wherein $P_{\mathrm{frag}}(r_1, l_1; a_b)$ and $P_{\mathrm{frag}}(r_2, l_2; a_b)$ are each computed as
  - 16. The method of claim 14, wherein $P(r_1, r_2, l_1, l_2, 2d' \mid \text{SM}, \text{SV}; a_b)$ is computed as
  - 17. The method of claim 1, wherein the (D) through (F) are computed for a plurality of first and second bins, thereby calling one or more structural variations in the test nucleic acid, the method further comprising refining a breakpoint location in the test nucleic acid using the plurality of sequence reads and the calling one or more structural variations.
  - 18. The method of claim 1, wherein the test nucleic acid is loaded onto a plurality of barcoded-oligo coated gel-beads from which the plurality of sequence reads is obtained and wherein the test nucleic acid is 50 ng or less.
  - 19. The method of claim 18, wherein the plurality of barcoded-oligo coated gel-beads comprises 10,000 beads, the test nucleic acid is 2.5 ng or less, and the plurality of sequencing reads is obtained within ten minutes of exposure to a plurality of barcodes.
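
The bin-pair screening of claims 3 and 9 can be sketched in a few lines. The sketch below is a hedged illustration, not the patented implementation: bins are hypothetical `dict`s mapping a bin id to a set of barcodes, the sparse product $V = A_1^{T} A_2$ is emulated with an inverted index, and the binomial is parameterised under the assumption of `min(n1, n2)` trials with success probability `max(n1, n2) / B` (so that its mean is $n_1 n_2 / B$). The claim itself only gives the compact form $P_{\mathrm{Binom}}(n;\, n_1 n_2 / B)$.

```python
from collections import defaultdict
from math import comb


def shared_barcode_counts(bins_a, bins_b):
    """Sparse analogue of V = A1^T A2: nonzero shared-barcode counts only."""
    # Invert bins_b so each barcode lists the bins that contain it.
    by_barcode = defaultdict(list)
    for bin_j, barcodes in bins_b.items():
        for bc in barcodes:
            by_barcode[bc].append(bin_j)
    counts = defaultdict(int)
    for bin_i, barcodes in bins_a.items():
        for bc in barcodes:
            for bin_j in by_barcode.get(bc, ()):
                counts[(bin_i, bin_j)] += 1
    return dict(counts)


def binom_cdf(k, trials, prob):
    """P(X <= k) for X ~ Binomial(trials, prob)."""
    k = min(k, trials)
    return sum(
        comb(trials, x) * prob**x * (1 - prob) ** (trials - x)
        for x in range(k + 1)
    )


def overlap_p_value(n, n1, n2, total_barcodes):
    """p = 1 - P_Binom(n; ...) under the ASSUMED parameterisation above."""
    trials = min(n1, n2)
    prob = max(n1, n2) / total_barcodes
    return max(0.0, 1.0 - binom_cdf(n, trials, prob))


bins_a = {"a": {1, 2, 3}}
bins_b = {"b": {2, 3, 4}, "c": {9}}
counts = shared_barcode_counts(bins_a, bins_b)
p = overlap_p_value(counts[("a", "b")], n1=3, n2=3, total_barcodes=100)
```

Computing `1 - cdf` in double precision cannot resolve p-values near the 10⁻¹⁴ cut-off of claim 4; a production version would evaluate the upper tail directly, for example in log space.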

20. A computing system, comprising:
- one or more processors;
  
  memory storing one or more programs to be executed by the one or more processors;
  
  the one or more programs comprising instructions for;
  
  (A) obtaining a plurality of sequence reads from a plurality of sequencing reactions in which the test nucleic acid is fragmented, wherein each respective sequence read in the plurality of sequence reads comprises a first portion that corresponds to a subset of the test nucleic acid and a second portion that encodes a respective barcode for the respective sequence read in a plurality of barcodes, each respective barcode is independent of the sequencing data of the test nucleic acid, and the plurality of sequence reads collectively include the plurality of barcodes;
  
  (B) obtaining bin information for a plurality of bins, wherein each respective bin in the plurality of bins represents a different portion of the test nucleic acid, the bin information identifies, for each respective bin in the plurality of bins, a set of sequence reads in a plurality of sets of sequence reads that are in the plurality of sequence reads, and the respective first portion of each respective sequence read in each respective set of sequence reads in the plurality of sets of sequence reads corresponds to a subset of the test nucleic acid that at least partially overlaps the different portion of the test nucleic acid that is represented by the bin corresponding to the respective set of sequence reads;
  
  (C) identifying, from among the plurality of bins, a first bin and a second bin that correspond to portions of the test nucleic acid that are nonoverlapping, wherein the first bin is represented by a first set of sequence reads in the plurality of sequence reads and the second bin is represented by a second set of sequence reads in the plurality of sequence reads;
  
  (D) determining a first value that represents a numeric probability or likelihood that the number of barcodes common to the first set and the second set is attributable to chance;
  
  (E) responsive to a determination that the first value satisfies a predetermined cut-off value, for each barcode that is common to the first bin and the second bin, obtaining a fragment pair thereby obtaining one or more fragment pairs, each fragment pair in the one or more fragment pairs (i) corresponding to a different barcode that is common to the first bin and the second bin and (ii) consisting of a different first calculated fragment and a different second calculated fragment, wherein, for each respective fragment pair in the one or more fragment pairs;
  
  the different first calculated fragment consists of a respective first subset of sequence reads in the plurality of sequence reads having the barcode corresponding to the respective fragment pair, wherein each sequence read in the respective first subset of sequence reads is within a predefined genetic distance of another sequence read in the respective first subset of sequence reads, the different first calculated fragment of the respective fragment pair originates with a first sequence read having the barcode corresponding to the respective fragment pair in the first bin, and each sequence read in the respective first subset of sequence reads is from the first bin, and the different second calculated fragment consists of a respective second subset of sequence reads in the plurality of sequence reads having the barcode corresponding to the respective fragment pair, wherein each sequence read in the respective second subset of sequence reads is within a predefined genetic distance of another sequence read in the respective second subset of sequence reads, the different second calculated fragment of the respective fragment pair originates with a second sequence read having the barcode corresponding to the respective fragment pair in the second bin, and each sequence read in the respective second subset of sequence reads is from the second bin; and
  
  (F) computing a respective likelihood based upon a probability of occurrence of a first model and a probability of occurrence of a second model regarding the one or more fragment pairs to thereby provide a likelihood of a structural variation in the test nucleic acid, wherein (i) the first model specifies that the respective first calculated fragments and the respective second calculated fragments of the one or more fragment pairs are observed given no structural variation in the target nucleic acid sequence and are part of a common molecule, and (ii) the second model specifies that the respective first calculated fragments and the respective second calculated fragments of the one or more fragment pairs are observed given structural variation in the target nucleic acid sequence.
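
Step (E) can be pictured as a seed-and-extend clustering of reads that share a barcode. The following is a minimal sketch under stated assumptions: each bin is a hypothetical `dict` from barcode to read start positions on one chromosome, the seed is taken to be the leftmost read for that barcode in the bin, and the summary values `r`, `l` and `d` (read count, observed span, gap between fragments) are named after the symbols used in claim 11. None of these choices is specified by the claims.

```python
def grow_fragment(positions, seed, max_gap):
    """Extend outward from `seed` while neighbouring reads are within max_gap."""
    pts = sorted(positions)
    lo = hi = pts.index(seed)
    while lo > 0 and pts[lo] - pts[lo - 1] <= max_gap:
        lo -= 1
    while hi < len(pts) - 1 and pts[hi + 1] - pts[hi] <= max_gap:
        hi += 1
    return pts[lo:hi + 1]


def fragment_pairs(bin1, bin2, max_gap):
    """Return {barcode: (r1, l1, r2, l2, d)} for barcodes common to both bins."""
    pairs = {}
    for barcode in sorted(bin1.keys() & bin2.keys()):
        # Assumption: seed each fragment at the bin's leftmost read.
        frag1 = grow_fragment(bin1[barcode], min(bin1[barcode]), max_gap)
        frag2 = grow_fragment(bin2[barcode], min(bin2[barcode]), max_gap)
        r1, l1 = len(frag1), frag1[-1] - frag1[0]
        r2, l2 = len(frag2), frag2[-1] - frag2[0]
        # Gap between the fragments; assumes bin1 lies to the left of bin2.
        d = frag2[0] - frag1[-1]
        pairs[barcode] = (r1, l1, r2, l2, d)
    return pairs


bin1 = {"BC1": [100, 400, 900, 20000], "BC2": [50]}
bin2 = {"BC1": [60000, 60300], "BC3": [61000]}
pairs = fragment_pairs(bin1, bin2, max_gap=1000)
```

Here the read at 20000 is left out of the first fragment because it is more than `max_gap` away from its nearest neighbour, which mirrors the "within a predefined genetic distance of another sequence read" condition.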

21. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for:
- (A) obtaining a plurality of sequence reads from a plurality of sequencing reactions in which the test nucleic acid is fragmented, wherein each respective sequence read in the plurality of sequence reads comprises a first portion that corresponds to a subset of the test nucleic acid and a second portion that encodes a respective barcode for the respective sequence read in a plurality of barcodes, each respective barcode is independent of the sequencing data of the test nucleic acid, and the plurality of sequence reads collectively include the plurality of barcodes;
  
  (B) obtaining bin information for a plurality of bins, wherein each respective bin in the plurality of bins represents a different portion of the test nucleic acid, the bin information identifies, for each respective bin in the plurality of bins, a set of sequence reads in a plurality of sets of sequence reads that are in the plurality of sequence reads, and the respective first portion of each respective sequence read in each respective set of sequence reads in the plurality of sets of sequence reads corresponds to a subset of the test nucleic acid that at least partially overlaps the different portion of the test nucleic acid that is represented by the bin corresponding to the respective set of sequence reads;
  
  (C) identifying, from among the plurality of bins, a first bin and a second bin that correspond to portions of the test nucleic acid that are nonoverlapping, wherein the first bin is represented by a first set of sequence reads in the plurality of sequence reads and the second bin is represented by a second set of sequence reads in the plurality of sequence reads;
  
  (D) determining a first value that represents a numeric probability or likelihood that the number of barcodes common to the first set and the second set is attributable to chance;
  
  (E) responsive to a determination that the first value satisfies a predetermined cut-off value, for each barcode that is common to the first bin and the second bin, obtaining a fragment pair thereby obtaining one or more fragment pairs, each fragment pair in the one or more fragment pairs (i) corresponding to a different barcode that is common to the first bin and the second bin and (ii) consisting of a different first calculated fragment and a different second calculated fragment, wherein, for each respective fragment pair in the one or more fragment pairs;
  
  the different first calculated fragment consists of a respective first subset of sequence reads in the plurality of sequence reads having the barcode corresponding to the respective fragment pair, wherein each sequence read in the respective first subset of sequence reads is within a predefined genetic distance of another sequence read in the respective first subset of sequence reads, the different first calculated fragment of the respective fragment pair originates with a first sequence read having the barcode corresponding to the respective fragment pair in the first bin, and each sequence read in the respective first subset of sequence reads is from the first bin, and the different second calculated fragment consists of a respective second subset of sequence reads in the plurality of sequence reads having the barcode corresponding to the respective fragment pair, wherein each sequence read in the respective second subset of sequence reads is within a predefined genetic distance of another sequence read in the respective second subset of sequence reads, the different second calculated fragment of the respective fragment pair originates with a second sequence read having the barcode corresponding to the respective fragment pair in the second bin, and each sequence read in the respective second subset of sequence reads is from the second bin; and
  
  (F) computing a respective likelihood based upon a probability of occurrence of a first model and a probability of occurrence of a second model regarding the one or more fragment pairs to thereby provide a likelihood of a structural variation in the test nucleic acid, wherein (i) the first model specifies that the respective first calculated fragments and the respective second calculated fragments of the one or more fragment pairs are observed given no structural variation in the target nucleic acid sequence and are part of a common molecule, and (ii) the second model specifies that the respective first calculated fragments and the respective second calculated fragments of the one or more fragment pairs are observed given structural variation in the target nucleic acid sequence.
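
The two-model comparison of step (F) can be illustrated with the same-molecule / different-molecule mixture recited in claims 11 and 14. The sketch below is illustrative only: the closed forms of $P_{\mathrm{frag}}$ and of the same-molecule term are not reproduced in this text, so they are passed in as caller-supplied functions, and the `toy_*` functions are arbitrary placeholders. Two further assumptions are that $P(\text{DM}) = 1 - P(\text{SM})$ and that evidence from several fragment pairs is combined by summing log-likelihood ratios.

```python
from math import exp, log


def pair_likelihood(r1, l1, r2, l2, d, p_same, p_frag, p_sm):
    """Mixture over same-molecule (SM) and different-molecule (DM) hypotheses."""
    same = p_same(r1, l1, r2, l2, d) * p_sm
    # Under DM the two fragments are treated as independent.
    diff = p_frag(r1, l1) * p_frag(r2, l2) * (1.0 - p_sm)
    return same + diff


def log_likelihood_ratio(pairs, d_sv, p_same, p_frag, p_sm):
    """Sum over pairs of log P(pair | SV) - log P(pair | no SV).

    The SV model reuses the mixture with the breakpoint-adjusted
    distance d_sv (2d' in claim 14) in place of the reference distance d.
    """
    total = 0.0
    for r1, l1, r2, l2, d in pairs:
        with_sv = pair_likelihood(r1, l1, r2, l2, d_sv, p_same, p_frag, p_sm)
        no_sv = pair_likelihood(r1, l1, r2, l2, d, p_same, p_frag, p_sm)
        total += log(with_sv) - log(no_sv)
    return total


def toy_p_frag(r, l):
    """Placeholder only; NOT the patent's P_frag."""
    return 0.01


def toy_p_same(r1, l1, r2, l2, d):
    """Placeholder only; decays with the distance between fragments."""
    return 0.01 * exp(-d / 50_000)


pairs = [(3, 800, 2, 300, 59100)]
llr = log_likelihood_ratio(pairs, 2000, toy_p_same, toy_p_frag, p_sm=0.5)
```

With these placeholders a positive `llr` simply reflects that the fragments look more like one molecule once the intervening distance is shortened; the magnitude has no meaning beyond the toy model.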

22. A method of phasing sequencing data of a test nucleic acid sample obtained from a biological sample from a single organism of a species, wherein the test nucleic acid sample comprises a first set of haplotypes (H₀) and a second set of haplotypes (H₁), the method comprising:
- at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors;
  
  (A) obtaining a reference consensus sequence for all or a portion of a genome of the species;
  
  (B) obtaining a plurality of variant calls $A_{i;p}$ for the biological sample, wherein $i$ is an index to a position in the reference consensus sequence, and $p \in \{0, 1\}$ in which label 0 assigns a respective variant call in $A_{i;p}$ to $H_0$ and label 1 assigns the respective variant call to $H_1$;
  
  (C) obtaining a plurality of sequence reads $\vec{O}$ for the biological sample, wherein each respective sequence read $\vec{O}_i$ in the plurality of sequence reads comprises a first portion that corresponds to a subset of the reference sequence and a second portion that encodes a respective barcode, independent of the reference sequence, for the respective sequence read, in a plurality of barcodes, and each respective sequence read $\vec{O}_i$ in the plurality of sequence reads is $\in \{0, 1, -\}^n$, wherein (i) $n$ is the number of variant calls in $A_{i;p}$, (ii) each respective label 0 for the respective sequence read $\vec{O}_i$ assigns a corresponding variant call in $A_{i;p}$ to $H_0$, (iii) each respective label 1 for the respective sequence read $\vec{O}_i$ assigns a corresponding variant call in $A_{i;p}$ to $H_1$, and (iv) each respective label $-$ for the respective sequence read $\vec{O}_i$ indicates that the corresponding variant call in $A_{i;p}$ is not covered; and
  
  (D) refining a phasing result $\vec{X}$ by optimization of haplotype assignments at individual positions $i$ in $A_{i;p}$ between $H_0$ and $H_1$ for the plurality of sequence reads using the relationship;
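
The read encoding in step (C) can be made concrete with a small sketch. It assumes a read's observations arrive as a hypothetical `dict` from variant index to the 0/1 label seen, and it scores a candidate phasing string by simple agreement counts. That scoring is an illustrative stand-in; the relationship referred to in step (D) is not reproduced in this text.

```python
def encode_read(observed, n):
    """Encode a read as a length-n string over '0', '1' and '-' (not covered)."""
    return "".join(str(observed[i]) if i in observed else "-" for i in range(n))


def support(read_vec, phasing):
    """Count covered positions that agree / disagree with a phasing string."""
    agree = sum(1 for o, x in zip(read_vec, phasing) if o != "-" and o == x)
    disagree = sum(1 for o, x in zip(read_vec, phasing) if o != "-" and o != x)
    return agree, disagree


def complement(phasing):
    """Swap the two haplotype labels at every position."""
    return "".join("1" if x == "0" else "0" for x in phasing)


read_vec = encode_read({1: 0, 3: 1}, n=5)
phasing = "00110"
on_h0 = support(read_vec, phasing)
on_h1 = support(read_vec, complement(phasing))
```

A read that agrees with the phasing string at every covered position and disagrees with its complement is consistent with a single haplotype; uncovered positions contribute nothing either way.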

23. A method of addressing error in the zygosity of variant calls in phasing sequencing data of a test nucleic acid sample obtained from a biological sample from a single organism of a species, wherein the test nucleic acid sample comprises a first set of haplotypes (H₀) and a second set of haplotypes (H₁), the method comprising:
- at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors;
  
  (A) obtaining a reference consensus sequence for all or a portion of a genome of the species;
  
  (B) obtaining a plurality of variant calls $A_{i;p}$ for the biological sample, wherein $i$ is an index to a position in the reference consensus sequence, and $p \in \{0, 1, -1\}$ in which label 0 assigns a respective variant call in $A_{i;p}$ to $H_0$, label 1 assigns the respective variant call to $H_1$, and label $-1$ assigns the respective variant call to the zygosity error condition $H_{-1}$;
  
  (C) obtaining a plurality of sequence reads $\vec{O}$ for the biological sample, wherein each respective sequence read $\vec{O}_i$ in the plurality of sequence reads comprises a first portion that corresponds to a subset of the reference sequence and a second portion that encodes a respective barcode, independent of the reference sequence, for the respective sequence read, in a plurality of barcodes, and each respective sequence read $\vec{O}_i$ in the plurality of sequence reads is $\in \{0, 1, -1, -\}^n$, wherein (i) $n$ is the number of variant calls in $A_{i;p}$, (ii) each respective label 0 for the respective sequence read $\vec{O}_i$ assigns a corresponding variant call in $A_{i;p}$ to $H_0$, (iii) each respective label 1 for the respective sequence read $\vec{O}_i$ assigns a corresponding variant call in $A_{i;p}$ to $H_1$, (iv) each respective label $-1$ for the respective sequence read $\vec{O}_i$ assigns a corresponding variant call in $A_{i;p}$ to $H_{-1}$, and (v) each respective label $-$ for the respective sequence read $\vec{O}_i$ indicates that the corresponding variant call in $A_{i;p}$ is not covered; and
  
  (D) refining a phasing vector result $\vec{X}$ by optimization of haplotype assignments at individual positions $i$ in $A_{i;p}$ between $H_0$, $H_1$ and $H_{-1}$ for the plurality of sequence reads using an overall objective function;
- View Dependent Claims (24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37)
- - 24. The method of claim 23, wherein
    $$P(O_{1,f}, \ldots, O_{N,f} \mid \vec{X}, H_f = 0) = \prod_i P(O_{i,f} \mid A_i, X_i),$$
    $$P(O_{1,f}, \ldots, O_{N,f} \mid \vec{X}, H_f = 1) = \prod_i P(O_{i,f} \mid A_i, 1 - X_i),$$
    $$P(O_{1,f}, \ldots, O_{N,f} \mid \vec{X}, H_f = M) = \prod_i 0.5.$$
    M indicates a mixture of $H_f = 0$ and $H_f = 1$ for the respective barcode f,
  - 25. The method of claim 23, wherein the first set of haplotypes (H₀) consists of maternal haplotypes for the single organism, and the second set of haplotypes (H₁) consists of paternal haplotypes for the single organism.
  - 26. The method of claim 23 wherein the plurality of barcodes comprises 1000 or more barcodes, the plurality of variant calls $A_{i;p}$ comprises 1000 or more variant calls, and the plurality of sequence reads comprises 10,000 or more sequence reads.
  - 27. The method of claim 23, wherein $\vec{X}$ is (x), wherein x is a binary string of length n, each value of 0 in x indicates origination of the corresponding variant call in the first set of haplotypes (H₀), and each value of 1 in x indicates origination of the corresponding variant call in the second set of haplotypes (H₁).
  - 28. The method of claim 23, wherein the subset of sequence reads that include the same respective barcode f comprises 10 or more sequence reads.
  - 29. The method of claim 23, wherein the refining (D) optimizes the overall objective function using a hierarchical search over $\vec{X}$.
  - 30. The method of claim 29, wherein the hierarchical search comprises:
    - for each respective local block of variant calls in $A_{i;p}$ that are localized to a corresponding subset of the reference consensus sequence, using a beam search over the assignments of $X_k, X_{k+1}, \ldots, X_{k+j}$ in the respective local block of variant calls, wherein k is the first variant in the respective local block of variant calls, j is a number of variant calls in the respective local block of variant calls and wherein assignments of $X_k, X_{k+1}, \ldots, X_{k+j}$ are found by computing the objective function in which the phasing vector of the objective function in respective computations is limited to $X_k, X_{k+1}, \ldots, X_{k+j}$, thereby finding an optimal phasing solution for each respective local block of variant calls, and
    - greedily joining neighboring local blocks of variant calls in $A_{i;p}$ using the optimal phasing solution for each respective local block of variant calls, thereby obtaining an estimate of the optimal phasing configuration $\hat{X}$.
  - 31. The method of claim 30, wherein the refining the phase result further comprises iteratively swapping the phase result of individual $x_i$ in the estimate of the optimal phasing configuration $\hat{X}$ and recomputing the objective function, thereby obtaining $\hat{X}$.
  - 32. The method of claim 30, wherein a respective local block of variant calls consists of between 20 and 60 variants in $A_{i;p}$.
  - 33. The method of claim 30, wherein an iteration of the beam search for the assignments of one of $X_k, X_{k+1}, \ldots, X_{k+j}$ discards all but a predetermined number of solutions for $\hat{X}$.
  - 34. The method of claim 23, wherein the barcode in the second portion of each respective sequence read in the plurality of sequence reads O encodes a unique predetermined value selected from the set {1, . . . , 1024}, selected from the set {1, . . . , 4096}, selected from the set {1, . . . , 16384}, selected from the set {1, . . . , 65536}, selected from the set {1, . . . , 262144}, selected from the set {1, . . . , 1048576}, selected from the set {1, . . . , 4194304}, selected from the set {1, . . . , 16777216}, selected from the set {1, . . . , 67108864}, or selected from the set {1, . . . , 1×10¹²}.
  - 35. The method of claim 23, wherein the plurality of variant calls is obtained from the plurality of sequence reads.
  - 36. The method of claim 23, wherein the plurality of sequence reads is obtained from a plurality of barcoded-oligo coated gel-beads and wherein the test nucleic acid sample is 50 ng or less.
  - 37. The method of claim 36 wherein the plurality of barcoded-oligo coated gel-beads comprises 10,000 beads, the test nucleic acid sample is 2.5 ng or less, and the plurality of sequencing reads $\vec{O}$ is obtained within ten minutes of exposure to the plurality of barcodes.
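
The three per-barcode likelihood terms recited in claim 24 can be illustrated with a small Python sketch. The claim does not define the per-observation term P(O_i,f | A_i, X_i), so the sketch assumes a simple hypothetical error model: an observation matches the expected haplotype label with probability 1 − eps and mismatches with probability eps. The function name, the `eps` parameter, and the use of `None` for the "not covered" label are all assumptions made for illustration, not part of the claimed method.

```python
import math

UNCOVERED = None  # stands in for the "−" (not covered) label


def barcode_log_likelihoods(obs, x, eps=0.01):
    """Return log P(obs | X, H_f = h) for h in (0, 1, "M") for one barcode f.

    obs: per-variant labels (0, 1, or UNCOVERED) observed for the barcode.
    x:   phasing vector with x[i] in {0, 1}.
    eps: assumed per-observation error probability (hypothetical model).
    """
    ll0 = ll1 = llm = 0.0
    for o, xi in zip(obs, x):
        if o is UNCOVERED:
            continue  # uncovered variants contribute nothing
        # H_f = 0: the observation is compared against X_i
        ll0 += math.log(1 - eps if o == xi else eps)
        # H_f = 1: the observation is compared against 1 - X_i
        ll1 += math.log(1 - eps if o == 1 - xi else eps)
        # H_f = M: each covered observation contributes a factor of 0.5
        llm += math.log(0.5)
    return {0: ll0, 1: ll1, "M": llm}
```

Working in log space avoids underflow when a barcode covers many variants; the products in the claim become sums of logarithms.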

38. A computing system, comprising:
- one or more processors;
  
  memory storing one or more programs to be executed by the one or more processors;
  
  the one or more programs comprising instructions for phasing sequencing data of a test nucleic acid sample obtained from a biological sample from a single organism of a species, wherein the test nucleic acid sample comprises a first set of haplotypes (H₀) and a second set of haplotypes (H₁), by executing a method comprising:
  
  (A) obtaining a reference consensus sequence for all or a portion of a genome of the species;
  
  (B) obtaining a plurality of variant calls $A_{i;p}$ for the biological sample, wherein i is an index to a position in the reference consensus sequence, and p ∈ {0, 1}, in which label 0 assigns a respective variant call in $A_{i;p}$ to H₀ and label 1 assigns the respective variant call to H₁;
  
  (C) obtaining a plurality of sequence reads $\vec{O}$ for the biological sample, wherein each respective sequence read $\vec{O}_i$ in the plurality of sequence reads comprises a first portion that corresponds to a subset of the reference sequence and a second portion that encodes a respective barcode, independent of the reference sequence, for the respective sequence read, in a plurality of barcodes, and each respective sequence read $\vec{O}_i$ in the plurality of sequence reads is in {0, 1, −}ⁿ, wherein (i) n is the number of variant calls in $A_{i;p}$, (ii) each respective label 0 for the respective sequence read $\vec{O}_i$ assigns a corresponding variant call in $A_{i;p}$ to H₀, (iii) each respective label 1 for the respective sequence read $\vec{O}_i$ assigns a corresponding variant call in $A_{i;p}$ to H₁, and (iv) each respective label − for the respective sequence read $\vec{O}_i$ indicates that the corresponding variant call in $A_{i;p}$ is not covered; and
  
  (D) refining a phasing result $\vec{X}$ by optimization of haplotype assignments at individual positions i in $A_{i;p}$ between H₀ and H₁ for the plurality of sequence reads using the relationship;
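
The {0, 1, −}ⁿ read representation recited in step (C) can be pictured with a short Python sketch. It assumes, purely for illustration, that each variant call is a single-base position with two candidate alleles and that a read's label at a variant is decided by which allele the read shows; the tuple layout, the function name `encode_read`, and the choice to treat a base matching neither allele as uncovered are hypothetical.

```python
def encode_read(read_start, read_seq, variants):
    """Encode one aligned read as a string over {'0', '1', '-'}.

    read_start: reference coordinate of the first base of the read.
    read_seq:   bases of the read's first (genomic) portion.
    variants:   list of (position, allele_0, allele_1) tuples for
                single-base variants in reference coordinates
                (hypothetical layout).
    """
    labels = []
    for pos, allele_0, allele_1 in variants:
        offset = pos - read_start
        if not 0 <= offset < len(read_seq):
            labels.append("-")  # variant not covered by this read
        elif read_seq[offset] == allele_0:
            labels.append("0")
        elif read_seq[offset] == allele_1:
            labels.append("1")
        else:
            labels.append("-")  # neither allele seen; treated as uncovered here
    return "".join(labels)
```

For example, with variants at positions 100, 105 and 200, a ten-base read starting at position 98 yields a three-character string whose last entry is "-", because the read does not reach position 200.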

39. A computing system, comprising:
- one or more processors;
  
  memory storing one or more programs to be executed by the one or more processors;
  
  the one or more programs comprising instructions addressing error in the zygosity of variant calls in phasing sequencing data of a test nucleic acid sample obtained from a biological sample from a single organism of a species, wherein the test nucleic acid sample comprises a first set of haplotypes (H₀) and a second set of haplotypes (H₁), by executing a method comprising:
  
  (A) obtaining a reference consensus sequence for all or a portion of a genome of the species;
  
  (B) obtaining a plurality of variant calls $A_{i;p}$ for the biological sample, wherein i is an index to a position in the reference consensus sequence, and p ∈ {0, 1, −1}, in which label 0 assigns a respective variant call in $A_{i;p}$ to H₀, label 1 assigns the respective variant call to H₁, and label −1 assigns the respective variant call to the zygosity error condition H₋₁;
  
  (C) obtaining a plurality of sequence reads $\vec{O}$ for the biological sample, wherein each respective sequence read $\vec{O}_i$ in the plurality of sequence reads comprises a first portion that corresponds to a subset of the reference sequence and a second portion that encodes a respective barcode, independent of the reference sequence, for the respective sequence read, in a plurality of barcodes, and each respective sequence read $\vec{O}_i$ in the plurality of sequence reads is in {0, 1, −1, −}ⁿ, wherein (i) n is the number of variant calls in $A_{i;p}$, (ii) each respective label 0 for the respective sequence read $\vec{O}_i$ assigns a corresponding variant call in $A_{i;p}$ to H₀, (iii) each respective label 1 for the respective sequence read $\vec{O}_i$ assigns a corresponding variant call in $A_{i;p}$ to H₁, (iv) each respective label −1 for the respective sequence read $\vec{O}_i$ assigns a corresponding variant call in $A_{i;p}$ to H₋₁, and (v) each respective label − for the respective sequence read $\vec{O}_i$ indicates that the corresponding variant call in $A_{i;p}$ is not covered; and
  
  (D) refining a phasing vector result $\vec{X}$ by optimization of haplotype assignments at individual positions i in $A_{i;p}$ between H₀, H₁ and H₋₁ for the plurality of sequence reads using an overall objective function;
- View Dependent Claims (40)
- - 40. The computing system of claim 39, wherein
    $$P(O_{1,f}, \ldots, O_{N,f} \mid \vec{X}, H_f = 0) = \prod_i P(O_{i,f} \mid A_i, X_i),$$
    $$P(O_{1,f}, \ldots, O_{N,f} \mid \vec{X}, H_f = 1) = \prod_i P(O_{i,f} \mid A_i, 1 - X_i),$$
    $$P(O_{1,f}, \ldots, O_{N,f} \mid \vec{X}, H_f = M) = \prod_i 0.5.$$
    M indicates a mixture of $H_f = 0$ and $H_f = 1$ for the respective barcode f,

41. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for phasing sequencing data of a test nucleic acid sample obtained from a biological sample from a single organism of a species, wherein the test nucleic acid sample comprises a first set of haplotypes (H₀) and a second set of haplotypes (H₁), the one or more programs collectively executing a method comprising:
- (A) obtaining a reference consensus sequence for all or a portion of a genome of the species;
  
  (B) obtaining a plurality of variant calls $A_{i;p}$ for the biological sample, wherein i is an index to a position in the reference consensus sequence, and p ∈ {0, 1}, in which label 0 assigns a respective variant call in $A_{i;p}$ to H₀ and label 1 assigns the respective variant call to H₁;
  
  (C) obtaining a plurality of sequence reads $\vec{O}$ for the biological sample, wherein each respective sequence read $\vec{O}_i$ in the plurality of sequence reads comprises a first portion that corresponds to a subset of the reference sequence and a second portion that encodes a respective barcode, independent of the reference sequence, for the respective sequence read, in a plurality of barcodes, and each respective sequence read $\vec{O}_i$ in the plurality of sequence reads is in {0, 1, −}ⁿ, wherein (i) n is the number of variant calls in $A_{i;p}$, (ii) each respective label 0 for the respective sequence read $\vec{O}_i$ assigns a corresponding variant call in $A_{i;p}$ to H₀, (iii) each respective label 1 for the respective sequence read $\vec{O}_i$ assigns a corresponding variant call in $A_{i;p}$ to H₁, and (iv) each respective label − for the respective sequence read $\vec{O}_i$ indicates that the corresponding variant call in $A_{i;p}$ is not covered; and
  
  (D) refining a phasing result $\vec{X}$ by optimization of haplotype assignments at individual positions i in $A_{i;p}$ between H₀ and H₁ for the plurality of sequence reads using the relationship;

42. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for addressing error in the zygosity of variant calls in phasing sequencing data of a test nucleic acid sample obtained from a biological sample from a single organism of a species, wherein the test nucleic acid sample comprises a first set of haplotypes (H₀) and a second set of haplotypes (H₁), the one or more programs collectively executing a method comprising:
- (A) obtaining a reference consensus sequence for all or a portion of a genome of the species;
  
  (B) obtaining a plurality of variant calls $A_{i;p}$ for the biological sample, wherein i is an index to a position in the reference consensus sequence, and p ∈ {0, 1, −1}, in which label 0 assigns a respective variant call in $A_{i;p}$ to H₀, label 1 assigns the respective variant call to H₁, and label −1 assigns the respective variant call to the zygosity error condition H₋₁;
  
  (C) obtaining a plurality of sequence reads $\vec{O}$ for the biological sample, wherein each respective sequence read $\vec{O}_i$ in the plurality of sequence reads comprises a first portion that corresponds to a subset of the reference sequence and a second portion that encodes a respective barcode, independent of the reference sequence, for the respective sequence read, in a plurality of barcodes, and each respective sequence read $\vec{O}_i$ in the plurality of sequence reads is in {0, 1, −1, −}ⁿ, wherein (i) n is the number of variant calls in $A_{i;p}$, (ii) each respective label 0 for the respective sequence read $\vec{O}_i$ assigns a corresponding variant call in $A_{i;p}$ to H₀, (iii) each respective label 1 for the respective sequence read $\vec{O}_i$ assigns a corresponding variant call in $A_{i;p}$ to H₁, (iv) each respective label −1 for the respective sequence read $\vec{O}_i$ assigns a corresponding variant call in $A_{i;p}$ to H₋₁, and (v) each respective label − for the respective sequence read $\vec{O}_i$ indicates that the corresponding variant call in $A_{i;p}$ is not covered; and
  
  (D) refining a phasing vector result $\vec{X}$ by optimization of haplotype assignments at individual positions i in $A_{i;p}$ between H₀, H₁ and H₋₁ for the plurality of sequence reads using an overall objective function;

43. The non-transitory computer readable storage medium of claim 42, wherein
$$P(O_{1,f}, \ldots, O_{N,f} \mid \vec{X}, H_f = 0) = \prod_i P(O_{i,f} \mid A_i, X_i),$$
$$P(O_{1,f}, \ldots, O_{N,f} \mid \vec{X}, H_f = 1) = \prod_i P(O_{i,f} \mid A_i, 1 - X_i),$$
$$P(O_{1,f}, \ldots, O_{N,f} \mid \vec{X}, H_f = M) = \prod_i 0.5.$$
M indicates a mixture of $H_f = 0$ and $H_f = 1$ for the respective barcode f,

44. A method of phasing sequencing data of a test nucleic acid sample obtained from a biological sample from a single organism of a species, wherein the test nucleic acid sample comprises a first set of haplotypes (H₀) and a second set of haplotypes (H₁), the method comprising:
- at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors;
  
  (A) obtaining a plurality of variant calls $A_{i;p}$ for the test nucleic acid sample, wherein i is an index to a position in a reference consensus sequence for all or a portion of a genome of the species, and p ∈ {0, 1}, in which label 0 assigns a respective variant call in $A_{i;p}$ to H=0 and label 1 assigns the respective variant call to H=1;
  
  (B) for each respective local block of variant calls in $A_{i;p}$ that are localized to a corresponding subset of the reference consensus sequence, using a beam search over the haplotype assignments of local phasing vectors $X_k, X_{k+1}, \ldots, X_{k+j}$ in the respective local block of variant calls, wherein k is the first variant in the respective local block of variant calls, j is a number of variant calls in the respective local block of variant calls, assignments of $X_k, X_{k+1}, \ldots, X_{k+j}$ are found by computing an objective function in which the phasing vector of the objective function in respective computations is limited to $X_k, X_{k+1}, \ldots, X_{k+j}$, and the objective function is calculated by matching observed sequence reads of the test nucleic acid sample against the respective local block of variant calls in $A_{i;p}$, thereby finding a phasing solution for each respective local block of variant calls in $A_{i;p}$; and
  
  (C) greedily joining, upon completion of the beam search for each respective local block of variant calls in $A_{i;p}$, neighboring local blocks of variant calls in $A_{i;p}$ using the phasing solution for each respective local block of variant calls, thereby obtaining a phasing configuration $\hat{X}$ for the single organism of the species.
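
The beam search over a local block and the greedy joining of neighboring blocks recited in steps (B) and (C) can be sketched in Python as follows. The objective function here is a stand-in, since the claim only says it is calculated by matching observed reads against the block: each barcode is assumed equally likely to originate from either haplotype, and each observation is assumed to mismatch with probability `eps`. The names `log_score`, `beam_search_block`, `join_blocks`, `beam_width` and the dictionary layout of `fragments` are hypothetical.

```python
import math


def log_score(x, fragments, start=0, eps=0.01):
    """Log-likelihood of a partial phasing x for variants start..start+len(x)-1.

    fragments: one dict {variant_index: observed_label} per barcode.
    Assumes a 50/50 haplotype prior per barcode (a simplification).
    """
    total = 0.0
    for frag in fragments:
        l0 = l1 = 0.0
        for i, o in frag.items():
            if start <= i < start + len(x):
                xi = x[i - start]
                l0 += math.log(1 - eps if o == xi else eps)
                l1 += math.log(1 - eps if o == 1 - xi else eps)
        # log(0.5 * exp(l0) + 0.5 * exp(l1)), computed stably
        m = max(l0, l1)
        total += m + math.log(0.5 * math.exp(l0 - m) + 0.5 * math.exp(l1 - m))
    return total


def beam_search_block(k, j, fragments, beam_width=4):
    """Phase the local block of j variants that starts at variant k."""
    beam = [()]
    for _ in range(j):
        # extend every kept partial assignment by one more variant
        candidates = [p + (b,) for p in beam for b in (0, 1)]
        candidates.sort(key=lambda p: log_score(p, fragments, start=k),
                        reverse=True)
        beam = candidates[:beam_width]  # keep only the best partial solutions
    return list(beam[0])


def join_blocks(left, right, k_left, fragments):
    """Append `right` to `left`, flipping `right` if that scores better."""
    flipped = [1 - b for b in right]
    same = log_score(left + right, fragments, start=k_left)
    flip = log_score(left + flipped, fragments, start=k_left)
    return left + (right if same >= flip else flipped)
```

Because a phasing and its complement describe the same pair of haplotypes, they score identically under this objective, so the result is only determined up to a global flip. Joining therefore reduces to one binary decision per block boundary: keep the right-hand block as found, or invert it.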

45. A method of phasing sequencing data of a test nucleic acid sample obtained from a biological sample from a single organism of a species while accounting for error in variant call zygosity, wherein the test nucleic acid sample comprises a first set of haplotypes (H₀) and a second set of haplotypes (H₁), the method comprising:
- at a computer system having one or more processors, and memory storing one or more programs for execution by the one or more processors;
  
  (A) obtaining a plurality of variant calls $A_{i;p}$, wherein i is an index to a position in a reference consensus sequence for all or a portion of a genome of the species, and p ∈ {0, 1, −1}, in which label 0 assigns a respective variant call in $A_{i;p}$ to H₀, label 1 assigns the respective variant call to H₁, and label −1 assigns the respective variant call to a zygosity error condition H₋₁;
  
  (B) for each respective local block of variant calls in $A_{i;p}$ that are localized to a corresponding subset of the reference consensus sequence, using a beam search over the haplotype assignments of local phasing vectors $X_k, X_{k+1}, \ldots, X_{k+j}$ in the respective local block of variant calls, wherein k is the first variant in the respective local block of variant calls, j is a number of variant calls in the respective local block of variant calls, assignments of $X_k, X_{k+1}, \ldots, X_{k+j}$ are found by computing an objective function in which the phasing vector of the objective function in respective computations is limited to $X_k, X_{k+1}, \ldots, X_{k+j}$, and the objective function is calculated by matching observed sequence reads of the test nucleic acid sample against the respective local block of variant calls in $A_{i;p}$, thereby finding a phasing solution for each respective local block of variant calls in $A_{i;p}$; and
  
  (C) greedily joining, upon completion of the beam search for each respective local block of variant calls in $A_{i;p}$, neighboring local blocks of variant calls in $A_{i;p}$ using the phasing solution for each respective local block of variant calls, thereby obtaining a phasing configuration $\hat{X}$ for the single organism of the species.
- View Dependent Claims (46, 47, 48, 49)
- - 46. The method of claim 45, the method further comprising iteratively swapping the phase result of individual $x_i$ in $\hat{X}$ and recomputing the objective function, thereby obtaining $\hat{X}$.
  - 47. The method of claim 45, wherein a respective local block of variant calls consists of between 20 and 60 variants in $A_{i;p}$.
  - 48. The method of claim 45, wherein an iteration of the beam search for the assignments of one of $X_k, X_{k+1}, \ldots, X_{k+j}$ discards all but a predetermined number of solutions for $\hat{X}$.
  - 49. The method of claim 45, wherein the test nucleic acid sample is loaded onto a plurality of barcoded-oligo coated gel-beads from which a plurality of sequence reads is obtained in order to derive the plurality of variant calls $A_{i;p}$ and wherein the test nucleic acid sample is 10 ng or less, and wherein the plurality of sequencing reads is obtained within ten minutes of exposure to the plurality of barcodes.
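
The iterative swapping recited in claim 46 can be pictured as a simple local search. The sketch below is one plausible reading, not the claimed procedure: it flips one x_i at a time, recomputes the objective, and keeps the flip only when the objective strictly improves. The acceptance rule, the `max_passes` cap, and the function name are assumptions; `objective` is any callable that scores a phasing vector.

```python
def refine_by_swapping(x, objective, max_passes=10):
    """Flip individual entries of x, keeping flips that raise the objective.

    x:          initial phasing vector (sequence of 0/1 values).
    objective:  callable taking a list of 0/1 values and returning a score
                where larger is better.
    max_passes: upper bound on full sweeps over x (hypothetical safeguard).
    """
    x = list(x)
    best = objective(x)
    for _ in range(max_passes):
        improved = False
        for i in range(len(x)):
            x[i] = 1 - x[i]          # swap the phase of variant i
            score = objective(x)     # recompute the objective
            if score > best:
                best, improved = score, True
            else:
                x[i] = 1 - x[i]      # revert: the swap did not help
        if not improved:
            break                    # a full sweep changed nothing
    return x, best
```

A toy objective such as the negative Hamming distance to a known vector is enough to exercise the loop; in practice the objective would be the read-matching score used during the beam search.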

50. A computing system, comprising:
- one or more processors;
  
  memory storing one or more programs to be executed by the one or more processors;
  
  the one or more programs comprising instructions for phasing sequencing data of a test nucleic acid sample obtained from a biological sample from a single organism of a species, wherein the test nucleic acid sample comprises a first set of haplotypes (H₀) and a second set of haplotypes (H₁) by executing a method comprising:
  
  (A) obtaining a plurality of variant calls $A_{i;p}$, wherein i is an index to a position in a reference consensus sequence for all or a portion of a genome of the species, and p ∈ {0, 1, −1}, in which label 0 assigns a respective variant call in $A_{i;p}$ to H₀, label 1 assigns the respective variant call to H₁, and label −1 assigns the respective variant call to a zygosity error condition H₋₁;
  
  (B) for each respective local block of variant calls in $A_{i;p}$ that are localized to a corresponding subset of the reference consensus sequence, using a beam search over the haplotype assignments of local phasing vectors $X_k, X_{k+1}, \ldots, X_{k+j}$ in the respective local block of variant calls, wherein k is the first variant in the respective local block of variant calls, j is a number of variant calls in the respective local block of variant calls, assignments of $X_k, X_{k+1}, \ldots, X_{k+j}$ are found by computing an objective function in which the phasing vector of the objective function in respective computations is limited to $X_k, X_{k+1}, \ldots, X_{k+j}$, and the objective function is calculated by matching observed sequence reads of the test nucleic acid sample against the respective local block of variant calls in $A_{i;p}$, thereby finding a phasing solution for each respective local block of variant calls in $A_{i;p}$; and
  
  (C) greedily joining, upon completion of the beam search for each respective local block of variant calls in $A_{i;p}$, neighboring local blocks of variant calls in $A_{i;p}$ using the phasing solution for each respective local block of variant calls, thereby obtaining a phasing configuration $\hat{X}$ for the single organism of the species.

51. A computing system, comprising:
- one or more processors;
  
  memory storing one or more programs to be executed by the one or more processors;
  
  the one or more programs comprising instructions for phasing sequencing data of a test nucleic acid sample obtained from a biological sample from a single organism of a species while accounting for error in variant call zygosity, wherein the test nucleic acid sample comprises a first set of haplotypes (H₀) and a second set of haplotypes (H₁), the one or more programs executing a method comprising:
  
  (A) obtaining a plurality of variant calls $A_{i;p}$, wherein i is an index to a position in a reference consensus sequence for all or a portion of a genome of the species, and p ∈ {0, 1}, in which label 0 assigns a respective variant call in $A_{i;p}$ to H₀ and label 1 assigns the respective variant call to H₁;
  
  (B) for each respective local block of variant calls in $A_{i;p}$ that are localized to a corresponding subset of the reference consensus sequence, using a beam search over the haplotype assignments of local phasing vectors $X_k, X_{k+1}, \ldots, X_{k+j}$ in the respective local block of variant calls, wherein k is the first variant in the respective local block of variant calls, j is a number of variant calls in the respective local block of variant calls, assignments of $X_k, X_{k+1}, \ldots, X_{k+j}$ are found by computing an objective function in which the phasing vector of the objective function in respective computations is limited to $X_k, X_{k+1}, \ldots, X_{k+j}$, and the objective function is calculated by matching observed sequence reads of the test nucleic acid sample against the respective local block of variant calls in $A_{i;p}$, thereby finding a phasing solution for each respective local block of variant calls in $A_{i;p}$; and
  
  (C) greedily joining, upon completion of the beam search for each respective local block of variant calls in $A_{i;p}$, neighboring local blocks of variant calls in $A_{i;p}$ using the phasing solution for each respective local block of variant calls, thereby obtaining a phasing configuration $\hat{X}$ for the single organism of the species.

52. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for phasing sequencing data of a test nucleic acid sample obtained from a biological sample from a single organism of a species, wherein the test nucleic acid sample comprises a first set of haplotypes (H₀) and a second set of haplotypes (H₁), the one or more programs collectively executing a method comprising:
- (A) obtaining a plurality of variant calls $A_{i;p}$, wherein i is an index to a position in a reference consensus sequence for all or a portion of a genome of the species, and p ∈ {0, 1}, in which label 0 assigns a respective variant call in $A_{i;p}$ to H₀ and label 1 assigns the respective variant call to H₁;
  
  (B) for each respective local block of variant calls in $A_{i;p}$ that are localized to a corresponding subset of the reference consensus sequence, using a beam search over the haplotype assignments of local phasing vectors $X_k, X_{k+1}, \ldots, X_{k+j}$ in the respective local block of variant calls, wherein k is the first variant in the respective local block of variant calls, j is a number of variant calls in the respective local block of variant calls, assignments of $X_k, X_{k+1}, \ldots, X_{k+j}$ are found by computing an objective function in which the phasing vector of the objective function in respective computations is limited to $X_k, X_{k+1}, \ldots, X_{k+j}$, and the objective function is calculated by matching observed sequence reads of the test nucleic acid sample against the respective local block of variant calls in $A_{i;p}$, thereby finding a phasing solution for each respective local block of variant calls in $A_{i;p}$; and
  
  (C) greedily joining, upon completion of the beam search for each respective local block of variant calls in $A_{i;p}$, neighboring local blocks of variant calls in $A_{i;p}$ using the phasing solution for each respective local block of variant calls, thereby obtaining a phasing configuration $\hat{X}$ for the single organism of the species.

53. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for phasing sequencing data of a test nucleic acid sample obtained from a biological sample from a single organism of a species while accounting for error in variant call zygosity, wherein the test nucleic acid sample comprises a first set of haplotypes (H₀) and a second set of haplotypes (H₁), the one or more programs collectively executing a method comprising:
- (A) obtaining a plurality of variant calls $A_{i;p}$, wherein i is an index to a position in a reference consensus sequence for all or a portion of a genome of the species, and p ∈ {0, 1}, in which label 0 assigns a respective variant call in $A_{i;p}$ to H=0 and label 1 assigns the respective variant call to H=1;
  
  (B) for each respective local block of variant calls in $A_{i;p}$ that are localized to a corresponding subset of the reference consensus sequence, using a beam search over the haplotype assignments of local phasing vectors $X_k, X_{k+1}, \ldots, X_{k+j}$ in the respective local block of variant calls, wherein k is the first variant in the respective local block of variant calls, j is a number of variant calls in the respective local block of variant calls, assignments of $X_k, X_{k+1}, \ldots, X_{k+j}$ are found by computing an objective function in which the phasing vector of the objective function in respective computations is limited to $X_k, X_{k+1}, \ldots, X_{k+j}$, and the objective function is calculated by matching observed sequence reads of the test nucleic acid sample against the respective local block of variant calls in $A_{i;p}$, thereby finding a phasing solution for each respective local block of variant calls in $A_{i;p}$; and
  
  (C) greedily joining, upon completion of the beam search for each respective local block of variant calls in $A_{i;p}$, neighboring local blocks of variant calls in $A_{i;p}$ using the phasing solution for each respective local block of variant calls, thereby obtaining a phasing configuration $\hat{X}$ for the single organism of the species.


Current Assignee
10X Genomics Incorporated
Original Assignee
10X Genomics Incorporated
Inventors
Kyriazopoulou-Panagiotopoulou, Sofia, Schnall-Levin, Michael, Zheng, Xinying, Jarosz, Mirna, Giorda, Kristina, Mudivarti, Patrice, Ordonez, Heather, Terry, Jessica, Heaton, William Haynes, Marks, Patrick, Saxonov, Serge

Granted Patent

US 10,854,315 B2
Field of Search
US Class Current

1/1
CPC Class Codes

A61P 3/00   Drugs for disorders of the ...

A61P 43/00   Drugs for specific purposes...

C12Q 1/6837   using probe arrays or probe...

C12Q 1/6869   Methods for sequencing

C12Q 2525/161   incorporating target specif...

C12Q 2537/165   Mathematical modelling, e.g...

G16B 20/00   ICT specially adapted for f...

G16B 20/20   Allele or variant detection...

G16B 30/00   ICT specially adapted for s...

G16B 30/10   Sequence alignment; Homolog...
