IDENTIFYING REARRANGEMENTS IN A SEQUENCED GENOME
Abstract
Methods, apparatuses, and systems for identification of junctions (e.g., resulting from large-scale rearrangements) of a sequenced genome with respect to a human genome reference sequence are provided. For example, false positives can be distinguished from actual junctions. Such false positives can result from many sources, including mismapping, chimeric reactions among the DNA of a sample, and problems with the reference genome. As part of the filtering processes, a base pair resolution (or near base pair resolution) of a junction can be provided. In various implementations, junctions can be identified using discordant mate pairs and/or using a statistical analysis of the length distributions of fragments for local regions of the sample genome. Clinically significant junctions can also be identified so that further analysis can be focused on genomic regions that may have more of an impact on the health of a patient.
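As a rough illustration of how a fragment length distribution can be used to flag discordant mate pairs, the sketch below marks a pair as discordant when its arms map to different chromosomes or when the implied fragment length falls in the tails of the observed distribution. The names `MatePair`, `length_bounds`, and `is_discordant`, and the 1% tail cut-off, are assumptions made for this example and do not come from the patent; orientation checks are omitted for brevity.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass(frozen=True)
class MatePair:
    """Hypothetical record: mapped positions of the two arms of one fragment."""
    chrom1: str
    pos1: int   # mapped start of the first arm read
    chrom2: str
    pos2: int   # mapped start of the corresponding arm read


def length_bounds(lengths, tail_pct=1):
    """Return (low, high) cut-offs taken from the empirical length distribution.

    tail_pct is an assumed parameter: the percentage trimmed from each tail.
    """
    cuts = quantiles(lengths, n=100)  # 99 percentile cut points
    return cuts[tail_pct - 1], cuts[100 - tail_pct - 1]


def is_discordant(mp, low, high):
    """A pair is discordant if its arms map to different chromosomes or if the
    implied fragment length lies outside the expected range."""
    if mp.chrom1 != mp.chrom2:
        return True
    span = abs(mp.pos2 - mp.pos1)
    return not (low <= span <= high)


# Usage: derive bounds from concordant-looking pairs, then classify.
observed_lengths = list(range(300, 500))
low, high = length_bounds(observed_lengths)
pairs = [
    MatePair("chr1", 1000, "chr1", 1400),   # within the expected range
    MatePair("chr1", 1000, "chr1", 6000),   # arms too far apart
    MatePair("chr1", 1000, "chr2", 1400),   # arms on different chromosomes
]
discordant = [mp for mp in pairs if is_discordant(mp, low, high)]
```

A production pipeline would estimate the distribution per library and per local region; the fixed percentile cut here is only a stand-in for that statistical analysis.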
33 Claims
1. A method of determining whether a junction exists between a sample genome and a reference genome, the sample genome being of an organism providing a biological sample, the method comprising:
- receiving results of paired-end sequencing of a plurality of fragments from the biological sample, the results including mate pairs of fragments and mappings of the mate pairs to the reference genome, wherein a mate pair includes a first arm read for a first end of a fragment and a corresponding arm read of an opposite end of the fragment;
- identifying a junction region in the sample genome based on the mappings of the mate pairs to the reference genome, the junction region including:
  - a first edge portion including a first edge of the junction region;
  - a second edge portion including a second edge of the junction region, the first edge opposite the second edge; and
  - a potential junction between the first edge and the second edge;
- identifying a first set of first arm reads, each at least partially mapping to the first edge portion or having a non-negligible probability to at least partially map to the first edge portion based on a mapped location of the respective corresponding arm read; and
- comparing the sequences of the first arm reads of the first set to each other to determine whether a junction exists in the junction region.
Dependent claims: 2-25
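One way to picture the final comparing step of claim 1 is a simple pileup: the first arm reads gathered near the first edge are stacked by offset, and the junction is treated as supported only if the reads agree with each other in the columns beyond the edge. This is a minimal sketch under stated assumptions, not the patented procedure. The function names, the `(offset, sequence)` read representation, and the `min_depth`, `min_agreement`, and `min_span` thresholds are all hypothetical.

```python
from collections import Counter, defaultdict


def column_agreement(reads):
    """Stack reads by offset and summarize each column.

    reads: iterable of (offset, sequence) tuples, offsets relative to the
    start of the first edge portion (an assumed representation).
    Returns {column: (majority_base, agreeing_fraction, depth)}.
    """
    columns = defaultdict(Counter)
    for offset, seq in reads:
        for i, base in enumerate(seq):
            columns[offset + i][base] += 1
    summary = {}
    for col, counts in columns.items():
        base, n = counts.most_common(1)[0]
        depth = sum(counts.values())
        summary[col] = (base, n / depth, depth)
    return summary


def junction_supported(reads, edge_col, min_depth=3, min_agreement=0.9, min_span=5):
    """Return True if at least min_span columns at or past edge_col are covered
    by min_depth or more reads whose bases agree with each other."""
    summary = column_agreement(reads)
    consistent = [
        col for col, (_, frac, depth) in summary.items()
        if col >= edge_col and depth >= min_depth and frac >= min_agreement
    ]
    return len(consistent) >= min_span


# Usage: three overlapping reads that tell the same story across column 8.
sample = "ACGTACGTTTGGCCAATT"
agreeing_reads = [(0, sample[0:14]), (2, sample[2:16]), (4, sample[4:18])]
supported = junction_supported(agreeing_reads, edge_col=8)
```

Reads that share the reference-side prefix but diverge from one another past the edge would fail this check, which is the pattern expected from mismapping or chimeric artifacts rather than a real junction.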
26. A method of determining whether a clinically significant junction exists between a sample genome and a reference genome, the sample genome being of an organism providing a biological sample, the method comprising:
- receiving results of paired-end sequencing of a plurality of fragments from the biological sample, the results including mate pairs of fragments and mappings of the mate pairs to the reference genome, wherein a mate pair includes a first arm read for a first end of a fragment and a corresponding arm read of an opposite end of the fragment;
- determining a plurality of discordant mate pairs;
- determining a plurality of potential junctions based on the discordant mate pairs;
- obtaining a list of junctions that have appeared in other sample genomes;
- for each of the potential junctions:
  - determining whether the potential junction is on the list; and
  - determining whether or not the potential junction is a clinically significant junction based at least on whether the potential junction is on the list, wherein a potential junction that is on the list is less likely to be a clinically significant junction.
Dependent claims: 27-30
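The list lookup in claim 26 can be sketched as a tolerance-based match against junctions seen in other genomes. The `Junction` record, the `same_junction` comparison, and the 200 bp tolerance below are assumptions for illustration only; the claim does not specify how list membership is decided, and being off the list is only one input to clinical significance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """Hypothetical junction record: the two reference positions that are joined."""
    left_chrom: str
    left_pos: int
    right_chrom: str
    right_pos: int


def same_junction(a, b, tol=200):
    """Treat two junctions as the same event if both sides lie within tol bases.

    tol is an assumed tolerance for breakpoint uncertainty.
    """
    return (
        a.left_chrom == b.left_chrom
        and a.right_chrom == b.right_chrom
        and abs(a.left_pos - b.left_pos) <= tol
        and abs(a.right_pos - b.right_pos) <= tol
    )


def split_by_known_list(potential, known, tol=200):
    """Partition potential junctions into (on_list, not_on_list).

    Junctions on the list are less likely to be clinically significant, so
    only the second group would be prioritized for further review.
    """
    on_list, not_on_list = [], []
    for j in potential:
        if any(same_junction(j, k, tol) for k in known):
            on_list.append(j)
        else:
            not_on_list.append(j)
    return on_list, not_on_list


# Usage with made-up coordinates.
known = [Junction("chr1", 10_000, "chr5", 250_000)]
potential = [
    Junction("chr1", 10_050, "chr5", 250_100),   # matches a known junction
    Junction("chr9", 133_000, "chr22", 23_000),  # not seen in other genomes
]
common, candidates = split_by_known_list(potential, known)
```

The linear scan over `known` keeps the sketch short; a large list would call for an interval index keyed by chromosome.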
31. A method of determining whether a junction exists between a sample genome and a reference genome, the sample genome being of an organism providing a biological sample, the method comprising:
- receiving results of paired-end sequencing of a plurality of fragments from the biological sample, the results including mate pairs of fragments and mappings of the mate pairs to the reference genome, wherein a mate pair includes a first arm read for a first end of a fragment and a corresponding arm read of an opposite end of the fragment;
- determining a plurality of discordant mate pairs based on the mapping results;
- clustering the discordant mate pairs based on locations of the first arm reads and of the corresponding arm reads;
- for a plurality of the discordant mate pairs of a first cluster, attempting to perform a realignment to the reference genome of each arm of a discordant mate pair, the realignment of an arm being in a region determined from a length distribution of the fragments;
- determining an amount of the plurality of discordant mate pairs of the first cluster that are realigned in a concordant manner; and
- determining that a junction does not exist for the first cluster if the amount is greater than a threshold.
Dependent claims: 32, 33
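The cluster-and-realign filter of claim 31 can be illustrated with a toy model. The sketch assumes a single-contig reference held as a string, exact-match realignment via `str.find` in place of a real aligner, greedy position-based clustering, and a fraction as the "amount" compared to the threshold. `DiscordantPair`, `cluster_pairs`, `realigns`, `junction_rejected`, and all parameter defaults are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscordantPair:
    """Hypothetical record for one discordant mate pair on a single contig."""
    pos1: int   # mapped start of the first arm
    seq1: str
    pos2: int   # mapped start of the corresponding arm
    seq2: str


def cluster_pairs(pairs, max_gap=500):
    """Greedy clustering: a pair joins a cluster if both arm positions lie
    within max_gap of the cluster's most recently added pair."""
    clusters = []
    for p in sorted(pairs, key=lambda q: (q.pos1, q.pos2)):
        for c in clusters:
            last = c[-1]
            if abs(p.pos1 - last.pos1) <= max_gap and abs(p.pos2 - last.pos2) <= max_gap:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters


def realigns(reference, anchor_pos, arm_seq, low, high, downstream):
    """Search for arm_seq in the window where a concordant mate would start,
    given an expected fragment span between low and high."""
    if downstream:
        start, end = anchor_pos + low, anchor_pos + high + len(arm_seq)
    else:
        start, end = anchor_pos - high, anchor_pos - low + len(arm_seq)
    return reference.find(arm_seq, max(start, 0), max(end, 0)) != -1


def junction_rejected(reference, cluster, low, high, threshold=0.5):
    """Reject the junction if more than threshold of the cluster's pairs can be
    realigned concordantly (second arm near the first, or first near the second)."""
    realigned = sum(
        1 for p in cluster
        if realigns(reference, p.pos1, p.seq2, low, high, True)
        or realigns(reference, p.pos2, p.seq1, low, high, False)
    )
    return realigned / len(cluster) > threshold


# Usage: the second arm has a repeat copy 400 bases from the first arm, so the
# far mapping is better explained as a mismapping than as a junction.
ARM1 = "CGCGTATACGCGTATACGCG"
ARM2 = "GATTACAGATTACAGATTAC"
reference = "A" * 100 + ARM1 + "A" * 380 + ARM2 + "A" * 3000 + ARM2 + "A" * 100
cluster = [DiscordantPair(100, ARM1, 3520, ARM2)] * 3
no_junction = junction_rejected(reference, cluster, low=300, high=500)
```

In this toy case every pair in the cluster realigns within the expected window, so the cluster is attributed to a repeat. A cluster whose arms have no alternative placement would leave the candidate junction in play.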
Specification