Systems and methods for using paired-end data in directed acyclic structure
First Claim
1. A system for analyzing a transcriptome, the system comprising:
- a processor coupled to the memory, wherein the system is operable to:
obtain, from an annotated transcriptome database, a plurality of exons and introns from a genome;
use the processor to transform the plurality of exons and introns into a directed acyclic data structure comprising nodes representing known RNA sequences and edges connecting the nodes;
obtain a pair of paired-end reads generated by sequencing a transcriptome of an organism;
use the processor to transform the first read of the pair into an alignment with an optimal score between that first read of the pair and a node in the directed acyclic data structure;
identify, using the processor, candidate paths within the directed acyclic data structure that include the node connected to a downstream node by a path having a length substantially similar to an insert length of the pair of paired-end reads;
exclude non-candidate paths from alignments involving the pair of paired-end reads;
align, using the processor, the paired-end reads to the candidate paths to determine an optimal-scoring alignment by:
calculating match scores between a second read of the pair and nodes in the candidate paths, and
looking backwards to predecessor nodes in the candidate paths while not considering any nodes in the non-candidate paths to identify a back-trace through the candidate paths that gives an optimal score,
wherein the back-trace that gives the optimal score corresponds to an optimal scoring alignment of the pair of paired-end reads to the candidate paths, and
wherein the directed acyclic data structure held in the memory prior to obtaining the pair of paired-end reads includes at least one path that has a node that the second read of the pair aligns to but that is not included during the aligning step due to being excluded as a non-candidate path; and
identify an isoform of an RNA from the organism using the optimal scoring alignment of the paired-end reads.
Abstract
Methods of analyzing a transcriptome that involve obtaining at least one pair of paired-end reads from a transcriptome of an organism, finding an alignment with an optimal score between a first read of the pair and a node in a directed acyclic data structure (the data structure has nodes representing RNA sequences such as exons or transcripts and edges connecting pairs of nodes), identifying candidate paths that include the node connected to a downstream node by a path having a length substantially similar to an insert length of the pair of paired-end reads, and aligning the paired-end reads to the candidate paths to determine an optimal-scoring alignment.
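The candidate-path step in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The toy exon graph, the function name `candidate_paths`, the tolerance parameter `tol` (standing in for "substantially similar"), and the choice to measure path length as the summed length of every node before the downstream node are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical toy DAG: node id -> sequence, node id -> successor ids.
SEQS: Dict[str, str] = {
    "E1": "ACGTAC",    # 6 bases
    "E2": "GGATCCGG",  # 8 bases
    "E3": "TT",        # 2 bases
    "E4": "CAGTCA",    # 6 bases
}
EDGES: Dict[str, List[str]] = {
    "E1": ["E2", "E3"],
    "E2": ["E4"],
    "E3": ["E4"],
    "E4": [],
}


def candidate_paths(start: str, insert_len: int, tol: int,
                    seqs: Dict[str, str] = SEQS,
                    edges: Dict[str, List[str]] = EDGES) -> List[List[str]]:
    """Return paths from `start` to a downstream node whose span is
    within `tol` bases of `insert_len`.

    Assumption: the span of a path is the summed length of all nodes
    before its final (downstream) node.
    """
    found: List[List[str]] = []

    def walk(node: str, path: List[str], span: int) -> None:
        for nxt in edges[node]:
            new_span = span + len(seqs[node])
            new_path = path + [nxt]
            if abs(new_span - insert_len) <= tol:
                found.append(new_path)
            # Spans only grow along a path, so stop once past the window.
            if new_span <= insert_len + tol:
                walk(nxt, new_path, new_span)

    walk(start, [start], 0)
    return found


# The first read is assumed to have aligned to node E1.
print(candidate_paths("E1", insert_len=14, tol=2))
```

With an insert length of 14, only the path through E2 qualifies; the path through E3 is too short and would be treated as a non-candidate path in the later alignment step.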
20 Claims
1. A system for analyzing a transcriptome, the system comprising:
- a processor coupled to the memory, wherein the system is operable to:
obtain, from an annotated transcriptome database, a plurality of exons and introns from a genome;
use the processor to transform the plurality of exons and introns into a directed acyclic data structure comprising nodes representing known RNA sequences and edges connecting the nodes;
obtain a pair of paired-end reads generated by sequencing a transcriptome of an organism;
use the processor to transform the first read of the pair into an alignment with an optimal score between that first read of the pair and a node in the directed acyclic data structure;
identify, using the processor, candidate paths within the directed acyclic data structure that include the node connected to a downstream node by a path having a length substantially similar to an insert length of the pair of paired-end reads;
exclude non-candidate paths from alignments involving the pair of paired-end reads;
align, using the processor, the paired-end reads to the candidate paths to determine an optimal-scoring alignment by:
calculating match scores between a second read of the pair and nodes in the candidate paths, and
looking backwards to predecessor nodes in the candidate paths while not considering any nodes in the non-candidate paths to identify a back-trace through the candidate paths that gives an optimal score,
wherein the back-trace that gives the optimal score corresponds to an optimal scoring alignment of the pair of paired-end reads to the candidate paths, and
wherein the directed acyclic data structure held in the memory prior to obtaining the pair of paired-end reads includes at least one path that has a node that the second read of the pair aligns to but that is not included during the aligning step due to being excluded as a non-candidate path; and
identify an isoform of an RNA from the organism using the optimal scoring alignment of the paired-end reads.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system for analyzing a transcriptome, the system comprising a processor coupled to a memory and operable to:
obtain a pair of paired-end reads from a transcriptome;
find an alignment with an optimal score between a first read of the pair and a node in a directed acyclic data structure, the data structure comprising nodes representing RNA sequences and edges connecting pairs of the nodes;
identify candidate paths that include the node connected to a downstream node by a path having a length substantially similar to an insert length of the pair of paired-end reads;
exclude any paths that are not candidate paths from any alignment calculations involving the pair of paired-end reads; and
align the paired-end reads to the candidate paths to determine an optimal-scoring alignment by:
calculating match scores between a second read of the pair and nodes in the candidate paths, and
looking backwards to predecessor nodes in the candidate paths while not considering any nodes in the non-candidate paths to identify a back-trace through the candidate paths that gives an optimal score,
wherein the back-trace that gives the optimal score corresponds to the optimal scoring alignment of the pair of paired-end reads to the candidate paths, and
wherein the directed acyclic data structure held in the memory prior to obtaining the pair of paired-end reads includes at least one path that has a node that the second read of the pair aligns to but that is not included during the aligning step due to being excluded as a non-candidate path.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
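The aligning step recited in the claims (match scores for the second read, looking back to predecessor nodes, and ignoring non-candidate nodes) might be sketched as a local-alignment dynamic program over a graph. The Python below is an illustrative sketch only, not the patented method. The scoring constants, the function name `align_to_candidates`, and the toy graph are hypothetical. It assumes every node has a non-empty sequence and that `order` is a topological ordering. For brevity it reports only the optimal score and the node where that alignment ends, and leaves out the back-trace pointers.

```python
from typing import Dict, List, Optional, Set, Tuple

MATCH, MISMATCH, GAP = 2, -1, -2  # hypothetical scoring values


def align_to_candidates(read: str,
                        seqs: Dict[str, str],
                        preds: Dict[str, List[str]],
                        order: List[str],
                        candidates: Set[str]) -> Tuple[int, Optional[str]]:
    """Local alignment of `read` against candidate nodes only.

    Returns (best score, id of the node where the best alignment ends).
    """
    zero = [0] * (len(read) + 1)
    last_col: Dict[str, List[int]] = {}  # DP column at each node's final base
    best: Tuple[int, Optional[str]] = (0, None)

    for node in order:
        if node not in candidates:
            continue  # non-candidate nodes never enter the calculation
        # Look backwards: only predecessors that were themselves processed
        # (that is, candidate nodes) contribute DP columns.
        prev_cols = [last_col[p] for p in preds.get(node, []) if p in last_col]
        if not prev_cols:
            prev_cols = [zero]
        col = zero
        for base in seqs[node]:
            col = [0] * (len(read) + 1)
            for i in range(1, len(read) + 1):
                sub = MATCH if read[i - 1] == base else MISMATCH
                score = max(0, col[i - 1] + GAP)  # read base vs. gap
                for pc in prev_cols:
                    score = max(score, pc[i - 1] + sub, pc[i] + GAP)
                col[i] = score
                if score > best[0]:
                    best = (score, node)
            prev_cols = [col]  # inside a node, the predecessor is the prior base
        last_col[node] = col
    return best


# Hypothetical toy graph: E1 -> {E2, E3} -> E4.
seqs = {"E1": "ACGTAC", "E2": "GGATCCGG", "E3": "TT", "E4": "CAGTCA"}
preds = {"E2": ["E1"], "E3": ["E1"], "E4": ["E2", "E3"]}
order = ["E1", "E2", "E3", "E4"]

second_read = "CGGCAG"  # spans the E2/E4 junction
print(align_to_candidates(second_read, seqs, preds, order, {"E1", "E2", "E4"}))
print(align_to_candidates(second_read, seqs, preds, order, {"E1", "E3", "E4"}))
```

In this toy case the read matches perfectly across the E2/E4 junction when E2 is a candidate. When E2 is excluded, the score drops because that junction is never examined.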
Specification