Systems and methods for using paired-end data in directed acyclic structure

  • US 10,055,539 B2
  • Filed: 07/14/2015
  • Issued: 08/21/2018
  • Est. Priority Date: 10/21/2013
  • Status: Active Grant
First Claim

1. A system for analyzing a transcriptome, the system comprising:

  • a processor coupled to the memory, wherein the system is operable to:

    obtain, from an annotated transcriptome database, a plurality of exons and introns from a genome;

    use the processor to transform the plurality of exons and introns into a directed acyclic data structure comprising nodes representing known RNA sequences and edges connecting the nodes;

    obtain a pair of paired-end reads generated by sequencing a transcriptome of an organism;

    use the processor to transform the first read of the pair into an alignment with an optimal score between that first read of the pair and a node in the directed acyclic data structure;

    identify, using the processor, candidate paths within the directed acyclic data structure that include the node connected to a downstream node by a path having a length substantially similar to an insert length of the pair of paired-end reads;

    exclude non-candidate paths from alignments involving the pair of paired-end reads;

    align, using the processor, the paired-end reads to the candidate paths to determine an optimal-scoring alignment by:

    calculating match scores between a second read of the pair and nodes in the candidate paths, and

    looking backwards to predecessor nodes in the candidate paths while not considering any nodes in the non-candidate paths to identify a back-trace through the candidate paths that gives an optimal score,

    wherein the back-trace that gives the optimal score corresponds to an optimal scoring alignment of the pair of paired-end reads to the candidate paths, and

    wherein the directed acyclic data structure held in the memory prior to obtaining the pair of paired-end reads includes at least one path that has a node that the second read of the pair aligns to but that is not included during the aligning step due to being excluded as a noncandidate path; and

    identify an isoform of an RNA from the organism using the optimal scoring alignment of the paired-end reads.
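
The steps recited in the claim can be illustrated with a small Python sketch. It is not the patented implementation. It is a toy under stated assumptions: the exon names and sequences, the function names `candidate_paths` and `align_read_to_dag`, the scoring values, and the tolerance are all hypothetical. "Path length" is assumed to mean the summed length of the nodes lying between the first-read node and the downstream node. The alignment is a Smith-Waterman-style local alignment whose first column in each node looks back to the last column of its predecessor nodes. Back-trace pointers are omitted for brevity, so only the score and end position are reported.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def candidate_paths(seqs, succ, start, insert_len, tol):
    """List paths start -> ... -> end whose intermediate nodes sum to a
    length within `tol` of `insert_len` (assumed definition of path length)."""
    found = []
    def walk(path, gap_len):
        for nxt in succ[path[-1]]:
            if abs(gap_len - insert_len) <= tol:
                found.append(path + [nxt])
            # Only extend while the gap can still fit the insert length.
            if gap_len + len(seqs[nxt]) <= insert_len + tol:
                walk(path + [nxt], gap_len + len(seqs[nxt]))
    walk([start], 0)
    return found

def align_read_to_dag(read, seqs, succ, allowed, match=2, mismatch=-1, gap=-2):
    """Local alignment of `read` against the nodes in `allowed` only.
    Returns (best score, node, index of last aligned base in that node)."""
    preds = {v: [] for v in allowed}
    for u in allowed:
        for v in succ[u]:
            if v in allowed:          # excluded nodes are never considered
                preds[v].append(u)
    zero = [0] * (len(read) + 1)
    last_row = {}                     # node -> DP row after its final base
    best = (0, None, None)
    for v in TopologicalSorter(preds).static_order():
        # Look backwards: the first base of v continues from predecessors.
        prev_rows = [last_row[p] for p in preds[v]] or [zero]
        for j, base in enumerate(seqs[v]):
            row = [0] * (len(read) + 1)
            for i in range(1, len(read) + 1):
                s = match if read[i - 1] == base else mismatch
                diag = max(r[i - 1] for r in prev_rows) + s
                up = max(r[i] for r in prev_rows) + gap
                left = row[i - 1] + gap
                row[i] = max(0, diag, up, left)
                if row[i] > best[0]:
                    best = (row[i], v, j)
            prev_rows = [row]         # inside a node, look back one base
        last_row[v] = row
    return best

# Hypothetical annotated exons and splice junctions (isoforms A-B-D, A-C-D).
seqs = {"A": "ACGTACGT", "B": "TTTT", "C": "CCGGATCCGGTA", "D": "GGATCCAA"}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

read1, read2 = "ACGTAC", "GGATCC"     # toy paired-end reads
insert_len, tol = 4, 1                # assumed insert length and tolerance

# Step 1: align the first read against the whole structure.
hit1 = align_read_to_dag(read1, seqs, succ, set(seqs))
# Step 2: keep only paths whose length is close to the insert length.
paths = candidate_paths(seqs, succ, hit1[1], insert_len, tol)
allowed = {node for path in paths for node in path}
# Step 3: align the second read, ignoring nodes outside candidate paths.
hit2 = align_read_to_dag(read2, seqs, succ, allowed)

print(paths, hit1, hit2)
```

In this toy data the second read also matches exon C, but C is never scored because the A-C-D path is too long for the assumed insert length. Only the A-B-D path survives as a candidate, which suggests that isoform.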

  • 10 Assignments