×

Methods and systems for aligning sequences

  • US 9,898,575 B2
  • Filed: 09/03/2013
  • Issued: 02/20/2018
  • Est. Priority Date: 08/21/2013
  • Status: Active Grant
First Claim
Patent Images

1. A system for aligning a plurality of sequence reads, the system comprising:

  • a processor; and

    a tangible, non-transitory memory storing a plurality of sequence reads from a subject, and a reference directed acyclic graph (DAG) representing a reference sequence and genetic variation of the reference sequence, wherein the reference DAG comprises a plurality of nodes, wherein each node is stored in the memory as a string of one or more symbols and a set of parent nodes, the string representing a sequence and the set of parent nodes defining the node'"'"'s position with respect to other nodes in the reference DAG;

    wherein the plurality of nodes further comprises a first node, a second node, and a third node, the third node having the first node and the second node as parent nodes, and wherein each of the first node, second node, and third node comprise different strings comprising a plurality of symbols;

    wherein the memory further comprises instructions that, when executed, cause the processor to align a sequence read of the plurality of sequence reads to the reference DAG by;

    creating, in the memory, a matrix for each respective node of the reference DAG, the creating comprising creating for each of the first node, the second node, and the third node, a matrix representing a comparison between the sequence read and a string associated with its respective node;

    for each created matrix, scoring overlaps between the sequence read and the string associated with its respective node, the scoring comprising calculating a score for each entry of the matrix based at least in part on whether a symbol of the sequence read matches a corresponding symbol of the string associated with its respective node;

    wherein calculating a score further comprises determining highest scores from matrices from parent nodes if and only if a matrix entry comprises the first symbol of the string associated with its node, such that scores for matrix entries for the first symbol of the string associated with the third node comprise a highest score from the matrix of the first node and the matrix of the second node;

    identifying a highest overall score in the calculated scores, the highest overall score having a position in one of the created matrices;

    backtracking from the position of the highest overall score to produce an actual match for the sequence read to the reference DAG, thereby aligning the sequence read to the reference DAG; and

    writing a file to the memory corresponding to the actual match for the sequence read.

View all claims
  • 12 Assignments
Timeline View
Assignment View
    ×
    ×