
Systems and methods for genomic pattern analysis

  • US 10,192,026 B2
  • Filed: 03/04/2016
  • Issued: 01/29/2019
  • Est. Priority Date: 03/05/2015
  • Status: Active Grant
First Claim

1. A method for analyzing a genetic sequence, the method comprising:

  • obtaining a reference graph representing a genomic sequence and known variation in the genomic sequence, in which substrings of the genomic sequence and known variation are stored in objects connected to one another to form a plurality of paths through the graph, wherein at least one path through the graph represents substantially an entire chromosome;

    identifying a data string for each path of the plurality of paths through the graph, each data string representing a concatenation of the substrings of genomic sequence and known variation in the genomic sequence stored in objects through the path;

    for each data string:

    identifying a plurality of k-mers in the data string; and

    listing each identified k-mer's location within the graph in an entry in a search index, wherein that entry is indexed according to a hash of that k-mer and contains locations of all k-mers having that index;

    obtaining a query sequence;

    identifying a plurality of query k-mers from the query sequence;

    determining the locations of at least one query k-mer within the graph by reading search index entries indexed according to hashes of query k-mers; and

    identifying portions of the graph in which a number of potential matches with different query k-mers is equal to or exceeds a threshold number as candidate targets within the graph for alignment of segments of the query sequence.
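
The steps recited in the claim can be illustrated with a small sketch. This is not the patented implementation, only a toy reading of the claim language under stated assumptions: the graph is a tiny hand-written DAG (`NODES`, `EDGES`), a "location" is taken to be a `(node id, offset)` pair for the first base of the k-mer, a "portion of the graph" is taken to be a single node, and the hash is CRC32 reduced modulo an arbitrary table size. The names `K`, `NUM_BUCKETS`, `kmer_hash`, `paths`, `build_index`, and `candidate_nodes` are all hypothetical. Because entries are keyed only by hash, a lookup returns potential matches, which may include hash collisions.

```python
import zlib
from collections import defaultdict

K = 3               # k-mer length (illustrative assumption)
NUM_BUCKETS = 1024  # number of index entries (illustrative assumption)

# Toy reference graph: each object stores a substring; edges form the paths.
# Node "b" and node "c" model a known variation (A vs. G) between "a" and "d".
NODES = {"a": "ACGT", "b": "A", "c": "G", "d": "TTCA"}
EDGES = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

def kmer_hash(kmer):
    """Deterministic hash of a k-mer, used as the search-index key."""
    return zlib.crc32(kmer.encode("ascii")) % NUM_BUCKETS

def paths(node):
    """Yield every path (list of node ids) from `node` to a sink node."""
    if not EDGES[node]:
        yield [node]
        return
    for nxt in EDGES[node]:
        for rest in paths(nxt):
            yield [node] + rest

def build_index(source):
    """Map hash -> set of (node id, offset) locations of all k-mers with that hash."""
    index = defaultdict(set)
    for path in paths(source):
        # Data string: concatenation of the substrings stored along the path.
        data, locs = "", []
        for nid in path:
            data += NODES[nid]
            locs.extend((nid, off) for off in range(len(NODES[nid])))
        # List each k-mer's starting location under the entry for its hash.
        for i in range(len(data) - K + 1):
            index[kmer_hash(data[i:i + K])].add(locs[i])
    return index

def candidate_nodes(index, query, threshold):
    """Return nodes hit by at least `threshold` different query k-mers."""
    hits = defaultdict(set)  # node id -> distinct query k-mers with a potential match
    for i in range(len(query) - K + 1):
        qk = query[i:i + K]
        for nid, _offset in index.get(kmer_hash(qk), ()):
            hits[nid].add(qk)
    return {nid for nid, kmers in hits.items() if len(kmers) >= threshold}

index = build_index("a")
candidates = candidate_nodes(index, "CGTGTT", threshold=2)
```

In this toy graph the two data strings are `ACGTATTCA` and `ACGTGTTCA`. Three of the query k-mers (`CGT`, `GTG`, `TGT`) start in node `a`, so `a` meets a threshold of 2 and becomes a candidate target for alignment. Enumerating every path explicitly is only workable for a graph this small; a chromosome-scale graph would need a different traversal strategy, which the claim does not specify.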

  • 13 Assignments