
Using dotplots for comparing and finding patterns in sequences of data points

  • US 8,463,733 B2
  • Filed: 11/10/2009
  • Issued: 06/11/2013
  • Est. Priority Date: 11/11/2008
  • Status: Active Grant
First Claim

1. A method for identifying one or more patterns in a plurality of sequences, the method comprising:

    reading a set of sequential data with an analysis system, wherein the sequential data comprises a plurality of sequences, each of the plurality of sequences representing an ordered sequence of tokens;

    generating, with the analysis system, a dotplot based on the tokens and representing matches between each sequence of the plurality of sequences; and

    identifying, with the analysis system, one or more patterns within the sequential data based on the dotplot by identifying linear relationships between the tokens, wherein identifying linear relationships between the tokens comprises:

    determining a dotplot sub-matrix plotting tokens from two sequences;

    identifying a set of points in the sub-matrix that corresponds to matching tokens in corresponding sub-sequences;

    filtering the identified points against a pre-determined high-pass threshold;

    fitting a linear regression line to the filtered points;

    computing a variance criterion based on Euclidean distances between the regression line and the filtered points;

    filtering the filtered points to those within the variance criterion; and

    re-computing the linear regression line using the points within the variance criterion.
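The steps of the claim can be sketched in Python as below. This is a minimal, hypothetical illustration, not the patented implementation: the claim does not define the high-pass threshold or the variance criterion, so the sketch assumes (1) each dot is weighted by 1 / (occurrences of its token among the sub-matrix rows × occurrences among its columns) and kept only if that weight is at least `threshold`, which suppresses very frequent tokens, and (2) the variance criterion is `k` times the root-mean-square perpendicular distance from the dots to the first regression line. All function names and the parameters `threshold` and `k` are invented for this example.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Optional, Sequence

Point = tuple[int, int]      # (index i in sequence a, index j in sequence b)
Line = tuple[float, float]   # (slope, intercept) of the line j = slope * i + intercept


def sub_matrix_points(a: Sequence[str], b: Sequence[str],
                      rows: range, cols: range) -> list[Point]:
    """Dots of the dotplot sub-matrix: every (i, j) in the window where a[i] == b[j]."""
    return [(i, j) for i in rows for j in cols if a[i] == b[j]]


def high_pass(a: Sequence[str], b: Sequence[str], points: list[Point],
              rows: range, cols: range, threshold: float) -> list[Point]:
    """HYPOTHETICAL high-pass filter: weight = 1 / (row count * column count) of the
    dot's token inside the window; keep dots whose weight is >= threshold."""
    ca = Counter(a[i] for i in rows)
    cb = Counter(b[j] for j in cols)
    return [(i, j) for i, j in points if 1.0 / (ca[a[i]] * cb[b[j]]) >= threshold]


def fit_line(points: list[Point]) -> Optional[Line]:
    """Ordinary least-squares fit of j on i; None when the fit is undefined."""
    n = len(points)
    if n < 2:
        return None
    mx = sum(i for i, _ in points) / n
    my = sum(j for _, j in points) / n
    sxx = sum((i - mx) ** 2 for i, _ in points)
    if sxx == 0:
        return None  # every dot lies in one row, so the slope is undefined
    sxy = sum((i - mx) * (j - my) for i, j in points)
    slope = sxy / sxx
    return slope, my - slope * mx


def distance(p: Point, line: Line) -> float:
    """Perpendicular Euclidean distance from dot p to the line j = m * i + c."""
    m, c = line
    i, j = p
    return abs(m * i - j + c) / math.hypot(m, 1.0)


def find_linear_pattern(a: Sequence[str], b: Sequence[str], rows: range, cols: range,
                        threshold: float = 0.5,
                        k: float = 1.5) -> Optional[tuple[Line, list[Point]]]:
    """Return (refitted line, inlier dots) for one sub-matrix, or None if no fit."""
    dots = sub_matrix_points(a, b, rows, cols)
    dots = high_pass(a, b, dots, rows, cols, threshold)
    line = fit_line(dots)
    if line is None:
        return None
    # HYPOTHETICAL variance criterion: k times the RMS distance to the first fit.
    rms = math.sqrt(sum(distance(p, line) ** 2 for p in dots) / len(dots))
    inliers = [p for p in dots if distance(p, line) <= k * rms + 1e-9]
    refit = fit_line(inliers)
    return None if refit is None else (refit, inliers)
```

For example, with `a = list("abcdef")` and `b = list("xabcdeyyf")` over the full window, the stray match of `"f"` at (5, 8) falls outside the criterion, and the refit returns slope 1.0 and intercept 1.0, the diagonal j = i + 1. Fitting j on i by ordinary least squares is only one choice; a total-least-squares fit would minimise the same perpendicular distances that the criterion measures.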

  • 1 Assignment