Using dotplots for comparing and finding patterns in sequences of data points
First Claim
1. A method for identifying one or more patterns in a plurality of sequences, the method comprising:
reading a set of sequential data with an analysis system, wherein the sequential data comprises a plurality of sequences, each of the plurality of sequences representing an ordered sequence of tokens;
generating, with the analysis system, a dotplot based on and representing the tokens and matches between each sequence of the plurality of sequences;
identifying, with the analysis system, tokens representing at least one pattern within the sequential data based on the dotplot using a line fitting technique identifying linear relationships between the tokens represented in the dotplot;
removing, with the analysis system, tokens representing the identified at least one pattern from the dotplot; and
repeating, by the analysis system, said identifying and said removing until the line fitting technique fails to identify any linear relationship between remaining tokens of the dotplot.
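The identify, remove, and repeat steps recited above can be illustrated with a minimal sketch. The claim does not name a specific line fitting technique, so this example assumes a deliberately simple one: a Hough-style vote restricted to slope-1 lines of the form j = i + offset. The function names, the `min_points` threshold, and the tie-breaking rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def fit_best_diagonal(dots, min_points=3):
    """Vote for lines j = i + offset and return the best-supported one.

    Assumption: only slope-1 lines are considered. Returns (offset, points)
    or None when no line has at least `min_points` dots on it.
    """
    votes = defaultdict(list)
    for i, j in dots:
        votes[j - i].append((i, j))
    if not votes:
        return None
    # Prefer the most dots; break ties deterministically by offset.
    offset, points = max(
        votes.items(), key=lambda kv: (len(kv[1]), -abs(kv[0]), kv[0])
    )
    if len(points) < min_points:
        return None
    return offset, sorted(points)


def extract_patterns(dots, min_points=3):
    """Repeat 'fit a line, remove its dots' until no line can be fitted."""
    remaining = set(dots)
    patterns = []
    while True:
        fit = fit_best_diagonal(remaining, min_points)
        if fit is None:
            break  # the line fit failed, so the loop stops
        offset, points = fit
        patterns.append((offset, points))
        remaining -= set(points)  # remove the identified pattern
    return patterns, remaining


# Hypothetical dotplot: two diagonals plus one isolated dot.
example_dots = {(0, 0), (1, 1), (2, 2), (3, 3), (0, 5), (1, 6), (2, 7), (4, 1)}
found, leftover = extract_patterns(example_dots)
```

Each pass removes at least `min_points` dots, so the loop always terminates. A general line fit (arbitrary slope, tolerance for noise) would replace `fit_best_diagonal` without changing the outer loop.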
Abstract
Embodiments of the invention provide systems and methods for analyzing sequential data. The sequential data can comprise a sequence of data points arranged in a particular order and thus representing a sequence. A number of such sequences can be analyzed, for example, to identify patterns or commonalities within the sequences or portions of sequences represented by the data. According to one embodiment, a method of identifying patterns in sequences of data points can comprise reading a set of sequential data. The sequential data can comprise a plurality of sequences, and each of the plurality of sequences can represent an ordered sequence of tokens. A dotplot representing matches between each sequence of the plurality of sequences can be generated. One or more patterns within the sequential data can then be identified based on the dotplot.
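As a rough illustration of the dotplot described in the abstract, the sketch below records a dot at position (i, j) whenever token i of one sequence matches token j of another. It assumes that a "match" means exact equality of tokens and that sequences are compared pairwise; the abstract does not fix either point, and the function names and sample data are hypothetical.

```python
from itertools import combinations


def dotplot(seq_a, seq_b):
    """Return the set of (i, j) where seq_a[i] equals seq_b[j].

    Assumption: tokens match only when they are exactly equal.
    """
    # Index seq_b once so each token of seq_a is looked up directly.
    positions = {}
    for j, token in enumerate(seq_b):
        positions.setdefault(token, []).append(j)
    return {
        (i, j)
        for i, token in enumerate(seq_a)
        for j in positions.get(token, [])
    }


def pairwise_dotplots(sequences):
    """Build one dotplot per unordered pair of sequences, keyed by index pair."""
    return {
        (a, b): dotplot(sequences[a], sequences[b])
        for a, b in combinations(range(len(sequences)), 2)
    }


# Hypothetical token sequences for illustration only.
sample = [["open", "read", "close"], ["read", "close", "open"]]
plots = pairwise_dotplots(sample)
```

Runs of dots along a diagonal in such a plot correspond to subsequences shared by the two inputs, which is what a later pattern-identification step would look for.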
27 Citations
20 Claims
1. A method for identifying one or more patterns in a plurality of sequences, the method comprising:
reading a set of sequential data with an analysis system, wherein the sequential data comprises a plurality of sequences, each of the plurality of sequences representing an ordered sequence of tokens;
generating, with the analysis system, a dotplot based on and representing the tokens and matches between each sequence of the plurality of sequences;
identifying, with the analysis system, tokens representing at least one pattern within the sequential data based on the dotplot using a line fitting technique identifying linear relationships between the tokens represented in the dotplot;
removing, with the analysis system, tokens representing the identified at least one pattern from the dotplot; and
repeating, by the analysis system, said identifying and said removing until the line fitting technique fails to identify any linear relationship between remaining tokens of the dotplot.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system for identifying one or more patterns in a plurality of sequences, the system comprising:
a processor; and
a memory communicatively coupled with and readable by the processor and having stored therein a series of instructions which, when executed by the processor, cause the processor to
read a set of sequential data, wherein the sequential data comprises a plurality of sequences, each of the plurality of sequences representing an ordered sequence of tokens,
generate a dotplot based on and representing the tokens and matches between each sequence of the plurality of sequences,
identify tokens representing at least one pattern within the sequential data based on the dotplot using a line fitting technique identifying linear relationships between the tokens represented in the dotplot,
remove tokens representing the identified at least one pattern from the dotplot, and
repeat said identifying and said removing until the line fitting technique fails to identify any linear relationship between remaining tokens of the dotplot.
Dependent claims: 12, 13, 14, 15
16. A machine-readable memory device comprising a set of instructions stored therein which, when executed by a processor, cause the processor to identify one or more patterns in a plurality of sequences by:
reading a set of sequential data, wherein the sequential data comprises a plurality of sequences, each of the plurality of sequences representing an ordered sequence of tokens;
generating a dotplot based on and representing the tokens and matches between each sequence of the plurality of sequences;
identifying tokens representing at least one pattern within the sequential data based on the dotplot using a line fitting technique identifying linear relationships between the tokens represented in the dotplot;
removing tokens representing the identified at least one pattern from the dotplot; and
repeating said identifying and said removing until the line fitting technique fails to identify any linear relationship between remaining tokens of the dotplot.
Dependent claims: 17, 18, 19, 20
Specification