Method for identifying sub-sequences of interest in a sequence
Abstract
The present technique provides for the analysis of a data series to identify sequences of interest within the series. The analysis may be used to iteratively update a grammar used to analyze the data series or updated versions of the data series. Furthermore, the technique provides for the calculation of a minimum description length heuristic, such as a symbol compression ratio, for each sub-sequence of the analyzed data sequence. The technique may then compare a selected heuristic value against one or more reference conditions to determine if additional iteration is to be performed. The grammar and the data sequence may be updated between iterations to include a symbol representing a string corresponding to the selected heuristic value based upon a non-termination result of the comparison. Alternatively, the string corresponding to the selected heuristic value may be identified as a sequence of interest based upon a termination result of the comparison.
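The iteration described in the abstract can be sketched in Python as follows. This is one illustrative reading, not the patented implementation. The function names, the `max_len` cap on candidate length, the `<n>` naming of new symbols, and especially the cost model in `compression_ratio` (symbols written after a substitution divided by symbols removed) are assumptions made for this example. As a further assumption, the sketch terminates when no repeated phrase would shorten the description, and it treats the phrases added to the grammar as the candidate sequences of interest.

```python
from itertools import count


def count_nonoverlapping(series, phrase):
    """Count left-to-right, non-overlapping occurrences of phrase in series."""
    k, i, hits = len(phrase), 0, 0
    while i <= len(series) - k:
        if tuple(series[i:i + k]) == phrase:
            hits += 1
            i += k
        else:
            i += 1
    return hits


def compression_ratio(length, repeats):
    """Assumed heuristic: symbols after substitution / symbols before.

    After: one new symbol per occurrence, plus the rule body and rule head.
    Before: every occurrence written out in full.
    """
    return (repeats + length + 1) / (repeats * length)


def best_candidate(series, max_len):
    """Return (ratio, phrase) for the repeated phrase with the lowest ratio."""
    candidates = {
        tuple(series[i:i + k])
        for k in range(2, max_len + 1)
        for i in range(len(series) - k + 1)
    }
    best = None
    for phrase in sorted(candidates):  # sorted for deterministic tie-breaking
        repeats = count_nonoverlapping(series, phrase)
        if repeats < 2:
            continue
        ratio = compression_ratio(len(phrase), repeats)
        if best is None or ratio < best[0]:
            best = (ratio, phrase)
    return best


def substitute(series, phrase, symbol):
    """Replace each non-overlapping occurrence of phrase with symbol."""
    out, i, k = [], 0, len(phrase)
    while i < len(series):
        if tuple(series[i:i + k]) == phrase:
            out.append(symbol)
            i += k
        else:
            out.append(series[i])
            i += 1
    return out


def find_sequences(data, max_len=8, threshold=1.0):
    """Iteratively update the grammar and series until nothing compresses."""
    series, grammar = list(data), {}
    for n in count(1):
        best = best_candidate(series, max_len)
        if best is None or best[0] >= threshold:  # assumed termination test
            break
        _, phrase = best
        symbol = f"<{n}>"
        grammar[symbol] = phrase
        series = substitute(series, phrase, symbol)
    return series, grammar


def expand(symbol, grammar):
    """Expand a symbol back into the original characters it stands for."""
    if symbol not in grammar:
        return symbol
    return "".join(expand(s, grammar) for s in grammar[symbol])


if __name__ == "__main__":
    series, grammar = find_sequences("xxGATTACAyyGATTACAzzGATTACA")
    print(series)
    print({sym: expand(sym, grammar) for sym in grammar})
```

Each pass rescans the whole series, which is simple but slow on long inputs. The `threshold` parameter stands in for the "reference conditions" of the abstract, and other stopping rules could be substituted.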
26 Claims
1. A method for identifying a sequence of interest, comprising the steps of:
analyzing a data series based on a grammar comprising at least an initial grammar;
calculating a statistical heuristic for each sub-sequence of the analyzed data series;
comparing a selected statistical heuristic with one or more reference conditions;
updating the grammar and the data series with a symbol representing a sequence corresponding to the selected statistical heuristic based upon a non-termination result of the comparison; and
identifying the sequence as a sequence of interest based upon a termination result of the comparison. (Dependent claims: 2, 3, 4, 5, 6, 7, 8)
9. A tangible, machine-readable media, comprising:
code adapted to analyze a data series based on a grammar comprising at least an initial grammar;
code adapted to calculate a statistical heuristic for each sub-sequence of the analyzed data series;
code adapted to compare a selected statistical heuristic with one or more reference conditions;
code adapted to update the grammar and the data series with a symbol representing a sequence corresponding to the selected statistical heuristic based upon a non-termination result of the comparison; and
code adapted to identify the sequence as a sequence of interest based upon a termination result of the comparison. (Dependent claims: 10, 11, 12, 13, 14)
15. A method for processing a data series, the method comprising:
specifying a data series for analysis;
executing one or more routines configured to analyze the data series based on minimum description length principles; and
obtaining the analyzed data series comprising at least one sequence of interest. (Dependent claims: 16, 17, 18, 19)
20. A method for identifying a biological sequence of interest, the method comprising:
analyzing a biological polymer sequence based on a grammar comprising at least an initial grammar;
calculating a minimum description length heuristic for each sub-sequence of the analyzed biological polymer sequence;
comparing a selected minimum description length heuristic with one or more reference conditions;
updating the grammar and the biological polymer sequence with a symbol representing a sub-sequence corresponding to the selected minimum description length heuristic based upon a non-termination result of the comparison; and
identifying the sub-sequence as a biological sequence of interest based upon a termination result of the comparison. (Dependent claims: 21, 22, 23)
24. A tangible, machine-readable media, comprising:
code adapted to analyze a biological polymer sequence based on a grammar comprising at least an initial grammar;
code adapted to calculate a minimum description length heuristic for each sub-sequence of the analyzed biological polymer sequence;
code adapted to compare a selected minimum description length heuristic with one or more reference conditions;
code adapted to update the grammar and the biological polymer sequence with a symbol representing a sub-sequence corresponding to the selected minimum description length heuristic based upon a non-termination result of the comparison; and
code adapted to identify the sub-sequence as a biological sequence of interest based upon a termination result of the comparison. (Dependent claims: 25, 26)
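Claims 15, 20 and 24 refer to minimum description length principles. As a rough illustration of what such a measure compares, the sketch below computes a two-part description length in bits: the cost of writing the grammar rules plus the cost of writing the series encoded with them. The function name `description_length_bits` and the fixed-length coding assumption (every symbol costs log2 of the alphabet size) are hypothetical choices made for the example and are not taken from the claims.

```python
import math


def description_length_bits(series, grammar):
    """Two-part cost in bits: grammar rules plus the encoded series.

    Assumes a fixed-length code, so each symbol costs log2(alphabet size).
    """
    # The alphabet is every distinct symbol in the series or in any rule.
    alphabet = set(series) | set(grammar)
    for body in grammar.values():
        alphabet.update(body)
    bits_per_symbol = math.log2(len(alphabet)) if len(alphabet) > 1 else 1.0
    # Each rule costs its body plus one symbol for the rule head.
    model_symbols = sum(len(body) + 1 for body in grammar.values())
    return (model_symbols + len(series)) * bits_per_symbol


if __name__ == "__main__":
    raw = list("ACGTACGTACGTACGT")
    print(description_length_bits(raw, {}))
    print(description_length_bits(["S"] * 4, {"S": "ACGT"}))
```

Under these assumptions, a grammar is worth keeping only when the second figure is smaller than the first, that is, when the rule pays for itself.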
Specification