Method and apparatus for learning, recognizing and generalizing sequences
Abstract
A method of generalizing a dataset having a plurality of sequences defined over a lexicon of tokens is provided. The method comprises: searching over the dataset for similarity sets, where each similarity set comprises a plurality of segments of size L having L−S common tokens and S uncommon tokens; and defining a plurality of equivalence classes corresponding to uncommon tokens of at least one similarity set. The method may further comprise a step in which a plurality of significant patterns are extracted, where each significant pattern corresponds to a most significant partial overlap between one sequence of the dataset and other sequences of the dataset. In one embodiment, a generalized dataset represented by a graph or a forest is constructed, and can be realized as a context-free grammar. The graph or forest can be used for generating sequences and/or testing grammatical structures.
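The similarity-set search described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `find_similarity_sets`, `equivalence_classes`, `WILDCARD` and `min_segments` are hypothetical, and the brute-force enumeration of slot positions is an assumption chosen for clarity. Segments of size L are grouped by the L−S positions on which they agree, and the tokens found in the remaining S slots form the equivalence classes.

```python
from collections import defaultdict
from itertools import combinations

WILDCARD = None  # hypothetical marker for an "uncommon token" slot


def find_similarity_sets(dataset, L, S, min_segments=2):
    """Group length-L segments that agree on L-S positions.

    Returns a dict mapping a template (a tuple with WILDCARD in the S
    uncommon slots) to a list of (sequence_index, segment) pairs.
    """
    groups = defaultdict(dict)
    for seq_idx, seq in enumerate(dataset):
        for start in range(len(seq) - L + 1):
            segment = tuple(seq[start:start + L])
            # Try every choice of S slot positions inside the segment.
            for slots in combinations(range(L), S):
                template = tuple(
                    WILDCARD if i in slots else tok
                    for i, tok in enumerate(segment)
                )
                # Keep at most one segment per sequence, so that each
                # segment in a set comes from a different sequence.
                groups[template].setdefault(seq_idx, segment)
    return {
        template: sorted(by_seq.items())
        for template, by_seq in groups.items()
        # Require several sequences and at least two distinct segments,
        # otherwise the slot tokens are not actually "uncommon".
        if len(by_seq) >= min_segments and len(set(by_seq.values())) >= 2
    }


def equivalence_classes(template, members):
    """Collect, per slot position, the set of tokens observed there."""
    return {
        i: {segment[i] for _, segment in members}
        for i, tok in enumerate(template)
        if tok is WILDCARD
    }


dataset = [
    "the cat is here".split(),
    "the dog is here".split(),
    "the horse is gone".split(),
]
similarity_sets = find_similarity_sets(dataset, L=3, S=1)
for template, members in similarity_sets.items():
    print(template, equivalence_classes(template, members))
```

In this toy dataset the template `("the", WILDCARD, "is")` groups one segment from each of the three sentences, so its single slot yields an equivalence class containing "cat", "dog" and "horse".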
194 Claims
1-164. (canceled)
165. A method of extracting significant patterns from a dataset having a plurality of sequences defined over a lexicon of tokens, the method comprising, for each sequence of the plurality of sequences:
- searching for partial overlaps between said sequence and other sequences of the dataset, applying a significance test on said partial overlaps, and defining a most significant partial overlap as a significant pattern of said sequence, thereby extracting significant patterns from the dataset.
- Dependent claims: 166-180.
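The steps of claim 165 can be sketched as follows, under loudly stated assumptions. The claim text here does not specify the significance test, so the sketch substitutes a placeholder score (overlap length times the number of other sequences containing the overlap) and a hypothetical cutoff `min_score`. A "partial overlap" is taken to mean a contiguous run of tokens shared with at least one other sequence. All function and parameter names are illustrative only.

```python
def contains(seq, sub):
    """True if the tuple `sub` occurs as a contiguous run inside `seq`."""
    n = len(sub)
    return any(tuple(seq[i:i + n]) == sub for i in range(len(seq) - n + 1))


def partial_overlaps(seq, others, min_len=2):
    """Yield (run, support) for each contiguous run of `seq` that also
    occurs in at least one of the other sequences."""
    seen = set()
    for start in range(len(seq)):
        for end in range(start + min_len, len(seq) + 1):
            run = tuple(seq[start:end])
            if run in seen:
                continue
            seen.add(run)
            support = sum(contains(other, run) for other in others)
            if support:
                yield run, support


def toy_score(run, support):
    """Placeholder significance score (an assumption, not from the source)."""
    return len(run) * support


def significant_pattern(seq, others, score=toy_score, min_score=3):
    """Return the best-scoring partial overlap, or None if none passes."""
    best = max(partial_overlaps(seq, others),
               key=lambda item: score(*item), default=None)
    if best is None or score(*best) < min_score:
        return None
    return best[0]


def extract_significant_patterns(dataset):
    """Apply the search to each sequence against all the other sequences."""
    return [
        significant_pattern(seq, dataset[:i] + dataset[i + 1:])
        for i, seq in enumerate(dataset)
    ]
```

Any real significance test, for example a statistical one, could be passed in through the `score` argument without changing the search loop.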
181. A method of generalizing a dataset having a plurality of sequences defined over a lexicon of tokens, the method comprising:
searching over the dataset for similarity sets, each similarity set comprising a plurality of segments of size L having L−S common tokens and S uncommon tokens, each of said plurality of segments being a portion of a different sequence of the dataset; and
defining a plurality of equivalence classes corresponding to uncommon tokens of at least one similarity set, thereby generalizing the dataset.
Dependent claims: 182-186.
187. An apparatus for generalizing a dataset having a plurality of sequences defined over a lexicon of tokens, the apparatus comprising:
(a) a searcher, for searching over the dataset for similarity sets, each similarity set comprising a plurality of segments of size L having L−S common tokens and S uncommon tokens, each of said plurality of segments being a portion of a different sequence of the dataset; and
(b) a definition unit, for defining a plurality of equivalence classes corresponding to uncommon tokens of at least one similarity set, thereby generalizing the dataset.
Dependent claims: 188-194.
Specification