Method and apparatus for fundamental operations on token sequences: computing similarity, extracting term values, and searching efficiently
First Claim
Patent Images
1. A method comprising:
- inputting a vector space;
inputting a probability space; and
generating a similarity space.
1 Assignment
0 Petitions
Accused Products
Abstract
A method and apparatus for fundamental operations on token sequences: computing similarity, extracting term values, and searching efficiently have been disclosed.
-
Citations
62 Claims
-
1. A method comprising:
-
inputting a vector space;
inputting a probability space; and
generating a similarity space. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. An apparatus for deriving a similarity measure comprising:
-
means for inputting an eigenspace analysis of a reference;
means for inputting a transition probability model of a target; and
means for operating on said eigenspace analysis and said transition probability model. - View Dependent Claims (11)
-
-
12. An apparatus comprising:
-
a first block having an input and an output, said first block input capable of receiving a vector space;
a second block having an input and an output, said second block input capable of receiving a probability space; and
a third block having a first input, a second input, and an output, said third block first input coupled to receive said first block output, said third block second input coupled to receive said second block output, and said third block output capable of communication a similarity space. - View Dependent Claims (13)
-
-
14. A method comprising:
generating a similarity metric based upon an eigenspace analysis and an n-gram model.
-
15. A method comprising:
-
receiving a profile;
receiving a matrix; and
generating a similarity indication between said profile and said matrix. - View Dependent Claims (16, 17, 18, 19, 20)
-
-
21. A method for generating a similarity space, the method comprising:
combining a vector space with a transition probability space.
-
22. A method comprising performing a mathematical operation using an eigenspace and transition probability matrix to generate a similarity index.
-
23. A method for generating a similarity score comprising:
-
receiving a profile having eigenvalues (w.i);
receiving a transition matrix (a);
generating a left eigenvector (u.i) for each said w.i;
generating a right eigenvector (v.i) for each said w.i;
generating a complex conjugate of said u.i; and
generating said similarity score according to a formula;
similarity score=sum(i, ∥
u.i.conjugate *a*v.i ∥
{circumflex over (
)}2),where each term of said summation is said transition matrix premultiplied by said complex conjugate of i-th said left eigenvector, and postmultiplied by i-th said right eigenvector and wherein norm squared ∥
.∥
{circumflex over (
)}2 is a square of magnitude of a complex number resulting from said i-th term in said sum. - View Dependent Claims (24, 25, 26)
-
-
27. An apparatus for generating a similarity measure comprising:
means for performing a computation on a target input and a profile. - View Dependent Claims (28, 29, 30)
- 31. A means for computing a similarity measure wherein said computing means is substantially O(log(n)) for n target inputs against a pre-computed profile.
-
34. A method comprising:
-
tokenizing a target input;
generating a transition probability matrix for said tokens;
operating on a profile and said transition probability matrix; and
generating a measure of similarity between said profile and said matrix. - View Dependent Claims (35, 36, 37)
-
-
38. A method for modeling comprising;
-
using a history window of h tokens to compose a tuple; and
tallying all words that fall within r tokens of said tuple wherein r is between r=1 (which is a Markov n-gram model), and r substantially approaching infinity (which is a word frequency model). - View Dependent Claims (39, 40, 41, 42)
-
- 43. A method for generating a similarity measure between a reference profile and a target input by performing an operation on an eigenvalue space representation of said reference profile and a transition probability model of said target input.
-
47. A method for determining a high similarity measure, the method comprising:
-
(a) pre-generating a fixed set of eigenspace profiles representing known references;
(b) generating a series of tokens representing clauses from a target input;
(c) dividing said series of tokens into two groups, group A and group B;
(d) generating a transition probability model for group A and group B;
(e) generating a similarity measure for group A versus said profiles, and for group B versus said profiles;
(f) retaining group A if it has a similarity measure equal to or higher than group B from (e), otherwise retaining group B; and
(g) define the retained group as said series of tokens and repeat (c) to (g) for a predetermined number of times. - View Dependent Claims (48, 49, 50, 51)
-
-
52. A method comprising:
-
receiving N text blocks;
building a binary tree representing indexes of said N text blocks;
receiving a T text block;
computing a transition probability matrix for said T text block; and
traversing said binary tree; and
finding a closest matching N text block for said T text block. - View Dependent Claims (53)
-
-
54. A method comprising:
-
receiving a T text block;
computing a profile for said T text block;
receiving N text blocks;
(a) shuffling randomly said N text blocks;
(b) dividing said shuffled randomly N text blocks into set A and set B;
(c) concatenating the text clocks in set A to form group A;
(d) concatenating the text clocks in set B to form group B;
(e) computing a transition probability matrix for said group A and for said group B;
(f) generating a similarity measure between said T text block and said group A and said group B;
(g) determining if group A or group B has a higher similarity measure;
(h) tallying an additional count for text blocks that are members of group A or group B having said determined higher similarity measure;
(i) repeating (a) through (h) R times;
(j) picking group A or group B with a highest count as a remaining group;
(k) using the remaining group now as said N text blocks; and
(l) repeating (a) through (k) K times. - View Dependent Claims (55, 56, 57, 58)
-
-
59. A method for determining a high similarity measure, the method comprising:
-
pre-generating one or more eigenspace profiles representing clauses from a known reference;
generating a series of tokens representing clauses from a target input;
(a) setting a counter n=0;
(b) setting counter n=n+1;
(c) dividing said series of tokens into two groups, group A(n) and group B(n);
(d) generating a transition probability model for group A(n) and group B(n);
(e) generating a similarity measure for group A(n) versus said profiles, and for group B(n) versus said profiles;
(f) awarding group A(n) a point if it has a similarity measure equal to or higher than group B(n) from (e), otherwise awarding group B(n) a point;
(g) shuffling said series of tokens representing clauses from said target input in substantially random order and repeat (b) to (g) for a predetermined number of times;
(h) picking those groups having a point and retaining tokens associated with said picked groups and defining said retained tokens as said series of tokens; and
(i) repeating (c) to (h) until said high similarity measure is determined. - View Dependent Claims (60, 61, 62)
-
Specification