Method and apparatus for fast similarity-based query, self-join, and join for massive, high-dimension datasets
First Claim
Patent Images
1. A computer implemented method comprising:
- (a) inputting in said computer a dataset consisting of vectors from an inner product space S, for which a similarity index has been built;
(b) inputting in said computer a query q, where q is a vector from S;
(c) inputting in said computer a desired scalar similarity threshold s;
(d) setting a current node to a root of said similarity index;
(e) determining if said current node is a leaf or an interior node; and
(f) if leaf then(f1) computing a similarity between q and each item in said leaf node; and
(f2) returning one or more of said each item that meets said desired similarity threshold s, said one or more of said each item being stored in hardware on said computer and displayed on a display for a user;
(g) if interior node then(g1) obtaining a splitter (vsplit, p.split) from said interior node, where vsplit is a vector and p.split is a scalar; and
(g2) computing r=<
q−
vsplit, vsplit>
, where <
>
denotes inner product;
(h) determining if r−
p.split>
0; and
(i) if yes then(i1) setting said current node to be a “
upper”
child node; and
(i2) resuming at (e);
(j) if no then(j1) setting said current node to be a “
lower”
child node; and
(j2) resuming at (e).
0 Assignments
0 Petitions
Accused Products
Abstract
A method and apparatus for fast similarity-based query, self-join, and join for massive, high-dimension datasets have been disclosed.
-
Citations
12 Claims
-
1. A computer implemented method comprising:
-
(a) inputting in said computer a dataset consisting of vectors from an inner product space S, for which a similarity index has been built; (b) inputting in said computer a query q, where q is a vector from S; (c) inputting in said computer a desired scalar similarity threshold s; (d) setting a current node to a root of said similarity index; (e) determining if said current node is a leaf or an interior node; and (f) if leaf then (f1) computing a similarity between q and each item in said leaf node; and (f2) returning one or more of said each item that meets said desired similarity threshold s, said one or more of said each item being stored in hardware on said computer and displayed on a display for a user; (g) if interior node then (g1) obtaining a splitter (vsplit, p.split) from said interior node, where vsplit is a vector and p.split is a scalar; and (g2) computing r=<
q−
vsplit, vsplit>
, where <
>
denotes inner product;(h) determining if r−
p.split>
0; and(i) if yes then (i1) setting said current node to be a “
upper”
child node; and(i2) resuming at (e); (j) if no then (j1) setting said current node to be a “
lower”
child node; and(j2) resuming at (e). - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. A hardware based apparatus comprising:
-
(a) means for inputting in said hardware based apparatus a dataset consisting of vectors from an inner product space S, for which a similarity index has been built; (b) means for inputting in said hardware based apparatus a query q, where q is a vector from S; (c) means for inputting in said hardware based apparatus a desired scalar similarity threshold s; (d) means for setting a current node to a root of said similarity index; (e) means for determining if said current node is a leaf or an interior node; and (f) if leaf then (f1) means for computing a similarity between q and each item in said leaf node; and (f2) means for returning one or more of said each item that meets said desired similarity threshold s, said one or more of said each item being stored in a memory and said memory displayed on a display for a user; (g) if interior node then (g1) means for obtaining a splitter (vsplit, p.split) from said interior node, where vsplit is a vector and p.split is a scalar; and (g2) means for computing r=<
q−
vsplit, vsplit>
, where <
>
denotes inner product;(h) determining if r−
p.split>
0; and(i) if yes then (i1) means for setting said current node to be a “
upper”
child node; and(i2) means for resuming at (e); (j) if no then (j1) means for setting said current node to be a “
lower”
child node; and(j2) means for resuming at (e). - View Dependent Claims (8, 9, 10, 11, 12)
-
Specification