Method and apparatus for fast similarity-based query, self-join, and join for massive, high-dimension datasets

US 8,117,213 B1
Filed: 10/30/2009
Issued: 02/14/2012
Est. Priority Date: 06/27/2006
Status: Active Grant

Abstract

A method and apparatus for fast similarity-based query, self-join, and join for massive, high-dimension datasets have been disclosed.

12 Claims

1. A computer implemented method comprising:
- (a) inputting in said computer a dataset consisting of vectors from an inner product space S, for which a similarity index has been built;
  
  (b) inputting in said computer a query q, where q is a vector from S;
  
  (c) inputting in said computer a desired scalar similarity threshold s;
  
  (d) setting a current node to a root of said similarity index;
  
  (e) determining if said current node is a leaf or an interior node; and
  
  (f) if leaf then (f1) computing a similarity between q and each item in said leaf node; and
  
  (f2) returning one or more of said each item that meets said desired similarity threshold s, said one or more of said each item being stored in hardware on said computer and displayed on a display for a user;
  
  (g) if interior node then (g1) obtaining a splitter (vsplit, p.split) from said interior node, where vsplit is a vector and p.split is a scalar; and
  
  (g2) computing r=<q−vsplit, vsplit>, where < > denotes inner product;
  
  (h) determining if r−p.split>0; and
  
  (i) if yes then (i1) setting said current node to be a “upper” child node; and
  
  (i2) resuming at (e);
  
  (j) if no then (j1) setting said current node to be a “lower” child node; and
  
  (j2) resuming at (e).
- 2. The method of claim 1, wherein the vectors in the inner product space S have finite dimension.
- 3. The method of claim 1, wherein the vectors in the inner product space S are scalar valued functions over a Euclidean space.
- 4. The method of claim 1, wherein the vectors in the inner product space S are vector valued functions over a Euclidean space.
- 5. The method of claim 1, wherein the vectors in the inner product space S are token sequence vectors.
- 6. The method of claim 1, wherein the vectors in the inner product space S are term document vectors.
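
The traversal recited in claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patentee's implementation: the `Leaf` and `Interior` classes, the `query` function, and the finite-dimensional list representation of vectors are hypothetical, and cosine similarity with a `>=` comparison is only one possible reading of "a similarity" that "meets" the threshold s, which the claim leaves open.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Union

Vector = Sequence[float]


def inner(a: Vector, b: Vector) -> float:
    """Standard dot product, standing in for the inner product < , > on S."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Assumed similarity measure; the claim does not name one."""
    denom = math.sqrt(inner(a, a) * inner(b, b))
    return inner(a, b) / denom if denom else 0.0


@dataclass
class Leaf:
    items: List[Vector]


@dataclass
class Interior:
    vsplit: Vector      # splitter vector, step (g1)
    p_split: float      # splitter scalar (p.split in the claim), step (g1)
    upper: "Node"
    lower: "Node"


Node = Union[Leaf, Interior]


def query(root: Node, q: Vector, s: float,
          similarity: Callable[[Vector, Vector], float] = cosine) -> List[Vector]:
    node = root                                        # step (d)
    while isinstance(node, Interior):                  # step (e)
        diff = [qi - vi for qi, vi in zip(q, node.vsplit)]
        r = inner(diff, node.vsplit)                   # step (g2)
        if r - node.p_split > 0:                       # step (h)
            node = node.upper                          # steps (i1), (i2)
        else:
            node = node.lower                          # steps (j1), (j2)
    # Steps (f1), (f2): score every item in the leaf, keep those meeting s.
    return [item for item in node.items if similarity(q, item) >= s]


if __name__ == "__main__":
    # Hypothetical two-leaf index over 2-D vectors.
    index = Interior(
        vsplit=[1.0, 0.0],
        p_split=0.0,
        upper=Leaf([[3.0, 0.1], [2.0, 2.0]]),
        lower=Leaf([[0.0, 1.0], [-1.0, 0.5]]),
    )
    print(query(index, [3.0, 0.0], 0.9))
```

Note that the loop follows exactly one child per interior node, as the claim recites, so only the items in a single leaf are ever compared against q.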

7. A hardware based apparatus comprising:
- (a) means for inputting in said hardware based apparatus a dataset consisting of vectors from an inner product space S, for which a similarity index has been built;
  
  (b) means for inputting in said hardware based apparatus a query q, where q is a vector from S;
  
  (c) means for inputting in said hardware based apparatus a desired scalar similarity threshold s;
  
  (d) means for setting a current node to a root of said similarity index;
  
  (e) means for determining if said current node is a leaf or an interior node; and
  
  (f) if leaf then (f1) means for computing a similarity between q and each item in said leaf node; and
  
  (f2) means for returning one or more of said each item that meets said desired similarity threshold s, said one or more of said each item being stored in a memory and said memory displayed on a display for a user;
  
  (g) if interior node then (g1) means for obtaining a splitter (vsplit, p.split) from said interior node, where vsplit is a vector and p.split is a scalar; and
  
  (g2) means for computing r=<q−vsplit, vsplit>, where < > denotes inner product;
  
  (h) determining if r−p.split>0; and
  
  (i) if yes then (i1) means for setting said current node to be a “upper” child node; and
  
  (i2) means for resuming at (e);
  
  (j) if no then (j1) means for setting said current node to be a “lower” child node; and
  
  (j2) means for resuming at (e).
- 8. The hardware based apparatus of claim 7, wherein the vectors in the inner product space S have finite dimension.
- 9. The hardware based apparatus of claim 7, wherein the vectors in the inner product space S are scalar valued functions over a Euclidean space.
- 10. The hardware based apparatus of claim 7, wherein the vectors in the inner product space S are vector valued functions over a Euclidean space.
- 11. The hardware based apparatus of claim 7, wherein the vectors in the inner product space S are token sequence vectors.
- 12. The hardware based apparatus of claim 7, wherein the vectors in the inner product space S are term document vectors.
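
The splitter test in steps (g2) and (h) of claims 1 and 7 can be expanded by linearity of the inner product in its first argument. The derivation below is only algebra on the claimed formula, assuming a real inner product space; the subscripted symbols stand for vsplit and p.split.

```latex
\begin{aligned}
r &= \langle q - v_{\mathrm{split}},\, v_{\mathrm{split}} \rangle
   = \langle q,\, v_{\mathrm{split}} \rangle
   - \langle v_{\mathrm{split}},\, v_{\mathrm{split}} \rangle, \\
r - p_{\mathrm{split}} > 0
  &\iff \langle q,\, v_{\mathrm{split}} \rangle
   > \lVert v_{\mathrm{split}} \rVert^{2} + p_{\mathrm{split}} .
\end{aligned}
```

Read this way, each interior node sends the query to the "upper" or "lower" child according to which side of a hyperplane with normal vsplit the query q lies on.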

Current Assignee: Nahava, Inc.
Original Assignee: Nahava, Inc.
Inventors: Nakano, Russell Toshio; Cheng, Stanley
Primary Examiner(s): Vo, Tim T
Assistant Examiner(s): GORTAYO, DANGELINO N
Application Number: US12/609,910
Time in Patent Office: 837 Days
Field of Search: 707/1, 707/3, 707/100, 707/104, 707/713, 707/715, 707/741, 707/749, 707/773, 707/714
US Class Current: 707/749
CPC Class Codes: G06F 16/2264 Multidimensional index stru...
