×

Method and apparatus for fast similarity-based query, self-join, and join for massive, high-dimension datasets

  • US 8,117,213 B1
  • Filed: 10/30/2009
  • Issued: 02/14/2012
  • Est. Priority Date: 06/27/2006
  • Status: Active Grant
First Claim
Patent Images

1. A computer implemented method comprising:

  • (a) inputting in said computer a dataset consisting of vectors from an inner product space S, for which a similarity index has been built;

    (b) inputting in said computer a query q, where q is a vector from S;

    (c) inputting in said computer a desired scalar similarity threshold s;

    (d) setting a current node to a root of said similarity index;

    (e) determining if said current node is a leaf or an interior node; and

    (f) if leaf then(f1) computing a similarity between q and each item in said leaf node; and

    (f2) returning one or more of said each item that meets said desired similarity threshold s, said one or more of said each item being stored in hardware on said computer and displayed on a display for a user;

    (g) if interior node then(g1) obtaining a splitter (vsplit, p.split) from said interior node, where vsplit is a vector and p.split is a scalar; and

    (g2) computing r=<

    q−

    vsplit, vsplit>

    , where <

    >

    denotes inner product;

    (h) determining if r−

    p.split>

    0; and

    (i) if yes then(i1) setting said current node to be a “

    upper”

    child node; and

    (i2) resuming at (e);

    (j) if no then(j1) setting said current node to be a “

    lower”

    child node; and

    (j2) resuming at (e).

View all claims
  • 0 Assignments
Timeline View
Assignment View
    ×
    ×