×

Method and Apparatus for fast similarity-based query, self-join, and join for massive, high-dimension datasets

  • US 20070299865A1
  • Filed: 06/24/2007
  • Published: 12/27/2007
  • Est. Priority Date: 06/27/2006
  • Status: Active Grant
First Claim
Patent Images

1. A computer implemented method comprising:

  • (a) specifying a maximum of leaf nodes as G items;

    (b) inputting n input content items and denoting said n input items as a current collection;

    (c) determining if n>

    G; and

    (d) if not then(d1) building a leaf node;

    (d2) filling said leaf node with said n input content items; and

    (d3) connecting a link from a parent to said leaf node capable of being stored in hardware on said computer and capable of being displayed to a user.(e) if yes then(e1) computing a vector sum over all items in said current collection, wherein said vector sum is vsplit=sum(i;

    x.i)/n;

    (e2) computing a vector difference for each item in said current collection, wherein said vector difference is d.i=x.i−

    vsplit;

    (e3) computing a scalar value for each item in said current collection, wherein said scalar value is p.i=<

    d.i, vsplit>

    , and making a collection of said computed scalar value for each item;

    (e4) determining if p.i<

    3; and

    (f) if not then(f1) removing a largest p.i from said collection(f2) removing a smallest p.i from said collection; and

    (f3) resuming at (e3).(g) if yes then(g1) determining if 1 or 2 computed values remain in said collection; and



    (h) if 1 then 

    (h1) letting p.split be said 1 computed remaining value; and



    (h2) resuming at (j);



    (i) if 2 then 

    (i1) letting p.split be an average of said 2 computed remaining values; and



    (i2) resuming at (1);

    (j) defining a splitter which consists of said vsplit and said p.split;

    (k) denoting for each of said content items in said current collection a designation of “

    upper”

    if p.i>

    p.split, otherwise a designation of “

    lower”

    ;

    (l) building an interior node, consisting of said splitter, and defining links to said “

    lower” and

    said “

    upper”

    nodes;

    (m) inputting said “

    lower”

    nodes as items into a new “

    lower”

    current collection, letting new “

    lower”

    n denote the number of items in said new “

    lower”

    current collection, replacing said current collection with said new “

    lower”

    current collection, replacing said n with said new “

    lower”

    n, and resuming at (c);

    (n) inputting said “

    upper”

    nodes as items into a new “

    upper”

    current collection, letting new “

    upper”

    n denote the number of items in said new “

    upper”

    current collection, replacing said current collection with said new “

    upper”

    current collection, replacing said n with said new “

    upper”

    n, and resuming at (c);

View all claims
  • 1 Assignment
Timeline View
Assignment View
    ×
    ×