Method and apparatus for efficient indexed storage for unstructured content
First Claim
Patent Images
1. A method comprising:
- (a) inputting a collection of n vectors x.i;
(b) computing a mean x.avg=(1/n)*sum(i, x.i);
(c) if x.avg is non-zero then;
(c1) computing for each i a deviation d.i=<
x.i−
x.avg, x.avg>
;
(d) if x.avg is zero then;
(d1) picking a specific x.i that is not zero and denoting it x.pvg;
(d2) computing for each i a deviation d.i=<
x.i−
x.pvg, x.pvg>
;
(e) initializing D to the collection of values d.i; and
(f if D contains more than two elements and the smallest and largest members of D are different then;
(f1) removing the smallest and largest values from the collection D;
(f2) repeating (f)-(f1);
(g) if D contains one element then;
(g1) outputting said D one element as a split value;
(h) if D contains two elements then;
(h1) computing an average of the deviations corresponding to said D two elements;
(h2) outputting said average of the deviations corresponding to said D two elements as a split value.
1 Assignment
0 Petitions
Accused Products
Abstract
A method and apparatus for efficient indexed storage for unstructured content have been disclosed.
36 Citations
20 Claims
-
1. A method comprising:
-
(a) inputting a collection of n vectors x.i;
(b) computing a mean x.avg=(1/n)*sum(i, x.i);
(c) if x.avg is non-zero then;
(c1) computing for each i a deviation d.i=<
x.i−
x.avg, x.avg>
;
(d) if x.avg is zero then;
(d1) picking a specific x.i that is not zero and denoting it x.pvg;
(d2) computing for each i a deviation d.i=<
x.i−
x.pvg, x.pvg>
;
(e) initializing D to the collection of values d.i; and
(f if D contains more than two elements and the smallest and largest members of D are different then;
(f1) removing the smallest and largest values from the collection D;
(f2) repeating (f)-(f1);
(g) if D contains one element then;
(g1) outputting said D one element as a split value;
(h) if D contains two elements then;
(h1) computing an average of the deviations corresponding to said D two elements;
(h2) outputting said average of the deviations corresponding to said D two elements as a split value. - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. An apparatus comprising:
-
(a) means for inputting a collection of n vectors x.i;
(b) means for computing a mean x.avg=(1/n)*sum(i, x.i);
(c) if x.avg is non-zero then;
(c1) means for computing for each i a deviation d.i=<
x.i−
x.avg, x.avg>
;
(d) if x.avg is zero then;
(d1) means for picking a specific x.i that is not zero and denoting it x.pvg;
(d2) means for computing for each i a deviation d.i=<
x.i−
x.pvg, x.pvg>
;
(e) means for initializing D to the collection of values d.i; and
(f if D contains more than two elements and the smallest and largest members of D are different then;
(f1) means for removing the smallest and largest values from the collection D;
(f2) means for repeating (f)-(f1);
(g) if D contains one element then;
(g1) means for outputting said D one element as a split value;
(h) if D contains two elements then;
(h1) means for computing an average of the deviations corresponding to said D two elements;
(h2) means for outputting said average of the deviations corresponding to said D two elements as a split value.
-
-
8. A method comprising:
-
(a) inputting a node N;
(b) inputting a query vector q;
(c) if N is a leaf node then;
(c1) computing a distance e.i=∥
q−
x.i∥
for each vector x.i contained in node N; and
(c2) returning a element with the smallest distance e.i;
(d) else;
(d1) obtaining a splitter S from node N;
(d2) computing d=<
q−
S.avg, S.avg>
;
(e) if d>
S.split then;
(e1) returning (N.upper, q);
(g) else;
(e1) returning (N.lower, q);
- View Dependent Claims (9, 10, 11)
-
-
12. A method comprising:
-
(a) inputting a node N;
(b) inputting a query vector q;
(c) inputting a distance h;
(d) if N is a leaf node then;
(d1) computing a distance e.i=∥
q−
x.i∥
for each vector x.i contained in node N; and
(d2) returning all elements with a distance e.i<
h;
(d) else;
(d1) obtaining a splitter S from node N;
(d2) computing d=<
q−
S.avg, S.avg>
;
(d3) computing distance from q to hyperplane=sqrt(|S.split−
d|);
(e) if h>
sqrt(|S.split−
d|) then;
(e1) computing SearchFiniteNeighborhood (N.upper, q, h) by completing steps (a)-(x) wherein N at (a) is N.upper;
(e2) computing SearchFiniteNeighborhood (N.lower, q, h) by completing steps (a)-(g) wherein N at (a) is N.lower;
(e3) computing the union of results (e1) and (e2) and returning said union of results;
(f) else if d>
S.split then;
(f1) returning results from SearchFiniteNeighborhood(N.upper, q, h);
(g) else;
(f1) returning results from SearchFiniteNeighborhood(N. lower, q, h). - View Dependent Claims (13, 14)
-
-
15. An apparatus comprising:
-
(a) means for inputting a node N;
(b) means for inputting a query vector q;
(c) means for inputting a distance h;
(d) if N is a leaf node then;
(d1) means for computing a distance e.i=∥
q−
x.i∥
for each vector x.i contained in node N; and
(d2) means for returning all elements with a distance e.i<
h;
(d) else;
(d1) means for obtaining a splitter S from node N;
(d2) means for computing d=<
q−
S.avg, S.avg>
;
(d3) means for computing distance from q to hyperplane=sqrt(|S.split−
d|);
(e) if h>
sqrt(|S.split−
d|) then;
(e1) means for computing SearchFiniteNeighborhood (N.upper, q, h) by completing steps (a)-(x) wherein N at (a) is N.upper;
(e2) means for computing SearchFiniteNeighborhood (N.lower, q, h) by completing steps (a)-(g) wherein N at (a) is N.lower;
(e3) means for computing the union of results (e1) and (e2) and returning said union of results;
(f) else if d>
S.split then;
(f1) means for returning results from SearchFiniteNeighborhood(N.upper, q, h);
(g) else;
(f1) means for returning results from SearchFiniteNeighborhood(N. lower, q, h).
-
-
16. An apparatus comprising:
-
means for using a plurality of inner products to build an indexed structure; and
means for finding a most similar item to one in said indexed structure. - View Dependent Claims (17, 18, 19, 20)
-
Specification