Method and apparatus for efficient indexed storage for unstructured content

US 20060190465A1
Filed: 02/23/2006
Published: 08/24/2006
Est. Priority Date: 02/24/2005
Status: Active Grant

First Claim

Patent Images

1. A method comprising:

(a) inputting a collection of n vectors x.i;

(b) computing a mean x.avg=(1/n)*sum(i, x.i);

(c) if x.avg is non-zero then;

(c1) computing for each i a deviation d.i=<

x.i−

x.avg, x.avg>

;

(d) if x.avg is zero then;

(d1) picking a specific x.i that is not zero and denoting it x.pvg;

(d2) computing for each i a deviation d.i=<

x.i−

x.pvg, x.pvg>

;

(e) initializing D to the collection of values d.i; and

(f if D contains more than two elements and the smallest and largest members of D are different then;

(f1) removing the smallest and largest values from the collection D;

(f2) repeating (f)-(f1);

(g) if D contains one element then;

(g1) outputting said D one element as a split value;

(h) if D contains two elements then;

(h1) computing an average of the deviations corresponding to said D two elements;

(h2) outputting said average of the deviations corresponding to said D two elements as a split value.

View all claims

1 Assignment

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A method and apparatus for efficient indexed storage for unstructured content have been disclosed.

36 Citations

View as Search Results

20 Claims

1. A method comprising:
- (a) inputting a collection of n vectors x.i;
  
  (b) computing a mean x.avg=(1/n)*sum(i, x.i);
  
  (c) if x.avg is non-zero then;
  
  (c1) computing for each i a deviation d.i=<
  
  x.i−
  
  x.avg, x.avg>
  
  ;
  
  (d) if x.avg is zero then;
  
  (d1) picking a specific x.i that is not zero and denoting it x.pvg;
  
  (d2) computing for each i a deviation d.i=<
  
  x.i−
  
  x.pvg, x.pvg>
  
  ;
  
  (e) initializing D to the collection of values d.i; and
  
  (f if D contains more than two elements and the smallest and largest members of D are different then;
  
  (f1) removing the smallest and largest values from the collection D;
  
  (f2) repeating (f)-(f1);
  
  (g) if D contains one element then;
  
  (g1) outputting said D one element as a split value;
  
  (h) if D contains two elements then;
  
  (h1) computing an average of the deviations corresponding to said D two elements;
  
  (h2) outputting said average of the deviations corresponding to said D two elements as a split value.
- View Dependent Claims (2, 3, 4, 5, 6)
- - 2. The method of claim 1 wherein said x.i has no duplicates.
  - 3. The method of claim 1 further comprising:
    - determining if x.pvg yields a good split value; and
      
      if not then;
      
      discarding said x.pvg value; and
      
      repeating (d)-(d2).
  - 4. The method of claim 1 further comprising:
    - computing for any arbitrary vector y a value d=<
      
      y−
      
      x.avg, x.avg>
      
      ; and
      
      if d >
      
      0 then;
      
      assigning said vector y to an “
      
      upper”
      
      group;
      
      else;
      
      assigning said vector y to a “
      
      lower”
      
      group.
  - 5. A machine-readable medium having stored thereon instructions, which when executed performs the method of claim 1.
  - 6. A system comprising a processor coupled to a memory, which when executing a set of instructions performs the method of claim 1.

7. An apparatus comprising:
- (a) means for inputting a collection of n vectors x.i;
  
  (b) means for computing a mean x.avg=(1/n)*sum(i, x.i);
  
  (c) if x.avg is non-zero then;
  
  (c1) means for computing for each i a deviation d.i=<
  
  x.i−
  
  x.avg, x.avg>
  
  ;
  
  (d) if x.avg is zero then;
  
  (d1) means for picking a specific x.i that is not zero and denoting it x.pvg;
  
  (d2) means for computing for each i a deviation d.i=<
  
  x.i−
  
  x.pvg, x.pvg>
  
  ;
  
  (e) means for initializing D to the collection of values d.i; and
  
  (f if D contains more than two elements and the smallest and largest members of D are different then;
  
  (f1) means for removing the smallest and largest values from the collection D;
  
  (f2) means for repeating (f)-(f1);
  
  (g) if D contains one element then;
  
  (g1) means for outputting said D one element as a split value;
  
  (h) if D contains two elements then;
  
  (h1) means for computing an average of the deviations corresponding to said D two elements;
  
  (h2) means for outputting said average of the deviations corresponding to said D two elements as a split value.

8. A method comprising:
- (a) inputting a node N;
  
  (b) inputting a query vector q;
  
  (c) if N is a leaf node then;
  
  (c1) computing a distance e.i=∥
  
  q−
  
  x.i∥
  
  for each vector x.i contained in node N; and
  
  (c2) returning a element with the smallest distance e.i;
  
  (d) else;
  
  (d1) obtaining a splitter S from node N;
  
  (d2) computing d=<
  
  q−
  
  S.avg, S.avg>
  
  ;
  
  (e) if d>
  
  S.split then;
  
  (e1) returning (N.upper, q);
  
  (g) else;
  
  (e1) returning (N.lower, q);
- View Dependent Claims (9, 10, 11)
- - 9. The method of claim 8 wherein said (e) comprises:
    - if d>
      
      =S.split then;
  - 10. A machine-readable medium having stored thereon instructions, which when executed performs the method of claim 8.
  - 11. A system comprising a processor coupled to a memory, which when executing a set of instructions performs the method of claim 8.

12. A method comprising:
- (a) inputting a node N;
  
  (b) inputting a query vector q;
  
  (c) inputting a distance h;
  
  (d) if N is a leaf node then;
  
  (d1) computing a distance e.i=∥
  
  q−
  
  x.i∥
  
  for each vector x.i contained in node N; and
  
  (d2) returning all elements with a distance e.i<
  
  h;
  
  (d) else;
  
  (d1) obtaining a splitter S from node N;
  
  (d2) computing d=<
  
  q−
  
  S.avg, S.avg>
  
  ;
  
  (d3) computing distance from q to hyperplane=sqrt(|S.split−
  
  d|);
  
  (e) if h>
  
  sqrt(|S.split−
  
  d|) then;
  
  (e1) computing SearchFiniteNeighborhood (N.upper, q, h) by completing steps (a)-(x) wherein N at (a) is N.upper;
  
  (e2) computing SearchFiniteNeighborhood (N.lower, q, h) by completing steps (a)-(g) wherein N at (a) is N.lower;
  
  (e3) computing the union of results (e1) and (e2) and returning said union of results;
  
  (f) else if d>
  
  S.split then;
  
  (f1) returning results from SearchFiniteNeighborhood(N.upper, q, h);
  
  (g) else;
  
  (f1) returning results from SearchFiniteNeighborhood(N. lower, q, h).
- View Dependent Claims (13, 14)
- - 13. A machine-readable medium having stored thereon instructions, which when executed performs the method of claim 12.
  - 14. A system comprising a processor coupled to a memory, which when executing a set of instructions performs the method of claim 12.

15. An apparatus comprising:
- (a) means for inputting a node N;
  
  (b) means for inputting a query vector q;
  
  (c) means for inputting a distance h;
  
  (d) if N is a leaf node then;
  
  (d1) means for computing a distance e.i=∥
  
  q−
  
  x.i∥
  
  for each vector x.i contained in node N; and
  
  (d2) means for returning all elements with a distance e.i<
  
  h;
  
  (d) else;
  
  (d1) means for obtaining a splitter S from node N;
  
  (d2) means for computing d=<
  
  q−
  
  S.avg, S.avg>
  
  ;
  
  (d3) means for computing distance from q to hyperplane=sqrt(|S.split−
  
  d|);
  
  (e) if h>
  
  sqrt(|S.split−
  
  d|) then;
  
  (e1) means for computing SearchFiniteNeighborhood (N.upper, q, h) by completing steps (a)-(x) wherein N at (a) is N.upper;
  
  (e2) means for computing SearchFiniteNeighborhood (N.lower, q, h) by completing steps (a)-(g) wherein N at (a) is N.lower;
  
  (e3) means for computing the union of results (e1) and (e2) and returning said union of results;
  
  (f) else if d>
  
  S.split then;
  
  (f1) means for returning results from SearchFiniteNeighborhood(N.upper, q, h);
  
  (g) else;
  
  (f1) means for returning results from SearchFiniteNeighborhood(N. lower, q, h).

16. An apparatus comprising:
- means for using a plurality of inner products to build an indexed structure; and
  
  means for finding a most similar item to one in said indexed structure.
- View Dependent Claims (17, 18, 19, 20)
- - 17. The apparatus of claim 16 further comprising means for representing unstructured content data elements by vectors in said plurality of inner products.
  - 18. The apparatus of claim 16 further comprising means for insertion and means for query wherein said means for insertion and said means for query are timing related proportionally to log(N) wherein N is a collection of N items in said indexed storage structure.
  - 19. The apparatus of claim 16 further comprising means for determining a split value.
  - 20. The apparatus of claim 19 further comprising means for using said split value in means for splitting said plurality of inner products into two groups.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Nahava, Inc.
Original Assignee
Nahava, Inc.
Inventors
Nakano, Russell T.

Granted Patent

US 7,650,320 B2
Time in Patent Office

Days
Field of Search
US Class Current

1/1
CPC Class Codes

G06F 16/2264   Multidimensional index stru...

G06F 16/322   Trees

G06F 16/3347   using vector based model

G06F 16/41   Indexing; Data structures t...

Method and apparatus for efficient indexed storage for unstructured content

First Claim

1 Assignment

0 Petitions

Accused Products

Abstract

36 Citations

20 Claims

Specification

Use Cases

Quick Links

Others

Method and apparatus for efficient indexed storage for unstructured content

First Claim

1 Assignment

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

36 Citations

20 Claims

Specification

Subscription Required

Use Cases

Quick Links

Others