Vector index preparing method, similar vector searching method, and apparatuses for the methods

US 7,007,019 B2
Filed: 12/21/2000
Issued: 02/28/2006
Est. Priority Date: 12/21/1999
Status: Expired due to Fees

Abstract

In the present invention, similar vectors are retrieved at high speed from a vector database of several hundred dimensions using a single vector index, with either an inner product or a distance as the similarity measure, by designating a similarity search range and a maximum number of results. Vector index preparation is performed by decomposing each vector into a plurality of partial vectors and characterizing each partial vector by its norm partition, the region to which it belongs, and its declination partition. Similarity search is performed by obtaining partial query vectors and partial search ranges from the query vector and the search range, searching each partial space and accumulating the differences from the partial search ranges to obtain an upper limit value for each vector, and then computing the exact measure in descending order of the upper limit value to obtain the final similarity search result.
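
As a rough illustration of the decomposition described in the abstract, the following sketch splits a vector into m partial vectors and computes the two quantities that characterize each one, its norm and its declination (the cosine of the angle to a region center). The function names are hypothetical, and the rule that the leading parts receive the extra component is one possible reading of "N/m components or (N/m)+1 components", not something the patent text fixes.

```python
import math

def split_into_partial_vectors(vector, m):
    """Split an N-component vector into m consecutive partial vectors.

    Assumption: the first N % m parts get (N // m) + 1 components and the
    rest get N // m, so every component is used exactly once.
    """
    base, extra = divmod(len(vector), m)
    parts, start = [], 0
    for b in range(m):
        size = base + 1 if b < extra else base
        parts.append(vector[start:start + size])
        start += size
    return parts

def norm(v):
    """Euclidean norm |v|."""
    return math.sqrt(sum(x * x for x in v))

def declination(v, p):
    """Cosine (v.p)/(|v|*|p|) of the angle between v and a region center p."""
    nv, n_p = norm(v), norm(p)
    if nv == 0.0 or n_p == 0.0:
        return 0.0  # assumption: a zero vector is given declination 0
    return sum(a * b for a, b in zip(v, p)) / (nv * n_p)

parts = split_into_partial_vectors([3.0, 4.0, 0.0, 1.0, 2.0, 2.0, 0.0], m=3)
```

Here the seven components are split into parts of sizes 3, 2 and 2; each part would then be characterized independently in its own partial space.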

32 Citations

29 Claims

1. A method of preparing an index, which is searchable by a computer, with respect to a vector database in which a finite number of ordered lists each including at least N-dimensional real vector and an identification number of the vector are registered as vector data, said index being used for data retrieval using a computer, said method comprising:
- a first step of vector index preparation of dividing N components into m ordered lists in a predetermined method with respect to the N-dimensional real vector V of each vector data in said vector database, preparing m partial vectors v₁ to v_m, subsequently tabulating a distribution of a norm of the partial vector v_k (k=1 to m), preparing a norm partition table which contains a predetermined number of norm ranges, calculating a region number d to which said partial vector v_k belongs in accordance with predetermined D region center vectors p₁ to p_D, tabulating a distribution of a cosine (v_k·p_d)/(|v_k|*|p_d|) of an angle formed by said partial vector v_k and the region center vector p_d as a declination distribution, and preparing a declination partition table which contains a predetermined number of declination ranges;

  a second step of the vector index preparation of dividing N components into m ordered lists in the same method as said first step with respect to the N-dimensional real vector V of each vector data in said vector database, preparing m partial vectors v₁ to v_m, referring to said norm partition table to calculate a number r of the norm partition to which the norm of said partial vector v_b belongs with respect to the partial vector v_b (b=1 to m) for the partial space number b, calculating the region number d to which said partial vector v_b belongs in accordance with the predetermined D region center vectors p₁ to p_D in the same method as said first step, calculating a declination (v_b·p_d)/(|v_b|*|p_d|) as a cosine of an angle formed by said partial vector v_b and the region center vector p_d indicating a center direction of the region of said region number d, referring to said declination partition table, calculating a number c of the belonging declination partition, and calculating index registration data to be registered in a vector index from said partial space number b, said region number d, said declination partition number c, said norm partition number r, the component of said partial vector v_b, and the identification number i; and

  a third step of the vector index preparation of constituting the vector index such that the identification number and the component of each partial vector can be searched using an ordered list of the partial space number b, the region number d, the declination partition number c and a norm partition number range (r₁, r₂) as a key from said norm partition table, said declination partition table, and said index registration data, and such that the vector component of each vector data can be searched with the identification number of the vector component.
- Dependent claims (3, 4, 5, 6, 7, 8, 9, 28)
  - 3. The vector index preparing method according to claim 1 or 2 wherein in the first and second steps of said vector index preparation, an angle cosine (v_b·p_d)/(|v_b|*|p_d|) is used as a function of an angle formed by the partial vector v_b and the region center vector p_d, and a value of the function is used as a declination to obtain the declination distribution.
  - 4. The vector index preparing method according to claim 1 or 2 wherein in the first and second steps of said vector index preparation, N/m components or (N/m)+1 components are extracted in order from a top component of V so that all components of an N-dimensional vector V are extracted, and the partial vector is prepared.
  - 5. The vector index preparing method according to claim 1 wherein in the first step of said vector index preparation, during preparation of the norm division table, the norm partition is determined based on the tabulation result of the norm distribution so that the number of partial vectors belonging to the norm range corresponding to each norm division becomes as uniform as possible.
  - 6. The vector index preparing method according to claim 1 wherein in the first step of said vector index preparation, during preparation of the declination division table, the declination division is determined based on the tabulation result of the declination distribution so that the number of partial vectors belonging to the declination range corresponding to each declination division becomes as uniform as possible.
  - 7. The vector index preparing method according to claim 1 or 2 wherein in the first and second steps of said vector index preparation, the region number of the partial vector v_b is obtained as a number d of the region center vector p_d in which a cosine (v_b·p_d)/(|v_b|*|p_d|) of an angle formed by p_d and v_b is largest among the predetermined D region center vectors p₁ to p_D.
  - 8. The vector index preparing method according to claim 1 or 2 wherein in the third step of said vector index preparation, a search tree in which a number (b*Nd*Nc*Nr)+(d*Nc*Nr)+(c*Nr)+r obtained by combining the partial space number b, the region number d, the declination division number c, and the norm division number r can be used as a key to search the identification number i and the component of the vector, and a table in which the vector data identification number is used as an affix and the key of said search tree of each partial vector is recorded are prepared and used as part of the vector index.
  - 9. The vector index preparing method according to claim 1 or 2 wherein in the second step of said vector index preparation, the vector obtained by normalizing all vectors (0, . . . , 0, +1) to (−1, . . . , −1) whose component is any one of {−1, 0, +1} and which are not 0 vector is used as the region center vector.
  - 28. A recording medium in which a computer program for executing the method of claim 1 or 2 is recorded.
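
The dependent claims above can be sketched in a few small helpers: region centers built from components in {−1, 0, +1} (claim 9), region assignment by largest cosine (claim 7), partition boundaries chosen so that each partition holds about the same number of samples (claims 5 and 6), and the combined search-tree key of claim 8. This is a minimal sketch under stated assumptions: all function names are hypothetical, the partition numbers are taken to be zero-based, and the quantile rule in `equal_frequency_bounds` is just one simple way to make the counts "as uniform as possible".

```python
import bisect
import itertools
import math

def region_centers(dim):
    """All non-zero vectors with components in {-1, 0, +1}, normalized."""
    centers = []
    for combo in itertools.product((-1, 0, 1), repeat=dim):
        length = math.sqrt(sum(c * c for c in combo))
        if length > 0:
            centers.append(tuple(c / length for c in combo))
    return centers

def region_number(v, centers):
    """Index d of the center with the largest cosine to v.

    The centers are unit vectors and |v| is common to all candidates,
    so comparing dot products is equivalent to comparing cosines.
    """
    return max(range(len(centers)),
               key=lambda d: sum(a * b for a, b in zip(v, centers[d])))

def equal_frequency_bounds(values, n_partitions):
    """Interior cut points giving roughly equal counts per partition.

    Assumption: values is non-empty; simple quantile cuts are used.
    """
    ordered = sorted(values)
    return [ordered[(k * len(ordered)) // n_partitions]
            for k in range(1, n_partitions)]

def partition_number(value, bounds):
    """Zero-based partition number of value for the given cut points."""
    return bisect.bisect_right(bounds, value)

def composite_key(b, d, c, r, n_d, n_c, n_r):
    """Key (b*Nd*Nc*Nr)+(d*Nc*Nr)+(c*Nr)+r from claim 8.

    Unique as long as 0 <= d < n_d, 0 <= c < n_c and 0 <= r < n_r.
    """
    return (b * n_d * n_c * n_r) + (d * n_c * n_r) + (c * n_r) + r

centers = region_centers(2)
d = region_number((1.0, 0.1), centers)
bounds = equal_frequency_bounds([8, 1, 7, 2, 6, 3, 5, 4], 4)
```

Because the key orders entries first by b, then d, then c, then r, a norm partition range (r₁, r₂) for fixed (b, d, c) maps to one contiguous key interval, which is what makes a range search on an ordered tree sufficient. Note that the number of centers is 3^dim − 1, so this construction is only practical for low-dimensional partial spaces.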

2. A method of preparing an index, which is searchable by a computer, with respect to a vector database in which a finite number of ordered lists each including at least N-dimensional real vector and an identification number of the vector are registered as vector data, said index being used for data retrieval using a computer, said method comprising:
- a first step of vector index preparation of dividing N components into m ordered lists in a predetermined method with respect to the N-dimensional real vector V of each vector data in said vector database, preparing m partial vectors v₁ to v_m, subsequently tabulating a distribution of a norm of the partial vector v_b (b=1 to m) for each partial space number b, preparing a norm partition table which contains a predetermined number of norm ranges, calculating a region number d to which said partial vector v_b belongs in accordance with predetermined D region center vectors p₁ to p_D, tabulating a distribution of a cosine (v_b·p_d)/(|v_b|*|p_d|) of an angle formed by said partial vector v_b and the region center vector p_d as a declination distribution, and preparing a declination partition table which contains a predetermined number of norm ranges;

  a second step of the vector index preparation of dividing N components into m ordered lists in the same method as said first step with respect to the N-dimensional real vector V of each vector data in said vector database, preparing m partial vectors v₁ to v_m, referring to said norm partition table to calculate a number r of the norm partition to which the norm of said partial vector v_b belongs with respect to the partial vector v_b (b=1 to m) for said partial space b, calculating the region number d to which said partial vector v_b belongs in accordance with the predetermined D region center vectors p₁ to p_D in the same method as said first step, calculating a declination (v_b·p_d)/(|v_b|*|p_d|) as a cosine of an angle formed by said partial vector v_b and the region center vector p_d indicating a center direction of the region of said region number d, referring to said declination partition table, calculating a number c of the belonging declination partition, calculating a component partition number w_j of a predetermined range to which v_bj belongs from a maximum value of the norm of the norm partition corresponding to said calculated norm partition number r with respect to each component v_bj of said calculated partial vector v_b, and calculating index registration data to be registered in a vector index from said partial space number b, said region number d, said declination partition number c, said norm partition number r, a string of said component partition numbers w_j, and the identification number i; and

  a third step of the vector index preparation of constituting the vector index such that the identification number and the component of each partial vector can be searched using a set of the partial space number b, the region number d, the declination partition number c and a norm partition number range (r₁, r₂) as a key from said norm partition table, said declination partition table, and said index registration data, and such that the vector component of each vector data can be searched with the identification number of the vector component.
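
Claim 2 differs from claim 1 in storing a string of component partition numbers w_j instead of the raw components. The claim does not spell out the "predetermined range", so the sketch below makes an explicit assumption: since every component satisfies |v_bj| ≤ |v_b| ≤ R, where R is the maximum norm of the norm partition r, the interval [−R, R] is cut into `levels` equal-width ranges. Both function names and the `levels` parameter are hypothetical.

```python
def component_partition_numbers(partial_vector, norm_upper, levels=256):
    """Quantize each component of a partial vector to a range number w_j.

    Assumption: [-norm_upper, +norm_upper] is divided into `levels`
    equal-width ranges, numbered from 0.
    """
    if norm_upper <= 0:
        raise ValueError("norm_upper must be positive")
    width = 2.0 * norm_upper / levels
    numbers = []
    for x in partial_vector:
        w = int((x + norm_upper) / width)
        numbers.append(min(levels - 1, max(0, w)))  # clamp the top edge
    return numbers

def approximate_components(numbers, norm_upper, levels=256):
    """Recover approximate components as the midpoints of their ranges."""
    width = 2.0 * norm_upper / levels
    return [-norm_upper + (w + 0.5) * width for w in numbers]

w = component_partition_numbers([-2.0, 0.0, 1.0, 2.0], norm_upper=2.0, levels=4)
approx = approximate_components(w, norm_upper=2.0, levels=4)
```

With 256 levels each component fits in one byte, and the reconstruction error per component is at most half a range width, that is R/levels.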

10. A similarity vector searching method in which a query vector Q of an N-dimensional real vector, an inner product lower limit value α, and maximum obtained vector number L are designated as search conditions, a vector index prepared from vector data with a finite number of ordered lists of at least N-dimensional real vector and an ID number of the real vector registered therein is searched, and L ordered lists at maximum (i, V·Q) of an identification number i and an inner product of Q and V are obtained with respect to vector data (i, V) of said vector database whose value V·Q of the inner product with said query vector Q is larger than said inner product lower limit value α, said similar vector searching method comprising:
- a first step of similar vector search of dividing N components of Q into m ordered lists in the same predetermined method as a method used in preparing said vector index with respect to said query vector Q, preparing m partial query vectors q₁ to q_m, calculating a partial inner product lower limit value f_b as a lower limit value of a partial inner product of each partial query vector q_b and the corresponding partial vector from a designated inner product lower limit value α, calculating a partial space number b, and an ordered list (c, (r₁, r₂)) of a declination division number c to be searched in a region number d and a norm partition range (r₁, r₂) from a value of an inner product p_d·q_b of the region center vector p_d and said partial query vector q_b, said partial inner product lower limit value f_b, and a norm partition table and a declination partition table in said vector index with respect to each partial query vector q_b (b=1 to m) and each region b, searching a range of said vector index using (b, d, c, (r₁, r₂)) as a search condition based on said calculated (c, (r₁, r₂)), obtaining the identification number i and the component of the partial vector v_b satisfying the condition as an index search result, calculating a partial inner product difference (v_b·q_b)−f_b as a difference between a partial inner product v_b·q_b of said v_b and q_b and said partial inner product lower limit value f_b, and accumulating (adding) the difference as an inner product difference upper limit value S(i) of the identification number i of an inner product difference table; and

  a second step of the similar vector search of searching said vector index with the identification number i in order from a largest value in said inner product difference table S(i) to obtain a vector data component V, calculating an inner product difference value t=V·Q−α by subtracting α from the inner product V·Q of V and said query vector Q, and outputting an ordered list of at least the identification number i and an inner product t+α as a search result with respect to L pieces at maximum of vector data with a large inner product difference value when L or more pieces of vector data having the inner product difference value larger than a maximum value of an element having a non-calculated inner product difference value are collected, or when the inner products of all the vector data having a positive inner product difference upper limit value are calculated in said inner product difference table.
- Dependent claims (12)
  - 12. The similar vector searching method according to claim 10 or 11 wherein in the first step of said similar vector search, N/m components or (N/m)+1 components are extracted in order from a top component of V so that all components of an N-dimensional vector V are extracted, and the partial query vector is prepared.
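
The two-step search of claim 10 can be sketched as follows. This is not the patented index: as a labeled stand-in for the range search on (b, d, c, (r₁, r₂)), the first step simply scans every partial vector and keeps those whose partial inner product exceeds f_b, which still yields an upper bound S(i) on t = V·Q − α. The partial limits are assumed proportional to |q_b|², the database is a plain dictionary, and all names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def split(vector, m):
    """Split into m consecutive parts of N // m or N // m + 1 components."""
    base, extra = divmod(len(vector), m)
    parts, start = [], 0
    for b in range(m):
        size = base + 1 if b < extra else base
        parts.append(vector[start:start + size])
        start += size
    return parts

def inner_product_search(database, query, alpha, max_results, m):
    """Return up to max_results pairs (i, V.Q) with V.Q > alpha."""
    q_parts = split(query, m)
    total = sum(dot(q, q) for q in q_parts)
    if total == 0:
        raise ValueError("query vector must be non-zero")
    # Partial lower limits f_b; by construction they sum to alpha.
    limits = [alpha * dot(q, q) / total for q in q_parts]

    # First step: accumulate partial differences into S(i).
    # Assumption: a linear scan replaces the index range search.
    upper = {}
    for i, vector in database.items():
        for v_b, q_b, f_b in zip(split(vector, m), q_parts, limits):
            diff = dot(v_b, q_b) - f_b
            if diff > 0:
                upper[i] = upper.get(i, 0.0) + diff

    # Second step: exact values in descending order of S(i), stopping once
    # the L-th best exact value beats every remaining upper bound.
    results = []
    for i, s_i in sorted(upper.items(), key=lambda kv: kv[1], reverse=True):
        results.sort(reverse=True)
        if len(results) >= max_results and results[max_results - 1][0] > s_i:
            break
        t = dot(database[i], query) - alpha
        if t > 0:
            results.append((t, i))
    results.sort(reverse=True)
    return [(i, t + alpha) for t, i in results[:max_results]]

db = {1: [1.0, 1.0, 1.0, 1.0], 2: [2.0, 0.0, 0.0, 0.0],
      3: [0.0, 0.0, 0.0, 0.25], 4: [1.0, 0.0, 0.0, 0.2]}
hits = inner_product_search(db, [1.0, 1.0, 1.0, 1.0], alpha=1.0, max_results=2, m=2)
```

In this example vector 3 never enters the table because none of its partial differences is positive, and vector 4 is never fully evaluated because two exact results already exceed its upper bound of 0.5.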

11. A similarity vector searching method in which a query vector Q of an N-dimensional real vector, a distance upper limit value α, and maximum obtained vector number L are designated as search conditions, a vector index prepared from vector data with a finite number of ordered lists of at least N-dimensional real vector and an identification number of the real vector registered therein is searched, and L ordered lists at maximum (i, p) of an identification number i of an N-dimensional real vector V in said vector data and a distance p between Q and V are obtained such that a value of an inner product with said query vector Q is not more than said distance upper limit value α, said similar vector searching method comprising:
- a first step of similar vector search of dividing N components of Q into m ordered lists in the same predetermined method as a method used in preparing said vector index with respect to said query vector Q, preparing m partial query vectors q₁ to q_m, calculating a partial square distance upper limit value f_b as an upper limit value of a partial square distance |v_b−q_b|² (i.e.,) corresponding to square of Euclidean distance of each partial query vector q_b and the corresponding partial vector v_b from a designated distance upper limit value α, systematically generating an ordered list (b, d, c, (r₁, r₂)) of a partial space number b to be searched, a region number d, a declination partition number c and a norm partition range (r₁, r₂) from said partial query vector q_b, said partial square distance upper limit value f_b, and a norm partition table and a declination partition table in said vector index with respect to each partial query vector q_b (b=1 to m), searching a range of said vector index using said generated (b, d, c, (r₁, r₂)) as a search condition, obtaining the identification number i and the component of the partial vector v_b satisfying the condition as an index search result, calculating a partial square distance difference f_b−|v_b−q_b|² as a difference between said partial square distance upper limit value f_b and a partial square distance |v_b−q_b|² of v_b and q_b, and accumulating (adding) the difference as a square distance difference upper limit value S(i) of the identification number i of a square distance difference table; and

  a second step of the similar vector search of searching said vector index with the identification number i in order from a largest value in said square distance difference table S(i) to obtain a vector data component V, calculating a square distance difference value α²−|V−Q|² by subtracting a square distance |V−Q|² of V and said query vector Q from a squared distance upper limit value α², and outputting an ordered list of at least the identification number i and a distance (α²−t)^(1/2) as a search result with respect to L pieces at maximum of vector data with a large square distance difference value t when L or more pieces of vector data having the square distance difference value larger than a maximum value of an element having a non-calculated square distance difference value are collected, or when the square distance difference values of all the vector data having a positive square distance difference upper limit value are calculated in said square distance difference table.
- Dependent claims (13, 14)
  - 13. The similar vector searching method according to claim 11 wherein in the first step of said similar vector search, the partial inner product lower limit value f_b as the lower limit value of the inner product of said partial query vector q_b and the corresponding partial vector v_b is calculated from a designated inner product lower limit value α by f_b=α|q_b|²/Σ(|q_b|²).
  - 14. The similar vector searching method according to claim 11 wherein in the first step of said similar vector search, the partial square distance upper limit value f_b as the upper limit value of the square distance of said partial query vector q_b and the corresponding partial vector v_b is calculated from a designated distance lower/upper limit value α by f_b=α²|q_b|²/Σ(|q_b|²).
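
The two formulas in claims 13 and 14 can be read as splitting the overall limit across the partial spaces in proportion to |q_b|². The short derivation below shows why the per-space differences add up to the overall difference, which is what lets S(i) act as a bound. It writes the summation index as k to keep it distinct from b (the claims reuse b), and it relies only on the fact that the partial vectors partition the components of V and Q.

```latex
% Inner-product search (claim 13): partial lower limits
f_b = \alpha \,\frac{|q_b|^2}{\sum_{k=1}^{m} |q_k|^2},
\qquad \sum_{b=1}^{m} f_b = \alpha,
\qquad V \cdot Q - \alpha = \sum_{b=1}^{m} \bigl( v_b \cdot q_b - f_b \bigr).

% Distance search (claim 14): partial squared-distance upper limits
f_b = \alpha^2 \,\frac{|q_b|^2}{\sum_{k=1}^{m} |q_k|^2},
\qquad \sum_{b=1}^{m} f_b = \alpha^2,
\qquad \alpha^2 - |V - Q|^2 = \sum_{b=1}^{m} \bigl( f_b - |v_b - q_b|^2 \bigr).
```

If the index search returns every partial vector whose term in the sum is positive (an assumption about the search ranges, not something these formulas alone guarantee), then the accumulated S(i) omits only non-positive terms and is therefore never smaller than the exact difference value t.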

15. An apparatus for preparing an index, which is searchable by a computer, with respect to a vector database in which a finite number of ordered lists each including at least N-dimensional real vector and an identification number of the vector are registered as vector data, said index being used for data retrieval using a computer, said apparatus comprising:
- partial vector calculation means for dividing N components into m ordered lists in a predetermined method with respect to the N-dimensional real vector V of each vector data in said vector database, and preparing m partial vectors v₁ to v_m;

  norm distribution tabulation means for tabulating a distribution of a norm of the partial vector v_k (k=1 to m) among said prepared m partial vectors v₁ to v_m, and preparing a norm partition table which contains a predetermined number of norm ranges;

  region number calculation means for calculating a region number d to which said partial vector v_k belongs in accordance with predetermined D region center vectors p₁ to p_D;

  declination distribution tabulation means for tabulating a distribution of a cosine (v_k·p_d)/(|v_k|*|p_d|) of an angle formed by said partial vector v_k and the region center vector p_d as a declination distribution, and preparing a declination partition table which contains a predetermined number of declination ranges;

  norm division number calculation means for referring to said norm partition table to calculate a number r of the norm partition to which the norm of said partial vector v_b belongs with respect to the partial vector v_b (b=1 to m) for the partial space number b among the m partial vectors v₁ to v_m prepared by said partial vector calculation means;

  declination partition number calculation means for calculating a declination (v_b·p_d)/(|v_b|*|p_d|) as a cosine of an angle formed by said partial vector v_b and the region center vector p_d indicating a center direction of the region of said region number d calculated by said region number calculation means;

  index data calculation means for calculating index registration data to be registered in a vector index from said partial space number b, said region number d, said declination partition number c, said norm partition number r, the component of said partial vector v_b, and the identification number i; and

  index constituting means for constituting the vector index such that the identification number and the component of each partial vector can be searched using an ordered list of the partial space number b, the region number d, the declination partition number c and a norm partition number range as a key from said norm partition table, said declination partition table, and said index registration data, and such that the vector component of each vector data can be searched with the identification number of the vector component.
- Dependent claims (17, 18, 19, 20, 21, 22, 29)
  - 17. The vector index preparing apparatus according to claim 15 or 16 wherein said partial vector calculation means extracts N/m components or (N/m)+1 components in order from a top component of V so that all components of an N-dimensional vector V are extracted, and prepares the partial vector.
  - 18. The vector index preparing apparatus according to claim 15 wherein during preparation of the norm division table said norm distribution tabulation means determines the norm division based on the tabulation result of the norm distribution so that the number of partial vectors belonging to the norm range corresponding to each norm division becomes as uniform as possible.
  - 19. The vector index preparing apparatus according to claim 15 wherein during preparation of the declination division table, said declination distribution tabulation means determines the declination division based on the tabulation result of the declination distribution so that the number of partial vectors belonging to the declination range corresponding to each declination division becomes as uniform as possible.
  - 20. The vector index preparing apparatus according to claim 15 or 16 wherein said region number calculation means obtains the region number of the partial vector v_b as a number d of the region center vector p_d in which a cosine (v_b·p_d)/(|v_b|*|p_d|) of an angle formed by p_d and v_b is largest among the predetermined D region center vectors p₁ to p_D.
  - 21. The vector index preparing apparatus according to claim 15 or 16 wherein said index constituting means prepares a search tree in which a number (b*Nd*Nc*Nr)+(d*Nc*Nr)+(c*Nr)+r obtained by combining the partial space number b, the region number d, the declination division number c, and the norm division number r can be used as a key to search the identification number i and the component of the vector, and a table in which the vector data identification number is used as an affix and the key of said search tree of each partial vector is recorded, and uses the search tree and the table as a part of the vector index.
  - 22. The vector index preparing apparatus according to claim 15 or 16 wherein said region number calculation means uses the vector obtained by normalizing all vectors (0, . . . , 0, +1) to (−1, . . . , −1) whose component is any one of {−1, 0, +1} and which are not 0 vector as the region center vector.
  - 29. A recording medium in which a computer program for realizing the apparatus of claim 15 or 16 by software is recorded.

16. An apparatus for preparing an index, which is searchable by a computer, with respect to a vector database in which a finite number of ordered lists each including at least N-dimensional real vector and an identification number of the vector are registered as vector data, said index being used for data retrieval using a computer, said apparatus comprising:
- partial vector calculation means for dividing N components into m ordered lists in a predetermined method with respect to the N-dimensional real vector V of each vector data in said vector database, and preparing m partial vectors v₁ to v_m;

  norm distribution tabulation means for tabulating a distribution of a norm of the partial vector v_b (b=1 to m) for a partial space number b among said prepared m partial vectors v₁ to v_m, and preparing a norm partition table which contains a predetermined number of norm ranges;

  region number calculation means for calculating a region number d to which said partial vector v_b belongs in accordance with predetermined D region center vectors p₁ to p_D;

  declination distribution tabulation means for tabulating a distribution of a cosine (v_b·p_d)/(|v_b|*|p_d|) of an angle formed by said partial vector v_b and the region center vector p_d as a declination distribution, and preparing a declination partition table which contains a predetermined number of declination ranges;

  norm partition number calculation means for referring to said norm partition table to calculate a number r of the norm partition to which the norm of said partial vector v_b belongs with respect to the partial vector v_b (b=1 to m) for a partial space b among the m partial vectors v₁ to v_m prepared by said partial vector calculation means;

  declination partition number calculation means for calculating a declination (v_b·p_d)/(|v_b|*|p_d|) as a cosine of an angle formed by said partial vector v_b and the region center vector p_d indicating a center direction of the region of the region number d calculated by said region number calculation means;

  component partition number calculation means for calculating a component partition number w_j of a predetermined range to which v_bj belongs from a maximum value of the norm of the norm partition corresponding to said calculated norm partition number r with respect to each component v_bj of said calculated partial vector v_b;

  index data calculation means for calculating index registration data to be registered in a vector index from said partial space number b, said region number d, said declination partition number c, said norm partition number r, a string of said component partition numbers w_j, and the identification number i; and

  index constituting means for constituting the vector index such that the identification number and the component of each partial vector can be searched using an ordered list of the partial space number b, the region number d, the declination partition number c and a norm partition number range (r₁, r₂) as a key from said norm partition table, said declination partition table, and said index registration data, and such that the vector component of each vector data can be searched with the identification number of the vector component.

23. A similarity vector searching apparatus for designating a query vector Q of an N-dimensional real vector, an inner product lower limit value α, and maximum obtained vector number L as search conditions, searching a vector index prepared from vector data with a finite number of ordered lists of at least N-dimensional real vector and an ID number of the real vector registered therein, and obtaining L ordered lists at maximum (i, V·Q) of an identification number i and an inner product of Q and V with respect to vector data (i, V) of said vector database whose value V·Q of the inner product with said query vector Q is larger than said inner product lower limit value α, said similar vector searching apparatus comprising:
- partial query condition calculation means for dividing N components of Q into m ordered lists in the same predetermined method as a method used in preparing said vector index with respect to said query vector Q, preparing m partial query vectors q₁ to q_m, and calculating a partial inner product lower limit value f_b as a lower limit value of a partial inner product of each partial query vector q_b and the corresponding partial vector from a designated inner product lower limit value α;

  search object range generation means for calculating a partial space number b, and an ordered list (c, (r₁, r₂)) of a declination partition number c to be searched in a region number d and a norm partition range (r₁, r₂) from a value of an inner product p_d·q_b of the region center vector p_d and said partial query vector q_b, said partial inner product lower limit value f_b, and a norm partition table and a declination partition table in said vector index with respect to each partial query vector q_b (b=1 to m) and each region b;

  index search means for searching a range of said vector index using (b, d, c, (r₁, r₂)) as a search condition based on (c, (r₁, r₂)) calculated by said search object range generation means, and obtaining the identification number i and the component of the partial vector v_b satisfying the condition as an index search result;

  inner product difference upper limit calculation means for calculating a partial inner product difference (v_b·q_b)−f_b as a difference between a partial inner product v_b·q_b of said v_b and q_b and said partial inner product lower limit value f_b, and accumulating (adding) the difference as an inner product difference upper limit value S(i) of the identification number i of an inner product difference table; and

  similarity search result determination means for searching said vector index with the identification number i in order from a largest value in said inner product difference table S(i) to obtain a vector data component V, calculating an inner product difference value t=V·Q−α by subtracting α from the inner product V·Q of V and said query vector Q, and outputting an ordered list of at least the identification number i and an inner product t+α as a search result with respect to L pieces at maximum of vector data with a large inner product difference value when L or more pieces of vector data having the inner product difference value larger than a maximum value of an element having a non-calculated inner product difference value are collected, or when the inner products of all the vector data having a positive inner product difference upper limit value are calculated in said inner product difference table.
- View Dependent Claims (25, 26)
- - 25. The similar vector searching apparatus according to claim 23 or 24 wherein said partial query condition calculation means extracts N/m components or (N/m)+1 components in order from a top component of V so that all components of an N-dimensional vector V are extracted, and prepares the partial query vector.
  - 26. The similar vector searching apparatus according to claim 23 wherein the partial inner product lower limit value f_b as the lower limit value of the inner product of said partial query vector q_b, and the corresponding partial vector v_b is calculated from a designated inner product lower limit value α by f_b = α|q_b|²/Σ(|q_b|²).
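
The inner product search of claims 23, 25 and 26 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a brute-force scan over a small dictionary `db` stands in for the index search over (b, d, c, (r₁, r₂)), and only partial vectors with a positive difference v_b·q_b − f_b are accumulated into S(i). The function names `split_components`, `partial_lower_limits` and `search_inner_product` are hypothetical.

```python
import numpy as np


def split_components(vec, m):
    """Split an N-vector into m consecutive parts of N//m or N//m + 1 components."""
    return np.array_split(np.asarray(vec, dtype=float), m)


def partial_lower_limits(q_parts, alpha):
    """f_b = alpha * |q_b|^2 / sum_b |q_b|^2, so the f_b add up to alpha."""
    sq_norms = np.array([float(q @ q) for q in q_parts])
    return alpha * sq_norms / sq_norms.sum()


def search_inner_product(db, query, alpha, L, m):
    """Return up to L pairs (id, V.Q) with V.Q > alpha, largest inner product first."""
    q = np.asarray(query, dtype=float)
    q_parts = split_components(q, m)
    f = partial_lower_limits(q_parts, alpha)

    # Accumulate S(i): sum of positive partial differences (assumed stand-in
    # for what the index search would return). Because dropped terms are <= 0,
    # S(i) is an upper bound on t = V.Q - alpha.
    S = {}
    for vec_id, vec in db.items():
        for b, v_b in enumerate(split_components(vec, m)):
            diff = float(v_b @ q_parts[b]) - f[b]
            if diff > 0:
                S[vec_id] = S.get(vec_id, 0.0) + diff

    # Verify candidates in descending order of S(i); stop once L results
    # beat the largest bound among the candidates not yet verified.
    results = []
    for vec_id in sorted(S, key=S.get, reverse=True):
        confirmed = [r for r in results if r[1] > S[vec_id]]
        if len(confirmed) >= L:
            break
        t = float(np.asarray(db[vec_id], dtype=float) @ q) - alpha
        if t > 0:
            results.append((vec_id, t))

    results.sort(key=lambda r: r[1], reverse=True)
    return [(vec_id, t + alpha) for vec_id, t in results[:L]]
```

Since the f_b sum to α, any vector with V·Q > α must exceed f_b in at least one partial space, which is why every qualifying vector appears in the table S under these assumptions.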

24. A similarity vector searching apparatus for designating a query vector Q of an N-dimensional real vector, a distance upper limit value α, and maximum obtained vector number L as search conditions, searching a vector index prepared from vector data with a finite number of ordered lists of at least N-dimensional real vector and an identification number of the real vector registered therein, and obtaining L ordered lists at maximum (i, p) of an identification number i of an N-dimensional real vector V in said vector data and a distance p between Q and V such that a value of an inner product with said query vector Q is not more than said distance upper limit value α, said similar vector searching apparatus comprising;

  partial query condition calculation means for dividing N components of Q into m ordered lists in the same predetermined method as a method used in preparing said vector index with respect to said query vector Q, preparing m partial query vectors q₁ to q_m, calculating a partial square distance upper limit value f_b as an upper limit value of a partial square distance |v_b−q_b|² (i.e.,) corresponding to square of Euclidean distance of each partial query vector q_b and the corresponding partial vector v_b from a designated distance upper limit value α;

  search object range generation means for systematically generating an ordered list (b, d, c, (r₁, r₂)) of a partial space number b to be searched, a region number d, a declination partition number c and a norm partition range (r₁, r₂) from said partial query vector q_b, said partial square distance upper limit value f_b, and a norm partition table and a declination partition table in said vector index with respect to said partial query vector q_b (b=1 to m);

  index search means for searching a range of said vector index using (b, d, c, (r₁, r₂)) generated by said search object range generation means as a search condition, and obtaining the identification number i and the component of the partial vector v_b satisfying the condition as an index search result;

  square distance difference upper limit calculation means for calculating a partial square distance difference f_b−|v_b−q_b|² as a difference between said partial square distance upper limit value f_b and a partial square distance |v_b−q_b|² of v_b and q_b, and accumulating (adding) the difference as a square distance difference upper limit value S(i) of the identification number i of a square distance difference table; and

  similarity search result determination means for searching said vector index with the identification number i in order from a largest value in said square distance difference table S(i) to obtain a vector data component V, calculating a square distance difference value α²−|V−Q|² by subtracting a square distance |V−Q|² of V and said query vector Q from a squared distance upper limit value α², and outputting an ordered list of at least the identification number i and a distance (α²−t)^(1/2) as a search result with respect to L pieces at maximum of vector data with a large square distance difference value t when L or more pieces of vector data having the square distance difference value larger than a maximum value of an element having a non-calculated square distance difference value are collected, or when the square distance difference values of all the vector data having a positive square distance difference upper limit value are calculated in said square distance difference table.
- View Dependent Claims (27)
- - 27. The similar vector searching apparatus according to claim 24 wherein the partial square distance upper limit value f_b as the upper limit value of the square distance of said partial query vector q_b and the corresponding partial vector v_b is calculated from a designated distance lower/upper limit value α by f_b = α²|q_b|²/Σ(|q_b|²).
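
The distance variant of claims 24 and 27 follows the same pattern with squared distances. The sketch below is illustrative only and rests on the same assumptions as a brute-force stand-in for the index search: only positive partial differences f_b − |v_b − q_b|² are accumulated into S(i), and a vector lying exactly at distance α whose partial differences are all zero would be missed. The function names `partial_sq_distance_limits`, `distance_candidates` and `within_distance` are hypothetical.

```python
import math

import numpy as np


def partial_sq_distance_limits(q_parts, alpha):
    """f_b = alpha^2 * |q_b|^2 / sum_b |q_b|^2, so the f_b add up to alpha^2."""
    sq_norms = np.array([float(q @ q) for q in q_parts])
    return (alpha ** 2) * sq_norms / sq_norms.sum()


def distance_candidates(db, query, alpha, m):
    """Return {id: (S, t)} with bound S(i) > 0 and exact t = alpha^2 - |V - Q|^2."""
    q = np.asarray(query, dtype=float)
    q_parts = np.array_split(q, m)
    f = partial_sq_distance_limits(q_parts, alpha)
    out = {}
    for vec_id, vec in db.items():
        v = np.asarray(vec, dtype=float)
        diffs = []
        for b, v_b in enumerate(np.array_split(v, m)):
            delta = v_b - q_parts[b]
            diffs.append(f[b] - float(delta @ delta))
        # Keeping only positive terms makes S an upper bound on t.
        S = sum(d for d in diffs if d > 0)
        if S > 0:
            t = alpha ** 2 - float((v - q) @ (v - q))
            out[vec_id] = (S, t)
    return out


def within_distance(db, query, alpha, L, m):
    """Return up to L pairs (id, distance) with distance <= alpha, nearest first."""
    cands = distance_candidates(db, query, alpha, m)
    hits = [(vec_id, t) for vec_id, (_, t) in cands.items() if t >= 0]
    hits.sort(key=lambda h: h[1], reverse=True)
    # The distance is recovered from t as (alpha^2 - t)^(1/2).
    return [(vec_id, math.sqrt(max(0.0, alpha ** 2 - t))) for vec_id, t in hits[:L]]
```

A larger t means a smaller distance, so sorting by t in descending order returns the nearest vectors first.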

Current Assignee
Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Original Assignee
Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Inventors
Kanno, Yuji
Primary Examiner(s)
Kindred, Alford
Assistant Examiner(s)
To, Baoquoc N

Application Number
US09/913,960
Publication Number
US 20020178158A1
Time in Patent Office
1,895 Days
Field of Search
707/3, 707/4, 707/5, 707/10
US Class Current
1/1
CPC Class Codes
G06F 16/338   Presentation of query results
Y10S 707/99933   Query processing, i.e. sear...
Y10S 707/99934   Query formulation, input pr...
Y10S 707/99935   Query augmenting and refini...

1 Assignment

0 Petitions

32 Citations

29 Claims
