Data processing apparatus and method
Abstract
A data processing apparatus for extracting, from a set of data having a vector format stored in a database, a first prescribed number of items of data having a high degree of similarity with a query vector includes a list creation unit and a candidate output unit. The list creation unit creates lists of data in each of which data of the database is sorted in order of decreasing strength of respective one component of a vector. The candidate output unit decides the priority of each list, successively selects, from the lists based upon the list priority and ranking in each list, a second prescribed number of items of data not yet output, and outputs one item of data, from the second prescribed number of items of data selected, based upon degree of similarity between each item of the above-mentioned data and the query data. By virtue of the list creation unit and candidate output unit, the first prescribed number of items of candidate data similar to a query vector are obtained at high speed from the data in the database.
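The list creation described above can be illustrated with a minimal sketch. It assumes that the "strength" of a component is simply its numeric value and that items are identified by their index in the database; the function name `create_sorted_lists` and the sample vectors are hypothetical and do not come from the patent.

```python
def create_sorted_lists(database):
    """Build one list per vector component.

    Each list holds item indices sorted by decreasing value of that
    component (assumption: "strength" means the component's value).
    """
    dims = len(database[0])
    return [
        sorted(range(len(database)), key=lambda i: database[i][d], reverse=True)
        for d in range(dims)
    ]


# Hypothetical three-dimensional data, for illustration only.
database = [
    (0.9, 0.1, 0.3),
    (0.2, 0.8, 0.5),
    (0.4, 0.4, 0.9),
]
lists = create_sorted_lists(database)
for component, ranking in enumerate(lists):
    print(component, ranking)
```

The lists depend only on the database, so under this reading they could be built once and reused for every query.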
13 Claims
1. A data processing apparatus for extracting, from a set of data having a vector format stored in a database, a first prescribed number of items of data having a high degree of similarity with a query vector, comprising:
a database storing a set of data having a vector format;
list creation means for creating lists of data in each of which data of said database is sorted in order of decreasing strength of respective one component of a vector;
list-priority decision means for deciding a priority of each list;
input means for inputting the query vector;
selection means for successively selecting, from the lists based upon the list priority and ranking in each list, a second prescribed number of items of data not yet output;
similarity calculation means for calculating a degree of similarity between the query vector and each of only the second prescribed number of items of data selected by said selection means;
output means for outputting one item of data, from the second prescribed number of items of data successively selected by said selection means, based upon degree of similarity between each of the second prescribed number of items of data and the query vector; and
control means for controlling said selection means, said similarity calculation means, and said output means to repeat operations until the first prescribed number of items of data are output from said output means.
Dependent claims: 2, 3, 4, 5, 6
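The interplay of the elements of claim 1 can be sketched as a single loop. This is one possible reading, not the patented implementation. The claim does not say how list priority is decided or how similarity is measured, so the sketch assumes that lists are prioritized by decreasing query component, that candidates are gathered rank by rank across the lists in priority order, and that similarity is the inner product. All names (`retrieve`, `select_candidates`, `k`, `m`) are hypothetical; `k` stands for the first prescribed number and `m` for the second.

```python
def inner_product(a, b):
    # Assumed similarity measure; the claim leaves it unspecified.
    return sum(x * y for x, y in zip(a, b))


def select_candidates(lists, priority, emitted, m):
    """Pick up to m items not yet output, walking the lists rank by rank
    and, within each rank, in order of list priority."""
    candidates = []
    for rank in range(len(lists[0])):
        for d in priority:
            item = lists[d][rank]
            if item not in emitted and item not in candidates:
                candidates.append(item)
                if len(candidates) == m:
                    return candidates
    return candidates


def retrieve(database, query, k, m):
    dims = len(query)
    # List creation: one index list per component, strongest first.
    lists = [
        sorted(range(len(database)), key=lambda i: database[i][d], reverse=True)
        for d in range(dims)
    ]
    # List priority (assumption): larger query component, higher priority.
    priority = sorted(range(dims), key=lambda d: query[d], reverse=True)
    emitted = []
    # Control: repeat select, score, output until k items are output.
    while len(emitted) < min(k, len(database)):
        candidates = select_candidates(lists, priority, emitted, m)
        # Similarity is computed only for the m selected candidates.
        best = max(candidates, key=lambda i: inner_product(query, database[i]))
        emitted.append(best)
    return emitted


# Hypothetical data, for illustration only.
database = [
    (0.9, 0.1, 0.3),
    (0.2, 0.8, 0.5),
    (0.4, 0.4, 0.9),
    (0.7, 0.2, 0.1),
]
query = (1.0, 0.0, 0.2)
print(retrieve(database, query, k=2, m=2))
print(retrieve(database, query, k=2, m=4))
```

Because only `m` items are scored per round, a small `m` in this sketch can return a different set than an exhaustive ranking would; a larger `m` trades extra similarity calculations for a result closer to the exhaustive one.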
7. A data processing method for extracting, from a set of data having a vector format stored in a database, a first prescribed number of items of data having a high degree of similarity with a query vector, comprising:
a list creation step of creating lists of data in each of which data of said database is sorted in order of decreasing strength of respective one component of a vector;
a list-priority decision step of deciding a priority of each list;
an input step of inputting the query vector;
a selection step of successively selecting, from the lists based upon the list priority and ranking in each list, a second prescribed number of items of data not yet output;
a similarity calculation step of calculating a degree of similarity between the query vector and each of only the second prescribed number of items of data selected in said selection step;
an output step of outputting one item of data, from the second prescribed number of items of data successively selected by said selection step, based upon degree of similarity between each of the second prescribed number of items of data and the query data; and
a control step of controlling said selection step, said similarity calculation step, and said output step to repeat operations until the first prescribed number of items of data are output in said output step.
Dependent claims: 8, 9, 10, 11, 12
13. A storage medium storing a data processing program for causing a computer to extract, from a set of data having a vector format stored in a database, a first prescribed number of items of data having a high degree of similarity with a query vector, said data processing program having:
program code of a list creation step of creating lists of data in each of which data of said database is sorted in order of decreasing strength of respective one component of a vector;
program code of a list-priority decision step of deciding a priority of each list;
program code of an input step of inputting the query vector;
program code of a selection step of successively selecting, from the lists based upon the list priority and ranking in each list, a second prescribed number of items of data not yet output;
program code of a similarity calculation step of calculating a degree of similarity between the query vector and each of only the second prescribed number of items of data selected in the selection step;
program code of an output step of outputting one item of data, from the second prescribed number of items of data successively selected by said selection step, based upon degree of similarity between each of the second prescribed number of items of data and the query data; and
program code of a control step for controlling the selection step, the similarity calculation step, and the output step to repeat operations until the first prescribed number of items of data are output by the output step.
Specification