Data processing apparatus and method

US 6,334,129 B1
Filed: 01/25/1999
Issued: 12/25/2001
Est. Priority Date: 01/30/1998
Status: Expired due to Term

6 Assignments

0 Petitions

Abstract

A data processing apparatus extracts, from a set of vector-format data stored in a database, a first prescribed number of items having a high degree of similarity to a query vector. The apparatus includes a list creation unit and a candidate output unit. The list creation unit creates one list per vector component, in which the data of the database are sorted in decreasing order of the strength of that component. The candidate output unit decides the priority of each list; successively selects, from the lists according to list priority and ranking within each list, a second prescribed number of items not yet output; and outputs one of the selected items based on the degree of similarity between each selected item and the query vector. By virtue of the list creation unit and the candidate output unit, the first prescribed number of candidate items similar to the query vector are obtained from the database at high speed.
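The procedure summarized in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: vector components are assumed non-negative so that a component's "strength" is simply its value, the inner product stands in for the similarity measure, and `top_k_similar`, `k` (the first prescribed number) and `m` (the second prescribed number, at most the vector dimension here) are hypothetical names. Following claim 4, each round takes the highest-ranked item not yet output from each of the `m` highest-priority lists.

```python
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def create_lists(data: Sequence[Vector]) -> List[List[int]]:
    """List creation: one list per component j, holding item indices sorted by
    decreasing value of component j (ties keep database order)."""
    dim = len(data[0])
    return [sorted(range(len(data)), key=lambda i: -data[i][j]) for j in range(dim)]


def list_priority(query: Vector) -> List[int]:
    """List priority: component indices by decreasing query strength (cf. claim 2)."""
    return sorted(range(len(query)), key=lambda j: -query[j])


def inner_product(query: Vector, item: Vector) -> float:
    """Assumed similarity measure (claim 3 names a weighted norm instead)."""
    return sum(q * x for q, x in zip(query, item))


def top_k_similar(data: Sequence[Vector], query: Vector, k: int, m: int) -> List[int]:
    """Hypothetical helper: return up to k item indices, one per round,
    each chosen from at most m candidates."""
    lists = create_lists(data)
    selected_lists = list_priority(query)[:m]  # the m highest-priority lists
    output: List[int] = []
    emitted: Set[int] = set()
    cache: Dict[int, float] = {}  # similarity computed once per item (cf. claim 6)
    while len(output) < k:
        # Selection: the highest-ranked item of each selected list that has not
        # been output and is not already a candidate in this round.
        candidates: List[int] = []
        for j in selected_lists:
            for i in lists[j]:
                if i not in emitted and i not in candidates:
                    candidates.append(i)
                    break
        if not candidates:
            break  # all items have been output
        # Similarity calculation, only for candidates not scored before.
        for i in candidates:
            if i not in cache:
                cache[i] = inner_product(query, data[i])
        # Output: the candidate most similar to the query.
        best = max(candidates, key=lambda i: cache[i])
        output.append(best)
        emitted.add(best)
    return output


if __name__ == "__main__":
    data = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.7, 0.6, 0.0], [0.0, 0.1, 0.9]]
    query = [1.0, 0.5, 0.0]
    print(top_k_similar(data, query, k=2, m=2))  # [0, 2]
```

On the sample data the sketch emits item 0 before item 2 even though item 2 has the larger inner product, because item 2 was not among the first round's candidates. The sketch therefore trades exactness of the ranking for scoring only a few items per round.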

13 Claims

1. A data processing apparatus for extracting, from a set of data having a vector format stored in a database, a first prescribed number of items of data having a high degree of similarity with a query vector, comprising:

  a database storing a set of data having a vector format;

  list creation means for creating lists of data in each of which data of said database is sorted in order of decreasing strength of respective one component of a vector;

  list-priority decision means for deciding a priority of each list;

  input means for inputting the query vector;

  selection means for successively selecting, from the lists based upon the list priority and ranking in each list, a second prescribed number of items of data not yet output;

  similarity calculation means for calculating a degree of similarity between the query vector and each of only the second prescribed number of items of data selected by said selection means;

  output means for outputting one item of data, from the second prescribed number of items of data successively selected by said selection means, based upon degree of similarity between each of the second prescribed number of items of data and the query vector; and

  control means for controlling said selection means, said similarity calculation means, and said output means to repeat operations until the first prescribed number of items of data are output from said output means.

2. The apparatus according to claim 1, wherein said list-priority decision means decides the list priority based upon strengths of components of the query vector.

3. The apparatus according to claim 1, wherein the degree of similarity is a norm of each item of data that is weighted based upon strengths of components of the query vector.

4. The apparatus according to claim 1, wherein said selection means selects the second prescribed number of lists from said lists and selects, from each selected list, most significant data among the data not yet output.

5. The apparatus according to claim 4, wherein said selection means selects the second prescribed number of lists based upon priority of the list to which data output last by said output means belonged.

6. The apparatus according to claim 1, wherein said similarity calculating means calculates a degree of similarity between the query vector and data, among the data of the second prescribed number of items selected by said selection means, of which a degree of similarity has not been calculated.
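Claim 3 defines the degree of similarity as a norm of each item weighted by the strengths of the query's components, but does not fix which norm. One plausible reading, assumed here, is the Euclidean norm of the item after multiplying its component j by the query's component j, so that components the query does not use contribute nothing. `weighted_norm_similarity` is a hypothetical name.

```python
import math
from typing import Sequence


def weighted_norm_similarity(query: Sequence[float], item: Sequence[float]) -> float:
    """Euclidean norm of `item` with component j scaled by query[j]
    (an assumed reading of claim 3, not the patent's stated formula)."""
    if len(query) != len(item):
        raise ValueError("query and item must have the same dimension")
    return math.sqrt(sum((q * x) ** 2 for q, x in zip(query, item)))


if __name__ == "__main__":
    # A component the query does not use (weight 0) cannot raise the score.
    print(weighted_norm_similarity([1.0, 1.0, 0.0], [3.0, 4.0, 100.0]))  # 5.0
    print(weighted_norm_similarity([2.0, 0.0, 0.0], [3.0, 4.0, 100.0]))  # 6.0
```

Because the weights come from the query, the same item scores differently for different queries, which lets the measure emphasize the components the query cares about.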

7. A data processing method for extracting, from a set of data having a vector format stored in a database, a first prescribed number of items of data having a high degree of similarity with a query vector, comprising:

  a list creation step of creating lists of data in each of which data of said database is sorted in order of decreasing strength of respective one component of a vector;

  a list-priority decision step of deciding a priority of each list;

  an input step of inputting the query vector;

  a selection step of successively selecting, from the list based upon the list priority and ranking in each list, a second prescribed number of items of data not yet output;

  a similarity calculation step of calculating a degree of similarity between the query vector and each of only the second prescribed number of items of data selected in said selection step;

  an output step of outputting one item of data, from the second prescribed number of items of data successively selected by said selection step, based upon degree of similarity between each of the second prescribed number of items of data and the query data, and a control step of controlling said selection step, said similarity calculation step, and said output step to repeat operations until the first prescribed number of items of data are output in said output step.

8. The method according to claim 7, wherein said list priority decision step decides the list priority based upon strengths of components of the query vector.

9. The method according to claim 7, wherein the degree of similarity is a norm of each item of data that is weighted based upon strengths of components of the query vector.

10. The method according to claim 7, wherein said selection step selects the second prescribed number of lists from said lists and selects, from each selected list, most significant data among the data not yet output.

11. The method according to claim 10, wherein said selection step selects the second prescribed number of lists based upon priority of the list to which data output last by said output step belonged.

12. The method according to claim 7, wherein said similarity calculation step calculates a degree of similarity between the query vector and data, among the data of the second prescribed number of items selected by said selection step, of which a degree of similarity has not been calculated.
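Claims 10 and 11 narrow the selection step: it picks the second prescribed number of lists, chosen according to the priority of the list that supplied the last output item, and takes the most significant item not yet output from each. The claims do not say how the chosen lists relate to that priority. The sketch below assumes a sliding window: the `m` lists starting at the priority rank of the last output's list, shifted back when the window would run past the lowest-priority list. `select_lists`, `priority` and `last_rank` are hypothetical names.

```python
from typing import List, Sequence


def select_lists(priority: Sequence[int], m: int, last_rank: int = 0) -> List[int]:
    """Choose m list ids as a window over the priority order
    (an assumed reading of claim 11).

    priority:  list ids in decreasing priority, e.g. from the query's component strengths.
    m:         the second prescribed number of lists.
    last_rank: priority rank (0 = highest) of the list the last output item came from;
               0 before anything has been output.
    """
    if not 1 <= m <= len(priority):
        raise ValueError("m must be between 1 and the number of lists")
    if not 0 <= last_rank < len(priority):
        raise ValueError("last_rank must index into priority")
    start = min(last_rank, len(priority) - m)  # keep the window inside the list order
    return list(priority[start:start + m])


if __name__ == "__main__":
    priority = [2, 0, 1, 3]  # list 2 has the highest priority
    print(select_lists(priority, 2))               # [2, 0]
    print(select_lists(priority, 2, last_rank=1))  # [0, 1]
    print(select_lists(priority, 2, last_rank=3))  # [1, 3]
```

In this sketch, anchoring the window at the last output's list moves the search toward the lists that have recently produced output; other readings, such as always keeping the top-priority list in the window, fit the claim wording equally well.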

13. A storage medium storing a data processing program for causing a computer to extract, from a set of data having a vector format stored in a database, a first prescribed number of items of data having a high degree of similarity with a query vector, said data processing program having:

  program code of a list creation step of creating lists of data in each of which data of said database is sorted in order of decreasing strength of respective one component of a vector;

  program code of a list-priority decision step of deciding a priority of each list;

  program code of an input step of inputting the query vector;

  program code of a selection step of successively selecting, from the lists based upon the list priority and ranking in each list, a second prescribed number of items of data not yet output;

  program code of a similarity calculation step of calculating a degree of similarity between the query vector and each of only the second prescribed number of items of data selected in the selection step;

  program code of an output step of outputting one item of data, from the second prescribed number of items of data successively selected by said selection step, based upon degree of similarity between each of the second prescribed number of items of data and the query data, and program code of a control step for controlling the selection step, the similarity calculation step, and the output step to repeat operations until the first prescribed number of items of data are output by the output step.

Current Assignee: Canon Kabushiki Kaisha (Canon Inc.), Takashi Kitagawa
Original Assignee: Canon Kabushiki Kaisha (Canon Inc.), Takashi Kitagawa, Yasushi Kiyoki
Inventors: Kitagawa, Takashi; Washizawa, Teruyoshi; Kiyoki, Yasushi
Primary Examiner: Alam, Hosain T.
Assistant Examiner: Truong, Cam Y T
Application Number: US09/236,221
Time in Patent Office: 1,065 days
Field of Search: 707/5, 707/1, 707/3, 707/2, 704/7
US Class (Current): 707/749
CPC Class Codes:
  G06F 16/90335   Query processing
  Y10S 707/99931   Database or file accessing
  Y10S 707/99932   Access augmentation or opti...
  Y10S 707/99933   Query processing, i.e. sear...
  Y10S 707/99935   Query augmenting and refini...
