Data processing system and method
First Claim
1. A data processing method for retrieving a subset of k items from a database of n items (n≥k), the method comprising:
(a) determining the bk items (b>1) in the database of n items which have the greatest similarity to an input query t according to a given similarity function S,
(b) selecting as the first member of the subset that item of the bk items having the highest similarity S to the query t, and
(c) iteratively selecting each successive member of the subset as that remaining item of the bk items having the highest quality Q, where Q is a given function of similarity S to the input query t and relative diversity RD, wherein relative diversity RD is a given function of the diversity of that remaining item with respect to the items selected during the previous iteration(s).
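Steps (a) to (c) can be sketched as a greedy re-ranking over a bounded candidate pool. The claim leaves S, Q and RD as "given" functions, so the concrete choices below are assumptions made only for illustration: similarity values lie in [0, 1], the diversity of two items is 1 - S, RD is the mean diversity to the items already selected, and Q = alpha * S + (1 - alpha) * RD with a hypothetical weight `alpha`. The names `diverse_subset` and `sim` are likewise hypothetical.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def diverse_subset(
    items: Sequence[T],
    query: T,
    k: int,
    b: float,
    similarity: Callable[[T, T], float],
    alpha: float = 0.5,
) -> List[T]:
    """Hypothetical sketch of steps (a) to (c).

    Assumed, not stated in the claim: similarity in [0, 1],
    diversity(x, y) = 1 - similarity(x, y), RD = mean diversity to the
    selected items, Q = alpha * S + (1 - alpha) * RD.
    """
    if not 0 < k <= len(items) or b <= 1:
        raise ValueError("need 0 < k <= n and b > 1")

    # (a) keep the ceil(b*k) items most similar to the query t
    pool_size = min(len(items), math.ceil(b * k))
    pool = sorted(items, key=lambda x: similarity(query, x), reverse=True)[:pool_size]

    # (b) the item with the highest S is the first member
    selected = [pool[0]]
    remaining = pool[1:]

    def quality(c: T) -> float:
        # Q combines similarity to t with relative diversity to `selected`
        s = similarity(query, c)
        rd = sum(1.0 - similarity(c, r) for r in selected) / len(selected)
        return alpha * s + (1.0 - alpha) * rd

    # (c) repeatedly move the remaining item with the highest Q into the subset
    while len(selected) < k:
        best = max(range(len(remaining)), key=lambda i: quality(remaining[i]))
        selected.append(remaining.pop(best))
    return selected


if __name__ == "__main__":
    def sim(p, q):  # hypothetical similarity: 1 / (1 + Euclidean distance)
        return 1.0 / (1.0 + math.dist(p, q))

    points = [(1.0, 0.0), (1.1, 0.0), (0.0, 1.2), (3.0, 0.0), (5.0, 0.0)]
    print(diverse_subset(points, (0.0, 0.0), k=2, b=2, similarity=sim))
    # prints [(1.0, 0.0), (0.0, 1.2)]
```

With these assumptions, a plain top-2 selection would return the two nearly identical points (1.0, 0.0) and (1.1, 0.0), whereas the sketch replaces the second with (0.0, 1.2), which is almost as similar to the query but far less similar to the first member.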
1 Assignment
0 Petitions
Abstract
A data processing method and system for retrieving a subset of k items from a database of n items (n≧k) firstly determines a limited set of bk items (b>1) in the database which have the greatest similarity to an input query t according to a given similarity function S. A result subset is then constructed by including as a first member the item having the greatest similarity S to the query t, and iteratively selecting each successive member of the subset as that remaining item of the bk items having the highest quality Q, where Q is a given function of both similarity to the input query t and relative diversity RD with respect to the items already in the results subset. In this way the diversity of the results subset is greatly increased relative to a simple selection of the k most similar items to the query t, with only a modest additional increase in processing requirements.
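The abstract's claim of "only a modest additional increase in processing requirements" can be made concrete with a rough count. The patent does not fix Q or RD, so the first line below is only one assumed instantiation (weight alpha, diversity taken as 1 - S, RD as a mean over the current result set R). The count treats one evaluation of diversity between a candidate and one selected item as the unit of work, and assumes step (a) is a linear scan costing n evaluations of S(t, ·). When the (i+1)-th member is chosen, bk - i candidates remain and i items are already selected; recomputing RD from scratch costs i(bk - i) evaluations, while keeping a running diversity sum per candidate costs only bk - i.

```latex
Q(c) = \alpha\, S(t, c) + (1 - \alpha)\, RD(c, R), \qquad
RD(c, R) = \frac{1}{|R|} \sum_{r \in R} \bigl(1 - S(c, r)\bigr), \qquad 0 \le \alpha \le 1

E_{\text{naive}} = \sum_{i=1}^{k-1} i\,(bk - i) = bk\,\frac{k(k-1)}{2} - \frac{(k-1)\,k\,(2k-1)}{6}

E_{\text{incremental}} = \sum_{i=1}^{k-1} (bk - i) = (k-1)\,bk - \frac{k(k-1)}{2}

n = 10^6,\; k = 10,\; b = 2:\quad
E_{\text{naive}} = 20 \cdot 45 - 285 = 615, \qquad
E_{\text{incremental}} = 9 \cdot 20 - 45 = 135
```

Under these assumptions, the diversification phase adds a few hundred pairwise evaluations on top of the 10^6 query-similarity evaluations of step (a), and its cost depends only on k and b, not on n.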
50 Citations
24 Claims
1. (Independent claim 1; full text as reproduced under "First Claim" above.)
Dependent claims: 2 to 16 and 22 to 24.
17. A data processing system for retrieving a subset of k items from a database of n items (n≥k), the system comprising:
(a) a first memory area for storing the bk items (b>1) in the database of n items which have the greatest similarity to an input query t according to a given similarity function S,
(b) a second memory area for storing said subset as it is constructed from said bk items, and
(c) a processor for (i) selecting as the first member of the subset that item of the bk items having the highest similarity S to the query t, and (ii) iteratively selecting each successive member of the subset as that remaining item of the bk items having the highest quality Q, where Q is a given function of similarity S to the input query t and relative diversity RD, wherein relative diversity RD is a given function of the diversity of that remaining item with respect to the items selected during the previous iteration(s).
Dependent claims: 18, 19, 20, 21.
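The system of claim 17 can be pictured as two buffers and a selection routine. In the sketch below, which is an assumption-laden illustration rather than the patented implementation, the hypothetical class `DiverseSubsetSelector` holds `pool` in place of the first memory area and `subset` in place of the second, and `select` plays the role of the processor. It assumes the same Q = alpha * S + (1 - alpha) * RD with RD the mean of 1 - S to the stored members, and keeps a running diversity sum per pooled item so that each new member costs one diversity evaluation per remaining candidate.

```python
import math
from typing import Callable, Generic, List, Sequence, TypeVar

T = TypeVar("T")


class DiverseSubsetSelector(Generic[T]):
    """Hypothetical mapping of claim 17 onto a class.

    Assumed, not claimed: Q = alpha*S + (1-alpha)*RD, with RD the mean of
    (1 - similarity) to the items already stored in `subset`.
    """

    def __init__(self, similarity: Callable[[T, T], float], alpha: float = 0.5):
        self.similarity = similarity
        self.alpha = alpha
        self.pool: List[T] = []    # stands in for the "first memory area"
        self.subset: List[T] = []  # stands in for the "second memory area"

    def load_pool(self, items: Sequence[T], query: T, k: int, b: float) -> None:
        """Store the ceil(b*k) items most similar to the query."""
        if not 0 < k <= len(items) or b <= 1:
            raise ValueError("need 0 < k <= n and b > 1")
        size = min(len(items), math.ceil(b * k))
        self.pool = sorted(items, key=lambda x: self.similarity(query, x), reverse=True)[:size]

    def select(self, query: T, k: int) -> List[T]:
        if not 0 < k <= len(self.pool):
            raise ValueError("call load_pool first with a compatible k")
        s = [self.similarity(query, c) for c in self.pool]  # S computed once per pooled item
        div_sum = [0.0] * len(self.pool)  # running sum of (1 - similarity) to the subset
        remaining = list(range(len(self.pool)))
        self.subset = []
        chosen = max(remaining, key=lambda i: s[i])  # step (i): highest S
        while True:
            remaining.remove(chosen)
            self.subset.append(self.pool[chosen])
            if len(self.subset) == k:
                return self.subset
            # Incremental RD update: compare each remaining item with the new member only.
            new_member = self.pool[chosen]
            for i in remaining:
                div_sum[i] += 1.0 - self.similarity(self.pool[i], new_member)
            m = len(self.subset)
            # step (ii): highest Q among the remaining pooled items
            chosen = max(
                remaining,
                key=lambda i: self.alpha * s[i] + (1.0 - self.alpha) * div_sum[i] / m,
            )


if __name__ == "__main__":
    def sim(p, q):  # hypothetical similarity: 1 / (1 + Euclidean distance)
        return 1.0 / (1.0 + math.dist(p, q))

    selector = DiverseSubsetSelector(sim)
    points = [(1.0, 0.0), (1.1, 0.0), (0.0, 1.2), (3.0, 0.0), (5.0, 0.0)]
    selector.load_pool(points, (0.0, 0.0), k=2, b=2)
    print(selector.select((0.0, 0.0), k=2))  # prints [(1.0, 0.0), (0.0, 1.2)]
```

Keeping the per-candidate running sum trades a small array of floats for avoiding repeated comparisons against the whole subset; because the pool was sorted by S, ties in Q fall back to the more query-similar item.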
Specification