Data processing system and method

US 20040193582A1
Filed: 01/29/2004
Published: 09/30/2004
Est. Priority Date: 07/30/2001
Status: Active Grant


Abstract

A data processing method and system for retrieving a subset of k items from a database of n items (n≧k) firstly determines a limited set of bk items (b>1) in the database which have the greatest similarity to an input query t according to a given similarity function S. A result subset is then constructed by including as a first member the item having the greatest similarity S to the query t, and iteratively selecting each successive member of the subset as that remaining item of the bk items having the highest quality Q, where Q is a given function of both similarity to the input query t and relative diversity RD with respect to the items already in the results subset. In this way the diversity of the results subset is greatly increased relative to a simple selection of the k most similar items to the query t, with only a modest additional increase in processing requirements.
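
The selection procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `bounded_greedy_selection` and `rel_diversity` are hypothetical, as is the default `b=2`. Using the product of similarity and relative diversity as Q, and measuring relative diversity as the average dissimilarity to the items already selected, are assumptions drawn from the options listed in the dependent claims. The similarity function `sim` is supplied by the caller and is assumed to return values between zero and unity.

```python
import heapq


def rel_diversity(c, selected, sim):
    """Average dissimilarity (1 - similarity) of c to the items already selected."""
    if not selected:
        # Assumption: with nothing selected yet, diversity is maximal, so the
        # first pick is simply the item most similar to the query.
        return 1.0
    return sum(1.0 - sim(c, r) for r in selected) / len(selected)


def bounded_greedy_selection(query, items, sim, k, b=2):
    """Return up to k items that balance similarity to the query with diversity."""
    # Step (a): keep only the b*k items most similar to the query.
    candidates = heapq.nlargest(int(b * k), items, key=lambda c: sim(query, c))
    selected = []
    while candidates and len(selected) < k:
        # Steps (b) and (c): pick the remaining candidate with the highest
        # quality Q, here taken as similarity * relative diversity.
        best = max(
            candidates,
            key=lambda c: sim(query, c) * rel_diversity(c, selected, sim),
        )
        selected.append(best)
        candidates.remove(best)
    return selected
```

Because only the bk most similar items are ever compared against the growing result set, the diversity computation does not touch the remaining n − bk items, which is where the modest processing overhead mentioned in the abstract comes from.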

24 Claims

1. A data processing method for retrieving a subset of k items from a database of n items (n>>k), the method comprising:
  (a) determining the bk items (b>1) in the database of n items which have the greatest similarity to an input query t according to a given similarity function S,
  (b) selecting as the first member of the subset that item of the bk items having the highest similarity S to the query t, and
  (c) iteratively selecting each successive member of the subset as that remaining item of the bk items having the highest quality Q, where Q is a given function of similarity S to the input query t and relative diversity RD, wherein relative diversity RD is a given function of the diversity of that remaining item with respect to the items selected during the previous iteration(s).
  - 2. A data processing method as claimed in claim 1, wherein said input query t and each of the database items is defined in terms of a plurality of parameters, and wherein said similarity function S comprises conducting a comparison between corresponding parameters of the query t and of the item to which the query is being compared to obtain a feature similarity measurement, and summing the feature similarity measurements to arrive at a similarity measurement between the query t and the item to which the query is being compared.
  - 3. A data processing method as claimed in claim 2, wherein said different feature similarity measurements are given different relative weightings.
  - 4. A data processing method as claimed in claim 3, wherein said similarity function S is defined between a query t and an item c, each having n features for comparison, as:
  - 5. A data processing method as claimed in claim 4, wherein said feature similarity measurement sim(t_i, c_i) is defined to return a value ranging from zero to unity.
  - 6. A data processing method as claimed in claim 1, wherein said function of relative diversity RD between the query t and the items (r₁, . . . , r_m) selected in the previous m iterations comprises summing the dissimilarity between t and each item (r₁, . . . , r_m) with dissimilarity being measured as a function of the similarity function S.
  - 7. A data processing method as claimed in claim 6, wherein said similarity function S returns a value of from zero to unity, and wherein said dissimilarity function is defined as the value of the similarity function subtracted from unity.
  - 8. A data processing method as claimed in claim 6, wherein said relative diversity function further comprises a normalisation of the summed dissimilarity measurements by division by m.
  - 9. A data processing method as claimed in claim 8, wherein said relative diversity function RelDiversity is defined as follows between a case c and the previously selected members (r₁, . . . , r_m) of the subset R:
  - 10. A data processing method as claimed in claim 1, wherein said relative diversity function RelDiversity is defined as follows between a case c and the previously selected members (r₁, . . . , r_m) of the subset R:
  - 11. A data processing method as claimed in claim 1, wherein the quality Q of an item c is defined as the product of the similarity of the target t to the item c and the relative diversity of the item c to the items previously selected.
  - 12. A data processing method as claimed in claim 1, wherein the quality Q of an item c is defined as the sum of the similarity of the target t to the item c, adjusted by a first weighting factor, and the relative diversity of the item c to the items previously selected, adjusted by a second weighting factor.
  - 13. A data processing method as claimed in claim 12, wherein the quality Q of the item c is defined as Quality(t,c,R) = α*Similarity(t,c) + (1−α)*RelDiversity(c,R)
  - 14. A data processing method as claimed in claim 1, wherein the quality Q of an item c is defined as a harmonic mean of the similarity of the target t to the item c and the relative diversity of the item c to the items previously selected.
  - 15. A data processing method as claimed in claim 12, wherein the quality Q of the item c is defined as
  - 16. A data processing method as claimed in claim 1, wherein the quality Q of an item c is defined as a weighted harmonic mean of the similarity of the target t to the item c and the relative diversity of the item c to the items previously selected.
  - 22. A computer program comprising instructions which when executed on a data processing system are effective to cause the data processing system to carry out the method of claim 1.
  - 23. A computer program product in machine readable form comprising the computer program of claim 22.
  - 24. An electrical signal encoding the computer program of claim 22.
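
The similarity and quality measures named in claims 2 to 5 and 11 to 16 can be sketched in Python as follows. This is an illustrative sketch only, and all function names are hypothetical. The formulas for claims 4 and 15 are not reproduced in this text, so two forms here are assumptions rather than the claimed definitions: dividing the weighted similarity by the total weight, and the weighted harmonic mean, which uses the standard textbook form. The weighted sum follows the formula given in claim 13. Similarity and relative diversity are assumed to lie between zero and unity.

```python
def weighted_similarity(t, c, weights, feature_sims):
    """Weighted sum of per-feature similarities between query t and item c.

    feature_sims[i] compares feature i and returns a value in [0, 1].
    Dividing by the total weight is an assumption that keeps the result in [0, 1].
    """
    total = sum(w * f(ti, ci) for w, f, ti, ci in zip(weights, feature_sims, t, c))
    return total / sum(weights)


def quality_product(s, rd):
    """Claim 11: product of similarity and relative diversity."""
    return s * rd


def quality_weighted_sum(s, rd, alpha=0.5):
    """Claims 12 and 13: alpha * Similarity + (1 - alpha) * RelDiversity."""
    return alpha * s + (1.0 - alpha) * rd


def quality_harmonic(s, rd):
    """Claim 14: harmonic mean of the two values (zero if either is zero)."""
    if s == 0 or rd == 0:
        return 0.0
    return 2.0 / (1.0 / s + 1.0 / rd)


def quality_weighted_harmonic(s, rd, alpha=0.5):
    """Claim 16: weighted harmonic mean, in an assumed standard form."""
    if s == 0 or rd == 0:
        return 0.0
    return 1.0 / (alpha / s + (1.0 - alpha) / rd)
```

The weighted variants let the balance between similarity and diversity be tuned through `alpha`. With `alpha=0.5` the weighted harmonic mean reduces to the plain harmonic mean.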

17. A data processing system for retrieving a subset of k items from a database of n items (n>>k), the system comprising:
  (a) a first memory area for storing the bk items (b>1) in the database of n items which have the greatest similarity to an input query t according to a given similarity function S,
  (b) a second memory area for storing said subset as it is constructed from said bk items, and
  (c) a processor for (i) selecting as the first member of the subset that item of the bk items having the highest similarity S to the query t, and (ii) iteratively selecting each successive member of the subset as that remaining item of the bk items having the highest quality Q, where Q is a given function of similarity S to the input query t and relative diversity RD, wherein relative diversity RD is a given function of the diversity of that remaining item with respect to the items selected during the previous iteration(s).
  - 18. A data processing system as claimed in claim 17, embodied as a computer running software which includes instructions to allocate said first and second memory areas and instructions to select said first member of said subset and iteratively select said successive members according to rules defining said measures of similarity S, relative diversity RD, and quality Q.
  - 19. A data processing system as claimed in claim 17, further comprising processing means for selecting said bk items from said database.
  - 20. A data processing system as claimed in claim 17 or 18, further comprising a communications link to a retrieval system for selecting said bk items from said database.
  - 21. A data processing system as claimed in claim 17, further comprising said database.

Current Assignee
University College Dublin
Original Assignee
University College Dublin
Inventors
Smyth, Barry Joseph

Granted Patent

US 7,188,101 B2
US Class Current

1/1
CPC Class Codes

G06F 16/2458   Special types of queries, e...

Y10S 707/99932   Access augmentation or opti...

Y10S 707/99933   Query processing, i.e. sear...

Y10S 707/99934   Query formulation, input pr...

Y10S 707/99935   Query augmenting and refini...

Y10S 707/99936   Pattern matching access

Y10S 707/99937   Sorting

Y10S 707/99938   Concurrency, e.g. lock mana...

Y10S 707/99939   Privileged access

Y10S 707/99942   Manipulating data structure...
