Retrieval systems and methods employing probabilistic cross-media relevance feedback

US 8,538,896 B2
Filed: 08/31/2010
Issued: 09/17/2013
Est. Priority Date: 08/31/2010
Status: Active Grant

Abstract

In a retrieval application, a document relevance scoring function comprises a weighted combination of scoring components including at least one of a pseudo-relevance scoring component and a cross-media relevance scoring component. Weights of the document relevance scoring function are optimized to generate a trained document relevance scoring function. The optimizing is respective to a set of training documents including at least some multimedia training documents and a set of training queries and corresponding training document relevance annotations. A retrieval operation is performed for an input query respective to a database using the trained document relevance scoring function to retrieve one or more documents from the database.
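
The scoring function in the abstract can be pictured as a weighted sum of per-component scores followed by a ranking step. The following is a minimal sketch, assuming a linear combination (as recited in claim 2) and hypothetical component names (`direct_text`, `pseudo_text`, `cross_text_to_image`); the names, weights, and score values are illustrative and do not come from the patent.

```python
from typing import Dict, List, Tuple

# Hypothetical component scores s_i(q, d) for one query and three documents.
COMPONENT_SCORES: Dict[str, Dict[str, float]] = {
    "doc_a": {"direct_text": 0.9, "pseudo_text": 0.4, "cross_text_to_image": 0.1},
    "doc_b": {"direct_text": 0.2, "pseudo_text": 0.7, "cross_text_to_image": 0.8},
    "doc_c": {"direct_text": 0.1, "pseudo_text": 0.1, "cross_text_to_image": 0.2},
}

# Hypothetical trained weights, one per scoring component.
WEIGHTS: Dict[str, float] = {
    "direct_text": 1.0,
    "pseudo_text": 0.5,
    "cross_text_to_image": 2.0,
}


def relevance(weights: Dict[str, float], components: Dict[str, float]) -> float:
    """f(q, d) as a weighted linear combination of scoring components."""
    return sum(weights[name] * components[name] for name in weights)


def retrieve(
    weights: Dict[str, float],
    scores_by_doc: Dict[str, Dict[str, float]],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Score every document, then return the top_n by descending relevance."""
    scored = [(doc_id, relevance(weights, comps)) for doc_id, comps in scores_by_doc.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


top = retrieve(WEIGHTS, COMPONENT_SCORES, top_n=2)
```

With these made-up numbers the cross-media component carries the largest weight, so `doc_b` outranks `doc_a` even though `doc_a` has the stronger direct text match.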

22 Claims

1. A non-transitory storage medium storing instructions executable by a digital processor to perform a method comprising:
- optimizing weights of a document relevance scoring function to generate a trained document relevance scoring function ƒ(q,d) where q denotes a query and d denotes a document, wherein the document relevance scoring function comprises a weighted combination of scoring components including at least one pseudo-relevance scoring component and at least one cross-media relevance scoring component, and the optimizing is respective to a set of training documents including at least some multimedia training documents and a set of training queries and corresponding training document relevance annotations, the optimizing comprising optimizing a distribution-matching objective function respective to matching between:
  
  a distribution
- Dependent claims (2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14):
  - 2. The non-transitory storage medium as set forth in claim 1, wherein the document relevance scoring function ƒ(q,d) comprises a weighted linear combination of scoring components including at least one pseudo-relevance scoring component, at least one cross-media relevance scoring component, and at least one direct relevance scoring component.
  - 3. The non-transitory storage medium as set forth in claim 1, wherein:
    - the at least one pseudo-relevance scoring component of the document relevance scoring function ƒ(q,d) includes at least one of a pseudo-relevance textual scoring component and a pseudo-relevance image scoring component; and
      
      the at least one cross-media relevance scoring component of the document relevance scoring function ƒ(q,d) includes at least one of a cross-media relevance scoring component having a text query modality and an image feedback modality and a cross-media relevance scoring component having an image query modality and a text feedback modality.
  - 4. The non-transitory storage medium as set forth in claim 1, wherein the optimizing comprises:
    - training a classifier employing the document relevance scoring function ƒ(q,d) to predict document relevance for an input query, the training using the set of training documents and the set of training queries and corresponding training document relevance annotations.
  - 5. The non-transitory storage medium as set forth in claim 1, wherein the optimizing comprises:
    - training a classifier employing the document relevance scoring function ƒ(q,d) to predict a set of most relevant documents of the set of training documents for an input query, the training using the set of training documents and the set of training queries and corresponding training document relevance annotations.
  - 6. The non-transitory storage medium as set forth in claim 1, wherein the optimizing comprises:
    - for each training query of the set of training queries, computing training document relevance values for the training documents using the document relevance scoring function ƒ(q,d); and
      
      scaling the computed training document relevance values using training query-dependent scaling factors.
  - 8. The non-transitory storage medium as set forth in claim 6, wherein the training query-dependent scaling factors include a linear scaling factor α_q for each training query.
  - 9. The non-transitory storage medium as set forth in claim 8, wherein the training query-dependent scaling factors further include an offset scaling factor β_q for each training query.
  - 10. The non-transitory storage medium as set forth in claim 6, wherein the trained document relevance scoring function does not include the training query-dependent scaling factors.
  - 11. The non-transitory storage medium as set forth in claim 6, wherein each of the at least one pseudo-relevance scoring component and at least one cross-media relevance scoring component comprises a weighted linear combination of an ordered list of top k most similar feedback scoring sub-components weighted by a feedback ordinal position weighting, and the optimizing further comprises:
    - optimizing at least one parameter controlling the feedback ordinal position weighting.
  - 12. The non-transitory storage medium as set forth in claim 1, wherein each of the at least one pseudo-relevance scoring component and at least one cross-media relevance scoring component comprises a weighted linear combination of an ordered list of top k most similar feedback scoring sub-components weighted by a feedback ordinal position weighting, and the optimizing further comprises:
    - optimizing at least one parameter controlling the feedback ordinal position weighting.
  - 13. The non-transitory storage medium as set forth in claim 1, wherein the document relevance scoring function ƒ(q,d) comprises a weighted linear combination of scoring components including at least one pseudo-relevance scoring component and at least one cross-media relevance scoring component.
  - 14. The non-transitory storage medium as set forth in claim 1, wherein the database includes annotated images, the retrieval operation is performed for an input query image and the method further comprises:
    - constructing an annotation for the input query image based on annotations of annotated images retrieved from the database by the retrieval operation.
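
Claims 3, 11, and 12 describe feedback-based components: the query is first matched against documents in one modality, and the top k matches then act as feedback whose similarity to each candidate document is combined with weights that depend on ordinal position. When the query and feedback modalities are the same, the component is a pseudo-relevance component; when they differ (text query with image feedback, or the reverse), it is a cross-media component. The sketch below assumes exponentially decaying position weights controlled by one `decay` parameter. That parameterization is hypothetical (the claims only require at least one parameter controlling the weighting), and all similarity values are made up.

```python
import math
from typing import List, Sequence


def position_weights(k: int, decay: float) -> List[float]:
    """Normalized weights for feedback ranks 0..k-1 (assumed exponential decay)."""
    raw = [math.exp(-decay * rank) for rank in range(k)]
    total = sum(raw)
    return [r / total for r in raw]


def feedback_component(
    query_sim: Sequence[float],
    doc_sim: Sequence[Sequence[float]],
    k: int,
    decay: float,
) -> List[float]:
    """Score every document through the top-k feedback documents.

    query_sim[j]  : similarity of the query to document j in the query modality.
    doc_sim[j][d] : similarity of document j to document d in the feedback modality.
    """
    # Rank documents by similarity to the query; the k best act as feedback.
    order = sorted(range(len(query_sim)), key=lambda j: query_sim[j], reverse=True)
    top_k = order[:k]
    gamma = position_weights(len(top_k), decay)
    n_docs = len(doc_sim[0])
    return [
        sum(g * doc_sim[j][d] for g, j in zip(gamma, top_k))
        for d in range(n_docs)
    ]


# Text query modality, image feedback modality: a cross-media component.
TEXT_QUERY_SIM = [0.9, 0.1, 0.8, 0.0]
IMAGE_SIM = [
    [1.0, 0.2, 0.3, 0.6],
    [0.2, 1.0, 0.1, 0.4],
    [0.3, 0.1, 1.0, 0.5],
    [0.6, 0.4, 0.5, 1.0],
]
cross_scores = feedback_component(TEXT_QUERY_SIM, IMAGE_SIM, k=2, decay=1.0)
```

Document 3 has no text match with the query here, yet it receives a nonzero cross-media score because it is visually similar to the two feedback documents. Setting `decay` to zero weights all feedback positions equally.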

7. A method comprising:
- optimizing weights of a document relevance scoring function to generate a trained document relevance scoring function ƒ(q,d) where q denotes a query and d denotes a document, wherein the document relevance scoring function comprises a weighted combination of scoring components including at least one pseudo-relevance scoring component and at least one cross-media relevance scoring component, and the optimizing is respective to a set of training documents including at least some multimedia training documents and a set of training queries and corresponding training document relevance annotations, the optimizing including optimizing a distribution-matching objective function respective to matching between:
  
  a distribution p_q(d) of document relevance computed using the document relevance scoring function ƒ(q,d) for the set of training queries and the set of training documents D, and
  
  a distribution p_q*(d) of the training document relevance annotations corresponding to the set of training queries wherein p_q*(d) is uniform over a set of documents R_q that are relevant to training query q and zero for all other training documents, the optimizing further including:
  
  for each training query of the set of training queries, computing training document relevance values for the training documents using the document relevance scoring function ƒ(q,d); and
  
  scaling the computed training document relevance values using training query-dependent scaling factors, wherein the optimizing also optimizes the training query-dependent scaling factors; and
  
  performing a retrieval operation for an input query respective to a database using the trained document relevance scoring function to retrieve one or more documents from the database;
  
  wherein the optimizing and the performing are performed by a digital processor.
- Dependent claims (19, 20, 21, 22):
  - 19. The method as set forth in claim 7, wherein the training query-dependent scaling factors include a linear scaling factor α_q for each training query.
  - 20. The method as set forth in claim 19, wherein the training query-dependent scaling factors further include an offset scaling factor β_q for each training query.
  - 21. The method as set forth in claim 7, wherein the trained document relevance scoring function does not include the training query-dependent scaling factors.
  - 22. The method as set forth in claim 7, wherein each of the at least one pseudo-relevance scoring component and at least one cross-media relevance scoring component comprises a weighted linear combination of an ordered list of top k most similar feedback scoring sub-components weighted by a feedback ordinal position weighting, and the optimizing further comprises:
    - optimizing at least one parameter controlling the feedback ordinal position weighting.
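
The distribution-matching objective of claim 7 compares a model distribution p_q(d) with a target p_q*(d) that is uniform over the relevant set R_q and zero elsewhere. The sketch below makes two assumptions that the claim text does not state: p_q(d) is obtained by a softmax over scores scaled by a per-query factor `alpha_q`, and the mismatch is measured with the Kullback-Leibler divergence. The scores and the relevant set are made up.

```python
import math
from typing import List, Set


def model_distribution(scores: List[float], alpha_q: float) -> List[float]:
    """p_q(d): softmax over query-scaled relevance scores (assumed form)."""
    scaled = [alpha_q * s for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def target_distribution(n_docs: int, relevant: Set[int]) -> List[float]:
    """p_q*(d): uniform over the relevant set R_q, zero for all other documents."""
    return [1.0 / len(relevant) if d in relevant else 0.0 for d in range(n_docs)]


def kl_divergence(target: List[float], model: List[float]) -> float:
    """KL(target || model); terms with zero target mass contribute nothing."""
    return sum(t * math.log(t / p) for t, p in zip(target, model) if t > 0.0)


SCORES = [2.0, 0.5, 1.5, -1.0]  # hypothetical f(q, d) values for four documents
RELEVANT = {0, 2}               # hypothetical R_q

p_star = target_distribution(len(SCORES), RELEVANT)
loss_low_alpha = kl_divergence(p_star, model_distribution(SCORES, alpha_q=0.1))
loss_high_alpha = kl_divergence(p_star, model_distribution(SCORES, alpha_q=2.0))
```

In this example the larger scaling factor concentrates probability on the two relevant documents, so `loss_high_alpha` is smaller than `loss_low_alpha`. Under the softmax assumption, a positive `alpha_q` rescales scores without reordering them, which is one way to see why the ranking produced at retrieval time does not need the scaling factors (claim 21).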

15. An apparatus comprising:
- a digital processor configured to train a document relevance scoring function to generate a trained document relevance scoring function ƒ(q,d) where q denotes a query and d denotes a document, wherein the document relevance scoring function comprises a weighted linear combination of scoring components including at least one pseudo-relevance scoring component and at least one cross-media relevance scoring component, the training adjusts weights of the weighted linear combination of scoring components, and the training is respective to a set of training documents including at least some multimedia training documents and a set of training queries and corresponding training document relevance annotations, wherein the digital processor is configured to train the document relevance scoring function by a process including:
  
  for each training query of the set of training queries, computing training document relevance values for the training documents using the document relevance scoring function ƒ(q,d);
  
  scaling the computed training document relevance values using training query-dependent scaling factors; and
  
  adjusting (i) weights of the weighted linear combination of scoring components and (ii) the training query-dependent scaling factors to optimize a distribution-matching objective function measuring an aggregate similarity between the computed training document relevance values and the corresponding training document relevance annotations, wherein the distribution-matching objective function is respective to matching between (1) a distribution p_q(d) of document relevance computed using the document relevance scoring function ƒ(q,d) for the set of training queries and the set of training documents D, and (2) a distribution p_q*(d) of the training document relevance annotations corresponding to the set of training queries wherein p_q*(d) is uniform over a set of documents R_q that are relevant to training query q and zero for all other training documents.
- Dependent claims (16, 17, 18):
  - 16. The apparatus as set forth in claim 15, wherein:
    - the at least one pseudo-relevance scoring component of the document relevance scoring function includes at least one of a pseudo-relevance textual scoring component and a pseudo-relevance image scoring component;
      
      the at least one cross-media relevance scoring component of the document relevance scoring function includes at least one of a cross-media relevance scoring component having a text query modality and an image feedback modality and a cross-media relevance scoring component having an image query modality and a text feedback modality; and
      
      the linear combination of scoring components further includes at least one of a direct relevance text scoring component and a direct relevance image scoring component.
  - 17. The apparatus as set forth in claim 15, wherein the training query-dependent scaling factors include a linear scaling factor for each training query of the set of training queries.
  - 18. The apparatus as set forth in claim 15, wherein the trained document relevance scoring function does not include the training query-dependent scaling factors, and the digital processor is further configured to perform a retrieval operation using the trained document relevance scoring function.
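
The training process of claim 15 adjusts the component weights and the per-query scaling factors together. One way to sketch it is plain gradient descent on a summed KL divergence, assuming a softmax model distribution with one linear scaling factor per training query. The optimizer, learning rate, step count, and data below are all assumptions chosen for illustration and are not taken from the patent.

```python
import math
from typing import List, Set, Tuple

# One training query: (component scores per document, indices of relevant documents).
Query = Tuple[List[List[float]], Set[int]]


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grads(
    weights: List[float], alphas: List[float], queries: List[Query]
) -> Tuple[float, List[float], List[float]]:
    """Summed KL(p_q* || p_q) and its gradients w.r.t. weights and scaling factors."""
    loss = 0.0
    grad_w = [0.0] * len(weights)
    grad_a = [0.0] * len(alphas)
    for qi, (components, relevant) in enumerate(queries):
        f = [sum(w * s for w, s in zip(weights, doc)) for doc in components]
        p = softmax([alphas[qi] * value for value in f])
        for d, doc in enumerate(components):
            target = 1.0 / len(relevant) if d in relevant else 0.0
            if target > 0.0:
                loss += target * math.log(target / p[d])
            residual = p[d] - target  # derivative of the loss w.r.t. logit d
            grad_a[qi] += residual * f[d]
            for i, s in enumerate(doc):
                grad_w[i] += residual * alphas[qi] * s
    return loss, grad_w, grad_a


def train(
    queries: List[Query], n_components: int, steps: int = 200, lr: float = 0.1
) -> Tuple[List[float], List[float]]:
    """Jointly adjust component weights and per-query scaling factors."""
    weights = [0.0] * n_components
    alphas = [1.0] * len(queries)
    for _ in range(steps):
        _, grad_w, grad_a = loss_and_grads(weights, alphas, queries)
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        alphas = [a - lr * g for a, g in zip(alphas, grad_a)]
    return weights, alphas


# Two hypothetical training queries, two scoring components, three documents each.
QUERIES: List[Query] = [
    ([[0.9, 0.8], [0.2, 0.1], [0.4, 0.7]], {0}),
    ([[0.1, 0.3], [0.6, 0.9], [0.5, 0.2]], {1}),
]
initial_loss, _, _ = loss_and_grads([0.0, 0.0], [1.0, 1.0], QUERIES)
weights, alphas = train(QUERIES, n_components=2)
final_loss, _, _ = loss_and_grads(weights, alphas, QUERIES)
```

Only `weights` would be kept for retrieval in this sketch. The `alphas` exist to absorb per-query differences in score scale during training and are discarded afterward, which mirrors the statement in claim 18 that the trained function does not include the scaling factors.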

Current Assignee
Xerox Corporation (Xerox Holdings Corp.)
Original Assignee
Xerox Corporation (Xerox Holdings Corp.)
Inventors
Csurka, Gabriela; Verbeek, Jakob; Mensink, Thomas
Primary Examiner(s)
Chaki, Kakali
Assistant Examiner(s)
Seck, Ababacar

Application Number
US12/872,105
Publication Number
US 20120054130A1
Time in Patent Office
1,113 Days
Field of Search
706/12
US Class Current
706/12
CPC Class Codes
G06F 16/38   Retrieval characterised by ...
G06F 16/90   Details of database functio...
G06F 17/18   for evaluating statistical ...
G06F 18/2178   based on feedback of a supe...
