Training a ranking function using propagated document relevance

US 8,001,121 B2
Filed: 02/27/2006
Issued: 08/16/2011
Est. Priority Date: 02/27/2006
Status: Active Grant

Abstract

A method and system for propagating the relevance of labeled documents to a query to unlabeled documents is provided. The propagation system provides training data that includes queries, documents labeled with their relevance to the queries, and unlabeled documents. The propagation system then calculates the similarity between pairs of documents in the training data. The propagation system then propagates the relevance of the labeled documents to similar, but unlabeled, documents. The propagation system may iteratively propagate labels of the documents until the labels converge on a solution. The training data with the propagated relevances can then be used to train a ranking function.
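
The iterative propagation described in the abstract can be sketched as follows. This is a minimal illustration, not the patented method: the update rule (a manifold-ranking style blend of neighbor scores and initial labels), the symmetric normalization, and the names `propagate_relevance`, `alpha`, `tol`, and `max_iter` are all assumptions introduced here.

```python
import math


def propagate_relevance(weights, labels, alpha=0.5, tol=1e-6, max_iter=1000):
    """Spread relevance from labeled to unlabeled documents over a weighted graph.

    weights: symmetric n x n list of lists of non-negative edge weights (zero diagonal).
    labels:  initial relevance per document for one query; 0.0 marks an unlabeled document.
    alpha:   assumed mixing factor between neighbor scores and the initial labels.
    """
    n = len(labels)
    degree = [sum(row) for row in weights]
    # Assumed normalization: S = D^(-1/2) W D^(-1/2).
    norm = [
        [
            weights[i][j] / math.sqrt(degree[i] * degree[j]) if weights[i][j] else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    scores = list(labels)
    for _ in range(max_iter):
        updated = [
            alpha * sum(norm[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * labels[i]
            for i in range(n)
        ]
        change = max(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if change < tol:  # scores have converged
            break
    return scores


# Hypothetical data: a chain of four documents; only document 0 is labeled relevant.
weights = [
    [0.0, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]
labels = [1.0, 0.0, 0.0, 0.0]
scores = propagate_relevance(weights, labels)
```

In this toy chain the propagated score falls off with distance from the labeled document, which is the behavior the abstract describes: unlabeled documents inherit relevance in proportion to their similarity to labeled ones.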

18 Claims

1. A computing device with a processor and memory for training a document ranking component, comprising:
- a training data store that contains training data including representations of documents and, for each query of a plurality of queries, a labeling of some of the documents with relevance of the documents to the query;
  
  a graph component that creates a graph of the documents with the documents represented as nodes being connected by edges representing similarity between documents including a build graph component that builds a graph in which nodes representing similar documents are connected via edges, such that each node has an edge to a number of other nodes that are most similar to it; and
  
  a generate weights component that generates weights for the edges based on similarity of the documents represented by the connected nodes, each document being represented by a feature vector in a feature space, the similarity between two documents being calculated based on a metric derived from the feature vectors representing the two documents; and
  
  a propagate relevance component that propagates relevance of the labeled documents to the unlabeled documents based on similarity between documents as indicated by the weights generated for the edges;
  
  a training component that trains a document ranking component to rank relevance of documents to queries based on the propagated relevance of the documents of the training data; and
  
  a search component that, after the document ranking component is trained, receives a query, identifies documents relating to the query, and ranks the identified documents using the document ranking component that was trained based on the propagated relevance of the documents of the training data;
  
  wherein the components comprise computer-executable instructions stored in memory for execution by the processor.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The computing device of claim 1 wherein the document ranking component implements a classification algorithm selected from a group consisting of a neural network algorithm, an adaptive boosting algorithm, and a support vector machine algorithm.
  - 3. The computing device of claim 1 wherein the document ranking component implements a regression based algorithm.
  - 4. The computing device of claim 1 wherein the propagate relevance component propagates relevance separately for each query and the training component trains the document ranking component using the separately propagated relevances.
  - 5. The computing device of claim 1 wherein the build graph component establishes edges between nodes using a nearest neighbor algorithm.
  - 6. The computing device of claim 1 wherein the propagate relevance component propagates relevance using a manifold ranking based algorithm.
  - 7. The computing device of claim 1 wherein the component that generates weights for the edges calculates similarity according to the following:
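
The formula that claim 7 refers to is not reproduced in this listing, and the sketch below does not attempt to reconstruct it. It illustrates only the build graph step of claims 1 and 5, in which each node gets an edge to the nodes most similar to it. The function name `build_knn_graph`, the parameter `k`, and the use of squared Euclidean distance to pick neighbors are assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_knn_graph(vectors, k):
    """Connect every document to its k nearest neighbors.

    Returns a set of undirected edges (i, j) with i < j, so an edge exists
    if either endpoint lists the other among its k nearest neighbors.
    """
    edges = set()
    for i, vector in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        # Closest documents first; ties keep index order because sort is stable.
        others.sort(key=lambda j: squared_distance(vector, vectors[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Hypothetical feature vectors forming two well-separated pairs.
vectors = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
edges = build_knn_graph(vectors, k=1)
```

With `k=1` each document links only to its closest neighbor, so the two pairs stay disconnected; a larger `k` would let relevance flow between them.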

8. A computer-readable storage medium containing instructions for controlling a computer system to train a document ranking component, by a method comprising:
- providing representations of documents along with a labeling of some of the documents that indicates relevance of a document to a query;
  
  creating a graph with the documents represented as nodes being connected by edges representing similarity between documents represented by the connected nodes, each document being represented by a feature vector in a feature space, the similarity between documents being calculated based on a metric derived from the feature vectors representing the documents;
  
  propagating relevance of the labeled documents to the unlabeled documents based on similarity between documents as indicated by the created graph and based on a manifold ranking based algorithm; and
  
  training a document ranking component to rank relevance of documents to queries based on the propagated relevance of the documents;
  
  wherein after the document ranking component is trained based on the propagated relevance, the document ranking component is adapted to rank documents of search results of a query.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The computer-readable storage medium of claim 8 wherein the document ranking component implements a classification algorithm selected from a group consisting of a Bayes net algorithm, an adaptive boosting algorithm, and a support vector machine algorithm.
  - 10. The computer-readable storage medium of claim 8 wherein the document ranking component implements a regression based ranking algorithm.
  - 11. The computer-readable storage medium of claim 8 wherein the propagating of the relevance propagates relevance separately for each query and the training of the document ranking component trains using the separately propagated relevance.
  - 12. The computer-readable storage medium of claim 8 wherein the creating of a graph includes:
    - building a graph in which nodes representing similar documents are connected via edges; and
      
      generating weights for the edges based on similarity of the documents represented by the connected nodes.
  - 13. The computer-readable storage medium of claim 8 wherein the metric is a Euclidean distance.
  - 14. The computer-readable storage medium of claim 8 wherein the metric is a cosine similarity metric.
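
Claims 12 through 14 describe weighting edges by a similarity derived from the documents' feature vectors, with Euclidean distance and cosine similarity named as metrics. A minimal sketch of both follows. The claims do not say how a Euclidean distance becomes an edge weight; the Gaussian kernel and its `sigma` parameter used here are an assumption, as are the function names.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def edge_weight(a, b, metric="euclidean", sigma=1.0):
    """Turn a metric on feature vectors into an edge weight (larger means more similar)."""
    if metric == "euclidean":
        # Assumed mapping: a Gaussian kernel, so the weight shrinks as distance grows.
        d = euclidean_distance(a, b)
        return math.exp(-(d * d) / (2.0 * sigma * sigma))
    if metric == "cosine":
        return cosine_similarity(a, b)
    raise ValueError("unknown metric: " + metric)


near = edge_weight([0.0, 0.0], [1.0, 0.0])
far = edge_weight([0.0, 0.0], [3.0, 0.0])
```

Either choice yields a weight that rises with similarity, which is what the propagation step needs from the graph.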

15. A computing device with a processor and memory for training a document ranking component, comprising:
- a component that provides representations of documents along with a labeling of some of the documents, the labeling for a document indicating relevance of that document to a query;
  
  a component that determines the similarity between pairs of documents represented based on analysis of content of each pair of documents;
  
  a component that propagates the relevance of labels of the labeled documents to the unlabeled documents based on the determined similarity between documents such that the propagation of the relevance of a label document to an unlabeled document increases with increasing similarity between the labeled document and the unlabeled document; and
  
  a component that generates a document ranking component to rank relevance of documents to a query based on the propagated relevance of the documents so that, after the document ranking component is generated, the document ranking component is adapted to rank documents of search results of a query.
- Dependent claims (16, 17, 18):
  - 16. The computing device of claim 15 wherein the document ranking component implements a regression based ranking algorithm.
  - 17. The computing device of claim 15 wherein the component that propagates relevance propagates relevance based on a manifold ranking based algorithm.
  - 18. The computing device of claim 15 wherein the labeled documents represent documents that are search results of the query provided based on searching a first corpus of documents and the unlabeled documents represent documents of a second corpus of documents and including a component that adds to the search results documents of the second corpus based on their propagated relevance to the query.
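
Once relevance has been propagated, the claims use it as training data for a ranking component, and claims 3, 10, and 16 mention a regression based algorithm. The sketch below stands in for that step with ordinary linear least squares fitted by gradient descent, which is an assumption chosen for brevity rather than the algorithm the patent specifies. The names `train_linear_ranker` and `rank_documents`, the learning rate, the epoch count, and the data are all hypothetical.

```python
def train_linear_ranker(features, relevance, lr=0.1, epochs=2000):
    """Fit score(x) = w . x + b to propagated relevance by batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, relevance):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for d in range(dim):
                grad_w[d] += err * x[d]
            grad_b += err
        # Step against the mean squared-error gradient.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_documents(model, candidates):
    """Return candidate indices ordered from most to least relevant."""
    w, b = model

    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    return sorted(range(len(candidates)), key=lambda i: score(candidates[i]), reverse=True)


# Hypothetical training set: feature vectors with relevance obtained after propagation.
features = [[1.0, 0.0], [0.8, 0.2], [0.2, 0.8], [0.0, 1.0]]
relevance = [1.0, 0.7, 0.2, 0.0]
model = train_linear_ranker(features, relevance)

# Hypothetical search results to be ranked for a new query.
candidates = [[0.1, 0.9], [0.9, 0.1], [0.5, 0.5]]
order = rank_documents(model, candidates)
```

Because previously unlabeled documents now carry propagated relevance, they contribute to the fit alongside the hand-labeled ones, which is the point of propagating before training.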

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Wang, Jue; Ma, Wei-Ying; Li, Mingjing; Li, Zhiwei
Primary Examiner(s): Bashore; William L
Assistant Examiner(s): Distefano; Gregory A
Application Number: US11/364,576
Publication Number: US 20070203908A1
Time in Patent Office: 1,996 Days
Field of Search: 707/7, 707/706, 707/713, 707/736, 707/737
US Class Current: 707/736
CPC Class Codes:
G06F 16/3331 Query processing
G06F 16/951 Indexing; Web crawling tech...
