Training a ranking function using propagated document relevance
First Claim
1. A computing device with a processor and memory for training a document ranking component, comprising:
a training data store that contains training data including representations of documents and, for each query of a plurality of queries, a labeling of some of the documents with relevance of the documents to the query;
a graph component that creates a graph of the documents with the documents represented as nodes being connected by edges representing similarity between documents including a build graph component that builds a graph in which nodes representing similar documents are connected via edges, such that each node has an edge to a number of other nodes that are most similar to it; and
a generate weights component that generates weights for the edges based on similarity of the documents represented by the connected nodes, each document being represented by a feature vector in a feature space, the similarity between two documents being calculated based on a metric derived from the feature vectors representing the two documents; and
a propagate relevance component that propagates relevance of the labeled documents to the unlabeled documents based on similarity between documents as indicated by the weights generated for the edges;
a training component that trains a document ranking component to rank relevance of documents to queries based on the propagated relevance of the documents of the training data; and
a search component that, after the document ranking component is trained, receives a query, identifies documents relating to the query, and ranks the identified documents using the document ranking component that was trained based on the propagated relevance of the documents of the training data wherein the components comprise computer-executable instructions stored in memory for execution by the processor.
Abstract
A method and system for propagating the relevance of labeled documents to a query to unlabeled documents is provided. The propagation system provides training data that includes queries, documents labeled with their relevance to the queries, and unlabeled documents. The propagation system then calculates the similarity between pairs of documents in the training data. The propagation system then propagates the relevance of the labeled documents to similar, but unlabeled, documents. The propagation system may iteratively propagate labels of the documents until the labels converge on a solution. The training data with the propagated relevances can then be used to train a ranking function.
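The iterative propagation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the update rule, so this sketch assumes a simple damped averaging scheme over a row-normalized similarity matrix; the function name `propagate_relevance` and the parameters `alpha`, `tol`, and `max_iter` are hypothetical and are not taken from the patent.

```python
from typing import List, Optional


def propagate_relevance(
    weights: List[List[float]],
    labels: List[Optional[float]],
    alpha: float = 0.5,
    tol: float = 1e-9,
    max_iter: int = 1000,
) -> List[float]:
    """Spread relevance from labeled to unlabeled documents.

    weights[i][j] is the similarity between documents i and j.
    labels[i] is the known relevance of document i, or None if unlabeled.
    alpha (assumed, 0 < alpha < 1) balances neighbor influence against
    the initial labels.
    """
    n = len(weights)
    # Initial scores: known relevance for labeled documents, 0 otherwise.
    initial = [0.0 if value is None else float(value) for value in labels]

    # Row-normalize the similarities (an assumption of this sketch) so each
    # document takes a weighted average of its neighbors' scores.
    normalized = []
    for row in weights:
        total = sum(row)
        normalized.append([w / total if total > 0 else 0.0 for w in row])

    scores = list(initial)
    for _ in range(max_iter):
        updated = [
            alpha * sum(normalized[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * initial[i]
            for i in range(n)
        ]
        # Stop once the scores no longer change, i.e. they have converged.
        if max(abs(a - b) for a, b in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores
```

For a three-document chain in which only the first document is labeled, for example `propagate_relevance([[0, 1, 0], [1, 0, 1], [0, 1, 0]], [1.0, None, None])`, the scores decrease with graph distance from the labeled document. The resulting scores could then serve as training targets for a ranking function.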
18 Claims
1. A computing device with a processor and memory for training a document ranking component, comprising:
a training data store that contains training data including representations of documents and, for each query of a plurality of queries, a labeling of some of the documents with relevance of the documents to the query;
a graph component that creates a graph of the documents with the documents represented as nodes being connected by edges representing similarity between documents including a build graph component that builds a graph in which nodes representing similar documents are connected via edges, such that each node has an edge to a number of other nodes that are most similar to it; and
a generate weights component that generates weights for the edges based on similarity of the documents represented by the connected nodes, each document being represented by a feature vector in a feature space, the similarity between two documents being calculated based on a metric derived from the feature vectors representing the two documents; and
a propagate relevance component that propagates relevance of the labeled documents to the unlabeled documents based on similarity between documents as indicated by the weights generated for the edges;
a training component that trains a document ranking component to rank relevance of documents to queries based on the propagated relevance of the documents of the training data; and
a search component that, after the document ranking component is trained, receives a query, identifies documents relating to the query, and ranks the identified documents using the document ranking component that was trained based on the propagated relevance of the documents of the training data wherein the components comprise computer-executable instructions stored in memory for execution by the processor.
Dependent claims: 2, 3, 4, 5, 6, 7
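The build graph and generate weights components of claim 1 can be illustrated with a short Python sketch. The claim only requires a metric derived from the feature vectors, so the choices here are assumptions: Euclidean distance, a Gaussian kernel for the edge weight, and the hypothetical names `build_knn_graph`, `k`, and `sigma`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def build_knn_graph(
    features: List[Sequence[float]],
    k: int,
    sigma: float = 1.0,
) -> Dict[Tuple[int, int], float]:
    """Connect each document to its k most similar documents.

    features[i] is the feature vector of document i.
    Returns undirected edges keyed by (smaller index, larger index),
    each mapped to a similarity weight in (0, 1].
    """
    edges: Dict[Tuple[int, int], float] = {}
    for i, vector in enumerate(features):
        # Rank all other documents by Euclidean distance (assumed metric).
        neighbors = sorted(
            (math.dist(vector, other), j)
            for j, other in enumerate(features)
            if j != i
        )
        for distance, j in neighbors[:k]:
            # Gaussian kernel (assumed): closer documents get larger weights.
            weight = math.exp(-(distance ** 2) / (2 * sigma ** 2))
            edges[(min(i, j), max(i, j))] = weight
    return edges
```

With `k=1` and two well-separated pairs of points, such as `[[0, 0], [0, 1], [5, 5], [5, 6]]`, the sketch yields one edge inside each pair and none between the pairs. Limiting each node to its nearest neighbors keeps the graph sparse, which matters when the training data contains many documents.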
8. A computer-readable storage medium containing instructions for controlling a computer system to train a document ranking component, by a method comprising:
providing representations of documents along with a labeling of some of the documents that indicates relevance of a document to a query;
creating a graph with the documents represented as nodes being connected by edges representing similarity between documents represented by the connected nodes, each document being represented by a feature vector in a feature space, the similarity between documents being calculated based on a metric derived from the feature vectors representing the documents;
propagating relevance of the labeled documents to the unlabeled documents based on similarity between documents as indicated by the created graph and based on a manifold ranking based algorithm; and
training a document ranking component to rank relevance of documents to queries based on the propagated relevance of the documents;
wherein after the document ranking component is trained based on the propagated relevance, the document ranking component is adapted to rank documents of search results of a query.
Dependent claims: 9, 10, 11, 12, 13, 14
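Claim 8 names a manifold ranking based algorithm without giving its equations in this section. For orientation, the form commonly cited in the semi-supervised learning literature is sketched below. The notation is introduced here and is not the patent's: $W$ is the matrix of edge weights, $D$ is the diagonal degree matrix, $y$ holds the initial relevance labels (zero for unlabeled documents), $f$ holds the propagated relevance, and $\alpha$ is a damping parameter. Whether the patent uses exactly this normalization is an assumption.

```latex
\begin{align}
S &= D^{-1/2} W D^{-1/2}, \qquad D_{ii} = \sum_{j} W_{ij} \\
f^{(t+1)} &= \alpha S f^{(t)} + (1 - \alpha)\, y, \qquad 0 < \alpha < 1 \\
f^{*} &= \lim_{t \to \infty} f^{(t)} = (1 - \alpha)\,(I - \alpha S)^{-1} y
\end{align}
```

The second line is the iterative update and the third is the closed-form limit it converges to, so the propagated relevance can be obtained either by iterating until the scores stabilize or by solving one linear system.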
15. A computing device with a processor and memory for training a document ranking component, comprising:
a component that provides representations of documents along with a labeling of some of the documents, the labeling for a document indicating relevance of that document to a query;
a component that determines the similarity between pairs of documents represented based on analysis of content of each pair of documents;
a component that propagates the relevance of labels of the labeled documents to the unlabeled documents based on the determined similarity between documents such that the propagation of the relevance of a labeled document to an unlabeled document increases with increasing similarity between the labeled document and the unlabeled document; and
a component that generates a document ranking component to rank relevance of documents to a query based on the propagated relevance of the documents so that, after the document ranking component is generated, the document ranking component is adapted to rank documents of search results of a query.
Dependent claims: 16, 17, 18
Specification