Propagating relevance from labeled documents to unlabeled documents
First Claim
1. A computing device with a processor and memory for propagating relevance of labeled documents to unlabeled documents, comprising:
- a document store that contains representations of documents, some of the documents being labeled with relevance to a query and others of the documents not being labeled with relevance to the query;
a graph component that creates a graph of the documents with the documents represented as nodes being connected by edges representing similarity between documents having content, includinga document similarity component that determines the similarity between two documents by retrieving the content of the two documents and generating a similarity metric based on the retrieved content of the two documents to indicate similarity between the retrieved content of the two documents;
a build graph component that builds a graph in which nodes representing similar documents are connected via edges, such that each node has an edge to a number of other nodes that are most similar to it;
a generate weights component that generates weights for the edges based on similarity of the documents represented by the connected nodes such that the weight for an edge between documents whose content are similar is higher than the weight for an edge between documents whose content are not as similar; and
a normalize weights component that normalizes the weights of the graph so that weights are relative to the weights of the connected documents;
a propagate relevance component that propagates relevance of the labeled documents to the unlabeled documents based on the similarity between documents as indicated by the weights of the edges of the graph connecting the documents such that an unlabeled document that is similar to labeled documents as represented by the weights of the edges of the graph has its relevance set based on the relevance of those similar labeled documents, the propagation of relevance for a first document includes determining the relevance for the first document based on a summation of weighted relevances of the documents wherein the weighted relevance of a second document to the first document is based on the weight of the edge of the graph connecting the second document to the first document and the relevance of the second document; and
a training component that trains a ranking function to generate relevance of a document to a query based on the documents, the query, and the labeled and propagated relevanceswherein the components comprise computer-executable instructions for execution by the processor.
2 Assignments
0 Petitions
Accused Products
Abstract
A method and system for propagating the relevance of labeled documents to a query to unlabeled documents is provided. The propagation system provides training data that includes queries, documents labeled with their relevance to the queries, and unlabeled documents. The propagation system then calculates the similarity between pairs of documents in the training data. The propagation system then propagates the relevance of the labeled documents to similar, but unlabeled, documents. The propagation system may iteratively propagate labels of the documents until the labels converge on a solution. The training data with the propagated relevances can then be used to train a ranking function.
-
Citations
20 Claims
-
1. A computing device with a processor and memory for propagating relevance of labeled documents to unlabeled documents, comprising:
-
a document store that contains representations of documents, some of the documents being labeled with relevance to a query and others of the documents not being labeled with relevance to the query; a graph component that creates a graph of the documents with the documents represented as nodes being connected by edges representing similarity between documents having content, including a document similarity component that determines the similarity between two documents by retrieving the content of the two documents and generating a similarity metric based on the retrieved content of the two documents to indicate similarity between the retrieved content of the two documents; a build graph component that builds a graph in which nodes representing similar documents are connected via edges, such that each node has an edge to a number of other nodes that are most similar to it; a generate weights component that generates weights for the edges based on similarity of the documents represented by the connected nodes such that the weight for an edge between documents whose content are similar is higher than the weight for an edge between documents whose content are not as similar; and a normalize weights component that normalizes the weights of the graph so that weights are relative to the weights of the connected documents; a propagate relevance component that propagates relevance of the labeled documents to the unlabeled documents based on the similarity between documents as indicated by the weights of the edges of the graph connecting the documents such that an unlabeled document that is similar to labeled documents as represented by the weights of the edges of the graph has its relevance set based on the relevance of those similar labeled documents, the propagation of relevance for a first document includes determining the relevance for the first document based on a summation of weighted relevances of the documents wherein the weighted relevance of a second document to the first document is based on the weight of the edge of the graph connecting the second document to the first document and the relevance of the second document; and a training component that trains a ranking function to generate relevance of a document to a query based on the documents, the query, and the labeled and propagated relevances wherein the components comprise computer-executable instructions for execution by the processor. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A computing device with a processor and memory for propagating relevance of labeled documents to unlabeled documents, comprising:
-
a document store that contains representations of documents, some of the documents being labeled with relevance to a query and others of the documents not being labeled with relevance to the query; a graph component that creates a graph of the documents with the documents represented as nodes being connected by edges representing similarity between documents having content, including a document similarity component that determines the similarity between two documents by retrieving the content of the two documents and generating a similarity metric based on the retrieved content of the two documents to indicate similarity between the retrieved content of the two documents; a build graph component that builds a graph in which nodes representing similar documents are connected via edges, such that each node has an edge to a number of other nodes that are most similar to it; a generate weights component that generates weights for the edges based on similarity of the documents represented by the connected nodes such that the weight for an edge between documents whose content are similar is higher than the weight for an edge between documents whose content are not as similar; and a normalize weights component that normalizes the weights of the graph so that weights are relative to the weights of the connected documents; a propagate relevance component that propagates relevance of the labeled documents to the unlabeled documents based on the similarity between documents as indicated by the weights of the edges of the graph connecting the documents such that an unlabeled document that is similar to labeled documents as represented by the weights of the edges of the graph has its relevance set based on the relevance of those similar labeled documents; and a training component that trains a ranking function to generate relevance of a document to a query based on the documents, the query, and the labeled and propagated relevances wherein the components comprise computer-executable instructions for execution by the processor and wherein the propagate relevance component propagates relevance according to the following equation;
ƒ
*=(1−
α
)(I−
α
S)−
1ywhere ƒ
* represents a propagated relevance vector, S is a similarity matrix, y represents an initial relevance vector, and α
represents a decay rate.
-
-
11. A computing device with a processor and memory for propagating relevance of labeled documents to unlabeled documents, comprising:
-
a document store that contains representations of documents, some of the documents being labeled with relevance to a query and others of the documents not being labeled with relevance to the query; a graph component that creates a graph of the documents with the documents represented as nodes being connected by edges representing similarity between documents having content, including a document similarity component that determines the similarity between two documents by retrieving the content of the two documents and generating a similarity metric based on the retrieved content of the two documents to indicate similarity between the retrieved content of the two documents; a build graph component that builds a graph in which nodes representing similar documents are connected via edges, such that each node has an edge to a number of other nodes that are most similar to it; a generate weights component that generates weights for the edges based on similarity of the documents represented by the connected nodes such that the weight for an edge between documents whose content are similar is higher than the weight for an edge between documents whose content are not as similar; and a normalize weights component that normalizes the weights of the graph so that weights are relative to the weights of the connected documents; a propagate relevance component that propagates relevance of the labeled documents to the unlabeled documents based on the similarity between documents as indicated by the weights of the edges of the graph connecting the documents such that an unlabeled document that is similar to labeled documents as represented by the weights of the edges of the graph has its relevance set based on the relevance of those similar labeled documents; and a training component that trains a ranking function to generate relevance of a document to a query based on the documents, the query, and the labeled and propagated relevances wherein the components comprise computer-executable instructions for execution by the processor and wherein the propagate relevance component propagates relevance according to the following equation;
ƒ
*=(I+α
S+α
2S2+ . . . +α
NSN)ywhere ƒ
* represents a propagated relevance vector, S is a similarity matrix, y represents an initial relevance vector, and α
represents a decay rate, and where n represents an exponent for which ƒ
* converges on a solution.
-
-
12. A computing device with a processor and memory for propagating relevance of labeled pages to a query to unlabeled pages to the query, comprising:
-
a page store that contains representations of pages, some of the pages being labeled with relevance to a query and others of the pages not being labeled with relevance to the query; a graph component that creates a graph of the pages with the pages represented as nodes connected by edges representing similarity between the pages having content, including; a similarity component that determines the similarity between two pages by retrieving the content of the two pages and generating a similarity metric based on the retrieved content of the two pages to indicate similarity between the retrieved content of the two pages; a build graph component that builds a graph in which nodes representing similar pages are connected via edges; and a generate weights component that generates weights for the edges based on similarity of the pages represented by the connected nodes such that the weight for an edge between pages whose content are similar is higher than the weight for an edge between pages whose content are not as similar; and a propagate relevance component that propagates relevance of the labeled pages to the unlabeled pages based on the similarity between pages as indicated by the weights of the edges of the graph connecting the pages and based on a manifold ranking algorithm such that an unlabeled page that is similar to labeled pages as represented by the edges of the graph has its relevance set based on the relevance of those similar labeled pages, the propagation of relevance for a first page includes determining the relevance for the first page based on a summation of weighted relevances of the pages wherein the weighted relevance of a second page to the first page is based on the weight of the edge of the graph connecting the second page to the first page and the relevance of the second page. - View Dependent Claims (13, 14, 15, 16, 17)
-
-
18. A computer-readable storage medium containing instructions for controlling a computer system to propagate relevance of documents to a query to other documents, by a method comprising:
-
for each pair of documents, determining similarity between a first document of the pair and a second document of the pair, the determining including retrieving first content of the first document and second content of second document and generating a similarity metric based on comparison of the first content and the second content to indicate similarity between the first document and the second document; creating a graph of the documents represented as nodes connected by edges having weights representing similarity between documents, some documents being labeled indicating their relevance to the query and other documents being unlabeled, the weight for an edge between documents whose content are similar is higher than the weight for an edge between document whose content are not as similar; propagating the relevance of the labeled documents to the unlabeled documents based on the weights of the edges between nodes using a manifold ranking based algorithm such that an unlabeled document that is similar to labeled documents as represented by the edges of the graph has its relevance set based on the relevance of those similar labeled documents, the propagation of relevance for a first document includes determining the relevance for the first document based on a summation of weighted relevances of the documents wherein the weighted relevance of a second document to the first document is based on the weight of the edge of the graph connecting the second document to the first document and the relevance of the second document; and training a ranking function to generate relevance of a document to a query base on the documents, the query, and the labeled relevance and the propagated relevance. - View Dependent Claims (19)
-
-
20. A computer-readable storage medium containing instructions for controlling a computer system to propagate relevance of documents to a query to other documents, by a method comprising:
-
for each pair of documents, determining similarity between a first document of the pair and a second document of the pair, the determining including retrieving first content of the first document and second content of second document and generating a similarity metric based on comparison of the first content and the second content to indicate similarity between the first document and the second document; creating a graph of the documents represented as nodes connected by edges having weights representing similarity between documents, some documents being labeled indicating their relevance to the query and other documents being unlabeled, the weight for an edge between documents whose content are similar is higher than the weight for an edge between documents whose content are not as similar; propagating the relevance of the labeled documents to the unlabeled documents based on the weights of the edges between nodes using a manifold ranking based algorithm such that an unlabeled document that is similar to labeled documents as represented by the edges of the graph has its relevance set based on the relevance of those similar labeled documents; and training a ranking function to generate relevance of a document to a query base on the documents, the query, and the labeled relevance and the propagated relevance wherein the propagating of the relevance of the labeled documents includes using a Taylor expansion to iteratively solve the following equation;
ƒ
*=(1−
α
)(I−
α
S)−
1ywhere ƒ
* represents a propagated relevance vector, S is a similarity matrix, y represents an initial relevance vector, and a represents α
decay rate.
-
Specification