Clickthrough-based latent semantic model
First Claim
1. A computer-implemented method for ranking documents, comprising:
identifying a plurality of query-document pairs based on clickthrough data for a plurality of documents;
building a latent semantic model based on the plurality of query-document pairs, wherein the plurality of query-document pairs comprises a plurality of query-title pairs, wherein the title in each query-title pair is a title of one of the documents of the plurality of documents, and wherein building the latent semantic model comprises building a bilingual topic model, a query being considered as expressed in a first language and the document being considered as expressed in a second language, by using the plurality of query-title pairs to learn a semantic representation of a query based on a likelihood that the query is a semantics-based translation of each of the plurality of documents;
ranking the plurality of documents for a Web search based on a distance between vector representations of a query and a title of each of the plurality of documents within a semantic space, wherein a projection matrix is used to map the vector representations of the query and the title of each of the plurality of documents to the semantic space, wherein the semantic space comprises a dense, low-dimensional space; and
ranking the plurality of documents for the Web search based on the latent semantic model.
Abstract
There is provided a computer-implemented method and system for ranking documents. The method includes identifying a number of query-document pairs based on clickthrough data for a number of documents. The method also includes building a latent semantic model based on the query-document pairs and ranking the documents for a search based on the latent semantic model.
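The ranking step recited in the claims (projecting query and title vectors into a dense, low-dimensional semantic space and comparing them there) can be illustrated with a small sketch. Everything specific in it is an assumption made for illustration: the vocabulary, the hand-written two-dimensional projection matrix (a real system would learn it from clickthrough query-title pairs), the bag-of-words term vectors, and the use of cosine similarity as the closeness measure. It is not the patented implementation.

```python
import math

# Hypothetical toy vocabulary: term -> row index in the projection matrix.
VOCAB = {"cheap": 0, "flights": 1, "airfare": 2, "low": 3,
         "cost": 4, "recipes": 5, "pasta": 6}

# Hypothetical projection matrix (7 terms x 2 latent dimensions).
# Hand-set here; in the claimed method it would be learned from
# clickthrough-derived query-title pairs.
PROJECTION = [
    [0.5, 0.1],  # cheap
    [1.0, 0.0],  # flights
    [0.9, 0.0],  # airfare
    [0.4, 0.1],  # low
    [0.4, 0.1],  # cost
    [0.0, 1.0],  # recipes
    [0.0, 0.9],  # pasta
]


def bag_of_words(text, vocab):
    """Return a term-count vector for `text` over a fixed vocabulary."""
    counts = [0.0] * len(vocab)
    for token in text.lower().split():
        if token in vocab:
            counts[vocab[token]] += 1.0
    return counts


def project(vec, matrix):
    """Map a term vector into the low-dimensional space (matrix^T . vec)."""
    dims = len(matrix[0])
    return [sum(vec[i] * matrix[i][k] for i in range(len(vec)))
            for k in range(dims)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def rank_titles(query, titles, vocab=VOCAB, matrix=PROJECTION):
    """Order titles by similarity to the query in the projected space."""
    q = project(bag_of_words(query, vocab), matrix)
    scored = [(cosine(q, project(bag_of_words(t, vocab), matrix)), t)
              for t in titles]
    return [t for _, t in sorted(scored, key=lambda p: p[0], reverse=True)]


ranked = rank_titles("cheap flights", ["pasta recipes", "low cost airfare"])
print(ranked)
```

In this toy setup the query "cheap flights" and the title "low cost airfare" share no terms, yet they land close together after projection, which is the kind of vocabulary mismatch a latent semantic model is meant to bridge.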
11 Claims
1. A computer-implemented method for ranking documents (set out in full under First Claim above). Dependent claims: 2, 3, 4, 5, 6.
7. A system for ranking documents, comprising:
a processor that is adapted to execute stored instructions; and
a system memory, wherein the system memory comprises code configured to:
identify a plurality of query-title pairs based on clickthrough data for a plurality of documents, wherein the title in each query-title pair is a title of one of the documents of the plurality of documents;
build a latent semantic model, a query being considered as expressed in a first language and the document being considered as expressed in a second language, the latent semantic model being based on the plurality of query-title pairs by building a bilingual topic model by learning a semantic representation of a query based on a likelihood that the query is a semantics-based translation of each of the plurality of documents;
rank the plurality of documents for a search based on a distance between vector representations of a query and a title of each of the plurality of documents within a semantic space, wherein a projection matrix is used to map the vector representations of the query and the title of each of the plurality of documents to the semantic space, wherein the semantic space comprises a dense, low-dimensional space; and
rank the plurality of documents for the search based on the latent semantic model.
Dependent claims: 8, 9, 10, 11.
Specification