Method and system for adapting search results to personal information needs
First Claim
1. A computer-readable storage medium containing instructions for controlling a computer system to calculate relevance of a document to a user, by a method comprising:
providing click-through data generated when users submitted queries to a search engine and selected a document from results provided by the search engine;
identifying user, query, and document triplets from the click-through data, each triplet indicating that the user of the triplet submitted the query of the triplet and the user selected the document of the triplet from results of the query provided by the search engine;
identifying user clusters of users and query clusters of queries such that each user is in only one user cluster and each query is in only one query cluster;
receiving from a user a query;
searching for documents to be provided as results of the received query;
for each document of the results of the received query, determining a probability that the user from whom the query was received will find the document relevant by performing a smoothing of the identified triplets to account for sparseness of the triplets and calculating the probability based on the smoothed triplets, the smoothing including:
smoothing via backoff by:
when the identified triplets include a triplet for the user, query, and document, setting a first probability based on a discounted count of the number of identified triplets for the user, query, and document and the number of triplets for the user and query; and
when the identified triplets do not include a triplet for the user, query, and document, setting the first probability based on the number of identified triplets for the query and the document and the number of identified triplets for the document and based on a normalization constant;
when the identified triplets include a triplet for the query and document, smoothing via clustering by setting a second probability based on a probability that a user in the user cluster that includes the user from whom the query was received selects the document from the query; and
when the identified triplets do not include a triplet for the query and document, smoothing via content similarity by:
identifying the query cluster to which the query is most similar; and
setting the second probability based on a probability that a user selects the document from a query that is in the query cluster; and
combining the first probability and the second probability into an overall probability of the document; and
displaying an indication of the documents to the user from whom the query was received in an order based on the combined overall probabilities of the documents.
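The claim states which counts each smoothing step draws on but not the exact formulas. The Python sketch below shows one way the steps could fit together. It is an illustration under stated assumptions, not the patented implementation. The following are assumed and do not come from the claim: absolute discounting with a hypothetical constant DISCOUNT; a Katz-style normalization constant that spreads the discounted mass over documents the user has not clicked for the query; Jaccard overlap of query terms as the content-similarity measure; and linear interpolation with a hypothetical weight LAMBDA as the combining step. The names ClickModel, terms, and jaccard are also hypothetical.

```python
from collections import Counter

# Hypothetical parameters: the claim requires a "discounted count" and a
# "combining" step but fixes neither formula.
DISCOUNT = 0.5  # absolute discount subtracted from each seen (u, q, d) count
LAMBDA = 0.5    # weight of the backoff estimate in a linear combination


def terms(text):
    return set(text.lower().split())


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


class ClickModel:
    def __init__(self, triplets, user_cluster, query_clusters):
        # triplets: list of (user, query, document) click records
        # user_cluster: user -> cluster id; query_clusters: cluster id -> queries
        self.uqd = Counter(triplets)
        self.uq = Counter((u, q) for u, q, _ in triplets)
        self.qd = Counter((q, d) for _, q, d in triplets)
        self.q = Counter(q for _, q, _ in triplets)
        self.d = Counter(d for _, _, d in triplets)
        self.cqd = Counter((user_cluster[u], q, d) for u, q, d in triplets)
        self.cq = Counter((user_cluster[u], q) for u, q, _ in triplets)
        self.user_cluster = user_cluster
        self.query_clusters = query_clusters
        self.docs = {}  # query -> documents clicked for it by any user
        for _, q, d in triplets:
            self.docs.setdefault(q, set()).add(d)

    def lower_order(self, q, d):
        # One reading of the claim: (query, document) count over document count.
        return self.qd[(q, d)] / self.d[d] if self.d[d] else 0.0

    def first_probability(self, u, q, d):
        n_uq = self.uq[(u, q)]
        if self.uqd[(u, q, d)] > 0:
            # Discounted estimate from the user's own (u, q, d) clicks.
            return (self.uqd[(u, q, d)] - DISCOUNT) / n_uq
        candidates = self.docs.get(q, set())
        seen = {x for x in candidates if self.uqd[(u, q, x)] > 0}
        # Mass freed by discounting (all of it if u never issued q).
        leftover = DISCOUNT * len(seen) / n_uq if n_uq else 1.0
        total = sum(self.lower_order(q, x) for x in candidates - seen)
        alpha = leftover / total if total else 0.0  # normalization constant
        return alpha * self.lower_order(q, d)

    def second_probability(self, u, q, d):
        if self.qd[(q, d)] > 0:
            # Clustering: clicks on d for q by users in u's user cluster.
            c = self.user_cluster.get(u)
            n = self.cq[(c, q)]
            return self.cqd[(c, q, d)] / n if n else 0.0
        # Content similarity: the query cluster whose terms best match q.
        best = max(self.query_clusters, key=lambda cid: jaccard(
            terms(q), set().union(*(terms(x) for x in self.query_clusters[cid]))))
        members = self.query_clusters[best]
        n = sum(self.q[x] for x in members)
        return sum(self.qd[(x, d)] for x in members) / n if n else 0.0

    def score(self, u, q, d):
        p1 = self.first_probability(u, q, d)
        p2 = self.second_probability(u, q, d)
        return LAMBDA * p1 + (1 - LAMBDA) * p2

    def rank(self, u, q, results):
        # Order the results by combined probability, highest first.
        return sorted(results, key=lambda d: self.score(u, q, d), reverse=True)


model = ClickModel(
    [("alice", "jaguar", "cars.com"), ("alice", "jaguar", "cars.com"),
     ("alice", "jaguar", "zoo.org"), ("bob", "jaguar", "zoo.org"),
     ("carol", "jaguar price", "cars.com")],
    user_cluster={"alice": 0, "bob": 1, "carol": 0},
    query_clusters={0: ["jaguar", "jaguar price"]},
)
print(model.rank("alice", "jaguar", ["zoo.org", "cars.com"]))  # ['cars.com', 'zoo.org']
print(model.rank("bob", "jaguar", ["cars.com", "zoo.org"]))    # ['zoo.org', 'cars.com']
```

In this toy data, the two users submit the same query but receive different orderings, because their own clicks and their user clusters differ.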
Abstract
A method and system for adapting the search results of a query to the information needs of the user submitting the query is provided. A search system analyzes click-through triplets, each indicating that a user submitted a query and then selected a document from the results of the query. To overcome the large size and sparseness of the click-through data, the search system, when presented with an input triplet comprising a user, a query, and a document, determines the probability that the input user will find the input document important by smoothing the click-through triplets. The search system then orders the documents of the results based on the probability of their importance to the input user.
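The starting point for the abstract is a set of click-through triplets. Below is a minimal Python sketch of that step. It assumes a hypothetical log format in which each record holds a user, a query, and the documents that user clicked for it. The density function illustrates the sparseness the abstract mentions by measuring how much of the user × query × document space has any clicks at all. ClickRecord, extract_triplets, and density are illustrative names only.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class ClickRecord(NamedTuple):
    """Hypothetical log record: one query submission and the results clicked."""
    user: str
    query: str
    clicked: tuple[str, ...]


def extract_triplets(log: Iterable[ClickRecord]) -> Counter:
    """Count (user, query, document) triplets, one per selected result."""
    counts: Counter = Counter()
    for record in log:
        for doc in record.clicked:
            counts[(record.user, record.query, doc)] += 1
    return counts


def density(counts: Counter) -> float:
    """Fraction of the user x query x document cube with at least one click."""
    users = {u for u, _, _ in counts}
    queries = {q for _, q, _ in counts}
    docs = {d for _, _, d in counts}
    cells = len(users) * len(queries) * len(docs)
    return len(counts) / cells if cells else 0.0


log = [
    ClickRecord("alice", "jaguar", ("cars.com",)),
    ClickRecord("alice", "jaguar", ("cars.com", "zoo.org")),
    ClickRecord("bob", "jaguar price", ("zoo.org",)),
]
triplets = extract_triplets(log)
print(triplets[("alice", "jaguar", "cars.com")])  # 2
print(density(triplets))  # 0.375
```

Even this three-record log fills only 3 of 8 possible cells. Real logs with many users, queries, and documents are far sparser, which is why the smoothing steps are needed.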
5 Claims
5. A computing device with a processor and memory for calculating relevance of a document, comprising:
a click-through data store;
a component that identifies user, query, and document triplets from the click-through data;
a component that identifies user clusters of users and document clusters of documents such that each user is in only one user cluster and each document is in only one document cluster;
a component that receives an input user, an input query, and input documents, the input documents representing results of the input query submitted by the input user; and
for each input document, determining a probability that the input user will find the input document relevant by performing a smoothing by performing:
when the same input user, input query, and input document triplet was identified in the click-through data, a first backoff smoothing by setting a first probability that is a discounted probability of when the input user submits the input query, the input user selects the input document as indicated by the identified triplets;
when only the same input user and input query were identified in a triplet of the click-through data, a second backoff smoothing by setting the first probability that is a normalized probability of when a user submits the input query, that user selects the input document as indicated by the identified triplets;
when both the input query and input document were identified in a triplet of the click-through data, a clustering smoothing by setting a second probability based on a probability that a user in the user cluster that includes the input user selects the input document from the input query as indicated by the identified triplets; and
when both the input query and input document were not identified in a triplet of the click-through data, a content similarity smoothing by:
identifying a document cluster to which the input document is most similar; and
setting the second probability based on a probability that a user selects a document of the document cluster from the input query as indicated by the identified triplets; and
combining the first probability and the second probability into an overall probability of the input document to account for sparseness of the identified triplets.
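Claim 5 differs from claim 1 in two ways. Its second backoff level uses a normalized probability that a user who submits the input query selects the input document. Its content-similarity step matches the input document, rather than the query, against document clusters. The Python sketch below illustrates these differences and is not the claimed device. Assumed and not stated in the claim: hypothetical DISCOUNT and LAMBDA constants, absolute discounting, Jaccard term overlap between document texts (doc_text is a hypothetical input), and linear interpolation as the combining step. The claim also leaves open the case where the input user never submitted the input query. In that case the sketch assumes all of the first probability comes from the normalized query-level estimate.

```python
from collections import Counter

DISCOUNT = 0.5  # hypothetical absolute discount for seen (u, q, d) counts
LAMBDA = 0.5    # hypothetical weight for combining the two probabilities


def terms(text):
    return set(text.lower().split())


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def relevance(triplets, user_cluster, doc_clusters, doc_text, u, q, d):
    """Score document d for user u and query q in the manner of claim 5.

    triplets: Counter of (user, query, doc) -> clicks; user_cluster: user ->
    cluster id; doc_clusters: cluster id -> docs; doc_text: doc -> text.
    """
    c_uq = sum(n for (uu, qq, _), n in triplets.items() if (uu, qq) == (u, q))
    c_qd = Counter()  # clicks per document for query q, over all users
    for (_, qq, dd), n in triplets.items():
        if qq == q:
            c_qd[dd] += n
    c_q = sum(c_qd.values())

    # First probability: two-level backoff.
    if triplets[(u, q, d)] > 0:
        # Discounted estimate of P(d | u, q).
        p1 = (triplets[(u, q, d)] - DISCOUNT) / c_uq
    elif c_q:
        # Normalized P(d | q) over documents u has not clicked for q; if u
        # never issued q (an assumption) all mass goes to this estimate.
        seen = {dd for (uu, qq, dd) in triplets if (uu, qq) == (u, q)}
        leftover = DISCOUNT * len(seen) / c_uq if c_uq else 1.0
        unseen = sum(n for dd, n in c_qd.items() if dd not in seen) / c_q
        p1 = leftover * (c_qd[d] / c_q) / unseen if unseen else 0.0
    else:
        p1 = 0.0

    if c_qd[d] > 0:
        # Clustering smoothing: clicks on d for q by users in u's cluster.
        group = user_cluster.get(u)
        mine = [(dd, n) for (uu, qq, dd), n in triplets.items()
                if qq == q and user_cluster.get(uu) == group]
        total = sum(n for _, n in mine)
        p2 = sum(n for dd, n in mine if dd == d) / total if total else 0.0
    else:
        # Content similarity: the document cluster whose terms best match d.
        target = terms(doc_text[d])
        best = max(doc_clusters, key=lambda cid: jaccard(
            target, set().union(*(terms(doc_text[x]) for x in doc_clusters[cid]))))
        p2 = sum(c_qd[x] for x in doc_clusters[best]) / c_q if c_q else 0.0

    return LAMBDA * p1 + (1 - LAMBDA) * p2


clicks = Counter({("alice", "jaguar", "cars.com"): 2,
                  ("alice", "jaguar", "zoo.org"): 1,
                  ("bob", "jaguar", "zoo.org"): 1})
users = {"alice": 0, "bob": 1}
clusters = {"auto": ["cars.com", "autos.net"], "animals": ["zoo.org"]}
texts = {"cars.com": "jaguar car prices", "autos.net": "used car prices",
         "zoo.org": "jaguar big cat habitat"}
scores = {doc: relevance(clicks, users, clusters, texts, "bob", "jaguar", doc)
          for doc in ["cars.com", "zoo.org", "autos.net"]}
```

Here autos.net has never been clicked for "jaguar". It still receives a nonzero score because its text is closest to the "auto" document cluster, whose member cars.com has been clicked for that query.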
Specification