GLOBAL AND TOPICAL RANKING OF SEARCH RESULTS USING USER CLICKS
First Claim
1. A method comprising:
- training, by at least one processor, a relevance prediction model using data for a plurality of queries, the data for a query comprising information identifying the query and documents of a result set retrieved using the query, the data further comprising user click information identifying each user click and corresponding document in the result set and a time of the user click, the training comprising:
determining a plurality of feature vector sets corresponding to the plurality of queries, a feature vector set for a query comprising a feature vector for each document in the result set of the query, the feature vector identifying a plurality of features and a corresponding plurality of feature values, the plurality of features for a document comprising at least one feature that relates the document to at least one other document in the result set of the query using the user click information to determine whether or not a user click sequence involving the document and the at least one other document exists;
determining a plurality of label sets corresponding to the plurality of queries, a label set for a query comprising a label for each document in the result set of the query, the label comprising an assessment of the document's relevance to the query;
generating the relevance prediction model using the feature vector and label sets; and
- obtaining, by the at least one processor and using the generated relevance prediction model, ranking predictions for documents in a result set of a query.
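The click-sequence feature recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Click` record, the `session_id` grouping, and the feature names `click_count`, `followed_by_other`, and `preceded_by_other` are all assumptions made for illustration. The sketch only shows one way the time of each click could be used to decide whether a click sequence involving a document and another document in the same result set exists.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Click(NamedTuple):
    session_id: str  # hypothetical: groups the clicks one user made for the query
    doc_id: str      # document in the result set that was clicked
    time: float      # time of the user click, e.g. seconds since the query


def click_sequence_features(
    result_set: List[str], clicks: List[Click]
) -> Dict[str, Dict[str, float]]:
    """Build one feature vector (as a name -> value dict) per document."""
    features = {
        doc: {"click_count": 0.0, "followed_by_other": 0.0, "preceded_by_other": 0.0}
        for doc in result_set
    }

    # Group clicks by session so sequences are only formed within one session.
    by_session = defaultdict(list)
    for click in clicks:
        if click.doc_id in features:
            by_session[click.session_id].append(click)

    for session_clicks in by_session.values():
        ordered = sorted(session_clicks, key=lambda c: c.time)
        for click in ordered:
            features[click.doc_id]["click_count"] += 1.0
        # Consecutive clicks on two different documents form a click sequence.
        for earlier, later in zip(ordered, ordered[1:]):
            if earlier.doc_id != later.doc_id:
                features[earlier.doc_id]["followed_by_other"] = 1.0
                features[later.doc_id]["preceded_by_other"] = 1.0
    return features
```

Called with a result set `["a", "b", "c"]` and a session in which "b" is clicked before "a", the sketch marks "b" as followed by another document and "a" as preceded by one, while the unclicked "c" keeps all-zero features.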
Abstract
To estimate, or predict, the relevance of items, or documents, in a set of search results, relevance information is extracted from user click data, and relational information among the documents as manifested by an aggregation of user clicks is determined from the click data. A supervised approach uses judgment information, such as human judgment information, as part of the training data used to generate a relevance predictor model, which minimizes the inherent noisiness of the click data collected from a commercial search engine.
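The supervised approach described in the abstract can be sketched as follows, under stated assumptions. The abstract does not say which learner is used, so a plain linear model trained by stochastic gradient descent stands in here purely for illustration. The two-element feature vectors (a hypothetical click-through rate and a hypothetical click-sequence flag) and the numeric judgment grades are invented toy values, not data from the patent.

```python
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias); a stand-in model, an assumption


def predict(model: Model, x: Sequence[float]) -> float:
    weights, bias = model
    return bias + sum(w * xi for w, xi in zip(weights, x))


def train_relevance_model(
    feature_vectors: Sequence[Sequence[float]],
    labels: Sequence[float],
    epochs: int = 5000,
    learning_rate: float = 0.1,
) -> Model:
    """Fit click-derived features to judgment labels with squared-error SGD."""
    weights = [0.0] * len(feature_vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(feature_vectors, labels):
            error = predict((weights, bias), x) - y
            bias -= learning_rate * error
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
    return weights, bias


def rank_documents(
    model: Model, doc_ids: Sequence[str], feature_vectors: Sequence[Sequence[float]]
) -> List[str]:
    """Order documents by predicted relevance, highest first."""
    scores = {d: predict(model, x) for d, x in zip(doc_ids, feature_vectors)}
    return sorted(doc_ids, key=lambda d: scores[d], reverse=True)


# Toy training data: [click-through rate, click-sequence flag] per document,
# paired with a hypothetical human judgment grade for that document.
train_x = [[1.0, 0.0], [0.5, 1.0], [0.0, 1.0]]
train_y = [3.0, 1.5, 0.0]
model = train_relevance_model(train_x, train_y)
ranking = rank_documents(model, ["x", "y", "z"], [[0.2, 1.0], [0.8, 0.0], [0.5, 0.0]])
```

The judgment labels act as the training target, so the model learns how much weight each click-derived feature deserves rather than trusting raw click counts directly.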
48 Claims
17. A system comprising:
- at least one server, the at least one server comprising:
a training data generator that uses data for a plurality of queries to determine a plurality of feature vector sets and a plurality of label sets corresponding to the plurality of queries, the data for a query comprising information identifying the query and documents of a result set retrieved using the query, the data further comprising user click information identifying each user click and corresponding document in the result set and a time of the user click, a feature vector set for a query comprising a feature vector for each document in the result set of the query, the feature vector identifying a plurality of features and a corresponding plurality of feature values, the plurality of features for a document comprising at least one feature that relates the document to at least one other document in the result set of the query using the user click information to determine whether or not a user click sequence involving the document and the at least one other document exists, and a label set for a query comprising a label for each document in the result set of the query, the label comprising an assessment of the document's relevance to the query;
a relevance predictor model generator that generates a relevance prediction model using the plurality of feature vector and label sets;
a relevance predictor that obtains, using the generated relevance prediction model, ranking predictions for documents in a result set of a query.
33. A computer-readable medium tangibly storing thereon computer-executable process steps, the process steps comprising:
- training a relevance prediction model using data for a plurality of queries, the data for a query comprising information identifying the query and documents of a result set retrieved using the query, the data further comprising user click information identifying each user click and corresponding document in the result set and a time of the user click, the training comprising:
determining a plurality of feature vector sets corresponding to the plurality of queries, a feature vector set for a query comprising a feature vector for each document in the result set of the query, the feature vector identifying a plurality of features and a corresponding plurality of feature values, the plurality of features for a document comprising at least one feature that relates the document to at least one other document in the result set of the query using the user click information to determine whether or not a user click sequence involving the document and the at least one other document exists;
determining a plurality of label sets corresponding to the plurality of queries, a label set for a query comprising a label for each document in the result set of the query, the label comprising an assessment of the document's relevance to the query;
generating the relevance prediction model using the feature vector and label sets; and
- obtaining, using the generated relevance prediction model, ranking predictions for documents in a result set of a query.
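The three server components recited in the system claim (training data generator, relevance predictor model generator, relevance predictor) can be wired together as in the sketch below. This is an illustrative arrangement only. The class and field names are hypothetical, all clicks for a query are treated as a single time-ordered sequence for brevity, and a one-nearest-neighbour lookup is swapped in as the model because the claim does not name a learning algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

FeatureVector = Tuple[float, float]  # (click count, click-sequence flag); an assumption


@dataclass
class QueryData:
    query: str
    result_set: List[str]            # document ids retrieved for the query
    clicks: List[Tuple[str, float]]  # (document id, time of the user click)
    judgments: Dict[str, float]      # document id -> relevance assessment


class TrainingDataGenerator:
    def generate(self, data: Sequence[QueryData]):
        feature_sets, label_sets = [], []
        for q in data:
            ordered = [doc for doc, _ in sorted(q.clicks, key=lambda c: c[1])]
            vectors: List[FeatureVector] = []
            for doc in q.result_set:
                in_sequence = 0.0
                if doc in ordered:
                    # 1.0 if a different document was clicked after this one.
                    later = ordered[ordered.index(doc) + 1:]
                    in_sequence = float(any(other != doc for other in later))
                vectors.append((float(ordered.count(doc)), in_sequence))
            feature_sets.append(vectors)
            label_sets.append([q.judgments[doc] for doc in q.result_set])
        return feature_sets, label_sets


class NearestNeighbourModel:
    """Stand-in model: returns the label of the closest training vector."""

    def __init__(self, examples: List[Tuple[FeatureVector, float]]):
        self.examples = examples

    def score(self, x: FeatureVector) -> float:
        def distance(example):
            return sum((a - b) ** 2 for a, b in zip(example[0], x))
        return min(self.examples, key=distance)[1]


class RelevancePredictorModelGenerator:
    def generate(self, feature_sets, label_sets) -> NearestNeighbourModel:
        examples = [
            (x, y)
            for vectors, labels in zip(feature_sets, label_sets)
            for x, y in zip(vectors, labels)
        ]
        return NearestNeighbourModel(examples)


class RelevancePredictor:
    def __init__(self, model: NearestNeighbourModel):
        self.model = model

    def rank(self, doc_ids: Sequence[str], vectors: Sequence[FeatureVector]) -> List[str]:
        pairs = sorted(zip(doc_ids, vectors), key=lambda p: self.model.score(p[1]), reverse=True)
        return [doc for doc, _ in pairs]
```

Keeping the three roles in separate classes mirrors the claim's division of labour: the generator turns logged queries into feature vector and label sets, the model generator consumes only those sets, and the predictor needs only the finished model.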
Specification