System and method for indexing web content using click-through features
First Claim
1. A method for indexing content items based on click-through features, the method comprising:
- generating a training set comprising one or more query-content item pairs, wherein a given query-content item pair has one or more click-through features associated therewith, the one or more click-through features including two or more of an average amount of time users stay on a website associated with a given URL, a spam score of the given URL, expected clicks at a position of the given URL in a given search results page and frequency of a query in a query log;
- labeling one or more query-content item pairs in the training set by assigning click score thereto based on the one or more click-through features thereof, wherein labeling a given content item in the training set comprises providing a given query-content item pair to a human judge to assign a click score;
- determining a click score function using a loss function based on the click scores of the labeled query-content item pairs and the click-through features thereof;
- applying the click score function to a plurality of unlabeled query-content item pairs to determine click scores thereof based on the one or more click-through features of the unlabeled query-content item pairs;
- generating an inverted click-through index of the unlabeled query-content item pairs and the associated query-score pairs, wherein a key to the index is a URL of the content item; and
- combining the inverted click-through index with a content index by associating the unlabeled query-content item pairs with content items in the content index.
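The steps of claim 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the claim does not name a model or a loss, so the sketch assumes a linear click score function fitted by gradient descent on a mean squared loss. The function names, the feature order (dwell time, spam score, expected clicks at position, query frequency, all pre-scaled to [0, 1]), and the sample data are hypothetical.

```python
from collections import defaultdict


def fit_click_score_function(labeled, lr=0.1, epochs=2000):
    """Fit a linear click score function by gradient descent on mean squared loss.

    labeled: list of (features, click_score) pairs, where click_score is the
    label assigned by a human judge. The linear model and squared loss are
    assumptions made for this sketch.
    """
    n_features = len(labeled[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(labeled)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in labeled:
            err = bias + sum(w * xi for w, xi in zip(weights, x)) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m

    def score(x):
        return bias + sum(w * xi for w, xi in zip(weights, x))

    return score


def build_click_index(unlabeled, score):
    """Build an inverted click-through index keyed by URL.

    unlabeled: list of (query, url, features) triples.
    Returns {url: [(query, click_score), ...]}.
    """
    index = defaultdict(list)
    for query, url, features in unlabeled:
        index[url].append((query, score(features)))
    return dict(index)


def combine_indexes(content_index, click_index):
    """Attach the query-score pairs to the matching content index entries by URL."""
    return {
        url: {"content": doc, "query_scores": click_index.get(url, [])}
        for url, doc in content_index.items()
    }


# Hypothetical data: [dwell time, spam score, expected clicks, query frequency]
labeled = [
    ([0.9, 0.1, 0.8, 0.5], 1.0),
    ([0.2, 0.9, 0.3, 0.5], 0.0),
    ([0.6, 0.2, 0.5, 0.1], 0.7),
]
unlabeled = [
    ("cheap flights", "http://a.example", [0.8, 0.1, 0.7, 0.4]),
    ("cheap flights", "http://b.example", [0.1, 0.8, 0.2, 0.4]),
]
click_score = fit_click_score_function(labeled)
click_index = build_click_index(unlabeled, click_score)
combined = combine_indexes(
    {"http://a.example": "flight deals", "http://b.example": "free prizes"},
    click_index,
)
```

Keying the click-through index by URL, as the claim requires, is what makes the final step a plain lookup: each content index entry picks up its query-score pairs without any further matching.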
Abstract
A system and method determine the relevance of a content item to a query using a machine-learned relevance function that incorporates click-through features of the content items. A method for selecting a relevance function to determine the relevance of a query-content item pair comprises generating a training set having one or more query-URL pairs labeled for relevance based on their click-through features. The labeled query-URL pairs are used to determine the relevance function by minimizing a loss function that accounts for the click-through features of the content item. The computed relevance function is then applied to the click-through features of unlabeled content items to assign relevance scores to them. An inverted click-through index of query-score pairs is formed and combined with the content index to improve the relevance of search results.
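The loss minimization mentioned in the abstract can be written in general form as follows. The notation is an assumption introduced here for illustration: x_i is the click-through feature vector of the i-th labeled query-URL pair, y_i is its label, F is the family of candidate functions, and the squared loss on the right is only one possible choice, since the abstract does not name a specific loss.

```latex
f^{*} = \arg\min_{f \in \mathcal{F}} \sum_{i=1}^{n} L\bigl(y_i, f(\mathbf{x}_i)\bigr),
\qquad \text{for example } L(y, \hat{y}) = (y - \hat{y})^{2}
```

The learned function f* is then evaluated on the feature vectors of unlabeled pairs to produce their scores.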
11 Claims
1. A method for indexing content items based on click-through features, the method comprising:
- generating a training set comprising one or more query-content item pairs, wherein a given query-content item pair has one or more click-through features associated therewith, the one or more click-through features including two or more of an average amount of time users stay on a website associated with a given URL, a spam score of the given URL, expected clicks at a position of the given URL in a given search results page and frequency of a query in a query log;
- labeling one or more query-content item pairs in the training set by assigning click score thereto based on the one or more click-through features thereof, wherein labeling a given content item in the training set comprises providing a given query-content item pair to a human judge to assign a click score;
- determining a click score function using a loss function based on the click scores of the labeled query-content item pairs and the click-through features thereof;
- applying the click score function to a plurality of unlabeled query-content item pairs to determine click scores thereof based on the one or more click-through features of the unlabeled query-content item pairs;
- generating an inverted click-through index of the unlabeled query-content item pairs and the associated query-score pairs, wherein a key to the index is a URL of the content item; and
- combining the inverted click-through index with a content index by associating the unlabeled query-content item pairs with content items in the content index.
Dependent claims: 2, 3, 4, 5, 6
7. A system of one or more processing devices for indexing and searching content items based on its one or more click-through features, the system comprising:
- an index component, executed on the one or more processing devices, operative to determine a click score function using a loss function based on a training set of labeled query-content item pairs and the click-through features thereof, the one or more click-through features including two or more of an average amount of time users stay on a website associated with a given URL, a spam score of the given URL, expected clicks at a position of the given URL in a given search results page and frequency of a query in a query log, wherein the training set of labeled query-content items comprises a plurality of query-content items having click scores assigned thereto by a human judge, assign click scores to a plurality of unlabeled query-content item pairs through application of the click score function to the one or more click-through features, generate an inverted click-through index of the unlabeled content items and the associated query-score pairs and combine the inverted click-through index with a content index by associating the unlabeled query-content item pairs with content items in the content index;
- a relevance engine, executed on the one or more processing devices, operative to receive one or more query scores for one or more content items and generate one or more relevance scores therefore; and
- a search engine, executed on the one or more processing devices, operative to retrieve one or more content items in a result set in response to receipt of the query from the user and order the content items in the result set according to the relevance scores from the relevance engine.
Dependent claims: 8, 9, 10, 11
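The query-time half of claim 7 (relevance engine plus search engine) might look like the sketch below. The claim does not say how the relevance engine turns query scores into relevance scores, so the weighted blend of a naive term-match score with the stored click score is purely an assumption, as are the function names, the weights, and the shape of the combined index (a dict keyed by URL holding the content and its query-score pairs).

```python
def relevance_score(query, entry, text_weight=0.5, click_weight=0.5):
    """Hypothetical relevance engine: blend term overlap with the stored click score."""
    terms = query.lower().split()
    words = entry["content"].lower().split()
    # Fraction of query terms that appear in the content (naive text match).
    text_score = sum(t in words for t in terms) / len(terms) if terms else 0.0
    # Click score stored for this exact query, or 0.0 if none was indexed.
    click_score = dict(entry["query_scores"]).get(query, 0.0)
    return text_weight * text_score + click_weight * click_score


def search(query, combined_index, limit=10):
    """Hypothetical search engine: retrieve matching URLs and order them by relevance."""
    terms = query.lower().split()
    candidates = [
        url
        for url, entry in combined_index.items()
        if any(t in entry["content"].lower().split() for t in terms)
    ]
    ranked = sorted(
        candidates,
        key=lambda url: relevance_score(query, combined_index[url]),
        reverse=True,
    )
    return ranked[:limit]


# Hypothetical combined index, as an index component might produce it.
combined_index = {
    "http://a.example": {
        "content": "python tutorial basics",
        "query_scores": [("python tutorial", 0.1)],
    },
    "http://b.example": {
        "content": "python tutorial advanced",
        "query_scores": [("python tutorial", 0.9)],
    },
    "http://c.example": {"content": "cooking recipes", "query_scores": []},
}
results = search("python tutorial", combined_index)
```

In this example both Python pages match the query text equally well, so the stored click scores decide the order, which is the effect the combined index is meant to have on ranking.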
Specification