Identifying salient items in documents
Abstract
A set of representations of item-page pairs of items and respective web pages that include the respective items is obtained, each representation including feature function values indicating weights associated with features of associated web pages, the features including page classification features. An annotated set of labeled training data that is annotated with salience annotation values of items for respective web pages that include the items is obtained. The salience annotation values are determined based on a soft function, by determining a first count of a total number of user queries associated with corresponding visits to the respective web pages, and determining a ratio of a second count to the first count, the second count determined as a cardinality of a subset of the corresponding visits that are associated with user queries that include the item, the subset included in the corresponding visits. Models are trained using the annotated set.
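The ratio described above can be sketched in a few lines. This is an illustrative reading of the abstract, not the patented implementation: the function name `salience_annotations`, the `(query, url)` record layout, the sample log, and the use of case-insensitive substring matching to decide whether a query "includes" an item are all assumptions made for the example. Each log record is treated as one user query with one corresponding visit.

```python
from collections import defaultdict


def salience_annotations(click_log, items):
    """Return {(item, url): ratio} from (query, url) click records.

    Hypothetical helper: each record is assumed to be one user query
    that led to one visit of the page at `url`.
    """
    # Group the queries by the page they led to.
    queries_by_page = defaultdict(list)
    for query, url in click_log:
        queries_by_page[url].append(query.lower())

    scores = {}
    for url, queries in queries_by_page.items():
        # First count: all queries associated with visits to this page.
        first_count = len(queries)
        for item in items:
            needle = item.lower()
            # Second count: visits whose query includes the item
            # (substring match is an assumption of this sketch).
            second_count = sum(1 for q in queries if needle in q)
            scores[(item, url)] = second_count / first_count
    return scores


# Made-up log records for illustration only.
log = [
    ("acme rocket review", "example.com/a"),
    ("acme rocket price", "example.com/a"),
    ("best rockets", "example.com/a"),
    ("coyote facts", "example.com/a"),
    ("coyote habitat", "example.com/b"),
]
scores = salience_annotations(log, ["acme rocket", "coyote"])
```

Because the result is a fraction between 0 and 1 rather than a yes/no label, it can serve as the "soft" salience annotation the abstract refers to.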
20 Claims
1. A system comprising:
a device that includes at least one processor, and a computer readable storage medium storing instructions for execution by the at least one processor, for implementing a salient item identification engine that:
obtains query data and corresponding click data that indicates web pages visited, in association with respectively corresponding user queries, based on information mined from a web search log; and
determines a salience annotation value of an item for respective ones of the web pages, based on determining a first count of a total number of the user queries that are associated with one or more corresponding visits to the respective ones of the web pages, and determining a ratio of a second count to the first count, the second count determined as a cardinality of a subset of the corresponding visits that are associated with a group of the user queries that include the item, the subset included in the one or more corresponding visits.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A method comprising:
obtaining a first set of representations of a plurality of item-document pairs of items and respective documents that include the respective items, each of the representations including a plurality of feature function values indicating weights associated with one or more features of the associated documents, the features including one or more document classification features associated with the associated documents;
obtaining an annotated set of labeled training data that is annotated with a plurality of salience annotation values of a plurality of respective items for respective documents that include content that includes the respective items, the salience annotation values determined based on a soft function, based on determining a first count of a total number of user queries that are associated with one or more corresponding visits to the respective documents, and determining a ratio of a second count to the first count, the second count determined as a cardinality of a subset of the corresponding visits that are associated with a group of the user queries that include the item, the subset included in the one or more corresponding visits; and
initiating, via a device processor, training of one or more models based on the annotated set.
Dependent claims: 13, 14, 15, 16, 17
18. A computer program product comprising a hardware computer-readable storage medium storing executable instructions that, when executed, cause at least one data processing apparatus to:
obtain a first set of representations of a plurality of item-page pairs of items and respective web pages that include the respective items, each of the representations including a plurality of feature function values indicating weights associated with one or more features of the associated web pages, the features including one or more page classification features associated with the associated web pages;
obtain an annotated set of labeled training data that is annotated with a plurality of salience annotation values of a plurality of the respective items for respective web pages that include content that includes the respective items, the salience annotation values determined based on a soft function, based on determining a first count of a total number of user queries that are associated with one or more corresponding visits to the respective web pages, and determining a ratio of a second count to the first count, the second count determined as a cardinality of a subset of the corresponding visits that are associated with a group of the user queries that include the item, the subset included in the one or more corresponding visits; and
initiate training of one or more models based on the annotated set.
Dependent claims: 19, 20
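The training step recited in claims 12 and 18 can be illustrated with a small, self-contained sketch. The claims only say that "one or more models" are trained on the annotated set; the logistic model fitted by stochastic gradient descent below is a stand-in chosen for brevity, not the model the patent describes. The feature names in the comments (item in title, relative term frequency, a page classification flag), the toy values, and the helper names `train_salience_model` and `predict` are all hypothetical.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_salience_model(features, labels, epochs=500, lr=0.5):
    """Fit a logistic model to soft salience labels in [0, 1].

    Minimizes cross-entropy against the soft targets with plain
    per-example gradient steps. Returns (weights, bias).
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = _sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the cross-entropy w.r.t. the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(model, x):
    """Predicted salience of one item-page pair."""
    weights, bias = model
    return _sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# One row per item-page pair (made-up values):
# [item appears in title, relative term frequency, page classification flag]
features = [
    [1.0, 0.8, 1.0],
    [1.0, 0.5, 0.0],
    [0.0, 0.2, 1.0],
    [0.0, 0.1, 0.0],
]
# Soft labels, e.g. click-derived ratios of the kind the claims describe.
labels = [0.9, 0.6, 0.1, 0.05]

model = train_salience_model(features, labels)
high = predict(model, features[0])
low = predict(model, features[3])
```

Training on fractional targets rather than hard 0/1 labels lets the model reproduce graded salience, which is the practical point of deriving labels from a ratio.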
Specification