IDENTIFYING SALIENT ITEMS IN DOCUMENTS
Abstract
A set of representations of item-page pairs of items and respective web pages that include the respective items is obtained, each representation including feature function values indicating weights associated with features of associated web pages, the features including page classification features. An annotated set of labeled training data that is annotated with salience annotation values of items for respective web pages that include the items is obtained. The salience annotation values are determined based on a soft function, by determining a first count of a total number of user queries associated with corresponding visits to the respective web pages, and determining a ratio of a second count to the first count, the second count determined as a cardinality of a subset of the corresponding visits that are associated with user queries that include the item, the subset included in the corresponding visits. Models are trained using the annotated set.
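The soft function described above reduces to a ratio of two counts per item-page pair. The following is a minimal sketch in Python, not the patented implementation. It assumes the search log has already been reduced to (query, url) records, one record per clicked visit, and that a query "includes" an item when the item appears in it as a case-insensitive substring. The function name soft_salience, the record layout, and the sample queries and URLs are illustrative and do not come from the patent.

```python
def soft_salience(click_log, item, url):
    """Return the soft salience label of `item` for the page `url`.

    click_log: list of (query, url) tuples, one per clicked visit
               (assumed layout, not specified by the source).
    """
    # Queries associated with visits to this page.
    queries = [q for q, u in click_log if u == url]

    # First count: total number of queries that led to a visit to the page.
    first_count = len(queries)
    if first_count == 0:
        return 0.0  # assumption: a page with no logged visits gets label 0

    # Second count: visits whose query includes the item
    # (assumption: case-insensitive substring match).
    needle = item.lower()
    second_count = sum(1 for q in queries if needle in q.lower())

    return second_count / first_count


# Hypothetical click log for illustration only.
log = [
    ("seattle weather", "example.com/a"),
    ("weather forecast", "example.com/a"),
    ("seattle hotels", "example.com/a"),
    ("seattle rain radar", "example.com/a"),
    ("seattle weather", "example.com/b"),
]

print(soft_salience(log, "seattle", "example.com/a"))  # 3 of 4 queries
print(soft_salience(log, "weather", "example.com/a"))  # 2 of 4 queries
```

Because the label is a ratio rather than a yes/no judgment, an item that appears in most of the queries leading to a page receives a value near 1, and an item that rarely appears receives a value near 0.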
10 Citations
20 Claims
1. A system comprising:
a device that includes at least one processor, the device including a salient item identification engine comprising instructions tangibly embodied on a computer readable storage medium for execution by the at least one processor, the salient item identification engine including:
a log data acquisition component configured to obtain query data and corresponding click data that indicates web pages visited, in association with respectively corresponding user queries, based on information mined from a web search log; and
a soft labeling component configured to determine a salience annotation value of an item for respective ones of the web pages, based on determining a first count of a total number of the user queries that are associated with one or more corresponding visits to the respective ones of the web pages, and determining a ratio of a second count to the first count, the second count determined as a cardinality of a subset of the corresponding visits that are associated with a group of the user queries that include the item, the subset included in the one or more corresponding visits.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A method comprising:
obtaining a first set of representations of a plurality of item-document pairs of items and respective documents that include the respective items, each of the representations including a plurality of feature function values indicating weights associated with one or more features of the associated documents, the features including one or more document classification features associated with the associated documents;
initiating, via a device processor, training of one or more models based on the first set; and
obtaining salience scores associated with respective ones of the items and associated ones of the documents, the salience scores indicating a measure of salience of the respective items to the respective associated documents, based on the trained one or more models.
Dependent claims: 13, 14, 15, 16, 17
18. A computer program product tangibly embodied on a computer-readable storage medium and including executable code that causes at least one data processing apparatus to:
obtain a first set of representations of a plurality of item-page pairs of items and respective web pages that include the respective items, each of the representations including a plurality of feature function values indicating weights associated with one or more features of the associated web pages, the features including one or more page classification features associated with the associated web pages; and
obtain an annotated set of labeled training data that is annotated with a plurality of salience annotation values of a plurality of the respective items for respective web pages that include content that includes the respective items, the salience annotation values determined based on a soft function, based on determining a first count of a total number of user queries that are associated with one or more corresponding visits to the respective web pages, and determining a ratio of a second count to the first count, the second count determined as a cardinality of a subset of the corresponding visits that are associated with a group of the user queries that include the item, the subset included in the one or more corresponding visits; and
initiate training of one or more models based on the annotated set.
Dependent claims: 19, 20
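The independent method and program-product claims both recite training one or more models on feature representations of item-page pairs and then obtaining salience scores from the trained models. The claims do not name a model family, so the sketch below substitutes a small logistic regression fitted by stochastic gradient descent purely as an illustration. The three features (item in title, relative item frequency, and a "news page" classification flag), the training values, and the names train and salience_score are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=500, lr=0.1):
    """Fit weights and bias to (feature_vector, soft_label) pairs.

    Uses cross-entropy loss with soft targets in [0, 1]; the model
    choice is an assumption, not something the claims specify.
    """
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - label  # gradient of the loss w.r.t. z
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def salience_score(model, features):
    """Score one item-page pair with the trained model."""
    weights, bias = model
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


# Hypothetical item-page pairs:
# [item_in_title, relative_item_frequency, page_is_news], soft label
training_set = [
    ([1.0, 0.8, 0.0], 0.75),
    ([0.0, 0.1, 0.0], 0.05),
    ([1.0, 0.5, 1.0], 0.60),
    ([0.0, 0.3, 1.0], 0.10),
]

model = train(training_set)
print(salience_score(model, [1.0, 0.7, 0.0]))  # item appears in the title
print(salience_score(model, [0.0, 0.2, 0.0]))  # item only in passing
```

In this sketch the soft labels play the role of the annotated training data, and the returned score is a value between 0 and 1 that can be compared across items on the same page.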
Specification