System and method for content selection for web page indexing
First Claim
1. A method of selecting content of a web page for indexing, the method comprising:
transmitting, by an indexer server, a request to retrieve web page content from a web page data store, the request including a uniform resource locator (URL) of a web page, wherein the indexer server comprises one or more computer systems configured to performing indexing of web page documents within web page data stores;
receiving, by the indexer server, the web page content from the web page data store, based on the URL;
requesting, by the indexer server, interaction data associated with the URL from an interaction data store;
receiving, by the indexer server, interaction data from the interaction data store for the URL that identifies one or more attention events, each attention event based on a previous user interface event detected within the web page content of the URL, and each attention event including a location within the web page where the respective user interface event was detected;
dividing, by the indexer server, the web page content into a plurality of non-overlapping content sections;
for each particular content section of the plurality of content sections;
analyzing, by the indexer server, the locations of the one or more attention events in the received interaction data to determine whether the particular content section is associated with one or more of the attention events;
determining whether the particular content section of the web page is found within one or more additional web pages stored within the web page data store;
in response to determining that the particular content section of the web page is found within one or more additional web pages stored within the web page data store;
(a) analyzing the locations of a plurality of additional attention events associated with the one or more additional web pages, to determine whether the particular content section is associated with one or more of the plurality of additional attention events; and
(b) summing the number of attention events associated with the particular content section within the web page, and the number of additional attention events associated with the particular content section within the one or more additional web pages; and
in response to a determination that the sum of the numbers of attention events associated with the particular content section within the web page and within the additional web pages, is greater than a threshold of attention events, adding the particular content section to an indexable content of the web page; and
outputting, by the indexer server, the indexable content corresponding to the web page.
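As a reading aid, the selection steps recited in claim 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Section`, `AttentionEvent`, `count_events`, and `select_indexable` are hypothetical, event locations are modeled as character offsets, a content section is treated as "found within" another page when its text matches exactly, and the threshold test is applied to the per-section sum whether or not the section appears on other pages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """A non-overlapping content section (hypothetical model)."""
    section_id: str
    text: str
    start: int  # inclusive offset within the page (assumed location model)
    end: int    # exclusive offset within the page


@dataclass(frozen=True)
class AttentionEvent:
    """An attention event with the location where the UI event was detected."""
    location: int


def count_events(section, events):
    """Count the attention events whose location falls inside the section."""
    return sum(1 for e in events if section.start <= e.location < section.end)


def select_indexable(url, pages, events_by_url, threshold):
    """Return the sections of `url` whose summed attention exceeds `threshold`.

    pages:         dict mapping URL -> list of Section (stand-in for the data store)
    events_by_url: dict mapping URL -> list of AttentionEvent
    """
    indexable = []
    for section in pages[url]:
        # Attention events on the page being indexed.
        total = count_events(section, events_by_url.get(url, []))
        # Additional attention events where the same section appears elsewhere.
        for other_url, other_sections in pages.items():
            if other_url == url:
                continue
            for other in other_sections:
                if other.text == section.text:  # assumption: exact text match
                    total += count_events(other, events_by_url.get(other_url, []))
        if total > threshold:
            indexable.append(section)
    return indexable
```

In this sketch a section shared across pages accumulates attention from every page that contains it, so the same section may pass or fail the threshold depending on how much interaction data exists for the other pages.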
Abstract
An indexing system for documents such as web pages divides a document into elements, such as document object model elements. User attention data from prior interactions with the document are analyzed to determine those elements of a document that satisfy a threshold requirement of user attention. Elements meeting the user attention threshold requirement are added to a set of indexable content for the document. Furthermore, document sections are determined based on attention data and each section is indexed separately. Indexing is per section and based only on the indexable content, thereby enhancing the index relevance, increasing the efficiency of search engines and reducing spamdexing.
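The per-section indexing described in the abstract can be illustrated with a small inverted index. The sketch below is an assumption-laden example, not the system's actual index format: `build_section_index` is a hypothetical helper, tokenization is a simple lowercase alphanumeric split, and the input is assumed to contain only the sections that already met the attention threshold.

```python
import re
from collections import defaultdict


def build_section_index(url, indexable_sections):
    """Map each term to the (url, section_id) pairs where it occurs.

    indexable_sections: dict mapping section_id -> text, assumed to hold
    only content that already satisfied the user attention threshold.
    """
    index = defaultdict(set)
    for section_id, text in indexable_sections.items():
        # Simple tokenizer (assumption): lowercase runs of letters and digits.
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add((url, section_id))
    return dict(index)
```

Because content below the threshold never reaches the index, terms that appear only in ignored regions of a page cannot match a query in this model.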
20 Claims
8. An apparatus configured to select content of a web page, the apparatus comprising:
a transmitter configured to transmit a request to retrieve web page content from a web page data store, the request including a uniform resource locator (URL) of a web page; and
a processor configured to receive the web page content based on the URL;
request interaction data associated with the URL from an interaction data store;
receive interaction data for the URL that identifies one or more attention events, each attention event based on a previous user interface event detected within the web page content of the URL, and each attention event including a location within the web page where the respective user interface event was detected;
divide the web page content into a plurality of non-overlapping content sections;
for each particular content section of the plurality of content sections;
analyzing the locations of the one or more attention events in the received interaction data to determine whether the particular content section is associated with one or more of the attention events;
determining whether the particular content section of the web page is found within one or more additional web pages stored within the web page data store;
in response to determining that the particular content section of the web page is found within one or more additional web pages stored within the web page data store;
(a) analyzing the locations of a plurality of additional attention events associated with the one or more additional web pages, to determine whether the particular content section is associated with one or more of the plurality of additional attention events; and
(b) summing the number of attention events associated with the particular content section within the web page, and the number of additional attention events associated with the particular content section within the one or more additional web pages; and
in response to a determination that the sum of the numbers of attention events associated with the particular content section within the web page and within the additional web pages, is greater than a threshold of attention events, add the particular content section to an indexable content of the web page; and
output the indexable content corresponding to the web page.
15. A non-transitory computer readable storage medium storing instructions thereon, that when executed by one or more processors, cause the one or more processors to select content of a web page, by:
transmitting a request to retrieve web page content from a web page data store, the request including a uniform resource locator (URL) of a web page;
receiving the web page content based on the URL;
requesting interaction data associated with the URL from an interaction data store;
receiving interaction data for the URL that identifies one or more attention events, each attention event based on a previous user interface event detected within the web page content of the URL, and each attention event including a location within the web page where the respective user interface event was detected;
dividing the web page content into a plurality of non-overlapping content sections;
for each particular content section of the plurality of content sections;
analyzing the locations of the one or more attention events in the received interaction data to determine whether the particular content section is associated with one or more of the attention events; and
determining whether the particular content section of the web page is found within one or more additional web pages stored within the web page data store;
in response to determining that the particular content section of the web page is found within one or more additional web pages stored within the web page data store;
(a) analyzing the locations of a plurality of additional attention events associated with the one or more additional web pages, to determine whether the particular content section is associated with one or more of the plurality of additional attention events; and
(b) summing the number of attention events associated with the particular content section within the web page, and the number of additional attention events associated with the particular content section within the one or more additional web pages, and
in response to a determination that the sum of the numbers of attention events associated with the particular content section within the web page and within the additional web pages, is greater than a threshold of attention events, adding the particular content section to an indexable content of the web page; and
outputting the indexable content corresponding to the web page.
Specification