Assigning human-understandable labels to web pages
First Claim
1. A computer-implemented method of labeling a web page from a host, comprising:
a. estimating a language model comprising an association of words from content of the web page;
b. collecting, from a set of web documents linking to the web page, a set of inbound labels for the web page, the set of inbound labels comprises data from an anchor text of a link to the web page on at least one web document of the set of web documents and text of a search query that results in a click-through to the web page;
c. computing a likelihood of generating each inbound label from the collected set of inbound labels for the web page given the estimated language model and assigning a score to each inbound label based on the computed likelihood; and
d. assigning a label to the web page used on a search results page that returns the web page based on the assigned score to each inbound label from the collected set of inbound labels for the web page.
Abstract
Methods and systems that label a web page by collecting a set of inbound labels for the web page, estimating a language model for the web page, computing the likelihood of generating each inbound label given the language model and assigning a score to each inbound label based on this likelihood, and assigning a label to the web page based on the score assigned to each of the set of inbound labels. Inbound labels are preferably collected from the set of web documents linking to the web page. Labels assigned are useful in providing labeled links to web pages from top hosts in search results pages.
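The scoring steps described in the abstract can be illustrated with a minimal sketch. The source does not specify the form of the language model, so this example assumes a unigram model with add-one smoothing and scores each label by its average per-word log-likelihood; the function names (`tokenize`, `estimate_language_model`, `score_label`, `assign_label`) are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def estimate_language_model(page_text):
    """Return a function mapping a word to its smoothed unigram probability.

    Assumption: unigram model with add-one smoothing; the source only
    says the model is an association of words from the page content.
    """
    counts = Counter(tokenize(page_text))
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserved for unseen words

    def prob(word):
        return (counts[word] + 1) / (total + vocab)

    return prob


def score_label(label, prob):
    """Average per-word log-likelihood of the label under the model.

    Assumption: averaging over words keeps long labels from being
    penalized purely for their length.
    """
    words = tokenize(label)
    if not words:
        return float("-inf")
    return sum(math.log(prob(w)) for w in words) / len(words)


def assign_label(page_text, inbound_labels):
    """Score every inbound label and return (best_label, all_scores)."""
    if not inbound_labels:
        return None, {}
    prob = estimate_language_model(page_text)
    scores = {label: score_label(label, prob) for label in inbound_labels}
    return max(scores, key=scores.get), scores


page = ("Contact us. Contact our support team by phone or email. "
        "Support hours are listed below.")
labels = ["contact support", "click here", "cheap flights"]
best, scores = assign_label(page, labels)
```

In this toy input, anchor text such as "click here" uses words that never occur on the page, so it receives only the smoothed floor probability, while "contact support" is built from words the page itself uses often and ranks highest.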
19 Claims
1. A computer-implemented method of labeling a web page from a host, comprising:
a. estimating a language model comprising an association of words from content of the web page;
b. collecting, from a set of web documents linking to the web page, a set of inbound labels for the web page, the set of inbound labels comprises data from an anchor text of a link to the web page on at least one web document of the set of web documents and text of a search query that results in a click-through to the web page;
c. computing a likelihood of generating each inbound label from the collected set of inbound labels for the web page given the estimated language model and assigning a score to each inbound label based on the computed likelihood; and
d. assigning a label to the web page used on a search results page that returns the web page based on the assigned score to each inbound label from the collected set of inbound labels for the web page.
Dependent claims: 2, 3, 4, 5, 6
7. An offline processing module, comprising at least one processor and memory, for labeling a web page from a host, comprising:
a. a label collection element configured to collect, from a set of web documents linking to the web page, a set of inbound labels for the web page, the set of inbound labels comprises data from an anchor text of a link to the web page on at least one web document of the set of web documents and text of a search query that results in a click-through to the web page;
b. a language model estimator configured to estimate a language model comprising an association of words from content of the web page;
c. a computation element configured to compute a likelihood of generating each inbound label from the collected set of inbound labels for the web page given the estimated language model and to assign a score to each inbound label based on the computed likelihood; and
d. a label assignment element configured to assign a label to the web page from the collected set of inbound labels for the web page based on the assigned score to each inbound label, the assigned label used on a search results page that returns the web page.
Dependent claims: 8, 9, 10, 11, 12
13. A system, comprising at least one processor and memory, for providing labeled links to top web pages from a top host in a search results page, comprising:
a. an offline processing module configured to provide labels for the top web pages from the top host by collecting a set of inbound labels for the top web pages, estimating a language model comprising an association of words from content of each top web page, and assigning a label to each top web page based on a computation involving the estimated language model, the collected set of inbound labels for the top web pages comprises data from an anchor text of a link to each top web page on at least one second top web page and text of a search query that results in a click-through to each top web page;
b. an element configured to select a set of top web pages from the top host; and
c. an online module that publishes the provided labels from the collected set of inbound labels for the top web pages of each of the top web pages of the top host in the search results page for the top host.
Dependent claims: 14, 15, 16, 17, 18, 19
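The division of work between the offline module, the top-page selector, and the online module in claim 13 might be sketched as follows. This is only an illustration under stated assumptions: the claim does not say how top pages are chosen, so the sketch ranks them by a hypothetical visit count, and the label-choosing function is passed in as a parameter (`choose_label`) standing in for the language-model computation. All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LabeledLink:
    """A link to a top page plus the label shown on the results page."""
    url: str
    label: str


def build_label_table(pages, choose_label):
    """Offline step: map each page URL to its assigned label.

    pages maps url -> (page_text, inbound_labels). choose_label is a
    stand-in for the language-model scoring and returns a label or None.
    """
    table = {}
    for url, (text, inbound_labels) in pages.items():
        label = choose_label(text, inbound_labels)
        if label is not None:
            table[url] = label
    return table


def select_top_pages(visits_by_url, k):
    """Selection step. Assumption: 'top' means the k most-visited pages."""
    ranked = sorted(visits_by_url, key=visits_by_url.get, reverse=True)
    return ranked[:k]


def render_labeled_links(top_urls, label_table):
    """Online step: look up precomputed labels; skip unlabeled pages."""
    return [LabeledLink(u, label_table[u]) for u in top_urls if u in label_table]


# Toy data; the lambda is a placeholder that takes the first inbound label.
pages = {
    "host/about": ("About our company", ["About us"]),
    "host/contact": ("Contact our team", ["Contact"]),
    "host/misc": ("Miscellaneous", []),
}
visits = {"host/about": 10, "host/contact": 30, "host/misc": 20}

table = build_label_table(pages, lambda text, labels: labels[0] if labels else None)
top = select_top_pages(visits, 2)
links = render_labeled_links(top, table)
```

Precomputing the table offline means the online step is reduced to a dictionary lookup per top page, which is one plausible reason for the offline/online split the claim describes.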
Specification