Systems and methods of retrieving relevant information
Abstract
The present invention provides systems and methods of retrieving pages according to the quality of the individual pages. The rank of a page for a keyword is a combination of its intrinsic and extrinsic ranks. Intrinsic rank measures the relevancy of a page to a given keyword as claimed by the page's author, while extrinsic rank measures the relevancy of a page to a given keyword as indicated by other pages. The former is obtained by analyzing keyword matches in various parts of the page; the latter is obtained from a context-sensitive connectivity analysis of the links connecting the entire Web. The present invention also provides an efficient iterative method of solving the self-consistent equation satisfied by the page weights. A ranking mechanism for multi-word queries is also described. Finally, the present invention provides a method of obtaining more relevant page weights by dividing the entire set of hypertext pages into a number of distinct groups.
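The abstract states that the page weights satisfy a self-consistent equation that is solved iteratively, but the equation itself is not reproduced in this excerpt. The Python sketch below assumes a random-surfer form: each page's weight is a base jump probability (1 - d)/N plus d times the weight passed along by each linking page, split evenly over that page's outbound links. The damping value d = 0.85, the function name page_weights, and the dict-of-lists link format are assumptions for illustration, not the patent's own formulation.

```python
def page_weights(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Iteratively solve an assumed random-surfer equation for page weights.

    links: dict mapping each page to the list of pages it links to
    (hypothetical input format).
    """
    # Every page that appears as a source or a target of a link.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    weights = {p: 1.0 / n for p in pages}  # start from a uniform guess
    for _ in range(max_iter):
        # Base term: probability of jumping to an arbitrary page.
        new = {p: (1.0 - damping) / n for p in pages}
        dangling = 0.0
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # A linking page passes its weight evenly over its outbound links.
                share = damping * weights[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                dangling += damping * weights[p]
        # Pages without outbound links spread their weight uniformly.
        for p in pages:
            new[p] += dangling / n
        delta = sum(abs(new[p] - weights[p]) for p in pages)
        weights = new
        if delta < tol:  # stop once the equation is (nearly) self-consistent
            break
    return weights
```

For example, page_weights({"a": ["b", "c"], "b": ["c"], "c": ["a"]}) gives page c a higher weight than page b, because c is linked from both a and b. The weights always sum to 1, so they can be read as visiting probabilities.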
13 Claims
1. A computer-implemented method of ranking the relevancy of pages in a collection of pages including linking hypertext pages, comprising:
crawling the World Wide Web to produce a collection of pages without limitation to topic;
for each page in the collection of pages, examining a probability of visitors viewing a particular page to determine a page weight for said particular page;
for each of a plurality of selected words, with regard to each of a plurality of selected pages in the collection of pages, determining an intrinsic ranking factor for use of a selected word on a selected page in the collection of pages by examining content related to the selected word on the selected page to determine a content score and adjusting the content score in accordance with the page weight of the selected page, and determining an extrinsic ranking factor for use of the selected word on the selected page by, for each linking page in the collection of pages containing an outbound hypertext link to the selected page, examining text associated with the outbound hypertext link on the linking page related to the selected word to determine an anchor weight for the linking page, adjusting the anchor weight in accordance with the page weight of the linking page and combining the adjusted anchor weights for all linking pages containing an outbound hypertext link to the selected page;
ranking the selected page for the selected word by combining the intrinsic and extrinsic ranking factors related thereto; and
then creating a database of the collection of pages indexed by the plurality of selected words, each indexed selected word in the database index associated with pages ranked for said each indexed selected word so that ranked search results are produced in response to a subsequent query which includes one or more of the selected words.
Dependent claims: 2, 3, 4, 5.
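Claim 1 does not say how the content score is "adjusted" by the page weight or how the intrinsic and extrinsic factors are "combined". The minimal Python sketch below assumes both adjustments are multiplications by the relevant page weight and that the combination is a plain sum; the input dictionaries and the name rank_pages are hypothetical.

```python
from collections import defaultdict


def rank_pages(content_scores, anchor_weights, page_weight):
    """Combine intrinsic and extrinsic factors per (word, page) pair.

    content_scores: {(word, page): content score}          (hypothetical format)
    anchor_weights: {(word, linking_page, target_page): anchor weight}
    page_weight:    {page: page weight}
    Returns {(word, page): combined rank}.
    """
    # Intrinsic factor: content score scaled by the selected page's own weight.
    intrinsic = {
        (word, page): score * page_weight[page]
        for (word, page), score in content_scores.items()
    }
    # Extrinsic factor: sum over linking pages of anchor weight scaled by
    # the linking page's weight.
    extrinsic = defaultdict(float)
    for (word, linker, target), anchor in anchor_weights.items():
        extrinsic[(word, target)] += anchor * page_weight[linker]
    # Assumed combination: a plain sum of the two factors.
    keys = set(intrinsic) | set(extrinsic)
    return {k: intrinsic.get(k, 0.0) + extrinsic.get(k, 0.0) for k in keys}
```

With page weights p1 = 0.5, p2 = 0.3, p3 = 0.2, a content score of 2.0 for "jaguar" on p1, and anchor weights 1.0 (from p2) and 3.0 (from p3) on links to p1, page p1 ranks 2.0·0.5 + 1.0·0.3 + 3.0·0.2 = 1.9 for "jaguar". A page can still earn a rank for a word it never uses, through anchor text alone.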
6. A computer implemented method of ranking the relevancy of pages in a collection of pages including linking hypertext pages, comprising:
(a) crawling and re-crawling the World Wide Web without limitation to topic to produce and maintain a collection of pages representing the World Wide Web;
(b) for each page of the collection of pages including linking hypertext pages, without a priori knowledge of keywords to be used in any particular query, determining a page weight related to a probability of a user viewing said each page as a result of viewing the pages in a random fashion in the collection, a plurality of selected words from said each page unrelated to topic; a content score for use of each one of the plurality of selected words on said each page, and an anchor weight, related to the use of said each of the plurality of selected words in association with an outbound link on said each page to another page in the collection;
(c) for each pair of one of the plurality of selected words and a selected page having a content score for said one of the plurality of selected words, ranking a relevancy of said selected page for said one of the plurality of selected words, in accordance with a combination of the content score for the selected word on the selected page adjusted in accordance with the page weighting factor for the selected page, and the anchor weight for each linking page having an outbound link to the selected page adjusted in accordance with a page weight for said each linking page; and
(d) for each one of the plurality of selected words, collecting the pages ranked for said each one of the plurality of selected words to build a searchable database, indexed in accordance with the plurality of selected words, so that a ranked set of search results pages is produced by searching the database in response to the particular query for the ranked selected words corresponding to the keywords in the query.
Dependent claims: 7, 8, 9.
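Claim 6 ties the page weight to the probability that a user lands on a page while viewing pages in a random fashion. One way to make that definition concrete is to simulate such a user, as in the Python sketch below. It assumes the user follows a uniformly chosen outbound link, or jumps to a uniformly chosen page with probability jump_prob (assumed 0.15) or whenever the current page has no outbound links. The visit frequencies approximate the page weights. The name estimate_page_weights and all parameters are hypothetical, and the sketch illustrates the definition rather than the claimed computation.

```python
import random
from collections import Counter


def estimate_page_weights(links, steps=100_000, jump_prob=0.15, seed=0):
    """Monte Carlo estimate of how often a randomly browsing user visits each page.

    links: dict mapping each page to the list of pages it links to
    (hypothetical input format).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    visits = Counter()
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        outs = links.get(current, [])
        if not outs or rng.random() < jump_prob:
            # Dead end, or the user gets bored: jump to any page.
            current = rng.choice(pages)
        else:
            # Otherwise follow one of the current page's outbound links.
            current = rng.choice(outs)
    # Fraction of steps spent on each page approximates its page weight.
    return {p: visits[p] / steps for p in pages}
```

Simulation is easy to reason about but converges slowly. An iterative solution of the corresponding equation, as the abstract mentions, is the more practical way to compute the weights for a Web-scale collection.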
10. A computer-implemented method of ranking the relevancy of pages in a collection of pages including linking hypertext pages, comprising:
crawling the Web to produce a collection of pages without limitation to topic;
selecting words from the pages of the collection of pages without a priori knowledge of keywords in a query;
ranking the pages in the collection of pages for the selected words by, for each of the selected words with regard to each of the selected pages:
determining an intrinsic ranking factor for use of the selected word on the selected page by examining content related to the selected word on the selected page to determine a content score and adjusting the content score in accordance with the page weight of the selected page;
determining an extrinsic ranking factor for use of the selected word on the selected page by, for each linking page in the collection of pages containing an outbound hypertext link to the selected page, examining text associated with the outbound hypertext link related to the selected word to determine an anchor weight for the linking page, adjusting the anchor weight in accordance with the page weight of the linking page and combining the adjusted anchor weights for all linking pages containing an outbound hypertext link to the selected page; and
ranking each selected page for each selected word by combining the intrinsic and extrinsic ranking factors related thereto; and
creating a searchable data structure related to the pages in the collection of pages indexed in accordance with the selected words, each indexed word associated with pages ranked for each such indexed word so that search results provided in response to the query are already ranked in accordance with relevance to the query.
Dependent claims: 11.
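Claim 10 emphasizes that results are already ranked when the query arrives, so a single-word query reduces to a lookup. The Python sketch below assumes the index maps each word to a list of (page, rank) pairs sorted by descending rank. The multi-word branch is purely hypothetical: it keeps only pages indexed under every query word and sums their ranks, which is not necessarily the multi-word mechanism the specification describes. The name search is also an assumption.

```python
def search(index, query):
    """Answer a query from a pre-ranked index.

    index: {word: [(page, rank), ...]} sorted by rank, descending
    (hypothetical format).
    """
    words = query.lower().split()
    if not words:
        return []
    if len(words) == 1:
        # Single word: the stored list is already the ranked answer.
        return [page for page, _ in index.get(words[0], [])]
    # Multi-word (hypothetical rule): pages indexed under every word,
    # ordered by the sum of their per-word ranks.
    per_word = [dict(index.get(w, [])) for w in words]
    common = set(per_word[0]).intersection(*per_word[1:])
    totals = {p: sum(d[p] for d in per_word) for p in common}
    return sorted(totals, key=lambda p: (-totals[p], p))
```

For example, with index = {"jaguar": [("p1", 1.9), ("p2", 0.3)], "car": [("p2", 0.8), ("p3", 0.5)]}, the query "jaguar" returns ["p1", "p2"] straight from the index, and "jaguar car" returns ["p2"], the only page indexed under both words.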
12. A computer-implemented method of ranking the relevancy of pages in a collection of pages including linking hypertext pages, comprising:
(a) determining a set of words used on a selected page to be ranked in a collection of pages;
(b) determining an intrinsic ranking factor for the selected page for each word in the set of words by determining a content score for use of the selected word on the selected page and adjusting the ranking for a page weight associated with the page being ranked;
(c) determining an extrinsic ranking factor for remaining pages in the collection of pages for use of said each word in association with an outbound link to the selected page by determining an anchor weight for the page being ranked and adjusting the ranking for a page weight associated with the page being ranked;
(d) repeating (a) through (c) for all pages in the collection of pages; and
(e) forming an index entry for each word in all the sets of words, each index entry including a list of the pages in the collection of pages using the word being indexed having the highest combined rankings for 1) the intrinsic ranking factor for use of the word being indexed on the page being listed and 2) the extrinsic ranking factor for use of the word being indexed in association with pages having an outbound link to the page being listed.
Dependent claims: 13.
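Step (e) of claim 12 forms, for each word, an index entry listing the pages with the highest combined rankings. The minimal Python sketch below assumes the combined ranking is the sum of the two factors and that each entry keeps a fixed number of pages, top_k (a hypothetical parameter). It uses heapq.nlargest to pick those pages without sorting every candidate. The name build_index and the input format are also assumptions.

```python
import heapq
from collections import defaultdict


def build_index(intrinsic, extrinsic, top_k=10):
    """Build per-word index entries of the top_k highest-ranked pages.

    intrinsic, extrinsic: {(word, page): factor}  (hypothetical format)
    Returns {word: [(page, combined rank), ...]} in descending rank order.
    """
    # Assumed combination: sum of intrinsic and extrinsic factors.
    combined = defaultdict(float)
    for table in (intrinsic, extrinsic):
        for (word, page), value in table.items():
            combined[(word, page)] += value
    # Group candidate pages under each word.
    by_word = defaultdict(list)
    for (word, page), rank in combined.items():
        by_word[word].append((page, rank))
    # Keep only the highest-ranked pages for each word.
    return {
        word: heapq.nlargest(top_k, entries, key=lambda e: e[1])
        for word, entries in by_word.items()
    }
```

Capping each entry at top_k keeps the index small. Pages below the cutoff for a word can never appear in that word's results.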
Specification