Systems and methods of retrieving relevant information
First Claim
1. A computer implemented method of searching the Internet, comprising:
crawling the entire Internet, to collect web pages without limitation to topic;
extracting text and link structure from each collected web page;
generating an indexed database of all collected webpages in which each indexed word is associated with collected web pages on which said indexed word is used, and with other web pages having outbound links from said web pages on which said indexed word is used;
for each indexed word, ranking each web page in the indexed database associated therewith in descending order of rank;
wherein the ranking for said each web page on which said indexed word is used, is adjusted, in part, by summing rankings of each of said other collected web page having an outbound link from said each web page; and
thereafter receiving one or more keywords in a query;
searching the ranked, indexed database for that one or more keywords to quickly provide a ranked list of pages on the Internet in descending order of ranking.
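The indexing and query steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `build_index` and `search`, the whitespace tokenizer, the per-word score table `rank`, and the choice to score a multi-word query by summing per-word scores are all assumptions made for illustration. The per-word scores are taken as given here, since the claim computes them before any query arrives.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page URLs whose text contains it.

    `pages` is a hypothetical dict of URL -> extracted page text.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, rank, query):
    """Return pages containing every query word, best score first.

    `rank` is a hypothetical dict of word -> {URL -> precomputed score}.
    Summing per-word scores for multi-word queries is an assumption.
    """
    words = query.lower().split()
    if not words:
        return []
    # Keep only pages that contain all of the query words.
    hits = set.intersection(*(index.get(w, set()) for w in words))

    def score(url):
        return sum(rank.get(w, {}).get(url, 0.0) for w in words)

    # Sort by descending score; break ties by URL for a stable order.
    return sorted(hits, key=lambda url: (-score(url), url))
```

Because ranking happens at indexing time, answering a query reduces to a set intersection and a sort, which is what lets the final step return results quickly.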
Abstract
The present invention provides systems and methods of retrieving pages according to the quality of the individual pages. The rank of a page for a keyword is a combination of intrinsic and extrinsic ranks. Intrinsic rank is a measure of the relevancy of a page to a given keyword as claimed by the author of the page, while extrinsic rank is a measure of the relevancy of a page to a given keyword as indicated by other pages. The former is obtained from analysis of keyword matching in various parts of the page, while the latter is obtained from context-sensitive connectivity analysis of the links connecting the entire Web. The present invention also provides methods to iteratively and efficiently solve the self-consistent equation satisfied by the page weights. The ranking mechanism for multi-word queries is also described. Finally, the present invention provides a method to obtain more relevant page weights by dividing the entire set of hypertext pages into a distinct number of groups.
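The iterative solution of a self-consistent weight equation can be sketched as follows. The abstract does not state the equation, so this example assumes a simple linear form for one keyword: each page's weight is its intrinsic rank plus a damped share of the weights of the pages that link to it. The name `solve_weights`, the damping factor `alpha`, and the equal split of a page's weight across its outbound links are all hypothetical choices, not details taken from the patent.

```python
def solve_weights(intrinsic, links, alpha=0.5, tol=1e-9, max_iter=1000):
    """Iterate w[i] = intrinsic[i] + alpha * sum(w[j] / outdeg(j)) over pages j linking to i.

    `intrinsic` maps page -> intrinsic rank for one keyword (assumed given).
    `links` maps page -> list of pages it links to; every page named in
    `links` must also appear in `intrinsic`.
    """
    if not intrinsic:
        return {}
    weights = dict(intrinsic)
    for _ in range(max_iter):
        # Start each pass from the intrinsic ranks, then add link contributions.
        new_weights = dict(intrinsic)
        for source, targets in links.items():
            if not targets:
                continue
            share = alpha * weights[source] / len(targets)
            for target in targets:
                new_weights[target] += share
        # Stop once no weight moved by more than the tolerance.
        delta = max(abs(new_weights[p] - weights[p]) for p in new_weights)
        weights = new_weights
        if delta < tol:
            break
    return weights
```

Under these assumptions, with `alpha` below 1 each pass shrinks the remaining error by at least that factor, so the loop settles on a fixed point in which every weight is consistent with the weights of the pages linking to it.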
46 Citations
10 Claims
Claim 1 is the independent claim set out above; claims 2 through 10 depend on it.
Specification