Focused search engine and method
2 Assignments
Abstract
A focused search engine and method are directed to crawling vast search spaces comprising markup language documents, for example. Both topic distillation and site distillation methodologies are incorporated into an integrated topic-focused search strategy. Categorization of search results may be initiated by the search engine itself; alternatively, topic categories of interest may be specified in conjunction with the original request for information.
134 Citations
67 Claims
1. A method of searching a search space comprising a plurality of pages, said method comprising:
responsive to a request for information, determining a set of pages wherein each page in said set of pages is relevant to said request;
performing topic distillation using said set of pages to determine a set of important keywords; and
performing site distillation using said set of important keywords to determine a set of important pages.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
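Claim 1 does not say how relevance, topic distillation, or site distillation are computed. The following minimal Python sketch only illustrates the order of the three steps. It assumes pages are plain strings, relevance is shared-term overlap, topic distillation is approximated by co-occurrence counting, and site distillation by ranking pages on keyword overlap. These stand-ins and all function names are hypothetical, not the patented method.

```python
from collections import Counter


def relevant_pages(corpus, request):
    """Step 1 (assumed): keep pages that share at least one term with the request."""
    terms = set(request.lower().split())
    return {pid: text for pid, text in corpus.items() if terms & set(text.lower().split())}


def topic_distillation(pages, request, top_k=3):
    """Step 2 (simplified stand-in): rank words that co-occur with the request terms."""
    terms = set(request.lower().split())
    counts = Counter(
        word
        for text in pages.values()
        for word in set(text.lower().split())
        if word not in terms
    )
    # Descending count, ties broken alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


def site_distillation(pages, keywords, top_k=3):
    """Step 3 (simplified stand-in): rank pages by how many important keywords they contain."""
    kws = set(keywords)
    return sorted(pages, key=lambda pid: (-len(kws & set(pages[pid].lower().split())), pid))[:top_k]


corpus = {
    "a": "jaguar speed habitat rainforest",
    "b": "jaguar habitat prey",
    "c": "jaguar car dealer",
    "d": "stock market news",
}
pages = relevant_pages(corpus, "jaguar")
keywords = topic_distillation(pages, "jaguar", top_k=1)
important = site_distillation(pages, keywords, top_k=2)
print(keywords, important)  # prints ['habitat'] ['a', 'b']
```

The keyword found in the first pass ("habitat") steers the second pass toward the animal pages rather than the car page, which is the intuition behind chaining the two distillation steps.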
11. A computer-based system for searching a search space, the search space comprising a plurality of pages, said system comprising:
means for receiving a request for information; and
a focused search engine determining a set of pages wherein each page in said set of pages is relevant to said request, performing topic distillation using said set of pages to determine a set of important keywords, and further performing site distillation using said set of important keywords to determine a set of important pages.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20.
21. A method of searching a search space comprising a plurality of pages, said method comprising:
responsive to a request for information comprising a plurality of keywords and one or more topic categories, determining a set of pages within the search space wherein each page in said set of pages contains at least one of said plurality of keywords and further contains information relevant to said one or more topic categories;
performing topic distillation using said set of pages to determine a set of important keywords; and
performing site distillation using said set of important keywords.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30.
31. A method of processing a query of a search space, said query comprising a plurality of keywords and said search space comprising a plurality of pages, said method comprising:
responsive to said query, determining a set of pages within the search space wherein each page in said set of pages contains at least one of said plurality of keywords;
assigning a score to each page in said set of pages in accordance with relevance to said query; and
distributing said score of each page in said set of pages to neighboring pages in said set of pages in accordance with relevance to each of said plurality of keywords.
Dependent claims: 32, 33, 34, 35.
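One possible reading of the distributing step in claim 31 is a single round of score spreading over the link graph. In this hypothetical sketch, a page's initial score is the fraction of query keywords it contains; each page keeps a share `1 - alpha` of its score and passes the remaining `alpha` share to its linked neighbors inside the set, split in proportion to how many query keywords each neighbor contains. The scoring measure, the `alpha` parameter, and the function names are assumptions.

```python
def initial_scores(pages, keywords):
    """Score each page by the fraction of query keywords it contains (assumed measure)."""
    return {p: len(keywords & words) / len(keywords) for p, words in pages.items()}


def distribute_scores(pages, links, keywords, alpha=0.5):
    """Spread a share `alpha` of each page's score to its in-set neighbors.

    pages: page id -> set of words; links: page id -> iterable of linked page ids.
    A neighbor's share is proportional to the number of query keywords it contains.
    """
    scores = initial_scores(pages, keywords)
    new = {p: (1 - alpha) * s for p, s in scores.items()}
    for p, s in scores.items():
        neighbors = [q for q in links.get(p, ()) if q in pages]
        weights = {q: len(keywords & pages[q]) for q in neighbors}
        total = sum(weights.values())
        if total == 0:
            # No relevant neighbor can receive the share, so the page keeps it.
            new[p] += alpha * s
            continue
        for q, w in weights.items():
            new[q] += alpha * s * w / total
    return new


pages = {"a": {"x", "y"}, "b": {"x"}, "c": {"y"}}
links = {"a": ["b", "c"], "b": ["a"]}
print(distribute_scores(pages, links, {"x", "y"}))
# prints {'a': 0.75, 'b': 0.5, 'c': 0.75}
```

Because every share leaves one page and lands on another (or stays put), the total score is conserved; only its placement changes according to keyword relevance.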
36. A method of processing a query of a search space, said query comprising a plurality of keywords and at least one topic category associated with at least one of said plurality of keywords, said search space comprising a plurality of pages, said method comprising:
responsive to said query, determining a set of pages within the search space wherein each page in said set of pages contains at least one of said plurality of keywords and further contains information relevant to said at least one topic category;
assigning a score to each page in said set of pages in accordance with relevance to said query; and
distributing said score of each page in said set of pages to neighboring pages in said set of pages in accordance with relevance to each of said plurality of keywords and in accordance with relevance to said at least one topic category.
Dependent claims: 37, 38, 39.
40. A method of focused crawling of a search space comprising a plurality of pages, said method comprising:
responsive to a request for information, determining a set of pages within the search space wherein each page in said set of pages contains information relevant to said request;
performing site distillation using said set of pages to determine a set of important pages;
examining said set of important pages to determine a set of keywords;
scoring each keyword in said set of keywords in accordance with relevance to said request; and
selectively repeating said performing, said examining, and said scoring.
Dependent claims: 41, 42, 43.
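The loop in claim 40 (site distillation, keyword extraction, keyword scoring, repeat) can be sketched as a fixed-point iteration. In this hypothetical version, "site distillation" ranks pages by overlap with the current keyword set, keywords are scored by how many important pages contain them, and "selectively repeating" is interpreted as stopping once the keyword set no longer changes or `max_rounds` is reached. All of these choices and names are assumptions for illustration.

```python
from collections import Counter


def site_distill(pages, keywords, top_k):
    """Rank pages by overlap with the current keyword set (assumed stand-in)."""
    return sorted(pages, key=lambda p: (-len(keywords & pages[p]), p))[:top_k]


def score_keywords(pages, important, request):
    """Score candidate keywords by how many important pages contain them."""
    return Counter(w for p in important for w in pages[p] - request)


def focused_crawl(corpus, request, top_pages=2, top_keywords=2, max_rounds=5):
    # Determine the set of pages relevant to the request (any shared term).
    pages = {p: words for p, words in corpus.items() if request & words}
    keywords = set(request)
    important = []
    for _ in range(max_rounds):
        important = site_distill(pages, keywords, top_pages)
        counts = score_keywords(pages, important, request)
        best = sorted(counts, key=lambda w: (-counts[w], w))[:top_keywords]
        new_keywords = set(request) | set(best)
        if new_keywords == keywords:
            break  # keyword set is stable, so stop repeating
        keywords = new_keywords
    return important, keywords


corpus = {
    "p1": {"jaguar", "cat", "habitat"},
    "p2": {"jaguar", "cat", "prey"},
    "p3": {"jaguar", "car"},
    "p4": {"cat", "habitat", "prey"},
}
important, keywords = focused_crawl(corpus, {"jaguar"})
print(important, sorted(keywords))  # prints ['p1', 'p2'] ['cat', 'habitat', 'jaguar']
```

The stopping rule is one simple way to make the repetition terminate; a real crawler would also fetch new pages between rounds, which this in-memory sketch does not model.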
44. A method of focused crawling of a search space comprising a plurality of pages, said method comprising:
responsive to a request for information, determining a set of pages within the search space wherein each page in said set of pages contains information relevant to said request;
performing site distillation using said set of pages to determine a set of important pages, including assigning a relevance score to each page in said set of important pages in accordance with relevance to said request;
examining said set of important pages to determine a set of keywords;
scoring each keyword in said set of keywords in accordance with relevance to said request; and
selectively repeating said performing, said examining, and said scoring.
Dependent claims: 45, 46, 47, 48, 49, 50.
51. A method of site distillation for identifying information in a search space comprising a plurality of pages, said method comprising:
responsive to a request for information, determining a set of pages within the search space wherein each page in said set of pages contains information relevant to said request;
identifying one or more keywords relevant to said request;
assigning a score to each page in said set of pages in accordance with relevance to said request and in accordance with relevance to said one or more keywords;
distributing said score of each page in said set of pages to neighboring pages in said set of pages in accordance with relevance to said one or more keywords; and
responsive to said distributing, computing a weight for each page in said set of pages in accordance with relevance to said request and relevance to said one or more keywords.
Dependent claims: 52, 53, 54, 55, 56, 57, 58.
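Claim 51 adds a weight computed after the scores have been distributed. A hypothetical way to obtain such weights is to iterate the distribution step, mixing each page's normalized base score back in every round, in the style of a personalized random walk. In this sketch, request terms count 1.0 and related keywords 0.5 toward the base score, and each page passes `alpha` of its current weight to its in-set neighbors in proportion to each neighbor's keyword relevance. The weights, `alpha`, `rounds`, and the function name are all assumptions, not values from the patent.

```python
def site_distillation_weights(pages, links, request, keywords, alpha=0.5, rounds=10):
    """Hypothetical site-distillation weighting.

    pages: page id -> set of words; links: page id -> iterable of linked page ids.
    Base score: 1.0 per request term and 0.5 per related keyword (assumed weights),
    normalized to sum to 1. Each round, a page keeps (1 - alpha) of its base score
    and passes alpha of its current weight to in-set neighbors, split in
    proportion to each neighbor's keyword relevance.
    """
    base = {p: len(request & w) + 0.5 * len(keywords & w) for p, w in pages.items()}
    total = sum(base.values()) or 1.0
    base = {p: s / total for p, s in base.items()}
    weight = dict(base)
    for _ in range(rounds):
        new = {p: (1 - alpha) * base[p] for p in pages}
        for p, s in weight.items():
            neighbors = [q for q in links.get(p, ()) if q in pages]
            shares = {q: len(keywords & pages[q]) for q in neighbors}
            norm = sum(shares.values())
            if norm == 0:
                new[p] += alpha * s  # no keyword-relevant neighbor: keep the share
                continue
            for q, share in shares.items():
                new[q] += alpha * s * share / norm
        weight = new
    return weight


pages = {
    "hub": {"jaguar"},
    "a": {"jaguar", "habitat", "prey"},
    "b": {"jaguar", "car"},
}
links = {"hub": ["a", "b"], "b": ["a"], "a": ["hub"]}
weights = site_distillation_weights(pages, links, {"jaguar"}, {"habitat", "prey"})
print(max(weights, key=weights.get), weights)
```

Here page `a`, the only page carrying the related keywords, collects most of the weight, while the total stays normalized to 1.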
59. A method of topic distillation for identifying information in a search space comprising a plurality of pages, said method comprising:
responsive to a request for information, determining a set of pages within the search space wherein each page in said set of pages contains information relevant to said request;
for each page in said set of pages, extracting one or more keywords relevant to said request and assigning a score to each of said one or more keywords in accordance with relevance to said request;
propagating said score of each of said one or more keywords to neighboring pages in said set of pages;
weighting each page in said set of pages in accordance with said assigning and said propagating; and
identifying a set of important keywords associated with said request.
Dependent claims: 60, 61, 62, 63.
60. … propagating said score of each of said one or more keywords from one of said one or more predetermined categories to another of said one or more predetermined categories; and
weighting each of said one or more predetermined categories in accordance with said assigning and said propagating.
61. The method according to claim 59 wherein said propagating includes applying a neighboring page decay factor.
62. The method according to claim 60 wherein said propagating includes applying a category decay factor.
63. The method according to claim 60 wherein said request for information includes identifying said one or more predetermined categories.
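Claims 59 to 63 describe topic distillation with two decay factors: one applied when keyword scores propagate to neighboring pages (claim 61) and one applied when scores propagate between categories (claim 62). The sketch below illustrates both under stated assumptions: every non-request word on a relevant page starts with score 1.0, propagation is a single hop along links within the relevant set, and category relations are given as a hypothetical `related` mapping. The decay values and function names are illustrative only.

```python
from collections import defaultdict


def topic_distillation(pages, links, request, page_decay=0.5, top_k=2):
    """Hypothetical topic distillation with a neighboring-page decay factor.

    pages: page id -> set of words; links: page id -> iterable of linked page ids.
    Each relevant page passes page_decay times its own keyword scores to linked pages.
    """
    own = {p: {w: 1.0 for w in words - request}
           for p, words in pages.items() if request & words}
    combined = {p: dict(scores) for p, scores in own.items()}
    for p, scores in own.items():
        for q in links.get(p, ()):
            if q not in combined:
                continue  # only propagate within the relevant set
            for w, s in scores.items():
                combined[q][w] = combined[q].get(w, 0.0) + page_decay * s
    # A page's weight reflects both its own and its propagated keyword scores.
    page_weight = {p: sum(scores.values()) for p, scores in combined.items()}
    totals = defaultdict(float)
    for scores in combined.values():
        for w, s in scores.items():
            totals[w] += s
    important = sorted(totals, key=lambda w: (-totals[w], w))[:top_k]
    return important, page_weight


def propagate_categories(category_scores, related, category_decay=0.5):
    """Pass a decayed share of each category's score to related categories."""
    out = dict(category_scores)
    for c, s in category_scores.items():
        for d in related.get(c, ()):
            out[d] = out.get(d, 0.0) + category_decay * s
    return out


pages = {
    "p1": {"jaguar", "cat", "habitat"},
    "p2": {"jaguar", "cat", "habitat"},
    "p3": {"jaguar", "car"},
}
links = {"p1": ["p2"], "p2": ["p1"]}
important, page_weight = topic_distillation(pages, links, {"jaguar"})
print(important, page_weight)
print(propagate_categories({"animals": 2.0, "cars": 1.0}, {"animals": ["wildlife"]}))
```

The mutually linked pages reinforce each other's keywords, so "cat" and "habitat" outrank "car", which appears only on an isolated page.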
64. A method of processing a query of a search space, said query comprising a plurality of keywords and said search space comprising a plurality of pages, said method comprising:
responsive to said query, determining a set of pages within the search space wherein each page in said set of pages contains at least one of said plurality of keywords;
identifying one or more topic categories relevant to said query;
determining, for each of said one or more topic categories, a set of important pages in accordance with relevance to said query and in accordance with relevance to said one or more topic categories;
assigning a score to each page in said set of pages in accordance with relevance to said query; and
distributing said score of each page in said set of pages to neighboring pages in said set of pages in accordance with relevance to each of said plurality of keywords.
Dependent claims: 65, 66, 67.
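The per-category step of claim 64 ("determining, for each of said one or more topic categories, a set of important pages") can be illustrated in isolation. This hypothetical sketch assumes each page already carries topic labels, takes the relevant categories to be those attached to pages that match the query, and ranks pages inside each category by how many query keywords they contain. The labeling scheme, ranking rule, and function name are assumptions.

```python
def important_pages_by_category(pages, categories, query, top_k=1):
    """Hypothetical per-category ranking for a keyword query.

    pages: page id -> set of words; categories: page id -> set of topic labels.
    """
    matching = [p for p, words in pages.items() if query & words]
    # Topic categories relevant to the query: labels found on matching pages.
    topics = sorted({c for p in matching for c in categories.get(p, ())})
    result = {}
    for topic in topics:
        members = [p for p in matching if topic in categories.get(p, ())]
        # More query keywords first; ties broken by page id for determinism.
        members.sort(key=lambda p: (-len(query & pages[p]), p))
        result[topic] = members[:top_k]
    return result


pages = {
    "p1": {"jaguar", "speed"},
    "p2": {"jaguar", "speed", "habitat"},
    "p3": {"jaguar", "engine"},
}
categories = {"p1": {"animals"}, "p2": {"animals"}, "p3": {"cars"}}
print(important_pages_by_category(pages, categories, {"jaguar", "habitat"}))
# prints {'animals': ['p2'], 'cars': ['p3']}
```

Grouping results by category keeps an ambiguous query such as "jaguar" from collapsing into a single sense; each topic gets its own short list of important pages.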
Specification