URL AND ANCHOR TEXT ANALYSIS FOR FOCUSED CRAWLING
First Claim
1. A method of Uniform Resource Locator (URL) and anchor text analysis for focused crawling, comprising:
training a focused crawler by:
    obtaining a training set for a website;
    computing a score for the training set of at least URLs or anchor text;
    extracting a plurality of features of the training set, the features identifying key information contained in the website; and
    computing a score for each of the plurality of features; and
executing a trained focused crawler on other websites.
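The training steps of claim 1 could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method. It assumes the following:

- Each training example is a hand-labeled (URL, anchor text, relevant?) triple.
- The "features" are lowercase word tokens, prefixed with `url:` or `anchor:` according to where they appear.
- The "score for the training set" is the fraction of relevant examples.
- Each feature is scored by a smoothed log-likelihood ratio of appearing in relevant versus irrelevant links.

The names `extract_features`, `train`, and `smoothing` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical labeled training example: (url, anchor_text, is_relevant).
TrainingExample = tuple[str, str, bool]

_TOKEN_RE = re.compile(r"[a-z0-9]+")


def extract_features(url: str, anchor_text: str) -> set[str]:
    """Split a URL and its anchor text into lowercase word tokens.

    URL tokens get a 'url:' prefix and anchor tokens an 'anchor:' prefix,
    so the same word can be scored differently in each position.
    """
    url_tokens = {"url:" + t for t in _TOKEN_RE.findall(url.lower())}
    anchor_tokens = {"anchor:" + t for t in _TOKEN_RE.findall(anchor_text.lower())}
    return url_tokens | anchor_tokens


def train(examples: list[TrainingExample], smoothing: float = 1.0):
    """Return (training_set_score, feature_scores).

    training_set_score: fraction of examples labeled relevant (an assumed
    stand-in for the "score for the training set").
    feature_scores: smoothed log-likelihood ratio of each feature appearing
    in relevant versus irrelevant links; positive means "more relevant".
    """
    pos_counts: Counter[str] = Counter()
    neg_counts: Counter[str] = Counter()
    n_pos = sum(1 for _, _, relevant in examples if relevant)
    n_neg = len(examples) - n_pos

    # Count how many relevant / irrelevant examples contain each feature.
    for url, anchor, relevant in examples:
        feats = extract_features(url, anchor)
        (pos_counts if relevant else neg_counts).update(feats)

    # Additive smoothing keeps unseen-in-one-class features finite.
    feature_scores: dict[str, float] = {}
    for feat in pos_counts.keys() | neg_counts.keys():
        p_pos = (pos_counts[feat] + smoothing) / (n_pos + 2 * smoothing)
        p_neg = (neg_counts[feat] + smoothing) / (n_neg + 2 * smoothing)
        feature_scores[feat] = math.log(p_pos / p_neg)

    training_set_score = n_pos / len(examples) if examples else 0.0
    return training_set_score, feature_scores
```

Tokens that appear equally often in both classes, such as the host name or `https`, score near zero. Path words that separate relevant pages from irrelevant ones, such as `products` versus `careers`, receive large positive or negative weights.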
3 Assignments
Abstract
Systems and methods of URL and anchor text analysis for focused crawling are disclosed. In an exemplary embodiment, a method may include training a focused crawler by obtaining a training set of at least URLs or anchor text for a website, computing a score for the training set, extracting a plurality of features of the training set, and computing a score for each of the plurality of features. The features identify key information contained in the website. The method may also include executing a trained focused crawler on other websites.
20 Claims
12. A system comprising:
a training module operating to obtain a training set for a website, compute a score for the training set, and extract a plurality of features of the training set, the features identifying key information contained in the website; and an execution module operating to compute a score for each of the plurality of features, and crawl other websites.
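Claim 12 splits the work between a training module and an execution module that scores features and crawls other websites. The execution side might resemble the best-first crawler below. This is a minimal sketch, not the claimed system. It assumes the per-feature scores are already available as a dictionary keyed by token with a `url:` or `anchor:` prefix, and that a link's score is simply the sum of its feature scores. Page fetching is injected as a `fetch_links` callable, so no particular HTTP library is assumed. The names `link_score`, `focused_crawl`, `fetch_links`, `max_pages`, and `min_score` are all hypothetical.

```python
import heapq
import re
from typing import Callable, Iterable

_TOKEN_RE = re.compile(r"[a-z0-9]+")

# Hypothetical: fetch_links(url) returns the (url, anchor_text) links on that page.
LinkFetcher = Callable[[str], Iterable[tuple[str, str]]]


def link_score(url: str, anchor_text: str, feature_scores: dict[str, float]) -> float:
    """Sum the learned scores of a link's URL tokens and anchor-text tokens."""
    feats = {"url:" + t for t in _TOKEN_RE.findall(url.lower())}
    feats |= {"anchor:" + t for t in _TOKEN_RE.findall(anchor_text.lower())}
    # Features never seen during training contribute nothing.
    return sum(feature_scores.get(f, 0.0) for f in feats)


def focused_crawl(seeds: list[str], fetch_links: LinkFetcher,
                  feature_scores: dict[str, float],
                  max_pages: int = 100, min_score: float = 0.0) -> list[str]:
    """Best-first crawl: always visit the highest-scoring queued link next.

    Links scoring below min_score are never queued. Returns the pages in
    the order they were visited.
    """
    # heapq is a min-heap, so scores are negated; the counter breaks ties FIFO.
    frontier: list[tuple[float, int, str]] = []
    counter = 0
    for url in seeds:
        heapq.heappush(frontier, (0.0, counter, url))
        counter += 1
    seen = set(seeds)
    visited: list[str] = []

    while frontier and len(visited) < max_pages:
        _, _, url = heapq.heappop(frontier)
        visited.append(url)
        for link, anchor in fetch_links(url):
            if link in seen:
                continue
            score = link_score(link, anchor, feature_scores)
            if score < min_score:
                continue  # prune links the model considers off-topic
            seen.add(link)
            heapq.heappush(frontier, (-score, counter, link))
            counter += 1
    return visited
```

Because links are pruned and ordered only by their URL and anchor text, the crawler decides where to go next without downloading the target page. This is the practical payoff of URL and anchor text analysis in a focused crawler.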
20. A system for focused crawling using Uniform Resource Locator (URL) and anchor text analysis, comprising:
means for training a focused crawler by obtaining a training set of at least URLs or anchor text for a website, computing a score for the training set, extracting a plurality of features of the training set, and computing a score for each of the plurality of features, wherein the features identify key information contained in the website; and means for executing a trained focused crawler on other websites.
Specification