Directed web crawler with machine learning
Abstract
A web crawler identifies and characterizes an expression of a topic of general interest (such as cryptography) entered and generates an affinity set which comprises a set of related words. This affinity set is related to the expression of a topic of general interest. Using a common search engine, seed documents are found. The seed documents along with the affinity set and other search data will provide training to a classifier to create classifier output for the web crawler to search the web based on multiple criteria, including a content-based rating provided by the trained classifier. The web crawler can perform its search topic focused, rather than “link” focused. The found relevant content will be ranked and results displayed or saved for a specialty search.
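The affinity set and the content-based rating described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the co-occurrence heuristic in `affinity_set`, the choice of a naive Bayes model with add-one smoothing, and the hand-labeled seed sentences in the usage lines.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def affinity_set(topic, corpus, size=5):
    """Hypothetical affinity set: the topic word plus the words that most
    often appear in documents mentioning the topic (assumed heuristic)."""
    counts = Counter()
    for doc in corpus:
        words = tokenize(doc)
        if topic in words:
            counts.update(w for w in words if w != topic and len(w) > 3)
    return {topic} | {word for word, _ in counts.most_common(size)}


class BinaryNaiveBayes:
    """Binary (relevant / irrelevant) naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, text, relevant):
        """Add one labeled seed document to the model."""
        self.doc_counts[relevant] += 1
        self.word_counts[relevant].update(tokenize(text))

    def score(self, text):
        """Return an estimate of P(relevant | text) between 0 and 1."""
        vocab = len(set(self.word_counts[True]) | set(self.word_counts[False]))
        log_odds = math.log(
            (self.doc_counts[True] + 1) / (self.doc_counts[False] + 1)
        )
        for label, sign in ((True, 1.0), (False, -1.0)):
            # The extra +1 reserves probability mass for unseen words.
            total = sum(self.word_counts[label].values()) + vocab + 1
            for word in tokenize(text):
                log_odds += sign * math.log(
                    (self.word_counts[label][word] + 1) / total
                )
        log_odds = max(-50.0, min(50.0, log_odds))  # keep exp() in range
        return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical seed documents, labeled by hand for this sketch.
relevant_seeds = [
    "Cryptography uses a cipher and a secret key for encryption",
    "A cipher turns plaintext into ciphertext and cryptography studies encryption keys",
]
irrelevant_seeds = [
    "The football match ended with a late goal",
    "Fresh bread recipes need flour and yeast",
]

related_words = affinity_set("cryptography", relevant_seeds + irrelevant_seeds)

classifier = BinaryNaiveBayes()
for doc in relevant_seeds:
    classifier.train(doc, relevant=True)
for doc in irrelevant_seeds:
    classifier.train(doc, relevant=False)
```

With these toy seeds, a page about ciphers scores above 0.5 and a page about football scores below it, which is the kind of binary, content-based signal a topic-focused crawler could use in place of link counts.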
2 Claims
1. A system having computer-readable code associated with a network computer environment and one or more servers having one or more databases associated therewith containing information about database content for providing a network search in response to a user's input, said system comprising:
at least one computer, for receiving one or more queries, searching a plurality of databases, and displaying a specialized collection of documents related to said one or more queries;
at least one network, operatively connected to said at least one computer, for accessing said plurality of databases and transferring information from said plurality of databases to said at least one network;
at least one server, operatively connected to said at least one network, for storing said plurality of databases; and
software means, operatively connected to said at least one computer, for preparing an affinity set related to said one or more queries, identifying information in said plurality of databases, creating an index relating to said information in said plurality of databases, creating a set of seed documents based on information in said plurality of databases, training a classifier to classify said information in said plurality of databases using said seed documents, searching said network for relevant documents using a binary system created by said classifier, creating said specialized collection of documents related to said one or more queries, creating a ranked list of said specialized collection of documents, and displaying said ranked list on said at least one computer.
2. A method of searching a database of records and displaying the records, said method including the steps of:
(a) receiving a user's request query, said query including one or more words, phrases or documents, for defining a topic associated with said user's request query;
(b) generating an affinity list, said list including one or more words, phrases or documents related to said user's request query;
(c) causing one or more servers to locate and retrieve seed documents, said seed documents including information relevant and irrelevant to said affinity list;
(d) training a binary classifier, said binary classifier being trained using said seed documents to define documents;
(e) causing a web spider to locate and retrieve documents related to said user's request query, said spider being directed to documents by said binary classifier;
(f) ranking URLs associated with said documents located by said web spider; and
(f) ranking URLs associated with said documents located by said web spider; and
(g) displaying said ranking of URLs.
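Steps (e) through (g) can be sketched as a best-first crawl over a small in-memory link graph. This is one possible reading of "directed by said binary classifier", not the claimed implementation: the spider keeps and expands only pages whose rating clears a threshold, then ranks the surviving URLs by rating. The names `focused_crawl`, `make_overlap_scorer`, and `TOY_WEB`, the `example.com` URLs, and the 0.5 threshold are all hypothetical, and the overlap scorer is a simple stand-in for a trained classifier.

```python
import heapq
import re


def make_overlap_scorer(affinity):
    """Stand-in for a trained classifier: the fraction of affinity words
    that appear in the page text (an assumption for this sketch)."""
    def score(text):
        words = set(re.findall(r"[a-z]+", text.lower()))
        return len(words & affinity) / len(affinity)
    return score


def focused_crawl(seed_urls, fetch, score, threshold=0.5, max_pages=50):
    """Best-first crawl that follows links only from accepted pages.

    fetch(url) returns (text, links) or None; score(text) returns a rating
    between 0 and 1. Returns (url, rating) pairs, highest rating first.
    """
    frontier = [(-1.0, url) for url in seed_urls]  # seeds get top priority
    heapq.heapify(frontier)
    seen = set(seed_urls)
    relevant = {}
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        page = fetch(url)
        fetched += 1
        if page is None:
            continue
        text, links = page
        rating = score(text)
        if rating < threshold:
            continue  # binary decision: neither keep nor expand this page
        relevant[url] = rating
        for link in links:
            if link not in seen:
                seen.add(link)
                # A link inherits the rating of the page that cites it.
                heapq.heappush(frontier, (-rating, link))
    return sorted(relevant.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical link graph standing in for the web: url -> (text, links).
TOY_WEB = {
    "http://example.com/crypto": (
        "cipher encryption key basics",
        ["http://example.com/aes", "http://example.com/sports"],
    ),
    "http://example.com/aes": (
        "the aes cipher and key schedule",
        ["http://example.com/rsa"],
    ),
    "http://example.com/sports": (
        "football scores",
        ["http://example.com/hidden"],
    ),
    "http://example.com/rsa": ("rsa encryption cipher key pair", []),
    "http://example.com/hidden": ("cipher encryption key", []),
}

ranking = focused_crawl(
    ["http://example.com/crypto"],
    TOY_WEB.get,
    make_overlap_scorer({"cipher", "encryption", "key"}),
)
for url, rating in ranking:
    print(f"{rating:.2f}  {url}")
```

In this toy graph the page behind the off-topic sports page is never fetched, even though its text is on topic, because the spider does not follow links out of rejected pages. That is the trade-off of a topic-focused crawl: fewer wasted fetches at the cost of missing relevant pages that are reachable only through irrelevant ones.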
Specification