Data gathering and distribution system
Abstract
A system and method for gathering and distributing data. The system is for extracting information from the world wide web and classifying the information in accordance with certain profiles. The information may include business intelligence, which may be categorized according to its relevance to predefined industry profiles or company profiles. The information may be further categorized according to its relevance to particular countries. If the information relates to a new company, the system builds a new company profile based upon the information. Users may create a user profile containing their information preferences, such as industry groups or particular countries or companies, and the system provides reports or alerts to the users referencing extracted information that is filtered by the user profile.
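The user-profile filtering described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how preferences are combined, so this sketch assumes a record is reported when it shares at least one industry group, country, or company with the profile. The names `Record`, `UserProfile`, and `build_report` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Record:
    """Hypothetical classified data record with its associations."""
    title: str
    industry_groups: Set[str] = field(default_factory=set)
    countries: Set[str] = field(default_factory=set)
    companies: Set[str] = field(default_factory=set)


@dataclass
class UserProfile:
    """Hypothetical user preferences: industry groups, countries, companies."""
    industry_groups: Set[str] = field(default_factory=set)
    countries: Set[str] = field(default_factory=set)
    companies: Set[str] = field(default_factory=set)

    def matches(self, record: Record) -> bool:
        # Assumption: any overlap with any preference category is enough.
        return bool(
            self.industry_groups & record.industry_groups
            or self.countries & record.countries
            or self.companies & record.companies
        )


def build_report(profile: UserProfile, records: List[Record]) -> List[str]:
    """Return the titles of the records that pass the profile filter."""
    return [r.title for r in records if profile.matches(r)]


records = [
    Record("Copper output rises", industry_groups={"Mining"}, countries={"Chile"}),
    Record("Bank merger approved", industry_groups={"Banking"}, countries={"Canada"}),
    Record("New retail chain opens", industry_groups={"Retail"}, countries={"Chile"}),
]
profile = UserProfile(industry_groups={"Banking"}, countries={"Chile"})
report = build_report(profile, records)
```

Under these assumed inputs, all three records are reported: two through the country preference and one through the industry group preference.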
10 Claims
1. A method for gathering and classifying information from the world wide web, the method comprising the steps of:
providing an industry database containing a multi-level classification of industry groups, each of said industry groups having a set of associated industry group keywords;
providing a company database containing a plurality of company profiles, each company profile containing identifying information and an association with one or more of said industry groups;
extracting information from a website;
determining a data type for said extracted information;
comparing said extracted information with said identifying information from said plurality of company profiles and,
if the identifying information is found within the extracted information for at least one company profile, then associating the extracted information with the at least one company and associating the extracted information with the one or more of said industry groups associated with the at least one company profile, and
if the identifying information is not found within the extracted information, comparing said extracted information with said set of associated industry group keywords for each of said industry groups, and, if the extracted information is found to have at least a threshold degree of relevance to at least one industry group, selecting a most relevant industry group and associating the extracted information with the most relevant industry group,
wherein said step of comparing said extracted information with said set of associated industry group keywords comprises filtering said extracted information to produce a subset of extracted information, and searching said subset for said associated industry group keywords,
wherein said step of comparing said extracted information with said set of associated industry group keywords further comprises counting matches from said searching step to produce a match count, normalizing said match count over the size of the subset to generate a degree of relevance, and comparing said degree of relevance to said threshold degree of relevance, and
wherein said step of filtering includes at least applying a common word filter to remove common words; and
storing said extracted information as a data record within an information database, including at least one association with at least one industry group.
Dependent claims: 2, 3, 4, 5
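The keyword-comparison steps of claim 1 (common word filter, match count, normalization over the subset size, threshold test, selection of the most relevant group) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the common-word list, the single-word keywords, the regex tokenization, and the function names are all hypothetical, and the claim does not say how ties between groups are resolved.

```python
import re

# Hypothetical common-word list; the claim does not enumerate one.
COMMON_WORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "on", "its"}


def filter_common_words(text):
    """Lowercase, split into alphanumeric tokens, and drop common words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in COMMON_WORDS]


def degree_of_relevance(subset, keywords):
    """Match count normalized over the size of the filtered subset."""
    if not subset:
        return 0.0
    match_count = sum(1 for token in subset if token in keywords)
    return match_count / len(subset)


def most_relevant_group(text, group_keywords, threshold):
    """Return the best-scoring group if it meets the threshold, else None."""
    subset = filter_common_words(text)
    scores = {
        group: degree_of_relevance(subset, keywords)
        for group, keywords in group_keywords.items()
    }
    if not scores:
        return None
    best = max(scores, key=scores.get)  # assumption: first maximum wins ties
    return best if scores[best] >= threshold else None


# Hypothetical industry groups and single-word keywords.
GROUP_KEYWORDS = {
    "banking": {"bank", "lending", "mortgage", "loans"},
    "mining": {"ore", "copper", "mine"},
}
TEXT = "The bank raised its lending rate for mortgage loans"
selected = most_relevant_group(TEXT, GROUP_KEYWORDS, threshold=0.2)
```

With these assumed inputs, six words remain after filtering and four of them match the banking keywords, so the banking group has a degree of relevance of 4/6 and is selected. Raising the threshold above that value leaves the text unclassified.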
6. A system for gathering and classifying information from the world wide web, the system comprising:
at least one memory storing an industry database containing a multi-level classification of industry groups, each of said industry groups having a set of associated industry group keywords, a company database containing a plurality of company profiles, each company profile containing identifying information and an association with one or more of said industry groups, and an information database containing data records, each of said data records including at least one association with at least one industry group;
an extractor for crawling the world wide web and producing extracted information from at least one website;
a classifier for receiving said extracted information, said classifier including
a data type component configured to determine a data type for said extracted information;
a company comparison component configured to compare said extracted information with said identifying information from said plurality of company profiles and, if the identifying information is found within the extracted information for at least one company profile, then associate the extracted information with the at least one company and associate the extracted information with the one or more of said industry groups associated with the at least one company profile;
an industry component configured to compare said extracted information with said set of associated industry group keywords for each of said industry groups if the company comparison component finds the identification information is not found within the extracted information, and wherein the industry component is configured to select a most relevant industry group and associate the most relevant industry group with the extracted information if the extracted information is found to have at least a threshold degree of relevance to at least one industry group,
wherein said industry component includes one or more filters for filtering said extracted information to produce a subset of extracted information and wherein said industry component is configured to search said subset for said associated industry group keywords,
wherein said industry component is configured to count matches from said search of said subset to produce a match count, to normalize said match count over the size of the subset to generate a degree of relevance, and to compare said degree of relevance to said threshold degree of relevance, and
wherein said at least one filters include a common word filter to remove common words; and
wherein said classifier is configured to store said extracted information as one of said data records within said information database.
Dependent claims: 7, 8, 9, 10
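The arrangement of components in claim 6 (a data type component, a company comparison component that runs first, an industry component used only as a fallback, and storage into the information database) can be sketched as a small class. All names here are hypothetical. The data type rule is a placeholder, the company match is a plain case-insensitive substring test, and the industry component is passed in as a callable so that its scoring stays outside this sketch. The claim does not say what happens to information that matches nothing; this sketch simply does not store it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class CompanyProfile:
    name: str                    # identifying information
    industry_groups: List[str]   # associated industry groups


@dataclass
class DataRecord:
    text: str
    data_type: str
    companies: List[str] = field(default_factory=list)
    industry_groups: List[str] = field(default_factory=list)


@dataclass
class Classifier:
    companies: List[CompanyProfile]
    # Hypothetical industry component: returns a group name or None.
    industry_component: Callable[[str], Optional[str]]
    information_db: List[DataRecord] = field(default_factory=list)

    def data_type_component(self, text: str) -> str:
        # Placeholder rule, assumed purely for illustration.
        return "press release" if "announces" in text.lower() else "article"

    def classify(self, text: str) -> Optional[DataRecord]:
        record = DataRecord(text=text, data_type=self.data_type_component(text))
        lowered = text.lower()
        matched = [c for c in self.companies if c.name.lower() in lowered]
        if matched:
            # Company comparison: inherit the groups of every matched company.
            for company in matched:
                record.companies.append(company.name)
                for group in company.industry_groups:
                    if group not in record.industry_groups:
                        record.industry_groups.append(group)
        else:
            # Fallback to the industry component only when no company matched.
            group = self.industry_component(text)
            if group is None:
                return None  # assumption: unclassified text is not stored
            record.industry_groups.append(group)
        self.information_db.append(record)
        return record


classifier = Classifier(
    companies=[CompanyProfile("Acme Mining", ["Mining"])],
    industry_component=lambda t: "Banking" if "bank" in t.lower() else None,
)
by_company = classifier.classify("Acme Mining announces a new copper project")
by_industry = classifier.classify("A regional bank cut rates")
unclassified = classifier.classify("Weather was mild")
```

Passing the industry component as a callable keeps the ordering logic (company first, industry second) separate from whatever relevance scoring is used.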
Specification