Data gathering and distribution system

  • US 7,409,393 B2
  • Filed: 07/28/2004
  • Issued: 08/05/2008
  • Est. Priority Date: 07/28/2004
  • Status: Expired due to Fees
First Claim

1. A method for gathering and classifying information from the world wide web, the method comprising the steps of:

    providing an industry database containing a multi-level classification of industry groups, each of said industry groups having a set of associated industry group keywords;

    providing a company database containing a plurality of company profiles, each company profile containing identifying information and an association with one or more of said industry groups;

    extracting information from a website;

    determining a data type for said extracted information;

    comparing said extracted information with said identifying information from said plurality of company profiles and,

        if the identifying information is found within the extracted information for at least one company profile, then associating the extracted information with the at least one company and associating the extracted information with the one or more of said industry groups associated with the at least one company profile, and

        if the identifying information is not found within the extracted information, comparing said extracted information with said set of associated industry group keywords for each of said industry groups, and, if the extracted information is found to have at least a threshold degree of relevance to at least one industry group, selecting a most relevant industry group and associating the extracted information with the most relevant industry group,

        wherein said step of comparing said extracted information with said set of associated industry group keywords comprises filtering said extracted information to produce a subset of extracted information, and searching said subset for said associated industry group keywords,

        wherein said step of comparing said extracted information with said set of associated industry group keywords further comprises counting matches from said searching step to produce a match count, normalizing said match count over the size of the subset to generate a degree of relevance, and comparing said degree of relevance to said threshold degree of relevance, and

        wherein said step of filtering includes at least applying a common word filter to remove common words; and

    storing said extracted information as a data record within an information database, including at least one association with at least one industry group.
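
The classification logic recited in the claim can be illustrated with a minimal Python sketch. It is not the patentee's implementation. Every name here (`CompanyProfile`, `classify`, `degree_of_relevance`, `COMMON_WORDS`), the use of a company name as the "identifying information", substring matching, and the default threshold of 0.1 are assumptions made for illustration only. The sketch covers the company match, the keyword fallback with common-word filtering, the normalized match count, and the threshold test; it omits website extraction, data-type determination, and storage.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical common-word list; the claim does not enumerate one.
COMMON_WORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "is", "on"}


@dataclass
class CompanyProfile:
    name: str                   # stands in for the "identifying information"
    industry_groups: List[str]  # industry groups associated with the company


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def degree_of_relevance(subset: List[str], keywords: Set[str]) -> float:
    """Match count normalized over the size of the filtered subset."""
    if not subset:
        return 0.0
    matches = sum(1 for token in subset if token in keywords)
    return matches / len(subset)


def classify(
    text: str,
    companies: List[CompanyProfile],
    industry_keywords: Dict[str, Set[str]],
    threshold: float = 0.1,  # assumed value; the claim names no number
) -> Tuple[List[str], List[str]]:
    """Return (matched company names, associated industry groups)."""
    lowered = text.lower()

    # First branch: identifying information found in the extracted text.
    hits = [c for c in companies if c.name.lower() in lowered]
    if hits:
        groups = sorted({g for c in hits for g in c.industry_groups})
        return [c.name for c in hits], groups

    # Second branch: filter common words, then score each industry group.
    subset = [t for t in tokenize(text) if t not in COMMON_WORDS]
    scores = {
        group: degree_of_relevance(subset, keywords)
        for group, keywords in industry_keywords.items()
    }
    if not scores:
        return [], []

    # Keep only the most relevant group, and only if it meets the threshold.
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return [], [best]
    return [], []
```

Calling `classify(text, companies, industry_keywords)` returns the matched company names together with the industry groups to associate with the record. When no company matches and no group reaches the threshold, both lists are empty.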
