Training set construction for taxonomic classification

  • US 8,484,194 B1
  • Filed: 01/13/2012
  • Issued: 07/09/2013
  • Est. Priority Date: 10/22/2009
  • Status: Active Grant
First Claim

1. A computer system comprising:

  • at least one processor; and

    a computer-readable medium storing instructions that, when executed by the at least one processor, cause the computer system to execute:

    a training set generator configured to input a taxonomy including a hierarchy of categories and a plurality of top-level sites, and to output a training set of categorized data, the training set generator including:

    a crawler configured to crawl each of the top-level sites to determine at least one lower-level site associated therewith and to store the top-level sites and associated lower-level sites as crawl data, the crawler including a site finder configured to receive the plurality of top-level sites from a user, and

    an extractor configured to determine, for each of the top-level sites, a corresponding site-specific extraction template associating at least one portion of the corresponding top-level site with at least one category of the hierarchy of categories, and further configured to apply each site-specific extraction template to corresponding crawl data to thereby associate the crawl data with the categories of the hierarchical categories and obtain categorized data of the training set.
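
The data flow recited in claim 1 (taxonomy and top-level sites in, crawl data stored, site-specific templates applied, categorized training data out) can be illustrated with a minimal sketch. This is not the patented implementation: the in-memory FAKE_WEB, the function names, and the use of URL path prefixes as the "portion" of a site that a template maps to a category are all assumptions made for illustration, and the templates are supplied by hand rather than determined by the extractor.

```python
# Hypothetical stand-in for the web: top-level site -> {lower-level URL: page text}.
FAKE_WEB = {
    "shop.example": {
        "shop.example/electronics/phones/p1": "Acme phone, 64 GB",
        "shop.example/home/kitchen/k1": "Steel kettle, 1.7 L",
    },
}

# Taxonomy as a hierarchy of categories: child -> parent (None marks a root).
TAXONOMY = {
    "Electronics": None,
    "Phones": "Electronics",
    "Home": None,
    "Kitchen": "Home",
}

# Assumed site-specific extraction templates: URL prefix -> category name.
TEMPLATES = {
    "shop.example": {
        "shop.example/electronics/phones/": "Phones",
        "shop.example/home/kitchen/": "Kitchen",
    },
}


def crawl(top_level_sites):
    """Crawler: collect the lower-level pages of each top-level site as crawl data."""
    return {site: sorted(FAKE_WEB.get(site, {}).items()) for site in top_level_sites}


def category_path(taxonomy, category):
    """Walk up the hierarchy and return the path from root to the given category."""
    path = []
    while category is not None:
        path.append(category)
        category = taxonomy[category]
    return tuple(reversed(path))


def apply_template(taxonomy, template, pages):
    """Extractor: label each crawled page whose URL matches a template prefix."""
    labeled = []
    for url, text in pages:
        for prefix, category in template.items():
            if url.startswith(prefix):
                labeled.append((text, category_path(taxonomy, category)))
                break  # first matching prefix wins; unmatched pages are skipped
    return labeled


def build_training_set(taxonomy, top_level_sites, templates):
    """Training set generator: crawl, then apply each site's template to its crawl data."""
    training_set = []
    for site, pages in crawl(top_level_sites).items():
        training_set.extend(apply_template(taxonomy, templates[site], pages))
    return training_set


training_set = build_training_set(TAXONOMY, ["shop.example"], TEMPLATES)
for text, path in training_set:
    print(" > ".join(path), "|", text)
```

Each output pair couples a piece of crawled text with its full category path, which is the kind of labeled example a taxonomic classifier could be trained on. A real system would fetch pages over the network and would need some way to derive the templates per site, neither of which this sketch attempts.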
