Apparatus and method for automatic assignment of industry classification codes
First Claim
1. A system for automatically assigning a business classification code to a company, comprising:
- one or more processing units configured to:
trawl the Internet, locate and extract web data relevant to the company;
generate a business classification analysis with a node structure corresponding to a selected business classification code system and to compute a taxonomy word histogram based on the node structure;
generate an extracted word histogram corresponding to the presence of selected web data elements within the extracted web data relevant to the company; and
determine a business classification code assignment with a first list of matches for the business classification code for the company by comparing a normalized scalar product of the taxonomy word histogram and the extracted word histogram to a predetermined threshold.
Abstract
A system and methods are provided for automatically assigning classification codes to a business based on information about the business collected from the Internet. Data extracted by trawling the Internet is compared to a node structure based on a taxonomy of a selected business classification code system.
15 Claims
1. A system for automatically assigning a business classification code to a company, comprising:
- one or more processing units configured to:
trawl the Internet, locate and extract web data relevant to the company;
generate a business classification analysis with a node structure corresponding to a selected business classification code system and to compute a taxonomy word histogram based on the node structure;
generate an extracted word histogram corresponding to the presence of selected web data elements within the extracted web data relevant to the company; and
determine a business classification code assignment with a first list of matches for the business classification code for the company by comparing a normalized scalar product of the taxonomy word histogram and the extracted word histogram to a predetermined threshold.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
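The comparison step recited in claim 1 can be illustrated with a minimal Python sketch. It assumes that the "normalized scalar product" is the dot product of two word histograms divided by the product of their Euclidean norms (cosine similarity), and that a code counts as a match when its score is at or above the threshold. The function names, the tokenizer, the category labels, and the threshold value are hypothetical choices for illustration and do not come from the patent text.

```python
import math
import re
from collections import Counter


def word_histogram(text):
    """Count lowercase alphabetic words in a text (illustrative tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def normalized_scalar_product(h1, h2):
    """Dot product of two histograms divided by the product of their norms."""
    dot = sum(count * h2[word] for word, count in h1.items() if word in h2)
    norm1 = math.sqrt(sum(c * c for c in h1.values()))
    norm2 = math.sqrt(sum(c * c for c in h2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


def match_codes(extracted, taxonomy, threshold):
    """Return (code, score) pairs scoring at or above the threshold, best first."""
    scored = [
        (code, normalized_scalar_product(extracted, hist))
        for code, hist in taxonomy.items()
    ]
    kept = [(code, score) for code, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical taxonomy entries; the labels are placeholders, not real codes.
taxonomy = {
    "bakery": word_histogram("retail bakeries bread cakes pastries"),
    "software": word_histogram("software publishers computer programs"),
}
extracted = word_histogram("Our bakery sells fresh bread, cakes and pastries daily")
matches = match_codes(extracted, taxonomy, threshold=0.3)
```

In this toy input only the hypothetical "bakery" entry shares words with the extracted text, so it is the only entry that can pass the threshold. A real system would likely need stop-word removal and stemming, which this sketch omits.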
9. A non-transitory computer-readable storage medium having instructions stored thereon, which, when executed by one or more processors of one or more computing devices, cause the one or more processors to perform operations comprising:
trawling the Internet to locate and extract data relevant to a company;
generating a node structure corresponding to a selected business classification code system and computing a taxonomy word histogram based on the node structure;
generating an extracted word histogram corresponding to the presence of selected data elements within the extracted data relevant to the company; and
determining a first list of matches for the business classification code for the company by comparing a normalized scalar product of the taxonomy word histogram and the extracted word histogram to a predetermined threshold.
Dependent claims: 10, 11, 12, 13, 14, 15
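The claims do not state how the taxonomy word histogram is derived from the node structure. As one possible reading, the sketch below assumes each node of the classification hierarchy carries a code and a title, and that a node's histogram counts the words of its own title plus the words of all its descendants. The `Node` class, the function names, and the sample codes and titles are hypothetical and serve only to illustrate the idea.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    code: str   # hypothetical code label, not a real classification code
    title: str  # descriptive title of the category
    children: list = field(default_factory=list)


def tokenize(text):
    """Split a title into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def taxonomy_histograms(node, out=None):
    """Map each node code to a histogram of its title words plus descendants' words."""
    if out is None:
        out = {}
    hist = Counter(tokenize(node.title))
    for child in node.children:
        taxonomy_histograms(child, out)  # fills out[child.code] first
        hist += out[child.code]          # aggregate the child's words upward
    out[node.code] = hist
    return out


# Hypothetical two-level hierarchy.
root = Node("1", "Food manufacturing", [
    Node("11", "Bakeries and bread"),
    Node("12", "Dairy products"),
])
hists = taxonomy_histograms(root)
```

Aggregating upward means a broad parent category can still match text that mentions only the vocabulary of one of its subcategories. Other designs, such as weighting words by depth, are equally compatible with the claim language.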
Specification