Method for extracting company names from text
Abstract
A method for extracting company names from textual information uses a combination of heuristics, exception lists, and extensive corpus analysis. The method first locates company name suffixes (e.g., Company, Corporation) and then attempts to locate the beginning of the company name. The method works on both mixed-case text and capitalized text. Upon identification of a company name, the method proceeds to generate variations of the name for later extraction.
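The abstract does not say which name variations are generated. As a rough illustration only, the sketch below assumes the variations are the name without its suffix and the name with common abbreviations of that suffix. The `SUFFIX_FORMS` table and the `name_variations` function are hypothetical and are not taken from the patent.

```python
# Hypothetical table mapping a full company suffix to abbreviated forms.
# The patent abstract does not list these; they are assumptions.
SUFFIX_FORMS = {
    "Corporation": ["Corp.", "Corp"],
    "Company": ["Co.", "Co"],
    "Incorporated": ["Inc.", "Inc"],
    "Limited": ["Ltd.", "Ltd"],
}


def name_variations(name):
    """Return the name, the name without its suffix, and abbreviated-suffix forms."""
    words = name.split()
    # A bare word or a name without a known suffix has no variations here.
    if len(words) < 2 or words[-1] not in SUFFIX_FORMS:
        return [name]
    stem = " ".join(words[:-1])
    variations = [name, stem]
    variations += [f"{stem} {abbr}" for abbr in SUFFIX_FORMS[words[-1]]]
    return variations
```

Under these assumptions, an extracted name such as "Acme Widget Corporation" would also be searched for as "Acme Widget", "Acme Widget Corp." and "Acme Widget Corp" in later passes over the text.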
3 Claims
1. A method, for practice on a computer, for extracting company names from text comprising the steps of:
- scanning said text for a company name indicator;
- reading words backwards through said text until a stop condition comprising reading a word which is on a stop list occurs; and
- extracting the words read by said computer, before the stop condition occurred.
2. A method, for practice on a computer, for extracting company names from text comprising the steps of:
- scanning said text for a company name indicator;
- determining if there is parallel sentence structure in the text including said company name indicator;
- reading words backwards through said text until a stop condition occurs; and
- extracting the words read by said computer, before the stop condition occurred.
3. A method, for practice on a computer, for extracting company names from text comprising the steps of:
- scanning said text for a company name indicator;
- determining if a verb following said company name indicator is plural;
- reading words backwards through said text until a stop condition occurs; and
- extracting the words read by said computer, before the stop condition occurred.
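The three steps of claim 1 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the contents of `INDICATORS` and `STOP_WORDS` are assumptions, since the claims do not enumerate either list. Only the stop-list condition is modeled, and appending the indicator itself to the extracted words is also an assumption. The parallel-structure and plural-verb checks of claims 2 and 3 are not attempted.

```python
# Hypothetical lists; the claims name a "company name indicator" and a
# "stop list" but do not say what either contains.
INDICATORS = {"Company", "Corporation", "Corp.", "Inc.", "Co.", "Ltd."}
STOP_WORDS = {"the", "a", "an", "of", "at", "by", "for", "from", "in", "with", "said", "that"}


def _indicator(word):
    """Return the indicator form of a word, or None if it is not an indicator."""
    token = word.rstrip(",;:")
    if token in INDICATORS:
        return token
    # Allow a sentence-final period, as in "... Corporation."
    token = token.rstrip(".")
    return token if token in INDICATORS else None


def extract_company_names(text):
    words = text.split()
    names = []
    for i, word in enumerate(words):
        # Step 1: scan the text for a company name indicator.
        indicator = _indicator(word)
        if indicator is None:
            continue
        # Step 2: read words backwards until a stop-list word (or the
        # start of the text) is reached.
        collected = []
        j = i - 1
        while j >= 0 and words[j].strip(",;:.").lower() not in STOP_WORDS:
            collected.append(words[j])
            j -= 1
        # Step 3: extract the words read before the stop condition.
        if collected:
            names.append(" ".join(reversed(collected)) + " " + indicator)
    return names
```

With these lists, the sentence "Shares of Acme Widget Corporation rose after the International Paper Company announced results." yields "Acme Widget Corporation" and "International Paper Company". A stop list alone will over-extract when no listed word precedes the name, which is presumably why the claims say the stop condition comprises, rather than consists of, a stop-list match.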
Specification