Processing names in a text
First Claim
1. A system of one or more computers for processing names in one or more documents of text, the system comprising:
a central processing unit (CPU) and a memory within each of the computers;
a database;
a tokenized text in the form of one or more strings of characters, each string being a token;
a name data structure having a plurality of named elements, each named element having a string and one or more attributes associated with the string;
a name extraction processor that scans, selects, and concatenates one or more of the tokens to create a raw name that is entered as a string value in the string of one of the named elements, where the name extraction processor is capable of creating a candidate name from each of one or more of the raw names by one or both of the following name processes;
cleaning the string value and splitting the string value.
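The name data structure in the claim can be pictured with a minimal sketch. The class name `NamedElement`, the helper `make_raw_name`, and the attribute keys `kind`, `start`, and `end` are assumptions made for illustration; the claim only requires a string plus one or more attributes per named element.

```python
from dataclasses import dataclass, field


@dataclass
class NamedElement:
    """One named element: a string and the attributes associated with it."""
    string: str
    attributes: dict = field(default_factory=dict)


def make_raw_name(tokens, start, end):
    """Concatenate tokens[start:end] and store the result as a raw name."""
    value = " ".join(tokens[start:end])
    # Attribute keys are hypothetical; they record where the name came from.
    return NamedElement(string=value,
                        attributes={"kind": "raw", "start": start, "end": end})


# Tokenized text: each whitespace-separated string is treated as one token.
tokens = "Yesterday Dr. John Smith of Acme Corp. spoke".split()
raw = make_raw_name(tokens, 1, 4)
print(raw.string)
```

Here the selected tokens are joined with single spaces; a real tokenizer would likely also handle punctuation, which this sketch ignores.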
Abstract
Occurrences of proper names in text are identified by scanning one or more documents in a database of a computer system to identify one or more sequences of capitalized words and other specially defined words that appear in the documents as raw names. Each of the raw names has zero or more proper names, zero or more medial substrings, zero or more leading substrings, and zero or more trailing substrings. The raw names of one or more documents are "cleaned" and "split" until certain "cleaning and splitting conditions" are no longer met to obtain a list of clean and split candidate names.
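As a rough illustration of the scanning step described in the abstract, the sketch below collects runs of capitalized tokens, allowing a small set of "specially defined words" to continue a run. The set `SPECIAL_WORDS` and the function names are assumptions; the abstract does not list the special words or the exact capitalization test.

```python
# Hypothetical set of specially defined words that may appear inside a raw name.
SPECIAL_WORDS = {"of", "and"}


def is_capitalized(token):
    """True when the token starts with an uppercase letter."""
    return token[:1].isupper()


def extract_raw_names(tokens):
    """Return raw names: maximal runs of capitalized and special tokens."""
    raw_names = []
    current = []
    for tok in tokens:
        # A special word can extend a run but cannot start one.
        if is_capitalized(tok) or (current and tok.lower() in SPECIAL_WORDS):
            current.append(tok)
        elif current:
            raw_names.append(" ".join(current))
            current = []
    if current:
        raw_names.append(" ".join(current))
    return raw_names


text = "yesterday the chairman of IBM and Bill Gates of Microsoft met in New York"
raw_names = extract_raw_names(text.split())
print(raw_names)
```

The first raw name found in this example, "IBM and Bill Gates of Microsoft", contains several proper names joined by medial substrings, which is the situation the later cleaning and splitting is meant to resolve. A run may also end in a special word, leaving a trailing substring for cleaning to remove.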
216 Citations
16 Claims
1. A system of one or more computers for processing names in one or more documents of text, the system comprising:
a central processing unit (CPU) and a memory within each of the computers;
a database;
a tokenized text in the form of one or more strings of characters, each string being a token;
a name data structure having a plurality of named elements, each named element having a string and one or more attributes associated with the string;
a name extraction processor that scans, selects, and concatenates one or more of the tokens to create a raw name that is entered as a string value in the string of one of the named elements, where the name extraction processor is capable of creating a candidate name from each of one or more of the raw names by one or both of the following name processes;
cleaning the string value and splitting the string value.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A method of identifying one or more proper names, the proper names appearing in one or more documents in a database, comprising the steps of:
a. scanning one of the documents to select one or more tokens of text;
b. concatenating one or more of the tokens to create a raw name having zero or more proper names, zero or more medial substrings, zero or more leading substrings, and zero or more trailing substrings;
c. cleaning each of the raw names by removing the leading and trailing substrings if a cleaning condition is met;
d. splitting the candidate name at the medial substring if a splitting condition is met; and
e. repeating steps c and d until no more cleaning conditions and splitting conditions are met, the candidate name being a final candidate.
Dependent claims: 16
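Steps c through e of the method can be sketched as a loop that cleans and splits until nothing changes. The substring lists `LEADING`, `TRAILING`, and `MEDIAL` and the function names are assumptions chosen for the example, and the conditions used here (the substring is simply present) are far cruder than whatever conditions the specification defines.

```python
# Hypothetical substring lists; the claim does not enumerate them.
LEADING = ("The ",)
TRAILING = (" and", " of")
MEDIAL = (" and ", " of ")


def clean(name):
    """Step c: remove one leading or trailing substring if one is present."""
    for prefix in LEADING:
        if name.startswith(prefix):
            return name[len(prefix):]
    for suffix in TRAILING:
        if name.endswith(suffix):
            return name[:-len(suffix)]
    return name


def split(name):
    """Step d: split at the first medial substring with text on both sides."""
    for medial in MEDIAL:
        left, sep, right = name.partition(medial)
        if sep and left and right:
            return [left, right]
    return [name]


def refine(name):
    """Step e: repeat cleaning and splitting until neither applies."""
    cleaned = clean(name)
    if cleaned != name:
        return refine(cleaned)
    parts = split(name)
    if len(parts) > 1:
        return [final for part in parts for final in refine(part)]
    return [name] if name else []


candidates = refine("The IBM and Bill Gates of Microsoft and")
print(candidates)
```

Splitting on every medial substring is deliberately naive: it would also break up a name such as "Bank of America", which is why the claim makes splitting depend on a splitting condition rather than on the mere presence of the substring.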
Specification