Name searching
Abstract
A method of identifying personal names in an electronic file published on the WWW (2). The method comprises downloading the file to a computer (1), parsing the file to divide it into individual words, and identifying words or word sequences which represent candidate names. For each candidate name, the word or words making up that name are compared against a database of known false positive name entities. If the candidate name contains a known false positive name entity or entities, that name is flagged as an invalid personal name. If the candidate name does not contain a known false positive name entity or entities, the candidate name is either accepted as a personal name or further processed to check its validity.
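The parse, identify, and compare steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not say how candidate names are recognised, so the capitalised-word heuristic, the sample `FALSE_POSITIVES` set, and all function names here are hypothetical.

```python
import re

# Hypothetical false positive entities; the source does not list any.
FALSE_POSITIVES = {"new", "york", "street", "bank", "united", "states"}

def tokenize(text):
    """Divide the text into individual words (letters, apostrophes, hyphens)."""
    return re.findall(r"[A-Za-z][A-Za-z'\-]*", text)

def candidate_names(words):
    """Assumed heuristic: a run of two or more capitalised words is a candidate."""
    run = []
    for word in words + [""]:  # the empty sentinel flushes the final run
        if word[:1].isupper():
            run.append(word)
        else:
            if len(run) >= 2:
                yield tuple(run)
            run = []

def classify(candidate, false_positives=FALSE_POSITIVES):
    """Flag a candidate as invalid if any of its words is a known false positive."""
    if any(word.lower() in false_positives for word in candidate):
        return "invalid"
    return "potentially_valid"

def find_names(text):
    """Map each candidate name found in the text to its flag."""
    return {" ".join(c): classify(c) for c in candidate_names(tokenize(text))}
```

For example, `find_names("The visitor was John Smith, who flew to New York with Mary Jones.")` flags "New York" as invalid and leaves "John Smith" and "Mary Jones" as potentially valid. Because this tokenizer discards punctuation, a real implementation would also need to stop runs at sentence boundaries.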
17 Claims
1. A method of identifying personal names in an electronic file, the method comprising:
(1) parsing the file to divide it into individual words;
(2) identifying words or word sequences which represent candidate names;
(4) for each candidate name, comparing the word or words making up that name against a database of known false positive name entities and, if the candidate name contains a known false positive name entity or entities, flagging that name as an invalid personal name and, if the candidate name does not contain a known false positive name entity or entities, either flagging the name as a potentially valid personal name or further processing the name to check its validity.
Dependent claims: 2, 3, 5, 6, 7, 17
4. A method of identifying personal names in an electronic file which can be accessed via a computer network, the method comprising:
(1) downloading the file via the network to a computer;
(2) parsing the file to divide it into individual words;
(3) identifying words or word sequences which represent candidate names;
(4) for each candidate name, comparing the word or words making up that name against a database of known false positive name entities and, if the candidate name contains a known false positive name entity or entities, flagging that name as an invalid personal name and, if the candidate name does not contain a known false positive name entity or entities, either flagging the name as a potentially valid personal name or further processing the name to check its validity.
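A minimal sketch of steps (1) and (2) of this claim, assuming the downloaded file is an HTML page (the claim itself only says "electronic file"). The names `download` and `html_to_words` are illustrative, and only Python standard-library calls are used.

```python
import re
from html.parser import HTMLParser
from urllib.request import urlopen

class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def download(url, timeout=10):
    """Step (1): fetch the file over the network and decode it to text."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")

def html_to_words(html):
    """Step (2): strip markup and divide the remaining text into words."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[A-Za-z][A-Za-z'\-]*", " ".join(parser.chunks))
```

Calling `html_to_words(download(url))` yields the word list on which the candidate identification and false positive comparison of steps (3) and (4) would then operate.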
8. A method of monitoring electronic files published on a network, the method comprising:
at a computer having access to the network, defining at least one address pointing to an electronic file or files the contents of which are to be monitored;
periodically downloading the file(s) over the network from said location; and
for each download, identifying a personal name or names present in said file(s) and automatically generating a report containing said name(s).
Dependent claims: 9, 10, 11, 12, 13, 15
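The monitoring loop of this claim might look roughly like the following sketch. The `fetch` and `find_names` callables are passed in as parameters because the claim does not fix how files are downloaded or how names are identified; the report layout and the fixed `cycles` count are assumptions made so the example terminates.

```python
import time
from datetime import datetime, timezone

def monitor(addresses, fetch, find_names, interval_seconds, cycles):
    """Download each address once per cycle and build one report per download."""
    reports = []
    for cycle in range(cycles):
        for address in addresses:
            content = fetch(address)
            reports.append({
                "address": address,
                "downloaded_at": datetime.now(timezone.utc).isoformat(),
                "names": sorted(find_names(content)),
            })
        if cycle < cycles - 1:
            time.sleep(interval_seconds)  # wait before the next periodic download
    return reports

def format_report(report):
    """Render one report as a single line of text."""
    names = ", ".join(report["names"]) or "(none)"
    return f"{report['address']} @ {report['downloaded_at']}: {names}"
```

A long-running service would replace the bounded `cycles` loop with a scheduler, but the structure (defined addresses, periodic download, one report per download) stays the same.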
14. A method of facilitating access to documents over a network, the method comprising:
searching a plurality of electronic files to identify personal names;
generating a file containing the identified names or a sub-set thereof and links to the files containing the names; and
making the generated file available for downloading over the network.
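One way to picture the generated file is as a name-to-links index. The sketch below assumes the search results arrive as a mapping from file URL to the names found in that file and renders the index as an HTML list; the input shape, the HTML layout, and the function names are illustrative choices, not taken from the source.

```python
from collections import defaultdict
from html import escape

def build_index(names_by_url):
    """Invert {url: names} into {name: [urls containing that name]}."""
    index = defaultdict(list)
    for url, names in names_by_url.items():
        for name in names:
            index[name].append(url)
    return dict(index)

def render_index(index):
    """Render the index as an HTML list with one link per containing file."""
    lines = ["<ul>"]
    for name in sorted(index):
        links = ", ".join(
            f'<a href="{escape(url, quote=True)}">{escape(url)}</a>'
            for url in sorted(index[name])
        )
        lines.append(f"  <li>{escape(name)}: {links}</li>")
    lines.append("</ul>")
    return "\n".join(lines)
```

Writing the string returned by `render_index` to a file served by any web server would correspond to the final step of making the generated file available for downloading.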
16. An electronic news service comprising publishing on the Internet a list of personal names, said names having been identified by searching for personal names in a multiplicity of electronic files, each published name being associated with a hyperlink or hyperlinks to Internet pages containing that name.
Specification