Identifying non-distinct names in a set of names
First Claim
1. A method for identifying non-distinct names in a set of names, comprising:
- obtaining, using a processor of a computer, the set of names for a first entity;
in response to comparing a first name and a second name in the set of names, determining that the first name is similar to the second name;
searching for initials in the first name and the second name;
in response to the search indicating that there is at least one initial in at least one of the first name and the second name, determining that the at least one initial matches a corresponding initial in another one of the first name and the second name or that there is a token in one of the first name and the second name that has a matching character in a first position of a corresponding token in the other of the first name and the second name; and
in response to determining that the at least one initial matches a corresponding initial in another one of the first name and the second name or that there is a token in one of the first name and the second name that has a matching character in the first position of the corresponding token in the other of the first name and the second name, marking one of the first name and the second name as a non-distinct name; and
applying a cross-entity scoring technique using distinct names in the set of names for the first entity and names in another set of names for a second entity.
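The comparison, initial-matching, and marking steps of this claim can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the helper names (`tokens`, `is_initial`, `similar`, `initials_match`, `split_distinct`), the use of `difflib.SequenceMatcher` with a 0.5 threshold as the similarity test, the position-by-position alignment of tokens, and the choice to mark the shorter of the two names as non-distinct.

```python
from difflib import SequenceMatcher


def tokens(name):
    """Split a name into lowercase tokens, ignoring periods and commas."""
    return name.replace(".", " ").replace(",", " ").lower().split()


def is_initial(token):
    """Assumption: a single-letter token counts as an initial."""
    return len(token) == 1 and token.isalpha()


def similar(a, b, threshold=0.5):
    """Stand-in similarity test; the claim does not name a metric or threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def initials_match(a, b):
    """True if at least one position holds an initial and, at every such
    position, both tokens start with the same character."""
    ta, tb = tokens(a), tokens(b)
    if len(ta) != len(tb):
        return False  # assumption: tokens are aligned position by position
    pairs = [(x, y) for x, y in zip(ta, tb) if is_initial(x) or is_initial(y)]
    return bool(pairs) and all(x[0] == y[0] for x, y in pairs)


def split_distinct(names):
    """Return (distinct, non_distinct) for one entity's set of names."""
    non_distinct = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in non_distinct or b in non_distinct:
                continue
            if similar(a, b) and initials_match(a, b):
                # Assumption: the shorter name carries less information.
                non_distinct.add(min(a, b, key=len))
    distinct = [n for n in names if n not in non_distinct]
    return distinct, sorted(non_distinct)


distinct, non_distinct = split_distinct(
    ["John A. Smith", "John Adam Smith", "Mary Jones"]
)
```

In this sketch, "John A. Smith" is marked non-distinct because its initial "A" matches the first character of "Adam" in the similar name "John Adam Smith", so only the remaining names would be passed on to cross-entity scoring.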
Abstract
Non-distinct names are identified in a set of names. The set of names is obtained for a first entity. In response to comparing a first name and a second name in the set of names, it is determined that the first name is similar to the second name. Initials in the first name and the second name are searched for. In response to the search indicating that there is at least one initial in at least one of the first name and the second name, it is determined that the at least one initial matches a corresponding initial in another one of the first name and the second name, and one of the first name and the second name is marked as a non-distinct name. A cross-entity scoring technique using distinct names in the set of names for the first entity and names in another set of names for a second entity is applied.
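The final step, scoring across entities using only the distinct names, might look like the following sketch. The abstract does not specify the scoring technique, so the best-pairwise-similarity rule here, the `difflib.SequenceMatcher` ratio used as the per-pair score, and the function names `name_score` and `cross_entity_score` are all hypothetical stand-ins.

```python
from difflib import SequenceMatcher
from itertools import product


def name_score(a, b):
    """Stand-in per-pair score in [0, 1]; not the patent's technique."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cross_entity_score(distinct_names, other_names):
    """Assumed rule: the best score over all (distinct name, other name) pairs.

    Returns 0.0 when either set of names is empty.
    """
    return max(
        (name_score(a, b) for a, b in product(distinct_names, other_names)),
        default=0.0,
    )


# Only the first entity's distinct names take part in the comparison.
score = cross_entity_score(
    ["John Adam Smith", "Mary Jones"],
    ["john adam smith", "Bob Stone"],
)
```

Because names already marked non-distinct are left out of `distinct_names`, the number of pairwise comparisons shrinks in proportion to the names removed.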
21 Claims
1. A method for identifying non-distinct names in a set of names, comprising:
obtaining, using a processor of a computer, the set of names for a first entity;
in response to comparing a first name and a second name in the set of names, determining that the first name is similar to the second name;
searching for initials in the first name and the second name;
in response to the search indicating that there is at least one initial in at least one of the first name and the second name, determining that the at least one initial matches a corresponding initial in another one of the first name and the second name or that there is a token in one of the first name and the second name that has a matching character in a first position of a corresponding token in the other of the first name and the second name; and
in response to determining that the at least one initial matches a corresponding initial in another one of the first name and the second name or that there is a token in one of the first name and the second name that has a matching character in the first position of the corresponding token in the other of the first name and the second name, marking one of the first name and the second name as a non-distinct name; and
applying a cross-entity scoring technique using distinct names in the set of names for the first entity and names in another set of names for a second entity.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer system for identifying non-distinct names in a set of names, comprising:
a processor; and
a storage device connected to the processor, wherein the storage device has stored thereon a program, and wherein the processor is configured to execute instructions of the program to perform operations, wherein the operations comprise:
obtaining the set of names for a first entity;
in response to comparing a first name and a second name in the set of names, determining that the first name is similar to the second name;
searching for initials in the first name and the second name;
in response to the search indicating that there is at least one initial in at least one of the first name and the second name, determining that the at least one initial matches a corresponding initial in another one of the first name and the second name or that there is a token in one of the first name and the second name that has a matching character in a first position of a corresponding token in the other of the first name and the second name; and
in response to determining that the at least one initial matches a corresponding initial in another one of the first name and the second name or that there is a token in one of the first name and the second name that has a matching character in the first position of the corresponding token in the other of the first name and the second name, marking one of the first name and the second name as a non-distinct name; and
applying a cross-entity scoring technique using distinct names in the set of names for the first entity and names in another set of names for a second entity.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer program product for identifying non-distinct names in a set of names, the computer program product comprising:
a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising:
computer readable program code, when executed by a processor of a computer, configured to perform:
obtaining the set of names for a first entity;
in response to comparing a first name and a second name in the set of names, determining that the first name is similar to the second name;
searching for initials in the first name and the second name;
in response to the search indicating that there is at least one initial in at least one of the first name and the second name, determining that the at least one initial matches a corresponding initial in another one of the first name and the second name or that there is a token in one of the first name and the second name that has a matching character in a first position of a corresponding token in the other of the first name and the second name; and
in response to determining that the at least one initial matches a corresponding initial in another one of the first name and the second name or that there is a token in one of the first name and the second name that has a matching character in the first position of the corresponding token in the other of the first name and the second name, marking one of the first name and the second name as a non-distinct name; and
applying a cross-entity scoring technique using distinct names in the set of names for the first entity and names in another set of names for a second entity.
Dependent claims: 16, 17, 18, 19, 20, 21
Specification