Identifying non-distinct names in a set of names

US 8,364,692 B1
Filed: 08/11/2011
Issued: 01/29/2013
Est. Priority Date: 08/11/2011
Status: Active Grant

1 Assignment

0 Petitions

Abstract

Non-distinct names are identified in a set of names. The set of names is obtained for a first entity. In response to comparing a first name and a second name in the set of names, it is determined that the first name is similar to the second name. The first name and the second name are searched for initials. In response to the search indicating that there is at least one initial in at least one of the first name and the second name, it is determined that the at least one initial matches a corresponding initial in the other of the first name and the second name, and one of the first name and the second name is marked as a non-distinct name. A cross-entity scoring technique is then applied using the distinct names in the set of names for the first entity and the names in another set of names for a second entity.
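The marking step described in the abstract (claim 1, together with dependent claim 2 for names without initials) can be sketched in Python. This is a minimal, hypothetical sketch, not the patentee's implementation: the names `tokens`, `is_initial`, `initials_agree`, and `split_distinct` are illustrative, and it assumes that names are split into tokens on whitespace, that an initial is a single letter optionally followed by a period, that "corresponding" tokens are those at the same position, and that the later name of a similar pair is the one marked. The similarity test is passed in as a function because the claims define it separately (claim 3), and the cross-entity scoring step is not sketched because the text does not specify it.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def tokens(name: str) -> List[str]:
    # Assumption: tokens are the whitespace-separated parts of a name.
    return name.split()


def is_initial(token: str) -> bool:
    # Assumption: an initial is one letter, optionally followed by a period ("J", "J.").
    core = token.rstrip(".")
    return len(core) == 1 and core.isalpha()


def initials_agree(first: str, second: str) -> bool:
    """Wherever either aligned token is an initial, its first character must
    match the first character of the corresponding token in the other name.
    Aligning tokens by position is an assumption."""
    for t1, t2 in zip(tokens(first), tokens(second)):
        if (is_initial(t1) or is_initial(t2)) and t1[0].lower() != t2[0].lower():
            return False
    return True


def split_distinct(
    names: List[str], is_similar: Callable[[str, str], bool]
) -> Tuple[List[str], List[str]]:
    """Return (distinct, non_distinct) names for one entity."""
    marked = set()  # indices of names marked non-distinct
    for i, j in combinations(range(len(names)), 2):
        if j in marked or not is_similar(names[i], names[j]):
            continue
        has_initials = any(is_initial(t) for t in tokens(names[i]) + tokens(names[j]))
        # No initials at all: mark directly (dependent claim 2).
        # Otherwise the initials must agree with their counterparts (claim 1).
        if not has_initials or initials_agree(names[i], names[j]):
            marked.add(j)  # Assumption: the later name of the pair is marked.
    distinct = [n for k, n in enumerate(names) if k not in marked]
    non_distinct = [n for k, n in enumerate(names) if k in marked]
    return distinct, non_distinct


if __name__ == "__main__":
    # Hypothetical stand-in similarity test: same surname, ignoring case.
    def same_surname(a: str, b: str) -> bool:
        return tokens(a)[-1].lower() == tokens(b)[-1].lower()

    names = ["John A. Smith", "John Smith", "J. A. Smith", "Jane B. Smith",
             "Mary Jones", "mary jones"]
    distinct, non_distinct = split_distinct(names, same_surname)
    print(distinct)      # ['John A. Smith', 'John Smith', 'Jane B. Smith', 'Mary Jones']
    print(non_distinct)  # ['J. A. Smith', 'mary jones']
```

In this reading, "J. A. Smith" is marked because each initial lines up with a token starting with the same letter, while "Jane B. Smith" stays distinct because "B." disagrees with "A."; only the distinct list would feed the cross-entity scoring step.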

22 Citations

21 Claims

1. A method for identifying non-distinct names in a set of names, comprising:
- obtaining, using a processor of a computer, the set of names for a first entity;
  
  in response to comparing a first name and a second name in the set of names, determining that the first name is similar to the second name;
  
  searching for initials in the first name and the second name;
  
  in response to the search indicating that there is at least one initial in at least one of the first name and the second name, determining that the at least one initial matches a corresponding initial in another one of the first name and the second name or that there is a token in one of the first name and the second name that has a matching character in a first position of a corresponding token in the other of the first name and the second name;
  
  in response to determining that the at least one initial matches a corresponding initial in another one of the first name and the second name or that there is a token in one of the first name and the second name that has a matching character in the first position of the corresponding token in the other of the first name and the second name, marking one of the first name and the second name as a non-distinct name; and
  
  applying a cross-entity scoring technique using distinct names in the set of names for the first entity and names in another set of names for a second entity.
- Dependent Claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1, further comprising:
    - in response to the search indicating that there are no initials in the first name and the second name, marking one of the first name and the second name as a non-distinct name.
  - 3. The method of claim 1, wherein the determination that the first name is similar to the second name further comprises:
    - calculating a common character count between the first name and the second name;
      
      calculating a transposition count between the first name and the second name; and
      
      determining that the first name and the second name are similar if the common character count equals a length of the first name and the transposition count is less than a configurable number.
  - 4. The method of claim 3, further comprising:
    - in response to determining that the common character count equals the length of the first name and that the transposition count is less than the configurable number, comparing one or more initial tokens.
  - 5. The method of claim 3, wherein the common character count is based on performing a character comparison between the first name and the second name by moving left-to-right to identify characters that match and are in a same relative position.
  - 6. The method of claim 3, wherein, for any character that has not been matched in the first name, the common character count is based on searching forward and backward within a configurable search range in the first name and the second name to identify matching characters.
  - 7. The method of claim 3, wherein calculating the transposition count further comprises:
    - counting a number of transpositions; and
      
      dividing the counted number of transpositions by two.
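Dependent claims 3 and 5 to 7 describe a similarity test built from a common character count and a transposition count, which closely resembles the matching stage of the Jaro string metric. The sketch below is a hypothetical Python reading of those claims: the function names, the lowercasing, the default search range (borrowed from Jaro's window of max(len)/2 - 1), and the default limit of 2 transpositions are assumptions, since the claims only call these values configurable. Transpositions are counted Jaro-style, as matched characters that appear in a different order in the two names, and the count is then halved as claim 7 states.

```python
from typing import Optional, Tuple


def common_and_transpositions(
    first: str, second: str, search_range: Optional[int] = None
) -> Tuple[int, float]:
    """Return (common character count, transposition count) for two names."""
    a, b = first.lower(), second.lower()  # Assumption: case-insensitive comparison.
    if search_range is None:
        # Assumption: Jaro-style default window; the claims only call it configurable.
        search_range = max(0, max(len(a), len(b)) // 2 - 1)

    a_matched = [False] * len(a)
    b_matched = [False] * len(b)

    # Claim 5: move left to right, matching characters in the same position.
    for i in range(min(len(a), len(b))):
        if a[i] == b[i]:
            a_matched[i] = b_matched[i] = True

    # Claim 6: for each still-unmatched character of the first name, search
    # backward and forward within the search range of the second name.
    for i, ch in enumerate(a):
        if a_matched[i]:
            continue
        for j in range(max(0, i - search_range), min(len(b), i + search_range + 1)):
            if not b_matched[j] and b[j] == ch:
                a_matched[i] = b_matched[j] = True
                break

    common = sum(a_matched)

    # Claim 7: count matched characters that appear in a different order in
    # the two names (Jaro-style), then divide that count by two.
    a_seq = [c for c, m in zip(a, a_matched) if m]
    b_seq = [c for c, m in zip(b, b_matched) if m]
    out_of_order = sum(1 for x, y in zip(a_seq, b_seq) if x != y)
    return common, out_of_order / 2


def is_similar(first: str, second: str, max_transpositions: float = 2,
               search_range: Optional[int] = None) -> bool:
    """Claim 3: similar if the common character count equals the length of the
    first name and the transposition count is below a configurable limit."""
    common, transpositions = common_and_transpositions(first, second, search_range)
    return common == len(first.lower()) and transpositions < max_transpositions


if __name__ == "__main__":
    print(common_and_transpositions("martha", "marhta"))  # (6, 1.0)
    print(is_similar("Jon", "John"))  # True
    print(is_similar("John", "Jon"))  # False
```

Because claim 3 compares the common character count with the length of the first name only, the test is asymmetric: under these assumptions "Jon" is similar to "John", but "John" is not similar to "Jon".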

8. A computer system for identifying non-distinct names in a set of names, comprising:
- a processor; and
  
  a storage device connected to the processor, wherein the storage device has stored thereon a program, and wherein the processor is configured to execute instructions of the program to perform operations, wherein the operations comprise:
  
  obtaining the set of names for a first entity;
  
  in response to comparing a first name and a second name in the set of names, determining that the first name is similar to the second name;
  
  searching for initials in the first name and the second name;
  
  in response to the search indicating that there is at least one initial in at least one of the first name and the second name, determining that the at least one initial matches a corresponding initial in another one of the first name and the second name or that there is a token in one of the first name and the second name that has a matching character in a first position of a corresponding token in the other of the first name and the second name;
  
  in response to determining that the at least one initial matches a corresponding initial in another one of the first name and the second name or that there is a token in one of the first name and the second name that has a matching character in the first position of the corresponding token in the other of the first name and the second name, marking one of the first name and the second name as a non-distinct name; and
  
  applying a cross-entity scoring technique using distinct names in the set of names for the first entity and names in another set of names for a second entity.
- Dependent Claims (9, 10, 11, 12, 13, 14)
  - 9. The computer system of claim 8, wherein the operations further comprise:
    - in response to the search indicating that there are no initials in the first name and the second name, marking one of the first name and the second name as a non-distinct name.
  - 10. The computer system of claim 8, wherein the operations for the determination that the first name is similar to the second name further comprise:
    - calculating a common character count between the first name and the second name;
      
      calculating a transposition count between the first name and the second name; and
      
      determining that the first name and the second name are similar if the common character count equals a length of the first name and the transposition count is less than a configurable number.
  - 11. The computer system of claim 10, wherein the operations further comprise:
    - in response to determining that the common character count equals the length of the first name and that the transposition count is less than the configurable number, comparing one or more initial tokens.
  - 12. The computer system of claim 10, wherein the common character count is based on performing a character comparison between the first name and the second name by moving left-to-right to identify characters that match and are in a same relative position.
  - 13. The computer system of claim 10, wherein, for any character that has not been matched in the first name, the common character count is based on searching forward and backward within a configurable search range in the first name and the second name to identify matching characters.
  - 14. The computer system of claim 10, wherein the operations for calculating the transposition count further comprise:
    - counting a number of transpositions; and
      
      dividing the counted number of transpositions by two.

15. A computer program product for identifying non-distinct names in a set of names, the computer program product comprising:
- a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising:
  
  computer readable program code, when executed by a processor of a computer, configured to perform:
  
  obtaining the set of names for a first entity;
  
  in response to comparing a first name and a second name in the set of names, determining that the first name is similar to the second name;
  
  searching for initials in the first name and the second name;
  
  in response to the search indicating that there is at least one initial in at least one of the first name and the second name, determining that the at least one initial matches a corresponding initial in another one of the first name and the second name or that there is a token in one of the first name and the second name that has a matching character in a first position of a corresponding token in the other of the first name and the second name;
  
  in response to determining that the at least one initial matches a corresponding initial in another one of the first name and the second name or that there is a token in one of the first name and the second name that has a matching character in the first position of the corresponding token in the other of the first name and the second name, marking one of the first name and the second name as a non-distinct name; and
  
  applying a cross-entity scoring technique using distinct names in the set of names for the first entity and names in another set of names for a second entity.
- Dependent Claims (16, 17, 18, 19, 20, 21)
  - 16. The computer program product of claim 15, wherein the computer readable program code, when executed by the processor of the computer, is configured to perform:
    - in response to the search indicating that there are no initials in the first name and the second name, marking one of the first name and the second name as a non-distinct name.
  - 17. The computer program product of claim 15, wherein, for the determination that the first name is similar to the second name, the computer readable program code, when executed by the processor of the computer, is configured to perform:
    - calculating a common character count between the first name and the second name;
      
      calculating a transposition count between the first name and the second name; and
      
      determining that the first name and the second name are similar if the common character count equals a length of the first name and the transposition count is less than a configurable number.
  - 18. The computer program product of claim 17, wherein the computer readable program code, when executed by the processor of the computer, is configured to perform:
    - in response to determining that the common character count equals the length of the first name and that the transposition count is less than the configurable number, comparing one or more initial tokens.
  - 19. The computer program product of claim 17, wherein the common character count is based on performing a character comparison between the first name and the second name by moving left-to-right to identify characters that match and are in a same relative position.
  - 20. The computer program product of claim 17, wherein, for any character that has not been matched in the first name, the common character count is based on searching forward and backward within a configurable search range in the first name and the second name to identify matching characters.
  - 21. The computer program product of claim 17, wherein, for calculating the transposition count, the computer readable program code, when executed by the processor of the computer, is configured to perform:
    - counting a number of transpositions; and
      
      dividing the counted number of transpositions by two.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Allen, Thomas B.; Macy, Brian E.; Vincent, Caroljayne J.
Primary Examiner(s): Nguyen, Cam-Linh
Application Number: US13/208,189
Publication Number: US 20130041895A1
Time in Patent Office: 537 Days
Field of Search: 707/728, 707/749, 707/758, 707/780
US Class Current: 707/758
CPC Class Codes: G06F 16/90344 by using string matching te...