NAME CLASSIFIER TECHNIQUE

US 20100114812A1
Filed: 01/06/2010
Published: 05/06/2010
Est. Priority Date: 11/23/2004
Status: Active Grant

Abstract

A particular technique for classifying a name includes accessing a name; dividing the name into a series of first n-grams; forming multiple concatenated second n-grams by concatenating pairs of the first n-grams; for each of multiple groups, for each of the second n-grams, determining the term frequency-group frequency score; for each of the multiple groups, summing up the term frequency-group frequency scores for each second n-gram for that group; and determining a likelihood that the name belongs to one group of the multiple groups based on the summed scores, wherein a largest summed score indicates a greater likelihood that the name belongs to the one group.
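
The steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not say how the first n-grams are cut or which pairs are concatenated, so the sketch assumes consecutive two-character chunks and adjacent pairs. The training names, group labels, and all function names are hypothetical, and an n-gram never seen in a group is assumed to contribute a score of zero. The score follows the equation recited in the claims: (0.5 + 0.5 * count in group / count of the most common n-gram in the group) * (count in group / count across all groups).

```python
from collections import Counter


def first_ngrams(name, n=2):
    """Split a name into consecutive chunks of length n (assumed scheme)."""
    s = name.lower()
    return [s[i:i + n] for i in range(0, len(s), n)]


def second_ngrams(name, n=2):
    """Concatenate each adjacent pair of first n-grams (assumed pairing)."""
    grams = first_ngrams(name, n)
    return [a + b for a, b in zip(grams, grams[1:])]


def build_counts(training, n=2):
    """Count second n-grams per group from hypothetical training names."""
    return {
        group: Counter(g for name in names for g in second_ngrams(name, n))
        for group, names in training.items()
    }


def tf_gf(gram, group, counts):
    """Term frequency-group frequency score for one n-gram in one group."""
    in_group = counts[group][gram]
    if in_group == 0:
        # Assumption: an n-gram unseen in this group contributes nothing.
        return 0.0
    most_common = max(counts[group].values())
    in_all_groups = sum(c[gram] for c in counts.values())
    return (0.5 + 0.5 * in_group / most_common) * (in_group / in_all_groups)


def summed_scores(name, counts, n=2):
    """Sum the scores of the name's second n-grams for each group."""
    grams = second_ngrams(name, n)
    return {g: sum(tf_gf(x, g, counts) for x in grams) for g in counts}


# Toy, made-up training data purely for illustration.
training = {
    "group_a": ["anderson", "andersen", "sanders"],
    "group_b": ["tanaka", "nakata", "takana"],
}
counts = build_counts(training)
scores = summed_scores("sanderson", counts)
best = max(scores, key=scores.get)  # group with the largest summed score
```

With this toy data, "sanderson" shares the n-grams "sand" and "nder" with group_a only, so group_a receives the largest summed score.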

21 Claims

1. A method, comprising:
- accessing a name;
  
  dividing, using a computer including a processor, the name into a series of first n-grams;
  
  forming multiple concatenated second n-grams by concatenating pairs of the first n-grams;
  
  for each of multiple groups, for each of the second n-grams, determining the term frequency-group frequency score using the equation:
  
  (0.5 + (0.5 * (number of times the second n-gram occurs in a group)) / (number of times a most common n-gram occurs in the group)) * ((number of times the second n-gram occurs in the group) / (number of times the second n-gram occurs in the multiple groups));
  
  for each of the multiple groups, summing up the term frequency-group frequency scores for each second n-gram for that group; and
  
  determining a likelihood that the name belongs to one group of the multiple groups based on the summed scores, wherein a largest summed score indicates a greater likelihood that the name belongs to the one group.
- - 2. The method of claim 1, wherein each of the multiple groups comprises one of a language and a culture.
  - 3. The method of claim 1, further comprising:
    - normalizing a likelihood that the name belongs to the one group.
  - 4. The method of claim 1, wherein the one group is a first group and further comprising:
    - determining a likelihood that the name belongs to a second group based on the summed scores, wherein a second largest score indicates the likelihood that the name belongs to the second group.
  - 5. The method of claim 4, further comprising:
    - classifying the name as belonging to either the first group or the second group based on the likelihood that the name belongs to the first group and the likelihood that the name belongs to the second group.
  - 6. The method of claim 1, further comprising:
    - determining that the name is a surname;
      
      assigning a surname weight to the name based on the determination that the name is a surname;
      
      determining a weighted likelihood that the surname belongs to a first group by multiplying a likelihood that the surname belongs to a first group by the surname weight;
      
      accessing a given name that corresponds to the surname, wherein the given name and the surname form a complete name;
      
      determining a likelihood that the given name belongs to the first group;
      
      assigning a given-name weight to the given name;
      
      determining a weighted likelihood that the given name belongs to the first group by multiplying a likelihood that the given name belongs to the first group by the given-name weight; and
      
      determining a likelihood that the complete name belongs to the first group by adding the weighted likelihood that the given name belongs to the first group and the weighted likelihood that the surname belongs to the first group.
  - 7. The method of claim 1, wherein the name is a first name and further comprising:
    - determining that the first name occupies a given name field of a larger name;
      
      determining that a second name occupies a second given name field of the larger name, wherein the first name and the second name form a complete given name;
      
      accessing the second name;
      
      determining a likelihood that the second name belongs to the first group; and
      
      determining a likelihood that the complete given name belongs to the first group by averaging the likelihood that the second name belongs to the first group and the likelihood that the complete name belongs to the first group.
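
The combinations recited in claims 6 and 7 reduce to a weighted sum and a plain average. The sketch below assumes the per-name likelihoods have already been computed. The weight values 0.6 and 0.4 and the function names are hypothetical, since the claims do not fix them.

```python
def weighted_name_likelihood(surname_likelihood, given_likelihood,
                             surname_weight=0.6, given_weight=0.4):
    """Claim 6 style combination: weight each part, then add.

    The default weights are illustrative assumptions only.
    """
    weighted_surname = surname_likelihood * surname_weight
    weighted_given = given_likelihood * given_weight
    return weighted_surname + weighted_given


def averaged_likelihood(likelihood_a, likelihood_b):
    """Claim 7 style combination: average two likelihoods."""
    return (likelihood_a + likelihood_b) / 2.0


# Hypothetical likelihoods that each name part belongs to the first group.
complete_name = weighted_name_likelihood(0.8, 0.5)
complete_given_name = averaged_likelihood(0.5, 0.7)
```

Keeping the weights as parameters lets a caller favor the surname or the given name depending on which is more informative for the groups in question.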

8. A system, comprising:
- hardware logic performing operations, the operations comprising:
  
  accessing a name;
  
  dividing, using a computer including a processor, the name into a series of first n-grams;
  
  forming multiple concatenated second n-grams by concatenating pairs of the first n-grams;
  
  for each of multiple groups, for each of the second n-grams, determining the term frequency-group frequency score using the equation:
  
  (0.5 + (0.5 * (number of times the second n-gram occurs in a group)) / (number of times a most common n-gram occurs in the group)) * ((number of times the second n-gram occurs in the group) / (number of times the second n-gram occurs in the multiple groups));
  
  for each of the multiple groups, summing up the term frequency-group frequency scores for each second n-gram for that group; and
  
  determining a likelihood that the name belongs to one group of the multiple groups based on the summed scores, wherein a largest summed score indicates a greater likelihood that the name belongs to the one group.
- - 9. The system of claim 8, wherein each of the multiple groups comprises one of a language and a culture.
  - 10. The system of claim 8, wherein the operations further comprise:
    - normalizing a likelihood that the name belongs to the one group.
  - 11. The system of claim 8, wherein the one group is a first group and wherein the operations further comprise:
    - determining a likelihood that the name belongs to a second group based on the summed scores, wherein a second largest score indicates the likelihood that the name belongs to the second group.
  - 12. The system of claim 11, wherein the operations further comprise:
    - classifying the name as belonging to either the first group or the second group based on the likelihood that the name belongs to the first group and the likelihood that the name belongs to the second group.
  - 13. The system of claim 8, wherein the operations further comprise:
    - determining that the name is a surname;
      
      assigning a surname weight to the name based on the determination that the name is a surname;
      
      determining a weighted likelihood that the surname belongs to a first group by multiplying a likelihood that the surname belongs to a first group by the surname weight;
      
      accessing a given name that corresponds to the surname, wherein the given name and the surname form a complete name;
      
      determining a likelihood that the given name belongs to the first group;
      
      assigning a given-name weight to the given name;
      
      determining a weighted likelihood that the given name belongs to the first group by multiplying a likelihood that the given name belongs to the first group by the given-name weight; and
      
      determining a likelihood that the complete name belongs to the first group by adding the weighted likelihood that the given name belongs to the first group and the weighted likelihood that the surname belongs to the first group.
  - 14. The system of claim 8, wherein the name is a first name and wherein the operations further comprise:
    - determining that the first name occupies a given name field of a larger name;
      
      determining that a second name occupies a second given name field of the larger name, wherein the first name and the second name form a complete given name;
      
      accessing the second name;
      
      determining a likelihood that the second name belongs to the first group; and
      
      determining a likelihood that the complete given name belongs to the first group by averaging the likelihood that the second name belongs to the first group and the likelihood that the complete name belongs to the first group.
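
Claims 10 through 12 (like claims 3 through 5) cover normalizing a likelihood and comparing the largest and second largest summed scores. One way to sketch this, assuming normalization means dividing each summed score by the total across groups (the claims do not specify the method), is shown below. The scores and function names are hypothetical.

```python
def normalize(scores):
    """Scale summed scores so they add up to 1 (assumed normalization)."""
    total = sum(scores.values())
    if total == 0:
        return {group: 0.0 for group in scores}
    return {group: s / total for group, s in scores.items()}


def top_two(likelihoods):
    """Return the two (group, likelihood) pairs with the largest values."""
    ranked = sorted(likelihoods.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:2]


# Hypothetical summed scores for three groups.
scores = {"group_a": 3.0, "group_b": 1.0, "group_c": 0.0}
likelihoods = normalize(scores)
(first_group, first_p), (second_group, second_p) = top_two(likelihoods)
```

A classifier could then choose between the first and second group using both likelihoods, for example by requiring a minimum margin between them. That decision rule is left open here because the claims do not define one.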

15. A computer program product comprising a computer readable storage medium including a computer readable program, wherein the computer readable program when executed by a processor on a computer causes the computer to:
- access a name;
  
  divide the name into a series of first n-grams;
  
  forming multiple concatenated second n-grams by concatenating pairs of the first n-grams;
  
  for each of multiple groups, for each of the second n-grams, determining the term frequency-group frequency score using the equation:
  
  (0.5 + (0.5 * (number of times the second n-gram occurs in a group)) / (number of times a most common n-gram occurs in the group)) * ((number of times the second n-gram occurs in the group) / (number of times the second n-gram occurs in the multiple groups));
  
  for each of the multiple groups, summing up the term frequency-group frequency scores for each second n-gram for that group; and
  
  determining a likelihood that the name belongs to one group of the multiple groups based on the summed scores, wherein a largest summed score indicates a greater likelihood that the name belongs to the one group.
- - 16. The computer program product of claim 15, wherein each of the multiple groups comprises one of a language and a culture.
  - 17. The computer program product of claim 15, wherein the computer readable program when executed by the processor on the computer causes the computer to:
    - normalize a likelihood that the name belongs to the one group.
  - 18. The computer program product of claim 15, wherein the one group is a first group and wherein the computer readable program when executed by the processor on the computer causes the computer to:
    - determine a likelihood that the name belongs to a second group based on the summed scores, wherein a second largest score indicates the likelihood that the name belongs to the second group.
  - 19. The computer program product of claim 18, wherein the computer readable program when executed by the processor on the computer causes the computer to:
    - classify the name as belonging to either the first group or second group based on the likelihood that the name belongs to the first group and the likelihood that the name belongs to the second group.
  - 20. The computer program product of claim 15, wherein the computer readable program when executed by the processor on the computer causes the computer to:
    - determine that the name is a surname;
      
      assign a surname weight to the name based on the determination that the name is a surname;
      
      determine a weighted likelihood that the surname belongs to a first group by multiplying a likelihood that the surname belongs to a first group by the surname weight;
      
      access a given name that corresponds to the surname, wherein the given name and the surname form a complete name;
      
      determine a likelihood that the given name belongs to the first group;
      
      assign a given-name weight to the given name;
      
      determine a weighted likelihood that the given name belongs to the first group by multiplying a likelihood that the given name belongs to the first group by the given-name weight; and
      
      determine a likelihood that the complete name belongs to the first group by adding the weighted likelihood that the given name belongs to the first group and the weighted likelihood that the surname belongs to the first group.
  - 21. The computer program product of claim 15, wherein the name is a first name and wherein the computer readable program when executed by the processor on the computer causes the computer to:
    - determine that the first name occupies a given name field of a larger name;
      
      determine that a second name occupies a second given name field of the larger name, wherein the first name and the second name form a complete given name;
      
      access the second name;
      
      determine a likelihood that the second name belongs to the first group; and
      
      determine a likelihood that the complete given name belongs to the first group by averaging the likelihood that the second name belongs to the first group and the likelihood that the complete name belongs to the first group.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Williams, Charles K.
Granted Patent: US 8,229,737 B2
US Class Current: 706/52
CPC Class Codes: G06F 16/353 into predefined classes
