NAME CLASSIFIER TECHNIQUE
First Claim
1. A method, comprising:
accessing a name;
dividing, using a computer including a processor, the name into a series of first n-grams;
forming multiple concatenated second n-grams by concatenating pairs of the first n-grams;
for each of multiple groups, for each of the second n-grams, determining the term frequency-group frequency score using equation:
(0.5 + (0.5 * (number of times the second n-gram occurs in a group)) / (number of times a most common n-gram occurs in the group)) * ((number of times the second n-gram occurs in the group) / (number of times the second n-gram occurs in the multiple groups));
for each of the multiple groups, summing up the term frequency-group frequency scores for each second n-gram for that group; and
determining a likelihood that the name belongs to one group of the multiple groups based on the summed scores, wherein a largest summed score indicates a greater likelihood that the name belongs to the one group.
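For readability, the scoring equation recited in the claim can be written compactly. The symbols are shorthand introduced here and do not appear in the claim: c(t, g) is the number of times second n-gram t occurs in group g, G is the set of groups, and T is the list of second n-grams formed from the name. Reading "occurs in the multiple groups" as a sum over all groups is an interpretation, not something the claim states explicitly.

```latex
\[
\mathrm{TFGF}(t, g) =
  \left( 0.5 + 0.5 \cdot \frac{c(t, g)}{\max_{t'} c(t', g)} \right)
  \cdot \frac{c(t, g)}{\sum_{g' \in G} c(t, g')},
\qquad
S(g) = \sum_{t \in T} \mathrm{TFGF}(t, g)
\]
```

Under this reading, the group g with the largest summed score S(g) is the one the name most likely belongs to.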
Abstract
A particular technique for classifying a name includes accessing a name; dividing the name into a series of first n-grams; forming multiple concatenated second n-grams by concatenating pairs of the first n-grams; for each of multiple groups, for each of the second n-grams, determining the term frequency-group frequency score; for each of the multiple groups, summing up the term frequency-group frequency scores for each second n-gram for that group; and determining a likelihood that the name belongs to one group of the multiple groups based on the summed scores, wherein a largest summed score indicates a greater likelihood that the name belongs to the one group.
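The steps in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the first n-grams are taken to be non-overlapping character bigrams, the second n-grams are formed from adjacent pairs only, and the per-group counts come from a hypothetical `training` dictionary of example names. All function names are invented for this sketch.

```python
from collections import Counter


def first_ngrams(name, n=2):
    """Divide a name into non-overlapping character n-grams (assumed scheme).

    The last n-gram may be shorter than n when the length is not a multiple of n.
    """
    s = name.lower()
    return [s[i:i + n] for i in range(0, len(s), n)]


def second_ngrams(name, n=2):
    """Concatenate each adjacent pair of first n-grams (assumed pairing)."""
    grams = first_ngrams(name, n)
    return [a + b for a, b in zip(grams, grams[1:])]


def build_group_counts(training):
    """Count second n-grams per group.

    `training` is a hypothetical dict: group label -> list of example names.
    """
    return {
        group: Counter(t for nm in names for t in second_ngrams(nm))
        for group, names in training.items()
    }


def tfgf(term, group, counts):
    """Term frequency-group frequency score for one second n-gram and one group."""
    in_group = counts[group][term]                         # occurrences in this group
    most_common = max(counts[group].values(), default=0)   # count of the most common n-gram
    in_all = sum(c[term] for c in counts.values())         # occurrences across all groups
    if most_common == 0 or in_all == 0:
        return 0.0  # assumption: an unseen n-gram contributes nothing
    return (0.5 + 0.5 * in_group / most_common) * (in_group / in_all)


def classify(name, counts):
    """Sum the scores per group and return the best group plus all summed scores."""
    terms = second_ngrams(name)
    scores = {g: sum(tfgf(t, g, counts) for t in terms) for g in counts}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    counts = build_group_counts({"A": ["anna", "hanna"], "B": ["otto"]})
    print(classify("anna", counts))
```

The zero-denominator guard is a design choice of the sketch; the abstract does not say how an n-gram absent from every group should be scored.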
21 Claims
1. A method, comprising:
accessing a name;
dividing, using a computer including a processor, the name into a series of first n-grams;
forming multiple concatenated second n-grams by concatenating pairs of the first n-grams;
for each of multiple groups, for each of the second n-grams, determining the term frequency-group frequency score using equation:
(0.5 + (0.5 * (number of times the second n-gram occurs in a group)) / (number of times a most common n-gram occurs in the group)) * ((number of times the second n-gram occurs in the group) / (number of times the second n-gram occurs in the multiple groups));
for each of the multiple groups, summing up the term frequency-group frequency scores for each second n-gram for that group; and
determining a likelihood that the name belongs to one group of the multiple groups based on the summed scores, wherein a largest summed score indicates a greater likelihood that the name belongs to the one group.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A system, comprising:
hardware logic performing operations, the operations comprising:
accessing a name;
dividing, using a computer including a processor, the name into a series of first n-grams;
forming multiple concatenated second n-grams by concatenating pairs of the first n-grams;
for each of multiple groups, for each of the second n-grams, determining the term frequency-group frequency score using equation:
(0.5 + (0.5 * (number of times the second n-gram occurs in a group)) / (number of times a most common n-gram occurs in the group)) * ((number of times the second n-gram occurs in the group) / (number of times the second n-gram occurs in the multiple groups));
for each of the multiple groups, summing up the term frequency-group frequency scores for each second n-gram for that group; and
determining a likelihood that the name belongs to one group of the multiple groups based on the summed scores, wherein a largest summed score indicates a greater likelihood that the name belongs to the one group.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A computer program product comprising a computer readable storage medium including a computer readable program, wherein the computer readable program when executed by a processor on a computer causes the computer to:
access a name;
divide the name into a series of first n-grams;
forming multiple concatenated second n-grams by concatenating pairs of the first n-grams;
for each of multiple groups, for each of the second n-grams, determining the term frequency-group frequency score using equation:
(0.5 + (0.5 * (number of times the second n-gram occurs in a group)) / (number of times a most common n-gram occurs in the group)) * ((number of times the second n-gram occurs in the group) / (number of times the second n-gram occurs in the multiple groups));
for each of the multiple groups, summing up the term frequency-group frequency scores for each second n-gram for that group; and
determining a likelihood that the name belongs to one group of the multiple groups based on the summed scores, wherein a largest summed score indicates a greater likelihood that the name belongs to the one group.
Dependent claims: 16, 17, 18, 19, 20, 21.
Specification