Identifying related names
Abstract
Provided are techniques for identifying related names. A collection of names from different languages is stored, wherein each of the names has a native orthographic form and a romanized form. An input name is received in a known encoding scheme. An alphabet of the input name is determined based on the known encoding scheme. One or more romanized names are generated based on the input name and the determined alphabet. Culture-sensitive regularization rules are applied to create an additional romanized name. The one or more romanized names and the additional romanized name are matched against the romanized names in the collection of names from the different languages. Data store records that have romanized names that match the one or more romanized names or the additional romanized name are returned.
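The flow summarized in the abstract can be illustrated with a minimal Python sketch. Everything named here is hypothetical and chosen only for illustration: the `NameRecord` type, the toy `COLLECTION`, the two incomplete Cyrillic tables in `SCHEMAS`, the regular expressions in `RULES`, and the stub functions `detect_alphabet` and `identify_culture`. Matching is done by exact string equality, which is an assumption; the abstract does not say how matches are scored.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class NameRecord:
    native: str     # native orthographic form
    romanized: str  # romanized form


# Toy collection of stored names (hypothetical data).
COLLECTION = [
    NameRecord("Юрий", "yury"),
    NameRecord("Юрий", "iurii"),
    NameRecord("محمد", "muhammad"),
]

# Two illustrative, incomplete transliteration schemas per alphabet (assumed).
SCHEMAS = {
    "cyrillic": [
        {"ю": "yu", "р": "r", "и": "i", "й": "y"},
        {"ю": "iu", "р": "r", "и": "i", "й": "i"},
    ],
}

# Culture-sensitive regularization rules as (pattern, replacement) pairs (assumed).
RULES = {
    "russian": [(r"iy$", "y")],
    "arabic": [(r"^mo", "mu")],
}


def detect_alphabet(name: str) -> str:
    # Stand-in for the encoding-based step: a simple code point range check.
    if any("\u0400" <= ch <= "\u04ff" for ch in name):
        return "cyrillic"
    return "latin"


def romanize(name: str, alphabet: str) -> list[str]:
    # Produce one romanized form per transliteration schema.
    lowered = name.lower()
    schemas = SCHEMAS.get(alphabet)
    if not schemas:
        return [lowered]  # already in Latin script, or no schema available
    return ["".join(table.get(ch, ch) for ch in lowered) for table in schemas]


def identify_culture(alphabet: str) -> str:
    # Stub: a real system would classify the name itself, not only its alphabet.
    return "russian" if alphabet == "cyrillic" else "unknown"


def regularize(romanized: str, culture: str) -> str:
    # Apply only the rules selected for the identified culture.
    for pattern, replacement in RULES.get(culture, []):
        romanized = re.sub(pattern, replacement, romanized)
    return romanized


def find_related(name: str) -> list[NameRecord]:
    alphabet = detect_alphabet(name)
    candidates = romanize(name, alphabet)
    culture = identify_culture(alphabet)
    additional = regularize(candidates[0], culture)
    keys = set(candidates) | {additional}
    # Exact-match lookup against the stored romanized forms.
    return [record for record in COLLECTION if record.romanized in keys]
```

In this sketch, `find_related("Юрий")` generates "yuriy" and "iurii" from the two schemas, derives "yury" from the Russian rule, and returns the two stored records whose romanized forms are "yury" and "iurii".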
93 Citations
18 Claims
1. A method for identifying related names, comprising:
storing, using a processor of a computer, a collection of names from different languages, wherein each of the names has a native orthographic form and a romanized form;
receiving an input name in a known encoding scheme;
determining an alphabet of the input name based on the known encoding scheme;
generating romanized names based on the input name and the determined alphabet using multiple transliteration schemas;
identifying a culture associated with the input name;
selecting one or more culture-sensitive regularization rules for the identified culture, wherein there are different culture-sensitive regularization rules for different cultures;
applying the selected one or more culture-sensitive regularization rules to one of the romanized names to create an additional romanized name;
matching the romanized names and the additional romanized name against the romanized names in the collection of names from the different languages; and
returning data store records that have romanized names that match at least one of the romanized names and the additional romanized name.
Dependent claims: 2, 3, 4, 5, 6.
7. A computer system for identifying related names, comprising:
a processor; and
a storage device connected to the processor, wherein the storage device has stored thereon a program, and wherein the processor is configured to execute instructions of the program to perform operations, wherein the operations comprise:
storing a collection of names from different languages, wherein each of the names has a native orthographic form and a romanized form;
receiving an input name in a known encoding scheme;
determining an alphabet of the input name based on the known encoding scheme;
generating romanized names based on the input name and the determined alphabet using multiple transliteration schemas;
identifying a culture associated with the input name;
selecting one or more culture-sensitive regularization rules for the identified culture, wherein there are different culture-sensitive regularization rules for different cultures;
applying the selected one or more culture-sensitive regularization rules to one of the romanized names to create an additional romanized name;
matching the romanized names and the additional romanized name against the romanized names in the collection of names from the different languages; and
returning data store records that have romanized names that match at least one of the romanized names and the additional romanized name.
Dependent claims: 8, 9, 10, 11, 12.
13. A computer program product for identifying related names, the computer program product comprising:
a non-transitory computer readable storage medium having computer readable program code embodied therewith, wherein the computer readable program code, when executed by a processor of a computer, is configured to perform:
storing a collection of names from different languages, wherein each of the names has a native orthographic form and a romanized form;
receiving an input name in a known encoding scheme;
determining an alphabet of the input name based on the known encoding scheme;
generating romanized names based on the input name and the determined alphabet using multiple transliteration schemas;
identifying a culture associated with the input name;
selecting one or more culture-sensitive regularization rules for the identified culture, wherein there are different culture-sensitive regularization rules for different cultures;
applying the selected one or more culture-sensitive regularization rules to one of the romanized names to create an additional romanized name;
matching the romanized names and the additional romanized name against the romanized names in the collection of names from the different languages; and
returning data store records that have romanized names that match at least one of the romanized names and the additional romanized name.
Dependent claims: 14, 15, 16, 17, 18.
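The claim element "determining an alphabet of the input name based on the known encoding scheme" could be approximated as follows. This is a sketch under stated assumptions, not the claimed implementation: the function name `determine_alphabet` is hypothetical, the input is assumed to arrive as raw bytes plus a Python codec name, and the alphabet is inferred from the first word of each character's Unicode name (for example "CYRILLIC" or "ARABIC"), which is a heuristic rather than the formal Unicode script property.

```python
import unicodedata
from collections import Counter


def determine_alphabet(raw: bytes, encoding: str) -> str:
    """Guess the alphabet of a name given its bytes and known encoding."""
    # Decode with the known encoding scheme (e.g. "utf-8", "koi8-r").
    text = raw.decode(encoding)
    scripts = Counter()
    for ch in text:
        if ch.isalpha():
            # Unicode names start with the script, e.g. "CYRILLIC SMALL LETTER A".
            first_word = unicodedata.name(ch, "UNKNOWN").split()[0]
            scripts[first_word] += 1
    if not scripts:
        return "UNKNOWN"
    # The most frequent script among the letters is taken as the alphabet.
    return scripts.most_common(1)[0][0]
```

For example, `determine_alphabet("Юрий".encode("koi8-r"), "koi8-r")` decodes the bytes with the stated encoding and reports "CYRILLIC" because every letter's Unicode name begins with that word. Taking the majority script is a design choice that tolerates an occasional stray character from another alphabet.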
Specification