Identifying related names

US 8,812,300 B2
Filed: 09/22/2011
Issued: 08/19/2014
Est. Priority Date: 03/25/1998
Status: Expired due to Fees

Abstract

Provided are techniques for identifying related names. A collection of names from different languages is stored, wherein each of the names has a native orthographic form and a romanized form. An input name is received in a known encoding scheme. An alphabet of the input name is determined based on the known encoding scheme. One or more romanized names are generated based on the query name and the determined query name alphabet. Culture-sensitive regularization rules are applied to create an additional romanized name. The one or more romanized names and the additional romanized name are matched against the romanized names in the collection of names from the different languages. Data store records that have romanized names that match the one or more romanized names or the additional romanized name are returned.
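
The first steps in the abstract (receiving a name in a known encoding, determining its alphabet, and generating several romanized forms) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `SCHEMAS` table, and the two toy schemas `schema_a` and `schema_b` are hypothetical, cover only the letters needed for one example name, and stand in for full transliteration standards. The sketch also assumes that the alphabet can be read from Unicode character names once the input has been decoded.

```python
import unicodedata

# Hypothetical, deliberately tiny transliteration tables (assumption for illustration).
SHARED = {"р": "r", "и": "i"}
SCHEMAS = {
    "CYRILLIC": {
        "schema_a": {**SHARED, "ю": "yu", "й": "y"},
        "schema_b": {**SHARED, "ю": "ju", "й": "j"},
    },
}

def receive_name(raw: bytes, encoding: str) -> str:
    """Decode an input name whose encoding scheme is known in advance."""
    return raw.decode(encoding)

def determine_alphabet(name: str) -> str:
    """Return the script of the first alphabetic character, e.g. 'CYRILLIC'."""
    for ch in name:
        if ch.isalpha():
            # Unicode names begin with the script: "CYRILLIC CAPITAL LETTER YU".
            return unicodedata.name(ch).split()[0]
    raise ValueError("no alphabetic characters in name")

def romanize(name: str) -> list[str]:
    """Generate one romanized form per transliteration schema for the alphabet."""
    alphabet = determine_alphabet(name)
    if alphabet == "LATIN":
        return [name]  # already in Latin script, nothing to transliterate
    results = []
    for table in SCHEMAS.get(alphabet, {}).values():
        romanized = "".join(table.get(ch, ch) for ch in name.lower())
        if romanized not in results:
            results.append(romanized)
    return results

name = receive_name("Юрий".encode("koi8_r"), "koi8_r")
print(determine_alphabet(name), romanize(name))
```

Keeping one table per schema makes it straightforward to produce every variant spelling of the same name, which is what lets a single input match records that were romanized under different conventions.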

93 Citations

18 Claims

1. A method for identifying related names, comprising:
- storing, using a processor of a computer, a collection of names from different languages, wherein each of the names has a native orthographic form and a romanized form;
  
  receiving an input name in a known encoding scheme;
  
  determining an alphabet of the input name based on the known encoding scheme;
  
  generating romanized names based on the input name and the determined alphabet using multiple transliteration schemas;
  
  identifying a culture associated with the input name;
  
  selecting one or more culture-sensitive regularization rules for the identified culture, wherein there are different culture-sensitive regularization rules for different cultures;
  
  applying the selected one or more culture-sensitive regularization rules to one of the romanized names to create an additional romanized name;
  
  matching the romanized names and the additional romanized name against the romanized names in the collection of names from the different languages; and
  
  returning data store records that have romanized names that match at least one of the romanized names and the additional romanized name.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method of claim 1, wherein the culture-sensitive regularization rules are applied using an automatic cultural name classifier.
  - 3. The method of claim 1, wherein the culture-sensitive regularization rules are applied using a user-supplied cultural value.
  - 4. The method of claim 1, further comprising:
    - producing multiple transliterated forms of the input name, wherein at least one of the transliterated forms is a regularized form derived from the application of the culture-sensitive regularization rules to a transliterated form.
  - 5. The method of claim 1, wherein there are different culture-sensitive regularization rules for different languages.
  - 6. The method of claim 1, wherein a culture-sensitive regularization rule collapses two representations of a sound into a single symbol.
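
The regularization and matching steps of claim 1, together with the sound-collapsing rule of claim 6, might look like the sketch below. Everything named here is an assumption made for illustration: the `REGULARIZATION_RULES` table, its two example rules, the `COLLECTION` records, and the use of exact string equality for matching (the claims do not specify how a match is decided).

```python
import re

# Hypothetical per-culture rules (assumption). Each rule collapses several
# spellings of one sound into a single symbol, in the spirit of claim 6.
REGULARIZATION_RULES = {
    "russian": [(re.compile(r"(iy|ij)$"), "i")],  # final -iy / -ij -> -i
    "arabic": [(re.compile(r"ou|oo"), "u")],      # ou / oo -> u
}

# Hypothetical stored collection: native form plus romanized form per record.
COLLECTION = [
    {"id": 1, "native": "Юрий", "romanized": "yuri"},
    {"id": 2, "native": "Юрий", "romanized": "jurij"},
    {"id": 3, "native": "محمد", "romanized": "muhammad"},
]

def regularize(romanized: str, culture: str) -> str:
    """Apply the rules selected for the identified culture, in order."""
    for pattern, replacement in REGULARIZATION_RULES.get(culture, []):
        romanized = pattern.sub(replacement, romanized)
    return romanized

def find_related(romanized_names: list[str], culture: str) -> list[dict]:
    """Match the romanized names plus one regularized form against the collection."""
    additional = regularize(romanized_names[0], culture)
    candidates = set(romanized_names) | {additional}
    # Exact equality is a simplification chosen for this sketch.
    return [rec for rec in COLLECTION if rec["romanized"] in candidates]

matches = find_related(["yuriy", "jurij"], "russian")
print([rec["id"] for rec in matches])
```

With the culture set to `"russian"`, the regularized form `yuri` is added to the candidates and the first record is found as well; with a different culture the same input reaches only the record stored as `jurij`, which shows why the rule set is selected per culture.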

7. A computer system for identifying related names, comprising:
- a processor; and
  
  a storage device connected to the processor, wherein the storage device has stored thereon a program, and wherein the processor is configured to execute instructions of the program to perform operations, wherein the operations comprise:
  
  storing a collection of names from different languages, wherein each of the names has a native orthographic form and a romanized form;
  
  receiving an input name in a known encoding scheme;
  
  determining an alphabet of the input name based on the known encoding scheme;
  
  generating romanized names based on the input name and the determined alphabet using multiple transliteration schemas;
  
  identifying a culture associated with the input name;
  
  selecting one or more culture-sensitive regularization rules for the identified culture, wherein there are different culture-sensitive regularization rules for different cultures;
  
  applying the selected one or more culture-sensitive regularization rules to one of the romanized names to create an additional romanized name;
  
  matching the romanized names and the additional romanized name against the romanized names in the collection of names from the different languages; and
  
  returning data store records that have romanized names that match at least one of the romanized names and the additional romanized name.
- Dependent claims (8, 9, 10, 11, 12):
  - 8. The computer system of claim 7, wherein the culture-sensitive regularization rules are applied using an automatic cultural name classifier.
  - 9. The computer system of claim 7, wherein the culture-sensitive regularization rules are applied using a user-supplied cultural value.
  - 10. The computer system of claim 7, wherein the operations further comprise:
    - producing multiple transliterated forms of the input name, wherein at least one of the transliterated forms is a regularized form derived from the application of the culture-sensitive regularization rules to a transliterated form.
  - 11. The computer system of claim 7, wherein there are different culture-sensitive regularization rules for different languages.
  - 12. The computer system of claim 7, wherein a culture-sensitive regularization rule collapses two representations of a sound into a single symbol.

13. A computer program product for identifying related names, the computer program product comprising:
- a non-transitory computer readable storage medium having computer readable program code embodied therewith, wherein the computer readable program code, when executed by a processor of a computer, is configured to perform:
  
  storing a collection of names from different languages, wherein each of the names has a native orthographic form and a romanized form;
  
  receiving an input name in a known encoding scheme;
  
  determining an alphabet of the input name based on the known encoding scheme;
  
  generating romanized names based on the input name and the determined alphabet using multiple transliteration schemas;
  
  identifying a culture associated with the input name;
  
  selecting one or more culture-sensitive regularization rules for the identified culture, wherein there are different culture-sensitive regularization rules for different cultures;
  
  applying the selected one or more culture-sensitive regularization rules to one of the romanized names to create an additional romanized name;
  
  matching the romanized names and the additional romanized name against the romanized names in the collection of names from the different languages; and
  
  returning data store records that have romanized names that match at least one of the romanized names and the additional romanized name.
- Dependent claims (14, 15, 16, 17, 18):
  - 14. The computer program product of claim 13, wherein the culture-sensitive regularization rules are applied using an automatic cultural name classifier.
  - 15. The computer program product of claim 13, wherein the culture-sensitive regularization rules are applied using a user-supplied cultural value.
  - 16. The computer program product of claim 13, wherein the computer readable program code, when executed by the processor of the computer, is configured to perform:
    - producing multiple transliterated forms of the input name, wherein at least one of the transliterated forms is a regularized form derived from the application of the culture-sensitive regularization rules to a transliterated form.
  - 17. The computer program product of claim 13, wherein there are different culture-sensitive regularization rules for different languages.
  - 18. The computer program product of claim 13, wherein a culture-sensitive regularization rule collapses two representations of a sound into a single symbol.
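
The dependent claims describe two ways the culture can be supplied: an automatic cultural name classifier, or a user-supplied cultural value. One way to combine them is sketched below. The pattern table `HINTS`, the `"generic"` fallback label, and the choice to let a user-supplied value take precedence are all assumptions made for this illustration; a real classifier would be far more elaborate than a few spelling patterns.

```python
import re
from typing import Optional

# Hypothetical spelling hints standing in for a real classifier (assumption).
HINTS = {
    "russian": re.compile(r"(ov|ova|ev|eva|sky)$"),
    "arabic": re.compile(r"^(al-|abd)"),
}

def classify_culture(romanized: str) -> str:
    """Toy automatic classifier: first culture whose pattern matches wins."""
    lowered = romanized.lower()
    for culture, pattern in HINTS.items():
        if pattern.search(lowered):
            return culture
    return "generic"  # fallback label when nothing matches (assumption)

def identify_culture(romanized: str, user_culture: Optional[str] = None) -> str:
    """Prefer a user-supplied cultural value; otherwise classify automatically."""
    if user_culture is not None:
        return user_culture
    return classify_culture(romanized)

print(identify_culture("Ivanov"))
print(identify_culture("Ivanov", user_culture="bulgarian"))
```

The returned label is what would then be used to select the culture-sensitive regularization rules.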

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Gillam, Richard T.; Patman Maguire, Frankie E.; Shaefer, Leonard A. Jr.
Primary Examiner(s): GUERRA-ERAZO, EDGAR X

Application Number: US13/240,891
Publication Number: US 20120016663A1
Time in Patent Office: 1,062 Days
Field of Search: 704/10, 704/4, 704/8, 704/9, 704/270, 704/270.1, 704/275
US Class Current: 704/9
CPC Class Codes:
G06F 16/33 Querying
G06F 16/90344 by using string matching te...
