System for adaptive multi-cultural searching and matching of personal names
First Claim
1. An apparatus comprising a tangible computer readable storage medium having instructions stored thereon that when executed by a machine result in at least the following:
classifying a text input name as belonging to a particular culture by:
using a high frequency name data store of names that occur frequently in particular cultures, wherein, when there is a match with a name in the high frequency name data store of names, the particular culture associated with retrieved name and a confidence score associated with the retrieved name are recorded,
determining whether morphemes in a morpheme data store are present in the input name by searching for matching substrings of name segments in the input name, and wherein, for each morpheme found in the input name, the particular culture associated with the morpheme and a confidence level associated with the morpheme are recorded,
searching the input name for strings of letters that occur with statistical significance in particular cultures, wherein, for each n-gram present in an associated n-gram data store, when a match is found, the culture and score associated with that n-gram are recorded, and
breaking the name into segments and using information in the segments to match at least one of a title, an affix, and a qualifier of the text input name, wherein, for each segment present in the input name that matches a particle in a data store, the culture associated with that particle and a confidence score associated with that particle are recorded;
accessing the text input name entered as an input name by one or more of a user or a system;
determining multiple phonetic representations for a portion of the text input name, each of the multiple phonetic representations being for a different pronunciation of the text input name;
comparing each of the multiple phonetic representations of the portion of the text input name to a phonetic representation of a portion of a text known name stored in a database, wherein comparing each of the multiple phonetic representations of the portion of the text input name to the phonetic representation of the portion of the text known name comprises comparing, for at least one of the multiple phonetic representations of the portion of the text input name, corresponding parts of (i) the at least one phonetic representation of the portion of the text input name and (ii) the phonetic representation of the portion of the text known name, wherein the corresponding parts include parts that correspond at a phonologic level, wherein the parts that correspond at the phonologic level include (i) a first part that relates to a final phoneme of the portion of the text input name and (ii) a second part that relates to a final phoneme of the portion of the text known name; and
providing an indication of whether the text input name matches the text known name based on the comparing.
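The classifying step of claim 1 records evidence from four data stores (frequent names, morphemes, n-grams, and particles) but does not say how that evidence is combined into one culture. The Python sketch below assumes a simple rule, summing the recorded scores per culture and keeping the highest total. The store names, their entries, and every confidence value are hypothetical toy data, not taken from the patent.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical toy data stores mapping an entry to (culture, confidence).
# Entries and values are invented for illustration only.
HIGH_FREQUENCY_NAMES = {"mohammed": ("Arabic", 0.9), "nguyen": ("Vietnamese", 0.95)}
MORPHEMES = {"abd": ("Arabic", 0.7), "ullah": ("Arabic", 0.8), "ovich": ("Russian", 0.8)}
NGRAMS = {"kh": ("Arabic", 0.2), "ng": ("Vietnamese", 0.3), "sch": ("German", 0.4)}
PARTICLES = {"al": ("Arabic", 0.6), "bin": ("Arabic", 0.6), "van": ("Dutch", 0.5)}


def classify_name(name: str) -> tuple[str | None, list[tuple[str, str, float]]]:
    """Return (best culture or None, evidence as (source, culture, score) tuples)."""
    evidence: list[tuple[str, str, float]] = []
    # Break the name into lowercase segments on whitespace and hyphens.
    segments = name.lower().replace("-", " ").split()

    # High-frequency names: exact match on a whole segment.
    for seg in segments:
        if seg in HIGH_FREQUENCY_NAMES:
            culture, score = HIGH_FREQUENCY_NAMES[seg]
            evidence.append(("name", culture, score))

    # Morphemes: substring match inside any segment.
    for morpheme, (culture, score) in MORPHEMES.items():
        if any(morpheme in seg for seg in segments):
            evidence.append(("morpheme", culture, score))

    # N-grams: letter strings assumed to be statistically typical of a culture.
    for ngram, (culture, score) in NGRAMS.items():
        if any(ngram in seg for seg in segments):
            evidence.append(("ngram", culture, score))

    # Particles (titles, affixes, qualifiers): exact match on a segment.
    for seg in segments:
        if seg in PARTICLES:
            culture, score = PARTICLES[seg]
            evidence.append(("particle", culture, score))

    # Assumed aggregation rule: sum scores per culture, keep the highest total.
    totals: defaultdict[str, float] = defaultdict(float)
    for _, culture, score in evidence:
        totals[culture] += score
    best = max(totals, key=totals.get) if totals else None
    return best, evidence


print(classify_name("Nguyen Van Thanh")[0])  # Vietnamese
```

For "Nguyen Van Thanh" the particle "van" contributes Dutch evidence, but the high-frequency and n-gram hits outweigh it under this summing rule. A real system might weight or normalize the four sources differently.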
Abstract
An automated name searching system incorporates an automatic name classifier and a multi-path architecture in which different algorithms are applied based on the cultural identity of the query name. The name classifier operates with a preemptive list, analysis of morphological elements, length, and linguistic rules. A name regularizer produces a character-based computational representation of the name. A pronunciation-equivalent representation, such as an International Phonetic Alphabet (IPA) representation, and language-specific rules for generating name-search keys are used in a first pass to eliminate database entries that are obviously not matches for the query name. The methods can also be implemented as a callable set of library routines, including an intelligent preprocessor and a name evaluator that produces a score comparing a query name and a database name based on a variety of user-adjustable parameters. These user-controlled parameters permit the search methodologies to be tuned for specific custom applications.
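The two-stage idea in the abstract, a cheap key-based first pass that discards obvious non-matches followed by a finer score tested against user-adjustable parameters, can be sketched in Python. Every piece here is a stand-in chosen for illustration: the regularizer, the consonant-skeleton key (which is not an IPA-based pronunciation key), the `difflib` ratio used as the evaluator, and the hypothetical `SearchParams` parameter set.

```python
from __future__ import annotations

import difflib
import unicodedata
from dataclasses import dataclass


@dataclass
class SearchParams:
    """Hypothetical user-adjustable parameters for the evaluator."""
    threshold: float = 0.7       # minimum score reported as a match
    use_key_filter: bool = True  # run the cheap first pass before scoring


def regularize(name: str) -> str:
    """Strip diacritics, lowercase, and reduce the name to letters and single spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalpha() else " " for ch in base.lower())
    return " ".join(cleaned.split())


def search_key(name: str) -> str:
    """Crude pronunciation stand-in: first letter plus consonant skeleton."""
    letters = regularize(name).replace(" ", "")
    # Collapse doubled letters, e.g. "mm" -> "m".
    collapsed = [ch for i, ch in enumerate(letters) if i == 0 or ch != letters[i - 1]]
    if not collapsed:
        return ""
    return collapsed[0] + "".join(ch for ch in collapsed[1:] if ch not in "aeiouyhw")


def score(query: str, candidate: str) -> float:
    """Similarity in [0, 1] between the regularized forms."""
    return difflib.SequenceMatcher(None, regularize(query), regularize(candidate)).ratio()


def search(query: str, database: list[str], params: SearchParams | None = None) -> list[tuple[str, float]]:
    params = params or SearchParams()
    query_key = search_key(query)
    results = []
    for name in database:
        # First pass: skip entries whose key differs from the query's key.
        if params.use_key_filter and search_key(name) != query_key:
            continue
        s = score(query, name)
        if s >= params.threshold:
            results.append((name, round(s, 3)))
    return sorted(results, key=lambda r: r[1], reverse=True)


print(search("Mohammed", ["Muhammad", "Mohamed", "Mahmoud", "Michael"]))
# → [('Mohamed', 0.933), ('Muhammad', 0.75)]
```

"Michael" never reaches the scorer because its key differs, and "Mahmoud" passes the key filter but scores below the threshold. Raising `threshold` trades recall for precision, and `use_key_filter=False` scores every database entry.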
20 Claims
19. An apparatus comprising a tangible computer readable storage medium having instructions stored thereon that when executed by a machine result in at least the following:
classifying a text input name as belonging to a particular culture by:
using a high frequency name data store of names that occur frequently in particular cultures, wherein, when there is a match with a name in the high frequency name data store of names, the particular culture associated with retrieved name and a confidence score associated with the retrieved name are recorded,
determining whether morphemes in a morpheme data store are present in the input name by searching for matching substrings of name segments in the input name, and wherein, for each morpheme found in the input name, the particular culture associated with the morpheme and a confidence level associated with the morpheme are recorded,
searching the input name for strings of letters that occur with statistical significance in particular cultures, wherein, for each n-gram present in an associated n-gram data store, when a match is found, the culture and score associated with that n-gram are recorded, and
breaking the name into segments and using information in the segments to match at least one of a title, an affix, and a qualifier of the text input name, wherein, for each segment present in the input name that matches a particle in a data store, the culture associated with that particle and a confidence score associated with that particle are recorded;
accessing the text input name entered as an input name by one or more of a user or a system;
determining multiple phonetic representations for a portion of the text input name, each of the multiple phonetic representations being for a different pronunciation of the text input name;
comparing each of the multiple phonetic representations of the portion of the text input name to a phonetic representation of a portion of a text known name stored in a database, wherein comparing each of the multiple phonetic representations of the portion of the text input name to the phonetic representation of the portion of the text known name comprises comparing, for at least one of the multiple phonetic representations of the portion of the text input name, corresponding parts of (i) the at least one phonetic representation of the portion of the text input name and (ii) the phonetic representation of the portion of the text known name, wherein the corresponding parts include parts that correspond at a syllabic level, wherein the parts that correspond at the syllabic level include (i) a first part that relates to a left-most syllable of the portion of the text input name and (ii) a second part that relates to a left-most syllable of the portion of the text known name, the first part further relates to both an initial phonologic element and a final phonologic element of the left-most syllable of the portion of the text input name, and the second part further relates to an initial phonologic element and a final phonologic element of the left-most syllable of the portion of the text known name; and
providing an indication of whether the text input name matches the text known name based on the comparing.
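Claims 1 and 19 share the classification and pronunciation steps and differ in which corresponding parts of the phonetic representations are compared: claim 1 compares final phonemes, while claim 19 compares the initial and final phonologic elements of the left-most syllable. The Python sketch below assumes each pronunciation is a non-empty tuple of syllables, each syllable a non-empty tuple of phoneme symbols, and treats the input name as matching when any of its alternative pronunciations passes the chosen comparison. The function names and the transcriptions of "Jorge", "George", and "Maria" are hypothetical, not drawn from the patent.

```python
from __future__ import annotations

from typing import Callable

# Assumed representation: a pronunciation is a sequence of syllables,
# and each syllable is a sequence of phoneme symbols.
Syllable = tuple[str, ...]
Pronunciation = tuple[Syllable, ...]


def final_phonemes_match(a: Pronunciation, b: Pronunciation) -> bool:
    """Claim 1 style: compare the final phoneme of each pronunciation."""
    return a[-1][-1] == b[-1][-1]


def leftmost_syllables_match(a: Pronunciation, b: Pronunciation) -> bool:
    """Claim 19 style: compare the initial and final phonologic elements
    of the left-most syllable of each pronunciation."""
    first_a, first_b = a[0], b[0]
    return first_a[0] == first_b[0] and first_a[-1] == first_b[-1]


def indicate_match(
    input_pronunciations: list[Pronunciation],
    known: Pronunciation,
    compare: Callable[[Pronunciation, Pronunciation], bool],
) -> bool:
    """Report a match if any alternative pronunciation of the input passes."""
    return any(compare(p, known) for p in input_pronunciations)


# Hypothetical transcriptions: "Jorge" read as Spanish and as English.
JORGE = [
    (("x", "o", "r"), ("x", "e")),  # Spanish-style /xor.xe/
    (("dʒ", "ɔ", "r", "dʒ"),),      # English-style /dʒɔrdʒ/
]
GEORGE = (("dʒ", "ɔ", "r", "dʒ"),)
MARIA = (("m", "a"), ("r", "i"), ("a",))

print(indicate_match(JORGE, GEORGE, final_phonemes_match))     # True
print(indicate_match(JORGE, MARIA, leftmost_syllables_match))  # False
```

Only the English-style reading of "Jorge" matches "George"; the Spanish-style reading fails both comparisons. This illustrates why generating several pronunciations for the input name can recover matches that a single transcription would miss.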
Specification