System and method for using a statistical classifier to score contact entities
First Claim
1. A method for associating an entity of a contact with a character string, wherein a database stores a plurality of contacts, and each contact has a defined set of entities, comprising:
determining, for a first subgroup of the defined set of entities, using a processor, whether the character string comprises a structure associated with the first subgroup of the defined set of entities, prior to any probabilistic approximation, wherein the structure comprises at least one character other than alphanumeric characters;
replacing all numerical digits in the character string with a corresponding digit placeholder to generate a blurred character string in response to a determination that the character string does not comprise the structure associated with the first subgroup; and
determining, for a second subgroup of the defined set of entities, using a processor, a probabilistic approximation that the blurred character string is associated with at least one of the defined set of entities in the second subgroup by scoring the blurred character string for each of the defined set of entities associated with the blurred character string; and
identifying the character string as one of the set of defined entities having a highest score that satisfies a predetermined score threshold.
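The first two steps of the claim (a structural test that runs before any probabilistic scoring, then digit blurring when that test fails) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `#` placeholder, the entity names `email` and `url`, and the regular expressions are all assumptions chosen for the example. The claim only requires that the structure include at least one non-alphanumeric character and that every digit be replaced by a placeholder.

```python
import re

# Hypothetical placeholder; the claim only says "a corresponding digit placeholder".
DIGIT_PLACEHOLDER = "#"

# Hypothetical structural patterns for the "first subgroup" of entities.
# Each relies on at least one non-alphanumeric character ('@', '.', ':', '/').
STRUCTURED_ENTITIES = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "url": re.compile(r"^https?://\S+$"),
}


def match_structure(text):
    """Return the first structured entity whose pattern matches, else None."""
    for entity, pattern in STRUCTURED_ENTITIES.items():
        if pattern.match(text):
            return entity
    return None


def blur_digits(text):
    """Replace every ASCII digit with the placeholder, keeping all other characters."""
    return re.sub(r"[0-9]", DIGIT_PLACEHOLDER, text)


if __name__ == "__main__":
    for sample in ["jane@example.com", "415-555-0100", "Suite 200"]:
        entity = match_structure(sample)
        if entity is not None:
            print(sample, "->", entity)
        else:
            print(sample, "-> blurred:", blur_digits(sample))
```

Blurring collapses strings that differ only in their digits into one shape, so a downstream scorer sees `###-###-####` rather than each individual phone number.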
Abstract
A system and method for associating a character string with one or more defined entities of a contact record. An input character string is received. The string is first evaluated to see if the structure of the string is recognized. If not, then the string is compared to entries in a look up table. If the string format is not recognized, and the string is not found in the look up table, then a posterior probability is calculated for a set of defined entities over a limited set of string processing features. The result of probabilistic scoring determines which of the defined entities to associate with the character string.
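The three-stage cascade described in the abstract (structure check, then look up table, then probabilistic scoring) can be sketched as a single function. Everything concrete here is an assumption for illustration: the function names, the look up table entries, the `#` placeholder, the toy scorer, and the reading of "satisfies a predetermined score threshold" as `>=`. The scorer is deliberately a stand-in that returns fixed numbers, not a real posterior calculation.

```python
import re


def classify(text, structure_check, lookup_table, score_entities, threshold):
    """Run the cascade: structure, then look up table, then probabilistic scoring."""
    # Stage 1: recognized structure short-circuits everything else.
    entity = structure_check(text)
    if entity is not None:
        return entity

    # Stage 2: exact match in a look up table (key normalization is an assumption).
    entity = lookup_table.get(text.strip().lower())
    if entity is not None:
        return entity

    # Stage 3: blur digits, score each candidate entity, apply the threshold.
    blurred = re.sub(r"[0-9]", "#", text)
    scores = score_entities(blurred)
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Hypothetical stand-ins so the sketch runs end to end.
def toy_structure_check(text):
    return "email" if re.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", text) else None


TOY_LOOKUP = {"ceo": "title", "california": "state"}


def toy_scorer(blurred):
    # Fixed scores for illustration only; a real scorer would compute posteriors.
    if "#" in blurred:
        return {"phone": 0.9, "name": 0.1}
    return {"name": 0.4, "phone": 0.05}


if __name__ == "__main__":
    for sample in ["jane@example.com", "CEO", "555-0100", "Zyx"]:
        print(sample, "->", classify(sample, toy_structure_check, TOY_LOOKUP, toy_scorer, 0.5))
```

Ordering the stages from cheapest and most certain to most expensive means the probabilistic step only runs on strings the deterministic checks could not resolve.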
130 Citations
23 Claims
1. A method for associating an entity of a contact with a character string, wherein a database stores a plurality of contacts, and each contact has a defined set of entities, comprising:
determining, for a first subgroup of the defined set of entities, using a processor, whether the character string comprises a structure associated with the first subgroup of the defined set of entities, prior to any probabilistic approximation, wherein the structure comprises at least one character other than alphanumeric characters;
replacing all numerical digits in the character string with a corresponding digit placeholder to generate a blurred character string in response to a determination that the character string does not comprise the structure associated with the first subgroup; and
determining, for a second subgroup of the defined set of entities, using a processor, a probabilistic approximation that the blurred character string is associated with at least one of the defined set of entities in the second subgroup by scoring the blurred character string for each of the defined set of entities associated with the blurred character string; and
identifying the character string as one of the set of defined entities having a highest score that satisfies a predetermined score threshold.
Dependent claims: 2, 3, 4, 5, 6
7. A method for identifying entities of a contact associated with a character string, comprising:
receiving a character string;
determining whether the character string comprises a structure associated with a first defined entity, prior to any probabilistic approximation, wherein the structure comprises at least one character other than alphanumeric characters;
determining whether the character string is found in a look up table in response to an evaluation that the character string does not comprise the structure associated with the first defined entity;
replacing all numerical digits in the character string with a corresponding digit placeholder to generate a blurred character string in response to a determination that the character string is not found in the look up table;
calculating, based on the blurred character string, a posterior probability to generate a score for each of a set of defined entities; and
identifying the character string as one of the set of defined entities having a highest score that satisfies a predetermined score threshold.
Dependent claims: 8, 9, 10, 11, 12, 13, 14
15. A non-transitory computer-readable medium encoded with instructions for identifying entities of a contact associated with a character string, the instructions when executed by a processor comprise the steps of:
receiving a character string;
evaluating whether the character string comprises a structure associated with a first defined entity, prior to any probabilistic approximation, wherein the structure comprises at least one character other than alphanumeric characters;
evaluating whether the character string is found in a look up table in response to an evaluation that the character string does not comprise the structure associated with the first defined entity;
replacing all numerical digits in the character string with a corresponding digit placeholder to generate a blurred character string in response to a determination that the character string is not found in the look up table;
calculating, based on the blurred character string, a posterior probability to generate a score for each of a set of defined entities; and
identifying the character string as one entity of the set of defined entities having a highest score that satisfies a predetermined score threshold.
Dependent claims: 16, 17, 18, 19, 20, 21, 22
23. An apparatus for identifying entities of a contact associated with a character string, comprising:
a processor; and
one or more stored sequences of instructions which, when executed by the processor, cause the processor to carry out the steps of:
receiving a character string;
evaluating the character string first for structure, without calculating any posterior probability, wherein the structure comprises at least one character other than alphanumeric characters, then evaluating the character string as a look up, and replacing all numerical digits in the character string with a corresponding digit placeholder to generate a blurred character string in response to a determination that the character string does not comprise structure and is not found in the look up table;
calculating, based on the blurred character string, a posterior probability to generate a score for each of a set of defined entities; and
identifying the character string as one entity of the set of defined entities having a highest score that satisfies a predetermined score threshold.
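The last two claimed steps, calculating a posterior probability per entity from the blurred string and keeping the highest score only if it satisfies a threshold, can be illustrated with a small naive Bayes scorer. The claims do not name a specific model, so naive Bayes is an assumption here, as are the three boolean features, the `#` placeholder, the tiny training set, and the `>=` threshold test. Under that assumption the score for entity e given features f_1..f_n is proportional to P(e) multiplied by the product of P(f_i | e), normalized so the scores sum to 1.

```python
import math
from collections import Counter, defaultdict


def features(blurred):
    """A small, hypothetical feature set computed on the blurred string."""
    return {
        "has_placeholder": "#" in blurred,
        "has_space": " " in blurred,
        "has_alpha": any(c.isalpha() for c in blurred),
    }


class NaiveBayesScorer:
    def __init__(self, examples):
        # examples: list of (blurred_string, entity) pairs.
        self.total = len(examples)
        self.class_counts = Counter(entity for _, entity in examples)
        self.feature_counts = defaultdict(Counter)  # entity -> Counter of (name, value)
        for text, entity in examples:
            for item in features(text).items():
                self.feature_counts[entity][item] += 1

    def posteriors(self, blurred):
        """Return normalized posterior scores for every known entity."""
        log_scores = {}
        for entity, n in self.class_counts.items():
            log_p = math.log(n / self.total)  # prior P(e)
            for item in features(blurred).items():
                # Laplace smoothing; each boolean feature has two possible values.
                log_p += math.log((self.feature_counts[entity][item] + 1) / (n + 2))
            log_scores[entity] = log_p
        top = max(log_scores.values())
        unnormalized = {e: math.exp(s - top) for e, s in log_scores.items()}
        z = sum(unnormalized.values())
        return {e: v / z for e, v in unnormalized.items()}


def identify(blurred, scorer, threshold):
    """Pick the highest-scoring entity, or None if it misses the threshold."""
    post = scorer.posteriors(blurred)
    best = max(post, key=post.get)
    return best if post[best] >= threshold else None


# Hypothetical training data, already blurred.
EXAMPLES = [
    ("###-###-####", "phone"), ("(###) ###-####", "phone"),
    ("Jane Doe", "name"), ("John Smith", "name"),
    ("### Main St", "address"), ("## Elm Ave Apt #", "address"),
]

if __name__ == "__main__":
    scorer = NaiveBayesScorer(EXAMPLES)
    for sample in ["###-####", "Jane Roe"]:
        print(sample, "->", identify(sample, scorer, 0.5), scorer.posteriors(sample))
```

Raising the threshold trades coverage for precision: with this toy data, "Jane Roe" is labeled a name at a threshold of 0.5 but is left unidentified at 0.9.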
Specification