Associating data records in multiple languages
Abstract
Embodiments disclosed herein provide a system and method for associating data records in multiple languages within a single hub. As a record comes in from an information source coupled to the hub, it is associated with a particular language at a core layer. The hub maps each language one-to-one to a member type. For each data record of a particular member type, unique derivation code is utilized to perform standardization and bucketing at a derived layer. A weight may be used to balance the richness of languages so that data records in different languages can have the same statistical meaning. Since attributes are standardized with respect to a language of a data record, appropriate languages or script can be passed along with the data record. The hub can then match the data record to the optimum algorithm(s) for entity processing at an entity layer.
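The core-layer and derived-layer steps described in the abstract can be illustrated with a minimal Python sketch. It assumes a one-to-one table from language code to member type and one standardization function per member type. The member type names, the language codes, and the standardization rules themselves are hypothetical and are not taken from the patent.

```python
# Hypothetical one-to-one mapping from language code to hub member type.
LANGUAGE_TO_MEMBER_TYPE = {
    "en": "PERSON_EN",
    "de": "PERSON_DE",
}


def standardize_en(value: str) -> str:
    # Illustrative English rule: collapse whitespace and upper-case.
    return " ".join(value.split()).upper()


# Illustrative German rule: expand umlauts and eszett before upper-casing.
_DE_TABLE = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})


def standardize_de(value: str) -> str:
    return " ".join(value.translate(_DE_TABLE).split()).upper()


# Assumed dispatch table: each member type selects its own derivation code.
DERIVATION_BY_MEMBER_TYPE = {
    "PERSON_EN": standardize_en,
    "PERSON_DE": standardize_de,
}


def derive(record: dict) -> dict:
    """Map the record's language to a member type and standardize its attributes."""
    member_type = LANGUAGE_TO_MEMBER_TYPE[record["language"]]
    standardize = DERIVATION_BY_MEMBER_TYPE[member_type]
    attributes = {k: standardize(v) for k, v in record["attributes"].items()}
    return {"member_type": member_type, "attributes": attributes}
```

Keeping the language-to-member-type table separate from the dispatch table mirrors the two steps in the abstract: the language is fixed first, and the member type then determines which derivation code runs.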
14 Claims
1. A computer-implemented method for processing data records in multiple languages within an identity hub to identify data records in different languages associated with a common entity, the method comprising:
- associating a data record received at said identity hub with a language;
- mapping said language to a particular member type in said identity hub;
- applying a language-specific derivation code on attributes of said data record based on said particular member type, wherein said language-specific derivation code comprises language-specific standardization and language-independent bucketing, and wherein said language-specific standardization standardizes said attributes of said data record with respect to said language and said language-independent bucketing produces a set of candidate data records for comparison with said data record;
- comparing said standardized attributes of said data record with one or more standardized attributes of said candidate data records, wherein said comparing includes:
  - generating a weight for one or more attributes of first and second data records to be compared; and
  - comparing said attributes of said first data record in a first language and said attributes of said second data record in a second language to determine a score based on said attribute weights;
- linking said data record to one or more candidate data records based on the score to associate said data record with one or more entities associated with said linked candidate data records; and
- selecting and applying one or more algorithms to process entities in individual languages, wherein at least two data records of different languages are associated with a common entity.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
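The comparing and linking steps recited in claim 1 can be sketched as follows. This is only an illustration under stated assumptions: the attribute names, the weight values, the agree/disagree scoring rule, and the link threshold are all hypothetical, since the claim does not specify how weights are generated or how the score is thresholded.

```python
# Hypothetical per-attribute weights; a real hub would generate these from data.
WEIGHTS = {"name": 4.0, "birth_date": 3.0, "city": 1.5}

# Assumed cut-off for linking; not taken from the patent.
LINK_THRESHOLD = 5.0


def score(first: dict, second: dict, weights: dict = WEIGHTS) -> float:
    """Add the weight of each agreeing attribute and subtract it for each disagreement."""
    total = 0.0
    for attr, weight in weights.items():
        a, b = first.get(attr), second.get(attr)
        if a is None or b is None:
            continue  # an attribute missing on either side contributes nothing
        total += weight if a == b else -weight
    return total


def link(record: dict, candidates: list, threshold: float = LINK_THRESHOLD) -> list:
    """Return the entity ids of candidates whose score reaches the threshold."""
    return [
        c["entity_id"]
        for c in candidates
        if score(record["attributes"], c["attributes"]) >= threshold
    ]
```

The function compares attributes that have already been standardized, so a record that arrived in one language and a candidate that arrived in another are scored on the same scale.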
9. A computer readable storage medium storing computer instructions executable by a processor, wherein when executed by said processor said computer instructions cause a computer to:
- associate a data record received at an identity hub with a language;
- map said language to a particular member type;
- apply a language-specific derivation code on attributes of said data record based on said particular member type, wherein said language-specific derivation code comprises language-specific standardization and language-independent bucketing, and wherein said language-specific standardization standardizes said attributes of said data record with respect to said language and said language-independent bucketing produces a set of candidate data records for comparison with said data record;
- compare said standardized attributes of said data record with one or more standardized attributes of said candidate data records, wherein said comparing includes:
  - generating a weight for one or more attributes of first and second data records to be compared; and
  - comparing said attributes of said first data record in a first language and said attributes of said second data record in a second language to determine a score based on said attribute weights;
- link said data record to one or more candidate data records based on the score to associate said data record with one or more entities associated with said linked candidate data records; and
- select and apply one or more algorithms to process entities in individual languages, wherein at least two data records of different languages are associated with a common entity.
Dependent claims: 10, 11, 12
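The language-independent bucketing recited above, which produces the set of candidate data records, might look like the following sketch. The choice of bucket keys (a birth date and the trailing digits of a phone number) and the `BucketIndex` class are assumptions made for illustration; the claims do not say which attributes are used for bucketing.

```python
from collections import defaultdict


def bucket_keys(attributes: dict) -> set:
    """Derive bucket keys from values that do not depend on language or script."""
    keys = set()
    if "birth_date" in attributes:
        keys.add("dob:" + attributes["birth_date"])
    if "phone" in attributes:
        digits = "".join(ch for ch in attributes["phone"] if ch in "0123456789")
        keys.add("tel:" + digits[-7:])  # last seven digits, an arbitrary choice
    return keys


class BucketIndex:
    """Hypothetical inverted index from bucket key to record ids."""

    def __init__(self):
        self._buckets = defaultdict(set)

    def add(self, record_id: str, attributes: dict) -> None:
        for key in bucket_keys(attributes):
            self._buckets[key].add(record_id)

    def candidates(self, attributes: dict) -> set:
        """Return ids of stored records sharing at least one bucket key."""
        found = set()
        for key in bucket_keys(attributes):
            found |= self._buckets.get(key, set())
        return found
```

Because the keys ignore language, records stored under different member types can land in the same bucket and be offered for comparison with one another.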
13. A system for processing data records in multiple languages to identify data records in different languages associated with a common entity, comprising:
- at least one processor; and
- at least one computer readable storage medium accessible by said at least one processor and storing computer instructions executable by said at least one processor, wherein when executed by said at least one processor said computer instructions cause said system to:
  - associate a data record received at an identity hub with a language;
  - map said language to a particular member type;
  - apply a language-specific derivation code on attributes of said data record based on said particular member type, wherein said language-specific derivation code comprises language-specific standardization and language-independent bucketing, and wherein said language-specific standardization standardizes said attributes of said data record with respect to said language and said language-independent bucketing produces a set of candidate data records for comparison with said data record;
  - compare said standardized attributes of said data record with one or more standardized attributes of said candidate data records, wherein said comparing includes:
    - generating a weight for one or more attributes of first and second data records to be compared; and
    - comparing said attributes of said first data record in a first language and said attributes of said second data record in a second language to determine a score based on said attribute weights;
  - link said data record to one or more candidate data records based on the score to associate said data record with one or more entities associated with said linked candidate data records; and
  - select and apply one or more algorithms to process entities in individual languages, wherein at least two data records of different languages are associated with a common entity.
Dependent claims: 14
Specification