Associating data records in multiple languages

US 8,417,702 B2
Filed: 09/26/2008
Issued: 04/09/2013
Est. Priority Date: 09/28/2007
Status: Expired due to Fees

Abstract

Embodiments disclosed herein provide a system and method for associating data records in multiple languages within a single hub. As a record comes in from an information source coupled to the hub, it is associated with a particular language at a core layer. The hub maps each language one-to-one to a member type. For each data record of a particular member type, unique derivation code is utilized to perform standardization and bucketing at a derived layer. A weight may be used to balance the richness of languages so that data records in different languages can have the same statistical meaning. Since attributes are standardized with respect to a language of a data record, appropriate languages or script can be passed along with the data record. The hub can then match the data record to the optimum algorithm(s) for entity processing at an entity layer.
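
The derived-layer step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the member type names, the two standardization rules, and the bucketing key are all assumptions chosen to show how language-specific standardization can feed a language-independent bucket.

```python
# Hypothetical one-to-one mapping from language to member type.
MEMBER_TYPE_BY_LANGUAGE = {"en": "PERSON_EN", "de": "PERSON_DE"}


def standardize_en(name: str) -> str:
    """Assumed English rule: uppercase and collapse whitespace."""
    return " ".join(name.upper().split())


def standardize_de(name: str) -> str:
    """Assumed German rule: expand umlauts and eszett, then apply the English rule."""
    for src, dst in (("ä", "ae"), ("ö", "oe"), ("ü", "ue"), ("ß", "ss"),
                     ("Ä", "Ae"), ("Ö", "Oe"), ("Ü", "Ue")):
        name = name.replace(src, dst)
    return standardize_en(name)


# Derivation code is selected by member type, not directly by language.
STANDARDIZER_BY_MEMBER_TYPE = {
    "PERSON_EN": standardize_en,
    "PERSON_DE": standardize_de,
}


def bucket_key(standardized_name: str) -> str:
    """Assumed language-independent bucketing: sorted 3-letter token prefixes."""
    return "|".join(sorted(token[:3] for token in standardized_name.split()))


def derive(record: dict) -> dict:
    """Map language to member type, standardize the name, and compute its bucket."""
    member_type = MEMBER_TYPE_BY_LANGUAGE[record["language"]]
    std_name = STANDARDIZER_BY_MEMBER_TYPE[member_type](record["name"])
    return {"member_type": member_type,
            "std_name": std_name,
            "bucket": bucket_key(std_name)}


german = derive({"language": "de", "name": "Jürgen  Müller"})
english = derive({"language": "en", "name": "Mueller Juergen"})
print(german["bucket"] == english["bucket"])
```

Because bucketing runs on the standardized form, the German and English records in this example land in the same bucket and become candidates for comparison with each other.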

261 Citations

14 Claims

1. A computer-implemented method for processing data records in multiple languages within an identity hub to identify data records in different languages associated with a common entity, the method comprising:
- associating a data record received at said identity hub with a language;
- mapping said language to a particular member type in said identity hub;
- applying a language-specific derivation code on attributes of said data record based on said particular member type, wherein said language-specific derivation code comprises language-specific standardization and language-independent bucketing, and wherein said language-specific standardization standardizes said attributes of said data record with respect to said language and said language-independent bucketing produces a set of candidate data records for comparison with said data record;
- comparing said standardized attributes of said data record with one or more standardized attributes of said candidate data records, wherein said comparing includes:
  - generating a weight for one or more attributes of first and second data records to be compared; and
  - comparing said attributes of said first data record in a first language and said attributes of said second data record in a second language to determine a score based on said attribute weights;
- linking said data record to one or more candidate data records based on the score to associate said data record with one or more entities associated with said linked candidate data records; and
- selecting and applying one or more algorithms to process entities in individual languages, wherein at least two data records of different languages are associated with a common entity.
- Dependent claims (2, 3, 4, 5, 6, 7, 8)
  - 2. The method of claim 1, wherein each of said multiple languages is mapped one-to-one to a particular member type in said identity hub.
  - 3. The method of claim 2, wherein said language-specific derivation code is selectively applied based on said particular member type.
  - 4. The method of claim 2, wherein said data records in said multiple languages share attribute types within a language definition in said identity hub.
  - 5. The method of claim 1, wherein said attributes of said data record are in two or more languages and wherein said language is selected from said two or more languages.
  - 6. The method of claim 1, wherein said data records are from a plurality of information sources accessible by said identity hub, wherein said plurality of information sources are in one or more languages.
  - 7. The method of claim 1, wherein associating said data record with said language further comprises:
    - evaluating said data record to obtain a country code; and
    - determining said language utilizing said country code.
  - 8. The method of claim 1, wherein said language is a default language.
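
Claims 7 and 8 together suggest a lookup from country code to language with a default fallback. The sketch below assumes a hypothetical `country_code` field and a hypothetical lookup table; the claims do not say which codes map to which languages or how the default is chosen.

```python
# Hypothetical lookup table and default; neither is specified by the claims.
LANGUAGE_BY_COUNTRY = {"US": "en", "GB": "en", "DE": "de", "JP": "ja"}
DEFAULT_LANGUAGE = "en"


def resolve_language(record: dict) -> str:
    """Evaluate the record for a country code and determine its language.

    Falls back to the default language when the code is missing or unknown.
    """
    country = (record.get("country_code") or "").strip().upper()
    return LANGUAGE_BY_COUNTRY.get(country, DEFAULT_LANGUAGE)


print(resolve_language({"name": "Jürgen Müller", "country_code": "de"}))
print(resolve_language({"name": "Unknown origin"}))
```

Normalizing the code before the lookup keeps the mapping tolerant of case and whitespace differences between information sources.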

9. A computer readable storage medium storing computer instructions executable by a processor, wherein when executed by said processor said computer instructions cause a computer to:
- associate a data record received at an identity hub with a language;
- map said language to a particular member type;
- apply a language-specific derivation code on attributes of said data record based on said particular member type, wherein said language-specific derivation code comprises language-specific standardization and language-independent bucketing, and wherein said language-specific standardization standardizes said attributes of said data record with respect to said language and said language-independent bucketing produces a set of candidate data records for comparison with said data record;
- compare said standardized attributes of said data record with one or more standardized attributes of said candidate data records, wherein said comparing includes:
  - generating a weight for one or more attributes of first and second data records to be compared; and
  - comparing said attributes of said first data record in a first language and said attributes of said second data record in a second language to determine a score based on said attribute weights;
- link said data record to one or more candidate data records based on the score to associate said data record with one or more entities associated with said linked candidate data records; and
- select and apply one or more algorithms to process entities in individual languages, wherein at least two data records of different languages are associated with a common entity.
- Dependent claims (10, 11, 12)
  - 10. The computer readable storage medium of claim 9, wherein when executed by said processor said computer instructions further cause said computer to:
    - evaluate said data record to obtain a country code; and
    - determine said language utilizing said country code.
  - 11. The computer readable storage medium of claim 9, wherein said attributes of said data record are in two or more languages and wherein said language is selected from said two or more languages.
  - 12. The computer readable storage medium of claim 9, wherein said data records are from a plurality of information sources accessible by said identity hub, wherein said plurality of information sources are in one or more languages.
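
The comparing step recited in claim 9 scores a pair of records from per-attribute weights. The claims do not specify how the weights are generated, so the sketch below assumes a scheme in the style of probabilistic record linkage: each attribute carries a hypothetical agreement weight and disagreement weight, and the score is their sum over the attributes present in both records.

```python
def score_pair(first: dict, second: dict, weights: dict) -> float:
    """Sum agreement or disagreement weights over attributes present in both records.

    weights maps attribute name to (agree_weight, disagree_weight).
    Attributes missing from either record contribute nothing.
    """
    total = 0.0
    for attribute, (agree, disagree) in weights.items():
        a, b = first.get(attribute), second.get(attribute)
        if a is None or b is None:
            continue
        total += agree if a == b else disagree
    return total


# Hypothetical weights; the values are illustrative only.
WEIGHTS = {"std_name": (4.0, -2.0), "birth_year": (2.5, -3.0)}

german = {"std_name": "JUERGEN MUELLER", "birth_year": 1970}
english = {"std_name": "JUERGEN MUELLER", "birth_year": 1970}
print(score_pair(german, english, WEIGHTS))
```

Comparing standardized attributes rather than raw ones is what lets a record in a first language be scored against a record in a second language with the same weight table.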

13. A system for processing data records in multiple languages to identify data records in different languages associated with a common entity, comprising:
- at least one processor; and
- at least one computer readable storage medium accessible by said at least one processor and storing computer instructions executable by said at least one processor, wherein when executed by said at least one processor said computer instructions cause said system to:
  - associate a data record received at an identity hub with a language;
  - map said language to a particular member type;
  - apply a language-specific derivation code on attributes of said data record based on said particular member type, wherein said language-specific derivation code comprises language-specific standardization and language-independent bucketing, and wherein said language-specific standardization standardizes said attributes of said data record with respect to said language and said language-independent bucketing produces a set of candidate data records for comparison with said data record;
  - compare said standardized attributes of said data record with one or more standardized attributes of said candidate data records, wherein said comparing includes:
    - generating a weight for one or more attributes of first and second data records to be compared; and
    - comparing said attributes of said first data record in a first language and said attributes of said second data record in a second language to determine a score based on said attribute weights;
  - link said data record to one or more candidate data records based on the score to associate said data record with one or more entities associated with said linked candidate data records; and
  - select and apply one or more algorithms to process entities in individual languages, wherein at least two data records of different languages are associated with a common entity.
- Dependent claims (14)
  - 14. The system of claim 13, wherein when executed by said at least one processor said computer instructions further cause said system to:
    - evaluate said data record to obtain a country code; and
    - determine said language utilizing said country code.
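
The linking step recited in claim 13 associates a record with the entities of the candidates it links to. A minimal sketch, assuming a simple score threshold (the claims say only that linking is "based on the score") and hypothetical record and entity identifiers:

```python
def link_record(scored_candidates, entity_of, threshold):
    """Return (linked candidate ids, entity ids) for candidates at or above threshold.

    scored_candidates: iterable of (candidate_id, score) pairs.
    entity_of: dict mapping each candidate id to the entity it already belongs to.
    """
    linked = [cid for cid, score in scored_candidates if score >= threshold]
    entities = sorted({entity_of[cid] for cid in linked})
    return linked, entities


# Scores for an incoming record against three candidates from its bucket.
scored = [("c1", 6.5), ("c2", 1.0), ("c3", 7.2)]
entity_of = {"c1": "E100", "c2": "E200", "c3": "E100"}

linked, entities = link_record(scored, entity_of, threshold=5.0)
print(linked, entities)
```

Here the incoming record links to two candidates that already share one entity, so it is associated with that single entity regardless of the languages the candidates were recorded in.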

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Harger, Douglas Scott; Schumacher, Scott
Primary Examiner(s): Alam, Hosain
Assistant Examiner(s): Tang, Jieying
Application Number: US12/239,380
Publication Number: US 20090089332A1
Time in Patent Office: 1,656 Days
Field of Search: 707/999.104, 707/730, 707/736, 707/739, 707/748, 707/750, 707/758, 704/2, 704/8
US Class Current: 707/736
CPC Class Codes:
G06F 16/215 Improving data quality; Dat...
G06F 40/197 Version control for softwar...
