Data de-duplication

  • US 7,200,604 B2
  • Filed: 02/17/2004
  • Issued: 04/03/2007
  • Est. Priority Date: 02/17/2004
  • Status: Active Grant
First Claim
1. A processor-implemented method for generating masks for data de-duplication from entity eponym data fields in a given set of data records, said data records each including an entity location data field, the method comprising:

  • for each data record, splitting each entity eponym data field into a corresponding prefix-suffix combination, and for each prefix, a processor computing a tally of distinct entity locations, and for each prefix and entity location combination, the processor computing a tally of distinct suffixes; and

  • setting, by the processor, a threshold boundary wherein a prefix is defined as one of said masks when one or more of the tallies are indicative of different eponyms signifying a particular entity, wherein the one mask enables a particular data record to be matched to the particular entity by ignoring a portion of the particular data record, wherein said de-duplication involves matching each data record representing a specific activity to the particular entity of a plurality of known entities such that duplication of entities is reduced in a database of said plurality of known entities.
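The two tallies and the threshold step recited in the claim can be illustrated with a short Python sketch. This is not the patented implementation. The function names, the word-boundary splitting rule, and the threshold values (`max_locations`, `min_suffixes`) are all assumptions chosen for illustration; the claim itself does not fix how the split is made or where the threshold boundary lies. The sketch assumes that a prefix seen at few distinct locations but followed by many distinct suffixes is probably one entity whose name carries variable trailing text.

```python
from collections import defaultdict


def split_prefix_suffix(name):
    """Yield every (prefix, suffix) split of a name at a word boundary.

    Splitting on whitespace is an assumption; the claim only requires
    some prefix-suffix combination.
    """
    words = name.split()
    for i in range(1, len(words)):
        yield " ".join(words[:i]), " ".join(words[i:])


def generate_masks(records, max_locations=1, min_suffixes=3):
    """Return the set of prefixes that qualify as masks.

    records: iterable of (eponym, location) pairs.
    max_locations, min_suffixes: hypothetical threshold boundary.
    """
    locations_by_prefix = defaultdict(set)     # prefix -> distinct locations
    suffixes_by_prefix_loc = defaultdict(set)  # (prefix, location) -> distinct suffixes

    for name, location in records:
        for prefix, suffix in split_prefix_suffix(name):
            locations_by_prefix[prefix].add(location)
            suffixes_by_prefix_loc[(prefix, location)].add(suffix)

    masks = set()
    for (prefix, _location), suffixes in suffixes_by_prefix_loc.items():
        few_locations = len(locations_by_prefix[prefix]) <= max_locations
        many_suffixes = len(suffixes) >= min_suffixes
        if few_locations and many_suffixes:
            masks.add(prefix)
    return masks


def apply_mask(name, masks):
    """Return the longest mask that prefixes the name, else the name unchanged."""
    words = name.split()
    for i in range(len(words) - 1, 0, -1):
        candidate = " ".join(words[:i])
        if candidate in masks:
            return candidate
    return name


# Made-up sample records: (entity eponym, entity location).
records = [
    ("ACME HOTEL 0012", "BOSTON"),
    ("ACME HOTEL 0047", "BOSTON"),
    ("ACME HOTEL 0093", "BOSTON"),
    ("BLUE CAFE", "DENVER"),
]
masks = generate_masks(records)
print(sorted(masks))                           # ['ACME', 'ACME HOTEL']
print(apply_mask("ACME HOTEL 0150", masks))    # ACME HOTEL
```

With these sample records, a new record such as "ACME HOTEL 0150" is matched to the known entity by ignoring its trailing number, which is the "ignoring a portion of the particular data record" behaviour the claim describes. A production system would likely need a more careful threshold, since short prefixes such as "ACME" also qualify under this simple rule.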
