Entity normalization via name normalization

  • US 10,223,406 B2
  • Filed: 06/29/2017
  • Issued: 03/05/2019
  • Est. Priority Date: 02/17/2006
  • Status: Active Grant
First Claim

1. A computer-implemented method of identifying duplicate objects in a plurality of objects, wherein each object in the plurality of objects is associated with one or more facts, and each of the one or more facts has an attribute and a value, the method comprising, using a computer processor to perform:

    associating facts extracted from web documents with a plurality of objects; and

    for each of the plurality of objects, normalizing the value of a name fact by applying at least one normalization rule from a group of normalization rules to the value of the name fact, the name fact being among one or more facts associated with the object;

    based on the normalized values of the name facts, grouping the plurality of objects into a plurality of buckets, each object in a bucket having the same normalized value of a name fact; and

    processing the plurality of objects in a bucket in accordance with the normalized value of the name fact of the plurality of objects to identify at least one pair of duplicate objects in the plurality of objects in the bucket, based on a similarity of values of facts other than the name fact for the objects in the bucket; and

    removing one of the duplicate objects from a memory repository, wherein the group of normalization rules includes at least one rule selected from the group of:

    removing social titles;

    removing predefined adjective words;

    removing single letter words;

    removing punctuation marks;

    removing stop words; and

    converting uppercase characters into lowercase.
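
The steps recited in the claim (normalize the name fact, bucket objects by normalized name, then compare the other facts within each bucket) can be illustrated with a minimal sketch. This is not the patented implementation. The word lists, the object layout (`id` plus a `facts` dictionary), the fixed rule order, the similarity measure, and the `threshold` parameter are all assumptions made for illustration; the claim names the rule types but does not specify any of these details.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical rule data: the claim lists rule types, not their contents.
SOCIAL_TITLES = {"mr", "mrs", "ms", "dr", "sir"}
ADJECTIVES = {"honorable", "late", "great"}
STOP_WORDS = {"the", "of", "a", "an", "and"}


def normalize_name(name):
    """Apply the claimed rule types in one fixed (assumed) order."""
    text = name.lower()                    # convert uppercase to lowercase
    text = re.sub(r"[^\w\s]", "", text)    # remove punctuation marks
    words = [
        w for w in text.split()
        if w not in SOCIAL_TITLES          # remove social titles
        and w not in ADJECTIVES            # remove predefined adjective words
        and w not in STOP_WORDS            # remove stop words
        and len(w) > 1                     # remove single-letter words
    ]
    return " ".join(words)


def bucket_by_name(objects):
    """Group objects so each bucket shares one normalized name value."""
    buckets = defaultdict(list)
    for obj in objects:
        buckets[normalize_name(obj["facts"]["name"])].append(obj)
    return buckets


def similarity(a, b):
    """Assumed measure: share of common non-name attributes with equal values."""
    shared = (set(a["facts"]) & set(b["facts"])) - {"name"}
    if not shared:
        return 0.0
    agree = sum(1 for key in shared if a["facts"][key] == b["facts"][key])
    return agree / len(shared)


def deduplicate(objects, threshold=0.5):
    """Within each bucket, drop the second object of every duplicate pair."""
    removed = set()
    for bucket in bucket_by_name(objects).values():
        for a, b in combinations(bucket, 2):
            if a["id"] in removed or b["id"] in removed:
                continue
            if similarity(a, b) >= threshold:
                removed.add(b["id"])
    return [obj for obj in objects if obj["id"] not in removed]
```

Under these assumptions, `normalize_name("Dr. John Q. Smith")` yields `"john smith"`, so that object lands in the same bucket as one named `"John Smith"`. Only objects within the same bucket are compared pairwise, which avoids comparing every object against every other.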
