Entity normalization via name normalization
First Claim
1. A computer-implemented method of identifying duplicate objects in a plurality of objects, wherein each object in the plurality of objects is associated with one or more facts, and each of the one or more facts has an attribute and a value, the method comprising using a computer processor to perform:
associating facts extracted from web documents with the plurality of objects;
for each of the plurality of objects, normalizing a value of a name fact, the name fact being among one or more facts associated with the object;
based on the normalized values of the name facts, grouping the plurality of objects into a plurality of buckets, each object in a bucket having the same normalized value of a name fact;
processing the plurality of objects in a bucket to identify at least one pair of duplicate objects in the plurality of objects in the bucket, based on a similarity of values of facts other than the name fact for the objects in the bucket; and
merging the duplicate objects together, the merging including removing one of the duplicate objects from a memory repository.
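The steps recited in the claim (normalize the name, bucket by normalized name, compare the remaining facts within each bucket, merge) can be illustrated with a minimal sketch. This is not the patented implementation. It assumes each object is a dictionary with one value per attribute, and the normalization (lowercasing and whitespace collapsing), the similarity measure (fraction of shared non-name attributes that agree), and the 0.5 threshold are all hypothetical choices made for the example.

```python
from collections import defaultdict
from itertools import combinations


def normalize_name(name):
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(name.lower().split())


def bucket_by_name(objects):
    # Group object ids so that every id in a bucket shares a normalized name.
    buckets = defaultdict(list)
    for obj_id, facts in objects.items():
        buckets[normalize_name(facts["name"])].append(obj_id)
    return buckets


def similarity(facts_a, facts_b):
    # Assumed measure: share of common non-name attributes with equal values.
    shared = (set(facts_a) & set(facts_b)) - {"name"}
    if not shared:
        return 0.0
    agree = sum(1 for attr in shared if facts_a[attr] == facts_b[attr])
    return agree / len(shared)


def merge_duplicates(objects, threshold=0.5):
    # Work on a copy that stands in for the repository of objects.
    repository = {obj_id: dict(facts) for obj_id, facts in objects.items()}
    for ids in bucket_by_name(repository).values():
        for a, b in combinations(ids, 2):
            if a not in repository or b not in repository:
                continue  # one of the pair was already merged away
            if similarity(repository[a], repository[b]) >= threshold:
                # Fold b's facts into a, then remove b from the repository.
                for attr, value in repository[b].items():
                    repository[a].setdefault(attr, value)
                del repository[b]
    return repository


# Hypothetical data: o1 and o2 agree on "city"; o3 shares the name only.
objects = {
    "o1": {"name": "Acme Corp", "city": "Springfield"},
    "o2": {"name": "  ACME   corp", "city": "Springfield", "founded": "1990"},
    "o3": {"name": "Acme Corp", "city": "Shelbyville"},
}
merged = merge_duplicates(objects)
print(sorted(merged))
```

Bucketing first means pairwise comparison only happens among objects that already share a normalized name, which keeps the number of comparisons far below an all-pairs scan of the repository.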
Abstract
Systems and methods for normalizing entities via name normalization are disclosed. In some implementations, a computer-implemented method of identifying duplicate objects in a plurality of objects is provided. Each object in the plurality of objects is associated with one or more facts, and each of the one or more facts has a value. The method includes using a computer processor to perform: associating facts extracted from web documents with the plurality of objects; for each of the plurality of objects, normalizing the value of a name fact, the name fact being among the one or more facts associated with the object; and processing the plurality of objects in accordance with the normalized values of the name facts of the plurality of objects. In some implementations, normalizing the value of the name fact is carried out by applying a group of normalization rules to the value of the name fact.
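The abstract mentions applying a group of normalization rules to the value of the name fact without listing the rules here. The sketch below shows one way such a group could be composed as an ordered list of small functions. Every rule in it (lowercasing, punctuation stripping, dropping a few social titles, whitespace collapsing) is a hypothetical example, not a rule taken from the patent.

```python
import re

# Hypothetical set of social titles to drop; not taken from the source.
TITLES = {"mr", "mrs", "ms", "dr"}


def lowercase(value):
    return value.lower()


def strip_punctuation(value):
    # Remove every character that is neither a word character nor whitespace.
    return re.sub(r"[^\w\s]", "", value)


def drop_titles(value):
    return " ".join(word for word in value.split() if word not in TITLES)


def collapse_whitespace(value):
    return " ".join(value.split())


# The "group of rules" is an ordered list; order matters because
# drop_titles expects lowercase input with punctuation already removed.
NAME_RULES = [lowercase, strip_punctuation, drop_titles, collapse_whitespace]


def apply_rules(value, rules=NAME_RULES):
    # Apply each rule in turn to the running value.
    for rule in rules:
        value = rule(value)
    return value


print(apply_rules("Dr.  Jane   Doe"))  # prints "jane doe"
```

Keeping each rule as a separate function makes it straightforward to add, remove, or reorder rules without touching the code that applies them.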
18 Claims
1. A computer-implemented method of identifying duplicate objects in a plurality of objects, wherein each object in the plurality of objects is associated with one or more facts, and each of the one or more facts has an attribute and a value, the method comprising using a computer processor to perform:
associating facts extracted from web documents with the plurality of objects;
for each of the plurality of objects, normalizing a value of a name fact, the name fact being among one or more facts associated with the object;
based on the normalized values of the name facts, grouping the plurality of objects into a plurality of buckets, each object in a bucket having the same normalized value of a name fact;
processing the plurality of objects in a bucket to identify at least one pair of duplicate objects in the plurality of objects in the bucket, based on a similarity of values of facts other than the name fact for the objects in the bucket; and
merging the duplicate objects together, the merging including removing one of the duplicate objects from a memory repository.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system for identifying duplicate objects in a plurality of objects, wherein each object in the plurality of objects is associated with one or more facts, and each of the one or more facts has an attribute and a value, the system comprising:
memory;
one or more processors; and
one or more programs stored in the memory and configured for execution by the one or more processors, the one or more programs including instructions for:
associating facts extracted from web documents with the plurality of objects;
for each of the plurality of objects, normalizing a value of a name fact, the name fact being among one or more facts associated with the object;
based on the normalized values of the name facts, grouping the plurality of objects into a plurality of buckets, each object in a bucket having the same normalized value of a name fact;
processing the plurality of objects in a bucket to identify at least one pair of duplicate objects in the plurality of objects in the bucket, based on a similarity of values of facts other than the name fact for the objects in the bucket; and
merging the duplicate objects together, the merging including removing one of the duplicate objects from a memory repository.
Dependent claims: 12, 13, 14
15. A non-transitory computer readable storage medium storing one or more programs for identifying duplicate objects in a plurality of objects, wherein each object in the plurality of objects is associated with one or more facts, and each of the one or more facts has an attribute and a value, the one or more programs comprising instructions for:
associating facts extracted from web documents with the plurality of objects;
for each of the plurality of objects, normalizing a value of a name fact, the name fact being among one or more facts associated with the object;
based on the normalized values of the name facts, grouping the plurality of objects into a plurality of buckets, each object in a bucket having the same normalized value of a name fact;
processing the plurality of objects in a bucket to identify at least one pair of duplicate objects in the plurality of objects in the bucket, based on a similarity of values of facts other than the name fact for the objects in the bucket; and
merging the duplicate objects together, the merging including removing one of the duplicate objects from a memory repository.
Dependent claims: 16, 17, 18
Specification