Entity normalization via name normalization
Abstract
A system and method for identifying duplicate objects among a plurality of objects. The system and method normalize the name values of the objects, group the objects into buckets based at least in part on the normalized name values, apply a selected matcher to objects within the same bucket, and identify the matching objects as duplicate objects.
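The first two steps in the abstract (normalizing name values, then bucketing on the normalized value) can be illustrated with a minimal Python sketch. It assumes each object is a plain dict mapping fact names to string values, with the name fact stored under the key `"name"`. The normalization rules (Unicode decomposition, accent stripping, case folding, punctuation removal) and the function names `normalize_name` and `bucket_by_name` are hypothetical choices for illustration; the patent text here does not specify how names are normalized.

```python
import re
import unicodedata
from collections import defaultdict

Obj = dict[str, str]  # hypothetical representation: fact name -> fact value


def normalize_name(value: str) -> str:
    """Hypothetical normalizer: fold case, drop accents and punctuation,
    and collapse whitespace so trivially different spellings coincide."""
    # Decompose accented characters (e.g. 'é' -> 'e' + combining accent),
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Case-fold, turn every run of non-word characters (and underscores)
    # into a single space, and trim the ends.
    cleaned = re.sub(r"[\W_]+", " ", without_marks.casefold())
    return cleaned.strip()


def bucket_by_name(objects: list[Obj]) -> dict[str, list[Obj]]:
    """Group objects whose normalized 'name' fact is identical."""
    buckets: dict[str, list[Obj]] = defaultdict(list)
    for obj in objects:
        buckets[normalize_name(obj["name"])].append(obj)
    return dict(buckets)


if __name__ == "__main__":
    toy = [{"name": "Acme, Inc."}, {"name": "ACME  Inc"}, {"name": "Acme Industries"}]
    for key, members in bucket_by_name(toy).items():
        print(key, len(members))
```

With these toy objects, "Acme, Inc." and "ACME  Inc" both normalize to `acme inc` and land in the same bucket, while "Acme Industries" gets its own. Only objects sharing a bucket are later compared, so a stricter normalizer yields more, smaller buckets and fewer candidate pairs.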
12 Claims
1. A method of identifying duplicate objects in a plurality of objects, each of the plurality of objects having one or more associated facts, each of the one or more facts having a value, the method comprising:
for each of the plurality of objects, normalizing the value of a name fact, the name fact being among the one or more facts associated with the object;
grouping the plurality of objects into a plurality of buckets in accordance with the normalized value of the name facts of the plurality of objects; and
applying a matcher to a pair of objects in one of the plurality of buckets to determine if the pair of objects are duplicates, one of the pair of objects having an associated fact that is not a common fact of the pair of objects.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. A system for identifying duplicate objects in a plurality of objects, each of the plurality of objects having one or more associated facts, each of the one or more facts having a value, the system comprising:
a processor for executing programs; and
a subsystem executable by the processor, the subsystem including:
instructions for normalizing the value of a name fact of each of the plurality of objects, the name fact being among the one or more facts associated with the object;
instructions for grouping the plurality of objects into a plurality of buckets in accordance with the normalized value of the name facts of the plurality of objects; and
instructions for applying a matcher to a pair of objects in one of the plurality of buckets to determine if the pair of objects are duplicates, one of the pair of objects having an associated fact that is not a common fact of the pair of objects.
12. A computer program product for use in conjunction with a computer system, the computer program product comprising a computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism including:
instructions for normalizing the value of a name fact of each of the plurality of objects, the name fact being among the one or more facts associated with the object;
instructions for grouping the plurality of objects into a plurality of buckets in accordance with the normalized value of the name facts of the plurality of objects; and
instructions for applying a matcher to a pair of objects in one of the plurality of buckets to determine if the pair of objects are duplicates, one of the pair of objects having an associated fact that is not a common fact of the pair of objects.
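The matcher step recited in claims 1, 11, and 12 compares pairs drawn from a single bucket, and explicitly covers pairs where one object carries a fact the other lacks. The Python sketch below shows one way such a matcher might behave, assuming objects are dicts of fact name to string value. The matcher itself (`shared_fact_matcher`), its `min_agreements` parameter, and the decision to skip the name fact are hypothetical illustrations, not the patent's matcher: it compares only the facts both objects share, treats any disagreement on a shared fact as a non-match, and ignores facts present on only one side.

```python
from itertools import combinations
from typing import Callable

Obj = dict[str, str]  # hypothetical representation: fact name -> fact value


def shared_fact_matcher(a: Obj, b: Obj, min_agreements: int = 1) -> bool:
    """Hypothetical matcher: compare only the facts both objects carry.

    Facts present on just one object are ignored rather than counted as
    disagreements, so a pair can match even when one object has a fact
    that is not common to both.
    """
    # Skip the name fact: the bucket already grouped the pair by normalized name.
    common = (a.keys() & b.keys()) - {"name"}
    if any(a[k] != b[k] for k in common):
        return False  # a conflicting shared fact rules out a duplicate
    return len(common) >= min_agreements


def find_duplicates(
    buckets: dict[str, list[Obj]],
    matcher: Callable[[Obj, Obj], bool] = shared_fact_matcher,
) -> list[tuple[Obj, Obj]]:
    """Apply the matcher only to pairs drawn from the same bucket."""
    pairs: list[tuple[Obj, Obj]] = []
    for members in buckets.values():
        for a, b in combinations(members, 2):
            if matcher(a, b):
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    toy_buckets = {
        "acme inc": [
            {"name": "Acme, Inc.", "founded": "1949", "hq": "Springfield"},
            {"name": "ACME Inc", "founded": "1949", "ticker": "ACME"},
            {"name": "Acme Inc", "founded": "1952"},
        ],
    }
    for a, b in find_duplicates(toy_buckets):
        print(a["name"], "<->", b["name"])
```

In the toy bucket, the first two objects match on their one shared fact (`founded`) even though only the first has `hq` and only the second has `ticker`; the third conflicts on `founded` and matches neither. Restricting comparisons to `combinations` within each bucket is what keeps the work proportional to bucket sizes rather than to all pairs of objects.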
Specification