Entity normalization via name normalization
Abstract
A system and method for identifying duplicate objects from a plurality of objects. The system and method normalize name values of objects, group objects into buckets based at least in part on the normalized name values, match objects within the same bucket based on a selected matcher, and identify the matching objects as duplicate objects.
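The first two steps of the abstract, name normalization and bucketing, can be sketched as follows. This is a minimal illustration, not the patented implementation: the normalization rules (accent stripping, lowercasing, punctuation removal, whitespace collapsing), the function names `normalize_name` and `bucket_by_name`, and the dictionary layout of an object are all assumptions made for the example.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Reduce a name to a canonical form (hypothetical rules, chosen for illustration)."""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of whitespace and trim the ends.
    return " ".join(text.split())


def bucket_by_name(objects):
    """Group objects whose name facts share the same normalized value."""
    buckets = defaultdict(list)
    for obj in objects:
        buckets[normalize_name(obj["name"])].append(obj)
    return dict(buckets)


# Illustrative objects; the names are made up.
objects = [
    {"id": 1, "name": "Jane  Doe"},
    {"id": 2, "name": "jane doe."},
    {"id": 3, "name": "John Doe"},
]
buckets = bucket_by_name(objects)
```

Here objects 1 and 2 land in the same bucket because their names normalize to the same string, so only that pair needs to be handed to a matcher; object 3 sits in a bucket of its own.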
15 Claims
1. A computer-implemented method of identifying duplicate objects in a plurality of objects, each of the plurality of objects having one or more associated facts and being stored in a computer memory, each of the one or more facts having a value, the method comprising:
- using a computer processor to perform:
extracting facts from web documents that are located on document hosts;
associating the facts extracted from the web documents with a plurality of objects;
for each of the plurality of objects, normalizing the value of a name fact, the name fact being among the one or more facts associated with the object;
grouping the plurality of objects into a plurality of buckets in accordance with the normalized value of the name facts of the plurality of objects; and
applying a matcher to a pair of objects in one of the plurality of buckets to determine if the pair of objects are duplicates, one of the pair of objects having an associated fact that is not a common fact of the pair of objects.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A system for identifying duplicate objects in a plurality of objects, each of the plurality of objects having one or more associated facts, each of the one or more facts having a value, the system comprising:
- a processor for executing programs; and
- a subsystem executable by the processor, the subsystem including:
instructions for extracting facts from web documents that are located on document hosts;
instructions for associating the facts extracted from the web documents with a plurality of objects;
instructions for normalizing the value of a name fact of each of the plurality of objects, the name fact being among the one or more facts associated with the object;
instructions for grouping the plurality of objects into a plurality of buckets in accordance with the normalized value of the name facts of the plurality of objects; and
instructions for applying a matcher to a pair of objects in one of the plurality of buckets to determine if the pair of objects are duplicates, one of the pair of objects having an associated fact that is not a common fact of the pair of objects.
Dependent claims: 13
14. A computer program product for use in conjunction with a computer system, the computer program product comprising a computer readable storage medium storing one or more programs configured for execution by a computer, the one or more programs comprising:
instructions for extracting facts from web documents that are located on document hosts;
instructions for associating the facts extracted from the web documents with a plurality of objects;
instructions for normalizing the value of a name fact of each of the plurality of objects, the name fact being among the one or more facts associated with the object;
instructions for grouping the plurality of objects into a plurality of buckets in accordance with the normalized value of the name facts of the plurality of objects; and
instructions for applying a matcher to a pair of objects in one of the plurality of buckets to determine if the pair of objects are duplicates, one of the pair of objects having an associated fact that is not a common fact of the pair of objects.
Dependent claims: 15
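The final step shared by the independent claims, applying a matcher to a pair of objects in one bucket where one object carries a fact the other lacks, might look like the sketch below. The claims do not define the matcher, so the rule used here (two objects match when every attribute they both have carries the same value) is an assumption, as are the names `common_fact_matcher` and `find_duplicates` and the `{"id": ..., "facts": {...}}` layout.

```python
from itertools import combinations


def common_fact_matcher(a: dict, b: dict) -> bool:
    """Hypothetical matcher: compare only the attributes both objects have."""
    # The name is excluded because it was already used to form the bucket.
    shared = (a["facts"].keys() & b["facts"].keys()) - {"name"}
    if not shared:
        # Assumption: with no shared evidence, do not declare a duplicate.
        return False
    return all(a["facts"][key] == b["facts"][key] for key in shared)


def find_duplicates(bucket, matcher=common_fact_matcher):
    """Apply the matcher to every pair of objects inside a single bucket."""
    return [
        (x["id"], y["id"])
        for x, y in combinations(bucket, 2)
        if matcher(x, y)
    ]


# One bucket of illustrative objects that share a normalized name (made-up data).
bucket = [
    {"id": "a", "facts": {"name": "jane doe", "date of birth": "1970-01-01"}},
    {"id": "b", "facts": {"name": "jane doe", "date of birth": "1970-01-01",
                          "spouse": "John Doe"}},
    {"id": "c", "facts": {"name": "jane doe", "date of birth": "1985-06-15"}},
]
duplicates = find_duplicates(bucket)
```

Object `b` has a `spouse` fact that `a` lacks, yet the pair is still compared and reported, since the uncommon fact is simply ignored. Object `c` is rejected against both because its date of birth disagrees. Restricting the comparison to pairs within one bucket avoids comparing every object against every other object in the full collection.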
Specification