Attribute entropy as a signal in object normalization
Abstract
A system and method determine whether two objects are duplicate objects. The system and method match common facts of the two objects based on a match measure, combine the entropies of the matching common facts, and determine whether the two objects are duplicates based on the sum of those entropies.
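The abstract and claims treat an attribute's entropy as a numeric measure of the information the attribute carries, but the text here does not say how that number is obtained. As a rough sketch only, one common choice is the Shannon entropy of the values observed for an attribute across the repository. The function name `attribute_entropy` and the sample values are illustrative assumptions, not taken from the patent.

```python
import math
from collections import Counter


def attribute_entropy(values):
    """Shannon entropy, in bits, of the values observed for one attribute.

    Assumption: entropy is estimated from the empirical value distribution.
    The claims only require "a numeric value measuring an amount of
    information", so other estimators would fit equally well.
    """
    counts = Counter(values)
    total = sum(counts.values())
    # p * log2(1/p) summed over distinct values; 0 for an empty input.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# An attribute whose values are spread over many outcomes carries more
# information than one that is nearly constant across the repository.
birth_years = ["1815", "1852", "1912", "1954"]      # hypothetical data
genders = ["female", "female", "female", "female"]  # hypothetical data
print(attribute_entropy(birth_years), attribute_entropy(genders))
```

Under this assumption, a match on a high-entropy attribute (such as a full name or birth date) is stronger evidence of duplication than a match on a low-entropy one.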
18 Claims
1. A computer-implemented method of determining if a first object and a second object represent a same entity, the method comprising:
identifying one or more common attributes between the first object and the second object, wherein the first object and the second object are included in a fact repository located on one or more computer systems;
determining an entropy for each of the one or more common attributes, wherein a respective entropy for a respective attribute in the one or more common attributes comprises a respective numeric value measuring a respective amount of information carried by the respective attribute;
identifying a subset of the one or more common attributes whose respective values are equivalent;
determining whether the first object and the second object represent the same entity by comparing a sum of entropies for the subset of the one or more common attributes to an entropy threshold measure; and
in response to determining that the first object and the second object represent the same entity, merging the first object and the second object in the fact repository.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
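The steps of claim 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: objects are modeled as plain dictionaries mapping attribute names to values, "equivalent" is taken to mean exact equality, the per-attribute entropies are supplied in a precomputed table, and the comparison is taken to be "sum at or above threshold". All names and numbers are hypothetical.

```python
def matching_common_attributes(first, second):
    """Common attributes of two objects whose values are equal."""
    common = first.keys() & second.keys()
    return sorted(k for k in common if first[k] == second[k])


def same_entity(first, second, entropy, threshold):
    """Compare the summed entropy of the matching attributes to a threshold.

    Assumption: a sum at or above the threshold means "same entity"; the
    claim only says the sum is compared to an entropy threshold measure.
    """
    matched = matching_common_attributes(first, second)
    return sum(entropy[k] for k in matched) >= threshold


def merge(first, second):
    """Merge two objects; on a shared attribute the first object's value wins."""
    return {**second, **first}


# Hypothetical objects and entropy table (in bits).
a = {"name": "Ada Lovelace", "born": "1815", "gender": "female"}
b = {"name": "Ada Lovelace", "born": "1815", "nationality": "British"}
entropy = {"name": 12.0, "born": 6.0, "gender": 1.0, "nationality": 4.0}

if same_entity(a, b, entropy, threshold=15.0):
    merged = merge(a, b)
    print(merged)
```

Here the matching attributes are `name` and `born`, whose entropies sum to 18.0, so the two objects pass the assumed threshold of 15.0 and are merged.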
10. A computer-implemented method of determining if a first object and a second object represent different entities, the method comprising:
identifying one or more common attributes between the first object and the second object, wherein the first object and the second object are included in a fact repository located on one or more computer systems;
determining an entropy for each of the one or more common attributes;
identifying a first subset of the one or more common attributes whose respective values are equivalent;
identifying a second subset of the one or more common attributes whose respective values are nonequivalent;
determining whether the first object and the second object represent different entities by comparing a difference of a sum of entropies for the first subset of the one or more common attributes and a sum of entropies for the second subset of the one or more common attributes to an entropy threshold measure; and
in response to determining that the first object and the second object do not represent different entities, merging the first object and the second object in the fact repository and deleting in the merged object the one or more common attributes and values corresponding to the one or more common attributes for one of the first object.
Dependent claims: 11, 12, 13, 14
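Claim 10 differs from claim 1 in that mismatching attributes count against a merge. A minimal sketch, under the same assumptions as before (dictionary objects, exact-equality matching, a precomputed and hypothetical entropy table), might look like the following. The direction of the comparison (objects are "different" when the difference falls below the threshold) and the choice to keep the first object's copy of each common attribute are assumptions; the claim text does not fix either.

```python
def entropy_difference(first, second, entropy):
    """Summed entropy of matching common attributes minus that of mismatching ones."""
    common = first.keys() & second.keys()
    matched = sum(entropy[k] for k in common if first[k] == second[k])
    mismatched = sum(entropy[k] for k in common if first[k] != second[k])
    return matched - mismatched


def merge_unless_different(first, second, entropy, threshold):
    """Return the merged object, or None when the objects look like different entities.

    Assumption: a difference below the threshold means "different entities".
    """
    if entropy_difference(first, second, entropy) < threshold:
        return None
    merged = dict(first)
    for key, value in second.items():
        # Keep only one copy of each common attribute (the first object's).
        if key not in merged:
            merged[key] = value
    return merged


# Hypothetical objects and entropy table (in bits).
a = {"name": "Ada Lovelace", "born": "1815", "city": "London"}
b = {"name": "Ada Lovelace", "born": "1816", "city": "London", "field": "mathematics"}
entropy = {"name": 12.0, "born": 6.0, "city": 3.0, "field": 5.0}

print(entropy_difference(a, b, entropy))           # (12 + 3) - 6
print(merge_unless_different(a, b, entropy, 10.0))
print(merge_unless_different(a, b, entropy, 5.0))
```

With these numbers the difference is 9.0, so the pair is rejected at a threshold of 10.0 and merged at a threshold of 5.0, with the merged object carrying a single `born` value.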
15. A system for determining if a first object and a second object represent a same entity, the system comprising:
a processor for executing programs; and
a subsystem executable by the processor, the subsystem including;
instructions for identifying one or more common attributes between the first object and the second object wherein the first object and the second object are included in a fact repository located on one or more computer systems;
instructions for determining an entropy for each of the one or more common attributes, wherein a respective entropy for a respective attribute in the one or more common attributes comprises a respective numeric value measuring a respective amount of information carried by the respective attribute;
instructions for identifying a subset of the one or more common attributes whose respective values are equivalent;
instructions for determining whether the first object and the second object represent the same entity by comparing a sum of entropies for the subset of the one or more common attributes to an entropy threshold measure; and
instructions for merging the first object and the second object in the fact repository in response to determining that the first object and the second object represent the same entity.
Dependent claims: 16
17. A computer program product for use in conjunction with a computer system, the computer program product comprising a non-transitory computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism including:
instructions for identifying one or more common attributes between the first object and the second object, wherein the first object and the second object are included in a fact repository located on one or more computer systems;
instructions for determining an entropy for each of the one or more common attributes, wherein a respective entropy for a respective attribute in the one or more common attributes comprises a respective numeric value measuring a respective amount of information carried by the respective attribute;
instructions for identifying a subset of the one or more common attributes whose respective values are equivalent;
instructions for determining whether the first object and the second object represent a same entity by comparing a sum of entropies for the subset of the one or more common attributes to an entropy threshold measure; and
instructions for merging the first object and the second object in the fact repository in response to determining that the first object and the second object represent the same entity.
Dependent claims: 18
Specification