ID persistence through normalization
First Claim
1. A computer-implemented method for maintaining object ID persistence in a collection of data, comprising:
at a computer system including one or more processors and memory storing one or more programs, the one or more processors executing the one or more programs to perform the operations of:
selecting a first object from the collection of data having a first object ID, wherein a first fact comprising an associated object ID is associated with the first object, the collection of data includes a plurality of objects and a plurality of facts associated with the objects, each fact comprises an attribute-value pair, and the plurality of facts are extracted from a plurality of web documents;
selecting a second object from the collection of data having a second object ID;
performing a heuristic comparison on the first object and the second object to determine if the first object and the second object refer to a same entity;
responsive to determining that the first object and the second object refer to the same entity, associating with the first object a forwarding reference to the second object, so that the second object can be referenced using the first object ID;
dissociating the first fact from the first object; and
associating the first fact with the second object by setting the associated object ID of the first fact to the second object ID, so that the first fact is merged with facts for the second object; and
responsive to receiving an external reference to the first object, identifying that the first object includes a forwarding reference to the second object; and
retrieving the second object.
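The steps recited in this claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Fact`, `Obj`, `Repository`, `merge`, and `resolve` are hypothetical, the heuristic comparison is assumed to have already decided that the two objects match, and following a chain of several forwarding references is an added assumption that the claim does not recite.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    object_id: str  # the "associated object ID" carried by each fact
    attribute: str
    value: str


@dataclass
class Obj:
    object_id: str
    forward_to: Optional[str] = None  # forwarding reference, set on merge


class Repository:
    """Hypothetical in-memory stand-in for the collection of data."""

    def __init__(self):
        self.objects = {}
        self.facts = []

    def add_object(self, object_id):
        self.objects[object_id] = Obj(object_id)

    def add_fact(self, object_id, attribute, value):
        self.facts.append(Fact(object_id, attribute, value))

    def facts_of(self, object_id):
        return [f for f in self.facts if f.object_id == object_id]

    def merge(self, first_id, second_id):
        # Associate a forwarding reference with the first object.
        self.objects[first_id].forward_to = second_id
        # Rewriting the fact's object ID both dissociates it from the
        # first object and associates it with the second.
        for fact in self.facts:
            if fact.object_id == first_id:
                fact.object_id = second_id

    def resolve(self, object_id):
        # Assumption: follow forwards until a non-forwarded object is found.
        # No cycle detection is attempted in this sketch.
        obj = self.objects[object_id]
        while obj.forward_to is not None:
            obj = self.objects[obj.forward_to]
        return obj


repo = Repository()
repo.add_object("obj-1")
repo.add_object("obj-2")
repo.add_fact("obj-1", "name", "Ada Lovelace")
repo.add_fact("obj-2", "born", "1815")

repo.merge("obj-1", "obj-2")      # the two objects were judged to match
target = repo.resolve("obj-1")    # external reference still uses the old ID
print(target.object_id, len(repo.facts_of(target.object_id)))  # obj-2 2
```

Because the first object is kept as a stub holding only the forward, references stored outside the collection under the old ID continue to resolve after the merge.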
Abstract
A system and method for maintaining persistent object identifiers across versions of a collection of data. According to one embodiment of the present invention, a first collection of objects is compared to a second collection of objects. If an object in the first collection matches an object in the second collection, a reference is added to the object in the first collection referring to the object in the second collection, allowing the identifier to persist in both collections of objects. Additionally, according to one embodiment of the present invention, the data (or “facts”) associated with the object from the first collection are moved to the object from the second collection. In this way, data associated with matching objects is combined between two collections of objects while maintaining persistent object identifiers.
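As a rough illustration of the two-collection embodiment summarized in the abstract, the sketch below walks a first collection, looks for a match in a second collection, records a reference from the old identifier to the new one, and moves the facts across. The function name `reconcile`, the dictionary layout, and the name-based matcher are all assumptions made for this example; the abstract does not specify them.

```python
def reconcile(first, second, is_same_entity):
    """Match objects of `first` against `second`.

    Both collections map an object ID to a list of (attribute, value) facts.
    Returns a dict of references {first_id: second_id}; facts of matched
    objects are moved from `first` into `second`.
    """
    references = {}
    for first_id, first_facts in first.items():
        for second_id, second_facts in second.items():
            if is_same_entity(first_facts, second_facts):
                references[first_id] = second_id
                second_facts.extend(first_facts)  # combine the facts
                first_facts.clear()               # old object keeps only its ID
                break
    return references


def share_name(facts_a, facts_b):
    # Hypothetical matcher: two objects match if they share a "name" fact.
    return any(f in facts_b for f in facts_a if f[0] == "name")


first = {
    "o1": [("name", "Ada Lovelace"), ("born", "1815")],
    "o2": [("name", "Alan Turing")],
}
second = {"n1": [("name", "Ada Lovelace"), ("field", "mathematics")]}

references = reconcile(first, second, share_name)
print(references)  # {'o1': 'n1'}
```

After the call, `o1` remains a valid identifier that refers to `n1`, while the unmatched object `o2` is left untouched.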
18 Claims
11. A computer readable storage medium for maintaining object ID persistence in a collection of data, the computer readable storage medium storing one or more programs for execution by one or more processors in a computer system, the one or more programs comprising:
instructions for selecting a first object from the collection of data having a first object ID, wherein a first fact comprising an associated object ID is associated with the first object, the collection of data includes a plurality of objects and a plurality of facts associated with the objects, each fact comprises an attribute-value pair, and the plurality of facts are extracted from a plurality of web documents;
instructions for selecting a second object from the collection of data having a second object ID;
instructions for performing a heuristic comparison on the first object and the second object to determine if the first object and the second object refer to a same entity;
instructions for, responsive to determining that the first object and the second object refer to the same entity, associating with the first object a forwarding reference to the second object, so that the second object can be referenced using the first object ID;
dissociating the first fact from the first object; and
associating the first fact with the second object by setting the associated object ID of the first fact to the second object ID, so that the first fact is merged with facts for the second object; and
program code for, responsive to receiving an external reference to the first object, identifying that the first object includes a forwarding reference to the second object; and
retrieving the second object.
12. A system for maintaining object ID persistence in a collection of data, comprising:
one or more processors;
memory; and
one or more programs stored in the memory, the one or more programs comprising instructions to:
select a first object from the collection of data having a first object ID, wherein a first fact comprising an associated object ID is associated with the first object, the collection of data includes a plurality of objects and a plurality of facts associated with the objects, each fact comprises an attribute-value pair and the plurality of facts are extracted from a plurality of web documents;
select a second object from the collection of data having a second object ID;
perform a heuristic comparison on the first object and the second object to determine if the first object and the second object refer to a same entity; and
responsive to determining that the first object and the second object refer to the same entity, associate with the first object a forwarding reference to the second object, so that the second object can be referenced using the first object ID;
dissociate the first fact from the first object; and
associate the first fact with the second object by setting the associated object ID of the first fact to the second object ID, so that the first fact is merged with facts for the second object; and
responsive to receiving an external reference to the first object, identify that the first object includes a forwarding reference to the second object; and
retrieve the second object.
13. A computer-implemented method for maintaining object ID persistence in a collection of data, comprising:
at a computer system including one or more processors and memory storing one or more programs, the one or more processors executing the one or more programs to perform the operations of:
selecting a first set of one or more facts from the collection of data associated with a first object ID, the collection of data includes a plurality of objects and a plurality of facts associated with the objects, each fact comprises an attribute-value pair and the plurality of facts are extracted from a plurality of web documents;
selecting a second set of one or more facts from the collection of data associated with a second object ID;
performing a heuristic comparison on the first set of one or more facts and the second set of one or more facts to determine if the first set of one or more facts associated with the first object ID and the second set of one or more facts associated with the second object ID refer to a same entity;
responsive to determining that the first set of one or more facts associated with the first object ID and the second set of one or more facts associated with the second object ID refer to a same entity:
associating with the first object ID a forwarding reference to the second object ID, so that the second set of one or more facts associated with the second object ID can be referenced using the first object ID, and dissociating the first set of one or more facts from the first object ID; and
associating the first fact with the second object by setting the associated object ID of the first fact to the second object ID, so that the first fact is merged with facts for the second object; and
responsive to receiving an external reference to the first object ID, identifying that the first object ID includes a forwarding reference to the second object ID and retrieving the second object ID.
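Claim 13 is framed in terms of sets of facts keyed by object ID rather than object records. One way to picture that framing, purely as an assumption-laden sketch, is a store that keeps a fact list per ID plus a separate table of forwarding references between IDs. The class `FactStore` and its methods are hypothetical names, not taken from the patent.

```python
from collections import defaultdict


class FactStore:
    """Hypothetical store: facts grouped by object ID, forwards between IDs."""

    def __init__(self):
        self.facts_by_id = defaultdict(list)  # object ID -> [(attribute, value)]
        self.forward = {}                     # first object ID -> second object ID

    def add(self, object_id, attribute, value):
        self.facts_by_id[object_id].append((attribute, value))

    def merge_ids(self, first_id, second_id):
        # Record the forwarding reference on the first ID.
        self.forward[first_id] = second_id
        # Dissociate the first set of facts and attach it to the second ID.
        moved = self.facts_by_id.pop(first_id, [])
        self.facts_by_id[second_id].extend(moved)

    def resolve_id(self, object_id):
        # Single hop only, mirroring the claim; chains are not handled here.
        return self.forward.get(object_id, object_id)

    def lookup(self, object_id):
        return list(self.facts_by_id.get(self.resolve_id(object_id), []))


store = FactStore()
store.add("id-1", "name", "Ada Lovelace")
store.add("id-2", "born", "1815")
store.merge_ids("id-1", "id-2")

print(store.resolve_id("id-1"))  # id-2
print(store.lookup("id-1"))      # [('born', '1815'), ('name', 'Ada Lovelace')]
```

Here an external reference to `id-1` yields the second object ID first, and the merged fact set is then read under that ID.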
18. A computer-implemented method for maintaining object ID persistence in a collection of data, comprising:
at a computer system including one or more processors and memory storing one or more programs, the one or more processors executing the one or more programs to perform the operations of:
performing a heuristic comparison on the first object and the second object to determine if a first object from the collection of data having a first object ID and a second object from the collection of data having a second object ID refer to a same entity, wherein a first fact comprising an associated object ID is associated with the first object, the collection of data includes a plurality of objects and a plurality of facts associated with the objects, each fact comprises an attribute-value pair and the plurality of facts are extracted from a plurality of web documents; and
responsive to determining that the first object and the second object refer to the same entity, associating with the first object a forwarding reference to the second object, so that the second object can be referenced using the first object ID;
dissociating the first fact from the first object; and
associating the first fact with the second object by setting the associated object ID of the first fact to the second object ID, so that the first fact is merged with facts for the second object; and
responsive to receiving an external reference to the first object, identifying that the first object includes a forwarding reference to the second object; and
retrieving the second object.
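The claims do not say what the heuristic comparison computes. Purely for illustration, the sketch below swaps in one simple possibility: the Jaccard overlap of the two objects' attribute-value pairs, compared against a threshold. The function name `same_entity` and the default threshold of 0.5 are assumptions, not details from the patent.

```python
def same_entity(facts_a, facts_b, threshold=0.5):
    """Guess whether two fact sets describe the same entity.

    Assumed heuristic: Jaccard similarity of the (attribute, value) pairs,
    i.e. shared pairs divided by all distinct pairs, must reach `threshold`.
    """
    a, b = set(facts_a), set(facts_b)
    if not a or not b:
        return False  # nothing to compare, so do not merge
    return len(a & b) / len(a | b) >= threshold


ada_v1 = [("name", "Ada Lovelace"), ("born", "1815")]
ada_v2 = [("name", "Ada Lovelace"), ("born", "1815"), ("field", "mathematics")]
alan = [("name", "Alan Turing")]

print(same_entity(ada_v1, ada_v2))  # True  (2 shared pairs out of 3)
print(same_entity(ada_v1, alan))    # False (no shared pairs)
```

A positive result from a comparison like this would trigger the forwarding and fact-moving steps; a real system would likely weigh attributes differently and tolerate variant spellings.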
Specification