Graph based resolution of matching items in data sources
First Claim
1. A method comprising:
calculating a first relational classification score for a first node in a first graph that is digitally stored in computer memory, the first graph representing a first digitally stored database, the first node representing an element of the first digitally stored database, and the first relational classification score being a measure of a logical position of the first node in the first graph;
calculating a second relational classification score for a second node in a second graph that is digitally stored in computer memory, the second graph representing a second digitally stored database, the second node representing an element of said second digitally stored database, and the second relational classification score being a measure of a logical position of said second node in the first graph;
calculating a relational classification matching score for the first node and the second node that is based upon the first relational classification score and the second relational classification score, the relational classification matching score representing a similarity in location in the graphs of said first node and said second node;
calculating a composite score based at least upon the relational classification matching score, the composite score being a measure of quality of match of said first node and said second node;
generating a canonical tuple that represents a match between the first node and the second node in response to determining that the composite score is equal to or greater than a specified threshold score value;
storing said canonical tuple as a merged digitally stored database that is created to resolve said first node and said second node;
wherein the method is performed by one or more computing devices.
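The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not fix any formula, so the sketch assumes that the relational classification score is simply a node's depth below the root, that the matching score is `1 / (1 + |difference|)`, and that the composite score equals the matching score. The names `CanonicalTuple`, `depth`, `matching_score`, and `resolve`, and the threshold of 0.5, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CanonicalTuple:
    """Hypothetical record of a resolved match between two nodes."""
    first_node: str
    second_node: str
    composite_score: float


def depth(parents: Dict[str, Optional[str]], node: str) -> int:
    """Assumed relational classification score: edges from node up to the root."""
    d = 0
    while parents[node] is not None:
        node = parents[node]
        d += 1
    return d


def matching_score(score_a: float, score_b: float) -> float:
    """Assumed similarity in (0, 1]; 1.0 means identical logical positions."""
    return 1.0 / (1.0 + abs(score_a - score_b))


def resolve(parents_a, node_a, parents_b, node_b, threshold=0.5):
    """Return a CanonicalTuple when the composite score reaches the threshold."""
    rcm = matching_score(depth(parents_a, node_a), depth(parents_b, node_b))
    composite = rcm  # assumption: no other signals are mixed in here
    if composite >= threshold:
        return CanonicalTuple(node_a, node_b, composite)
    return None


# Two toy schema graphs, each stored as a child -> parent mapping.
graph_a = {"db": None, "customer": "db", "cust_name": "customer"}
graph_b = {"db": None, "client": "db", "client_name": "client"}

merged = []  # stands in for the merged database of canonical tuples
match = resolve(graph_a, "cust_name", graph_b, "client_name")
if match is not None:
    merged.append(match)
```

Here `cust_name` and `client_name` both sit two edges below their roots, so they match; comparing `cust_name` against the root `db` of the second graph yields a composite score of 1/3, which falls below the assumed threshold.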
Abstract
In an embodiment, a computer-implemented method comprises calculating a first relational classification score for a first node in a first graph; calculating a second relational classification score for a second node in a second graph; calculating a relational classification matching score for the first node and the second node that is based upon the first relational classification score and the second relational classification score; calculating a composite score based at least upon the relational classification matching score; and generating a canonical tuple that represents a match between the first node and the second node in response to determining that the composite score is equal to or greater than a specified threshold score value.
20 Claims
10. A data processing system, comprising:
one or more processors;
one or more non-transitory machine readable storage media storing sequences of instructions which, when executed using the one or more processors, cause the one or more processors to perform:
calculating a first relational classification score for a first node in a first graph, wherein said first graph represents a first digitally stored database, said first node represents an element of said first digitally stored database, and said first relational classification score is a measure of a logical position of said first node in the first graph;
calculating a second relational classification score for a second node in a second graph, wherein said second graph represents a second digitally stored database and said second node represents an element of said second digitally stored database, and said second relational classification score is a measure of a logical position of said second node in the first graph;
calculating a relational classification matching score for the first node and the second node that is based upon the first relational classification score and the second relational classification score, wherein said relational classification matching score represents a similarity in location of said first node and said second node;
calculating a composite score based at least upon the relational classification matching score, wherein said composite score is a measure of quality of match of said first node and said second node;
generating a canonical tuple that represents a match between the first node and the second node in response to determining that the composite score is equal to or greater than a specified threshold score value;
storing said canonical tuple as a merged digitally stored database that is created to resolve said first node and said second node;
wherein the method is performed by one or more computing devices.
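The final storing step of claim 10 can be pictured as writing each canonical tuple into a small merged table. The sketch below is only one possible reading: the claim does not name a storage engine or schema, so the use of Python's built-in `sqlite3` module, the table name `canonical`, its columns, and the sample node identifiers and score are all assumptions made for illustration.

```python
import sqlite3


def store_canonical_tuples(conn, tuples):
    """Persist (first_node, second_node, composite_score) rows in a merged table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS canonical ("
        " first_node TEXT NOT NULL,"
        " second_node TEXT NOT NULL,"
        " composite_score REAL NOT NULL,"
        " PRIMARY KEY (first_node, second_node))"
    )
    # INSERT OR REPLACE keeps a single row per node pair if a pair is re-scored.
    conn.executemany("INSERT OR REPLACE INTO canonical VALUES (?, ?, ?)", tuples)
    conn.commit()


# In-memory database used purely for demonstration.
conn = sqlite3.connect(":memory:")
store_canonical_tuples(
    conn, [("crm.customer.name", "billing.client.name", 0.92)]
)
rows = conn.execute(
    "SELECT first_node, second_node, composite_score FROM canonical"
).fetchall()
```

Keying the table on the node pair means that resolving the same two nodes again overwrites the earlier row instead of duplicating it, which is a design choice of this sketch and not something the claim requires.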
19. A computer-implemented data processing method for determining whether a first node, which is associated with a first entity of a first digitally stored database, matches and represents a same object as a second node, which is associated with a second entity of a second, different digitally stored database, the method comprising:
calculating a first relational classification score for the first node in a first graph that has been created and stored in a computer memory based upon the first entity of the first digitally stored database and calculating a second relational classification score for the second node in a second graph that has been created and stored in the computer memory based upon the second entity of the second digitally stored database, including calculating each relational classification score as a measure of a logical position of the first node, and based upon relationships to other nodes by calculating a distance of the first node to a root node of the first graph, counting proximate nodes to which the first node is joined as parent nodes, grandparent nodes, child nodes, or grandchild nodes, and determining which proximate nodes match the first node in node identifiers, properties or attributes;
calculating a relational classification matching score for the first node and the second node that is based upon the first relational classification score and the second relational classification score, as a representation of similarity between the first node and the second node based upon a first similarity of the respective locations of the nodes, a second similarity of all proximate nodes that are joined respectively to the first node and the second node, and a third similarity of numbers of child nodes of the first node and the second node;
calculating a composite score as a measure of quality of match of the first node and the second node, using the relational classification matching score and also using one or more of: a node identifier edit distance matching score, property matching score, known synonym matching score, or known abbreviation matching score;
generating a canonical tuple that represents a match between the first node and the second node in response to determining that the composite score is equal to or greater than a specified threshold score value.
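Claim 19 names concrete ingredients: distance to the root, counts of parent, grandparent, child, and grandchild nodes, three similarities, and a composite that also uses a name-based score. The Python sketch below shows one way these could fit together. It rests on several assumptions that the claim does not state: each graph is a tree stored as a child-to-parent mapping, the three similarities are averaged with equal weight, the composite weights structure and name equally, and `difflib.SequenceMatcher.ratio()` stands in for a node identifier edit distance matching score. All function names are hypothetical.

```python
from difflib import SequenceMatcher


def relational_features(parents, node):
    """Depth to root plus counts of proximate nodes (tree assumed)."""
    children = {n: [] for n in parents}
    for n, p in parents.items():
        if p is not None:
            children[p].append(n)
    depth, cur = 0, node
    while parents[cur] is not None:
        cur = parents[cur]
        depth += 1
    parent = parents[node]
    grandparent = parents[parent] if parent is not None else None
    kids = children[node]
    grandkids = [g for k in kids for g in children[k]]
    return {
        "depth": depth,
        "parents": int(parent is not None),
        "grandparents": int(grandparent is not None),
        "children": len(kids),
        "grandchildren": len(grandkids),
    }


def ratio(a, b):
    """Similarity of two non-negative counts, in [0, 1]."""
    return 1.0 if a == b else min(a, b) / max(a, b)


def relational_matching_score(fa, fb):
    """Equal-weight mean of location, proximate-node, and child-count similarity."""
    keys = ("parents", "grandparents", "children", "grandchildren")
    location = ratio(fa["depth"], fb["depth"])
    proximate = ratio(sum(fa[k] for k in keys), sum(fb[k] for k in keys))
    child = ratio(fa["children"], fb["children"])
    return (location + proximate + child) / 3


def composite_score(parents_a, node_a, parents_b, node_b):
    """Assumed 50/50 blend of structural similarity and identifier similarity."""
    rcm = relational_matching_score(
        relational_features(parents_a, node_a),
        relational_features(parents_b, node_b),
    )
    name_sim = SequenceMatcher(None, node_a, node_b).ratio()  # edit-distance stand-in
    return 0.5 * rcm + 0.5 * name_sim


graph_a = {"db": None, "customer": "db", "name": "customer", "email": "customer"}
graph_b = {"root": None, "client": "root", "name": "client", "mail": "client"}
score = composite_score(graph_a, "name", graph_b, "name")
```

In this toy data the two `name` nodes occupy identical positions and share an identifier, so the composite reaches its maximum, while `customer` and `client` agree structurally but only partly by name. A fuller version would also compare properties, synonyms, and abbreviations, as the claim lists.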
Specification