Method and system to compare data objects
Abstract
The invention provides a method and system to compare data objects. Each data object is converted into a directed acyclic graph forest, which comprises one or more directed acyclic graphs. The directed acyclic graph forests corresponding to data objects are then compared to calculate a similarity score between the data objects. The similarity score is then used as a measure to determine the extent of similarity between the data objects.
12 Claims
1. A method for comparing a first data object with a second data object, the first data object comprising first unstructured data, the first unstructured data comprising one or more first sets of ontology-based attributes pertaining to a domain knowledge model, the second data object comprising second unstructured data, the second unstructured data comprising one or more second sets of ontology-based attributes pertaining to the domain knowledge model, the domain knowledge model comprising one or more data acyclic graphs representing the one or more first sets of ontology-based attributes and the one or more second sets of ontology-based attributes, the method comprising the steps of:
a. converting the first data object into a first directed acyclic graph forest, the first directed acyclic graph forest comprising a first set of one or more directed acyclic graphs, wherein the first set of one or more data acyclic graphs are constructed from the one or more directed acyclic graphs of the domain knowledge model representing the one or more first set of ontology-based attributes of the first data object;
b. converting the second data object into a second directed acyclic graph forest, the second directed acyclic graph forest comprising a second set of one or more directed acyclic graphs, wherein the second set of one or more data acyclic graphs are constructed from the one or more directed acyclic graphs of the domain knowledge model representing the one or more second set of ontology-based attributes of the second data object;
c. determining a graph-based similarity score between each directed acyclic graph of the first set of one or more directed acyclic graphs and a corresponding directed acyclic graph of the second set of one or more directed acyclic graphs, wherein the graph based similarity score is determined by calculating a cosine distance between vectors defined by the each directed acyclic graph of the first set of one or more directed acyclic graphs and the corresponding directed acyclic graph of the second set of one or more directed acyclic graphs; and
d. determining a forest-based similarity score between the first directed acyclic graph forest of the first data object and the second directed acyclic graph forest of the second data object, wherein the forest-based similarity score is calculated as a function of the graph-based similarity score between each directed acyclic graph of the first set of one or more directed acyclic graphs and a corresponding directed acyclic graph of the second set of one or more directed acyclic graphs.
Dependent claims: 2, 3, 4, 5, 11
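The four steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions the claim does not fix: the toy `DOMAIN_MODEL`, the choice to build each graph from the matched attribute nodes plus their ancestors, the use of binary node-indicator vectors, and the use of the mean as the forest-level function. The sketch also assumes that ontology-based attributes have already been extracted from the unstructured data, and it reads "cosine distance" as the cosine of the angle between the vectors (many libraries define cosine distance as one minus that value).

```python
import math

# Hypothetical domain knowledge model: one DAG per attribute family,
# stored as node -> list of parent nodes.
DOMAIN_MODEL = {
    "skills": {
        "technology": [],
        "programming": ["technology"],
        "java": ["programming"],
        "python": ["programming"],
        "databases": ["technology"],
        "sql": ["databases"],
    },
    "location": {
        "world": [],
        "asia": ["world"],
        "europe": ["world"],
        "india": ["asia"],
        "germany": ["europe"],
    },
}


def to_forest(attributes):
    """Steps a/b: convert a data object's attributes into a DAG forest.

    Assumption: each graph keeps the matched nodes and all their ancestors.
    """
    forest = {}
    for name, dag in DOMAIN_MODEL.items():
        nodes = set()
        stack = [a for a in attributes if a in dag]
        while stack:
            node = stack.pop()
            if node not in nodes:
                nodes.add(node)
                stack.extend(dag[node])
        forest[name] = nodes
    return forest


def graph_similarity(nodes_a, nodes_b):
    """Step c: cosine between binary node-indicator vectors.

    For 0/1 vectors the dot product is the number of shared nodes and
    each norm is the square root of the node count.
    """
    if not nodes_a or not nodes_b:
        return 0.0
    return len(nodes_a & nodes_b) / math.sqrt(len(nodes_a) * len(nodes_b))


def forest_similarity(forest_a, forest_b):
    """Step d: combine per-graph scores (assumption: arithmetic mean)."""
    scores = [graph_similarity(forest_a[n], forest_b[n]) for n in DOMAIN_MODEL]
    return sum(scores) / len(scores)


first = to_forest({"java", "india"})
second = to_forest({"python", "germany"})
print(forest_similarity(first, second))
```

In this toy example the two objects share two of three nodes in the skills graph and one of three in the location graph, so the per-graph scores are 2/3 and 1/3. A weighted sum or another aggregate could replace the mean without changing the structure of the sketch.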
6. A system for comparing a first data object with a second data object, the first data object comprising first unstructured data, the first unstructured data comprising one or more first sets of ontology-based attributes pertaining to a domain knowledge model, the second data object comprising second unstructured data, the second unstructured data comprising one or more second sets of ontology-based attributes pertaining to the domain knowledge model, the domain knowledge model comprising one or more data acyclic graphs representing the one or more first sets of ontology-based attributes and the one or more second sets of ontology-based attributes, the system comprising:
a processor;
a. a data converter to convert the first data object into a first directed acyclic graph forest and the second data object into a second directed acyclic graph forest, the first directed acyclic graph forest comprising a first set of one or more directed acyclic graphs wherein the first set of one or more data acyclic graphs are constructed from the one or more directed acyclic graphs of the domain knowledge model representing the one or more first set of ontology-based attributes of the first data object and the second directed acyclic graph forest comprising a second set of one or more directed acyclic graphs wherein the second set of one or more data acyclic graphs are constructed from the one or more directed acyclic graphs of the domain knowledge model representing the one or more second set of ontology-based attributes of the second data object the first set of one or more directed acyclic graphs being formed based on one or more first set of one or more ontology based attributes of the first data object and
b. a similarity-calculator to determine the extent of similarity between the first directed acyclic graph forest and the second directed acyclic graph forest, the similarity calculator comprising;
a graph-based similarity-score calculator to determine a graph-based similarity score between each directed acyclic graph of the first set of one or more directed acyclic graphs and a corresponding directed acyclic graph of the second set of one or more directed acyclic graphs, wherein the graph based similarity score is determined by calculating a cosine distance between vectors defined by the each directed acyclic graph of the first set of one or more directed acyclic graphs and the corresponding directed acyclic graph of the second set of one or more directed acyclic graphs; and
a forest-based similarity-score calculator to determine a forest-based similarity score between the first directed acyclic graph forest of the first data object and the second directed acyclic graph forest of the second data object, wherein the forest-based similarity score is calculated as a function of the graph-based similarity score between each directed acyclic graph of the first set of one or more directed acyclic graphs and a corresponding directed acyclic graph of the second set of one or more directed acyclic graphs.
Dependent claims: 7
8. A computer program product for use with a computer, the computer program product comprising a computer usable medium having a computer program code embodied therein for comparing a first data object with a second data object, the first data object comprising first unstructured data, the first unstructured data comprising one or more first set of ontology-based attributes pertaining to a domain knowledge model, the second data object comprising second unstructured data, the second unstructured data comprising one or more second set of ontology-based attributes pertaining to the domain knowledge model, the domain knowledge model comprising one or more data acyclic graphs representing one or more first set of ontology-based attributes and one or more second set of ontology-based attributes, the computer code performing the steps of:
a. converting the first data object into a first directed acyclic graph forest, the first directed acyclic graph forest comprising a first set of one or more directed acyclic graphs, wherein the first set of one or more data acyclic graphs are constructed from the one or more directed acyclic graphs of the domain knowledge model representing the one or more first set of ontology-based attributes of the first data object;
b. converting the second data object into a second directed acyclic graph forest, the second directed acyclic graph forest comprising a second set of one or more directed acyclic graphs, wherein the second set of one or more data acyclic graphs are constructed from the one or more directed acyclic graphs of the domain knowledge model representing the one or more second set of ontology-based attributes of the second data object;
c. determining a graph-based similarity score between each directed acyclic graph of the first set of one or more directed acyclic graphs and a corresponding directed acyclic graph of the second set of one or more directed acyclic graphs, wherein the graph based similarity score is determined by calculating cosine distance between vectors defined by the each directed acyclic graph of the first set of one or more directed acyclic graphs and the corresponding directed acyclic graph of the second set of one or more directed acyclic graphs; and
d. determining a forest-based similarity score between the first directed acyclic graph forest of the first data object and the second directed acyclic graph forest of the second data object, wherein the forest-based similarity score is calculated as a function of the graph-based similarity score between each directed acyclic graph of the first set of one or more directed acyclic graphs and a corresponding directed acyclic graph of the second set of one or more directed acyclic graphs.
Dependent claims: 9, 10, 12
Specification