Method and system to compare data objects

  • US 7,440,955 B2
  • Filed: 04/25/2005
  • Issued: 10/21/2008
  • Est. Priority Date: 01/14/2005
  • Status: Active Grant
First Claim

1. A method for comparing a first data object with a second data object, the first data object comprising first unstructured data, the first unstructured data comprising one or more first sets of ontology-based attributes pertaining to a domain knowledge model, the second data object comprising second unstructured data, the second unstructured data comprising one or more second sets of ontology-based attributes pertaining to the domain knowledge model, the domain knowledge model comprising one or more directed acyclic graphs representing the one or more first sets of ontology-based attributes and the one or more second sets of ontology-based attributes, the method comprising the steps of:

  • a. converting the first data object into a first directed acyclic graph forest, the first directed acyclic graph forest comprising a first set of one or more directed acyclic graphs, wherein the first set of one or more directed acyclic graphs are constructed from the one or more directed acyclic graphs of the domain knowledge model representing the one or more first sets of ontology-based attributes of the first data object;

    b. converting the second data object into a second directed acyclic graph forest, the second directed acyclic graph forest comprising a second set of one or more directed acyclic graphs, wherein the second set of one or more directed acyclic graphs are constructed from the one or more directed acyclic graphs of the domain knowledge model representing the one or more second sets of ontology-based attributes of the second data object;

    c. determining a graph-based similarity score between each directed acyclic graph of the first set of one or more directed acyclic graphs and a corresponding directed acyclic graph of the second set of one or more directed acyclic graphs, wherein the graph-based similarity score is determined by calculating a cosine distance between vectors defined by each directed acyclic graph of the first set of one or more directed acyclic graphs and the corresponding directed acyclic graph of the second set of one or more directed acyclic graphs; and

    d. determining a forest-based similarity score between the first directed acyclic graph forest of the first data object and the second directed acyclic graph forest of the second data object, wherein the forest-based similarity score is calculated as a function of the graph-based similarity score between each directed acyclic graph of the first set of one or more directed acyclic graphs and a corresponding directed acyclic graph of the second set of one or more directed acyclic graphs.
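
Steps a through d can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how a graph defines a vector or which function combines the per-graph scores, so the sketch assumes a binary node-membership vector over each domain graph and a plain arithmetic mean. It also computes the cosine measure as a similarity (1.0 for identical graphs). All names (`build_dag`, `dag_vector`, `forest_similarity`, the sample domain model) are hypothetical.

```python
import math

# Hypothetical domain knowledge model: one DAG per attribute set,
# stored as child -> list of parents. Every node appears as a key.
DOMAIN_MODEL = {
    "skills": {"programming": [], "java": ["programming"],
               "python": ["programming"], "sql": ["programming"]},
    "location": {"usa": [], "california": ["usa"], "texas": ["usa"]},
}

def build_dag(attributes, parents):
    """Steps a/b: extract the sub-DAG of the domain graph that contains
    the object's attributes and all of their ancestors."""
    nodes, edges = set(), set()
    stack = list(attributes)
    while stack:
        node = stack.pop()
        if node in nodes:
            continue
        nodes.add(node)
        for parent in parents.get(node, []):
            edges.add((parent, node))
            stack.append(parent)
    return {"nodes": nodes, "edges": edges}

def build_forest(data_object):
    """Convert an object (attribute set name -> attributes) into a forest."""
    return {name: build_dag(attrs, DOMAIN_MODEL[name])
            for name, attrs in data_object.items()}

def dag_vector(dag, parents):
    """Assumed vectorization: 1.0 for each domain node present in the DAG."""
    return [1.0 if node in dag["nodes"] else 0.0 for node in sorted(parents)]

def cosine_similarity(u, v):
    """Step c: cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def forest_similarity(forest_a, forest_b):
    """Step d: combine per-graph scores; the mean is an assumed choice."""
    empty = {"nodes": set(), "edges": set()}
    scores = {}
    for name, parents in DOMAIN_MODEL.items():
        u = dag_vector(forest_a.get(name, empty), parents)
        v = dag_vector(forest_b.get(name, empty), parents)
        scores[name] = cosine_similarity(u, v)
    return scores, sum(scores.values()) / len(scores)

# Two sample objects whose attributes have already been extracted.
first = build_forest({"skills": ["java", "python"], "location": ["california"]})
second = build_forest({"skills": ["java", "sql"], "location": ["california"]})
graph_scores, forest_score = forest_similarity(first, second)
```

In this example the two skills graphs share the nodes `java` and `programming` out of three nodes each, so their score is 2/3, while the identical location graphs score approximately 1.0. A weighted combination could replace the mean if some attribute sets matter more than others.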
