Method and system for analyzing similarity of concept sets
First Claim
1. A system comprising:
- a concept analysis engine including:
a taxonomy manager configured to obtain a set of one or more taxonomies wherein each of the taxonomies includes one root node and one or more hierarchically ordered paths, wherein each hierarchically ordered path includes the root node and a hierarchically ordered sequence of concept nodes;
a concept set engine configured to receive a first set of first set concepts and a second set of second set concepts;
a concept pair engine configured to determine a plurality of concept pairs, wherein each concept pair includes one of the first set concepts and one of the second set concepts;
a hierarchical path engine configured to determine, for each one of the concept pairs, an associated length of a nondiverging intersection of a first subpath of one of the hierarchically ordered paths from the root node of one of the taxonomies to a first concept node representing the first set concept and a second subpath of one of the hierarchically ordered paths from the root node of the one of the taxonomies to a second concept node representing the second set concept, and an associated length of a first portion of the first subpath from a last concept node included in the nondiverging intersection to the first concept node, and an associated length of a second portion of the second subpath from the last concept node included in the nondiverging intersection to the second concept node;
a concept similarity engine configured to determine pairwise similarity values associated with each of the concept pairs based on ratios based on associated lengths of nondiverging intersections determined by the hierarchical path engine and the associated lengths of the first and second portions, wherein a pairwise similarity value indicating a high similarity is determined for association with concept pairs associated with nonempty nondiverging intersections including the root node and hierarchically immediate successor nodes of the root node that are included in the first subpath and the second subpath; and
a concept set similarity engine configured to determine a concept set similarity value based on a weighted sum of the pairwise similarity values associated with optimal selected ones of the concept pairs.
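The three lengths that the hierarchical path engine determines, and a ratio built from them, can be sketched as follows. This is a minimal illustration, not the claimed implementation. It assumes that each subpath is a list of node names starting at the root and that a length counts nodes. The function names `path_lengths` and `pairwise_similarity` are hypothetical. The claim requires only "ratios based on" the lengths, so the specific ratio used here (twice the shared length over the sum of both subpath lengths) is an assumed choice.

```python
from typing import Sequence, Tuple


def path_lengths(first: Sequence[str], second: Sequence[str]) -> Tuple[int, int, int]:
    """Return (shared, first_rest, second_rest) for two root-to-concept subpaths.

    shared      -- number of leading nodes the subpaths have in common
                   (the nondiverging intersection); 0 if the roots differ
    first_rest  -- nodes on the first subpath after the last shared node
    second_rest -- nodes on the second subpath after the last shared node
    """
    shared = 0
    for a, b in zip(first, second):
        if a != b:
            break
        shared += 1
    return shared, len(first) - shared, len(second) - shared


def pairwise_similarity(first: Sequence[str], second: Sequence[str]) -> float:
    """Assumed ratio: 2 * shared / (2 * shared + first_rest + second_rest)."""
    shared, first_rest, second_rest = path_lengths(first, second)
    total = 2 * shared + first_rest + second_rest
    return 2 * shared / total if total else 0.0


sedan = ["root", "vehicle", "car", "sedan"]
truck = ["root", "vehicle", "truck"]
print(path_lengths(sedan, truck))         # (2, 2, 1)
print(pairwise_similarity(sedan, sedan))  # 1.0
```

Under this assumed ratio, pairs that share the root and at least one immediate successor of the root score higher than pairs that share only the root, and concepts from different taxonomies score 0.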
Abstract
A method and system are described for determining similar concept sets. An example method includes obtaining taxonomies, each including one root node and hierarchically ordered paths; receiving first and second sets each including set concepts; determining concept pairs, each including a first and second set concept; determining lengths of nondiverging intersections of first and second subpaths from the root node to first and second concept nodes, and associated lengths of first and second portions of the subpaths from a last concept node included in the nondiverging intersection to the first and second concept nodes; determining pairwise similarity values based on ratios based on associated lengths of nondiverging intersections and the associated lengths of the first and second portions; and determining a concept set similarity value based on a weighted sum of the pairwise similarity values associated with optimal selected ones of the concept pairs.
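The final step, a weighted sum over optimally selected concept pairs, might look like the sketch below. The abstract does not define "optimal" or the weights, so two assumptions are made here: the optimal selection is the one-to-one pairing that maximizes total pairwise similarity, and each selected pair gets the uniform weight 1 / max(|first set|, |second set|). The names `best_pairing` and `set_similarity` are hypothetical, and the brute-force search is suitable only for small sets.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def best_pairing(sim: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Pick one-to-one (row, column) pairs that maximize total similarity.

    sim[i][j] is the pairwise similarity of first-set concept i and
    second-set concept j. Brute force over all assignments.
    """
    rows = len(sim)
    cols = len(sim[0]) if rows else 0
    if rows == 0 or cols == 0:
        return []
    if rows <= cols:
        # Assign every row to a distinct column.
        candidates = (list(enumerate(p)) for p in permutations(range(cols), rows))
    else:
        # Assign every column to a distinct row.
        candidates = ([(r, c) for c, r in enumerate(p)]
                      for p in permutations(range(rows), cols))
    return max(candidates, key=lambda pairs: sum(sim[r][c] for r, c in pairs))


def set_similarity(sim: Sequence[Sequence[float]]) -> float:
    """Weighted sum of the selected pairs, with assumed uniform weights."""
    pairs = best_pairing(sim)
    if not pairs:
        return 0.0
    weight = 1.0 / max(len(sim), len(sim[0]))
    return sum(weight * sim[r][c] for r, c in pairs)


matrix = [[0.9, 0.8],
          [0.8, 0.1]]
print(best_pairing(matrix))  # [(0, 1), (1, 0)]
```

In the example matrix, always taking the single largest value first would pair (0, 0) and (1, 1) for a total of 1.0, whereas the exhaustive search finds the pairing with total 1.6. Dividing by the larger set size means concepts left unpaired lower the set similarity.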
26 Claims
1. A system comprising the concept analysis engine recited in full under First Claim above. Dependent claims: 2-18.
19. A method comprising:
- obtaining a set of one or more taxonomies wherein each of the taxonomies includes one root node and one or more hierarchically ordered paths, wherein each hierarchically ordered path includes the root node and a hierarchically ordered sequence of concept nodes;
- receiving a first set of first set concepts and a second set of second set concepts;
- determining a plurality of concept pairs, wherein each concept pair includes one of the first set concepts and one of the second set concepts;
- determining, for each one of the concept pairs, an associated length of a nondiverging intersection of a first subpath of one of the hierarchically ordered paths from the root node of one of the taxonomies to a first concept node representing the first set concept and a second subpath of one of the hierarchically ordered paths from the root node of the one of the taxonomies to a second concept node representing the second set concept, and an associated length of a first portion of the first subpath from a last concept node included in the nondiverging intersection to the first concept node, and an associated length of a second portion of the second subpath from the last concept node included in the nondiverging intersection to the second concept node;
- determining pairwise similarity values associated with each of the concept pairs based on ratios based on associated lengths of nondiverging intersections determined by the determining the associated length of the nondiverging intersection and the associated lengths of the first and second portions, wherein a pairwise similarity value indicating a high similarity is determined for association with concept pairs associated with nonempty nondiverging intersections including the root node and hierarchically immediate successor nodes of the root node that are included in the first subpath and the second subpath; and
- determining a concept set similarity value based on a weighted sum of the pairwise similarity values associated with optimal selected ones of the concept pairs.
Dependent claims: 20-23.
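The first three steps of the method (obtaining a taxonomy, receiving two concept sets, forming concept pairs) can be illustrated with a small sketch. It assumes a taxonomy stored as a child-to-parent map in which the root maps to `None`, that each concept has a single parent, and that the concept pairs are the full cross product of the two sets. The names `path_from_root` and `concept_pairs` are hypothetical and do not come from the patent.

```python
from itertools import product
from typing import Dict, Iterable, List, Optional, Tuple


def path_from_root(parent: Dict[str, Optional[str]], concept: str) -> List[str]:
    """Return the hierarchically ordered path from the root to `concept`.

    Assumes `parent` describes a tree: every node has one parent and the
    root maps to None, so the upward walk always terminates.
    """
    path = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()  # collected leaf-to-root, returned root-to-leaf
    return path


def concept_pairs(first_set: Iterable[str],
                  second_set: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each first set concept with each second set concept."""
    return list(product(first_set, second_set))


taxonomy = {"root": None, "vehicle": "root", "car": "vehicle", "truck": "vehicle"}
print(path_from_root(taxonomy, "car"))             # ['root', 'vehicle', 'car']
print(concept_pairs(["car"], ["truck", "vehicle"]))
```

The paths produced this way are the inputs from which the shared prefix and the remaining portions of each subpath are measured in the later steps.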
24. A computer program product being tangibly embodied on a computer-readable medium and being configured to cause a data processing apparatus to:
- obtain a set of one or more taxonomies wherein each of the taxonomies includes one root node and one or more hierarchically ordered paths, wherein each hierarchically ordered path includes the root node and a hierarchically ordered sequence of concept nodes;
- receive a first set of first set concepts and a second set of second set concepts;
- determine a plurality of concept pairs, wherein each concept pair includes one of the first set concepts and one of the second set concepts;
- determine, for each one of the concept pairs, an associated length of a nondiverging intersection of a first subpath of one of the hierarchically ordered paths from the root node of one of the taxonomies to a first concept node representing the first set concept and a second subpath of one of the hierarchically ordered paths from the root node of the one of the taxonomies to a second concept node representing the second set concept, and an associated length of a first portion of the first subpath from a last concept node included in the nondiverging intersection to the first concept node, and an associated length of a second portion of the second subpath from the last concept node included in the nondiverging intersection to the second concept node;
- determine pairwise similarity values associated with each of the concept pairs based on ratios based on associated lengths of nondiverging intersections determined by the determining the associated length of the nondiverging intersection and the associated lengths of the first and second portions, wherein a pairwise similarity value indicating a high similarity is determined for association with concept pairs associated with nonempty nondiverging intersections including the root node and hierarchically immediate successor nodes of the root node that are included in the first subpath and the second subpath; and
- determine a concept set similarity value based on a weighted sum of the pairwise similarity values associated with optimal selected ones of the concept pairs.
Dependent claims: 25, 26.
Specification