Information-theory based measure of similarity between instances in ontology

US 7,792,838 B2
Filed: 03/29/2007
Issued: 09/07/2010
Est. Priority Date: 03/29/2007
Status: Active Grant

Abstract

Improved information processing techniques for measuring similarity between instances in an ontology are disclosed. For example, a method of measuring similarity between instances in an ontology for use in an information retrieval system includes the following steps. A set of instances from the ontology is obtained. At least one of the following similarity metrics for the set of instances is computed: (i) a first metric that measures similarity between instances in the set of instances with respect to ontology concepts to which the instances belong; (ii) a second metric which measures similarity between instances in the set of instances where the instances are subjects in statements involving a given ontology property; and (iii) a third metric which measures similarity between instances in the set of instances where the instances are objects in statements involving a given ontology property. At least one taxonomy induced by the at least one computed similarity metric is stored, wherein the at least one induced taxonomy is usable for responding to requests submitted to an information retrieval system. When two or more of the first metric, the second metric and the third metric are computed, and two or more induced taxonomies corresponding to the two or more computed similarity metrics are stored, the method may include merging the two or more induced taxonomies to form a combined taxonomy, wherein the combined taxonomy is usable for responding to requests submitted to an information retrieval system.

25 Claims

1. A method of measuring similarity between instances in an ontology for use in an information retrieval system, the method comprising the steps of:
- obtaining a set of instances from the ontology;
  
  computing a first similarity metric that measures similarity between instances in the set of instances with respect to ontology concepts to which the instances belong; and
  
  storing at least one taxonomy induced by the first similarity metric, wherein the at least one induced taxonomy is usable for responding to requests submitted to the information retrieval system;
  
  wherein the first similarity metric measures similarity of instances i and j in the set of instances based on the similarity of C(i) and C(j), where the C(i) and the C(j) represent sets of concepts to which the instances belong; and
  
  wherein the first similarity metric considers concept membership statements of the instances in the set of instances by defining a description of an individual and a commonality between the instances based on the ontology concepts to which the instances belong;
  
  and further wherein the obtaining, computing and storing steps are performed by a processor and memory.
  - 2. The method of claim 1 further comprising the steps of:
    - computing at least one of the following additional similarity metrics for the set of instances:
      
      a second similarity metric which measures similarity between second instances in the set of instances, wherein the second instances are subjects in first statements involving at least one given first ontology property; and
      
      a third similarity metric which measures similarity between third instances in the set of instances, wherein the third instances are objects in second statements involving at least one given second ontology property;
      
      wherein two or more induced taxonomies corresponding to at least two of the first similarity metric, the second similarity metric and the third similarity metric are stored; and
      
      merging the two or more induced taxonomies to form a combined taxonomy, wherein the combined taxonomy is usable for responding to requests submitted to the information retrieval system.
  - 3. The method of claim 1, wherein a description of an instance is computed by defining a virtual class and making the instance the only member of the virtual class.
  - 4. The method of claim 3, wherein information content of the description of the instance is a probability that a random instance belongs to the virtual class.
  - 5. The method of claim 4, wherein the commonality between the instances i and j is computed by expanding respective descriptions to include one or more other concept membership statements that can be inferred based on a concept taxonomy such that information content in the commonality is a probability that a pair of random instances satisfies the pair of class membership statements in the commonality.
  - 6. The method of claim 5, wherein C(i, j) denotes the set of classes that are a least common ancestors of a virtual class for i and a virtual class for j, and V_i,j denotes the intersection of all classes in the C(i, j) and represents a least common ancestors intersection class for the instances i and j, and information content in the commonality between the instances i and j is a probability that a pair of random instances belong to V_i,j.
  - 7. The method of claim 2, wherein the two or more induced taxonomies are merged by merging two or more virtual classes for each instance computed in accordance with two or more of the first, second and third similarity metrics.
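
Claims 3 to 6 can be read as a recipe: give each instance a one-member virtual class, find the least common ancestors of two virtual classes, intersect them to get V_i,j, and compare probabilities. The following is a minimal Python sketch of that reading, not the patented implementation. The toy ontology (`PARENTS`, `CONCEPTS`) and all function names are hypothetical, and the final ratio of log-probabilities follows Lin's general information-theoretic similarity, which is an assumption here because the claims do not state how the two probabilities are combined.

```python
import math

# Hypothetical toy taxonomy: class -> direct superclasses.
PARENTS = {
    "Thing": [],
    "Person": ["Thing"],
    "Student": ["Person"],
    "Employee": ["Person"],
    "Company": ["Thing"],
}

# Hypothetical asserted concept memberships, i.e. C(i) for each instance i.
CONCEPTS = {
    "alice": {"Student"},
    "bob": {"Student", "Employee"},
    "carol": {"Employee"},
    "acme": {"Company"},
}


def ancestors(cls):
    """Return cls together with all of its superclasses."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen


def expanded(i):
    """C(i) expanded with every membership inferable from the taxonomy."""
    out = set()
    for c in CONCEPTS[i]:
        out |= ancestors(c)
    return out


def extent(cls):
    """All instances that belong to cls, directly or by inference."""
    return {i for i in CONCEPTS if cls in expanded(i)}


def least_common_ancestors(i, j):
    """Most specific classes shared by i and j (the set C(i, j))."""
    common = expanded(i) & expanded(j)
    return {c for c in common
            if not any(c != d and c in ancestors(d) for d in common)}


def concept_similarity(i, j):
    """Assumed Lin-style ratio: log P(commonality) / log P(descriptions)."""
    if i == j:
        return 1.0  # a virtual class is its own least common ancestor
    n = len(CONCEPTS)
    v_ij = set(CONCEPTS)
    for c in least_common_ancestors(i, j):
        v_ij &= extent(c)  # intersection class V_i,j
    p_common = (len(v_ij) / n) ** 2  # two random instances both fall in V_i,j
    p_descr = (1 / n) ** 2           # each virtual class has exactly one member
    return math.log(p_common) / math.log(p_descr)
```

In this toy data `alice` and `bob` share the specific class `Student`, so they score higher than `alice` and `carol`, who only share `Person`; `alice` and `acme` share nothing below the root and score zero.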

8. A method of measuring similarity between instances in an ontology for use in an information retrieval system, the method comprising the steps of:
- obtaining a set of instances from the ontology;
  
  computing at least one of the following similarity metrics for the set of instances:
  
  a first metric that measures similarity between instances in the set of instances with respect to ontology concepts to which the instances belong;
  
  a second metric which measures similarity between instances in the set of instances where the instances are subjects in statements involving a given ontology property; and
  
  a third metric which measures similarity between instances in the set of instances where the instances are objects in statements involving a given ontology property; and
  
  storing at least one taxonomy induced by the at least one computed similarity metric, wherein the at least one induced taxonomy is usable for responding to requests submitted to an information retrieval system;
  
  wherein the first metric, the second metric and the third metric comprise information theory-based measurements.

9. A method of measuring similarity between instances in an ontology for use in an information retrieval system, the method comprising the steps of:
- obtaining a set of instances from the ontology;
  
  computing a similarity metric which measures similarity between instances in the set of instances where the instances are subjects in statements involving at least one given ontology property; and
  
  storing at least one taxonomy induced by the similarity metric, wherein the at least one induced taxonomy is usable for responding to requests submitted to the information retrieval system;
  
  wherein the similarity metric measures similarity of instances i and j in the set of instances based on the similarity of sets of objects in statements where the instances are subjects in the statements;
  
  and further wherein the obtaining, computing and storing steps are performed by a processor and memory.
  - 10. The method of claim 9, wherein the similarity between the instances i and j is based on statements where the instances i and j are the subjects and a predicate is the at least one given ontology property.
  - 11. The method of claim 10, wherein an instance is considered to belong to a virtual class of size one and a description for the instance is defined in terms of membership of the instance to the virtual class of size one.
  - 12. The method of claim 11, wherein the virtual class is associated with a taxonomy defined based on a range of the at least one given ontology property.
  - 13. The method of claim 12, wherein information content of the description of the instance is a probability that a random individual belongs to the virtual class.
  - 14. The method of claim 13, wherein a commonality between the instances i and j is computed by expanding respective descriptions to include one or more other statements that can be inferred based on the object-sets for the at least one given ontology property and the taxonomy defined based on the range of the at least one given ontology property, and obtaining common pairs of statements that occur in both of the respective descriptions.
  - 15. The method of claim 14, wherein information content in the commonality is a probability that a pair of random instances satisfies a pair of statements in the commonality.

16. A method of measuring similarity between instances in an ontology for use in an information retrieval system, the method comprising the steps of:
- obtaining a set of instances from the ontology;
  
  computing a similarity metric which measures similarity between instances in the set of instances where the instances are objects in statements involving at least one given ontology property; and
  
  storing at least one taxonomy induced by the similarity metric, wherein the at least one induced taxonomy is usable for responding to requests submitted to the information retrieval system;
  
  wherein the similarity metric measures similarity of instances i and j in the set of instances based on the similarity of sets of subjects in statements where the instances are objects in the statements;
  
  and further wherein the obtaining, computing and storing steps are performed by a processor and memory.
  - 17. The method of claim 16, wherein the similarity between the instances i and j is based on statements where the instances i and j are the objects in the statements and a predicate is the at least one given ontology property.
  - 18. The method of claim 17, wherein an instance is considered to belong to a virtual class of size one and a description for the instance is defined in terms of membership of the instance to the virtual class of size one.
  - 19. The method of claim 18, wherein the virtual class is associated with a taxonomy defined based on the domain of the at least one given ontology property.
  - 20. The method of claim 19, wherein information content of the description of the instance is a probability that a random individual belongs to the virtual class.
  - 21. The method of claim 20, wherein a commonality between the instances i and j is computed by expanding respective descriptions to include one or more other statements that can be inferred based on the subject-sets for the at least one given ontology property and the taxonomy defined based on the domain of the at least one given ontology property, and obtaining common pairs of statements that occur in both of the respective descriptions.
  - 22. The method of claim 21, wherein information content in the commonality is a probability that a pair of random instances satisfies a pair of statements in the commonality.
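
Claims 9 and 16 mirror each other: one compares instances by the objects they point to as subjects, the other by the subjects that point to them as objects. The sketch below illustrates that symmetry with a single function that takes a `role` argument. It is a simplification under stated assumptions, not the claimed method: the triples, the property name `worksFor`, and all function names are hypothetical; the range and domain taxonomies of claims 12 and 19 are ignored; and statements are scored with a Lin-style ratio that treats them as independent.

```python
import math
from collections import defaultdict

# Hypothetical statements as (subject, property, object) triples.
TRIPLES = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("bob", "worksFor", "globex"),
    ("carol", "worksFor", "globex"),
    ("dave", "worksFor", "initech"),
]


def neighbor_sets(triples, prop, role):
    """Map each instance in `role` to the set of values on the other side.

    role="subject" yields object-sets (claim 9 reading);
    role="object" yields subject-sets (claim 16 reading).
    """
    if role not in ("subject", "object"):
        raise ValueError("role must be 'subject' or 'object'")
    sets = defaultdict(set)
    for s, p, o in triples:
        if p != prop:
            continue
        if role == "subject":
            sets[s].add(o)
        else:
            sets[o].add(s)
    return dict(sets)


def property_similarity(triples, prop, role, i, j):
    """Assumed Lin-style score: shared information over total information."""
    sets = neighbor_sets(triples, prop, role)
    n = len(sets)

    def info(value):
        # Information content of "is linked to value": -log of the fraction
        # of instances (in this role) whose set contains the value.
        p = sum(value in members for members in sets.values()) / n
        return -math.log(p)

    set_i, set_j = sets.get(i, set()), sets.get(j, set())
    shared = sum(info(v) for v in set_i & set_j)
    total = sum(info(v) for v in set_i) + sum(info(v) for v in set_j)
    return 2 * shared / total if total else 0.0
```

Calling `property_similarity(TRIPLES, "worksFor", "subject", "alice", "bob")` compares two employees by their employers, while passing `"object"` with `"acme"` and `"globex"` compares two employers by their staff. Rare shared values contribute more than common ones, which is the information-theoretic intuition the claims rely on.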

23. An article of manufacture for measuring similarity between instances in an ontology for use in an information retrieval system, comprising a computer readable storage medium containing one or more programs which when executed by a processor implement the steps of:
- obtaining a set of instances from the ontology;
  
  computing at least one of the following similarity metrics for the set of instances:
  
  a first similarity metric that measures similarity between first instances in the set of instances with respect to ontology concepts to which the first instances belong, wherein the first similarity metric measures similarity of first instances i1 and j1 in the set of instances based on the similarity of C(i1) and C(j1), wherein the C(i1) and the C(j1) represent sets of concepts to which the first instances belong, and wherein the first similarity metric considers concept membership statements of the first instances in the set of instances by defining a description of an individual and a commonality between the first instances based on the ontology concepts to which the first instances belong;
  
  a second similarity metric which measures similarity between second instances in the set of instances, wherein the second instances are subjects in first statements involving at least one given first ontology property, wherein the second similarity metric measures similarity of second instances i2 and j2 in the set of instances based on similarity of sets of objects in the first statements where the second instances are the subjects in the first statements; and
  
  a third similarity metric which measures similarity between third instances in the set of instances, wherein the third instances are objects in second statements involving at least one given second ontology property, wherein the third similarity metric measures similarity of third instances i3 and j3 in the set of instances based on similarity of sets of subjects in the second statements where the third instances are objects in the second statements; and
  
  storing at least one taxonomy induced by at least one of the first, the second and the third similarity metric, wherein the at least one induced taxonomy is usable for responding to requests submitted to the information retrieval system.

24. Apparatus for measuring similarity between instances in an ontology for use in an information retrieval system, the apparatus comprising:
- a memory; and
  
  a processor coupled to the memory and operative to:
  
  (i) obtain a set of instances from the ontology;
  
  (ii) compute at least one of the following similarity metrics for the set of instances:
  
  a first similarity metric that measures similarity between first instances in the set of instances with respect to ontology concepts to which the first instances belong, wherein the first similarity metric measures similarity of first instances i1 and j1 in the set of instances based on the similarity of C(i1) and C(j1), wherein the C(i1) and the C(j1) represent sets of concepts to which the first instances belong, and wherein the first similarity metric considers concept membership statements of the first instances in the set of instances by defining a description of an individual and a commonality between the first instances based on the ontology concepts to which the first instances belong;
  
  a second similarity metric which measures similarity between second instances in the set of instances where the second instances are subjects in first statements involving at least one given first ontology property, wherein the second similarity metric measures similarity of second instances i2 and j2 in the set of instances based on similarity of sets of objects in the first statements where the second instances are the subjects in the first statements; and
  
  a third similarity metric which measures similarity between third instances in the set of instances where the third instances are objects in second statements involving at least one given second ontology property, wherein the third similarity metric measures similarity of third instances i3 and j3 in the set of instances based on similarity of sets of subjects in the second statements where the third instances are objects in the second statements; and
  
  (iii) store at least one taxonomy induced by at least one of the first, the second and the third similarity metrics, wherein the at least one induced taxonomy is usable for responding to requests submitted to an information retrieval system.

25. An information retrieval system, comprising a similarity measurement system comprising a memory and a processor coupled to the memory, the information retrieval system configured to:
- (i) obtain a set of instances from the ontology;
  
  (ii) compute at least one of the following similarity metrics for the set of instances:
  
  a first similarity metric that measures similarity between first instances in the set of instances with respect to ontology concepts to which the first instances belong, wherein the first similarity metric measures similarity of first instances i1 and j1 in the set of instances based on the similarity of C(i1) and C(j1), wherein the C(i1) and the C(j1) represent sets of concepts to which the first instances belong, and wherein the first similarity metric considers concept membership statements of the first instances in the set of instances by defining a description of an individual and a commonality between the first instances based on the ontology concepts to which the first instances belong;
  
  a second similarity metric which measures similarity between second instances in the set of instances where the second instances are subjects in first statements involving at least one given first ontology property, wherein the second similarity metric measures similarity of second instances i2 and j2 in the set of instances based on similarity of sets of objects in the first statements where the second instances are the subjects in the first statements; and
  
  a third similarity metric which measures similarity between third instances in the set of instances where the third instances are objects in second statements involving at least one given second ontology property, wherein the third similarity metric measures similarity of third instances i3 and j3 in the set of instances based on similarity of sets of subjects in the second statements where the third instances are objects in the second statements; and
  
  (iii) store at least one taxonomy induced by at least one of the first, the second and the third similarity metrics, wherein the at least one induced taxonomy is usable for responding to requests submitted to an information retrieval system.

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Ronen, Royi; Ranganathan, Anand
Primary Examiner(s): Cottingham, John R.
Assistant Examiner(s): Allen, Nicholas E
Application Number: US11/693,367
Publication Number: US 20080243809A1
Time in Patent Office: 1,258 Days
Field of Search: None
US Class Current: 707/739
CPC Class Codes: G06F 16/367 Ontology
