Computer-based method for finding similar objects using a taxonomy
Abstract
A generalized axiomatic definition of information-theoretic similarity is provided for taxonomies that are structured as directed acyclic graphs, in which multiple terms may be used to describe an object. The definition is adaptable in the presence of ambiguity, as introduced by an evolving taxonomy or by classifiers with imperfect knowledge, and two new similarity measures are introduced based on the definition. A pragmatic implementation of the similarity measures is also provided; it is tightly integrated with an object-relational database and scales to large taxonomies and large datasets.
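To make the directed-acyclic-graph setting concrete, here is a minimal Python sketch of collecting the terms in an object's ancestor graph when a class has more than one parent. The toy taxonomy `PARENTS` and the helper `ancestor_terms` are hypothetical illustrations, not taken from the patent.

```python
from typing import Dict, List, Set

# Hypothetical toy taxonomy: each class maps to its list of parents.
# "sedan" has two parents, so the structure is a DAG rather than a tree.
PARENTS: Dict[str, List[str]] = {
    "vehicle": [],
    "car": ["vehicle"],
    "passenger": ["vehicle"],
    "sedan": ["car", "passenger"],
    "truck": ["vehicle"],
}


def ancestor_terms(labels: Set[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Return every term in the ancestor graph of a label set, including the labels."""
    seen: Set[str] = set()
    stack = list(labels)
    while stack:
        term = stack.pop()
        if term in seen:
            continue  # an ancestor reached through several parents is kept once
        seen.add(term)
        stack.extend(parents[term])
    return seen


print(sorted(ancestor_terms({"sedan"}, PARENTS)))
# ['car', 'passenger', 'sedan', 'vehicle']
```

The shared ancestor `vehicle` is reachable through both parents of `sedan` but appears only once, which is why the ancestor graph is treated as a set of terms rather than as a path.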
49 Citations
12 Claims
1. A computer-based method of finding similar items labeled in a taxonomy comprising the steps of:
determining a label LA representing a set of concepts that a target object T and a candidate object C have in common, said target object T and candidate object C being part of a taxonomy structured as acyclic graphs wherein at least one child class has multiple parents;
determining information content I(LA) of label LA representing said set of common concepts;
combining individual information content I(LT) and I(LC), where I(LT) and I(LC) represent the individual information content of the labels of the target object and the candidate object, respectively; and
finding similarity between said target object and said candidate object in said taxonomy as a function of I(LA) and I(LT)+I(LC),
wherein said information content I(LA), I(LT), and I(LC) are functions of inclusion probabilities p(LA), p(LT), and p(LC), respectively, said inclusion probability of label L being defined as the probability that the ancestor graph of the label of an object o chosen at random from a corpus contains L, and said similarity between said target object T and said candidate object C is found based on the following mathematical function:
and said inclusion probability is given by:
$$p_i(L) = p\left(L \in \mathrm{Terms}(\mathrm{Anc}(o))\right)$$
Dependent claims: 2, 3, 4, 5.
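The inclusion probability and a similarity built from it can be sketched as follows. Beyond what the claim states, this sketch assumes that information content is I(L) = -log p_i(L), that the inclusion probability of a multi-term label is the fraction of corpus objects whose ancestor terms contain every term of the label, that LA is the intersection of the target's and candidate's ancestor term sets, and that the similarity is the Lin-style ratio 2 I(LA) / (I(LT) + I(LC)). That ratio is one function of I(LA) and I(LT)+I(LC); the claim's own formula is not reproduced in this text, so it may differ. The taxonomy, corpus, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Set

# Hypothetical toy taxonomy (child -> parents); "sedan" has two parents.
PARENTS: Dict[str, List[str]] = {
    "vehicle": [],
    "car": ["vehicle"],
    "passenger": ["vehicle"],
    "sedan": ["car", "passenger"],
    "truck": ["vehicle"],
}


def ancestor_terms(labels: Set[str]) -> Set[str]:
    """Terms(Anc(o)): the labels plus every ancestor reachable through PARENTS."""
    seen: Set[str] = set()
    stack = list(labels)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS[term])
    return seen


# Hypothetical corpus of object labels, each expanded to its ancestor terms once.
CORPUS: List[Set[str]] = [{"sedan"}, {"sedan"}, {"car"}, {"truck"}]
CORPUS_TERMS: List[Set[str]] = [ancestor_terms(o) for o in CORPUS]


def inclusion_probability(label: Set[str]) -> float:
    """Fraction of corpus objects whose ancestor terms contain every term of `label`."""
    hits = sum(1 for terms in CORPUS_TERMS if label <= terms)
    return hits / len(CORPUS_TERMS)


def information_content(label: Set[str]) -> float:
    """Assumed I(L) = -log p_i(L), written as log(1/p) so p = 1 gives 0.0, not -0.0."""
    return math.log(1.0 / inclusion_probability(label))


def similarity(target: Set[str], candidate: Set[str]) -> float:
    """Assumed Lin-style 2*I(LA) / (I(LT) + I(LC)).

    T and C are assumed to occur in the corpus, so every p_i used here is > 0.
    """
    lt = ancestor_terms(target)
    lc = ancestor_terms(candidate)
    la = lt & lc  # concepts the two objects have in common
    denom = information_content(lt) + information_content(lc)
    if denom == 0.0:
        return 1.0  # both labels are uninformative (e.g. only the root)
    return 2.0 * information_content(la) / denom


print(round(similarity({"sedan"}, {"car"}), 3))  # 0.587
```

With this toy corpus, identical labels score 1.0, and `sedan` against `truck` scores 0.0 because the two share only the root `vehicle`, whose inclusion probability is 1 and whose information content is therefore 0.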
6. An article of manufacture comprising a computer storage medium storing computer executable instructions for finding similar items labeled in a taxonomy, said medium comprising:
computer readable program code determining a label LA representing a set of concepts that a target object T and a candidate object C have in common, said target object T and candidate object C being part of a taxonomy structured as acyclic graphs wherein at least one child class has multiple parents;
computer readable program code determining information content I(LA) of label LA representing said set of common concepts;
computer readable program code combining individual information content I(LT) and I(LC), where I(LT) and I(LC) represent the individual information content of the labels of the target object and the candidate object, respectively; and
computer readable program code finding similarity between said target object and said candidate object in said taxonomy as a function of I(LA) and I(LT)+I(LC),
wherein said information content I(LA), I(LT), and I(LC) are functions of inclusion probabilities p(LA), p(LT), and p(LC), respectively, said inclusion probability of label L being defined as the probability that the ancestor graph of the label of an object o chosen at random from a corpus contains L, and said similarity between said target object T and said candidate object C is found based on the following mathematical function:
and said inclusion probability is given by:
$$p_i(L) = p\left(L \in \mathrm{Terms}(\mathrm{Anc}(o))\right)$$
7. A computer-based method of finding similar items labeled in a taxonomy comprising the steps of:
determining a label LA representing a set of concepts that a target object T and a candidate object C have in common, said target object T and candidate object C being part of a taxonomy structured as acyclic graphs wherein at least one child class has multiple parents;
determining information content I(LA) of label LA representing said set of common concepts;
combining individual information content I(LT) and I(LA), where I(LT) represents the information content of the label of the target object; and
finding similarity between said target object and said candidate object in said taxonomy as a function of I(LA) and I(LT)+I(LA),
wherein said similarity is equal to 1 if and only if object C is substitutable for object T, said information content I(LA), I(LT), and I(LC) are functions of inclusion probabilities p(LA), p(LT), and p(LC), respectively, said inclusion probability of label L being defined as the probability that the ancestor graph of the label of an object o chosen at random from a corpus contains L, and said similarity between said target object T and said candidate object C is found based on the following mathematical function:
and said inclusion probability is given by:
$$p_i(L) = p\left(L \in \mathrm{Terms}(\mathrm{Anc}(o))\right)$$
Dependent claims: 8, 9, 10, 11.
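Claim 7 combines I(LT) with I(LA) instead of I(LC), which makes the measure asymmetric. A minimal Python sketch of one hypothetical instance, 2 I(LA) / (I(LT) + I(LA)), is shown below; it assumes I(L) = -log p_i(L), that a multi-term label's inclusion probability is the fraction of corpus objects whose ancestor terms contain all of its terms, and a toy taxonomy and corpus with hypothetical names. The claim's actual formula is not reproduced in this text. Because LA is a subset of LT, I(LA) is at most I(LT), so this ratio reaches 1 exactly when I(LA) = I(LT), for example when C's ancestor graph contains every term of T's.

```python
import math
from typing import Dict, List, Set

# Hypothetical toy taxonomy (child -> parents); "sedan" has two parents.
PARENTS: Dict[str, List[str]] = {
    "vehicle": [],
    "car": ["vehicle"],
    "passenger": ["vehicle"],
    "sedan": ["car", "passenger"],
    "truck": ["vehicle"],
}


def ancestor_terms(labels: Set[str]) -> Set[str]:
    """Terms(Anc(o)): the labels plus every ancestor reachable through PARENTS."""
    seen: Set[str] = set()
    stack = list(labels)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS[term])
    return seen


# Hypothetical corpus of object labels, each expanded to its ancestor terms once.
CORPUS_TERMS: List[Set[str]] = [
    ancestor_terms(o) for o in [{"sedan"}, {"sedan"}, {"car"}, {"truck"}]
]


def information_content(label: Set[str]) -> float:
    """Assumed I(L) = log(1/p_i(L)); p_i is the share of objects containing all of L."""
    p = sum(1 for terms in CORPUS_TERMS if label <= terms) / len(CORPUS_TERMS)
    return math.log(1.0 / p)


def substitutability(target: Set[str], candidate: Set[str]) -> float:
    """Hypothetical 2*I(LA) / (I(LT) + I(LA)); equals 1.0 exactly when I(LA) == I(LT).

    The target is assumed to occur in the corpus, so p_i(LT) and p_i(LA) are > 0.
    """
    lt = ancestor_terms(target)
    la = lt & ancestor_terms(candidate)  # common concepts, a subset of LT
    i_t = information_content(lt)
    i_a = information_content(la)
    if i_t + i_a == 0.0:
        return 1.0  # target label carries no information (e.g. only the root)
    return 2.0 * i_a / (i_t + i_a)


print(substitutability({"car"}, {"sedan"}))  # 1.0
print(round(substitutability({"sedan"}, {"car"}), 3))  # 0.587
```

The asymmetry is the point of this variant: a `sedan` fully satisfies a request for a `car` (score 1.0), while a `car` is only a partial match for a request for a `sedan`.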
12. An article of manufacture comprising a computer storage medium storing computer executable instructions for finding similar items labeled in a taxonomy, said medium comprising:
computer readable program code determining a label LA representing a set of concepts that a target object T and a candidate object C have in common, said target object T and candidate object C being part of a taxonomy structured as acyclic graphs wherein at least one child class has multiple parents;
computer readable program code determining information content I(LA) of label LA representing said set of common concepts;
computer readable program code combining individual information content I(LT) and I(LA), where I(LT) represents the information content of the label of the target object; and
computer readable program code finding similarity between said target object and said candidate object in said taxonomy as a function of I(LA) and I(LT)+I(LA),
wherein said similarity is equal to 1 if and only if object C is substitutable for object T, said information content I(LA), I(LT), and I(LC) are functions of inclusion probabilities p(LA), p(LT), and p(LC), respectively, said inclusion probability of label L being defined as the probability that the ancestor graph of the label of an object o chosen at random from a corpus contains L, and said similarity between said target object T and said candidate object C is found based on the following mathematical function:
and said inclusion probability is given by:
$$p_i(L) = p\left(L \in \mathrm{Terms}(\mathrm{Anc}(o))\right)$$
Specification