Class description generation for clustering and categorization
Abstract
A class of a probabilistic classifier or clustering system that includes probabilistic model parameters is to be characterized. For each of a plurality of candidate words or word combinations, the divergence of the class from the other classes is computed based on one or more probabilistic model parameters profiling the candidate word or word combination. One or more words or word combinations are selected for characterizing the class as those candidate words or word combinations for which the class has a substantial computed divergence from the other classes.
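The abstract does not say which divergence is used. As a minimal sketch, assuming the model exposes smoothed word-given-class probabilities P(w|c) and taking the per-word term of the Kullback-Leibler divergence as the divergence element, the selection step could look like the Python below. The names `divergence_element`, `word_score`, `select_words` and `example_model` are hypothetical, and reading "substantial" as "positive and among the k largest scores" is also an assumption.

```python
import math
from typing import Dict, List

# Hypothetical model parameters: P(word | class), one distribution per class.
# Assumption: probabilities are smoothed, so every value is strictly positive.
WordProbs = Dict[str, Dict[str, float]]


def divergence_element(p: float, q: float) -> float:
    """Contribution of a single word to KL(P || Q): p * log(p / q)."""
    return p * math.log(p / q)


def word_score(p_w_c: WordProbs, target: str, word: str) -> float:
    """Sum of the word's divergence elements of `target` from every other class."""
    p = p_w_c[target][word]
    return sum(
        divergence_element(p, p_w_c[other][word])
        for other in p_w_c
        if other != target
    )


def select_words(p_w_c: WordProbs, target: str, candidates: List[str], k: int = 3) -> List[str]:
    """Return up to k candidates with the largest positive divergence score
    (assumed reading of 'substantial divergence')."""
    scored = [(word_score(p_w_c, target, w), w) for w in candidates]
    positive = [(s, w) for s, w in scored if s > 0]
    positive.sort(reverse=True)
    return [w for _, w in positive[:k]]


# Toy two-class model; "the" is equally likely in both classes.
example_model = {
    "sports": {"match": 0.30, "goal": 0.25, "market": 0.05, "the": 0.40},
    "finance": {"match": 0.05, "goal": 0.05, "market": 0.50, "the": 0.40},
}
print(select_words(example_model, "sports", ["match", "goal", "market", "the"]))
# ['match', 'goal']
```

A word that is equally probable in every class, such as "the" here, scores exactly zero and is never selected, which is the behaviour the abstract asks for: only words that separate the class from the others characterize it.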
32 Citations
12 Claims
1. A method for characterizing a class of a probabilistic classifier or clustering system that classifies or clusters documents into classes and includes probabilistic model parameters, the method comprising:
for each of a plurality of candidate words or word combinations wherein the candidate words or word combinations include natural language phrases, computing a divergence element of the class from each of a plurality of other classes of the probabilistic classifier or clustering system based on one or more probabilistic model parameters profiling the candidate word or word combination; and
selecting one or more words or word combinations including at least one natural language phrase for characterizing the class as those candidate words or word combinations for which the class has a substantial computed divergence element from at least one of the plurality of other classes of the probabilistic classifier or clustering system that is effective for distinguishing the class from at least one of the plurality of other classes; and
labeling the class based on the selected one or more words or word combinations, the labeling including constructing a semantic description of the class based on the at least one selected natural language phrase;
wherein the computing a divergence operation and the selecting operation and the labeling operation are performed by a digital processor.
Dependent claims: 2, 3, 4, 5, 6, 7
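Claim 1 adds two details: a candidate is kept if its divergence element is substantial against at least one other class, and the selected natural language phrases are used to construct a semantic description of the class. The sketch below assumes phrase-level parameters P(phrase|c), the per-term Kullback-Leibler contribution as the divergence element, a fixed threshold as the meaning of "substantial", and a fixed sentence template as the description. None of these choices is specified by the claim, and every function and class name is hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical model parameters: P(phrase | class), one distribution per class.
# Assumption: smoothed, so every probability is strictly positive.
PhraseProbs = Dict[str, Dict[str, float]]


def divergence_element(p: float, q: float) -> float:
    """Contribution of one term to KL(P || Q)."""
    return p * math.log(p / q)


def distinguishing_phrases(model: PhraseProbs, target: str, threshold: float) -> List[str]:
    """Phrases whose divergence element of `target` from at least one other
    class exceeds `threshold`, ordered by that largest element."""
    scores: Dict[str, float] = {}
    for phrase, p in model[target].items():
        elements = [
            divergence_element(p, model[other][phrase])
            for other in model
            if other != target
        ]
        best = max(elements)
        if best > threshold:  # "at least one other class" means the max over classes
            scores[phrase] = best
    return sorted(scores, key=scores.get, reverse=True)


def label_class(model: PhraseProbs, target: str, threshold: float = 0.1) -> str:
    """Build a simple template-based description from the selected phrases."""
    phrases = distinguishing_phrases(model, target, threshold)
    if not phrases:
        return f"{target}: no distinguishing phrase found"
    return f"{target}: documents about " + ", ".join(phrases)


# Toy three-class model: each phrase in c1 differs from only one other class.
example_model = {
    "c1": {"interest rate": 0.4, "stock market": 0.3, "press release": 0.3},
    "c2": {"interest rate": 0.1, "stock market": 0.3, "press release": 0.6},
    "c3": {"interest rate": 0.4, "stock market": 0.05, "press release": 0.55},
}
print(label_class(example_model, "c1"))
# c1: documents about interest rate, stock market
```

In this toy model "interest rate" separates c1 only from c2 and "stock market" only from c3, yet both are kept, which illustrates why the claim requires a substantial divergence element from at least one other class rather than from all of them.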
8. A storage medium storing instructions which when executed by a digital processor perform a method for characterizing classes of a probabilistic classifier or clustering system including (i) computing divergences of classes from other classes respective to candidate phrases based on probabilistic model parameters of the probabilistic classifier or clustering system, and (ii) characterizing classes using candidate phrases that provide substantial computed divergence from other classes effective for distinguishing the class from other classes, wherein the divergences computing (i) includes:
computing a word-based divergence based on one or more probabilistic model parameters of the probabilistic classifier or clustering system that profile one or more words of the candidate phrase,
computing a phrase-based divergence based on one or more probabilistic model parameters of the probabilistic classifier or clustering system that profile training documents containing the candidate phrase, and
combining the word-based divergence and the phrase-based divergence to determine the divergence.
Dependent claims: 9, 10, 11
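Claim 8 splits the divergence of a candidate phrase into a word-based part, computed from parameters that profile the words of the phrase, and a phrase-based part, computed from parameters that profile the training documents containing the phrase. The Python sketch below is one possible reading, assuming word parameters P(w|c), per-document class posteriors P(c|d), averaging of those posteriors over the matching documents, and a convex combination controlled by a hypothetical weight `alpha`. The claim itself does not fix the aggregation or the combination rule, and all names here are illustrative.

```python
import math
from statistics import mean
from typing import Dict, List

# Hypothetical inputs (assumptions, not taken from the claim text):
#   word_probs[c][w]  = P(w | c), smoothed so every value is strictly positive
#   doc_posteriors[i] = P(c | d_i) for training document i, also strictly positive
#   docs[i]           = lowercased, whitespace-tokenized text of training document i
WordProbs = Dict[str, Dict[str, float]]
Posteriors = List[Dict[str, float]]


def divergence_element(p: float, q: float) -> float:
    """Contribution of one term to KL(P || Q)."""
    return p * math.log(p / q)


def word_based_divergence(word_probs: WordProbs, phrase: str, target: str, other: str) -> float:
    """Sum of the per-word divergence elements over the words of the phrase."""
    return sum(
        divergence_element(word_probs[target][w], word_probs[other][w])
        for w in phrase.split()
    )


def phrase_based_divergence(
    doc_posteriors: Posteriors, docs: List[str], phrase: str, target: str, other: str
) -> float:
    """Divergence element from the mean class posterior of the training
    documents that contain the phrase as a whole-word sequence."""
    containing = [i for i, text in enumerate(docs) if f" {phrase} " in f" {text} "]
    if not containing:
        return 0.0  # assumption: an unseen phrase contributes no phrase-based evidence
    p = mean(doc_posteriors[i][target] for i in containing)
    q = mean(doc_posteriors[i][other] for i in containing)
    return divergence_element(p, q)


def combined_divergence(
    word_probs: WordProbs,
    doc_posteriors: Posteriors,
    docs: List[str],
    phrase: str,
    target: str,
    other: str,
    alpha: float = 0.5,  # hypothetical weight of the word-based part
) -> float:
    """Convex combination of the word-based and phrase-based divergences."""
    word_part = word_based_divergence(word_probs, phrase, target, other)
    phrase_part = phrase_based_divergence(doc_posteriors, docs, phrase, target, other)
    return alpha * word_part + (1 - alpha) * phrase_part


# Toy data; only the words needed for the example are listed in word_probs.
word_probs = {
    "finance": {"interest": 0.2, "rate": 0.2},
    "sports": {"interest": 0.05, "rate": 0.1},
}
docs = ["the interest rate rose", "a low interest rate", "the team won"]
doc_posteriors = [
    {"finance": 0.9, "sports": 0.1},
    {"finance": 0.7, "sports": 0.3},
    {"finance": 0.2, "sports": 0.8},
]
print(combined_divergence(word_probs, doc_posteriors, docs, "interest rate", "finance", "sports"))
```

The phrase-based part lets a phrase count as distinctive even when its individual words are common across classes, because it looks only at the documents in which the words occur together.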
12. A method for characterizing a class of a probabilistic classifier or clustering system that classifies or clusters documents into classes and includes probabilistic model parameters, the method comprising:
for each of a plurality of candidate words or word combinations, computing a divergence element of the class from each of a plurality of other classes of the probabilistic classifier or clustering system based on one or more probabilistic model parameters profiling the candidate word or word combination; and
selecting one or more words or word combinations for characterizing the class as those candidate words or word combinations for which the class has a substantial computed divergence element from at least one of the plurality of other classes of the probabilistic classifier or clustering system that is effective for distinguishing the class from at least one of the plurality of other classes; and
labeling the class based on the selected one or more words or word combinations;
wherein the computing a divergence operation and the selecting operation and the labeling operation are performed by a digital processor.
Specification