USING RULE INDUCTION TO IDENTIFY EMERGING TRENDS IN UNSTRUCTURED TEXT STREAMS
First Claim
1. A computer-implemented method, comprising:
selecting a subset V of documents from a set U of documents;
generating at least one Boolean combination of terms that partitions the set U into a plurality of categories that represent a generalized, statistically based model of the selected subset V wherein the categories are disjoint insofar as each document of U is included in only one category of the partition;
generating a descriptive label for each of the disjoint categories from the Boolean combination of terms for that category;
aggregating the descriptive labels to form a description of the subset V; and
displaying the aggregating descriptive labels to a user.
Abstract
A method for identifying emerging concepts in unstructured text streams comprises: selecting a subset V of documents from a set U of documents; generating at least one Boolean combination of terms that partitions the set U into a plurality of categories that represent a generalized, statistically based model of the selected subset V wherein the categories are disjoint inasmuch as each document of U is included in only one category of the partition; and generating a descriptive label for each of the disjoint categories from the Boolean combination of terms for that category.
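The partition-and-label steps described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: documents are modeled as sets of words, the candidate terms are supplied by hand rather than learned, and the function names (`make_disjoint_rules`, `matches`, `label`, `describe_subset`) and the majority test that decides which categories describe V are all hypothetical.

```python
def make_disjoint_rules(terms):
    """Turn an ordered term list into mutually exclusive Boolean rules.

    Rule k requires term k and forbids every earlier term; a final rule
    forbids all terms. Each document therefore satisfies exactly one rule.
    """
    rules = []
    for k, term in enumerate(terms):
        rules.append({"require": {term}, "forbid": set(terms[:k])})
    rules.append({"require": set(), "forbid": set(terms)})
    return rules


def matches(rule, doc_terms):
    """True if the document has every required term and no forbidden one."""
    return rule["require"] <= doc_terms and not (rule["forbid"] & doc_terms)


def label(rule):
    """Render the Boolean combination as a readable category label."""
    parts = sorted(rule["require"]) + ["NOT " + t for t in sorted(rule["forbid"])]
    return " AND ".join(parts)


def describe_subset(U, V_ids, terms):
    """Aggregate the labels of categories whose members are mostly in V.

    The majority threshold is an assumption made for this sketch.
    """
    description = []
    for rule in make_disjoint_rules(terms):
        members = [doc_id for doc_id, words in U.items() if matches(rule, words)]
        in_v = sum(1 for doc_id in members if doc_id in V_ids)
        if members and in_v * 2 > len(members):
            description.append(label(rule))
    return description


# Toy data: U is the full document set, V the selected subset.
U = {
    1: {"battery", "fire"},
    2: {"battery"},
    3: {"screen"},
    4: {"fire", "screen"},
    5: {"shipping"},
}
V_ids = {1, 2, 4}
print(describe_subset(U, V_ids, ["battery", "fire"]))
```

Ordering the terms and forbidding earlier ones is one simple way to make the categories disjoint; any construction that assigns each document of U to exactly one category would satisfy the same property.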
68 Citations
20 Claims
1. A computer-implemented method, comprising:
selecting a subset V of documents from a set U of documents;
generating at least one Boolean combination of terms that partitions the set U into a plurality of categories that represent a generalized, statistically based model of the selected subset V wherein the categories are disjoint insofar as each document of U is included in only one category of the partition;
generating a descriptive label for each of the disjoint categories from the Boolean combination of terms for that category;
aggregating the descriptive labels to form a description of the subset V; and
displaying the aggregating descriptive labels to a user.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A system, the system executing steps for:
using a decision tree to classify documents from a set U of documents into categories based on a subset V of U;
converting the decision tree into a logically equivalent rule set, wherein each document of U is guaranteed to only be classified by one rule of the rule set;
labeling, for each one of the categories based on the subset V, a text event; and
displaying a list of results based on the text event labels to a user.
Dependent claims: 10, 11, 12, 13, 14, 15.
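The conversion of a decision tree into a logically equivalent rule set, as recited in claim 9, can be illustrated with a minimal Python sketch. It rests on several assumptions: the tree is hand-written rather than trained, it is encoded as nested tuples `(term, absent_branch, present_branch)` with string leaves, and the names `tree_to_rules` and `classify` are hypothetical. Because every root-to-leaf path becomes one rule, exactly one rule fires for any document.

```python
def tree_to_rules(node, path=()):
    """Yield (antecedents, category) for every root-to-leaf path.

    Each antecedent is a (term, must_be_present) pair.
    """
    if isinstance(node, str):  # leaf: the category name
        yield list(path), node
        return
    term, absent_branch, present_branch = node
    yield from tree_to_rules(absent_branch, path + ((term, False),))
    yield from tree_to_rules(present_branch, path + ((term, True),))


def classify(rules, doc_terms):
    """Return the category of the single rule that matches the document."""
    fired = [
        category
        for antecedents, category in rules
        if all((term in doc_terms) == wanted for term, wanted in antecedents)
    ]
    if len(fired) != 1:
        raise ValueError("rule set is not a partition")
    return fired[0]


# Hypothetical tree: split on "battery", then on "fire" when it is absent.
tree = ("battery", ("fire", "other", "V"), "V")
rules = list(tree_to_rules(tree))

# Label each V category with the positive antecedents of its rule.
event_labels = [
    " AND ".join(term for term, wanted in antecedents if wanted)
    for antecedents, category in rules
    if category == "V"
]
print(event_labels)
print(classify(rules, {"fire", "smoke"}))
```

Using only the positive antecedents as labels keeps the displayed text short; the negated terms still take part in classification.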
16. A computer program product comprising a computer useable medium including a computer readable program, wherein the computer readable program when executed on a computer causes the computer to:
identify a dictionary of frequently used terms in a text data set U;
create a feature space that identifies the dictionary term occurrences in each document of U;
apply a rule induction algorithm to the feature space over U to identify rules that classify documents into categories based on a subset V of U;
use feature based antecedents of each rule to describe events; and
display the events using the positive antecedents.
Dependent claims: 17, 18, 19, 20.
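The pipeline recited in claim 16 (dictionary, feature space, rule induction, positive antecedents) can be sketched in plain Python as follows. The claim does not name a specific rule induction algorithm, so a deliberately simple greedy single-term covering learner stands in for it here. The function names, the whitespace tokenizer, the dictionary size, and the `min_precision` threshold are all assumptions made for illustration.

```python
from collections import Counter


def build_dictionary(docs, size):
    """Pick the `size` terms that occur in the most documents."""
    df = Counter(term for text in docs for term in set(text.lower().split()))
    return sorted(df, key=lambda t: (-df[t], t))[:size]


def build_features(docs, dictionary):
    """Binary feature space: 1 if the dictionary term occurs in the document."""
    rows = []
    for text in docs:
        words = set(text.lower().split())
        rows.append([int(term in words) for term in dictionary])
    return rows


def induce_rules(features, in_v, dictionary, min_precision=0.8):
    """Greedy covering: repeatedly pick the term that covers the most
    remaining V documents while staying above the precision threshold.

    Returns the positive antecedent (a single term) of each induced rule.
    """
    remaining = list(range(len(features)))
    antecedents = []
    while any(in_v[i] for i in remaining):
        best = None
        for j in range(len(dictionary)):
            covered = [i for i in remaining if features[i][j]]
            hits = sum(1 for i in covered if in_v[i])
            if covered and hits / len(covered) >= min_precision:
                if best is None or hits > best[0]:
                    best = (hits, j, covered)
        if best is None:  # no term is precise enough; stop
            break
        _, j, covered = best
        antecedents.append(dictionary[j])
        remaining = [i for i in remaining if i not in covered]
    return antecedents


docs = [
    "battery overheats quickly",
    "battery swells overnight",
    "screen cracked badly",
    "screen flickers badly",
    "charger overheats badly",
]
in_v = [True, True, False, False, True]  # membership in the subset V

dictionary = build_dictionary(docs, 4)
features = build_features(docs, dictionary)
print(induce_rules(features, in_v, dictionary))
```

Each induced rule applies only to documents not covered by an earlier rule, so the rules form an ordered list; the displayed events are the positive terms alone.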
Specification