Systems and methods for automatically categorizing unstructured text
Abstract
Systems, methods and software products analyze messages of a message stream based upon human-generated concept recognizers. A sample set of messages, representative of messages from the message stream, is analyzed to determine interesting or useful categories. Text categorization engines are then trained using the sample set, and text classifiers are published. These text classifiers are then used to categorize further text messages from the message stream.
9 Claims
1. A method for processing unstructured text, comprising:
receiving a message stream, the message stream including a plurality of unstructured text messages originating from at least one homogeneous source;
capturing at least a subset of the text messages as an exploration set;
displaying the text messages of the exploration set to an analyst for review;
receiving at least one text category from the analyst for each displayed text message;
associating each text category with at least one text message within the exploration set, the associated categories and messages providing a classification model;
initiating an automated training process to categorize text messages based on the classification model.
Dependent claims: 2, 3
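The steps of claim 1 can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names, the random sampling used to capture the exploration set, the `analyst_labels` dictionary standing in for the analyst's review, and the token-count "training" are all assumptions chosen only to make the flow concrete.

```python
import random
from collections import Counter, defaultdict

def capture_exploration_set(stream, size, seed=0):
    """Capture a subset of the message stream as the exploration set (random sample assumed)."""
    messages = list(stream)
    rng = random.Random(seed)
    return rng.sample(messages, min(size, len(messages)))

def build_classification_model(labeled):
    """Associate each analyst-supplied category with the messages it was given for."""
    model = defaultdict(list)
    for message, categories in labeled:
        for category in categories:
            model[category].append(message)
    return dict(model)

def train(model):
    """Toy stand-in for automated training: token frequencies per category."""
    return {
        category: Counter(tok for msg in msgs for tok in msg.lower().split())
        for category, msgs in model.items()
    }

def categorize(message, trained):
    """Pick the category whose training tokens overlap the message most."""
    tokens = message.lower().split()
    scores = {cat: sum(counts[t] for t in tokens) for cat, counts in trained.items()}
    return max(scores, key=scores.get)

# Hypothetical message stream and analyst input (not from the patent).
stream = [
    "my bill is wrong",
    "cannot log in to my account",
    "bill charged twice",
    "password reset fails on log in",
]
analyst_labels = {
    "my bill is wrong": ["billing"],
    "cannot log in to my account": ["login"],
    "bill charged twice": ["billing"],
    "password reset fails on log in": ["login"],
}

exploration_set = capture_exploration_set(stream, size=4)
labeled = [(msg, analyst_labels[msg]) for msg in exploration_set]
trained = train(build_classification_model(labeled))
print(categorize("why was my bill charged", trained))
```

In practice the training step would be a real text categorization engine; the token counter here only shows where the classification model feeds into training.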
4. A method for processing unstructured text with selected concept identifiers, comprising:
receiving a message stream, the message stream including a plurality of unstructured text messages originating from at least one homogeneous source;
establishing from the message stream a distinct training set having at least one text message as a training document;
establishing from the message stream a distinct audit set having at least one text message as an audit document;
receiving from an analyst at least one text category, the text category common in the training set, and at least one target category;
reviewing each training document to determine a pseudo instance of the target category and, in response to a positive determination assigning a point value to the training document;
reviewing each audit document to determine a pseudo instance of the target category and, in response to a positive determination assigning a point value to the audit document;
concurrently displaying to the analyst a training document and an audit document;
requesting the auditor to indicate the presence or absence of at least one target category in each displayed text document;
in response to an indicated presence, recording a positive association of the text message to the target category;
repeating the concurrent display of a training document and an audit document until all members of both sets have been displayed;
comparing the recorded positive associations with assigned point values, a positive correlation permitting the pseudo instance text to be added to a text classifier;
utilizing the text classifier in an automated process to evaluate the message stream.
Dependent claims: 5, 6, 7, 8
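The comparison step of claim 4 can be sketched as follows. The claim does not say how point values are assigned or how correlation is measured, so this sketch assumes a point value of 1 when the candidate pseudo-instance text occurs in a document (simple substring match), a Pearson correlation against the analyst's presence/absence marks, and a hypothetical acceptance threshold. All names and sample documents are illustrative.

```python
from math import sqrt

def point_values(documents, pseudo_text):
    """Assign 1 point to each document containing the pseudo-instance text, else 0."""
    return [1 if pseudo_text in doc.lower() else 0 for doc in documents]

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def maybe_add_to_classifier(classifier, pseudo_text, documents, analyst_marks, threshold=0.5):
    """Add pseudo_text to the classifier only if it correlates positively with the marks."""
    points = point_values(documents, pseudo_text)
    marks = [1 if present else 0 for present in analyst_marks]
    r = pearson(points, marks)
    if r > threshold:
        classifier.add(pseudo_text)
    return r

# Hypothetical training and audit sets, shown to the analyst in pairs.
training_set = ["please refund my order", "great service thanks"]
audit_set = ["refund not received yet", "where is the store"]
# Hypothetical analyst answers: is the target category present in the document?
analyst_answers = {
    "please refund my order": True,
    "great service thanks": False,
    "refund not received yet": True,
    "where is the store": False,
}

documents, marks = [], []
for training_doc, audit_doc in zip(training_set, audit_set):  # concurrent display
    for doc in (training_doc, audit_doc):
        documents.append(doc)
        marks.append(analyst_answers[doc])

classifier = set()
r_refund = maybe_add_to_classifier(classifier, "refund", documents, marks)
r_the = maybe_add_to_classifier(classifier, "the", documents, marks)
print(classifier, r_refund, r_the)
```

Here "refund" tracks the analyst's marks and is added, while "the" does not and is left out. A production system would likely use tokenized matching rather than substrings.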
9. A method for processing unstructured text with selected concept identifiers, comprising:
receiving a message stream, the message stream including a plurality of unstructured text messages originating from at least one homogeneous source;
establishing from the message stream a distinct training set having at least one text message as a training document;
establishing from the message stream a distinct audit set having at least one text message as an audit document;
receiving from an analyst at least one text category, the text category common in the training set, and at least one target category;
concurrently displaying to the analyst a training document and an audit document;
requesting the auditor to indicate the presence or absence of at least one pseudo target category in each displayed text document;
in response to an indicated presence, recording a positive association of the text message and the pseudo target category;
repeating the concurrent display of a training document and an audit document until all members of both sets have been displayed;
reviewing the recorded positive association text documents and associated pseudo terms to establish a text classifier;
utilizing the text classifier in an automated process to evaluate the message stream, the text messages of the text stream evaluated by the text classifier to identify pseudo categories.
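The last two steps of claim 9 (establishing a text classifier from the recorded positive associations, then evaluating the stream with it) might look like the sketch below. The selection rule is an assumption, not something the claim specifies: a term is kept if it appears in at least `min_support` positively marked documents and in no negatively marked one. The category name and sample messages are hypothetical.

```python
from collections import Counter

def establish_classifier(marked_docs, min_support=2):
    """Keep terms seen in >= min_support positive documents and in no negative document."""
    positive_counts = Counter()
    negative_terms = set()
    for doc, present in marked_docs:
        tokens = set(doc.lower().split())
        if present:
            positive_counts.update(tokens)
        else:
            negative_terms |= tokens
    return {
        term
        for term, count in positive_counts.items()
        if count >= min_support and term not in negative_terms
    }

def evaluate_stream(stream, classifier, category):
    """Lazily tag each message with the category if it contains any classifier term."""
    for message in stream:
        hit = bool(classifier & set(message.lower().split()))
        yield message, (category if hit else None)

# Hypothetical analyst marks recorded during the paired review.
marked_docs = [
    ("card declined at checkout", True),
    ("payment card declined again", True),
    ("love the new checkout page", False),
    ("shipping was fast", False),
]

classifier = establish_classifier(marked_docs)
results = list(evaluate_stream(
    ["my card was declined", "fast shipping"], classifier, "payment_failure"
))
print(sorted(classifier), results)
```

Using a generator for `evaluate_stream` keeps the sketch compatible with an unbounded message stream, since messages are tagged one at a time as they arrive.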
Specification