Systems and methods for automatically categorizing unstructured text
Abstract
Systems, methods and software products analyze messages of a message stream based upon human generated concept recognizers. A sample set of messages, representative of messages from the message stream, is analyzed to determine interesting or useful categories. Text categorization engines are then trained using the sample set, and text classifiers are published. These text classifiers are then used to categorize further text messages from the message stream.
23 Claims
1. A computer implemented method for identifying a set of categories for unstructured text messages and training an automated classifier therefor, the method comprising:
from a stream of the unstructured text messages captured in computer readable form, selecting a subset thereof for presentation to a user as an exploration set, the subset selected from the stream by a programmed computer, wherein the selection of the exploration set is in a generally random manner though in accord with one or more set delimiting criteria provided by the user;
via a display of the programmed computer, providing the user with both (i) a reviewable presentation of each unstructured text message selected for presentation as part of the exploration set and (ii) a flag definition and assignment interface, whereby the user defines categories for the unstructured text messages and flags at least one message of the exploration set as associated with each of the categories so defined;
via the display of the programmed computer, providing the user with a reviewable presentation of a training subset of the unstructured text messages, wherein each of the unstructured text messages of the training subset is presented together with a category selection interface whereby the user accumulates, for each of at least a subset of the categories, a respective pool of training instances from the training subset for use in training an automated classifier; and
training an automated classifier to classify individual ones of the unstructured text messages using the training subset.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
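As an illustration of the first step of claim 1, the sketch below draws an exploration set from a message stream in a "generally random manner" while honoring user-provided delimiting criteria. The claim does not name a sampling technique; reservoir sampling is assumed here because it works on a stream of unknown length. The function name `select_exploration_set` and the representation of criteria as predicate functions are hypothetical.

```python
import random
from typing import Callable, Iterable, List, Optional, Sequence


def select_exploration_set(
    stream: Iterable[str],
    k: int,
    criteria: Sequence[Callable[[str], bool]] = (),
    seed: Optional[int] = None,
) -> List[str]:
    """Uniformly sample up to k messages that satisfy every criterion.

    Assumption: reservoir sampling (Algorithm R) stands in for the
    "generally random" selection; the claim does not specify a method.
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    eligible_seen = 0
    for message in stream:
        # Apply the user's set delimiting criteria before sampling.
        if not all(pred(message) for pred in criteria):
            continue
        eligible_seen += 1
        if len(reservoir) < k:
            reservoir.append(message)
        else:
            # Keep the new message with probability k / eligible_seen.
            j = rng.randrange(eligible_seen)
            if j < k:
                reservoir[j] = message
    return reservoir
```

A criterion such as `lambda m: "refund" in m` would restrict the exploration set to messages mentioning refunds; the sample is uniform over the eligible messages only.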
15. A computer implemented method for identifying a set of categories for unstructured text messages and training an automated classifier therefor, the method comprising:
from a stream of the unstructured text messages captured in computer readable form, selecting a subset thereof for presentation to a user as an exploration set, the subset selected from the stream by a programmed computer;
via a display of the programmed computer, providing the user with both (i) a reviewable presentation of each unstructured text message selected for presentation as part of the exploration set and (ii) a flag definition and assignment interface, whereby the user defines categories for the unstructured text messages and flags at least one message of the exploration set as associated with each of the categories so defined;
via the display of the programmed computer, providing the user with a reviewable presentation of a training subset of the unstructured text messages, wherein each of the unstructured text messages of the training subset is presented together with a category selection interface whereby the user accumulates, for each of at least a subset of the categories, a respective pool of training instances from the training subset for use in training an automated classifier;
training an automated classifier to classify individual ones of the unstructured text messages using the training subset;
successively refining the training of the automated classifier based on successive additions, by the user using the selection interface, of further unstructured text messages to the training subset;
evaluating the successively refined training by applying the automated classifier to an audit subset of the unstructured text messages; and
via the display, providing the user with a reviewable presentation of the audit subset of unstructured text messages, wherein each of the unstructured text messages of the audit subset is presented together with the category selection interface whereby the user accumulates, for each of at least a subset of the categories, a respective pool of audit instances from the audit subset for use in the evaluating, wherein unstructured text messages from the audit and training subsets as well as respective category selections therefor are presented in a uniform manner, such that the user is generally not aware of which unstructured text messages comprise the audit subset and which unstructured text messages comprise the training subset.
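The uniform presentation recited in claim 15 can be pictured with a small sketch. It assumes, hypothetically, that the display receives only an opaque item id and the message text, while the pool membership (training or audit) is kept in a separate server-side lookup. The names `DisplayItem` and `build_blinded_page` are illustrative and do not come from the patent.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class DisplayItem:
    """What the user interface sees: deliberately has no pool field."""
    item_id: int
    text: str


def build_blinded_page(
    training: Sequence[str],
    audit: Sequence[str],
    seed: Optional[int] = None,
) -> Tuple[List[DisplayItem], Dict[int, str]]:
    """Interleave both pools in random order and hide pool membership."""
    rng = random.Random(seed)
    tagged = [(t, "training") for t in training] + [(t, "audit") for t in audit]
    rng.shuffle(tagged)  # order on the page carries no hint about the pool
    page: List[DisplayItem] = []
    pool_of: Dict[int, str] = {}  # hidden lookup, never sent to the display
    for item_id, (text, pool) in enumerate(tagged):
        page.append(DisplayItem(item_id, text))
        pool_of[item_id] = pool
    return page, pool_of
```

When the user submits category selections, the hidden lookup routes each selection to the training pool or the audit pool without the user having been told which is which.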
16. A computer implemented method for identifying a set of categories for unstructured text messages and training an automated classifier therefor, the method comprising:
from a stream of the unstructured text messages captured in computer readable form, selecting a subset thereof for presentation to a user as an exploration set, the subset selected from the stream by a programmed computer;
via a display of the programmed computer providing the user with both (i) a reviewable presentation of each unstructured text message selected for presentation as part of the exploration set and (ii) a flag definition and assignment interface, whereby the user defines categories for the unstructured text messages and flags at least one message of the exploration set as associated with each of the categories so defined;
via the display of the programmed computer, providing the user with a reviewable presentation of a training subset of the unstructured text messages, wherein each of the unstructured text messages of the training subset is presented together with a category selection interface whereby the user accumulates, for each of at least a subset of the categories, a respective pool of training instances from the training subset for use in training an automated classifier;
training an automated classifier to classify individual ones of the unstructured text messages using the training subset; and
successively refining the training of the automated classifier based on successive additions, by the user using the selection interface, of further unstructured text messages to the training subset; and
evaluating the successively refined training by applying the automated classifier to an audit subset of the unstructured text messages, wherein the training subset and the audit subset are disjoint sets of unstructured text messages drawn from the exploration set in a manner generally not perceivable by the user.
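Claim 16 adds that the training and audit subsets are disjoint and both drawn from the exploration set. A minimal sketch of one way to do this follows, assuming a shuffled split by position and plain accuracy as the evaluation measure. The claim names neither a split ratio nor a metric, so `audit_fraction`, `split_exploration_set`, and `audit_accuracy` are all hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def split_exploration_set(
    messages: Sequence[str],
    audit_fraction: float = 0.2,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[str]]:
    """Return (training_subset, audit_subset) with no shared positions.

    Assumption: each message appears once in `messages`, so a split by
    position is also disjoint by content.
    """
    rng = random.Random(seed)
    shuffled = list(messages)
    rng.shuffle(shuffled)
    n_audit = int(len(shuffled) * audit_fraction)
    return shuffled[n_audit:], shuffled[:n_audit]


def audit_accuracy(
    classify: Callable[[str], str],
    audit_instances: Sequence[Tuple[str, str]],
) -> float:
    """Fraction of user-labeled audit instances the classifier matches."""
    if not audit_instances:
        return 0.0
    correct = sum(1 for text, label in audit_instances if classify(text) == label)
    return correct / len(audit_instances)
```

Because the audit instances never enter training, re-running `audit_accuracy` after each refinement gives an estimate that is not inflated by messages the classifier has already seen.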
17. A system comprising:
a store of unstructured text messages captured in computer readable form;
a computer coupled to the store and programmed to select and present a subset of the unstructured text messages to a user as an exploration set;
the programmed computer providing the user with both (i) a reviewable presentation of each unstructured text message selected for presentation as part of the exploration set and (ii) a flag definition and assignment interface, whereby the user defines categories for the unstructured text messages and flags at least one message of the exploration set as associated with each of the categories so defined;
the programmed computer further providing the user with a reviewable presentation of a training subset of the unstructured text messages, wherein each of the unstructured text messages of the training subset is presented together with a category selection interface whereby the user accumulates, for each of at least a subset of the categories, a respective pool of training instances from the training subset for use in training an automated classifier;
the programmed computer training an automated classifier to classify individual ones of the unstructured text messages using the training subset, wherein the programmed computer successively refines the training of the automated classifier based on successive additions, by the user using the selection interface, of further unstructured text messages to the training subset, and wherein the programmed computer evaluates the successively refined training by applying the automated classifier to an audit subset of the unstructured text messages;
the programmed computer further providing the user with a reviewable presentation of the audit subset of unstructured text messages, wherein each of the unstructured text messages of the audit subset is presented together with the category selection interface whereby the user accumulates, for each of at least a subset of the categories, a respective pool of audit instances from the audit subset for use in the evaluating; and
wherein the selection interface includes a next page control, wherein for those unstructured text messages of the training pool for which categories have been selected on a current page, the next page control adds as training instances corresponding unstructured text together with the categories selected by the user therefor, and initiates a retraining of the automated classifier therewith, and wherein for those unstructured text messages of the audit pool for which categories have been selected on a current page, the next page control adds as audit instances corresponding unstructured text together with the categories selected by the user therefor, and initiates classification of unstructured text messages from the newly added-to audit pool using the retrained automated classifier.
Dependent claims: 18, 19, 20, 21, 22
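The next page control of claim 17 ties labeling, retraining, and auditing into one user action. The sketch below shows that control flow under stated assumptions: `WordCountClassifier` is a toy word-overlap model standing in for whatever classifier the system actually trains (the claim does not specify one), and `ReviewSession.next_page` is a hypothetical handler name. For simplicity the sketch retrains on every page turn, even when the page added no new training instances.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


class WordCountClassifier:
    """Toy stand-in: picks the category whose training words overlap most."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def train(self, instances: List[Tuple[str, str]]) -> None:
        # Retrain from scratch on the full accumulated training pool.
        self.counts = defaultdict(Counter)
        for text, category in instances:
            self.counts[category].update(text.lower().split())

    def classify(self, text: str) -> Optional[str]:
        if not self.counts:
            return None
        words = text.lower().split()
        # sorted() makes tie-breaking deterministic (alphabetical).
        return max(sorted(self.counts),
                   key=lambda c: sum(self.counts[c][w] for w in words))


class ReviewSession:
    def __init__(self) -> None:
        self.training_instances: List[Tuple[str, str]] = []
        self.audit_instances: List[Tuple[str, str]] = []
        self.classifier = WordCountClassifier()

    def next_page(self, selections: List[Tuple[str, str, Optional[str]]]):
        """Handle a page turn; selections are (text, pool, category)."""
        for text, pool, category in selections:
            if category is None:
                continue  # the user selected no category for this message
            if pool == "training":
                self.training_instances.append((text, category))
            else:
                self.audit_instances.append((text, category))
        self.classifier.train(self.training_instances)
        # Classify the newly added-to audit pool with the retrained model.
        return [(text, label, self.classifier.classify(text))
                for text, label in self.audit_instances]
```

Each returned triple pairs the user's audit label with the retrained classifier's prediction, which is the raw material for the evaluation step.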
23. A computer implemented method for identifying a set of categories for unstructured text messages and training an automated classifier therefor, the method comprising:
from a stream of the unstructured text messages captured in computer readable form, selecting a subset thereof for presentation to a user as an exploration set, the subset selected from the stream by a programmed computer;
via a display of the programmed computer, providing the user with both (i) a reviewable presentation of each unstructured text message selected for presentation as part of the exploration set and (ii) a flag definition and assignment interface, whereby the user defines categories for the unstructured text messages and flags at least one message of the exploration set as associated with each of the categories so defined;
via the display of the programmed computer, providing the user with a reviewable presentation of a training subset of the unstructured text messages, wherein each of the unstructured text messages of the training subset is presented together with a category selection interface whereby the user accumulates, for each of at least a subset of the categories, a respective pool of training instances from the training subset for use in training an automated classifier;
training an automated classifier to classify individual ones of the unstructured text messages using the training subset;
successively refining the training of the automated classifier based on successive additions, by the user using the selection interface, of further unstructured text messages to the training subset;
evaluating the successively refined training by applying the automated classifier to an audit subset of the unstructured text messages; and
via the display, providing the user with a reviewable presentation of the audit subset of unstructured text messages, wherein each of the unstructured text messages of the audit subset is presented together with the category selection interface whereby the user accumulates, for each of at least a subset of the categories, a respective pool of audit instances from the audit subset for use in the evaluating, wherein the user interface selection includes a next page control, wherein for those unstructured text messages of the training pool for which categories have been selected on a current page, the next page control adds as training instances corresponding unstructured text together with the categories selected by the user therefor, and initiates a retraining of the automated classifier therewith, and wherein for those unstructured text messages of the audit pool for which categories have been selected on a current page, the next page control adds as audit instances corresponding unstructured text together with the categories selected by the user therefor, and initiates classification of unstructured text messages from the newly added-to audit pool using the retrained automated classifier.
Specification