Method and system for remediating topic drift in near-real-time classification of customer feedback
Abstract
A method and system for classifying documents are provided. The method includes receiving a stream of documents from at least one user, wherein each document includes a topic of information relating to a customer support issue or sentiment. The method includes classifying each of the received documents using a plurality of trained classifiers, the classification being based on a voting by the trained classifiers, with each document labeled according to a similar topic. A drift of the topic of one or more of the classifications is determined, wherein the drift is related to the received documents that include information relating to an unclassified customer support issue or sentiment. If the determined drift exceeds a predetermined threshold range, the plurality of classifiers is rebuilt to include a second set of classifiers trained to recognize the unclassified customer support issue or sentiment.
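The voting and drift check summarized in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the keyword-based classifiers, the `UNCLASSIFIED` label, the `min_votes` rule, and the reading of "drift" as the fraction of documents that no vote could label. The "predetermined threshold range" is modeled as a single upper bound.

```python
from collections import Counter

UNCLASSIFIED = "unclassified"  # hypothetical label for documents no vote can place


def keyword_classifier(topic, keywords):
    """Build a toy classifier: votes for `topic` if any keyword appears."""
    def classify(document):
        words = set(document.lower().split())
        return topic if words & keywords else None
    return classify


def classify_by_vote(document, classifiers, min_votes=2):
    """Label a document with the topic that wins the vote.

    Returns UNCLASSIFIED when no topic collects at least `min_votes` votes.
    """
    votes = Counter(
        label for label in (clf(document) for clf in classifiers)
        if label is not None
    )
    if not votes:
        return UNCLASSIFIED
    label, count = votes.most_common(1)[0]
    return label if count >= min_votes else UNCLASSIFIED


def drift_fraction(labels):
    """Assumed drift measure: share of documents left unclassified."""
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == UNCLASSIFIED) / len(labels)


def needs_rebuild(labels, threshold=0.2):
    """True when the assumed drift measure exceeds the upper bound."""
    return drift_fraction(labels) > threshold


classifiers = [
    keyword_classifier("billing", {"invoice", "charge"}),
    keyword_classifier("billing", {"refund", "charge"}),
    keyword_classifier("login", {"password", "login"}),
    keyword_classifier("login", {"password", "locked"}),
]
documents = [
    "unexpected charge on my invoice",
    "forgot my password and got locked out",
    "the new dashboard widget crashes",
]
labels = [classify_by_vote(doc, classifiers) for doc in documents]
```

In this toy run the third document matches no classifier, so one document in three is unclassified and `needs_rebuild(labels)` reports that the assumed bound of 0.2 is exceeded. A real system would retrain or add classifiers at that point; the sketch stops at the decision.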
20 Claims
1. A computer-implemented method of classifying documents including executing instructions stored on a non-transitory computer-readable medium, said method comprising:
receiving a stream of documents from at least one user, each document including a topic of information relating to a customer support issue or sentiment;
in near-real-time, performing a first classification of each of the received documents using a plurality of trained classifiers, the classification based on a voting by the trained classifiers, each document labeled according to a similar topic;
generating a trend of a change of a number of documents classified by each classifier;
analyzing the trend for anomalies in the trend related to an introduction of a new feature or product, the anomaly occurring within a predetermined time period after the introduction of the new product or feature;
determining a drift of the topic of one or more of the first classifications using the analyzed trend, the drift related to received documents that include information relating to an unclassified customer support issue or sentiment;
if the determined drift exceeds a predetermined threshold range, rebuilding the plurality of classifiers to include a second set of classifiers trained to recognize the unclassified customer support issue or sentiment;
performing a second classification of each of the received documents using the rebuilt plurality of trained classifiers; and
outputting a frequency of occurrence of each classifier, the frequency based on said classifying.
Dependent claims: 2, 3, 4, 5, 6
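The trend and anomaly steps recited in claim 1 can be sketched as follows. This is one possible reading, not the claimed implementation: the trend is taken as the day-over-day change in one classifier's document count, and an anomaly is a day inside a fixed window after a launch whose count exceeds the pre-launch mean by `k` standard deviations. The function names, the 7-day window, and `k = 3.0` are all assumptions chosen for illustration.

```python
from datetime import date, timedelta
from statistics import mean, pstdev


def daily_changes(counts):
    """Day-over-day change in the number of documents for one classifier."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


def anomalies_after_launch(counts_by_day, launch_day, window_days=7, k=3.0):
    """Return days in the post-launch window with unusually high counts.

    counts_by_day maps a date to the number of documents one classifier
    labeled that day. The baseline is every day before `launch_day`.
    """
    baseline = [c for day, c in counts_by_day.items() if day < launch_day]
    if len(baseline) < 2:
        return []  # not enough history to estimate a baseline
    limit = mean(baseline) + k * pstdev(baseline)
    window_end = launch_day + timedelta(days=window_days)
    return sorted(
        day for day, count in counts_by_day.items()
        if launch_day <= day < window_end and count > limit
    )


# Hypothetical counts for one classifier; the feature launches on day 6.
start = date(2024, 1, 1)
counts = [10, 12, 11, 9, 10, 30, 28, 11]
counts_by_day = {start + timedelta(days=i): c for i, c in enumerate(counts)}
launch_day = date(2024, 1, 6)

trend = daily_changes(counts)
flagged = anomalies_after_launch(counts_by_day, launch_day)
```

With these made-up numbers the two days immediately after the launch are flagged, while the later return to the baseline level is not. A production detector would likely need to handle seasonality and sparse classifiers, which this sketch ignores.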
7. A computer system for remediating topic drift in a corpus of documents, the system comprising a computer device coupled to a user interface and a memory device, the system comprising:
a classifier and a clustering engine configured to execute on the computing device, the classifier configured to receive a stream of documents from at least one user using the user interface, each document including a topic of information relating to a customer support issue or sentiment, the classifier configured to classify each of the received documents using a plurality of trained classifiers, the classification based on a voting by the trained classifiers, each document labeled according to a similar topic, said clustering engine configured to cluster the plurality of documents into respective groups based on the determined topic using said clustering engine, the clustering engine configured to apply a word analysis;
a trending module configured to generate a trend of a change of a number of documents classified by each classifier, said trending module configured to analyze the trend for anomalies in the trend related to an introduction of a new feature or product, the anomaly occurring within a predetermined time period after the introduction of the new product or feature; and
a drift engine configured to execute on the computing device and configured to determine a drift of the topic of one or more of the classifications using the analyzed trend, the drift related to received documents that include information relating to an unclassified customer support issue or sentiment;
if the determined drift exceeds a predetermined threshold range, at least one of rebuild the plurality of classifiers to include a second set of classifiers trained to recognize the unclassified customer support issue or sentiment and re-cluster the plurality of documents into an increased number of groups based on the determined topic using the clustering engine.
Dependent claims: 8, 9, 10, 11, 12, 13, 14
15. One or more non-transitory computer-readable storage media having computer-executable instructions embodied thereon, wherein when executed by at least one processor, the computer-executable instructions cause the processor to:
receive a stream of documents from at least one user;
in near-real-time, determine a topic of each received document wherein the topic includes information relating to a customer support issue or sentiment;
cluster the plurality of documents into respective groups based on the determined topic using a clustering engine, the clustering engine applying a word analysis;
generate a trend of a change of a number of documents classified by each classifier;
analyze the trend for anomalies in the trend, the anomalies related to an introduction of a new feature or product, the anomaly occurring within a predetermined time period after the introduction of the new product or feature; and
determine a drift of the determined topic of one or more groups using the analyzed trend, the drift related to received documents that include information relating to an undetermined customer support issue or sentiment;
if the determined drift exceeds a predetermined threshold range, increase a number of allowed groups;
in a batch process, determine a topic of each received document wherein the topic includes information relating to a customer support issue or sentiment;
re-cluster the plurality of documents into the increased number of groups based on the determined topic using the clustering engine; and
output a frequency of occurrence of each topic, the frequency based on said clustering.
Dependent claims: 16, 17, 18, 19, 20
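The clustering variant in claim 15 (cluster, detect drift, increase the number of allowed groups, re-cluster, report frequencies) can be sketched as below. The claim does not specify the clustering algorithm or the word analysis, so this sketch swaps in a simple greedy grouping on Jaccard word overlap. The drift measure (share of documents forced into a group they do not resemble), the similarity cutoff, and the 0.2 bound are likewise assumptions for illustration only.

```python
from collections import Counter


def word_set(document):
    """Toy word analysis: the set of lowercase tokens in a document."""
    return set(document.lower().split())


def jaccard(a, b):
    """Word overlap between two token sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(documents, max_groups, min_similarity=0.2):
    """Greedy grouping under a cap on the number of groups.

    Returns (assignments, forced): the group index of each document and
    the number of documents that resembled no group but had to be placed
    in one because the cap was reached.
    """
    if max_groups < 1:
        raise ValueError("max_groups must be at least 1")
    leaders, assignments, forced = [], [], 0
    for document in documents:
        words = word_set(document)
        scores = [jaccard(words, leader) for leader in leaders]
        best = max(range(len(leaders)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= min_similarity:
            assignments.append(best)
        elif len(leaders) < max_groups:
            leaders.append(words)          # open a new group
            assignments.append(len(leaders) - 1)
        else:
            assignments.append(best)       # cap reached: force a placement
            forced += 1
    return assignments, forced


documents = [
    "refund for double charge",
    "charge refund not received",
    "cannot reset password",
    "password reset link broken",
    "new export button crashes",
    "export button missing after update",
]

max_groups = 2
assignments, forced = cluster(documents, max_groups)
if forced / len(documents) > 0.2:          # assumed drift bound
    max_groups += 1                        # increase the allowed groups
    assignments, forced = cluster(documents, max_groups)

frequency = Counter(assignments)           # documents per group
```

In this example the two documents about the export button fit neither existing group, so the first pass forces them into group 0. Raising the cap to three lets them form their own group, and the final frequency shows two documents in each group.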
Specification