Method and system for remediating topic drift in near-real-time classification of customer feedback

US 9,111,218 B1
Filed: 06/22/2012
Issued: 08/18/2015
Est. Priority Date: 12/27/2011
Status: Active Grant

Abstract

A method and system of classifying documents is provided. The method includes receiving a stream of documents from at least one user wherein each document includes a topic of information relating to a customer support issue or sentiment. The method includes classifying each of the received documents using a plurality of trained classifiers, the classification based on a voting by the trained classifiers, each document labeled according to a similar topic. A drift of the topic of one or more of the classifications is determined wherein the drift is related to the received documents that include information relating to an unclassified customer support issue or sentiment. If the determined drift exceeds a predetermined threshold range, the plurality of classifiers is rebuilt to include a second set of classifiers trained to recognize the unclassified customer support issue or sentiment.
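
The voting step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how votes are counted, so simple majority voting is assumed here, and the names `Classifier`, `keyword_classifier`, and `classify_by_vote` are hypothetical and do not come from the patent.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# Assumption: a trained classifier maps document text to a topic label,
# or to None when it does not recognize the document.
Classifier = Callable[[str], Optional[str]]


def keyword_classifier(keyword: str, label: str) -> Classifier:
    """Build a toy classifier that votes `label` when `keyword` appears in the text."""
    def classify(text: str) -> Optional[str]:
        return label if keyword in text.lower() else None
    return classify


def classify_by_vote(document: str, classifiers: Sequence[Classifier]) -> Optional[str]:
    """Return the label with the most votes, or None if no classifier voted.

    Ties go to the label that received its first vote earliest.
    """
    votes: Counter = Counter()
    for classifier in classifiers:
        label = classifier(document)
        if label is not None:
            votes[label] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

A document that no classifier recognizes comes back as `None`, which is one possible way to represent the "unclassified customer support issue or sentiment" that the drift determination relies on.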

20 Claims

1. A computer-implemented method of classifying documents including executing instructions stored on a non-transitory computer-readable medium, said method comprising:
- receiving a stream of documents from at least one user, each document including a topic of information relating to a customer support issue or sentiment;
  
  in near-real-time, performing a first classification of each of the received documents using a plurality of trained classifiers, the classification based on a voting by the trained classifiers, each document labeled according to a similar topic;
  
  generating a trend of a change of a number of documents classified by each classifier;
  
  analyzing the trend for anomalies in the trend related to an introduction of a new feature or product, the anomaly occurring within a predetermined time period after the introduction of the new product or feature;
  
  determining a drift of the topic of one or more of the first classifications using the analyzed trend, the drift related to received documents that include information relating to an unclassified customer support issue or sentiment;
  
  if the determined drift exceeds a predetermined threshold range, rebuilding the plurality of classifiers to include a second set of classifiers trained to recognize the unclassified customer support issue or sentiment;
  
  performing a second classification of each of the received documents using the rebuilt plurality of trained classifiers; and
  
  outputting a frequency of occurrence of each classifier, the frequency based on said classifying.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. A method in accordance with claim 1, wherein the plurality of documents includes one or more of electronic mail documents, forum post documents, telephone call transcript documents and chat session record documents.
  - 3. A method in accordance with claim 1, wherein determining a drift of the topic of one or more of the classifications comprises determining a drift of the topic of one or more classifications using a manual review by a subject matter expert.
  - 4. A method in accordance with claim 1, wherein determining a drift of the topic of one or more of the classifications comprises determining a drift of the topic of one or more classifications using an automatic classifier training method.
  - 5. A method in accordance with claim 1, further comprising trending a frequency of occurrence of classifications to determine mislabeling of documents.
  - 6. A method in accordance with claim 1, further comprising reclassifying the plurality of documents based on an expiration of a predetermined time period.
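
The trend, anomaly, and drift steps of claim 1 can be sketched as follows. The claim does not define how an anomaly or the drift is measured, so this sketch makes several assumptions: the trend is the day-over-day change in a label's document count, an anomaly is a change far outside the pre-launch baseline, and drift is the share of documents left unclassified. All function names and the parameters `z_limit` and `drift_threshold` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence


def count_changes(daily_counts: Sequence[int]) -> List[int]:
    """Trend: day-over-day change in the number of documents for one label.

    The returned list is offset by one: element d - 1 is the change on day d.
    """
    return [b - a for a, b in zip(daily_counts, daily_counts[1:])]


def anomalous_days(daily_counts: Sequence[int], launch_day: int,
                   window_days: int, z_limit: float = 3.0) -> List[int]:
    """Days within `window_days` of `launch_day` whose change is unusual
    compared with the changes observed before the launch."""
    changes = count_changes(daily_counts)
    baseline = changes[: launch_day - 1]  # changes on days before the launch
    if len(baseline) < 2:
        return []
    mu, sigma = mean(baseline), pstdev(baseline)
    # Assumption: floor sigma at 1.0 so a perfectly flat baseline does not
    # flag every tiny change.
    limit = z_limit * max(sigma, 1.0)
    last_day = min(launch_day + window_days - 1, len(daily_counts) - 1)
    return [day for day in range(launch_day, last_day + 1)
            if abs(changes[day - 1] - mu) > limit]


def drift_score(labels: Sequence[Optional[str]]) -> float:
    """Assumed drift measure: fraction of documents with no label (None)."""
    if not labels:
        return 0.0
    return sum(1 for label in labels if label is None) / len(labels)


def needs_rebuild(flagged_days: Sequence[int], drift: float,
                  drift_threshold: float) -> bool:
    """Rebuild only when an anomaly was seen and drift exceeds the threshold."""
    return bool(flagged_days) and drift > drift_threshold
```

For example, with daily counts `[10, 11, 10, 12, 11, 10, 30, 32, 31]`, a launch on day 6, and a three-day window, only day 6 is flagged, because its jump of 20 documents is far outside the pre-launch changes.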

7. A computer system for remediating topic drift in a corpus of documents, the system comprising a computer device coupled to a user interface and a memory device, the system comprising:
- a classifier and a clustering engine configured to execute on the computing device, the classifier configured to receive a stream of documents from at least one user using the user interface, each document including a topic of information relating to a customer support issue or sentiment, the classifier configured to classify each of the received documents using a plurality of trained classifiers, the classification based on a voting by the trained classifiers, each document labeled according to a similar topic, said clustering engine configured to cluster the plurality of documents into respective groups based on the determined topic using said clustering engine, the clustering engine configured to apply a word analysis;
  
  a trending module configured to generate a trend of a change of a number of documents classified by each classifier, said trending module configured to analyze the trend for anomalies in the trend related to an introduction of a new feature or product, the anomaly occurring within a predetermined time period after the introduction of the new product or feature; and
  
  a drift engine configured to execute on the computing device and configured to determine a drift of the topic of one or more of the classifications using the analyzed trend, the drift related to received documents that include information relating to an unclassified customer support issue or sentiment;
  
  if the determined drift exceeds a predetermined threshold range, at least one of rebuild the plurality of classifiers to include a second set of classifiers trained to recognize the unclassified customer support issue or sentiment and re-cluster the plurality of documents into an increased number of groups based on the determined topic using the clustering engine.
- Dependent claims (8, 9, 10, 11, 12, 13, 14):
  - 8. A system in accordance with claim 7, wherein at least one of said classifier and said clustering engine is configured to receive a plurality of documents containing text referring to a customer support issue.
  - 9. A system in accordance with claim 7, wherein at least one of said classifier and said clustering engine is configured to receive a plurality of documents including one or more of electronic mail documents, forum post documents, telephone call transcript documents and chat session record documents.
  - 10. A system in accordance with claim 7, wherein at least one of said classifier and said clustering engine is configured to determine the drift of the topic of one or more classifications using a manual review by a subject matter expert.
  - 11. A system in accordance with claim 7, wherein at least one of said classifier and said clustering engine is configured to determine the drift of the topic of one or more classifications using an automatic classifier training method.
  - 12. A system in accordance with claim 7, further comprising a trending engine configured to trend a frequency of occurrence of classifications to determine mislabeling of documents.
  - 13. A system in accordance with claim 7, wherein said clustering engine is configured to re-cluster the plurality of documents based on an expiration of a predetermined time period.
  - 14. A system in accordance with claim 7, wherein said clustering engine is configured to determine a drift of the determined topic of one or more groups using a separate clustering of at least one of the groups to determine a degree of non-similarity of the documents in the group, if the degree of non-similarity exceeds a predetermined non-similarity threshold said clustering engine is configured to re-cluster the plurality of documents.
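
Claim 14 triggers re-clustering when the documents in a group have become too dissimilar. The claim measures this with a separate clustering of the group; the sketch below swaps in a simpler proxy, the mean pairwise Jaccard distance between the word sets of the group's documents, purely to illustrate the threshold check. The function names and the `threshold` parameter are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Set


def word_set(text: str) -> Set[str]:
    """Lowercased, whitespace-separated words of a document."""
    return set(text.lower().split())


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 minus |intersection| / |union|; 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def group_non_similarity(documents: Sequence[str]) -> float:
    """Mean pairwise Jaccard distance within one group (0.0 for fewer than two documents)."""
    sets = [word_set(doc) for doc in documents]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


def should_recluster(documents: Sequence[str], threshold: float) -> bool:
    """True when the group's non-similarity exceeds the given threshold."""
    return group_non_similarity(documents) > threshold
```

A group whose documents share no words scores 1.0 and a group of identical documents scores 0.0, so the threshold sits somewhere in between and would need tuning on real data.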

15. One or more non-transitory computer-readable storage media having computer-executable instructions embodied thereon, wherein when executed by at least one processor, the computer-executable instructions cause the processor to:
- receive a stream of documents from at least one user;
  
  in near-real-time, determine a topic of each received document wherein the topic includes information relating to a customer support issue or sentiment;
  
  cluster the plurality of documents into respective groups based on the determined topic using a clustering engine, the clustering engine applying a word analysis;
  
  generate a trend of a change of a number of documents classified by each classifier;
  
  analyze the trend for anomalies in the trend, the anomalies related to an introduction of a new feature or product, the anomaly occurring within a predetermined time period after the introduction of the new product or feature; and
  
  determine a drift of the determined topic of one or more groups using the analyzed trend, the drift related to received documents that include information relating to an undetermined customer support issue or sentiment;
  
  if the determined drift exceeds a predetermined threshold range, increase a number of allowed groups;
  
  in a batch process, determine a topic of each received document wherein the topic includes information relating to a customer support issue or sentiment;
  
  re-cluster the plurality of documents into the increased number of groups based on the determined topic using the clustering engine; and
  
  output a frequency of occurrence of each topic, the frequency based on said clustering.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The computer-readable storage media of claim 15, wherein the computer-executable instructions further cause the processor to determine a drift of the determined topic of one or more groups using a manual review by a subject matter expert.
  - 17. The computer-readable storage media of claim 15, wherein the computer-executable instructions further cause the processor to determine a drift of the determined topic of one or more groups using a separate clustering of at least one of the groups to determine a degree of non-similarity of the documents in the group, if the degree of non-similarity exceeds a predetermined non-similarity threshold, re-clustering the plurality of documents.
  - 18. The computer-readable storage media of claim 15, wherein the computer-executable instructions further cause the processor to trend a frequency of occurrence of topics associated with each group to determine mislabeling of documents.
  - 19. The computer-readable storage media of claim 15, wherein the computer-executable instructions further cause the processor to re-cluster the plurality of documents based on an expiration of a predetermined time period.
  - 20. The computer-readable storage media of claim 15, wherein the word analysis is based on a similarity of words contained in each document.
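
The clustering steps of claims 15 and 20 (word-similarity clustering into a limited number of groups, then re-clustering after the allowed number is increased) can be sketched with a greedy one-pass scheme. The claims do not name a clustering algorithm, so the algorithm, the Jaccard similarity measure, and the names `cluster`, `group_frequencies`, `max_groups`, and `min_similarity` are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Sequence


def words(text: str) -> FrozenSet[str]:
    """Lowercased, whitespace-separated words of a document."""
    return frozenset(text.lower().split())


def similarity(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two word sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(documents: Sequence[str], max_groups: int,
            min_similarity: float = 0.2) -> List[List[str]]:
    """Greedy clustering: each group is represented by its first document."""
    if max_groups < 1:
        raise ValueError("max_groups must be at least 1")
    seeds: List[FrozenSet[str]] = []
    groups: List[List[str]] = []
    for doc in documents:
        doc_words = words(doc)
        scores = [similarity(doc_words, seed) for seed in seeds]
        best = max(range(len(seeds)), key=scores.__getitem__) if seeds else None
        if best is not None and scores[best] >= min_similarity:
            groups[best].append(doc)   # close enough to an existing group
        elif len(seeds) < max_groups:
            seeds.append(doc_words)    # open a new group for a new topic
            groups.append([doc])
        else:
            groups[best].append(doc)   # no room left: force into the nearest group
    return groups


def group_frequencies(groups: Sequence[Sequence[str]]) -> List[int]:
    """Frequency of occurrence of each group, as a document count."""
    return [len(group) for group in groups]
```

With only two allowed groups, documents about an unseen topic are forced into an existing group; raising `max_groups` to three and re-clustering the same documents gives the new topic a group of its own, which is the effect the "increase a number of allowed groups" step is after.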

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Lewis, Glenn M.; Buryak, Kirill; Ben-Artzi, Aner; Peng, Jun; Benbarak, Nadav
Primary Examiner(s): Chaki, Kakali
Assistant Examiner(s): PELLETT, DANIEL T
Application Number: US13/530,667
Time in Patent Office: 1,152 Days
Field of Search: None
US Class Current: 1/1
CPC Class Codes:
G06F 16/35: Clustering; Classification
G06N 5/04: Inference or reasoning models
G06Q 30/01: Customer relationship services
