Classification of transcripts by sentiment

US 10,432,789 B2
Filed: 02/09/2017
Issued: 10/01/2019
Est. Priority Date: 02/09/2017
Status: Active Grant


Abstract

A system and method for distinguishing the sentiment of utterances in a dialog is disclosed. The system utilizes a lexicon that is expanded from a seed using unsupervised machine learning. What results is a sentiment classifier that may be optimized for a variety of environments (e.g., conversation, chat, email, etc.), each of which may communicate sentiment differently.

19 Claims

1. A method for classifying a sentiment of a dialog transcript, the method comprising:
- training a lexicon, wherein the training comprises:
  - receiving a training set of dialog transcripts;
  - splitting the training set into a negative set and a non-negative set based on a seed;
  - identifying n-grams in the dialog transcripts;
  - computing, for each n-gram, a polarity score that corresponds to the likelihood of the n-gram having either a negative or a non-negative sentiment, wherein the computing the polarity score for a particular n-gram comprises comparing the frequency of the particular n-gram in the negative set to the frequency of the particular n-gram in the non-negative set;
  - identifying prominent n-grams based on each n-gram's polarity score;
  - expanding the lexicon by adding the prominent n-grams, which are not already in the lexicon, to the lexicon; and
  - repeating the splitting, computing, identifying, and expanding for a plurality of iterations to obtain a trained lexicon, wherein the splitting for each iteration uses the expanded lexicon from the previous iteration; and
- classifying the sentiment of the dialog transcript using the trained lexicon wherein the classifying comprises:
  - receiving a dialog transcript;
  - selecting an utterance in the dialog transcript;
  - identifying n-grams in the utterance;
  - obtaining a polarity score for each n-gram using the trained lexicon;
  - determining the utterance is negative or non-negative based, at least, on the polarity scores for each n-gram;
  - repeating the selecting, identifying, computing, and determining for other utterances in the dialog transcript; and
  - distinguishing the sentiment of the dialog transcript as negative or non-negative based on the negative or non-negative utterances determined in the dialog transcript.
- - 2. The method according to claim 1, wherein the dialog transcripts in the training set are all of a particular type.
  - 3. The method according to claim 2, further comprising:
    - selecting the seed based on the particular type.
  - 4. The method according to claim 3, wherein the particular type is a telephone call, a text chat, an email, or a website post.
  - 5. The method according to claim 3, wherein the seed comprises words, phrases, punctuations, or dialog characteristics that, for the particular type, are indicative of a negative or non-negative sentiment.
  - 6. The method according to claim 1, wherein the identifying n-grams in the dialog transcripts comprises:
    - for each dialog transcript:
      - selecting each word in the dialog transcript as a uni-gram;
      - selecting each group of two sequential words in the dialog transcript as a bi-gram;
      - selecting each group of three sequential words in the dialog transcript as a tri-gram; and
      - identifying n-grams in the dialog transcript as the uni-grams, bi-grams, and tri-grams.
  - 7. The method according to claim 1, wherein:
    - the polarity score has a sign that corresponds to the sentiment of the n-gram, and wherein
    - the polarity score has an amplitude that corresponds to the likelihood that the n-gram is indicative of the sentiment.
  - 8. The method according to claim 7, wherein the identifying prominent n-grams based on each n-gram's polarity score, comprises:
    - comparing the amplitude of each n-gram to a threshold, and
    - identifying prominent n-grams as n-grams having a polarity-score amplitude larger than the threshold.
  - 9. The method according to claim 1, wherein repeating the splitting, computing, identifying, and expanding for a plurality of iterations to obtain a trained lexicon, comprises:
    - repeating for three iterations or until an iteration results in no expansion of the lexicon.
  - 10. The method according to claim 1, wherein the identifying n-grams in the utterance comprises:
    - selecting each word in the utterance as a uni-gram;
    - selecting each group of two sequential words in the utterance as a bi-gram; and
    - selecting each group of three sequential words in the utterance as a tri-gram.
  - 11. The method according to claim 10, wherein determining the utterance is negative or non-negative based, at least, on the polarity scores for each n-gram, comprises:
    - computing a uni-gram sentiment score as the sum of the polarity scores of each uni-gram in the utterance;
    - computing a bi-gram sentiment score as the sum of the polarity scores of each bi-gram in the utterance; and
    - computing a tri-gram sentiment score as the sum of the polarity scores of each tri-gram in the utterance.
  - 12. The method according to claim 11, wherein the determining the utterance is negative or non-negative based, at least, on the polarity scores for each n-gram, further comprises:
    - comparing the uni-gram, bi-gram, and tri-gram sentiment scores.
  - 13. The method according to claim 12, wherein the determining the utterance is negative or non-negative based, at least, on the polarity scores for each n-gram, further comprises:
    - based on the comparison applying a heuristic to determine that the utterance is negative or non-negative.
  - 14. The method according to claim 13, wherein the applying a heuristic is:
    - searching for a prominent word indicating sentiment, or
    - determining the sentiment of other utterances in the dialog transcript.
  - 15. The method according to claim 1, wherein the distinguishing the sentiment of the dialog transcript as negative or non-negative based on the negative or non-negative utterances determined in the dialog transcript, comprises:
    - comparing the number of negative or non-negative utterances in the dialog transcript to a threshold.
  - 16. The method according to claim 1, wherein the distinguishing the sentiment of the dialog transcript as negative or non-negative based on the negative or non-negative utterances determined in the dialog transcript, comprises:
    - determining the positions of negative utterances in the dialog transcript.
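
The classification steps of claims 10 to 15 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the lowercase whitespace tokenizer, the rule that agreeing scores decide the utterance, the choice of the "prominent word" heuristic from claim 14, the direction of the threshold comparison, and all names (`sentiment_scores`, `utterance_is_negative`, `transcript_is_negative`, `prominent_negative`) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumed lexicon shape: n-gram -> signed polarity (negative sign = negative sentiment).
Lexicon = Dict[str, float]


def ngrams(words: List[str], n: int) -> List[str]:
    """Return every group of n sequential words, joined by single spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def sentiment_scores(utterance: str, lexicon: Lexicon) -> Tuple[float, ...]:
    """Sum polarity scores separately over uni-, bi-, and tri-grams (claim 11)."""
    words = utterance.lower().split()  # assumption: whitespace tokenization
    return tuple(
        sum(lexicon.get(gram, 0.0) for gram in ngrams(words, n)) for n in (1, 2, 3)
    )


def utterance_is_negative(
    utterance: str, lexicon: Lexicon, prominent_negative: Set[str]
) -> bool:
    """Compare the three scores (claim 12); fall back to a heuristic (claims 13-14)."""
    scores = sentiment_scores(utterance, lexicon)
    if all(score < 0 for score in scores):
        return True
    if all(score >= 0 for score in scores):
        return False
    # Scores disagree. Assumed heuristic: look for a prominent negative word.
    return bool(set(utterance.lower().split()) & prominent_negative)


def transcript_is_negative(
    utterances: List[str],
    lexicon: Lexicon,
    prominent_negative: Set[str],
    threshold: int = 2,
) -> bool:
    """Compare the number of negative utterances to a threshold (claim 15)."""
    negatives = sum(
        utterance_is_negative(u, lexicon, prominent_negative) for u in utterances
    )
    return negatives >= threshold  # assumption: "at least threshold" means negative


# Illustrative data only; these scores are made up, not taken from the patent.
example_lexicon = {
    "terrible": -2.0,
    "terrible service": -1.5,
    "is terrible service": -1.0,
    "thank you": 1.0,
    "cancel": -1.0,
}
dialog = [
    "this is terrible service",
    "thank you so much",
    "i want to cancel thank you",
]
print(transcript_is_negative(dialog, example_lexicon, {"cancel"}))
```

Keeping the three sums separate, rather than adding them into one score, is what allows the comparison step of claim 12 to detect utterances where the n-gram orders disagree.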

17. A call center transcription system comprising:
- a call recorder recording conversations between agents and customers in a call center;
- a transcriber receiving the recorded conversations and creating dialog transcripts corresponding to the conversations; and
- a sentiment classifier comprising a processor and a memory, wherein the processor executes software stored in the memory to:
  - receive a dialog transcript from the transcriber;
  - retrieve a trained lexicon from the memory, wherein the trained lexicon is the product of unsupervised machine learning, wherein the unsupervised machine learning comprises:
    - analyzing, using a seed, a set of unannotated dialog transcripts to identify negative words and phrases;
    - adding identified negative words and phrases to the seed to produce an expanded lexicon;
    - analyzing the set of unannotated dialog transcripts using the expanded lexicon to find new negative words and phrases;
    - adding the new negative words and phrases to the expanded lexicon to produce a further expanded lexicon; and
    - repeating the analyzing and adding to produce the machine-generated lexicon that is larger than the seed;
  - select an utterance from the dialog transcript;
  - compute, using the trained lexicon, polarities for each uni-gram, bi-gram, and tri-gram in the utterance;
  - calculate a uni-gram sentiment score, a bi-gram sentiment score, and a tri-gram sentiment score that are based on the sum of the polarity scores for each uni-gram, bi-gram, and tri-gram respectively;
  - determine that the utterance is negative or non-negative based on the uni-gram, bi-gram and tri-gram sentiment scores;
  - repeat selecting, computing, calculating, and determining for other utterances in the dialog transcript;
  - tag the dialog transcript as negative or non-negative based on the determined negative utterances; and
  - store the tagged dialog transcripts to the memory.
- - 18. The call center transcription system according to claim 17, wherein an utterance is determined negative when the uni-gram, bi-gram, and tri-gram sentiment scores are all negative.
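
The decision rule of claim 18 and the tagging step of claim 17 might look like the short Python sketch below. It assumes the three sentiment scores for each utterance have already been computed. The record layout, the `min_negative` parameter, and the function names are hypothetical; claim 17 only says the tag is "based on the determined negative utterances" and does not fix how they are aggregated.

```python
from typing import Dict, List, Tuple

# (uni-gram, bi-gram, tri-gram) sentiment scores for one utterance.
Scores = Tuple[float, float, float]


def utterance_negative(scores: Scores) -> bool:
    """Claim 18 rule: negative only when all three sentiment scores are negative."""
    return all(score < 0 for score in scores)


def tag_transcript(
    transcript_id: str, per_utterance: List[Scores], min_negative: int = 1
) -> Dict[str, object]:
    """Build a tagged record for storage.

    Assumption: a transcript is tagged negative when at least `min_negative`
    of its utterances are negative.
    """
    flags = [utterance_negative(scores) for scores in per_utterance]
    tag = "negative" if sum(flags) >= min_negative else "non-negative"
    return {
        "id": transcript_id,
        "tag": tag,
        # Indices of negative utterances, kept so their positions can be inspected.
        "negative_utterances": [i for i, flag in enumerate(flags) if flag],
    }


record = tag_transcript(
    "call-001",  # hypothetical identifier
    [(0.0, 1.0, 0.0), (-2.0, -1.5, -1.0), (-1.0, 1.0, 0.0)],
)
print(record)
```

Under this rule a single non-negative score vetoes the negative label, so the third utterance above, whose uni-gram score is negative but whose bi-gram score is positive, is not counted as negative.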

19. A non-transitory computer readable medium containing program instructions that upon execution by a computer processor cause the processor to:
- receive a training set of dialog transcripts;
- split the training set into a negative set and a non-negative set based on a seed;
- identify n-grams in the dialog transcripts;
- compute, for each n-gram, a polarity score that corresponds to the likelihood of the n-gram having a negative or a non-negative sentiment, wherein the computing the polarity score for a particular n-gram comprises comparing the frequency of the particular n-gram in the negative set to the frequency of the particular n-gram in the non-negative set;
- identify prominent n-grams based on each n-gram's polarity score;
- expand the lexicon by adding the prominent n-grams, which are not already in the lexicon, to the lexicon; and
- repeat the splitting, computing, identifying, and expanding for a plurality of iterations to obtain a trained lexicon, wherein the splitting for each iteration uses the expanded lexicon from the previous iteration.
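
The iterative lexicon training recited above can be sketched as follows. The claims only require that n-gram frequencies in the negative and non-negative sets be compared. The smoothed log-ratio used here, the use of per-transcript (document) frequency, the rule that a transcript is negative if it contains any negative lexicon entry, and every name and default value are assumptions made for illustration. The three-iteration cap and the early stop follow claim 9.

```python
import math
from typing import Dict, List, Set


def ngrams_up_to_three(text: str) -> Set[str]:
    """Collect the uni-, bi-, and tri-grams of a transcript (claim 6)."""
    words = text.lower().split()  # assumption: whitespace tokenization
    return {
        " ".join(words[i:i + n])
        for n in (1, 2, 3)
        for i in range(len(words) - n + 1)
    }


def train_lexicon(
    transcripts: List[str],
    seed: Dict[str, float],
    threshold: float = 1.0,
    max_iterations: int = 3,
) -> Dict[str, float]:
    """Expand a seed lexicon (n-gram -> signed polarity) without labeled data."""
    lexicon = dict(seed)
    grams = [ngrams_up_to_three(t) for t in transcripts]
    vocabulary = set().union(*grams)

    for _ in range(max_iterations):
        # Split: assumed rule, negative if any negative lexicon entry occurs.
        negative_terms = {g for g, polarity in lexicon.items() if polarity < 0}
        neg = [g for g in grams if g & negative_terms]
        non = [g for g in grams if not (g & negative_terms)]

        added = 0
        for gram in vocabulary - set(lexicon):
            # Smoothed fraction of transcripts in each set containing the n-gram.
            f_neg = (sum(gram in g for g in neg) + 1) / (len(neg) + 2)
            f_non = (sum(gram in g for g in non) + 1) / (len(non) + 2)
            # Sign gives the sentiment, amplitude the confidence (claim 7).
            polarity = math.log(f_non / f_neg)
            if abs(polarity) > threshold:  # prominent n-gram (claim 8)
                lexicon[gram] = polarity
                added += 1
        if added == 0:  # no expansion this iteration (claim 9)
            break
    return lexicon


# Illustrative data only; not taken from the patent.
calls = [
    "this is terrible service",
    "terrible service i want a refund",
    "i want a refund now",
    "thank you so much",
    "great thank you",
    "thank you for the help",
]
trained = train_lexicon(calls, {"terrible": -1.0})
print(sorted(trained))
```

With so little data, n-grams that occur in a single negative transcript also cross the threshold, which is why a real corpus and a tuned threshold matter far more than this toy example suggests.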

Current Assignee: Verint Systems Incorporated
Original Assignee: Verint Systems Limited (Verint Systems Incorporated)
Inventors: Winter, Yaron; Carmi, Saar
Primary Examiner(s): Leland, III, Edwin S
Application Number: US15/428,599
Publication Number: US 20180226071A1
Time in Patent Office: 964 Days
Field of Search: 704235

CPC Class Codes

G06F 40/284   Lexical analysis, e.g. toke...
G06F 40/30   Semantic analysis
G06F 40/35   Discourse or dialogue repre...
G10L 15/063   Training
G10L 15/197   Probabilistic grammars, e.g...
H04M 2201/40   using speech recognition sp...
H04M 2203/2061   Language aspects
H04M 3/5183   Call or contact centers wit...
