Classification of transcripts by sentiment
First Claim
1. A method for classifying a sentiment of a dialog transcript, the method comprising:
- training a lexicon, wherein the training comprises:
receiving a training set of dialog transcripts;
splitting the training set into a negative set and a non-negative set based on a seed;
identifying n-grams in the dialog transcripts;
computing, for each n-gram, a polarity score that corresponds to the likelihood of the n-gram having either a negative or a non-negative sentiment, wherein the computing the polarity score for a particular n-gram comprises comparing the frequency of the particular n-gram in the negative set to the frequency of the particular n-gram in the non-negative set;
identifying prominent n-grams based on each n-gram's polarity score;
expanding the lexicon by adding the prominent n-grams, which are not already in the lexicon, to the lexicon; and
repeating the splitting, computing, identifying, and expanding for a plurality of iterations to obtain a trained lexicon, wherein the splitting for each iteration uses the expanded lexicon from the previous iteration; and
classifying the sentiment of the dialog transcript using the trained lexicon, wherein the classifying comprises:
receiving a dialog transcript;
selecting an utterance in the dialog transcript;
identifying n-grams in the utterance;
obtaining a polarity score for each n-gram using the trained lexicon;
determining the utterance is negative or non-negative based, at least, on the polarity scores for each n-gram;
repeating the selecting, identifying, computing, and determining for other utterances in the dialog transcript; and
distinguishing the sentiment of the dialog transcript as negative or non-negative based on the negative or non-negative utterances determined in the dialog transcript.
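The classifying steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the whitespace tokenizer, the sign convention (a positive polarity indicates negative sentiment), the sum-and-threshold rule for an utterance, and the hypothetical `min_negative_utterances` rule for the transcript are all choices made here for illustration. The claim itself only requires that the decisions be based on the n-gram polarity scores and on the utterances found to be negative or non-negative.

```python
from typing import Dict, List, Tuple

NGram = Tuple[str, ...]


def ngrams(utterance: str, max_n: int = 3) -> List[NGram]:
    """Return all 1..max_n-grams of a whitespace-tokenized utterance."""
    tokens = utterance.lower().split()
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def utterance_is_negative(
    utterance: str, lexicon: Dict[NGram, float], threshold: float = 0.0
) -> bool:
    """Assumed rule: sum the polarity of every n-gram; unknown n-grams score 0."""
    score = sum(lexicon.get(gram, 0.0) for gram in ngrams(utterance))
    return score > threshold


def classify_transcript(
    utterances: List[str],
    lexicon: Dict[NGram, float],
    min_negative_utterances: int = 1,  # hypothetical transcript-level rule
) -> str:
    """Label the transcript from the number of utterances judged negative."""
    negatives = sum(utterance_is_negative(u, lexicon) for u in utterances)
    return "negative" if negatives >= min_negative_utterances else "non-negative"


# Toy lexicon and transcript, invented for illustration only.
toy_lexicon = {("terrible",): 1.5, ("not", "happy"): 2.0, ("thank", "you"): -1.0}
transcript = ["thank you for calling", "i am not happy with this terrible service"]
print(classify_transcript(transcript, toy_lexicon))  # prints "negative"
```

Treating a single negative utterance as sufficient to flag the whole transcript is only one possible policy; a proportion-based rule would fit the claim language equally well.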
Abstract
A system and method for distinguishing the sentiment of utterances in a dialog is disclosed. The system utilizes a lexicon that is expanded from a seed using unsupervised machine learning. What results is a sentiment classifier that may be optimized for a variety of environments (e.g., conversation, chat, email, etc.), each of which may communicate sentiment differently.
39 Citations
19 Claims
1. A method for classifying a sentiment of a dialog transcript (recited in full under First Claim above).
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16.
17. A call center transcription system comprising:
a call recorder recording conversations between agents and customers in a call center;
a transcriber receiving the recorded conversations and creating dialog transcripts corresponding to the conversations; and
a sentiment classifier comprising a processor and a memory, wherein the processor executes software stored in the memory to:
receive a dialog transcript from the transcriber;
retrieve a trained lexicon from the memory, wherein the trained lexicon is the product of unsupervised machine learning, wherein the unsupervised machine learning comprises:
analyzing, using a seed, a set of unannotated dialog transcripts to identify negative words and phrases;
adding identified negative words and phrases to the seed to produce an expanded lexicon;
analyzing the set of unannotated dialog transcripts using the expanded lexicon to find new negative words and phrases;
adding the new negative words and phrases to the expanded lexicon to produce a further expanded lexicon; and
repeating the analyzing and adding to produce the machine-generated lexicon that is larger than the seed;
select an utterance from the dialog transcript;
compute, using the trained lexicon, polarities for each uni-gram, bi-gram, and tri-gram in the utterance;
calculate a uni-gram sentiment score, a bi-gram sentiment score, and a tri-gram sentiment score that are based on the sum of the polarity scores for each uni-gram, bi-gram, and tri-gram respectively;
determine that the utterance is negative or non-negative based on the uni-gram, bi-gram and tri-gram sentiment scores;
repeat selecting, computing, calculating, and determining for other utterances in the dialog transcript;
tag the dialog transcript as negative or non-negative based on the determined negative utterances; and
store the tagged dialog transcripts to the memory.
Dependent claims: 18.
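The per-order scoring recited in claim 17 (separate uni-gram, bi-gram, and tri-gram sentiment scores, each a sum of polarity scores) might look like the sketch below. The claim does not say how the three scores are combined into one decision. The rule used here (flag the utterance if any score exceeds its threshold) and the threshold values are assumptions, as are the function names and the toy lexicon.

```python
from typing import Dict, List, Optional, Tuple

NGram = Tuple[str, ...]


def order_scores(tokens: List[str], lexicon: Dict[NGram, float]) -> Dict[int, float]:
    """Sum lexicon polarities separately for uni-grams (1), bi-grams (2), tri-grams (3)."""
    scores: Dict[int, float] = {}
    for n in (1, 2, 3):
        grams = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        scores[n] = sum(lexicon.get(gram, 0.0) for gram in grams)
    return scores


def is_negative(
    scores: Dict[int, float], thresholds: Optional[Dict[int, float]] = None
) -> bool:
    """Assumed combination rule: negative if any per-order score exceeds its threshold."""
    thresholds = thresholds or {1: 1.0, 2: 1.0, 3: 1.0}  # hypothetical values
    return any(scores[n] > thresholds[n] for n in (1, 2, 3))


# Toy lexicon invented for illustration; positive polarity means negative sentiment.
toy_lexicon = {
    ("not",): 0.4,
    ("not", "acceptable"): 1.8,
    ("acceptable", "at", "all"): 0.5,
}
tokens = "this is not acceptable at all".split()
scores = order_scores(tokens, toy_lexicon)
print(scores, is_negative(scores))
```

Keeping the three scores separate lets a strong bi-gram such as "not acceptable" decide the outcome even when its individual words carry little polarity on their own.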
19. A non-transitory computer readable medium containing program instructions that upon execution by a computer processor cause the processor to:
receive a training set of dialog transcripts;
split the training set into a negative set and a non-negative set based on a seed;
identify n-grams in the dialog transcripts;
compute, for each n-gram, a polarity score that corresponds to the likelihood of the n-gram having a negative or a non-negative sentiment, wherein the computing the polarity score for a particular n-gram comprises comparing the frequency of the particular n-gram in the negative set to the frequency of the particular n-gram in the non-negative set;
identify prominent n-grams based on each n-gram's polarity score;
expand the lexicon by adding the prominent n-grams, which are not already in the lexicon, to the lexicon; and
repeat the splitting, computing, identifying, and expanding for a plurality of iterations to obtain a trained lexicon, wherein the splitting for each iteration uses the expanded lexicon from the previous iteration.
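The training loop recited in claim 19 can be illustrated with a short Python sketch. Several details are assumptions made only to get something runnable, because the claim leaves them open: a transcript is placed in the negative set when it contains any lexicon n-gram, the frequency comparison is an add-one smoothed log ratio of relative frequencies, an n-gram is "prominent" when it clears the hypothetical `min_polarity` and `min_count` thresholds, and the lexicon stores negative n-grams only.

```python
import math
from collections import Counter
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(text: str, max_n: int = 3) -> List[NGram]:
    """Return all 1..max_n-grams of a whitespace-tokenized transcript."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def polarity(gram: NGram, neg: Counter, non: Counter,
             neg_total: int, non_total: int) -> float:
    """Assumed score: add-one smoothed log ratio of relative frequencies."""
    p_neg = (neg[gram] + 1) / (neg_total + 1)
    p_non = (non[gram] + 1) / (non_total + 1)
    return math.log(p_neg / p_non)


def train_lexicon(transcripts: List[str], seed: Iterable[NGram],
                  iterations: int = 3, min_polarity: float = 0.5,
                  min_count: int = 2) -> Set[NGram]:
    lexicon = set(seed)
    grams_per_doc = [ngrams(t) for t in transcripts]
    for _ in range(iterations):
        # Split: a transcript containing any lexicon n-gram counts as negative.
        neg, non = Counter(), Counter()
        for grams in grams_per_doc:
            (neg if lexicon.intersection(grams) else non).update(grams)
        neg_total, non_total = sum(neg.values()), sum(non.values())
        # Compute polarity scores and keep the prominent n-grams.
        prominent = {
            g for g, count in neg.items()
            if count >= min_count
            and polarity(g, neg, non, neg_total, non_total) >= min_polarity
        }
        # Expand: set union adds only n-grams not already in the lexicon.
        lexicon |= prominent
    return lexicon


# Toy corpus and seed, invented for illustration only.
corpus = [
    "this is ridiculous cancel my account",
    "ridiculous service cancel my account",
    "please cancel my account",
    "thank you for your help",
    "thank you so much",
    "have a nice day",
]
trained = train_lexicon(corpus, seed={("ridiculous",)})
print(sorted(trained))
```

On this toy corpus the seed "ridiculous" pulls in "cancel my account" and its sub-n-grams during the first iteration, so the third transcript moves into the negative set on the next split, which is the bootstrapping effect the repeated splitting is meant to produce.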
Specification