Unsupervised automated topic detection, segmentation and labeling of conversations
First Claim
1. A method for information processing, comprising:
receiving in a computer a corpus of recorded conversations, with two or more speakers participating in each conversation;
computing, by the computer, respective frequencies of occurrence of multiple words in each of a plurality of chunks in each of the recorded conversations;
based on the frequencies of occurrence of the words over the conversations in the corpus, deriving autonomously by the computer an optimal set of topics to which the chunks can be assigned such that the optimal set maximizes a likelihood that the chunks will be generated by the topics in the set;
segmenting a recorded conversation from the corpus, using the derived topics into a plurality of segments, such that each segment is classified as belonging to a particular topic in the optimal set; and
outputting a distribution of the segments and respective classifications of the segments into the topics over a duration of the recorded conversation.
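The frequency-computation step of the claim can be illustrated with a minimal sketch. It assumes a transcript is already available as a list of (speaker, text) utterances and that a chunk is a fixed number of consecutive utterances. The claim specifies neither the chunking rule nor the tokenization, and the names `chunk_word_frequencies` and `utterances_per_chunk` are hypothetical.

```python
import re
from collections import Counter


def chunk_word_frequencies(utterances, utterances_per_chunk=2):
    """Return one Counter of word counts per chunk of a conversation.

    `utterances` is a list of (speaker, text) pairs. Grouping a fixed number
    of utterances into a chunk is an illustrative assumption, not a rule
    taken from the claim.
    """
    chunks = []
    for start in range(0, len(utterances), utterances_per_chunk):
        counts = Counter()
        for _speaker, text in utterances[start:start + utterances_per_chunk]:
            # Lowercase the text and keep alphabetic tokens (apostrophes allowed).
            counts.update(re.findall(r"[a-z']+", text.lower()))
        chunks.append(counts)
    return chunks


# Hypothetical three-utterance conversation between two speakers.
conversation = [
    ("agent", "Thanks for joining, let's talk pricing"),
    ("customer", "Sure, what is the pricing per seat"),
    ("agent", "Next, the security review"),
]
frequencies = chunk_word_frequencies(conversation)
```

Each element of `frequencies` holds the word counts for one chunk; a topic model would take these counts, gathered over the whole corpus, as its input.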
Abstract
A method for information processing includes receiving in a computer a corpus of recorded conversations, with two or more speakers participating in each conversation. Respective frequencies of occurrence of multiple words in each of a plurality of chunks in each of the recorded conversations are computed. Based on the frequencies of occurrence of the words over the conversations in the corpus, an optimal set of topics to which the chunks can be assigned is derived, such that the optimal set maximizes a likelihood that the chunks will be generated by the topics in the set. A recorded conversation from the corpus is segmented using the derived topics into a plurality of segments, such that each segment is classified as belonging to a particular topic in the optimal set.
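The abstract does not name a specific probabilistic model, so the following is only a sketch of what "likelihood that the chunks will be generated by the topics" could mean under one simple assumption: each topic is a unigram distribution over words, and a chunk is scored by the log-probability of its word counts. The topics here are fixed by hand purely for illustration; in the described method they are derived from the corpus. The names `chunk_log_likelihood`, `most_likely_topic`, and the `floor` smoothing value are hypothetical.

```python
import math


def chunk_log_likelihood(counts, topic_word_probs, floor=1e-6):
    """Log-likelihood of a chunk's word counts under one topic.

    Assumes a unigram model: words are drawn independently from the topic's
    word distribution. Words missing from the topic get probability `floor`.
    """
    return sum(n * math.log(topic_word_probs.get(word, floor))
               for word, n in counts.items())


def most_likely_topic(counts, topics):
    """Return the name of the topic under which the chunk is most probable."""
    return max(topics, key=lambda name: chunk_log_likelihood(counts, topics[name]))


# Hypothetical topics; in the described method they are learned, not hand-written.
topics = {
    "pricing": {"pricing": 0.5, "seat": 0.3, "the": 0.2},
    "security": {"security": 0.6, "review": 0.2, "the": 0.2},
}
label = most_likely_topic({"pricing": 2, "the": 1}, topics)
```

Deriving the topic set itself would mean searching for the word distributions that maximize the summed log-likelihood over all chunks in the corpus, which this sketch does not attempt.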
42 Claims
1. A method for information processing, comprising:
receiving in a computer a corpus of recorded conversations, with two or more speakers participating in each conversation;
computing, by the computer, respective frequencies of occurrence of multiple words in each of a plurality of chunks in each of the recorded conversations;
based on the frequencies of occurrence of the words over the conversations in the corpus, deriving autonomously by the computer an optimal set of topics to which the chunks can be assigned such that the optimal set maximizes a likelihood that the chunks will be generated by the topics in the set;
segmenting a recorded conversation from the corpus, using the derived topics into a plurality of segments, such that each segment is classified as belonging to a particular topic in the optimal set; and
outputting a distribution of the segments and respective classifications of the segments into the topics over a duration of the recorded conversation.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. An information processing system, comprising:
a memory, which is configured to store a corpus of recorded conversations, with two or more speakers participating in each conversation; and
a processor, which is configured to compute respective frequencies of occurrence of multiple words in each of a plurality of chunks in each of the recorded conversations, and to derive autonomously, based on the frequencies of occurrence of the words over the conversations in the corpus, an optimal set of topics to which the chunks can be assigned such that the optimal set maximizes a likelihood that any given chunk will be assigned to a single topic in the set, and to segment a recorded conversation from the corpus, using the derived topics into a plurality of segments, such that each segment is classified as belonging to a particular topic in the optimal set, and to output a distribution of the segments and respective classifications of the segments into the topics over a duration of the recorded conversation.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
29. A computer software product, comprising a non-transitory computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to store a corpus of recorded conversations, with two or more speakers participating in each conversation, to compute respective frequencies of occurrence of multiple words in each of a plurality of chunks in each of the recorded conversations, and to derive autonomously, based on the frequencies of occurrence of the words over the conversations in the corpus, an optimal set of topics to which the chunks can be assigned such that the optimal set maximizes a likelihood that any given chunk will be assigned to a single topic in the set, and to segment a recorded conversation from the corpus, using the derived topics into a plurality of segments, such that each segment is classified as belonging to a particular topic in the optimal set, and to output a distribution of the segments and respective classifications of the segments into the topics over a duration of the recorded conversation.
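The segmenting and outputting steps recited in the independent claims can be sketched as follows, assuming each chunk has already been given a topic label and that all chunks have the same duration. Both assumptions, along with the names `segment_by_topic`, `topic_time_share`, and `chunk_seconds`, are illustrative and are not taken from the claims.

```python
from itertools import groupby


def segment_by_topic(chunk_topics, chunk_seconds=30.0):
    """Merge runs of consecutive chunks sharing a topic into segments.

    Returns (topic, start_seconds, end_seconds) tuples in conversation order.
    A fixed chunk duration is an illustrative assumption.
    """
    segments = []
    position = 0
    for topic, run in groupby(chunk_topics):
        length = len(list(run))
        segments.append((topic,
                         position * chunk_seconds,
                         (position + length) * chunk_seconds))
        position += length
    return segments


def topic_time_share(segments):
    """Fraction of the conversation's duration spent on each topic."""
    total = segments[-1][2] - segments[0][1]
    share = {}
    for topic, start, end in segments:
        share[topic] = share.get(topic, 0.0) + (end - start) / total
    return share


# Hypothetical per-chunk labels for one conversation.
segments = segment_by_topic(["intro", "pricing", "pricing", "security"])
shares = topic_time_share(segments)
```

The `segments` list gives the position of each topic over the duration of the conversation, and `shares` summarizes how the duration is divided among topics.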
Specification