SYSTEM FOR AUTOMATIC EXTRACTION OF STRUCTURE FROM SPOKEN CONVERSATION USING LEXICAL AND ACOUSTIC FEATURES
First Claim
1. A computer-implemented method for extracting structure from a spoken conversation, comprising:
obtaining, by a computer system comprising a set of processors, a voice record of the spoken conversation;
classifying the voice record into at least three sequential utterances spoken by two different speakers;
extracting a lexical feature from a respective utterance in the voice record using an automatic speech recognition (ASR) method;
extracting a non-verbal acoustic feature from a respective utterance in the voice record; and
determining, via a machine learning method and based on the extracted lexical and acoustic features, a coarse-level conversational structure of the spoken conversation comprising at least a first coarse-level conversational activity associated with two sequential utterances spoken by the two different speakers, and a second coarse-level conversational activity associated with a third utterance.
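The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not name a particular model or feature set, so the nearest-centroid classifier, the `VOCAB` word list, the `pause_before` feature, and the activity labels are all assumptions chosen for illustration. ASR output is assumed to be available already as a token list.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Utterance:
    speaker: str         # e.g. "agent" or "customer"
    words: List[str]     # lexical feature source: ASR tokens (assumed precomputed)
    pause_before: float  # non-verbal acoustic feature: preceding silence in seconds


# Hypothetical vocabulary for a bag-of-words lexical feature.
VOCAB = ["hello", "thanks", "problem", "password", "bye"]


def featurize(u: Utterance) -> List[float]:
    """Concatenate lexical word counts with the acoustic feature."""
    return [float(u.words.count(w)) for w in VOCAB] + [u.pause_before]


def train_centroids(examples: List[Tuple[Utterance, str]]) -> Dict[str, List[float]]:
    """Stand-in machine learning method: one mean feature vector per label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for u, label in examples:
        vec = featurize(u)
        prev = sums.get(label, [0.0] * len(vec))
        sums[label] = [a + b for a, b in zip(prev, vec)]
        counts[label] += 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}


def classify(u: Utterance, centroids: Dict[str, List[float]]) -> str:
    """Assign the label whose centroid is closest in Euclidean distance."""
    vec = featurize(u)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


def coarse_structure(utterances, centroids):
    """Merge consecutive utterances with the same label into one activity."""
    structure = []
    for i, u in enumerate(utterances):
        label = classify(u, centroids)
        if structure and structure[-1][0] == label:
            structure[-1][1].append(i)
        else:
            structure.append((label, [i]))
    return structure


# Toy labeled data (invented for this sketch).
centroids = train_centroids([
    (Utterance("agent", ["hello"], 0.0), "opening"),
    (Utterance("customer", ["hello"], 0.3), "opening"),
    (Utterance("customer", ["problem", "password"], 1.0), "problem_statement"),
    (Utterance("agent", ["thanks", "bye"], 0.2), "closing"),
])

# Three sequential utterances spoken by two different speakers.
conversation = [
    Utterance("agent", ["hello"], 0.0),
    Utterance("customer", ["hello"], 0.4),
    Utterance("customer", ["password", "problem"], 1.2),
]
structure = coarse_structure(conversation, centroids)
```

With this toy data, the first two utterances (one per speaker) fall into a single activity and the third utterance forms a second activity, which mirrors the shape of the structure recited in the claim.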
Abstract
Embodiments of the present invention provide a system for automatically extracting conversational structure from a voice record based on lexical and acoustic features. The system also aggregates business-relevant statistics and entities from a collection of spoken conversations. The system may infer a coarse-level conversational structure based on fine-level activities identified from extracted acoustic features. The system improves significantly over previous systems by extracting structure based on lexical and acoustic features. This enables extracting conversational structure on a larger scale and finer level of detail than previous systems, and can feed an analytics and business intelligence platform, e.g. for customer service phone calls. During operation, the system obtains a voice record. The system then extracts a lexical feature using automatic speech recognition (ASR). The system extracts an acoustic feature. The system then determines, via machine learning and based on the extracted lexical and acoustic features, a coarse-level structure of the conversation.
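As a rough illustration of inferring a coarse-level structure from fine-level activities and then aggregating statistics over a collection of conversations, consider the Python sketch below. The abstract does not enumerate activity labels, so the `FINE_TO_COARSE` mapping and every label name in it are hypothetical. The fine-level labels are assumed to have been produced by an earlier step.

```python
from collections import Counter
from itertools import groupby

# Hypothetical mapping from fine-level to coarse-level activities.
FINE_TO_COARSE = {
    "greeting": "opening",
    "identity_check": "opening",
    "question": "problem_resolution",
    "answer": "problem_resolution",
    "farewell": "closing",
}


def to_coarse(fine_labels):
    """Return (coarse_label, start, end) segments over utterance indices.

    Consecutive utterances that map to the same coarse label are merged;
    `end` is exclusive. Unknown fine labels map to "other".
    """
    coarse = [FINE_TO_COARSE.get(f, "other") for f in fine_labels]
    segments = []
    start = 0
    for label, run in groupby(coarse):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


def utterances_per_activity(conversations):
    """Aggregate, across conversations, how many utterances each activity spans."""
    totals = Counter()
    for fine_labels in conversations:
        for label, start, end in to_coarse(fine_labels):
            totals[label] += end - start
    return totals


segments = to_coarse(
    ["greeting", "identity_check", "question", "answer", "question", "farewell"]
)
totals = utterances_per_activity(
    [["greeting", "question", "answer"], ["greeting", "farewell"]]
)
```

Here `segments` holds one tuple per coarse-level activity, and `totals` is the kind of per-activity count that an analytics platform could consume.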
20 Claims
1. A computer-implemented method for extracting structure from a spoken conversation, comprising:
obtaining, by a computer system comprising a set of processors, a voice record of the spoken conversation;
classifying the voice record into at least three sequential utterances spoken by two different speakers;
extracting a lexical feature from a respective utterance in the voice record using an automatic speech recognition (ASR) method;
extracting a non-verbal acoustic feature from a respective utterance in the voice record; and
determining, via a machine learning method and based on the extracted lexical and acoustic features, a coarse-level conversational structure of the spoken conversation comprising at least a first coarse-level conversational activity associated with two sequential utterances spoken by the two different speakers, and a second coarse-level conversational activity associated with a third utterance.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for extracting structure from a spoken conversation, the method comprising:
obtaining a voice record of the spoken conversation;
classifying the voice record into at least three sequential utterances spoken by two different speakers;
extracting a lexical feature from a respective utterance in the voice record using an automatic speech recognition (ASR) method;
extracting a non-verbal acoustic feature from a respective utterance in the voice record; and
determining, via a machine learning method and based on the extracted lexical and acoustic features, a coarse-level conversational structure of the spoken conversation comprising at least a first coarse-level conversational activity associated with two sequential utterances spoken by the two different speakers, and a second coarse-level conversational activity associated with a third utterance.
Dependent claims: 11, 12, 13, 14, 15
16. A computing system for extracting structure from a spoken conversation, the system comprising:
a set of processors; and
a non-transitory computer-readable medium coupled to the set of processors storing instructions thereon that, when executed by the processors, cause the processors to perform a method for extracting structure from a spoken conversation, the method comprising:
obtaining a voice record of the spoken conversation;
extracting a lexical feature from the voice record using an automatic speech recognition (ASR) method;
extracting an acoustic feature from the voice record; and
determining, via a machine learning method and based on the extracted lexical feature and acoustic feature, a coarse-level conversational structure of the spoken conversation.
Dependent claims: 17, 18, 19, 20
Specification