Emotion detection device & method for use in distributed systems
Abstract
A prosody analyzer enhances the interpretation of natural language utterances. The analyzer is distributed over a client/server architecture, so that the scope of emotion recognition processing tasks can be allocated on a dynamic basis according to processing resources, channel conditions, client loads, etc. The partially processed prosodic data can be sent separately or combined with other speech data from the client device and streamed to a server for a real-time response. Training the prosody analyzer with real-world expected responses improves emotion modeling and the real-time identification of potential features such as emphasis, intent, attitude, and semantic meaning in the speaker's utterances.
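To make the client/server split concrete, the following is a minimal sketch of the client-side prosodic extraction the abstract describes: lightweight per-frame cues (short-time energy and a crude autocorrelation pitch estimate) computed locally before streaming. The function name and the particular feature choices are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def extract_prosodic_features(frame: np.ndarray, sample_rate: int = 8000) -> dict:
    """Per-frame prosodic cues: short-time energy and a crude pitch estimate.

    Assumes a frame of at least ~25 ms so the pitch search has room to work.
    """
    energy = float(np.mean(frame ** 2))
    # Autocorrelation pitch search restricted to a plausible 60-400 Hz band.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = sample_rate // 400, sample_rate // 60
    lag = lo + int(np.argmax(ac[lo:hi]))
    pitch_hz = sample_rate / lag if ac[lag] > 0 else 0.0
    return {"energy": energy, "pitch_hz": pitch_hz}
```

Computing only these low-rate scalars on the client keeps the uplink cheap while heavier modeling stays on the server, matching the dynamic task allocation the abstract mentions.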
Claims
1. In a method for performing real-time speech recognition distributed across a client device and a server device, and which transfers speech data from an utterance to be recognized using a packet stream of extracted acoustic feature data including at least some cepstral coefficients, the improvement comprising:
extracting prosodic features from the utterance to generate extracted prosodic data;
transferring said extracted prosodic data with said extracted acoustic feature data to the server device;
recognizing an emotion state of a speaker of the utterance based on at least said extracted prosodic data;
wherein operations associated with recognition of prosodic features in the utterance are also distributed across the client device and server device.
(Dependent claims 2-7 not shown.)
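A minimal sketch of the transfer step in claim 1, assuming a simple length-prefixed JSON packet that carries the cepstral coefficients and the prosodic data together. The wire format is a hypothetical choice; the claim only requires that both travel in the packet stream.

```python
import json
import struct

def build_feature_packet(cepstra: list[float], prosody: dict) -> bytes:
    """Pack one frame's acoustic and prosodic features for streaming."""
    payload = json.dumps({"cepstra": cepstra, "prosody": prosody}).encode("utf-8")
    return struct.pack("!I", len(payload)) + payload  # network-order length prefix

def parse_feature_packet(packet: bytes) -> dict:
    """Server-side inverse: recover the combined feature record."""
    (length,) = struct.unpack("!I", packet[:4])
    return json.loads(packet[4:4 + length].decode("utf-8"))
```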
8. A method for performing real-time emotion detection comprising:
extracting selected acoustic features of a speech utterance with a prosody analyzer;
extracting syntactic cues relating to an emotion state of a speaker of said speech utterance with a parts-of-speech analyzer;
classifying inputs from said prosody analyzer and said parts-of-speech analyzer and processing the same to output an emotion cue data value corresponding to said emotion state.
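A sketch of the classification step, fusing prosodic features with parts-of-speech cues into a single emotion cue value. The thresholds and the rule-based fusion are placeholders for whatever classifier the patent's emotion modeler actually uses; only the two-input fusion structure comes from the claim.

```python
def classify_emotion(prosody: dict, pos_cues: dict) -> str:
    """Fuse prosodic and syntactic inputs into one emotion cue data value."""
    score = 0.0
    score += 1.0 if prosody.get("pitch_hz", 0.0) > 220.0 else 0.0  # raised pitch
    score += 1.0 if prosody.get("energy", 0.0) > 0.1 else 0.0      # raised loudness
    score += 1.0 if pos_cues.get("exclamatory", False) else 0.0    # syntactic cue
    return "agitated" if score >= 2.0 else "neutral"
```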
9. A method for training a real-time emotion detector comprising:
presenting a series of questions to a first group of persons concerning a first topic;
wherein said questions are configured to elicit a plurality of distinct emotion states from said first group of persons;
recording a set of responses from said first group of persons to said series of questions;
annotating said set of responses to include a corresponding emotion state;
training an emotion modeler based on said set of responses and corresponding emotion state annotations;
wherein said emotion modeler is adapted to be used in an emotion detector distributed between a client device and a server device.
(Dependent claims 10-13 not shown.)
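Claim 9's training loop, sketched with a per-state centroid model over feature vectors extracted from the recorded, annotated responses. The centroid classifier is an assumption for illustration; the claim leaves the emotion modeler's form open.

```python
import numpy as np

def train_emotion_modeler(responses: list[np.ndarray], states: list[str]) -> dict:
    """Fit one mean feature vector (centroid) per annotated emotion state."""
    grouped: dict[str, list[np.ndarray]] = {}
    for features, state in zip(responses, states):
        grouped.setdefault(state, []).append(features)
    return {state: np.mean(vecs, axis=0) for state, vecs in grouped.items()}

def recognize_emotion(model: dict, features: np.ndarray) -> str:
    """Pick the emotion state whose centroid lies nearest the input features."""
    return min(model, key=lambda s: float(np.linalg.norm(model[s] - features)))
```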
14. A real-time emotion detector system comprising:
a prosody analyzer adapted to extract selected acoustic features of a speech utterance;
a parts-of-speech analyzer adapted to extract syntactic cues relating to an emotion state of a speaker of said speech utterance;
a classifier adapted to receive inputs from said prosody analyzer and said parts-of-speech analyzer and process the same to output an emotion cue data value corresponding to said emotion state.
(Dependent claims 15-18 not shown.)
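The system of claim 14 wired together as plain callables. Only the three-element structure (prosody analyzer and parts-of-speech analyzer feeding a classifier) comes from the claim; the types and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class EmotionDetector:
    prosody_analyzer: Callable[[Any], dict]   # utterance -> acoustic features
    pos_analyzer: Callable[[str], dict]       # transcript -> syntactic cues
    classifier: Callable[[dict, dict], str]   # fused inputs -> emotion cue value

    def detect(self, utterance: Any, transcript: str) -> str:
        return self.classifier(self.prosody_analyzer(utterance),
                               self.pos_analyzer(transcript))
```

With the earlier sketches, `EmotionDetector(extract_prosodic_features, some_pos_tagger, classify_emotion)` exercises the whole chain, where `some_pos_tagger` is any function producing the syntactic cue dict.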
19. In a system for performing real-time speech recognition which is distributed across a client device and a server device, and which transfers speech data from an utterance to be recognized using a packet stream of extracted acoustic feature data including at least some cepstral coefficients, the improvement comprising:
a first routine executing on the client device configured to extract prosodic features from the utterance and to generate extracted prosodic data;
a second routine executing on the client device configured to transfer said extracted prosodic data with said extracted acoustic feature data to the server device;
a third routine executing on the server device configured to recognize an emotion state of a speaker of the utterance based on at least said extracted prosodic data;
wherein operations associated with recognition of prosodic features in the utterance are also distributed across the client device and server device.
(Dependent claims 20-34 not shown.)
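Finally, the three routines of claim 19 tied together end to end, reusing the helper sketches above (extract_prosodic_features, build_feature_packet, parse_feature_packet, recognize_emotion). The direct function call stands in for the client-to-server transport, which in practice would be a socket or HTTP stream.

```python
import numpy as np

def client_routines(frame: np.ndarray, sample_rate: int = 8000) -> bytes:
    """First and second routines: extract prosody, then package for transfer."""
    prosody = extract_prosodic_features(frame, sample_rate)
    return build_feature_packet(cepstra=[], prosody=prosody)  # cepstra elided here

def server_routine(packet: bytes, model: dict) -> str:
    """Third routine: unpack the stream and recognize the emotion state."""
    record = parse_feature_packet(packet)
    feats = np.array([record["prosody"]["energy"], record["prosody"]["pitch_hz"]])
    return recognize_emotion(model, feats)
```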
Specification