Emotion Detection Device and Method for Use in Distributed Systems
3 Assignments
0 Petitions
Abstract
A prosody analyzer enhances the interpretation of natural language utterances. The analyzer is distributed over a client/server architecture, so that the scope of emotion recognition processing tasks can be allocated on a dynamic basis based on processing resources, channel conditions, client loads, etc. The partially processed prosodic data can be sent separately or combined with other speech data from the client device and streamed to a server for a real-time response. Training of the prosody analyzer with real world expected responses improves emotion modeling and the real-time identification of potential features such as emphasis, intent, attitude and semantic meaning in the speaker's utterances.
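The abstract describes client-side extraction of partially processed prosodic data, but the patent does not disclose specific feature formulas. As a minimal illustrative sketch, the client-side pass below uses per-frame RMS energy and zero-crossing rate as stand-ins for prosodic measurements (the function name `extract_prosodic_features` and the 16 kHz frame layout are assumptions, not disclosed parameters):

```python
import math

FRAME = 160  # 10 ms frames at an assumed 16 kHz sample rate

def extract_prosodic_features(samples, frame_len=FRAME):
    """Illustrative client-side pass: per-frame RMS energy and
    zero-crossing rate stand in for the prosodic features (pitch,
    stress, rhythm cues) the patent leaves unspecified."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # RMS energy: a crude loudness/stress cue
        energy = math.sqrt(sum(s * s for s in frame) / frame_len)
        # Zero-crossing rate: a crude voicing/pitch-activity cue
        zcr = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        ) / (frame_len - 1)
        features.append({"energy": energy, "zcr": zcr})
    return features

# A toy 20 ms fragment: a 100 Hz sinusoid at the assumed 16 kHz rate
samples = [math.sin(2 * math.pi * 100 * n / 16000) for n in range(320)]
feats = extract_prosodic_features(samples)
```

In the claimed architecture, a frame dictionary like this would be the "extracted prosodic data" streamed to the server alongside the cepstral acoustic features.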
454 Citations
40 Claims
1. In a method for performing real-time continuous speech recognition distributed across a client device and a server device, and which transfers speech data from an utterance to be recognized using a packet stream of extracted acoustic feature data including at least some cepstral coefficients, the improvement comprising:
extracting prosodic features at the client device from the utterance to generate extracted prosodic data;
transferring said extracted prosodic data with said extracted acoustic feature data to the server device;
recognizing words spoken continuously in said utterance at said server device based on said extracted acoustic feature data; and
recognizing an emotion state of a speaker of the utterance at said server device based on at least said extracted prosodic data;
wherein operations associated with recognition of prosodic features as well as words in the utterance are distributed across the client device and server device, such that a first set of prosodic recognition operations take place at the client device, and a second set of prosodic recognition operations take place at the server device to recognize said emotion state; and
wherein said prosodic data and acoustic feature data are transmitted using different priorities.
(Dependent claims: 2-10)
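Claim 1 requires only that the prosodic and acoustic-feature streams be transmitted "using different priorities"; it does not disclose a transport mechanism. One way to sketch such a sender, under the assumption that lower-numbered priorities drain first (the names `make_sender`, `PRIO_ACOUSTIC`, and `PRIO_PROSODIC` are hypothetical), is a priority queue:

```python
import heapq
from itertools import count

# Hypothetical priority levels; the claim only requires that the two
# streams use *different* priorities, not these specific values.
PRIO_ACOUSTIC, PRIO_PROSODIC = 0, 1  # lower number drains first

def make_sender():
    """Priority-queue packet sender: each enqueued packet carries a
    priority, and ties are broken by arrival order within a stream."""
    queue, seq = [], count()
    def enqueue(priority, payload):
        heapq.heappush(queue, (priority, next(seq), payload))
    def drain():
        while queue:
            yield heapq.heappop(queue)[2]
    return enqueue, drain

enqueue, drain = make_sender()
enqueue(PRIO_PROSODIC, "prosody-frame-0")
enqueue(PRIO_ACOUSTIC, "cepstra-frame-0")
enqueue(PRIO_ACOUSTIC, "cepstra-frame-1")
sent = list(drain())  # acoustic packets drain ahead of the prosodic one
```

In a real deployment the priorities would more likely map onto network QoS classes; the queue above only illustrates the ordering behavior the claim describes.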
11. A method for performing real-time emotion detection comprising:
generating representative speech data from a speech utterance;
recognizing words in said utterance;
extracting syntactic cues from said words relating to an emotion state of a speaker of said speech utterance;
extracting prosodic features from the utterance to generate extracted prosodic data, such that a first set of prosodic recognition operations take place at a client device, and a second set of prosodic recognition operations take place at a server device to recognize said emotion state; and
classifying inputs based on said prosodic features and a parts-of-speech analyzer relating to said speech utterance and processing the same along with said syntactic cues to output an emotion cue data value corresponding to said emotion state;
wherein said prosodic data and representative speech data are transmitted using different priorities.
(Dependent claims: 12-15)
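Claim 11's final step fuses prosodic features with syntactic cues from a parts-of-speech analyzer into a single emotion cue data value, but the patent leaves the classifier itself unspecified. The rule-based fusion below is purely illustrative (the function name, feature keys, cue labels, weights, and threshold are all assumptions):

```python
def classify_emotion(prosodic, syntactic_cues):
    """Toy classifier fusing prosodic statistics with syntactic cues
    (e.g. exclamatory structure, intensifier adverbs) into one
    emotion-cue value. A trained statistical classifier would
    typically replace this hand-weighted rule."""
    score = 0.0
    score += 2.0 * prosodic.get("pitch_range", 0.0)   # wide pitch swings
    score += 1.5 * prosodic.get("energy_mean", 0.0)   # overall loudness
    if "exclamation" in syntactic_cues:               # e.g. "Stop that!"
        score += 1.0
    if "intensifier_adverb" in syntactic_cues:        # e.g. "really", "so"
        score += 0.5
    return "aroused" if score > 2.0 else "neutral"

hot = classify_emotion({"pitch_range": 1.0, "energy_mean": 0.5},
                       ["exclamation"])
calm = classify_emotion({"pitch_range": 0.1, "energy_mean": 0.1}, [])
```

The returned label stands in for the claim's "emotion cue data value"; a production system would likely emit a vector of scores rather than a single label.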
16. A real-time emotion detector system, comprising:
a prosody analyzer adapted to extract selected acoustic features of a continuous speech utterance, said prosody analyzer being distributed such that a first set of prosody extraction operations take place at a client device, and a second set of prosody extraction operations take place at a server;
a continuous speech recognizer for identifying words presented in said continuous speech utterance;
a parts-of-speech analyzer adapted to process said words and extract syntactic cues relating to an emotion state of a speaker of said speech utterance; and
a classifier adapted to receive inputs from said prosody analyzer and said parts-of-speech analyzer and process the same to output an emotion cue data value corresponding to said emotion state;
wherein said prosodic data and acoustic feature data are transmitted using different priorities.
(Dependent claims: 17-20)
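Claim 16's prosody analyzer is split so that a first set of extraction operations runs on the client and a second set on the server. The patent does not specify where the split falls; one plausible reading, sketched below with assumed names and feature choices, is cheap per-frame measurement on the client and aggregate statistics on the server:

```python
# Client side (first set of operations): pass through cheap per-frame
# measurements; f0 = 0.0 marks an unvoiced frame in this sketch.
def client_extract(frames):
    return [{"f0": f0, "rms": rms} for f0, rms in frames]

# Server side (second set of operations): aggregate the statistics an
# emotion model would consume.
def server_aggregate(frame_feats):
    f0s = [f["f0"] for f in frame_feats if f["f0"] > 0]  # voiced frames only
    rms = [f["rms"] for f in frame_feats]
    return {
        "pitch_mean": sum(f0s) / len(f0s) if f0s else 0.0,
        "pitch_range": (max(f0s) - min(f0s)) if f0s else 0.0,
        "energy_mean": sum(rms) / len(rms) if rms else 0.0,
    }

# (f0 in Hz, RMS amplitude); the middle frame is unvoiced
frames = [(110.0, 0.2), (0.0, 0.05), (150.0, 0.4)]
stats = server_aggregate(client_extract(frames))
```

Splitting this way keeps the client's per-frame work light while the server, which also runs the speech recognizer, performs the utterance-level aggregation feeding the emotion classifier.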
21. In a system for performing real-time continuous speech recognition which is distributed across a client device and a server device, and which transfers speech data from an utterance to be recognized using a packet stream of extracted acoustic feature data including at least some cepstral coefficients, the improvement comprising:
a first routine executing on the client device configured to extract prosodic features from the utterance and to generate extracted prosodic data;
a second routine executing on the client device configured to transfer said extracted prosodic data with said extracted acoustic feature data to the server device;
a third routine executing on said server device and adapted to recognize words spoken continuously in said utterance based on said extracted acoustic feature data; and
a fourth routine executing on the server device configured to recognize an emotion state of a speaker of the utterance based on at least said extracted prosodic data;
wherein operations associated with recognition of prosodic features and words in the utterance are distributed across the client device and server device, such that a first set of operations take place at the client device, and a second set of operations take place at the server device to recognize said emotion state; and
wherein said prosodic data and acoustic feature data are transmitted using different priorities.
(Dependent claims: 22-34)
35. A real-time emotion detector system, comprising:
a prosody analyzer adapted to extract selected acoustic features of a continuous speech utterance, said prosody analyzer being distributed such that a first set of prosody extraction operations take place at a client device, and a second set of prosody extraction operations take place at a server;
a continuous speech recognizer for identifying words presented in said continuous speech utterance;
a parts-of-speech analyzer adapted to process said words and extract syntactic cues relating to an emotion state of a speaker of said speech utterance; and
a classifier adapted to receive inputs from said prosody analyzer and said parts-of-speech analyzer and process the same to output an emotion cue data value corresponding to said emotion state;
wherein said prosodic data and representative speech data are transmitted using different priorities.
(Dependent claims: 36-40)
Specification