Emotion detection device & method for use in distributed systems
Abstract
A prosody analyzer enhances the interpretation of natural language utterances. The analyzer is distributed over a client/server architecture, so that the scope of emotion recognition processing tasks can be allocated on a dynamic basis according to processing resources, channel conditions, client loads, etc. The partially processed prosodic data can be sent separately or combined with other speech data from the client device and streamed to a server for a real-time response. Training the prosody analyzer with real-world expected responses improves emotion modeling and the real-time identification of potential features such as emphasis, intent, attitude, and semantic meaning in the speaker's utterances.
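To make the client/server split concrete, the following is a minimal sketch of the client-side prosodic extraction the abstract describes: lightweight per-frame cues (short-time energy and a crude autocorrelation pitch estimate) computed locally before streaming. The function name and the particular feature choices are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def extract_prosodic_features(frame: np.ndarray, sample_rate: int = 8000) -> dict:
    """Per-frame prosodic cues: short-time energy and a crude pitch estimate.

    Assumes a frame of at least ~25 ms so the pitch search has room to work.
    """
    energy = float(np.mean(frame ** 2))
    # Autocorrelation pitch search restricted to a plausible 60-400 Hz band.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = sample_rate // 400, sample_rate // 60
    lag = lo + int(np.argmax(ac[lo:hi]))
    pitch_hz = sample_rate / lag if ac[lag] > 0 else 0.0
    return {"energy": energy, "pitch_hz": pitch_hz}
```

Computing only these low-rate scalars on the client keeps the uplink cheap while heavier modeling stays on the server, matching the dynamic task allocation the abstract mentions.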
Claims
1. In a method for performing real-time speech recognition distributed across a client device and a server device, and which transfers speech data from an utterance to be recognized using a packet stream of extracted acoustic feature data including at least some cepstral coefficients, the improvement comprising:
extracting prosodic features from the utterance to generate extracted prosodic data;
transferring said extracted prosodic data with said extracted acoustic feature data to the server device;
recognizing an emotion state of a speaker of the utterance based on at least said extracted prosodic data;
wherein operations associated with recognition of prosodic features in the utterance are also distributed across the client device and server device.
(Dependent claims 2-7 not shown.)
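A minimal sketch of the transfer step in claim 1, assuming a simple length-prefixed JSON packet that carries the cepstral coefficients and the prosodic data together. The wire format is a hypothetical choice; the claim only requires that both travel in the packet stream.

```python
import json
import struct

def build_feature_packet(cepstra: list[float], prosody: dict) -> bytes:
    """Pack one frame's acoustic and prosodic features for streaming."""
    payload = json.dumps({"cepstra": cepstra, "prosody": prosody}).encode("utf-8")
    return struct.pack("!I", len(payload)) + payload  # network-order length prefix

def parse_feature_packet(packet: bytes) -> dict:
    """Server-side inverse: recover the combined feature record."""
    (length,) = struct.unpack("!I", packet[:4])
    return json.loads(packet[4:4 + length].decode("utf-8"))
```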
8. A method for performing real-time emotion detection comprising:
extracting selected acoustic features of a speech utterance with a prosody analyzer;
extracting syntactic cues relating to an emotion state of a speaker of said speech utterance with a parts-of-speech analyzer;
classifying inputs from said prosody analyzer and said parts-of-speech analyzer and processing the same to output an emotion cue data value corresponding to said emotion state.
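A sketch of the classification step, fusing prosodic features with parts-of-speech cues into a single emotion cue value. The thresholds and the rule-based fusion are placeholders for whatever classifier the patent's emotion modeler actually uses; only the two-input fusion structure comes from the claim.

```python
def classify_emotion(prosody: dict, pos_cues: dict) -> str:
    """Fuse prosodic and syntactic inputs into one emotion cue data value."""
    score = 0.0
    score += 1.0 if prosody.get("pitch_hz", 0.0) > 220.0 else 0.0  # raised pitch
    score += 1.0 if prosody.get("energy", 0.0) > 0.1 else 0.0      # raised loudness
    score += 1.0 if pos_cues.get("exclamatory", False) else 0.0    # syntactic cue
    return "agitated" if score >= 2.0 else "neutral"
```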
9. A method for training a real-time emotion detector comprising:
presenting a series of questions to a first group of persons concerning a first topic;
wherein said questions are configured to elicit a plurality of distinct emotion states from said first group of persons;
recording a set of responses from said first group of persons to said series of questions;
annotating said set of responses to include a corresponding emotion state;
training an emotion modeler based on said set of responses and corresponding emotion state annotations;
wherein said emotion modeler is adapted to be used in an emotion detector distributed between a client device and a server device.
(Dependent claims 10-13 not shown.)
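Claim 9's training loop, sketched with a per-state centroid model over feature vectors extracted from the recorded, annotated responses. The centroid classifier is an assumption for illustration; the claim leaves the emotion modeler's form open.

```python
import numpy as np

def train_emotion_modeler(responses: list[np.ndarray], states: list[str]) -> dict:
    """Fit one mean feature vector (centroid) per annotated emotion state."""
    grouped: dict[str, list[np.ndarray]] = {}
    for features, state in zip(responses, states):
        grouped.setdefault(state, []).append(features)
    return {state: np.mean(vecs, axis=0) for state, vecs in grouped.items()}

def recognize_emotion(model: dict, features: np.ndarray) -> str:
    """Pick the emotion state whose centroid lies nearest the input features."""
    return min(model, key=lambda s: float(np.linalg.norm(model[s] - features)))
```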
14. A real-time emotion detector system comprising:
a prosody analyzer adapted to extract selected acoustic features of a speech utterance;
a parts-of-speech analyzer adapted to extract syntactic cues relating to an emotion state of a speaker of said speech utterance;
a classifier adapted to receive inputs from said prosody analyzer and said parts-of-speech analyzer and process the same to output an emotion cue data value corresponding to said emotion state.
(Dependent claims 15-18 not shown.)
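The system of claim 14 wired together as plain callables. Only the three-element structure (prosody analyzer and parts-of-speech analyzer feeding a classifier) comes from the claim; the types and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class EmotionDetector:
    prosody_analyzer: Callable[[Any], dict]   # utterance -> acoustic features
    pos_analyzer: Callable[[str], dict]       # transcript -> syntactic cues
    classifier: Callable[[dict, dict], str]   # fused inputs -> emotion cue value

    def detect(self, utterance: Any, transcript: str) -> str:
        return self.classifier(self.prosody_analyzer(utterance),
                               self.pos_analyzer(transcript))
```

With the earlier sketches, `EmotionDetector(extract_prosodic_features, some_pos_tagger, classify_emotion)` exercises the whole chain, where `some_pos_tagger` is any function producing the syntactic cue dict.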
19. In a system for performing real-time speech recognition which is distributed across a client device and a server device, and which transfers speech data from an utterance to be recognized using a packet stream of extracted acoustic feature data including at least some cepstral coefficients, the improvement comprising:
a first routine executing on the client device configured to extract prosodic features from the utterance and to generate extracted prosodic data;
a second routine executing on the client device configured to transfer said extracted prosodic data with said extracted acoustic feature data to the server device;
a third routine executing on the server device configured to recognize an emotion state of a speaker of the utterance based on at least said extracted prosodic data;
wherein operations associated with recognition of prosodic features in the utterance are also distributed across the client device and server device.
(Dependent claims 20-34 not shown.)
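Finally, the three routines of claim 19 tied together end to end, reusing the helper sketches above (extract_prosodic_features, build_feature_packet, parse_feature_packet, recognize_emotion). The direct function call stands in for the client-to-server transport, which in practice would be a socket or HTTP stream.

```python
import numpy as np

def client_routines(frame: np.ndarray, sample_rate: int = 8000) -> bytes:
    """First and second routines: extract prosody, then package for transfer."""
    prosody = extract_prosodic_features(frame, sample_rate)
    return build_feature_packet(cepstra=[], prosody=prosody)  # cepstra elided here

def server_routine(packet: bytes, model: dict) -> str:
    """Third routine: unpack the stream and recognize the emotion state."""
    record = parse_feature_packet(packet)
    feats = np.array([record["prosody"]["energy"], record["prosody"]["pitch_hz"]])
    return recognize_emotion(model, feats)
```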
Specification