Emotion detecting method, emotion detecting apparatus, emotion detecting program that implements the same method, and storage medium that stores the same program
Abstract
An audio feature is extracted from audio signal data for each analysis frame and stored in a storage part. Then, the audio feature is read from the storage part, and an emotional state probability of the audio feature corresponding to an emotional state is calculated using one or more statistical models constructed based on previously input learning audio signal data. Then, based on the calculated emotional state probability, the emotional state of a section including the analysis frame is determined.
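The extractor described above produces a per-frame feature vector covering, among other things, the fundamental frequency and the power. A minimal sketch of that per-frame extraction, assuming 16 kHz audio and a naive autocorrelation pitch estimate (the function name, frame sizes, and pitch-search range are illustrative, not taken from the patent):

```python
import numpy as np

def extract_features(signal, sr=16000, frame_len=400, hop=160):
    """Per-frame audio feature vector: [F0 estimate, log power].

    Naive autocorrelation pitch tracker; a hypothetical stand-in for the
    claimed extractor (which also covers temporal variation characteristics
    of F0, power, and speech rate).
    """
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        power = np.log(np.mean(frame ** 2) + 1e-10)        # log frame power
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        lo, hi = sr // 400, sr // 50                       # search 50-400 Hz
        lag = lo + int(np.argmax(ac[lo:hi]))
        f0 = sr / lag                                      # F0 from best lag
        feats.append((f0, power))
    return np.array(feats)                                 # shape (n_frames, 2)
```

Sequences of temporal variation characteristics (also claimed) could then be obtained by differencing these per-frame values across frames.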
21 Claims
1. An emotion detecting method that performs an emotion detecting processing based on an audio feature of input audio signal data, comprising:
an audio feature extracting step of extracting, as an audio feature vector, one or more of a fundamental frequency, a sequence of a temporal variation characteristic of the fundamental frequency, a power, a sequence of a temporal variation characteristic of the power, and a temporal variation characteristic of a speech rate from the audio signal data for each analysis frame, and storing the audio feature vector in a storage part;

an audio feature appearance probability calculating step of reading the audio feature vector for each analysis frame and calculating the audio feature appearance probability that the audio feature vector appears on condition of sequences of predetermined emotional states corresponding to one or more types of emotions using a first statistical model constructed based on previously input learning audio signal data;

an emotional state transition probability calculating step of calculating the probability of temporal transition of sequences of the predetermined emotional states as the emotional state transition probability using a second statistical model;

an emotional state probability calculating step of calculating the emotional state probability based on the audio feature appearance probability and the emotional state transition probability; and

an information outputting step of outputting information about the emotional state for each section including one or more analysis frames based on the calculated emotional state probability.

(Dependent claims: 4, 5, 8, 9, 10, 11)
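The calculating steps of claim 1 amount to an HMM-style forward computation: a first model supplies the per-frame appearance (emission) probability of the feature vector given an emotional state, a second model supplies the transition probability between emotional states, and the two are combined into a per-frame emotional state probability. A hedged sketch under that reading (the patent does not fix the model family, and the normalized forward recursion here is one plausible combination rule):

```python
import numpy as np

def forward_state_probs(emission, transition, prior):
    """Combine per-frame emission probabilities with state transition
    probabilities into per-frame emotional state probabilities.

    emission:   (T, S) array, P(feature_t | state s) from the first model
    transition: (S, S) row-stochastic array from the second model
    prior:      (S,) initial state distribution (an assumption; the claim
                does not specify one)
    Returns a (T, S) array whose rows are normalized state probabilities.
    """
    T, S = emission.shape
    alpha = np.zeros((T, S))
    alpha[0] = prior * emission[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        # propagate through the transition model, weight by the emission
        alpha[t] = (alpha[t - 1] @ transition) * emission[t]
        alpha[t] /= alpha[t].sum()          # keep each row a distribution
    return alpha
```

The information outputting step could then report, for each section of frames, the states whose probability exceeds a threshold, or simply the most probable state per frame.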
2. An emotion detecting method that performs an emotion detecting processing based on an audio feature of input audio signal data, comprising:
an audio feature extracting step of extracting, as an audio feature vector, one or more of a fundamental frequency, a sequence of a temporal variation characteristic of the fundamental frequency, a power, a sequence of a temporal variation characteristic of the power, and a temporal variation characteristic of a speech rate from the audio signal data for each analysis frame, and storing the audio feature vector in a storage part;

an emotional state probability processing step of reading the audio feature vector for each analysis frame and calculating the emotional state probability on condition of the audio feature vector for sequences of predetermined emotional states corresponding to one or more types of emotions using one or more statistical models constructed based on previously input learning audio signal data;

an emotional state determining step of determining the emotional state of a section including the analysis frame based on the emotional state probability; and

a step of outputting information about the determined emotional state.

(Dependent claims: 3, 6, 7)
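Claim 2's determining step fixes a single emotional state for a section spanning one or more analysis frames. The claim leaves the aggregation rule open; one plausible choice, assumed here purely for illustration, is to average the frame-wise state probabilities over the section and take the most probable label:

```python
import numpy as np

def decide_section_emotion(frame_probs, labels):
    """frame_probs: (T, S) per-frame emotional state probabilities for one
    section; labels: the S emotional state names.

    Averages over the section's frames and returns the most probable label.
    The mean-then-argmax rule is an assumption, not taken from the claim.
    """
    return labels[int(np.mean(frame_probs, axis=0).argmax())]
```

For example, a section whose frames lean mostly toward one state would be labeled with that state even if a minority of frames disagree.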
12. An emotion detecting apparatus that performs an emotion detecting processing based on an audio feature of input audio signal data, comprising:
an audio feature extracting means for extracting, as an audio feature vector, one or more of a fundamental frequency, a sequence of a temporal variation characteristic of the fundamental frequency, a power, a sequence of a temporal variation characteristic of the power, and a temporal variation characteristic of a speech rate from the audio signal data for each analysis frame, and storing the audio feature vector in a storage part;

an audio feature appearance probability calculating means for reading the audio feature vector for each analysis frame and calculating the audio feature appearance probability that the audio feature vector appears on condition of sequences of predetermined emotional states corresponding to one or more types of emotions using a first statistical model constructed based on previously input learning audio signal data;

an emotional state transition probability calculating means for calculating the probability of temporal transition of sequences of the predetermined emotional states as the emotional state transition probability using a second statistical model;

an emotional state probability calculating means for calculating the emotional state probability based on the audio feature appearance probability and the emotional state transition probability; and

an information outputting means for outputting information about the emotional state for each section including one or more analysis frames based on the calculated emotional state probability.

(Dependent claims: 15, 16, 19, 20, 21)
13. An emotion detecting apparatus that performs an emotion detecting processing based on an audio feature of input audio signal data, comprising:
an audio feature extracting means for extracting, as an audio feature vector, one or more of a fundamental frequency, a sequence of a temporal variation characteristic of the fundamental frequency, a power, a sequence of a temporal variation characteristic of the power, and a temporal variation characteristic of a speech rate from the audio signal data for each analysis frame, and storing the audio feature vector in a storage part;

an emotional state probability processing means for reading the audio feature vector for each analysis frame and calculating the emotional state probability on condition of the audio feature vector for sequences of predetermined emotional states corresponding to one or more types of emotions using one or more statistical models constructed based on previously input learning audio signal data;

an emotional state determining means for determining the emotional state of a section including the analysis frame based on the emotional state probability; and

an information outputting means for outputting information about the determined emotional state.

(Dependent claims: 14, 17, 18)
Specification