AUDIO ANALYSIS SYSTEM, AUDIO ANALYSIS APPARATUS, AUDIO ANALYSIS TERMINAL

US 20130080169A1
Filed: 02/10/2012
Published: 03/28/2013
Est. Priority Date: 09/27/2011
Status: Active Grant

2 Assignments

0 Petitions

Abstract

An audio analysis system includes a terminal apparatus and a host system. The terminal apparatus acquires an audio signal of a sound containing utterances of a user and another person, discriminates between portions of the audio signal corresponding to the utterances of the user and the other person, detects an utterance feature based on the portion corresponding to the utterance of the user or the other person, and transmits utterance information including the discrimination and detection results to the host system. The host system detects a part corresponding to a conversation from the received utterance information, detects portions of the part of the utterance information corresponding to the user and the other person, compares a combination of plural utterance features corresponding to the portions of the part of the utterance information of the user and the other person with relation information to estimate an emotion, and outputs estimation information.
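The host-side conversation detection described in the abstract can be illustrated with a minimal sketch. It makes two assumptions:

- The utterance information has a hypothetical record layout: start time, end time, a self/other flag from the discriminator, and one feature value.
- A simple gap rule decides grouping. Utterances separated by no more than `max_gap` seconds belong to the same part, and a part counts as a conversation only if both the wearer and another person speak in it.

The names `Utterance`, `detect_conversations` and `split_by_speaker` are illustrative assumptions, as is the 3-second gap. None of them is the patent's specified method.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Utterance:
    """One entry of utterance information sent by a terminal (hypothetical layout)."""
    start: float    # start time in seconds
    end: float      # end time in seconds
    is_self: bool   # discrimination result: True = wearer, False = other person
    feature: float  # utterance feature value, e.g. a sound-pressure deviation


def detect_conversations(utts: List[Utterance], max_gap: float = 3.0) -> List[List[Utterance]]:
    """Group utterances into candidate conversation parts by silence gaps.

    A group is kept only if it contains speech from both the wearer and another person.
    """
    groups: List[List[Utterance]] = []
    for u in sorted(utts, key=lambda x: x.start):
        # Join the current group if the silence since its latest end is short enough.
        if groups and u.start - max(x.end for x in groups[-1]) <= max_gap:
            groups[-1].append(u)
        else:
            groups.append([u])
    return [g for g in groups
            if any(x.is_self for x in g) and any(not x.is_self for x in g)]


def split_by_speaker(part: List[Utterance]) -> Tuple[List[Utterance], List[Utterance]]:
    """Return the wearer's portions and the other person's portions of one part."""
    return [u for u in part if u.is_self], [u for u in part if not u.is_self]


utts = [
    Utterance(0.0, 2.0, True, 0.4),
    Utterance(2.5, 4.0, False, 0.1),
    Utterance(4.2, 6.0, True, 0.6),
    Utterance(20.0, 21.0, True, 0.2),  # isolated: nobody else speaks nearby
]
parts = detect_conversations(utts)
print(len(parts), [len(p) for p in parts])  # 1 [3]
wearer, other = split_by_speaker(parts[0])
print(len(wearer), len(other))  # 2 1
```

A fixed gap rule is only one possible segmentation criterion. The abstract does not say how the host decides where a conversation begins and ends.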

64 Citations

25 Claims

1. An audio analysis system comprising:
- a terminal apparatus that is to be worn by a user; and
  
  a host system that acquires information from the terminal apparatus,
  
  wherein the terminal apparatus includes
  
  a first audio acquisition device that acquires a sound and converts the sound into a first audio signal, the sound containing an utterance of the user and an utterance of another person who is different from the user,
  
  a discriminator that discriminates between a portion that corresponds to the utterance of the user and a portion that corresponds to the utterance of the other person which are contained in the first audio signal,
  
  an utterance feature detector that detects an utterance feature of the user or the other person, on the basis of the portion that corresponds to the utterance of the user or the portion that corresponds to the utterance of the other person, and
  
  a transmission unit that transmits to the host system utterance information that contains at least a discrimination result obtained by the discriminator and a detection result obtained by the utterance feature detector, and
  
  wherein the host system includes
  
  a reception unit that receives the utterance information that has been transmitted from the transmission unit,
  
  a conversation information detector that detects a part corresponding to a first conversation between the user and the other person from the utterance information that has been received by the reception unit, and detects portions of the part of the utterance information that correspond to the user and the other person who are related to the first conversation,
  
  a relation information holding unit that holds relation information on a relation between a predetermined emotion name and a combination of a plurality of the utterance features of a plurality of speakers who participated in a past conversation,
  
  an emotion estimator that compares, with the relation information, a combination of a plurality of the utterance features that correspond to the portions of the part of the utterance information of the user and the other person who are related to the first conversation, and estimates an emotion of at least one of the user and the other person, and
  
  an output unit that outputs information that is based on an estimation result obtained by the emotion estimator.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The audio analysis system according to claim 1, wherein the terminal apparatus further includes a second audio acquisition device disposed at a position where a sound pressure of an utterance-based sound that arrives from the mouth of the user differs from a sound pressure of the utterance-based sound that arrives at the first audio acquisition device, the second audio acquisition device acquiring the sound and converting the sound into a second audio signal, wherein the discriminator discriminates between a portion that corresponds to the utterance of the user and a portion that corresponds to the utterance of the other person which are contained in the first audio signal, on the basis of a result of comparing the first audio signal with the second audio signal, and wherein the utterance feature detector detects an utterance feature of the user or the other person, on the basis of the portion that corresponds to the utterance of the user or the portion that corresponds to the utterance of the other person which is contained in the first audio signal or the second audio signal.
  - 3. The audio analysis system according to claim 2, wherein the shortest distance between the mouth of the user and the first audio acquisition device differs from the shortest distance between the mouth of the user and the second audio acquisition device in a state where the terminal apparatus is worn by the user.
  - 4. The audio analysis system according to claim 2, wherein the terminal apparatus includes a main body, and a strap that is to be connected to the main body and hung around the neck of the user, and wherein in a state where the strap is hung around the neck of the user, the first audio acquisition device is located in the main body or at part of the strap that is separate from the mouth of the user by approximately 30 to 40 centimeters, and the second audio acquisition device is located at part of the strap that is separate from the mouth of the user by approximately 10 to 20 centimeters.
  - 5. The audio analysis system according to claim 1, wherein the emotion estimator of the host system determines a probability that corresponds to an index representing an emotion of the user or the other person who is related to the first conversation, and estimates the index representing the emotion on the basis of the probability.
  - 6. The audio analysis system according to claim 5, wherein the output unit of the host system outputs information that is based on the probability of the index representing the emotion that has been estimated by the emotion estimator.
  - 7. The audio analysis system according to claim 1, wherein the utterance feature detector of the terminal apparatus detects the utterance feature on the basis of a feature value of the sound that has been acquired by the first audio acquisition device, the feature value being sound pressure or pitch.
  - 8. The audio analysis system according to claim 2, wherein the utterance feature detector of the terminal apparatus detects the utterance feature on the basis of a feature value of the sound that has been acquired by at least one of the first audio acquisition device and the second audio acquisition device, the feature value being sound pressure or pitch.
  - 9. The audio analysis system according to claim 1, wherein the utterance feature detector of the terminal apparatus detects the utterance feature on the basis of a difference between a feature value of the audio signal of the sound that has been acquired by the first audio acquisition device and an average of predetermined feature values of a plurality of audio signals of sounds that were acquired by the first audio acquisition device during a predetermined past period.
  - 10. The audio analysis system according to claim 2, wherein the utterance feature detector of the terminal apparatus detects the utterance feature on the basis of a difference between a feature value of the audio signal of the sound that has been acquired by at least one of the first audio acquisition device and the second audio acquisition device and an average of predetermined feature values of a plurality of audio signals of sounds that were acquired by at least one of the first audio acquisition device and the second audio acquisition device during a predetermined past period.
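Claims 2 to 4 discriminate the wearer's speech from another person's by comparing the signals of two microphones at different distances from the wearer's mouth. A minimal sketch of that comparison follows. It makes three assumptions:

- Sound level falls off as 1/r in free field.
- Mean absolute amplitude serves as the sound-pressure measure.
- A ratio threshold of 1.4 separates the two cases. This value is hypothetical.

The claims themselves specify only that the discrimination is based on comparing the first and second audio signals.

Under the 1/r assumption, the expected ratios work out as follows:

- Wearer's voice: with the near microphone about 15 cm from the mouth and the far one about 35 cm away, the level ratio is roughly 35/15 ≈ 2.3.
- Another speaker 1.5 m away: that voice reaches both microphones at nearly the same level.

```python
import math
from typing import List, Sequence


def mean_abs(samples: Sequence[float]) -> float:
    """Average absolute amplitude of a frame, used here as a sound-pressure proxy."""
    return sum(abs(s) for s in samples) / len(samples) if samples else 0.0


def is_wearer_speaking(near: Sequence[float], far: Sequence[float],
                       ratio_threshold: float = 1.4) -> bool:
    """Classify one frame as the wearer's own voice (True) or another person's (False).

    `near` is the microphone close to the mouth (the second audio acquisition
    device, about 10 to 20 cm away); `far` is the first one (about 30 to 40 cm).
    The threshold is a hypothetical tuning value.
    """
    far_level = mean_abs(far)
    if far_level == 0.0:
        return False  # no usable signal on the reference microphone
    return mean_abs(near) / far_level >= ratio_threshold


def tone(amplitude: float, n: int = 800, freq: float = 200.0, rate: int = 8000) -> List[float]:
    """Synthetic test frame: a sine wave with the given peak amplitude."""
    return [amplitude * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


# Wearer: mouth about 15 cm from the near mic and 35 cm from the far mic (1/r model).
print(is_wearer_speaking(tone(1 / 0.15), tone(1 / 0.35)))  # True
# Another person about 1.5 m away: both mics see nearly the same level.
print(is_wearer_speaking(tone(1 / 1.50), tone(1 / 1.55)))  # False
```

The threshold must sit between the ratio the wearer can produce in the worst mounting case (20 cm against 30 cm gives 1.5) and the near-unity ratio produced by distant talkers. A practical design would probably also smooth the decision over several frames.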

11. An audio analysis system comprising:
- a first terminal apparatus that is to be worn by a first user;
  
  a second terminal apparatus that is to be worn by a second user; and
  
  a host system that acquires information from the first terminal apparatus and the second terminal apparatus,
  
  wherein the first terminal apparatus includes
  
  a first audio acquisition device that acquires a sound and converts the sound into a first audio signal, the sound containing an utterance of the first user and an utterance of another person who is different from the first user,
  
  a first discriminator that discriminates between a portion that corresponds to the utterance of the first user and a portion that corresponds to the utterance of the other person which are contained in the first audio signal,
  
  a first utterance feature detector that detects a first utterance feature of the first user, on the basis of the portion that corresponds to the utterance of the first user or the portion that corresponds to the utterance of the other person which is contained in the first audio signal, and
  
  a first transmission unit that transmits to the host system first utterance information that contains at least a discrimination result obtained by the first discriminator and a detection result regarding the first utterance feature obtained by the first utterance feature detector,
  
  wherein the second terminal apparatus includes
  
  a second audio acquisition device that acquires a sound and converts the sound into a second audio signal,
  
  a second discriminator that discriminates between a portion that corresponds to an utterance of the second user and a portion that corresponds to an utterance of another person who is different from the second user, the portions being contained in the second audio signal,
  
  a second utterance feature detector that detects a second utterance feature of the second user, on the basis of the portion that corresponds to the utterance of the second user or the portion that corresponds to the utterance of the other person which is contained in the second audio signal, and
  
  a second transmission unit that transmits to the host system second utterance information that contains at least a discrimination result obtained by the second discriminator and a detection result regarding the second utterance feature obtained by the second utterance feature detector, and
  
  wherein the host system includes
  
  a reception unit that receives the first utterance information and the second utterance information that have been transmitted from the first and second transmission units, respectively,
  
  a conversation information detector that detects a first part corresponding to a first conversation between the first user and the other person who is different from the first user from the first utterance information that has been received by the reception unit, and detects portions of the first part of the first utterance information that correspond to the first user and the other person who are related to the first conversation, and that detects a second part corresponding to a second conversation between the second user and the other person who is different from the second user from the second utterance information that has been received by the reception unit, and detects portions of the second part of the second utterance information that correspond to the second user and the other person who are related to the second conversation,
  
  wherein the conversation information detector determines whether or not the first conversation and the second conversation are the same conversation between the first user and the second user on the basis of a comparison of the portions of the first part of the first utterance information that correspond to the first user and the other person who is different from the first user with the portions of the second part of the second utterance information that correspond to the second user and the other person who is different from the second user,
  
  a relation information holding unit that holds relation information on a relation between a predetermined emotion name and a combination of a plurality of utterance features of a plurality of speakers who participated in a past conversation,
  
  an emotion estimator that compares, with the relation information, a combination of the first and second utterance features related to the conversation between the first user and the second user, and estimates an emotion of at least one of the first user and the second user, and
  
  an output unit that outputs information that is based on an estimation result obtained by the emotion estimator.
- Dependent Claims (12, 13, 14)
  - 12. The audio analysis system according to claim 11, wherein the first terminal apparatus further includes a third audio acquisition device disposed at a position where a sound pressure of an utterance-based sound that arrives from the mouth of a user differs from a sound pressure of the utterance-based sound that arrives at the first audio acquisition device, the third audio acquisition device acquiring the sound and converting the sound into a third audio signal, wherein the first discriminator discriminates between a portion that corresponds to an utterance of the user and a portion that corresponds to an utterance of another person who is different from the user, the portions being contained in the first audio signal, on the basis of a result of comparing the first audio signal with the third audio signal, and wherein the first utterance feature detector detects an utterance feature of the user or the other person, on the basis of the portion that corresponds to the utterance of the user or the portion that corresponds to the utterance of the other person which is contained in the first audio signal or the third audio signal.
  - 13. The audio analysis system according to claim 12, wherein the shortest distance between the mouth of the user and the first audio acquisition device differs from the shortest distance between the mouth of the user and the third audio acquisition device in a state where the first terminal apparatus is worn by the user.
  - 14. The audio analysis system according to claim 12, wherein the first terminal apparatus includes a main body, and a strap that is to be connected to the main body and hung around the neck of the user, and wherein in a state where the strap is hung around the neck of the user, one of the first and third audio acquisition devices is located in the main body or at part of the strap that is separate from the mouth of the user by approximately 30 to 40 centimeters, and the other of the first and third audio acquisition devices is located at part of the strap that is separate from the mouth of the user by approximately 10 to 20 centimeters.
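Claim 11 decides whether the conversations seen by two terminals are the same one by comparing their speaker timelines. One way to sketch that comparison relies on mirroring. If user A and user B are talking to each other, A's own-voice segments should line up with the "other person" segments on B's terminal, and vice versa. The sketch assumes the two terminals share a synchronized clock. The segment layout, the 0.1 s frame step, the 0.8 agreement threshold and the function names are hypothetical.

```python
from typing import List, Tuple

# (start s, end s, is_self): hypothetical layout of one discriminated segment
Segment = Tuple[float, float, bool]


def to_frames(segments: List[Segment], duration: float, step: float = 0.1):
    """Rasterize segments into per-frame 'self active' and 'other active' flags."""
    n = int(round(duration / step))
    self_on, other_on = [False] * n, [False] * n
    for start, end, is_self in segments:
        target = self_on if is_self else other_on
        for i in range(int(round(start / step)), min(n, int(round(end / step)))):
            target[i] = True
    return self_on, other_on


def same_conversation(segs_a: List[Segment], segs_b: List[Segment], duration: float,
                      step: float = 0.1, min_agreement: float = 0.8) -> bool:
    """True if A's self/other timeline mirrors B's other/self timeline closely enough."""
    a_self, a_other = to_frames(segs_a, duration, step)
    b_self, b_other = to_frames(segs_b, duration, step)
    # Only frames where at least one terminal hears speech are compared.
    active = [i for i in range(len(a_self))
              if a_self[i] or a_other[i] or b_self[i] or b_other[i]]
    if not active:
        return False
    agree = sum(1 for i in active if a_self[i] == b_other[i] and a_other[i] == b_self[i])
    return agree / len(active) >= min_agreement


a = [(0.0, 2.0, True), (2.5, 4.0, False), (4.2, 6.0, True)]
b = [(0.0, 2.0, False), (2.5, 4.0, True), (4.2, 6.0, False)]  # mirror image of a
c = [(0.0, 2.0, True), (2.5, 4.0, False), (4.2, 6.0, True)]   # talks at the same times as a
print(same_conversation(a, b, 6.0))  # True
print(same_conversation(a, c, 6.0))  # False
```

Comparing only active frames keeps long shared silences from inflating the agreement score.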

15. An audio analysis apparatus comprising:
- an acquisition unit that acquires information on an utterance feature which is detected on the basis of an audio signal of a sound containing an utterance of a speaker;
  
  a relation information holding unit that holds relation information on a relation between a predetermined emotion name and a plurality of utterance features corresponding to a plurality of parts of utterance information of the speaker;
  
  an emotion estimator that compares, with the relation information, a plurality of utterance features of the speaker related to a specific conversation, and estimates an emotion of the speaker; and
  
  an output unit that outputs information that is based on an estimation result obtained by the emotion estimator.
- Dependent Claims (16, 17)
  - 16. The audio analysis apparatus according to claim 15, wherein the emotion estimator determines a probability that corresponds to an index representing the emotion of the speaker related to the specific conversation, and estimates the index representing the emotion on the basis of the probability.
  - 17. The audio analysis apparatus according to claim 16, wherein the output unit outputs information that is based on the probability of the index representing the emotion that has been estimated by the emotion estimator.
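Claims 15 to 17, like claims 1, 5 and 6, estimate an emotion by comparing a combination of utterance features with relation information drawn from past conversations. They may also output a probability for the estimated emotion index. The sketch below rests on these assumptions:

- Each feature is a deviation from the speaker's baseline, quantized into "low", "normal" or "high".
- The relation information is a table of emotion-name counts per quantized combination.
- The probability of an emotion is its relative frequency in that table.

The class name, the band width and the emotion labels are illustrative, not taken from the patent.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, ...]


def level(deviation: float, band: float = 0.5) -> str:
    """Quantize a feature deviation from the speaker's baseline into a coarse level."""
    if deviation > band:
        return "high"
    if deviation < -band:
        return "low"
    return "normal"


class RelationInfo:
    """Hypothetical relation table: feature-level combination -> emotion-name counts."""

    def __init__(self) -> None:
        self.table: Dict[Key, Counter] = defaultdict(Counter)

    def add_past_conversation(self, deviations: List[float], emotion: str) -> None:
        """Record one labelled past conversation (feature order must stay fixed)."""
        self.table[tuple(level(d) for d in deviations)][emotion] += 1

    def estimate(self, deviations: List[float]) -> Tuple[str, float, Dict[str, float]]:
        """Return the most probable emotion, its probability, and the full distribution."""
        key = tuple(level(d) for d in deviations)
        counts = self.table.get(key)  # .get does not create an empty entry
        if not counts:
            return "unknown", 0.0, {}
        total = sum(counts.values())
        probs = {name: c / total for name, c in counts.items()}
        best = max(probs, key=probs.get)
        return best, probs[best], probs


rel = RelationInfo()
# Feature order here: (speaker 1 deviation, speaker 2 deviation).
rel.add_past_conversation([1.0, 0.9], "excited")
rel.add_past_conversation([0.8, 0.7], "excited")
rel.add_past_conversation([0.9, 1.2], "angry")
rel.add_past_conversation([-0.8, -0.6], "calm")
emotion, p, dist = rel.estimate([0.7, 0.6])
print(emotion, round(p, 2))  # excited 0.67
```

Returning the whole distribution alongside the top label corresponds to the output of claims 6 and 17, which is based on the probability of the estimated index.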

18. An audio analysis terminal comprising:
- a first audio acquisition device that acquires a sound and converts the sound into a first audio signal, the sound containing an utterance of a user and an utterance of another person who is different from the user;
  
  a discriminator that discriminates between a portion that corresponds to the utterance of the user and a portion that corresponds to the utterance of the other person which are contained in the first audio signal;
  
  an utterance feature detector that detects an utterance feature of the user or the other person, on the basis of the portion that corresponds to the utterance of the user or the portion that corresponds to the utterance of the other person; and
  
  a transmission unit that transmits to a host system utterance information that contains at least a discrimination result obtained by the discriminator and a detection result obtained by the utterance feature detector.
- Dependent Claims (19, 20, 21, 22, 23, 24, 25)
  - 19. The audio analysis terminal according to claim 18, further comprising a second audio acquisition device disposed at a position where a sound pressure of an utterance-based sound that arrives from the mouth of the user differs from a sound pressure of the utterance-based sound that arrives at the first audio acquisition device, the second audio acquisition device acquiring the sound and converting the sound into a second audio signal, wherein the discriminator discriminates between a portion that corresponds to the utterance of the user and a portion that corresponds to the utterance of the other person which are contained in the first audio signal, on the basis of a result of comparing the first audio signal with the second audio signal, and wherein the utterance feature detector detects an utterance feature of the user or the other person, on the basis of the portion that corresponds to the utterance of the user or the portion that corresponds to the utterance of the other person which is contained in the first audio signal or the second audio signal.
  - 20. The audio analysis terminal according to claim 19, wherein the shortest distance between the mouth of the user and the first audio acquisition device differs from the shortest distance between the mouth of the user and the second audio acquisition device in a state where the audio analysis terminal is worn by the user.
  - 21. The audio analysis terminal according to claim 19, further comprising:
    - a main body; and
      
      a strap that is to be connected to the main body and hung around the neck of the user, and wherein in a state where the strap is hung around the neck of the user, the first audio acquisition device is located in the main body or at part of the strap that is separate from the mouth of the user by approximately 30 to 40 centimeters, and the second audio acquisition device is located at part of the strap that is separate from the mouth of the user by approximately 10 to 20 centimeters.
  - 22. The audio analysis terminal according to claim 18, wherein the utterance feature detector detects the utterance feature on the basis of a feature value of the sound that has been acquired by the first audio acquisition device, the feature value being sound pressure or pitch.
  - 23. The audio analysis terminal according to claim 19, wherein the utterance feature detector detects the utterance feature on the basis of a predetermined feature value of the sound that has been acquired by at least one of the first audio acquisition device and the second audio acquisition device, the feature value being sound pressure or pitch.
  - 24. The audio analysis terminal according to claim 18, wherein the utterance feature detector detects the utterance feature on the basis of a difference between a feature value of the audio signal of the sound that has been acquired by the first audio acquisition device and an average of predetermined feature values of a plurality of audio signals of sounds that were acquired by the first audio acquisition device during a predetermined past period.
  - 25. The audio analysis terminal according to claim 19, wherein the utterance feature detector detects the utterance feature on the basis of a difference between a feature value of the audio signal of the sound that has been acquired by at least one of the first audio acquisition device and the second audio acquisition device and an average of predetermined feature values of a plurality of audio signals of sounds that were acquired by at least one of the first audio acquisition device and the second audio acquisition device during a predetermined past period.
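Claims 22 to 25 base the utterance feature on sound pressure or pitch. The feature may also be expressed as the difference from the average over a predetermined past period. A rough sketch of both steps follows:

- An RMS level in dB stands in digitally for sound pressure.
- A plain autocorrelation search gives the pitch estimate.
- A fixed-length history supplies the mean that serves as the baseline.

The 70 to 400 Hz search range, the frame-count window and all names are assumptions. The claims leave both the period and the measurement method open.

```python
import math
from collections import deque
from typing import Deque, Optional, Sequence


def sound_pressure_db(frame: Sequence[float], ref: float = 1.0) -> float:
    """RMS level of a frame in dB relative to `ref` (a digital proxy for sound pressure)."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-12) / ref)


def pitch_hz(frame: Sequence[float], rate: int, fmin: float = 70.0, fmax: float = 400.0) -> float:
    """Rough fundamental-frequency estimate from the highest autocorrelation lag."""
    lo, hi = int(rate / fmax), int(rate / fmin)
    best_lag, best_val = lo, float("-inf")
    for lag in range(lo, min(hi, len(frame) - 1) + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return rate / best_lag


class BaselineDeviation:
    """Current feature value minus the mean of the last `window` values.

    The window is counted in analysis frames here (hypothetical); the claims
    speak only of "a predetermined past period".
    """

    def __init__(self, window: int = 100) -> None:
        self.history: Deque[float] = deque(maxlen=window)

    def update(self, value: float) -> Optional[float]:
        """Return the deviation from the past mean (None until a history exists)."""
        deviation = (value - sum(self.history) / len(self.history)) if self.history else None
        self.history.append(value)
        return deviation


rate = 8000
frame = [0.1 * math.sin(2 * math.pi * 200 * i / rate) for i in range(800)]
print(round(pitch_hz(frame, rate)))        # 200
print(round(sound_pressure_db(frame), 1))  # -23.0

base = BaselineDeviation(window=3)
devs = [base.update(v) for v in [60.0, 62.0, 64.0, 70.0]]
print(devs)  # [None, 2.0, 3.0, 8.0]
```

Expressing the feature as a deviation from each wearer's own recent average removes speaker-specific offsets, such as a naturally loud or high-pitched voice, before any comparison with the relation information.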

Current Assignee
Fujifilm Business Innovation Corp. (Fujifilm Holdings Corporation)
Original Assignee
Fuji Xerox Company Limited (Xerox Holdings Corp.)
Inventors
HARADA, Haruo; YONEYAMA, Hirohito; SHIMOTANI, Kei; NISHINO, Yohei; IIDA, Kiyoshi; NAITO, Takao

Granted Patent

US 8,892,424 B2
US Class Current

704/249
CPC Class Codes

G10L 17/00   Speaker identification or v...

G10L 25/63   for estimating an emotional...

H04R 2420/07   Applications of wireless lo...

H04R 3/005   for combining the signals o...

H04R 5/027   Spatial or constructional a...
