Method of recognizing gender or age of a speaker according to speech emotion or arousal

US 9,123,342 B2
Filed: 07/27/2012
Issued: 09/01/2015
Est. Priority Date: 04/10/2012
Status: Active Grant

Abstract

A method of recognizing the gender or age of a speaker according to speech emotion or arousal includes the following steps: A) segmentalizing speech signals into a plurality of speech segments; B) fetching the first speech segment from the plurality of speech segments to acquire at least one of an emotional feature or an arousal degree of the speech segment; C) determining whether the at least one of the emotional feature and the arousal degree satisfies a given condition and, if yes, proceeding to step D) or, if no, returning to step B) to fetch the next speech segment; D) fetching a feature indicative of gender or age from the speech segment to acquire at least one feature parameter; and E) recognizing the at least one feature parameter to determine the gender or age of the speaker in the currently processed speech segment.
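
The control flow of steps A) to E) can be sketched as a gated loop over segments. This is only an illustrative sketch: the condition is assumed to be the threshold comparison on arousal described in claim 1, RMS energy is used as a stand-in arousal measure (the record does not say how arousal is computed), and every function name, as well as the `extract` and `recognize` callables, is hypothetical.

```python
import math


def segment_signal(samples, segment_len):
    """Step A: split the signal into fixed-length, non-overlapping segments."""
    return [samples[i:i + segment_len]
            for i in range(0, len(samples) - segment_len + 1, segment_len)]


def arousal_degree(segment):
    """Step B (ASSUMPTION): RMS energy as a crude proxy for arousal."""
    return math.sqrt(sum(x * x for x in segment) / len(segment))


def passes_gate(arousal, threshold, condition):
    """Step C: strict comparison; a value equal to the threshold never passes."""
    if condition == "greater":
        return arousal > threshold
    if condition == "less":
        return arousal < threshold
    raise ValueError("condition must be 'greater' or 'less'")


def recognize_segments(samples, segment_len, threshold, condition,
                       extract, recognize):
    """Run steps B to E on each segment, skipping segments that fail the gate."""
    results = []
    for index, seg in enumerate(segment_signal(samples, segment_len)):
        if not passes_gate(arousal_degree(seg), threshold, condition):
            continue  # return to step B with the next segment
        features = extract(seg)                        # step D
        results.append((index, recognize(features)))   # step E
    return results


# Toy usage: three segments, only the loud middle one passes a "greater" gate.
toy = [0.1] * 4 + [1.0] * 4 + [0.2] * 4
kept = recognize_segments(toy, 4, 0.5, "greater",
                          extract=max, recognize=lambda f: f)
```

Passing `extract` and `recognize` as callables keeps the gating logic separate from whatever feature set and classifier are actually used.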

13 Claims

1. A method of recognizing gender or age of a speaker according to speech emotion or arousal, comprising steps of:
- A) segmentalizing speech signals into a plurality of speech segments;
  
  B) fetching the first speech segment from the speech segments to further acquire an arousal degree of the speech segment;
  
  B-1) after the first speech segment is fetched from the speech segments, applying a first classification to the arousal degree of the speech segment to enable the arousal to be classified as a high degree or a low degree of arousal;
  
  C) if a determination condition is set at a greater-than-threshold condition, proceeding to the step D) when the arousal degree of the speech segment is determined greater than the specific threshold, or returning to the step B) when the arousal degree of the speech segment is determined less than or equal to the specific threshold; and
  
  if the determination condition is set at a less-than-threshold condition, proceeding to step D) when the arousal degree of the speech segment is determined less than the specific threshold, or returning to the step B) when the arousal degree of the speech segment is determined greater than or equal to the specific threshold;
  
  D) fetching a feature indicative of gender or age from the speech segment to further acquire at least one feature parameter corresponding to gender or age; and
  
  E) applying recognition to the at least one feature parameter according to a gender or age recognition measure to further determine the gender or age of the speaker in the currently-processed speech segment;
  
  next, apply the step B) to the next speech segment, wherein the steps A)-E) are executed by a computer.
- Dependent claims:
  - 2. The method as defined in claim 1, wherein the speech signals in the step A) are segmentalized by a segmentation unit.
  - 3. The method as defined in claim 1, wherein in the step B), the first speech segment is fetched by a first acquisition unit and the first classification is done via a first classifier.
  - 4. The method as defined in claim 1, wherein in the step C), the arousal is a representation of the degree of excitement.
  - 5. The method as defined in claim 1, wherein in the step C), whether the arousal degree of the speech segment is greater or less than a specific threshold is determined by a determination unit.
  - 6. The method as defined in claim 1, wherein in the step D), after the at least one feature parameter is acquired, a second classification is applied to the at least one feature parameter.
  - 7. The method as defined in claim 6, wherein in the step D), the at least one feature parameter is fetched via a parameter acquisition unit and the second classification is done via a second classifier.
  - 8. The method as defined in claim 6, wherein in the step E), the gender or age recognition measure is based on the at least one feature parameter, and the gender or age of the speaker is then determined according to the at least one feature parameter.
  - 9. The method as defined in claim 8, wherein in the step E), when multiple feature parameters are considered, the feature parameters are integrated and used to recognize the gender or age of the speaker.
  - 10. The method as defined in claim 6, wherein in the step D), whether the at least one feature parameter is remarkable or not in time domain or frequency domain is determined by whether it is greater than a specific mean or a specific standard deviation, where the mean and standard deviation of the feature parameter are computed from speech signals of multiple speakers.
  - 11. The method as defined in claim 1, wherein the at least one feature parameter is one of spectral centroid (SC), spectral spread (SS), zero crossing rate (ZCR), duration, fast Fourier transformation (FFT) coefficients, jitter, and fundamental frequency (F0);
    - when the at least one feature parameter is plural in number, each of the feature parameters is one of SC, SS, ZCR, FFT coefficients, jitter, and F0 and the feature parameters are different from each other.
  - 12. The method as defined in claim 11, wherein SC, SS, FFT coefficients, jitter, and F0 belong to the frequency domain, and ZCR and duration belong to the time domain.
  - 13. The method as defined in claim 11, wherein SC, SS, ZCR, duration, FFT coefficients, and jitter are adopted for age recognition;
    - F0 and FFT coefficients are adopted for gender recognition.
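
Three of the feature parameters named in claim 11 (SC, SS, and ZCR) can be computed as in the following minimal sketch. It uses the common textbook definitions, which are an assumption here since the record does not give the patent's own formulas, and all function names are hypothetical. A naive DFT keeps the example dependency-free; it is only practical for short frames.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), fine for short frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectral_centroid_and_spread(frame, sample_rate):
    """SC: magnitude-weighted mean frequency; SS: weighted std around SC."""
    mags = magnitude_spectrum(frame)
    freqs = [k * sample_rate / len(frame) for k in range(len(mags))]
    total = sum(mags)
    if total == 0:
        return 0.0, 0.0  # silent frame: no meaningful centroid
    sc = sum(f * m for f, m in zip(freqs, mags)) / total
    ss = math.sqrt(sum((f - sc) ** 2 * m for f, m in zip(freqs, mags)) / total)
    return sc, ss


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


# Toy usage: a 1 kHz tone sampled at 8 kHz (phase offset avoids exact zeros).
tone = [math.sin(2 * math.pi * 1000 * i / 8000 + 0.3) for i in range(64)]
sc, ss = spectral_centroid_and_spread(tone, 8000)
zcr = zero_crossing_rate(tone)
```

For a pure tone that falls exactly on a DFT bin, the centroid sits at the tone frequency and the spread is close to zero, which makes this a convenient sanity check.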


Current Assignee: National Chung Cheng University
Original Assignee: National Chung Cheng University
Inventors: Chen, Oscal Tzyh-Chiang; Lu, Ping-Tsung; Ke, Jia-You
Primary Examiner(s): ROBERTS, SHAUN A
Application Number: US13/560,596
Publication Number: US 20130268273A1
Time in Patent Office: 1,131 Days
Field of Search: 704/231, 704/243, 704/246, 704/249, 704/250, 704/270
US Class Current: 1/1
CPC Class Codes:
G10L 17/26 Recognition of special voic...
G10L 25/63 for estimating an emotional...
