Voice recognition with dynamic filter bank adjustment based on speaker categorization determined from runtime pitch

US 7,778,831 B2
Filed: 02/21/2006
Issued: 08/17/2010
Est. Priority Date: 02/21/2006
Status: Active Grant

4 Assignments

0 Petitions

Abstract

Voice recognition methods and systems are disclosed. A voice signal is obtained for an utterance of a speaker. A runtime pitch is determined from the voice signal for the utterance. The speaker is categorized based on the runtime pitch and one or more acoustic model parameters are adjusted based on a categorization of the speaker. The parameter adjustment may be performed at any instance of time during the recognition. A voice recognition analysis of the utterance is then performed based on the acoustic model.

133 Citations

13 Claims

1. A method for voice recognition, the method comprising:
- obtaining a voice signal for an utterance of a speaker;
  
  determining a runtime pitch from the voice signal for the utterance;
  
  categorizing the speaker as male, female or child based on the runtime pitch;
  
  using the categorization as a basis for dynamically adjusting a maximum frequency f_max and a minimum frequency f_min of a filter bank used for processing the input utterance to produce an output, and using corresponding gender or age specific acoustic models to perform voice recognition based on the filter bank output.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13)
- - 2. The method of claim 1 wherein determining the runtime pitch includes determining a moving average pitch p_avg(t) at time t given by $p_{avg}(t) = \frac{1}{NP} \sum_{t_i} p(t_i)$, where the sum is taken over a number NP of pitch measurements taken at times t_i during a time window.
  - 3. The method of claim 2 wherein each of the pitches p(t_i) is above a predetermined threshold.
  - 4. The method of claim 2 wherein determining the runtime pitch includes a calculation of the type: $p_{run}(t) = c \cdot p_{run}(t-1) + (1-c) \cdot p(t)$, where c is a constant between 0 and 1 and p(t) is a current pitch value at time t.
  - 5. The method of claim 1 wherein categorizing the speaker includes determining the speaker's age and/or gender.
  - 6. The method of claim 5 wherein determining the speaker's age and/or gender includes determining whether the runtime pitch falls into a range, wherein the range depends on the speaker's age and/or gender.
  - 7. The method of claim 5 wherein determining the speaker's age and/or gender includes determining from the pitch whether the speaker is a male, female or child speaker.
  - 8. The method of claim 1, further comprising storing the speaker categorization and/or the one or more acoustic model parameters based on the categorization of the speaker, and associating the speaker categorization of the speaker and/or the one or more acoustic model parameters based on the categorization of the speaker with a particular speaker.
  - 9. The method of claim 8, further comprising using the stored speaker categorization and/or the one or more acoustic model parameters based on the categorization of the speaker during a subsequent voice recognition analysis for the speaker.
  - 11. The method of claim 1, wherein the minimum frequency f_min is about 70 Hz and the maximum frequency f_max is about 3800 Hz if the speaker is categorized as a man.
  - 12. The method of claim 1, wherein the minimum frequency f_min is about 70 Hz and the maximum frequency f_max is about 4200 Hz if the speaker is categorized as a woman.
  - 13. The method of claim 1, wherein the minimum frequency f_min is about 90 Hz and the maximum frequency f_max is about 4400 Hz if the speaker is categorized as a child.
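
The runtime pitch calculations recited in claims 2 to 4 can be illustrated with a minimal Python sketch. It is not taken from the patent: the class name `RuntimePitchTracker`, the window size, the threshold value, and the default `c` are all hypothetical choices, and it assumes one pitch measurement in Hz arrives per frame. Measurements at or below the threshold are discarded, which is one way of reading claim 3.

```python
from collections import deque


class RuntimePitchTracker:
    """Tracks a runtime pitch estimate from per-frame pitch measurements (Hz)."""

    def __init__(self, window_size=50, min_pitch_hz=50.0, c=0.9):
        # All three defaults are illustrative assumptions, not values from the claims.
        self.window = deque(maxlen=window_size)  # the last NP accepted pitches p(t_i)
        self.min_pitch_hz = min_pitch_hz         # predetermined threshold (claim 3)
        self.c = c                               # constant between 0 and 1 (claim 4)
        self.p_run = None                        # recursive estimate p_run(t)

    def update(self, pitch_hz):
        """Feed one pitch measurement p(t); return (p_avg, p_run)."""
        if pitch_hz > self.min_pitch_hz:
            self.window.append(pitch_hz)
            if self.p_run is None:
                # Seed the recursion with the first accepted measurement.
                self.p_run = pitch_hz
            else:
                # p_run(t) = c * p_run(t-1) + (1 - c) * p(t)
                self.p_run = self.c * self.p_run + (1.0 - self.c) * pitch_hz
        return self.moving_average(), self.p_run

    def moving_average(self):
        """p_avg(t) = (1/NP) * sum of p(t_i) over the window; None if empty."""
        if not self.window:
            return None
        return sum(self.window) / len(self.window)
```

A caller would feed each new pitch measurement to `update` and use either returned value as the runtime pitch. A larger `c` makes `p_run` respond more slowly to new measurements.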

10. A voice recognition system, comprising:
- an interface adapted to obtain a voice signal;
  
  one or more processors coupled to the interface; and
  
  a memory coupled to the interface and the processor, the memory having embodied therein a set of processor readable instructions configured to implement a method for voice recognition, the processor readable instructions including:
  
  an instruction for obtaining a voice signal for an utterance of a speaker;
  
  an instruction for determining a runtime pitch from the voice signal for the utterance;
  
  an instruction for categorizing the speaker as male, female or child based on the runtime pitch;
  
  an instruction for using the categorization as a basis for dynamically adjusting a maximum frequency f_max and a minimum frequency f_min of a filter bank used for processing the input utterance to produce an output, and using corresponding gender or age specific acoustic models to perform voice recognition based on the filter bank output.
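
The categorization and filter bank adjustment steps can be sketched in Python as follows. The f_min and f_max values are the "about" figures from claims 11 to 13. Everything else is an assumption made for illustration: the claims do not state the pitch ranges, so the two boundary constants are hypothetical, and the claims do not say how the filters are spaced, so the mel-scale spacing and the function names here are one common choice rather than the patented method.

```python
import math

# f_min / f_max in Hz, taken from the "about" values in claims 11-13.
FILTER_BANK_LIMITS_HZ = {
    "male": (70.0, 3800.0),
    "female": (70.0, 4200.0),
    "child": (90.0, 4400.0),
}

# Hypothetical pitch boundaries in Hz; the claims do not give these ranges.
MALE_MAX_PITCH_HZ = 165.0
FEMALE_MAX_PITCH_HZ = 255.0


def categorize_speaker(runtime_pitch_hz):
    """Map a runtime pitch to 'male', 'female' or 'child' (assumed ranges)."""
    if runtime_pitch_hz < MALE_MAX_PITCH_HZ:
        return "male"
    if runtime_pitch_hz < FEMALE_MAX_PITCH_HZ:
        return "female"
    return "child"


def hz_to_mel(f_hz):
    """Convert Hz to mel (a widely used formula; an assumption here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def filter_bank_edges(category, num_filters=20):
    """Return num_filters + 2 band edges (Hz) spanning f_min to f_max.

    Edges are equally spaced on the mel scale, so filter i would cover
    edges[i] to edges[i + 2] in a triangular filter bank.
    """
    f_min, f_max = FILTER_BANK_LIMITS_HZ[category]
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(num_filters + 2)]
```

In use, the category from `categorize_speaker` would select both the band edges and the matching gender or age specific acoustic model. Re-running the categorization as the runtime pitch changes would give the dynamic adjustment described in the claims.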

Current Assignee: Sony Interactive Entertainment Inc. (Sony Group Corp.)
Original Assignee: Sony Computer Entertainment Incorporated (Sony Group Corp.)
Inventors: Chen, Ruxin
Primary Examiner(s): Hudspeth; David R
Assistant Examiner(s): GUERRA-ERAZO, EDGAR X
Application Number: US11/358,001
Publication Number: US 20070198263A1
Time in Patent Office: 1,638 Days
Field of Search: 704/207, 704/208, 704/246, 704/256, 704/256.2, 704/250, 704/255
US Class Current: 704/246
CPC Class Codes:
G10L 15/065   Adaptation
G10L 17/00   Speaker identification or v...
G10L 25/90   Pitch determination of spee...
