Speech recognition system and method for generating a mask of the system

US 8,392,185 B2
Filed: 08/19/2009
Issued: 03/05/2013
Est. Priority Date: 08/20/2008
Status: Active Grant

Abstract

The speech recognition system of the present invention includes: a sound source separating section which separates mixed speeches from multiple sound sources; a mask generating section which generates a soft mask which can take continuous values between 0 and 1 for each separated speech according to reliability of separation in separating operation of the sound source separating section; and a speech recognizing section which recognizes speeches separated by the sound source separating section using soft masks generated by the mask generating section.

10 Claims

1. A speech recognition system comprising:
- multiple sound sources;
  
  a sound source separating section which separates mixed speeches from the multiple sound sources; and
  
  at least one processor configured to:
  
  generate a soft mask which can take continuous values between 0 and 1 for each separated speech according to reliability of separation in separating operation of the sound source separating section, and
  
  recognize speeches separated by the sound source separating section using the soft masks,
  
  wherein the reliability of separation R(f,t) is defined as
  - 2. A speech recognition system according to claim 1, wherein the soft masks are determined using a sigmoid function 1/(1+exp(−a(R−b))), where R represents reliability of separation and a and b represent constants.
  - 3. A speech recognition system according to claim 1, wherein the soft masks are determined using a probability density function of a normal distribution, which has a variable R which represents reliability of separation.
  - 10. A speech recognition system according to claim 1, wherein assuming that μ1 and μ2 (μ1<μ2) indicate mean values, σ1 and σ2 indicate standard deviations, and R indicates the reliability of separation, the mean values and standard deviations μ1, μ2, σ1 and σ2 are estimated by fitting a histogram of the reliability of separation R with a first probability density function of normal distribution f1(R) which has (μ1, σ1) and a second probability density function of normal distribution f2(R) which has (μ2, σ2), and the soft mask is generated using f1(R), f2(R), μ1 and μ2.
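
As an illustration of the sigmoid mask recited in claim 2, the following is a minimal sketch, not the patented implementation. The function names, the nested-list layout of the reliability values (one row per frequency bin, one column per time frame), and the constants a=10 and b=0.5 are assumptions chosen only for the example; the reliability values themselves are made up, since the defining formula for R(f,t) is not reproduced in this text.

```python
import math


def sigmoid_soft_mask(R, a, b):
    """Return 1 / (1 + exp(-a * (R - b))) for one reliability value R."""
    z = -a * (R - b)
    if z > 700.0:
        # math.exp would overflow for very large z; the mask is effectively 0.
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


def mask_grid(reliability, a, b):
    """Apply the sigmoid to every entry of a grid of reliability values."""
    return [[sigmoid_soft_mask(R, a, b) for R in row] for row in reliability]


# Hypothetical reliability values; a and b are illustrative constants.
reliability = [[0.1, 0.5, 0.9],
               [0.3, 0.5, 0.7]]
masks = mask_grid(reliability, a=10.0, b=0.5)
```

In this sketch b sets the reliability at which the mask equals 0.5, and a sets how sharply the mask moves from 0 to 1 around that point.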

4. A method for generating a soft mask for a speech recognition system, the method comprising:
- separating, at a sound source separating section of the speech recognition system, mixed speeches from multiple sound sources;
  
  generating, at a mask generating section of the speech recognition system, a soft mask which can take continuous values between 0 and 1 for each separated speech according to reliability of separation in separating operation of the sound source separating section;
  
  recognizing, at a speech recognizing section of the speech recognition system, speeches separated by the sound source separating section using soft masks generated by the mask generating section, the soft mask being determined using a function of the reliability of separation, which has at least one parameter;
  
  determining a search space of said at least one parameter;
  
  obtaining a speech recognition rate of the speech recognition system while changing a value of the speech recognition system in the search space; and
  
  setting the value which maximizes a speech recognition rate of the speech recognition system to said at least one parameter,
  
  wherein the reliability of separation R(f,t) is defined as
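
The parameter search recited in claim 4 (define a search space, measure the recognition rate at each candidate, keep the maximizer) can be sketched as an exhaustive grid search. This is one possible reading under stated assumptions: `grid_search`, `rate_fn`, and `toy_rate` are hypothetical names, the parameters are assumed to be the sigmoid constants a and b, and `toy_rate` is a made-up stand-in for actually running the recognizer on evaluation data.

```python
from itertools import product


def grid_search(rate_fn, search_space):
    """Evaluate rate_fn on every parameter combination and keep the best one.

    search_space maps each parameter name to a list of candidate values.
    rate_fn is assumed to return a recognition rate for given parameters.
    """
    names = list(search_space)
    best_params, best_rate = None, float("-inf")
    for values in product(*(search_space[name] for name in names)):
        params = dict(zip(names, values))
        rate = rate_fn(**params)
        if rate > best_rate:
            best_params, best_rate = params, rate
    return best_params, best_rate


def toy_rate(a, b):
    """Made-up stand-in for a real recognition-rate measurement."""
    return 1.0 - 0.001 * abs(a - 40) - abs(b - 0.5)


# Hypothetical search space for the two sigmoid constants.
space = {"a": [10, 20, 40, 80], "b": [0.3, 0.4, 0.5, 0.6]}
best_params, best_rate = grid_search(toy_rate, space)
```

An exhaustive grid is the simplest choice when each evaluation is a full recognition run and the search space is small; its cost grows with the product of the candidate counts.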

5. A method for generating a soft mask for a speech recognition system, the method comprising:
- separating, at a sound source separating section of the speech recognition system, mixed speeches from multiple sound sources;
  
  generating, at a mask generating section of the speech recognition system, a soft mask which can take continuous values between 0 and 1 for each separated speech according to reliability of separation in separating operation of the sound source separating section;
  
  recognizing, at a speech recognizing section of the speech recognition system, speeches separated by the sound source separating section using soft masks generated by the mask generating section, the soft mask being determined using a function of the reliability of separation, which has at least one parameter;
  
  obtaining a histogram of the reliability of separation; and
  
  determining a value of said at least one parameter from a form of the histogram of the reliability of separation,
  
  wherein the reliability of separation R(f,t) is defined as
  - 6. A method for generating a soft mask for a speech recognition system according to claim 5, wherein assuming that μ1 and μ2 (μ1<μ2) indicate mean values and σ1 and σ2 indicate standard deviations and R indicates reliability of separation, the mean values and standard deviations μ1, μ2, σ1 and σ2 are estimated by fitting the histogram of reliability of separation R with a first probability density function of normal distribution f1(R) which has (μ1, σ1) and a second probability density function of normal distribution f2(R) which has (μ2, σ2), and the soft mask is generated using f1(R), f2(R), μ1 and μ2.
  - 7. A method for generating a soft mask for a speech recognition system according to claim 6, wherein assuming that a value of the soft mask is S(R) and f(R)=f1(R)+f2(R),
    S(R)=0 when R<μ1,
    S(R)=f2(R)/f(R) when μ1≦R≦μ2,
    S(R)=1 when μ2<R.
  - 8. A method for generating a soft mask for a speech recognition system according to claim 6, wherein assuming that a value of the soft mask is S(R),
  - 9. A method for generating a soft mask for a speech recognition system according to claim 6, wherein a value of R at the intersection of f1(R) and f2(R) which satisfies μ1<R<μ2 is set to b and a is determined such that 1/(1+exp(−a(R−b))) is fit to f2(R)/f(R) and the value of the missing feature mask (MFM) S(R) is determined by S(R)=1/(1+exp(−a(R−b))).
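
The mask constructions in claims 7 and 9 can be sketched as follows, assuming μ1, σ1, μ2 and σ2 have already been estimated from the histogram. Everything named here is hypothetical: `piecewise_mask` follows the three cases of claim 7, `intersection` locates b by bisection, and `fit_slope` fits a by least squares in the logit domain, which is only one way to make the sigmoid "fit to" f2(R)/f(R); the claims do not state which fitting procedure is used. The numeric values are illustrative.

```python
import math


def log_pdf(R, mu, sigma):
    """Logarithm of the normal probability density with mean mu, std sigma."""
    return (-0.5 * ((R - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2.0 * math.pi))


def piecewise_mask(R, mu1, s1, mu2, s2):
    """S(R) = 0 below mu1, f2/(f1+f2) on [mu1, mu2], 1 above mu2."""
    if R < mu1:
        return 0.0
    if R > mu2:
        return 1.0
    d = log_pdf(R, mu1, s1) - log_pdf(R, mu2, s2)  # ln(f1/f2)
    if d > 700.0:  # avoid overflow in exp; f2/f is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(d))  # equals f2/(f1+f2)


def intersection(mu1, s1, mu2, s2, iters=100):
    """Bisection for the R in (mu1, mu2) where f1(R) = f2(R)."""
    def g(R):
        return log_pdf(R, mu1, s1) - log_pdf(R, mu2, s2)
    lo, hi = mu1, mu2
    if not (g(lo) > 0.0 > g(hi)):
        raise ValueError("no sign change of f1 - f2 between mu1 and mu2")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_slope(b, mu1, s1, mu2, s2, n=101):
    """Least-squares a such that a*(R-b) matches ln(f2/f1) on [mu1, mu2]."""
    num = den = 0.0
    for i in range(n):
        R = mu1 + (mu2 - mu1) * i / (n - 1)
        logit = log_pdf(R, mu2, s2) - log_pdf(R, mu1, s1)
        num += (R - b) * logit
        den += (R - b) ** 2
    return num / den


# Illustrative parameters; with equal standard deviations f2/f is itself
# a sigmoid with a = (mu2 - mu1) / sigma**2 and b = (mu1 + mu2) / 2.
mu1, s1, mu2, s2 = 0.2, 0.1, 0.8, 0.1
b = intersection(mu1, s1, mu2, s2)
a = fit_slope(b, mu1, s1, mu2, s2)
```

Working with log densities keeps the ratio f2(R)/f(R) numerically stable when either density is very small, which is why the sketch never evaluates f1 or f2 directly.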

Current Assignee: Honda Motor Co., Ltd. (Honda Motor Company)
Original Assignee: Honda Motor Co., Ltd. (Honda Motor Company)
Inventors: Nakadai, Kazuhiro; Takahashi, Toru; Okuno, Hiroshi
Primary Examiner(s): GUERRA-ERAZO, EDGAR X
Application Number: US12/543,759
Publication Number: US 20100082340A1
Time in Patent Office: 1,294 Days
Field of Search: 704/243, 704/233, 704/234, 704/231, 704/256, 704/255, 704/236, 704/240, 704/239, 704/270, 704/270.1, 704/275, 704/235, 704/246
US Class Current: 704/233
CPC Class Codes:
G10L 15/20 Speech recognition techniqu...
G10L 21/0272 Voice signal separating
