Apparatus for calculating a posterior probability of phoneme symbol, and speech recognition apparatus

US 6,041,299 A
Filed: 03/11/1998
Issued: 03/21/2000
Est. Priority Date: 03/11/1997
Status: Expired due to Fees

Abstract

An apparatus for calculating a posteriori probabilities of phoneme symbols, and a speech recognition apparatus using that apparatus, are disclosed. A feature extracting section extracts speech feature parameters from a speech signal of an uttered speech sentence composed of an inputted character series, and a calculating section calculates an a posteriori probability of a phoneme symbol of the speech signal by using a bidirectional recurrent neural network. The bidirectional recurrent neural network includes (a) an input layer for receiving the speech feature parameters extracted by the feature extracting section and a plurality of hypothetical phoneme symbol series signals, (b) an intermediate layer of at least one layer having a plurality of units, and (c) an output layer for outputting an a posteriori probability of each phoneme symbol. The input layer includes (a) a first input neuron group having a plurality of units, for receiving a plurality of speech feature parameters and a plurality of phoneme symbol series signals, (b) a forward module, and (c) a backward module.
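
The data flow described in the abstract can be illustrated with a minimal sketch. It follows the claim 1 variant, in which the forward module sees both the speech features and the phoneme symbol series signals while the backward module sees only the features. Everything not stated in the patent text is an assumption made for illustration: the `tanh` and softmax activations, the random weights, zero initial states, the one-frame delay wiring, and all names (`init_params`, `brnn_posteriors`, `W_f`, `U_f`, and so on).

```python
import math
import random


def matvec(matrix, vector):
    """Multiply inputs by their weighting coefficients and sum per unit."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(values) for values in zip(*vectors)]


def tanh_all(vector):
    return [math.tanh(x) for x in vector]


def softmax(vector):
    peak = max(vector)
    exps = [math.exp(x - peak) for x in vector]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(n_feat, n_sym, n_state, n_hidden, n_phonemes, seed=0):
    """Random weights (hypothetical); a real system would train these."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_f": mat(n_state, n_feat + n_sym),   # forward module input weights
        "U_f": mat(n_state, n_state),          # forward-in-time feedback
        "W_b": mat(n_state, n_feat),           # backward module input weights
        "U_b": mat(n_state, n_state),          # backward-in-time feedback
        "V_x": mat(n_hidden, n_feat + n_sym),  # first input group -> intermediate
        "V_f": mat(n_hidden, n_state),         # forward module -> intermediate
        "V_b": mat(n_hidden, n_state),         # backward module -> intermediate
        "V_o": mat(n_phonemes, n_hidden),      # intermediate -> output
    }


def brnn_posteriors(features, phoneme_signals, p):
    """Return one probability vector over phoneme symbols per frame."""
    n_frames = len(features)
    zero = [0.0] * len(p["U_f"])
    # Forward module: recurrence runs from the first frame to the last.
    fwd, state = [], zero
    for t in range(n_frames):
        x = features[t] + phoneme_signals[t]  # list concatenation
        state = tanh_all(add(matvec(p["W_f"], x), matvec(p["U_f"], state)))
        fwd.append(state)
    # Backward module: recurrence runs from the last frame to the first.
    bwd, state = [None] * n_frames, zero
    for t in reversed(range(n_frames)):
        state = tanh_all(add(matvec(p["W_b"], features[t]), matvec(p["U_b"], state)))
        bwd[t] = state
    posteriors = []
    for t in range(n_frames):
        past = fwd[t - 1] if t > 0 else zero               # delayed by one frame
        future = bwd[t + 1] if t + 1 < n_frames else zero  # inversely delayed
        x = features[t] + phoneme_signals[t]
        hidden = tanh_all(add(matvec(p["V_x"], x),
                              matvec(p["V_f"], past),
                              matvec(p["V_b"], future)))
        posteriors.append(softmax(matvec(p["V_o"], hidden)))
    return posteriors
```

Calling `brnn_posteriors(features, phoneme_signals, init_params(3, 2, 4, 5, 6))` with three-dimensional feature frames and two-dimensional encoded phoneme signals yields, for each frame, six values that sum to one. The softmax is only one way to obtain values that can be read as probabilities; the patent text quoted here does not name the output function.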

16 Claims

1. An apparatus for calculating a posteriori probabilities of phoneme symbols, comprising:
- feature extracting means for extracting speech feature parameters from a speech signal of an uttered speech sentence composed of an inputted character series; and
  
  calculating means for calculating an a posteriori probability of a phoneme symbol of the speech signal, by using a bidirectional recurrent neural network,
  
  wherein said bidirectional recurrent neural network comprises:
  
  an input layer for receiving, as input signals, the speech feature parameters extracted by the feature extracting means and a plurality of hypothetical phoneme symbol series signals;
  
  an intermediate layer of at least one layer having a plurality of units; and
  
  an output layer for outputting an a posteriori probability of each phoneme symbol,
  
  wherein said input layer comprises:
  
  a first input neuron group having a plurality of units, for receiving, as input signals, a plurality of speech feature parameters and a plurality of phoneme symbol series signals;
  
  a forward module; and
  
  a backward module,
  
  wherein said forward module has a forward-in-time feedback connection, and generates and outputs to said intermediate layer, a plurality of parameters of a timing delayed by a predetermined unit time from a plurality of parameters outputted from said first input neuron group, based on a plurality of speech feature parameters and a plurality of phoneme symbol series signals, and
  
  wherein said backward module has a backward-in-time feedback connection, and generates and outputs to said intermediate layer, a plurality of parameters of a timing inversely delayed by a predetermined unit timing from a plurality of parameters outputted from said first input neuron group, based on a plurality of speech feature parameters.
- - 2. The apparatus as claimed in claim 1, wherein said forward module comprises:
    - a second input neuron group having a plurality of units, for receiving, as input signals, a plurality of speech feature parameters and a plurality of phoneme symbol series signals;
      
      a first intermediate neuron group having a plurality of units, for receiving, as input signals, a plurality of parameters outputted from a second intermediate neuron group with a delay of a predetermined unit timing; and
      
      said second intermediate neuron group having a plurality of units, which is connected to said second input neuron group and said first intermediate neuron group so that a plurality of parameters outputted from said second input neuron group and a plurality of parameters outputted from said first intermediate neuron group are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said second intermediate neuron group,
      
      wherein said backward module comprises:
      
      a third input neuron group having a plurality of units, for receiving, as input signals, a plurality of speech feature parameters;
      
      a third intermediate neuron group having a plurality of units, for receiving, as input signals, a plurality of parameters outputted from a fourth intermediate neuron group with an inverse delay of a predetermined unit timing; and
      
      said fourth intermediate neuron group having a plurality of units, which is connected to said third input neuron group and said third intermediate neuron group so that a plurality of parameters outputted from said third input neuron group and a plurality of parameters outputted from said third intermediate neuron group are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said fourth intermediate neuron group,
      
      wherein said second intermediate neuron group is connected to the plurality of units of said intermediate layer so that a plurality of parameters outputted from said second intermediate neuron group are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said intermediate layer,
      
      wherein said first input neuron group is connected to the plurality of units of said intermediate layer so that a plurality of parameters outputted from said first input neuron group are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said intermediate layer,
      
      wherein said fourth intermediate neuron group is connected to the plurality of units of said intermediate layer so that a plurality of parameters outputted from said fourth intermediate neuron group are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said intermediate layer, and
      
      wherein said intermediate layer is connected to the plurality of units of said output layer so that a plurality of parameters outputted from said intermediate layer are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said output layer.
  - 3. The apparatus as claimed in claim 1, further comprising:
    - encoding means for encoding the plurality of phoneme symbol series signals and outputting the encoded signals to said first, second and third input neuron groups.
  - 4. The apparatus as claimed in claim 2, further comprising:
    - encoding means for encoding the plurality of phoneme symbol series signals and outputting the encoded signals to said first, second and third input neuron groups.
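
Claims 3 and 4 recite encoding means that feed the phoneme symbol series signals to the input neuron groups, but the claim text does not say which code is used. As a hedged illustration only, the sketch below assumes a one-hot code over a small phoneme inventory; the function name `one_hot_encode` and the example inventory are hypothetical.

```python
def one_hot_encode(series, inventory):
    """Encode each phoneme symbol as a vector with a single 1.0 entry.

    Assumption: one-hot coding. The claims only require that the phoneme
    symbol series signals are encoded before reaching the input groups.
    """
    index = {symbol: i for i, symbol in enumerate(inventory)}
    encoded = []
    for symbol in series:
        vector = [0.0] * len(inventory)
        vector[index[symbol]] = 1.0
        encoded.append(vector)
    return encoded
```

For an inventory `["a", "k", "s"]`, the series `["a", "k", "a"]` becomes three vectors of length three, each with exactly one non-zero entry. A symbol outside the inventory raises `KeyError` in this sketch.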

5. An apparatus for calculating a posteriori probabilities of phoneme symbols, comprising:
- feature extracting means for extracting speech feature parameters from a speech signal of an uttered speech sentence composed of an inputted character series; and
  
  calculating means for calculating an a posteriori probability of a phoneme symbol of the speech signal, by using a bidirectional recurrent neural network,
  
  wherein said bidirectional recurrent neural network comprises:
  
  an input layer for receiving, as input signals, the speech feature parameters extracted by the feature extracting means and a plurality of hypothetical phoneme symbol series signals;
  
  an intermediate layer of at least one layer, having a plurality of units; and
  
  an output layer for outputting an a posteriori probability of each phoneme symbol,
  
  wherein said input layer comprises:
  
  a first input neuron group having a plurality of units, for receiving, as input signals, a plurality of speech feature parameters and a plurality of phoneme symbol series signals;
  
  a forward module; and
  
  a backward module,
  
  wherein the forward module has a forward-in-time feedback connection, and generates and outputs to said intermediate layer, a plurality of parameters of a timing delayed by a predetermined unit timing from a plurality of parameters outputted from said first input neuron group, based on a plurality of speech feature parameters, and
  
  wherein said backward module has a backward-in-time feedback connection, and generates and outputs to said intermediate layer, a plurality of parameters of a timing inversely delayed by a predetermined unit timing from a plurality of parameters outputted from said first input neuron group, based on a plurality of speech feature parameters and a plurality of phoneme symbol series signals.
- - 6. The apparatus as claimed in claim 5, wherein said forward module comprises:
    - a second input neuron group having a plurality of units, for receiving, as input signals, a plurality of speech feature parameters;
      
      a first intermediate neuron group having a plurality of units, for receiving, as input signals, a plurality of parameters outputted from a second intermediate neuron group with a delay of a predetermined unit timing; and
      
      said second intermediate neuron group having a plurality of units, which is connected to said second input neuron group and said first intermediate neuron group so that a plurality of parameters outputted from said second input neuron group and a plurality of parameters outputted from said first intermediate neuron group are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said second intermediate neuron group,
      
      wherein said backward module comprises:
      
      a third input neuron group having a plurality of units, for receiving, as input signals, a plurality of speech feature parameters and a plurality of phoneme symbol series signals;
      
      a third intermediate neuron group having a plurality of units, for receiving, as input signals, a plurality of parameters outputted from a fourth intermediate neuron group with an inverse delay of a predetermined unit timing; and
      
      said fourth intermediate neuron group having a plurality of units, which is connected to said third input neuron group and said third intermediate neuron group so that a plurality of parameters outputted from said third input neuron group and a plurality of parameters outputted from said third intermediate neuron group are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said fourth intermediate neuron group,
      
      wherein said second intermediate neuron group is connected to the plurality of units of said intermediate layer so that a plurality of parameters outputted from said second intermediate neuron group are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said intermediate layer,
      
      wherein said first input neuron group is connected to the plurality of units of said intermediate layer so that a plurality of parameters outputted from said first input neuron group are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said intermediate layer,
      
      wherein said fourth intermediate neuron group is connected to the plurality of units of said intermediate layer so that a plurality of parameters outputted from said fourth intermediate neuron group are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said intermediate layer, and
      
      wherein said intermediate layer is connected to the plurality of units of said output layer so that a plurality of parameters outputted from said intermediate layer are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said output layer.
  - 7. The apparatus as claimed in claim 5, further comprising:
    - encoding means for encoding the plurality of phoneme symbol series signals and outputting the encoded signals to said first, second and third input neuron groups.
  - 8. The apparatus as claimed in claim 6, further comprising:
    - encoding means for encoding the plurality of phoneme symbol series signals and outputting the encoded signals to said first, second and third input neuron groups.

9. A speech recognition apparatus comprising:
- feature extracting means for extracting speech feature parameters from a speech signal of an uttered speech sentence composed of an inputted character series;
  
  calculating means for calculating an a posteriori probability of a phoneme symbol of the speech signal, by using a bidirectional recurrent neural network; and
  
  speech recognition means for selecting, as a detected phoneme, a phoneme symbol having the greatest a posteriori probability out of phoneme symbols having a posteriori probabilities calculated by said calculating means, based on the feature parameters extracted by the feature extracting means, thereby performing speech recognition on the speech signal,
  
  wherein said bidirectional recurrent neural network comprises:
  
  an input layer for receiving, as input signals, the speech feature parameters extracted by the feature extracting means and a plurality of hypothetical phoneme symbol series signals;
  
  an intermediate layer of at least one layer having a plurality of units; and
  
  an output layer for outputting an a posteriori probability of each phoneme symbol,
  
  wherein said input layer comprises:
  
  a first input neuron group having a plurality of units, for receiving, as input signals, a plurality of speech feature parameters and a plurality of phoneme symbol series signals;
  
  a forward module; and
  
  a backward module,
  
  wherein said forward module has a forward-in-time feedback connection, and generates and outputs to said intermediate layer, a plurality of parameters of a timing delayed by a predetermined unit time from a plurality of parameters outputted from said first input neuron group, based on a plurality of speech feature parameters and a plurality of phoneme symbol series signals, and
  
  wherein said backward module has a backward-in-time feedback connection, and generates and outputs to said intermediate layer, a plurality of parameters of a timing inversely delayed by a predetermined unit timing from a plurality of parameters outputted from said first input neuron group, based on a plurality of speech feature parameters.
- - 10. The apparatus as claimed in claim 9, wherein said forward module comprises:
    - a second input neuron group having a plurality of units, for receiving, as input signals, a plurality of speech feature parameters and a plurality of phoneme symbol series signals;
      
      a first intermediate neuron group having a plurality of units, for receiving, as input signals, a plurality of parameters outputted from a second intermediate neuron group with a delay of a predetermined unit timing; and
      
      said second intermediate neuron group having a plurality of units, which is connected to said second input neuron group and said first intermediate neuron group so that a plurality of parameters outputted from said second input neuron group and a plurality of parameters outputted from said first intermediate neuron group are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said second intermediate neuron group,
      
      wherein said backward module comprises:
      
      a third input neuron group having a plurality of units, for receiving, as input signals, a plurality of speech feature parameters;
      
      a third intermediate neuron group having a plurality of units, for receiving, as input signals, a plurality of parameters outputted from a fourth intermediate neuron group with an inverse delay of a predetermined unit timing; and
      
      said fourth intermediate neuron group having a plurality of units, which is connected to said third input neuron group and said third intermediate neuron group so that a plurality of parameters outputted from said third input neuron group and a plurality of parameters outputted from said third intermediate neuron group are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said fourth intermediate neuron group,
      
      wherein said second intermediate neuron group is connected to the plurality of units of said intermediate layer so that a plurality of parameters outputted from said second intermediate neuron group are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said intermediate layer,
      
      wherein said first input neuron group is connected to the plurality of units of said intermediate layer so that a plurality of parameters outputted from said first input neuron group are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said intermediate layer,
      
      wherein said fourth intermediate neuron group is connected to the plurality of units of said intermediate layer so that a plurality of parameters outputted from said fourth intermediate neuron group are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said intermediate layer, and
      
      wherein said intermediate layer is connected to the plurality of units of said output layer so that a plurality of parameters outputted from said intermediate layer are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said output layer.
  - 13. The apparatus as claimed in claim 9, further comprising:
    - encoding means for encoding the plurality of phoneme symbol series signals and outputting the encoded signals to said first, second and third input neuron groups.
  - 14. The apparatus as claimed in claim 10, further comprising:
    - encoding means for encoding the plurality of phoneme symbol series signals and outputting the encoded signals to said first, second and third input neuron groups.
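
The speech recognition means of claim 9 selects, as the detected phoneme, the phoneme symbol with the greatest a posteriori probability. A minimal sketch of that selection step follows, applied independently per frame. The function name `select_phonemes`, the per-frame treatment, and the example inventory are assumptions for illustration and are not taken from the patent.

```python
def select_phonemes(posteriors, inventory):
    """Pick, for each frame, the symbol with the greatest posterior.

    posteriors: list of per-frame probability vectors, one value per symbol.
    inventory: list of phoneme symbols in the same order as the vectors.
    """
    detected = []
    for frame in posteriors:
        # Index of the largest probability in this frame (first one on ties).
        best = max(range(len(frame)), key=frame.__getitem__)
        detected.append(inventory[best])
    return detected
```

For example, `select_phonemes([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]], ["a", "k", "s"])` returns `["k", "a"]`.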

11. A speech recognition apparatus comprising:
- feature extracting means for extracting speech feature parameters from a speech signal of an uttered speech sentence composed of an inputted character series;
  
  calculating means for calculating an a posteriori probability of a phoneme symbol of the speech signal, by using a bidirectional recurrent neural network; and
  
  speech recognition means for selecting, as a detected phoneme, a phoneme symbol having the greatest a posteriori probability out of phoneme symbols having a posteriori probabilities calculated by said calculating means, based on the feature parameters extracted by the feature extracting means, thereby performing speech recognition on the speech signal,
  
  wherein said bidirectional recurrent neural network comprises:
  
  an input layer for receiving, as input signals, the speech feature parameters extracted by the feature extracting means and a plurality of hypothetical phoneme symbol series signals;
  
  an intermediate layer of at least one layer, having a plurality of units; and
  
  an output layer for outputting an a posteriori probability of each phoneme symbol,
  
  wherein said input layer comprises:
  
  a first input neuron group having a plurality of units, for receiving, as input signals, a plurality of speech feature parameters and a plurality of phoneme symbol series signals;
  
  a forward module; and
  
  a backward module,
  
  wherein the forward module has a forward-in-time feedback connection, and generates and outputs to said intermediate layer, a plurality of parameters of a timing delayed by a predetermined unit timing from a plurality of parameters outputted from said first input neuron group, based on a plurality of speech feature parameters, and
  
  wherein said backward module has a backward-in-time feedback connection, and generates and outputs to said intermediate layer, a plurality of parameters of a timing inversely delayed by a predetermined unit timing from a plurality of parameters outputted from said first input neuron group, based on a plurality of speech feature parameters and a plurality of phoneme symbol series signals.
- - 12. The apparatus as claimed in claim 11, wherein said forward module comprises:
    - a second input neuron group having a plurality of units, for receiving, as input signals, a plurality of speech feature parameters;
      
      a first intermediate neuron group having a plurality of units, for receiving, as input signals, a plurality of parameters outputted from a second intermediate neuron group with a delay of a predetermined unit timing; and
      
      said second intermediate neuron group having a plurality of units, which is connected to said second input neuron group and said first intermediate neuron group so that a plurality of parameters outputted from said second input neuron group and a plurality of parameters outputted from said first intermediate neuron group are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said second intermediate neuron group,
      
      wherein said backward module comprises:
      
      a third input neuron group having a plurality of units, for receiving, as input signals, a plurality of speech feature parameters and a plurality of phoneme symbol series signals;
      
      a third intermediate neuron group having a plurality of units, for receiving, as input signals, a plurality of parameters outputted from a fourth intermediate neuron group with an inverse delay of a predetermined unit timing; and
      
      said fourth intermediate neuron group having a plurality of units, which is connected to said third input neuron group and said third intermediate neuron group so that a plurality of parameters outputted from said third input neuron group and a plurality of parameters outputted from said third intermediate neuron group are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said fourth intermediate neuron group,
      
      wherein said second intermediate neuron group is connected to the plurality of units of said intermediate layer so that a plurality of parameters outputted from said second intermediate neuron group are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said intermediate layer,
      
      wherein said first input neuron group is connected to the plurality of units of said intermediate layer so that a plurality of parameters outputted from said first input neuron group are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said intermediate layer,
      
      wherein said fourth intermediate neuron group is connected to the plurality of units of said intermediate layer so that a plurality of parameters outputted from said fourth intermediate neuron group are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said intermediate layer, and
      
      wherein said intermediate layer is connected to the plurality of units of said output layer so that a plurality of parameters outputted from said intermediate layer are multiplied by respective weighting coefficients and respective multiplied values are respectively inputted to the plurality of units of said output layer.
  - 15. The apparatus as claimed in claim 11, further comprising:
    - encoding means for encoding the plurality of phoneme symbol series signals and outputting the encoded signals to said first, second and third input neuron groups.
  - 16. The apparatus as claimed in claim 12, further comprising:
    - encoding means for encoding the plurality of phoneme symbol series signals and outputting the encoded signals to said first, second and third input neuron groups.

Current Assignee: ATR Interpreting Telephony Research Laboratories
Original Assignee: ATR Interpreting Telephony Research Laboratories
Inventors: Fukada, Toshiaki; Schuster, Mike
Primary Examiner(s): Knepper, David D.
Application Number: US09/038,128
Time in Patent Office: 741 Days
Field of Search: 704/202, 704/232, 704/259, 706/30, 706/31, 706/41-43, 706/22, 706/21, 706/25
US Class Current: 704/232
CPC Class Codes: G10L 15/16 using artificial neural net...
