Apparatus for calculating a posterior probability of phoneme symbol, and speech recognition apparatus
First Claim
1. An apparatus for calculating a posteriori probabilities of phoneme symbols, comprising:
feature extracting means for extracting speech feature parameters from a speech signal of an uttered speech sentence composed of an inputted character series; and
calculating means for calculating an a posteriori probability of a phoneme symbol of the speech signal, by using a bidirectional recurrent neural network,
wherein said bidirectional recurrent neural network comprises:
an input layer for receiving, as input signals, the speech feature parameters extracted by the feature extracting means and a plurality of hypothetical phoneme symbol series signals;
an intermediate layer of at least one layer having a plurality of units; and
an output layer for outputting an a posteriori probability of each phoneme symbol,
wherein said input layer comprises:
a first input neuron group having a plurality of units, for receiving, as input signals, a plurality of speech feature parameters and a plurality of phoneme symbol series signals;
a forward module; and
a backward module,
wherein said forward module has a forward-in-time feedback connection, and generates and outputs to said intermediate layer a plurality of parameters of a timing delayed by a predetermined unit time from a plurality of parameters outputted from said first input neuron group, based on a plurality of speech feature parameters and a plurality of phoneme symbol series signals, and
wherein said backward module has a backward-in-time feedback connection, and generates and outputs to said intermediate layer a plurality of parameters of a timing inversely delayed by a predetermined unit timing from a plurality of parameters outputted from said first input neuron group, based on a plurality of speech feature parameters.
Abstract
Disclosed are an apparatus for calculating a posteriori probabilities of phoneme symbols and a speech recognition apparatus that uses it. A feature extracting section extracts speech feature parameters from a speech signal of an uttered speech sentence composed of an inputted character series, and a calculating section calculates an a posteriori probability of a phoneme symbol of the speech signal by using a bidirectional recurrent neural network. The bidirectional recurrent neural network includes (a) an input layer for receiving the speech feature parameters extracted by the feature extracting section and a plurality of hypothetical phoneme symbol series signals, (b) an intermediate layer of at least one layer having a plurality of units, and (c) an output layer for outputting an a posteriori probability of each phoneme symbol. The input layer in turn includes (i) a first input neuron group having a plurality of units, for receiving a plurality of speech feature parameters and a plurality of phoneme symbol series signals, (ii) a forward module, and (iii) a backward module.
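The layered structure summarized in the abstract can be illustrated with a minimal sketch of a generic bidirectional recurrent pass. This is not the patented implementation: the tanh recurrence, the softmax output, the random weights, and every name below (`recurrent_states`, `phoneme_posteriors`, the layer sizes) are assumptions chosen for illustration, and the hypothetical phoneme symbol series inputs and the exact unit delays recited in the claims are omitted for brevity.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(z):
    """Normalize scores so they are positive and sum to one."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def rand_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def recurrent_states(inputs, W_in, W_rec, reverse=False):
    """Simple tanh recurrence run forward or backward in time (assumed form)."""
    state = [0.0] * len(W_rec)
    out = [None] * len(inputs)
    order = range(len(inputs) - 1, -1, -1) if reverse else range(len(inputs))
    for t in order:
        pre = [a + b for a, b in zip(matvec(W_in, inputs[t]), matvec(W_rec, state))]
        state = [math.tanh(v) for v in pre]
        out[t] = state
    return out

def phoneme_posteriors(frames, W_f_in, W_f_rec, W_b_in, W_b_rec, W_hid, W_out):
    """Combine forward and backward states in a hidden layer, then softmax."""
    fwd = recurrent_states(frames, W_f_in, W_f_rec)                # past context
    bwd = recurrent_states(frames, W_b_in, W_b_rec, reverse=True)  # future context
    posteriors = []
    for x, f, b in zip(frames, fwd, bwd):
        hidden = [math.tanh(v) for v in matvec(W_hid, x + f + b)]  # intermediate layer
        posteriors.append(softmax(matvec(W_out, hidden)))          # output layer
    return posteriors

# Hypothetical sizes: 3 feature parameters, 4 state units per module,
# 5 intermediate units, 6 phoneme symbols, 7 frames.
rng = random.Random(0)
n_feat, n_state, n_hid, n_phon = 3, 4, 5, 6
frames = [[rng.uniform(-1, 1) for _ in range(n_feat)] for _ in range(7)]
post = phoneme_posteriors(
    frames,
    rand_matrix(n_state, n_feat, rng), rand_matrix(n_state, n_state, rng),
    rand_matrix(n_state, n_feat, rng), rand_matrix(n_state, n_state, rng),
    rand_matrix(n_hid, n_feat + 2 * n_state, rng),
    rand_matrix(n_phon, n_hid, rng),
)
```

Each entry of `post` is a distribution over the assumed phoneme symbols for one frame; with untrained random weights the values carry no meaning beyond showing the data flow.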
16 Claims
1. An apparatus for calculating a posteriori probabilities of phoneme symbols, comprising:
feature extracting means for extracting speech feature parameters from a speech signal of an uttered speech sentence composed of an inputted character series; and
calculating means for calculating an a posteriori probability of a phoneme symbol of the speech signal, by using a bidirectional recurrent neural network,
wherein said bidirectional recurrent neural network comprises:
an input layer for receiving, as input signals, the speech feature parameters extracted by the feature extracting means and a plurality of hypothetical phoneme symbol series signals;
an intermediate layer of at least one layer having a plurality of units; and
an output layer for outputting an a posteriori probability of each phoneme symbol,
wherein said input layer comprises:
a first input neuron group having a plurality of units, for receiving, as input signals, a plurality of speech feature parameters and a plurality of phoneme symbol series signals;
a forward module; and
a backward module,
wherein said forward module has a forward-in-time feedback connection, and generates and outputs to said intermediate layer a plurality of parameters of a timing delayed by a predetermined unit time from a plurality of parameters outputted from said first input neuron group, based on a plurality of speech feature parameters and a plurality of phoneme symbol series signals, and
wherein said backward module has a backward-in-time feedback connection, and generates and outputs to said intermediate layer a plurality of parameters of a timing inversely delayed by a predetermined unit timing from a plurality of parameters outputted from said first input neuron group, based on a plurality of speech feature parameters.
Dependent claims: 2, 3, 4
5. An apparatus for calculating a posteriori probabilities of phoneme symbols, comprising:
feature extracting means for extracting speech feature parameters from a speech signal of an uttered speech sentence composed of an inputted character series; and
calculating means for calculating an a posteriori probability of a phoneme symbol of the speech signal, by using a bidirectional recurrent neural network,
wherein said bidirectional recurrent neural network comprises:
an input layer for receiving, as input signals, the speech feature parameters extracted by the feature extracting means and a plurality of hypothetical phoneme symbol series signals;
an intermediate layer of at least one layer, having a plurality of units; and
an output layer for outputting an a posteriori probability of each phoneme symbol,
wherein said input layer comprises:
a first input neuron group having a plurality of units, for receiving, as input signals, a plurality of speech feature parameters and a plurality of phoneme symbol series signals;
a forward module; and
a backward module,
wherein the forward module has a forward-in-time feedback connection, and generates and outputs to said intermediate layer a plurality of parameters of a timing delayed by a predetermined unit timing from a plurality of parameters outputted from said first input neuron group, based on a plurality of speech feature parameters, and
wherein said backward module has a backward-in-time feedback connection, and generates and outputs to said intermediate layer a plurality of parameters of a timing inversely delayed by a predetermined unit timing from a plurality of parameters outputted from said first input neuron group, based on a plurality of speech feature parameters and a plurality of phoneme symbol series signals.
Dependent claims: 6, 7, 8
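Claims 1 and 5 differ in which module operates on the phoneme symbol series signals: the forward module in claim 1, the backward module in claim 5. The sketch below shows one way that routing could be expressed. It is an illustration under stated assumptions only: the one-hot encoding, the per-frame alignment of phoneme symbols with feature vectors, and the names `one_hot` and `module_inputs` are hypothetical and do not come from the claims.

```python
def one_hot(index, size):
    """Encode a phoneme symbol index as a one-hot vector (assumed encoding)."""
    v = [0.0] * size
    v[index] = 1.0
    return v

def module_inputs(features, phoneme_ids, n_phonemes, phonemes_to="forward"):
    """Return (forward_inputs, backward_inputs), one vector per frame.

    phonemes_to="forward"  mirrors the claim 1 arrangement;
    phonemes_to="backward" mirrors the claim 5 arrangement.
    """
    if phonemes_to not in ("forward", "backward"):
        raise ValueError("phonemes_to must be 'forward' or 'backward'")
    if len(features) != len(phoneme_ids):
        raise ValueError("expected one phoneme symbol per feature frame")
    # Frames augmented with the hypothetical phoneme symbol for that frame.
    with_symbols = [list(x) + one_hot(p, n_phonemes)
                    for x, p in zip(features, phoneme_ids)]
    # Frames carrying speech feature parameters only.
    features_only = [list(x) for x in features]
    if phonemes_to == "forward":
        return with_symbols, features_only
    return features_only, with_symbols

# Hypothetical example: two frames of two feature parameters, three symbols.
feats = [[0.5, 0.1], [0.2, 0.9]]
fwd_in, bwd_in = module_inputs(feats, [2, 0], 3, phonemes_to="forward")
```

Switching `phonemes_to` to `"backward"` swaps the two returned sequences, which is the only difference this sketch models between the two claims.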
9. A speech recognition apparatus comprising:
feature extracting means for extracting speech feature parameters from a speech signal of an uttered speech sentence composed of an inputted character series;
calculating means for calculating an a posteriori probability of a phoneme symbol of the speech signal, by using a bidirectional recurrent neural network; and
speech recognition means for selecting, as a detected phoneme, a phoneme symbol having the greatest a posteriori probability out of phoneme symbols having a posteriori probabilities calculated by said calculating means, based on the feature parameters extracted by the feature extracting means, thereby performing speech recognition on the speech signal,
wherein said bidirectional recurrent neural network comprises:
an input layer for receiving, as input signals, the speech feature parameters extracted by the feature extracting means and a plurality of hypothetical phoneme symbol series signals;
an intermediate layer of at least one layer having a plurality of units; and
an output layer for outputting an a posteriori probability of each phoneme symbol,
wherein said input layer comprises:
a first input neuron group having a plurality of units, for receiving, as input signals, a plurality of speech feature parameters and a plurality of phoneme symbol series signals;
a forward module; and
a backward module,
wherein said forward module has a forward-in-time feedback connection, and generates and outputs to said intermediate layer a plurality of parameters of a timing delayed by a predetermined unit time from a plurality of parameters outputted from said first input neuron group, based on a plurality of speech feature parameters and a plurality of phoneme symbol series signals, and
wherein said backward module has a backward-in-time feedback connection, and generates and outputs to said intermediate layer a plurality of parameters of a timing inversely delayed by a predetermined unit timing from a plurality of parameters outputted from said first input neuron group, based on a plurality of speech feature parameters.
Dependent claims: 10, 13, 14
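The selection step recited for the speech recognition means, choosing the phoneme symbol with the greatest a posteriori probability, amounts to an argmax over the calculated probabilities. A minimal sketch follows, assuming the probabilities arrive as one list per frame and that selection is done frame by frame; the function name `detect_phonemes` and the symbol inventory are hypothetical.

```python
def detect_phonemes(posteriors, symbols):
    """Pick, for each frame, the symbol with the greatest posterior probability.

    posteriors: list of per-frame probability lists (assumed layout).
    symbols:    phoneme symbols in the same order as the probabilities.
    """
    detected = []
    for frame in posteriors:
        if len(frame) != len(symbols):
            raise ValueError("one probability per phoneme symbol is expected")
        # Index of the largest probability; ties resolve to the first index.
        best = max(range(len(frame)), key=frame.__getitem__)
        detected.append(symbols[best])
    return detected

# Hypothetical three-symbol inventory and two frames of probabilities.
result = detect_phonemes([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]], ["a", "k", "s"])
```

Here `result` holds one detected symbol per frame; how frame-level decisions are merged into a phoneme sequence is outside what this sketch covers.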
11. A speech recognition apparatus comprising:
feature extracting means for extracting speech feature parameters from a speech signal of an uttered speech sentence composed of an inputted character series;
calculating means for calculating an a posteriori probability of a phoneme symbol of the speech signal, by using a bidirectional recurrent neural network; and
speech recognition means for selecting, as a detected phoneme, a phoneme symbol having the greatest a posteriori probability out of phoneme symbols having a posteriori probabilities calculated by said calculating means, based on the feature parameters extracted by the feature extracting means, thereby performing speech recognition on the speech signal,
wherein said bidirectional recurrent neural network comprises:
an input layer for receiving, as input signals, the speech feature parameters extracted by the feature extracting means and a plurality of hypothetical phoneme symbol series signals;
an intermediate layer of at least one layer, having a plurality of units; and
an output layer for outputting an a posteriori probability of each phoneme symbol,
wherein said input layer comprises:
a first input neuron group having a plurality of units, for receiving, as input signals, a plurality of speech feature parameters and a plurality of phoneme symbol series signals;
a forward module; and
a backward module,
wherein the forward module has a forward-in-time feedback connection, and generates and outputs to said intermediate layer a plurality of parameters of a timing delayed by a predetermined unit timing from a plurality of parameters outputted from said first input neuron group, based on a plurality of speech feature parameters, and
wherein said backward module has a backward-in-time feedback connection, and generates and outputs to said intermediate layer a plurality of parameters of a timing inversely delayed by a predetermined unit timing from a plurality of parameters outputted from said first input neuron group, based on a plurality of speech feature parameters and a plurality of phoneme symbol series signals.
Dependent claims: 12, 15, 16
Specification