Apparatus and Method for Speech Recognition
Abstract
A speech recognition apparatus includes a first storing unit configured to store a first acoustic model invariable regardless of speaker and environment, a second storing unit configured to store a classification model that has shared parameters and non-shared parameters with the first acoustic model to classify second acoustic models into groups, a recognizing unit configured to calculate a first likelihood with regard to input speech by applying the first acoustic model to the input speech and obtain a calculation result on the shared parameters and a plurality of candidate words that have relatively large values as the first likelihood, and a calculating unit configured to calculate a second likelihood for each of the groups with regard to the input speech by use of the calculation result on the shared parameters and the non-shared parameters of the classification model.
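To make the data flow concrete, here is a minimal Python sketch under simplifying assumptions that do not come from the patent: every word and every group is modeled by a single mixture distribution over one shared codebook of diagonal-covariance Gaussians, features are plain lists of floats, and the function names (`first_pass`, `mixture_loglik`, and so on) are hypothetical. It illustrates why shared parameters help: the Gaussian densities evaluated in the first pass (the "calculation result on the shared parameters") are cached, and the second likelihood for each group only recombines those cached values with that group's non-shared weights.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def mixture_loglik(cache, weights):
    """Log-likelihood of all frames under a mixture whose Gaussians are already
    evaluated in `cache`; only the (strictly positive) weights vary."""
    return sum(logsumexp([math.log(w) + c for w, c in zip(weights, row)])
               for row in cache)

def first_pass(frames, codebook, word_weights, group_weights, n_best=2):
    # Shared computation: evaluate every codebook Gaussian once per frame.
    cache = [[log_gauss_diag(x, m, v) for m, v in codebook] for x in frames]
    # First likelihood for each word; keep the n_best highest as candidates.
    first = {w: mixture_loglik(cache, ws) for w, ws in word_weights.items()}
    candidates = sorted(first, key=first.get, reverse=True)[:n_best]
    # Second likelihood for each group, reusing the cache (no new Gaussian evaluations).
    second = {g: mixture_loglik(cache, ws) for g, ws in group_weights.items()}
    return candidates, second

# Toy 1-D example with a two-Gaussian shared codebook (hypothetical values).
codebook = [([0.0], [1.0]), ([3.0], [1.0])]   # (mean, variance) pairs
frames = [[0.1], [0.2], [2.9]]
word_weights = {"yes": [0.7, 0.3], "no": [0.2, 0.8], "maybe": [0.5, 0.5]}
group_weights = {"speaker_A": [0.9, 0.1], "speaker_B": [0.1, 0.9]}
candidates, second = first_pass(frames, codebook, word_weights, group_weights)
best_group = max(second, key=second.get)
```

With these toy numbers the candidates are `yes` and `maybe`, and `speaker_A` receives the larger second likelihood because its weights favor the Gaussian near 0.0, where two of the three frames lie.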
12 Claims
1. A speech recognition apparatus comprising:
a first storing unit configured to store a first acoustic model invariable regardless of speaker and environment;
a second storing unit configured to store second acoustic models that vary in accordance with at least either one of a specific speaker and a specific environment;
a third storing unit configured to store a classification model that has shared parameters with the first acoustic model and non-shared parameters with the first acoustic model to classify the second acoustic models into different groups;
a first recognizing unit configured to calculate a first likelihood with regard to the input speech by applying the first acoustic model to the input speech and obtain calculation result on the shared parameter and a plurality of candidate words that have relatively large values as the first likelihood;
a calculating unit configured to calculate a second likelihood for each of the groups with regard to the input speech by use of the calculation result on the shared parameters and the non-shared parameters of the classification model;
a selecting unit configured to select a group that has a largest value for the second likelihood from among the groups; and
a second recognizing unit configured to recognize the input speech by applying a second acoustic model that belongs to the selected group onto the input speech to calculate a third likelihood for each of the candidate words and obtain as the recognition result a candidate word that has a largest value as the third likelihood.
(Dependent claims: 2, 3, 4, 7)
5. A speech recognition apparatus comprising:
a first storing unit configured to store a first acoustic model invariable regardless of speaker and environment;
a second storing unit configured to store second acoustic models that vary in accordance with at least either one of a specific speaker and a specific environment;
a third storing unit configured to store a classification model to classify the second acoustic models into different groups, the classification model being a hidden Markov model that has, as an output probability distribution, a mixed normal distribution in which a plurality of normal distributions are weighted in accordance with weighting factors and combined together, having a structure, a state transition probability, and a mean vector and variance-covariance vector of all output probability distributions as shared parameters with the first acoustic model, and weighting factors of distributions as non-shared parameters with the first acoustic model;
a first recognizing unit configured to recognize input speech by applying the first acoustic model and the non-shared parameters of the classification model to the input speech to obtain a plurality of candidate words having relatively high first likelihoods with respect to the input speech and second likelihood for the groups of the input speech;
a selecting unit configured to select a group that has a largest value for the second likelihood from among the groups; and
a second recognizing unit configured to recognize the input speech by applying a second acoustic model that belongs to the selected group onto the input speech to calculate a third likelihood for each of the candidate words and obtain as the recognition result a candidate word that has a largest value for the third likelihood.
(Dependent claim: 6)
8. A speech recognition method comprising:
storing a first acoustic model invariable regardless of speaker and environment;
storing second acoustic models that vary in accordance with at least either one of a specific speaker and a specific environment;
storing a classification model that has shared parameters with the first acoustic model and non-shared parameters with the first acoustic model to classify the second acoustic models into different groups;
calculating a first likelihood with regard to the input speech by applying the first acoustic model to the input speech and obtaining calculation result on the shared parameter and a plurality of candidate words that have relatively large values as the first likelihood;
calculating a second likelihood for each of the groups with regard to the input speech by use of the calculation result on the shared parameters and the non-shared parameters of the classification model;
selecting a group that has a largest value for the second likelihood from among the groups; and
recognizing the input speech by applying a second acoustic model that belongs to the selected group onto the input speech to calculate a third likelihood for each of the candidate words and obtain as the recognition result a candidate word that has a largest value as the third likelihood.
(Dependent claims: 9, 10, 11)
12. A computer readable storage medium storing instructions of a computer program which when executed by a computer results in performance of steps comprising:
storing a first acoustic model invariable regardless of speaker and environment;
storing second acoustic models that vary in accordance with at least either one of a specific speaker and a specific environment;
storing a classification model that has shared parameters with the first acoustic model and non-shared parameters with the first acoustic model to classify the second acoustic models into different groups;
calculating a first likelihood with regard to the input speech by applying the first acoustic model to the input speech and obtaining calculation result on the shared parameter and a plurality of candidate words that have relatively large values as the first likelihood;
calculating a second likelihood for each of the groups with regard to the input speech by use of the calculation result on the shared parameters and the non-shared parameters of the classification model;
selecting a group that has a largest value for the second likelihood from among the groups; and
recognizing the input speech by applying a second acoustic model that belongs to the selected group onto the input speech to calculate a third likelihood for each of the candidate words and obtain as the recognition result a candidate word that has a largest value as the third likelihood.
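Claim 5 describes one form of the classification model: a hidden Markov model with mixed normal output distributions. Its structure, transition probabilities, means and covariances are shared with the first acoustic model, and only the mixture weighting factors differ. The Python sketch below illustrates that arrangement together with the group-selection and final-selection steps common to claims 1, 5, 8 and 12. It is a hedged illustration under stated assumptions, not the patent's implementation. The assumptions are: diagonal covariances; one codebook of Gaussians shared by all states (a tied-mixture simplification of "all output probability distributions"); toy one-dimensional features; hypothetical function names (`forward_loglik`, `select_group`, `pick_result`); and third likelihoods supplied as illustrative numbers rather than computed by real second acoustic models.

```python
import math

def safe_log(p):
    """log(p), mapping zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf

def logsumexp(values):
    """Numerically stable log(sum(exp(v))); returns -inf if every term is -inf."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def forward_loglik(cache, log_init, log_trans, state_weights):
    """HMM forward algorithm in the log domain.
    cache[t][k]         : log density of shared Gaussian k at frame t (computed once)
    log_init, log_trans : shared initial / transition log-probabilities
    state_weights[s][k] : weight of Gaussian k in state s (the non-shared part)"""
    n = len(log_init)

    def emit(t, s):
        return logsumexp([safe_log(w) + c for w, c in zip(state_weights[s], cache[t])])

    alpha = [log_init[s] + emit(0, s) for s in range(n)]
    for t in range(1, len(cache)):
        alpha = [logsumexp([alpha[r] + log_trans[r][s] for r in range(n)]) + emit(t, s)
                 for s in range(n)]
    return logsumexp(alpha)

def select_group(frames, codebook, init, trans, group_weights):
    """Second likelihood per group; Gaussians and transitions are evaluated once."""
    cache = [[log_gauss_diag(x, m, v) for m, v in codebook] for x in frames]
    log_init = [safe_log(p) for p in init]
    log_trans = [[safe_log(p) for p in row] for row in trans]
    scores = {g: forward_loglik(cache, log_init, log_trans, w)
              for g, w in group_weights.items()}
    return max(scores, key=scores.get), scores

def pick_result(candidates, third_likelihood):
    """Recognition result: the candidate word with the largest third likelihood."""
    return max(candidates, key=lambda word: third_likelihood[word])

# Toy example: 2-state left-to-right HMM over 1-D features (hypothetical values).
codebook = [([0.0], [1.0]), ([3.0], [1.0])]   # shared (mean, variance) pairs
init = [1.0, 0.0]                             # shared initial probabilities
trans = [[0.5, 0.5], [0.0, 1.0]]              # shared transition probabilities
group_weights = {
    "A": [[0.9, 0.1], [0.1, 0.9]],  # state 0 favors 0.0, state 1 favors 3.0
    "B": [[0.1, 0.9], [0.9, 0.1]],  # the reverse assignment
}
frames = [[0.1], [0.2], [2.9]]
group, scores = select_group(frames, codebook, init, trans, group_weights)
result = pick_result(["yes", "maybe"], {"yes": -5.0, "maybe": -4.2})
```

Group `A` wins because its per-state weights match the low-then-high shape of the frames. Switching groups changes only `state_weights`, so the cached Gaussian densities and the shared transition matrix are reused across all groups. `pick_result` returns `maybe` for the illustrative third-likelihood values.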
Specification