Apparatus and Method for Speech Recognition

US 20080201136A1
Filed: 09/18/2007
Published: 08/21/2008
Est. Priority Date: 02/19/2007
Status: Active Grant

Abstract

A speech recognition apparatus includes a first storing unit configured to store a first acoustic model invariable regardless of speaker and environment, a second storing unit configured to store a classification model that has shared parameters and non-shared parameters with the first acoustic model to classify second acoustic models, a recognizing unit configured to calculate a first likelihood with regard to the input speech by applying the first acoustic model to the input speech and obtain calculation result on the shared parameter and a plurality of candidate words that have relatively large values as the first likelihood, and a calculating unit configured to calculate a second likelihood for each of the groups with regard to the input speech by use of the calculation result on the shared parameters and the non-shared parameters of the classification model.

12 Claims

1. A speech recognition apparatus comprising:
- a first storing unit configured to store a first acoustic model invariable regardless of speaker and environment;
  
  a second storing unit configured to store second acoustic models that vary in accordance with at least either one of a specific speaker and a specific environment;
  
  a third storing unit configured to store a classification model that has shared parameters with the first acoustic model and non-shared parameters with the first acoustic model to classify the second acoustic models into different groups;
  
  a first recognizing unit configured to calculate a first likelihood with regard to the input speech by applying the first acoustic model to the input speech and obtain calculation result on the shared parameter and a plurality of candidate words that have relatively large values as the first likelihood;
  
  a calculating unit configured to calculate a second likelihood for each of the groups with regard to the input speech by use of the calculation result on the shared parameters and the non-shared parameters of the classification model;
  
  a selecting unit configured to select a group that has a largest value for the second likelihood from among the groups; and
  
  a second recognizing unit configured to recognize the input speech by applying a second acoustic model that belongs to the selected group onto the input speech to calculate a third likelihood for each of the candidate words and obtain as the recognition result a candidate word that has a largest value as the third likelihood.
- View Dependent Claims (2, 3, 4, 7)
- - 2. The apparatus according to claim 1, wherein:
    - the first acoustic model and the classification model are hidden Markov models that have, as output probability distributions, mixed normal distributions in which a plurality of normal distributions are weighted in accordance with weighting factors and combined together, and
      
      the shared parameters include some parameters of the output probability distributions.
  - 3. The apparatus according to claim 1, wherein:
    - the first acoustic model and the classification model are hidden Markov models that have, as output probability distributions, mixed normal distributions in which a plurality of normal distributions are weighted in accordance with weighting factors and combined together,
      
      at least one of the output probability distributions has a mean vector and a variance-covariance matrix as the shared parameters; and
      
      the calculating unit multiplies a calculation result obtained from the first speech recognizing unit on a first normal distribution having the shared parameters and a calculation result obtained on a second normal distribution by use of a mean vector and a variance-covariance matrix contained in the non-shared parameters applied on a feature of the input speech, each by the weighting factors, and combines the calculation results into a mixed normal distribution, and multiplies the output of mixed normal distribution by state transition probabilities to obtain the second likelihood.
  - 4. The apparatus according to claim 1, wherein the classification model is a hidden Markov model that has, as an output probability distribution, a mixed normal distribution in which a plurality of normal distributions are weighted in accordance with weighting factors and combined together, has a structure and a state transition probability the same as a structure and a state transition probability of the first acoustic model, and has a mean vector and a variance-covariance matrix as the shared parameters in all output probability distributions.
  - 7. The apparatus according to claim 1, wherein the classification model is a mixed normal distribution model and has the output probability distribution of the first acoustic model and all the distributions as the shared parameters.
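
The mixture computation recited in claim 3 can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the diagonal-covariance simplification, and the single fixed state path are all assumptions made for illustration. The point it shows is that densities already computed in the first pass for the shared normal distributions are reused, and only the non-shared normal distributions are evaluated again.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with diagonal covariance (assumed for brevity)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def classification_output_prob(x, shared_cache, shared_weights, extra_gaussians):
    """Mixed normal distribution output for one state of the classification model.

    shared_cache    -- densities the first pass already computed for shared Gaussians
    shared_weights  -- weighting factors applied to those cached densities
    extra_gaussians -- (weight, mean, var) tuples for the non-shared Gaussians
    """
    total = sum(w * p for w, p in zip(shared_weights, shared_cache))
    total += sum(w * gaussian_pdf(x, m, v) for w, m, v in extra_gaussians)
    return total


def second_likelihood(output_probs, transition_probs):
    """Multiply per-frame mixture outputs by transition probabilities along one
    fixed state path (hypothetical simplification; no path search is done here)."""
    likelihood = 1.0
    for b, a in zip(output_probs, transition_probs):
        likelihood *= a * b
    return likelihood


# Toy usage: one shared Gaussian (cached) and one non-shared Gaussian.
x = [0.0]
cached = [gaussian_pdf(x, [0.0], [1.0])]  # would come from the first recognizing unit
b = classification_output_prob(x, cached, [0.6], [(0.4, [1.0], [1.0])])
score = second_likelihood([b], [0.5])
```

In a real recognizer these products would normally be accumulated in the log domain to avoid underflow; plain products are used here only to mirror the claim wording.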

5. A speech recognition apparatus comprising:
- a first storing unit configured to store a first acoustic model invariable regardless of speaker and environment;
  
  a second storing unit configured to store second acoustic models that vary in accordance with at least either one of a specific speaker and a specific environment;
  
  a third storing unit configured to store a classification model to classify the second acoustic models into different groups, the classification model being a hidden Markov model that has, as an output probability distribution, a mixed normal distribution in which a plurality of normal distributions are weighted in accordance with weighting factors and combined together, having a structure, a state transition probability, and a mean vector and variance-covariance vector of all output probability distributions as shared parameters with the first acoustic model, and weighting factors of distributions as non-shared parameters with the first acoustic model;
  
  a first recognizing unit configured to recognize input speech by applying the first acoustic model and the non-shared parameters of the classification model to the input speech to obtain a plurality of candidate words having relatively high first likelihoods with respect to the input speech and second likelihood for the groups of the input speech;
  
  a selecting unit configured to select a group that has a largest value for the second likelihood from among the groups; and
  
  a second recognizing unit configured to recognize the input speech by applying a second acoustic model that belongs to the selected group onto the input speech to calculate a third likelihood for each of the candidate words and obtain as the recognition result a candidate word that has a largest value for the third likelihood.
- View Dependent Claims (6)
- - 6. The apparatus according to claim 5, wherein the first speech recognizing unit calculates the output of normal distributions by use of features of the input speech and the shared parameters, multiplies the output of normal distributions by the weighting factors of the first acoustic model to combine, and then multiplies products by state transition probabilities to obtain the first likelihoods, and in parallel, the first speech recognizing unit multiplies weighting factors corresponding to the groups by the normal distributions to combine and then multiplies products by the state transition probabilities to obtain the second likelihood.
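
A rough Python sketch of the parallel computation described in claim 6 follows. It assumes, purely for illustration, a single-state model with one self-loop transition probability and diagonal covariances; the names `frame_scores` and `sequence_likelihoods` are hypothetical. Because every normal distribution is shared, each density is evaluated once per frame and then combined with two kinds of weighting factors: those of the first acoustic model and those of each group.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with diagonal covariance (assumed)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def frame_scores(x, gaussians, base_weights, group_weights):
    """Evaluate the shared Gaussians once, then mix them with each weight set."""
    densities = [gaussian_pdf(x, m, v) for m, v in gaussians]
    first = sum(w * p for w, p in zip(base_weights, densities))
    second = {
        group: sum(w * p for w, p in zip(weights, densities))
        for group, weights in group_weights.items()
    }
    return first, second


def sequence_likelihoods(frames, gaussians, base_weights, group_weights, self_loop):
    """Accumulate first and second likelihoods over all frames in one pass."""
    first = 1.0
    second = {group: 1.0 for group in group_weights}
    for x in frames:
        f, s = frame_scores(x, gaussians, base_weights, group_weights)
        first *= f * self_loop
        for group in second:
            second[group] *= s[group] * self_loop
    return first, second


# Toy usage: two shared Gaussians, two groups that differ only in weights.
gaussians = [([0.0], [1.0]), ([4.0], [1.0])]
groups = {"a": [0.9, 0.1], "b": [0.1, 0.9]}
first, second = sequence_likelihoods(
    [[0.1], [-0.2]], gaussians, [0.5, 0.5], groups, self_loop=0.5
)
best_group = max(second, key=second.get)
```

The design choice worth noting is that group classification adds only a few weighted sums per frame, since no extra Gaussian evaluations are needed.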

8. A speech recognition method comprising:
- storing a first acoustic model invariable regardless of speaker and environment;
  
  storing second acoustic models that vary in accordance with at least either one of a specific speaker and a specific environment;
  
  storing a classification model that has shared parameters with the first acoustic model and non-shared parameters with the first acoustic model to classify the second acoustic models into different groups;
  
  calculating a first likelihood with regard to the input speech by applying the first acoustic model to the input speech and obtaining calculation result on the shared parameter and a plurality of candidate words that have relatively large values as the first likelihood;
  
  calculating a second likelihood for each of the groups with regard to the input speech by use of the calculation result on the shared parameters and the non-shared parameters of the classification model;
  
  selecting a group that has a largest value for the second likelihood from among the groups; and
  
  recognizing the input speech by applying a second acoustic model that belongs to the selected group onto the input speech to calculate a third likelihood for each of the candidate words and obtain as the recognition result a candidate word that has a largest value as the third likelihood.
- View Dependent Claims (9, 10, 11)
- - 9. The method according to claim 8, wherein:
    - the first acoustic model and the classification model are hidden Markov models that have, as output probability distributions, mixed normal distributions in which a plurality of normal distributions are weighted in accordance with weighting factors and combined together, and
      
      the shared parameters include some parameters of the output probability distributions.
  - 10. The method according to claim 8, wherein:
    - the first acoustic model and the classification model are hidden Markov models that have, as output probability distributions, mixed normal distributions in which a plurality of normal distributions are weighted in accordance with weighting factors and combined together,
      
      at least one of the output probability distributions has a mean vector and a variance-covariance matrix as the shared parameters; and
      
      the second likelihood is obtained by multiplying a calculation result of a first normal distribution having the shared parameters and a calculation result obtained on a second normal distribution by use of a mean vector and a variance-covariance matrix contained in the non-shared parameters applied on a feature of the input speech, each by the weighting factors, and combining the calculation results into a mixed normal distribution, and multiplying the output of mixed normal distribution by state transition probabilities.
  - 11. The apparatus according to claim 8, wherein the classification model is a hidden Markov model that has, as an output probability distribution, a mixed normal distribution in which a plurality of normal distributions are weighted in accordance with weighting factors and combined together, has a structure and a state transition probability the same as a structure and a state transition probability of the first acoustic model, and has a mean vector and a variance-covariance matrix as the shared parameters in all output probability distributions.
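
One way to write the second likelihood of claim 10 as a formula is given below. The notation is an assumption introduced here and does not appear in the source: g is a group, j an HMM state, x_t the feature vector at frame t, S_j the set of normal distributions in state j that are shared with the first acoustic model, a the state transition probabilities, and q_0, ..., q_T one state path. The claim speaks of a first and a second normal distribution; the sums generalize that to several of each.

```latex
% Output probability of state j for group g: shared terms reuse first-pass results,
% non-shared terms use the group-specific mean vector and variance-covariance matrix.
b_j^{(g)}(x_t) = \sum_{m \in S_j} w_{jm}^{(g)} \, \mathcal{N}\!\left(x_t;\, \mu_{jm}, \Sigma_{jm}\right)
               + \sum_{m \notin S_j} w_{jm}^{(g)} \, \mathcal{N}\!\left(x_t;\, \mu_{jm}^{(g)}, \Sigma_{jm}^{(g)}\right)

% Second likelihood for group g along the assumed state path q_0, ..., q_T.
L_2(g) = \prod_{t=1}^{T} a_{q_{t-1} q_t} \, b_{q_t}^{(g)}(x_t)
```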

12. A computer readable storage medium storing instructions of a computer program which when executed by a computer results in performance of steps comprising:
- storing a first acoustic model invariable regardless of speaker and environment;
  
  storing second acoustic models that vary in accordance with at least either one of a specific speaker and a specific environment;
  
  storing a classification model that has shared parameters with the first acoustic model and non-shared parameters with the first acoustic model to classify the second acoustic models into different groups;
  
  calculating a first likelihood with regard to the input speech by applying the first acoustic model to the input speech and obtaining calculation result on the shared parameter and a plurality of candidate words that have relatively large values as the first likelihood;
  
  calculating a second likelihood for each of the groups with regard to the input speech by use of the calculation result on the shared parameters and the non-shared parameters of the classification model;
  
  selecting a group that has a largest value for the second likelihood from among the groups; and
  
  recognizing the input speech by applying a second acoustic model that belongs to the selected group onto the input speech to calculate a third likelihood for each of the candidate words and obtain as the recognition result a candidate word that has a largest value as the third likelihood.
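
The sequence of steps recited above can be sketched in Python as follows. This is only an outline under stated assumptions: the acoustic models are stood in for by plain callables, and every name (`recognize`, `first_pass`, `group_scorer`, `group_models`, `n_best`) is hypothetical rather than taken from the patent.

```python
def recognize(speech, first_pass, group_scorer, group_models, n_best=3):
    """Two-pass recognition with group selection between the passes.

    first_pass   -- callable returning (word -> first likelihood, cached shared results)
    group_scorer -- callable (group, speech, cache) -> second likelihood
    group_models -- dict: group -> callable (word, speech) -> third likelihood
    """
    # Step 1: apply the speaker- and environment-independent model and keep
    # the candidate words with the largest first likelihoods.
    word_scores, shared_cache = first_pass(speech)
    candidates = sorted(word_scores, key=word_scores.get, reverse=True)[:n_best]

    # Step 2: score every group, reusing the results on the shared parameters.
    group_scores = {g: group_scorer(g, speech, shared_cache) for g in group_models}

    # Step 3: select the group with the largest second likelihood.
    best_group = max(group_scores, key=group_scores.get)

    # Step 4: rescore only the candidates with the selected group's model.
    second_model = group_models[best_group]
    third = {word: second_model(word, speech) for word in candidates}
    return max(third, key=third.get), best_group


# Toy stand-ins (made-up numbers) to show the control flow.
def toy_first_pass(speech):
    return {"yes": 0.5, "yet": 0.4, "no": 0.1}, {"cached": True}


def toy_group_scorer(group, speech, cache):
    return {"quiet": 0.2, "noisy": 0.7}[group]


toy_group_models = {
    "quiet": lambda word, speech: {"yes": 0.9, "yet": 0.1}.get(word, 0.0),
    "noisy": lambda word, speech: {"yes": 0.3, "yet": 0.6}.get(word, 0.0),
}

result = recognize("audio", toy_first_pass, toy_group_scorer, toy_group_models, n_best=2)
```

Restricting the second pass to the first-pass candidates keeps the cost of applying the group-specific model small, which appears to be the motivation for the two-stage structure.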

Current Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Original Assignee: Kabushiki Kaisha Toshiba (Toshiba Corporation)
Inventors: Fujimura, Hiroshi; Masuko, Takashi
Granted Patent: US 7,921,012 B2
US Class Current: 704/201
CPC Class Codes: G10L 15/32 Multiple recognisers used i...
