Speech recognition apparatus

US 7,437,288 B2
Filed: 03/11/2002
Issued: 10/14/2008
Est. Priority Date: 03/13/2001
Status: Active Grant

Abstract

A speech recognition apparatus using a probability model that employs a mixed distribution, the apparatus formed by a standard pattern storage means for storing a standard pattern; a recognition means for outputting recognition results corresponding to an input speech by using the standard pattern; a standard pattern generating means for inputting learning speech and generating the standard pattern; and a standard pattern adjustment means, provided between the standard pattern generating means and the standard pattern storage means, for adjusting the number of element distributions of the mixed distribution of the standard pattern.

14 Claims

1. A speech recognition apparatus using a probability model that employs a mixed distribution, said apparatus comprising:
- standard pattern storage means for storing a standard pattern of a Hidden Markov Model (HMM) having a plurality of states;
  
  recognition means for outputting recognition results corresponding to an input speech by using said standard pattern;
  
  standard pattern generating means for inputting learning speech and generating said standard pattern; and
  
  standard pattern adjustment means, provided between said standard pattern generating means and said standard pattern storage means, for adjusting the number of element distributions of said mixed distribution of said standard pattern;
  
  wherein said standard pattern adjustment means comprising;
  
  tree structure generating means for generating a tree structure of element distributions for each state of the HMM, and
  
  element distribution selection means for adjusting the number of element distributions of said mixed distribution of said standard pattern for each state of the HMM by selecting element distributions to leaves in said standard pattern by using said tree structure generated by said tree structure generating means after generation of said tree structure,
  
  wherein the standard pattern adjustment means calculates a description length for all possible cuts that can be made on the tree structure of element distributions, and wherein the standard pattern adjustment means selects one of the cuts having a minimum description length, in order to divide the tree structure of element distributions into a top part and a bottom part, separated by the one of the cuts.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
- - 2. The speech recognition apparatus according to claim 1, wherein said standard pattern adjustment means having a minimax distribution selection means for selecting element distributions by using a minimax method.
  - 3. The speech recognition apparatus according to claim 1, wherein said element distribution selection means uses an amount of learning data corresponding to each element distribution as a criterion in selection of element distributions.
  - 4. The speech recognition apparatus according to claim 1, wherein said element distribution selection means uses a minimum description length as a criterion in selection of element distributions.
  - 5. The speech recognition apparatus according to claim 4, wherein said minimum description length is computed by said element distribution selection means by computing an equation that corresponds to a sum of:
    - a) a logarithmic likelihood with respect to data, b) a logarithmic value corresponding to a complexity of a selected model, and c) a logarithmic description length necessary for the selected model.
  - 6. The speech recognition apparatus according to claim 5, wherein said minimum description length is computed according to the following equation:
    - $I_{MDL}(i) = -\log P_{\hat{\theta}(i)}(X^{N}) + \frac{\alpha_{i}}{2} \log N + \log I,$ wherein $\alpha_{i}$ is the dimension of the model $i$, and $\hat{\theta}(i)$ is the likelihood prediction value of a free parameter $\theta^{(i)} = (\theta_{1}^{(i)}, \ldots, \theta_{\alpha_{i}}^{(i)})$ of a model $i$ predicted using data $X^{N}$.
  - 7. The speech recognition apparatus according to claim 1, wherein said element distribution selection means uses an Akaike information criterion as a criterion in selection of element distributions.
  - 8. The speech recognition apparatus according to claim 1, wherein said tree structure generating means uses divergence as an inter-distribution distance.
  - 9. The speech recognition apparatus according to claim 1, wherein said tree structure generating means uses a likelihood with respect to learning data as an inter-distribution distance.
  - 10. The speech recognition apparatus according to claim 1, wherein a hidden Markov model is used as said probability model.
  - 11. A speech recognition apparatus according to claim 1, wherein the tree structure generated by said tree structure generating means includes a root node representing a single distribution and leaf nodes representing each of the element distributions, and said element distribution selection means comprises:
    - variance determining means for detecting a variance of distributions at each node of the tree structure, wherein the variance of distributions is determined from an occupying frequency and Gaussian distribution parameters of distributions of all leaves which govern the variance.
  - 12. The speech recognition apparatus according to claim 1, wherein a bifurcated tree is obtained due to the dividing of the tree structure of element distributions, wherein the standard pattern adjustment means determines an amount of change in distribution length at a time of expansion from a root node to child nodes, and wherein when the amount of change in distribution length is greater than zero, no expansion of parent nodes in the tree structure is performed, and wherein when the amount of change in distribution length is less than zero, expansion of the parent nodes in the tree structure is performed.
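
The description length of claim 6 can be illustrated with a minimal Python sketch. It evaluates the three terms of the equation for each candidate model and picks the smallest total. The function names, the candidate tuples, and all numeric values are assumptions made up for illustration; they do not come from the patent.

```python
import math


def description_length(log_likelihood, num_params, num_samples, num_models):
    """Evaluate -log P(X^N) + (alpha_i / 2) * log N + log I for one model.

    log_likelihood: log P(X^N) under the fitted model i
    num_params:     alpha_i, the dimension (free parameter count) of model i
    num_samples:    N, the amount of data
    num_models:     I, the number of candidate models
    """
    return (-log_likelihood
            + 0.5 * num_params * math.log(num_samples)
            + math.log(num_models))


def select_min_description_length(candidates, num_samples):
    """Return (best_name, best_length) over (name, log_likelihood, num_params) tuples."""
    num_models = len(candidates)
    scored = [
        (description_length(ll, k, num_samples, num_models), name)
        for name, ll, k in candidates
    ]
    best_length, best_name = min(scored)
    return best_name, best_length


# Hypothetical candidates: invented log-likelihoods and parameter counts.
candidates = [
    ("1 distribution", -5200.0, 4),
    ("2 distributions", -5100.0, 9),
    ("4 distributions", -5090.0, 19),
]
best_name, best_length = select_min_description_length(candidates, num_samples=1000)
print(best_name, round(best_length, 2))
```

With these invented numbers, the larger model fits the data slightly better, but its extra parameters cost more in the second term than the fit gains in the first, which is the trade-off the criterion is meant to capture.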

13. A method of controlling a speech recognition apparatus using a probability model that employs a mixed distribution, comprising the steps of:
- storing a standard pattern of a Hidden Markov Model (HMM) having a plurality of states by using a standard pattern storage means;
  
  outputting recognition results corresponding to an input speech with said standard pattern by using a recognition means;
  
  inputting learning speech and generating said standard pattern by using a standard pattern generating means; and
  
  adjusting a number of element distributions of said mixed distribution of said standard pattern by using a standard pattern adjustment means that is provided between said standard pattern generating means and said standard pattern storage means;
  
  generating a tree structure of element distributions for each state of the HMM by using a tree structure generating means; and
  
  wherein the adjusting step comprises adjusting the number of element distributions of said mixed distribution of said standard pattern for each state of the HMM by selecting element distributions to leaves in said standard pattern by using said tree structure generated by said tree structure generating means after generation of said tree structure, wherein the adjusting step further comprises;
  
  calculating a description length for all possible cuts that can be made on the tree structure of element distributions;
  
  selecting one of the cuts having a minimum description length; and
  
  dividing the tree structure of element distributions into a top part and a bottom part, separated by the one of the cuts.
- Dependent Claims (14)
- - 14. The method according to claim 13, further comprising:
    - obtaining a bifurcated tree by dividing of the tree structure of element distributions;
      
      determining an amount of change in distribution length at a time of expansion from a root node to child nodes;
      
      when the amount of change in distribution length is greater than zero, not expanding parent nodes in the tree structure; and
      
      when the amount of change in distribution length is less than zero, expanding the parent nodes in the tree structure.
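
The expansion rule of claim 14 can be sketched as a top-down walk over a binary tree. The sketch below is only an illustration under stated assumptions: the `Node` class, the precomputed `delta` field (the change in length if a node is replaced by its two children), and the example values are all hypothetical, and how the change is actually computed is not shown. The claim leaves the case of a change equal to zero unspecified; this sketch treats it as "do not expand".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    # Hypothetical, precomputed change in length when this node is
    # expanded into its children (negative means expansion helps).
    delta: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def select_cut(node: Node) -> List[str]:
    """Return the names of the nodes at which expansion stops."""
    is_leaf = node.left is None or node.right is None
    # Change greater than zero: no expansion (zero is treated the same
    # way here, which is an assumption of this sketch).
    if is_leaf or node.delta >= 0:
        return [node.name]
    # Change less than zero: expand the parent into its child nodes.
    return select_cut(node.left) + select_cut(node.right)


# Invented example tree with made-up delta values.
tree = Node(
    "root", delta=-3.0,
    left=Node("A", delta=2.0, left=Node("a1"), right=Node("a2")),
    right=Node("B", delta=-1.0, left=Node("b1"), right=Node("b2")),
)
cut = select_cut(tree)
print(cut)
```

The returned nodes form the boundary between the top part and the bottom part of the tree; each node on the boundary would stand for one element distribution kept in the adjusted standard pattern.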

Current Assignee: NEC Corporation
Original Assignee: NEC Corporation
Inventors: Shinoda, Koichi
Primary Examiner(s): Dorvil; Richemond
Assistant Examiner(s): Han; Qi
Application Number: US10/093,915
Publication Number: US 20020184020A1
Time in Patent Office: 2,409 Days
Field of Search: 704/62, 704/257, 704/243, 704/254, 704/238, 704/240, 704/231, 704/256, 704/256.1, 704/256.2, 704/256.3, 704/256.4, 704/256.5, 704/256.6
US Class Current: 704/240
CPC Class Codes:
G10L 15/07 to the speaker
G10L 15/144 Training of HMMs
