Technique for developing discriminative sound units for speech recognition and allophone modeling

US 6,711,541 B1
Filed: 09/07/1999
Issued: 03/23/2004
Est. Priority Date: 09/07/1999
Status: Expired due to Term

Abstract

A set of models is developed to represent sound units and these models are then used with the incorrect sound units to determine which generate high likelihood scores. The models generating high likelihood scores for the incorrect sound units represent those that are more likely to be confused. The resulting confusability data may then be used in generating more discriminative speech models and in subsequent pruning of the acoustic decision tree. The confusability data may also be used to develop confusability predictors used for rejection during search and in developing continuous speech recognition models that are optimized to minimize confusability.
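The cross-scoring idea in the abstract can be sketched as follows. This is a minimal illustration, not the patented procedure: the 1-D Gaussian "models", the unit labels (`ih`, `iy`, `aa`), the sample values, and the function names `confusability_data` and `most_confusable` are all assumptions made for the example. Each unit's labelled samples are scored against the models of every other unit, and the wrong pairings with the highest average log-likelihood are treated as the most confusable.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-likelihood of scalar x under a 1-D Gaussian (toy stand-in for an acoustic model)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def confusability_data(models, samples):
    """Score each unit's samples against the models of all *other* units.

    models:  {unit: (mean, var)}   hypothetical toy models
    samples: {unit: [x, ...]}      labelled observations
    Returns {(true_unit, wrong_unit): average log-likelihood}.
    """
    scores = {}
    for true_unit, xs in samples.items():
        for other_unit, (mean, var) in models.items():
            if other_unit == true_unit:
                continue  # only incorrect pairings are of interest here
            total = sum(gaussian_loglik(x, mean, var) for x in xs)
            scores[(true_unit, other_unit)] = total / len(xs)
    return scores

def most_confusable(scores, top_n=1):
    """Return the wrong pairings with the highest average log-likelihood."""
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical data: "ih" and "iy" overlap heavily, "aa" is far from both.
models = {"ih": (0.0, 1.0), "iy": (0.5, 1.0), "aa": (5.0, 1.0)}
samples = {
    "ih": [-0.2, 0.1, 0.3],
    "iy": [0.4, 0.6, 0.7],
    "aa": [4.8, 5.1, 5.3],
}
scores = confusability_data(models, samples)
top_pairs = most_confusable(scores, top_n=2)
print(top_pairs)
```

In this toy setup the two overlapping units surface as the top confusable pairs, which is the kind of signal the abstract describes feeding back into model building and tree pruning.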

13 Claims

1. A method for training a set of speech model decision trees comprising:
- a) constructing a first set of decision trees for a plurality of sound units such that each decision tree references at least two speech models;
  
  b) testing speech models associated with the first set of decision trees to identify sound units that are confused by said speech models and thereby generating a set of confusability data; and
  
  c) constructing a second set of decision trees for said plurality of sound units by using confusability data generated by said step(b) from said first set of decision trees as an input to a tree growing algorithm to select questions such that the probability that a speech model for a first sound unit is confused by a second sound unit is minimized, wherein said first and second set of decision trees being adapted to receive a frame input and generate a probability of said frame input corresponding to a given one of said sound units.
- - 2. The method of claim 1 wherein among said first and second set of decision trees only said first set of decision trees is constructed by selecting questions for a given sound unit such that the maximum likelihood criterion, the entropy criterion, or the Gini criterion is satisfied.
  - 3. The method of claim 1 wherein said testing step is performed by supplying an example of speech to the speech models for at least one sound unit that is known not to represent said speech example and determining the likelihood that said speech models would generate said example of speech.
  - 4. The method of claim 1 further comprising iteratively repeating steps b) and c).
  - 5. The method of claim 1 wherein said sound units are phonemes and wherein said speech models represent different allophones of a given phoneme.
  - 6. The method of claim 1 wherein said second set of decision trees is constructed such that greater discrimination among sound units is provided for those sound units that have a predetermined high probability of confusing said speech models.
  - 7. The method of claim 1 wherein said second set of decision trees is constructed using information about a first sound unit in constructing the decision tree for a second sound unit.
  - 8. The method of claim 1 wherein said sound units are phoneme Hidden Markov Model (HMM) states.
  - 9. The method of claim 1 wherein the confusability data is indicative of one or more speech models and sound units that are confused by the one or more speech models.

10. A method for training a set of speech model decision trees, comprising:
- constructing a first set of decision trees for a plurality of sound units such that each decision tree references at least two speech models;
  
  submitting acoustic speech samples for known sound units to speech models of other sound units, where the speech models are associated with the first set of decision trees;
  
  identifying sound units that are confused by said speech models, thereby generating a set of confusability data; and
  
  constructing a second set of decision trees for said plurality of sound units using said confusability data and the first set of decision trees generated by said step of identifying as an input to a tree growing algorithm to select questions such that the probability that a speech model for a given sound unit is confused by a different sound unit is minimized, wherein said first and second set of decision trees being adapted to receive a frame input and generate a probability of said frame input corresponding to a given one of said sound units.
- - 12. The method of claim 10 further comprises calculating a confusion coefficient, C(Y(U)), as follows: $C (Y (U)) = (1 / p (U)) \sum_{U' \neq U} p (Y | U) \, p (U')$.
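The coefficient in claim 12 can be illustrated with a small sketch. One assumption needs flagging loudly: the claim as printed writes p(Y|U) inside the sum, whereas this sketch uses the likelihood under each competing unit U′, p(Y|U′), on the reading that the sum is meant to collect the score the observation receives from the wrong models. The function name `confusion_coefficient`, the dictionaries, and the numbers are hypothetical.

```python
def confusion_coefficient(unit, likelihoods, priors):
    """Sketch of C(Y(U)) = (1 / p(U)) * sum over U' != U of p(Y | U') * p(U').

    ASSUMPTION: the term inside the sum is read as p(Y | U'), the likelihood
    under each competing unit; the claim text as printed shows p(Y | U).

    likelihoods: {unit: p(Y | unit)} for one acoustic observation Y
    priors:      {unit: p(unit)}
    """
    competing = sum(
        likelihoods[other] * priors[other]
        for other in priors
        if other != unit
    )
    return competing / priors[unit]

# Hypothetical values for a single observation Y whose correct unit is "ih".
priors = {"ih": 0.5, "iy": 0.25, "aa": 0.25}
likelihoods = {"ih": 0.4, "iy": 0.2, "aa": 0.04}
c_ih = confusion_coefficient("ih", likelihoods, priors)
print(c_ih)
```

Under this reading, the coefficient grows when competing models assign high likelihood to the observation, and shrinks when only the correct model does.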

11. A method for training a set of speech model decision trees comprising:
- a) constructing a first set of decision trees for a plurality of sound units such that each decision tree references at least two speech models;
  
  b) testing speech models associated with the first set of decision trees to identify sound units that are confused by said speech models and thereby generating a set of confusability data; and
  
  c) constructing a second set of decision trees for said plurality of sound units by using confusability data as an input to a tree growing algorithm to select questions such that the probability that a speech model for a first sound unit is confused by a second sound unit is minimized, wherein the confusability data is indicative of one or more speech models and sound units that are confused by the one or more speech models, selecting questions for the second set of decision trees by maximizing the criterion $\prod_{Y} p (Y | U) / [p (Y | U) + C (Y (U))]$
  
  where p(Y|U) is the probability of an acoustic observation, Y, given its speech model, U, and C(Y(U)) is a measure that the acoustic observation, Y, belonging to its speech model, U, is misrecognized.
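The selection criterion in claim 11 can be sketched as a scoring function over candidate questions. This is an illustration under stated assumptions, not the claimed implementation: the candidate question strings, the (p, C) pairs, and the names `log_criterion` and `pick_question` are hypothetical, and the product is evaluated in the log domain purely for numerical convenience. Each observation contributes the ratio p(Y|U) / [p(Y|U) + C(Y(U))], which approaches 1 when the misrecognition measure C is small.

```python
import math

def log_criterion(observations):
    """Log of the product over Y of p(Y|U) / (p(Y|U) + C(Y(U))).

    observations: iterable of (p_y_given_u, c) pairs, one per observation Y.
    """
    return sum(math.log(p / (p + c)) for p, c in observations)

def pick_question(candidates):
    """Return the candidate question whose split maximizes the criterion.

    candidates: {question: [(p_y_given_u, c), ...]} where the pairs are the
    values obtained after splitting on that question (hypothetical here).
    """
    return max(candidates, key=lambda q: log_criterion(candidates[q]))

# Hypothetical candidates: the second split leaves less confusable models.
candidates = {
    "left context is a vowel?": [(0.4, 0.12), (0.3, 0.10)],
    "right context is a nasal?": [(0.4, 0.05), (0.3, 0.02)],
}
best = pick_question(candidates)
print(best)
```

Working in logs leaves the ranking of questions unchanged, because the logarithm is monotonic, while avoiding underflow when the product runs over many observations.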

13. A method for training a set of speech model decision trees, comprising:
- constructing a first set of decision trees for a plurality of sound units, each decision tree corresponds to a given sound unit and references at least two speech models for the given sound unit, such that an acoustic speech sample input to a decision tree generates a likelihood score that the acoustic speech sample is the applicable sound unit;
  
  submitting acoustic speech samples for known sound units to speech models of other sound units;
  
  identifying sound units that are confused by said speech models, thereby generating a set of confusability data; and
  
  constructing a second set of decision trees for said plurality of sound units using said confusability data generated by said step of identifying from said first set of decision trees as an input to a tree growing algorithm to select questions such that the probability that a speech model for a given sound unit is confused by a different sound unit is minimized, wherein said first and second set of decision trees being adapted to receive a frame input and generate a probability of said frame input corresponding to a given one of said sound units.

Current Assignee: Sovereign Peak Ventures, LLC (Dominion Harbor Enterprises, LLC)
Original Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Inventors: Contolini, Matteo; Junqua, Jean-Claude; Kuhn, Roland
Primary Examiner(s): Chawan, Vijay
Assistant Examiner(s): Azad, Abul K.
Application Number: US09/390,434
Time in Patent Office: 1,659 Days
Field of Search: 704/236, 704/237, 704/238, 704/239, 704/240, 704/241, 704/242, 704/243, 704/244, 704/245, 704/254, 704/255, 704/256
US Class Current: 704/242
CPC Class Codes:
G10L 15/063 Training
G10L 2015/025 Phonemes, fenemes or fenone...
