Technique for developing discriminative sound units for speech recognition and allophone modeling
Abstract
A set of models is developed to represent sound units, and these models are then applied to the incorrect sound units to determine which models generate high likelihood scores. The models that generate high likelihood scores for incorrect sound units are those more likely to be confused. The resulting confusability data may then be used in generating more discriminative speech models and in subsequent pruning of the acoustic decision tree. The confusability data may also be used to develop confusability predictors used for rejection during search and in developing continuous speech recognition models that are optimized to minimize confusability.
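The confusability test described in the abstract can be illustrated with a minimal sketch. It assumes each sound unit is modeled by a single 1-D Gaussian and that a "high likelihood score" means the wrong unit's model scores at least as well as the correct one, less an optional margin. The names `log_gauss`, `confusability_counts`, and `margin`, as well as the toy models and samples, are hypothetical and do not come from the patent.

```python
import math
from collections import Counter

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian; a stand-in for a real acoustic model score."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def confusability_counts(models, samples, margin=0.0):
    """Count (true_unit, wrong_unit) pairs in which the wrong unit's model
    scores within `margin` of, or above, the correct unit's model."""
    counts = Counter()
    for true_unit, x in samples:
        true_score = log_gauss(x, *models[true_unit])
        for other_unit, params in models.items():
            if other_unit == true_unit:
                continue
            # Submit the sample to the model of an incorrect sound unit.
            if log_gauss(x, *params) >= true_score - margin:
                counts[(true_unit, other_unit)] += 1
    return counts

# Toy models: unit name -> (mean, variance). Purely illustrative values.
models = {"ih": (0.0, 1.0), "iy": (1.0, 1.0), "aa": (6.0, 1.0)}
# Labeled samples: (known sound unit, observed feature value).
samples = [("ih", 0.2), ("ih", 0.8), ("iy", 0.4), ("iy", 1.5), ("aa", 5.7)]

counts = confusability_counts(models, samples)
for (true_unit, wrong_unit), n in sorted(counts.items()):
    print(f"{true_unit} scored highly by model for {wrong_unit}: {n}")
```

In this toy data only the two close units, "ih" and "iy", are flagged as mutually confusable; the distant unit "aa" is not. A real system would score frame sequences against the models referenced by the decision trees rather than single values against one Gaussian per unit.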
13 Claims
1. A method for training a set of speech model decision trees comprising:
a) constructing a first set of decision trees for a plurality of sound units such that each decision tree references at least two speech models;
b) testing speech models associated with the first set of decision trees to identify sound units that are confused by said speech models and thereby generating a set of confusability data; and
c) constructing a second set of decision trees for said plurality of sound units by using confusability data generated by said step(b) from said first set of decision trees as an input to a tree growing algorithm to select questions such that the probability that a speech model for a first sound unit is confused by a second sound unit is minimized, wherein said first and second set of decision trees being adapted to receive a frame input and generate a probability of said frame input corresponding to a given one of said sound units.
10. A method for training a set of speech model decision trees, comprising:
constructing a first set of decision trees for a plurality of sound units such that each decision tree references at least two speech models;
submitting acoustic speech samples for known sound units to speech models of other sound units, where the speech models are associated with the first set of decision trees;
identifying sound units that are confused by said speech models, thereby generating a set of confusability data; and
constructing a second set of decision trees for said plurality of sound units using said confusability data and the first set of decision trees generated by said step of identifying as an input to a tree growing algorithm to select questions such that the probability that a speech model for a given sound unit is confused by a different sound unit is minimized, wherein said first and second set of decision trees being adapted to receive a frame input and generate a probability of said frame input corresponding to a given one of said sound units.
11. A method for training a set of speech model decision trees comprising:
a) constructing a first set of decision trees for a plurality of sound units such that each decision tree references at least two speech models;
b) testing speech models associated with the first set of decision trees to identify sound units that are confused by said speech models and thereby generating a set of confusability data; and
c) constructing a second set of decision trees for said plurality of sound units by using confusability data as an input to a tree growing algorithm to select questions such that the probability that a speech model for a first sound unit is confused by a second sound unit is minimized, wherein the confusability data is indicative of one or more speech models and sound units that are confused by the one or more speech models, selecting questions for the second set of decision trees by maximizing the criterion
where p(Y|U) is the probability of an acoustic observation, Y, given its speech model, U, and C(Y(U)) is a measure that the acoustic observation, Y, belonging to its speech model, U, is misrecognized.
13. A method for training a set of speech model decision trees, comprising:
constructing a first set of decision trees for a plurality of sound units, each decision tree corresponds to a given sound unit and references at least two speech models for the given sound unit, such that an acoustic speech sample input to a decision tree generates a likelihood score that the acoustic speech sample is the applicable sound unit;
submitting acoustic speech samples for known sound units to speech models of other sound units;
identifying sound units that are confused by said speech models, thereby generating a set of confusability data; and
constructing a second set of decision trees for said plurality of sound units using said confusability data generated by said step of identifying from said first set of decision trees as an input to a tree growing algorithm to select questions such that the probability that a speech model for a given sound unit is confused by a different sound unit is minimized, wherein said first and second set of decision trees being adapted to receive a frame input and generate a probability of said frame input corresponding to a given one of said sound units.
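The question-selection step recited in the claims can be sketched as follows. This is only an illustration under stated assumptions, not the criterion of claim 11, which is not reproduced in this listing. The sketch assumes 1-D Gaussian leaf models, yes/no questions about a context label, and a score equal to the log-likelihood of the unit's own frames minus a weighted log-likelihood of frames from a rival unit that the confusability data flagged. The names `fit`, `loglik`, `split_score`, `best_question`, and `weight`, along with the toy data, are hypothetical.

```python
import math

def fit(xs):
    """Fit a 1-D Gaussian (mean, variance), with a variance floor of 1e-2."""
    m = sum(xs) / len(xs)
    v = max(sum((x - m) ** 2 for x in xs) / len(xs), 1e-2)
    return m, v

def loglik(x, model):
    """Log density of x under a (mean, variance) Gaussian."""
    m, v = model
    return -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)

def split_score(own, rivals, question, weight=1.0):
    """Log-likelihood of the unit's own frames under the two leaf models,
    minus a penalty for rival-unit frames that a leaf model also scores highly."""
    yes = [x for ctx, x in own if question(ctx)]
    no = [x for ctx, x in own if not question(ctx)]
    if not yes or not no:
        return float("-inf")  # a question that does not split the data is useless
    leaves = [fit(yes), fit(no)]
    own_ll = (sum(loglik(x, leaves[0]) for x in yes)
              + sum(loglik(x, leaves[1]) for x in no))
    # Best score any leaf gives to each rival frame: a proxy for confusion risk.
    rival_ll = sum(max(loglik(x, leaf) for leaf in leaves) for x in rivals)
    return own_ll - weight * rival_ll

def best_question(own, rivals, questions, weight=1.0):
    """Return the name of the candidate question with the highest split score."""
    return max(questions,
               key=lambda name: split_score(own, rivals, questions[name], weight))

# Frames of one sound unit: (left-context label, feature value). Illustrative only.
own = [("m", 0.0), ("n", 0.2), ("a", 4.0), ("t", 4.2)]
# Frames of a rival unit identified as confusable in the first pass.
rivals = [2.0, 2.1]
questions = {
    "left_is_nasal": lambda ctx: ctx in {"m", "n"},
    "left_is_m_or_a": lambda ctx: ctx in {"m", "a"},
}
chosen = best_question(own, rivals, questions)
print(chosen)
```

Here the nasal-context question yields two tight leaf models that sit far from the rival frames, so it scores higher than the alternative, whose broad leaves cover the rival frames. The subtraction of the rival log-likelihood is a deliberately simple stand-in for a confusability measure and would need bounding or normalization in practice.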
Specification