Method for deriving acoustic models for use in speech recognition
Abstract
The invention provides a method of deriving a generally improved statistical acoustic model of a first class of speech sounds, given a limited amount of sampling data from that first class. This is done by combining a first statistic calculated from samples of that class of speech sounds with a corresponding second statistic calculated from samples of a second, broader, class of speech sounds. Preferably the second statistic is calculated from many more samples than the first statistic, so that it has less sampling error than the first statistic, and preferably the second class is a super-set of the first class, so that the second statistic will provide information about the first class. In one embodiment, the invention combines statistics from the models of a plurality of first classes of speech sounds to reduce the sampling error of such statistics and thus improve the accuracy with which such models can be divided into groups of similar models. The first and second statistics can be measurements of spread, of central tendency, or both. They can also relate to different types of parameters, including spectral parameters and parameters representing the duration of speech sounds.
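The weighted combination described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the weights are chosen, so the rule used here (the weight on the first statistic grows with its sample count, controlled by a hypothetical constant `prior_weight`) is an assumption for illustration only, as are the function names and the sample values. The statistic shown is a measure of spread (mean absolute deviation); the same combination could be applied to a measure of central tendency.

```python
from statistics import fmean


def mean_abs_deviation(samples):
    """Measure of spread: average absolute distance from the sample mean."""
    mu = fmean(samples)
    return fmean([abs(x - mu) for x in samples])


def combine_statistic(first_stat, n_first, second_stat, prior_weight=10.0):
    """Weighted combination of a first-class statistic and a broader-class statistic.

    ASSUMPTION: the weight n_first / (n_first + prior_weight) is illustrative;
    the source text only says the combination is weighted.
    """
    if n_first < 0 or prior_weight <= 0:
        raise ValueError("n_first must be >= 0 and prior_weight must be > 0")
    w = n_first / (n_first + prior_weight)
    return w * first_stat + (1.0 - w) * second_stat


# Hypothetical one-dimensional acoustic parameter values.
first_class_samples = [4.0, 6.0]                                  # few samples, high sampling error
second_class_samples = [1.0, 3.0, 5.0, 7.0, 9.0, 5.0, 4.0, 6.0]   # broader class, more samples

first_spread = mean_abs_deviation(first_class_samples)
second_spread = mean_abs_deviation(second_class_samples)
combined_spread = combine_statistic(first_spread, len(first_class_samples), second_spread)
```

With only two samples of the first class, the combined spread stays close to the broader-class value; as more first-class samples are collected, the weight shifts toward the first statistic.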
24 Claims
1. A method of deriving an acoustic model of a first class of speech sounds, which acoustic model is to be compared against a portion of speech to be recognized to determine the likelihood that the portion of speech corresponds to that first class of speech sounds, said method including:
calculating a first statistic of the first class of speech sounds from acoustic data derived from one or more samples of that class of speech sounds;
calculating a second statistic from acoustic data derived from samples of a second class of speech sounds, at least some of which samples correspond to speech sounds which do not belong to the first class of speech sounds for which said acoustic model is being made;
calculating a combined statistic which is a weighted combination of the first and second statistics; and
using said combined statistic as at least a part of said acoustic model of said first class of speech sounds.
23. A method of deriving an acoustic model of each of a plurality of speech units as spoken by a first group of speakers, which acoustic models are to be compared against a portion of speech to be recognized to determine the likelihood that the portion of speech corresponds to those speech units, said method including:
having the one or more speakers from said first group speak one or more utterances of one or more vocabulary words;
generating a sequence of acoustic data representing each of such utterances;
time aligning the sequence of acoustic data from the utterances of a vocabulary word by said first group of speakers against an acoustic model of that vocabulary word which is comprised of a sequence of acoustic models representing the speech units normally contained in the word, each of which speech unit models includes a second statistic calculated from acoustic data derived from samples of its corresponding speech unit as spoken by a larger second group of speakers;
calculating a first statistic from the acoustic data time aligned against each of the acoustic models of a given speech unit;
calculating a combined statistic by taking a weighted combination of the first and second statistic for that speech unit; and
using said combined statistic as at least a part of said acoustic model of said speech unit.
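The steps of claim 23 can be sketched roughly as follows. This is a simplified illustration, not the patented procedure: frames are single numbers rather than parameter vectors, the time alignment is a basic dynamic-programming segmentation that minimizes absolute distance to each unit's mean, and the weighting rule with its `prior_weight` constant is an assumption. The unit names, values, and function names (`align`, `adapt_units`) are hypothetical.

```python
def align(frames, unit_means):
    """Monotonic time alignment: map each frame to one unit of the word model,
    in order, with every unit receiving at least one frame, minimizing the
    total absolute distance between frames and unit means."""
    T, U = len(frames), len(unit_means)
    if T < U:
        raise ValueError("need at least one frame per unit")
    INF = float("inf")
    cost = [[INF] * U for _ in range(T)]
    back = [[0] * U for _ in range(T)]
    cost[0][0] = abs(frames[0] - unit_means[0])
    for t in range(1, T):
        for u in range(U):
            stay = cost[t - 1][u]                       # remain in the same unit
            move = cost[t - 1][u - 1] if u > 0 else INF  # advance to the next unit
            best, prev = (stay, u) if stay <= move else (move, u - 1)
            if best < INF:
                cost[t][u] = best + abs(frames[t] - unit_means[u])
                back[t][u] = prev
    # Trace back from the final unit at the final frame.
    path = [U - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def adapt_units(word_units, frames, general_means, prior_weight=4.0):
    """Combine per-unit means from the first group's aligned frames (first
    statistic) with means from the larger second group (second statistic).

    ASSUMPTION: the weight n / (n + prior_weight) is illustrative only.
    """
    path = align(frames, [general_means[u] for u in word_units])
    frames_by_unit = {}
    for frame, idx in zip(frames, path):
        frames_by_unit.setdefault(word_units[idx], []).append(frame)
    adapted = dict(general_means)
    for unit, xs in frames_by_unit.items():
        first_stat = sum(xs) / len(xs)
        w = len(xs) / (len(xs) + prior_weight)
        adapted[unit] = w * first_stat + (1.0 - w) * general_means[unit]
    return adapted


# Hypothetical data: one utterance of a three-unit word.
general_means = {"k": 1.0, "ae": 5.0, "t": 2.0}    # second statistics (larger group)
word_units = ["k", "ae", "t"]
frames = [1.5, 1.5, 6.0, 6.0, 6.0, 6.0, 2.0, 2.0]  # acoustic data from the first group
path = align(frames, [general_means[u] for u in word_units])
adapted = adapt_units(word_units, frames, general_means)
```

Units with more aligned frames move further toward the first group's own values, while units with little data stay close to the statistics of the larger group.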
Specification