Multi-dialect speech recognition method and apparatus
Abstract
Apparatus and method for improving the speed and accuracy of recognition of speech dialects, or of speech transferred via dissimilar channels, are described. The invention provides multiple models tailored to specific segments or dialects, and/or speech channels, of the population. However, there is no proportional increase in the recognition time, computing power or computing resources needed. Probability density functions for the acoustic descriptors are provided which are shared among the various models. Since there is a common pool of probability density functions which are mapped or pointed to for the different acoustic descriptors of each dialect or speech channel model, the memory requirements for the speech recognition apparatus and method are significantly reduced. Each model is composed of triphonemes which are modelled by discrete probability distribution functions forming hidden Markov models or statistical word models. Any one probability density function may be assigned or mapped to many different triphonemes in many different dialects or models. The invention provides for automatic selection of the best model in real time, wherein the best fit is determined by a voting process.
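The model selection by voting mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the voting rule, so the rule used here is an assumption: each observed (triphoneme, codebook symbol) pair votes for the dialect model whose probability density function gives that symbol the highest probability, and ties cast no vote. The names `PDF_POOL`, `MODELS` and `vote_for_model`, the triphoneme labels and all probabilities are hypothetical.

```python
from collections import Counter

# Shared pool of discrete distributions over three hypothetical codebook symbols.
PDF_POOL = [
    [0.7, 0.2, 0.1],  # pdf 0
    [0.1, 0.8, 0.1],  # pdf 1
    [0.2, 0.2, 0.6],  # pdf 2
]

# One pointer table per dialect model: triphoneme label -> index into PDF_POOL.
# Assumption: every model covers every triphoneme that can be observed.
MODELS = {
    "dialect_a": {"k-ae+t": 0, "ae-t+s": 1},
    "dialect_b": {"k-ae+t": 2, "ae-t+s": 1},  # shares pdf 1 with dialect_a
}


def vote_for_model(observations):
    """Return the model name with the most votes, or None if no vote was cast.

    observations: iterable of (triphoneme, symbol_index) pairs.
    """
    votes = Counter()
    for triphoneme, symbol in observations:
        scores = {
            name: PDF_POOL[table[triphoneme]][symbol]
            for name, table in MODELS.items()
        }
        best = max(scores.values())
        winners = [name for name, score in scores.items() if score == best]
        if len(winners) == 1:  # a tie (e.g. a shared pdf) casts no vote
            votes[winners[0]] += 1
    return votes.most_common(1)[0][0] if votes else None


selected = vote_for_model([("k-ae+t", 2), ("ae-t+s", 1), ("k-ae+t", 2)])
```

In this sketch the observations on the shared density function do not discriminate between the dialects, so only the descriptors that point to different density functions influence the outcome.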
12 Claims
1. A computer implemented method of processing voiced speech for modeling speech for a segmented population of speakers, each segment formed by use of a dialect and/or by use of dissimilar channels, comprising the steps of:
- inputting and storing oral utterances of words from each segment,
- physically extracting a set of features from the oral utterances of each segment,
- compiling a database of acoustic descriptors of said utterances for each segment,
- establishing a database of probability density functions associated with said utterances,
- relating said probability density functions with said acoustic descriptors of all said segments, wherein at least one probability density function is related or mapped to at least two acoustic descriptors in at least two respective segments, and
- combining said features, said acoustic descriptors and said probability density functions together to determine the best model for each segment.
Dependent claims: 2, 3, 4, 5, 6, 7
8. Apparatus for modelling different segments, each segment formed by use of a dialect and/or by use of dissimilar channels, of speech comprising:
- a sound transducer for receiving utterances,
- an analog to digital converter for converting said received utterances into digital signals suitable for computer operations,
- means for extracting features from said digital signals,
- means for compiling a database of acoustic descriptors of said utterances for each segment,
- establishing a database of probability density functions associated with said utterances,
- means for relating said probability density functions with said acoustic descriptors, where at least two acoustic descriptors in two respective segments are mapped or pointed towards one probability density function,
- means for combining said features, acoustic descriptors and probability density functions to determine the best model for each segment.
Dependent claims: 9, 10, 11
12. A computer readable memory having stored thereon a composite database of modeled segments of speech, each segment formed by use of a dialect and/or by use of dissimilar channels wherein oral utterances are input into a computer for each segment and physical features of the utterance are extracted, comprising:
- a first database of acoustic descriptors,
- a second database of probability density functions common to all said segments, and
- a pointer database wherein for each segment a pointer relates each of said acoustic descriptors with one probability density function, and wherein at least two of said acoustic descriptors in two different segments point to one probability density function, such that, together with the features, the best model for each segment is established.
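The three-part composite database recited in claim 12 can be pictured with a small data-structure sketch. It is an illustration under assumptions, not the patented implementation: the class `CompositeDatabase`, its methods, the segment names and the stored values are all hypothetical. It shows a descriptor database, a common pool of probability density functions, and a pointer database, plus a check for the condition that descriptors in two different segments point to one density function.

```python
from dataclasses import dataclass, field


@dataclass
class CompositeDatabase:
    # First database: segment -> list of acoustic descriptor labels.
    descriptors: dict = field(default_factory=dict)
    # Second database: pool of density functions common to all segments.
    pdfs: list = field(default_factory=list)
    # Pointer database: (segment, descriptor) -> index into pdfs.
    pointers: dict = field(default_factory=dict)

    def add_pdf(self, pdf):
        """Store one density function in the common pool and return its index."""
        self.pdfs.append(pdf)
        return len(self.pdfs) - 1

    def link(self, segment, descriptor, pdf_index):
        """Point one descriptor of one segment at exactly one pooled pdf."""
        if not 0 <= pdf_index < len(self.pdfs):
            raise IndexError("pointer must refer to an existing pdf")
        labels = self.descriptors.setdefault(segment, [])
        if descriptor not in labels:
            labels.append(descriptor)
        self.pointers[(segment, descriptor)] = pdf_index

    def cross_segment_pdfs(self):
        """Indices of pdfs pointed to from at least two different segments."""
        segments_by_pdf = {}
        for (segment, _descriptor), index in self.pointers.items():
            segments_by_pdf.setdefault(index, set()).add(segment)
        return sorted(i for i, segs in segments_by_pdf.items() if len(segs) >= 2)


db = CompositeDatabase()
shared = db.add_pdf([0.5, 0.5])  # hypothetical two-symbol distribution
own = db.add_pdf([0.9, 0.1])
db.link("dialect_a", "k-ae+t", shared)
db.link("dialect_b", "k-ae+t", shared)  # second segment reuses the same pdf
db.link("dialect_b", "ae-t+s", own)
```

Here three pointers are served by two stored density functions. The memory saving described in the abstract comes from this kind of reuse, since a pointer is much smaller than a separate copy of a distribution.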
Specification