Optimized speech recognition system and method
First Claim
1. A method of modeling the sounds produced by speaking at least first and second portions of speech, said method comprising the steps of:
uttering at least the first portion of speech N times in a time interval having a series of successive subintervals, where N is an integer greater than or equal to one;
measuring the value of at least one feature of the utterance of the first portion of speech during each of the series of successive subintervals to produce a series of feature vector signals representing the feature values;
estimating the expected number of occurrences of the first portion of speech in the time interval as a combination of the values for each subinterval of a first model function of the measured value of the feature of the utterance of the first portion of speech, said first model function having at least a first parameter having an initial value;
estimating the expected number of occurrences of the second portion of speech in the time interval as a combination of the values for each subinterval of a second model function of the measured value of the feature of the utterance of the first portion of speech, said second model function having at least a second parameter having an initial value;
estimating the probability of exactly N occurrences of the first portion of speech in the time interval given the estimated expected number of occurrences of the first portion of speech;
estimating the probability of exactly zero occurrences of the second portion of speech in the time interval given the estimated expected number of occurrences of the second portion of speech;
calculating revised values of the first and second parameters to improve the value of an objective function comprising a combination of at least the estimated probability of exactly N occurrences of the first portion of speech and the estimated probability of exactly zero occurrences of the second portion of speech;
modeling the first portion of speech with the first model function with the revised value of the first parameter; and
modeling the second portion of speech with the second model function with the revised value of the second parameter.
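The steps of claim 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the log-linear model function `exp(a*x + b)`, the Poisson form of the occurrence probability (suggested by the abstract's mention of Poisson intensities), the finite-difference gradient ascent, and all feature values are hypothetical choices made for the example.

```python
import math

def intensity(x, a, b):
    """Hypothetical model function with two parameters: exp(a*x + b)."""
    return math.exp(a * x + b)

def expected_count(features, a, b, dt):
    """Combine the per-subinterval model values into an expected number of occurrences."""
    return sum(intensity(x, a, b) for x in features) * dt

def log_poisson(n, lam):
    """Log-probability of exactly n occurrences given expected count lam (Poisson assumption)."""
    return -lam + n * math.log(lam) - math.lgamma(n + 1)

def objective(params, features, n_first, dt):
    """Log of P(exactly n_first of portion 1) times P(exactly zero of portion 2)."""
    (a1, b1), (a2, b2) = params
    lam1 = expected_count(features, a1, b1, dt)
    lam2 = expected_count(features, a2, b2, dt)
    return log_poisson(n_first, lam1) + log_poisson(0, lam2)

def revise(params, features, n_first, dt, step=0.05, iters=200, eps=1e-6):
    """Illustrative optimizer: gradient ascent with forward-difference gradients."""
    flat = [p for pair in params for p in pair]

    def f(v):
        return objective(((v[0], v[1]), (v[2], v[3])), features, n_first, dt)

    for _ in range(iters):
        base = f(flat)
        grad = []
        for i in range(len(flat)):
            bumped = flat.copy()
            bumped[i] += eps
            grad.append((f(bumped) - base) / eps)
        flat = [v + step * g for v, g in zip(flat, grad)]
    return (flat[0], flat[1]), (flat[2], flat[3])

# Made-up feature measurements for one utterance (N = 1) of the first portion.
features = [0.2, 0.9, 1.1, 0.4, 0.1]
dt = 0.1
initial = ((0.0, 0.0), (0.0, 0.0))
revised = revise(initial, features, n_first=1, dt=dt)
```

With these assumptions, the revised first parameters push the expected count of the first portion toward N, while the revised second parameters push the expected count of the second portion toward zero.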
Abstract
A speech recognition system estimates a set of Poisson intensities for a spoken word, each intensity representing a respectively different word from a vocabulary of words. Each of the functions used to calculate these intensities has two variable parameter values. In a training mode, the system changes the values of the respective variable parameters to optimize the likelihood that the results predicted by the estimates correspond to the actual spoken words. These optimized parameter values are then used by the system, in an operational mode, to recognize spoken words.
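The operational mode described in the abstract might look roughly like the following sketch. The abstract does not state how the estimated intensities are turned into a recognition decision, so the arg-max rule used here is an assumption, as are the log-linear intensity `exp(a*x + b)`, the vocabulary, and the parameter values (all hypothetical).

```python
import math

def expected_count(features, a, b, dt):
    """Sum a hypothetical log-linear intensity exp(a*x + b) over the subintervals."""
    return sum(math.exp(a * x + b) for x in features) * dt

def recognize(features, word_params, dt):
    """Return the word with the largest estimated expected count, plus all scores.

    The arg-max decision rule is an assumption, not taken from the source.
    """
    scores = {w: expected_count(features, a, b, dt) for w, (a, b) in word_params.items()}
    return max(scores, key=scores.get), scores

# Made-up "trained" parameter pairs (a, b) for a two-word vocabulary.
word_params = {"yes": (2.0, 0.0), "no": (-2.0, 0.0)}
best, scores = recognize([0.8, 1.2, 1.0], word_params, dt=0.1)
```

Each word carries exactly two parameters here, mirroring the abstract's statement that each intensity function has two variable parameter values.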
41 Citations
8 Claims
5. A method as claimed in claim 4, characterized in that the probability of exactly n_i occurrences of a word W_i in a time interval T having subintervals Δt given the estimated expected number of occurrences of the word is estimated from a function of the form ##EQU8##
6. A method as claimed in claim 2, characterized in that the objective function comprises the product of at least the estimated probability of exactly N occurrences of the first portion of speech and the estimated probability of exactly zero occurrences of the second portion of speech.
7. A method as claimed in claim 2, characterized in that the objective function comprises the sum of the logarithms of at least the estimated probability of exactly N occurrences of the first portion of speech and the estimated probability of exactly zero occurrences of the second portion of speech.
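The product objective of claim 6 and the sum-of-logarithms objective of claim 7 rank candidate parameter settings identically, because the logarithm is monotonic. The sketch below illustrates this, assuming a Poisson form for the occurrence probabilities; the candidate expected counts are made up.

```python
import math

def poisson(n, lam):
    """Poisson probability of exactly n occurrences given expected count lam (assumed form)."""
    return math.exp(-lam) * lam ** n / math.factorial(n)

def product_objective(lam1, lam2, n):
    """Claim 6 style: P(exactly n of portion 1) * P(exactly zero of portion 2)."""
    return poisson(n, lam1) * poisson(0, lam2)

def log_sum_objective(lam1, lam2, n):
    """Claim 7 style: sum of the logarithms of the same two probabilities."""
    return math.log(poisson(n, lam1)) + math.log(poisson(0, lam2))

# Hypothetical (expected count of portion 1, expected count of portion 2) pairs.
candidates = [(0.5, 0.5), (1.0, 0.2), (2.0, 0.1), (1.0, 0.01)]
best_by_product = max(candidates, key=lambda c: product_objective(*c, n=1))
best_by_log_sum = max(candidates, key=lambda c: log_sum_objective(*c, n=1))
```

Both objectives select the same candidate. In practice the log form is often preferred because a product of many small probabilities can underflow.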
8. A method of modeling the sounds produced by speaking at least a first portion of speech, said method comprising the steps of:
uttering at least the first portion of speech N times in a time interval having a series of successive subintervals, where N is an integer greater than or equal to one;
measuring the value of at least one feature of the utterance of the first portion of speech during each of the series of successive subintervals to produce a series of feature vector signals representing the feature values;
estimating the expected number of occurrences of the first portion of speech in the time interval as a combination of the values for each subinterval of a first model function of the measured value of the feature of the utterance of the first portion of speech, said first model function having at least a first parameter having an initial value;
estimating the probability of exactly N occurrences of the first portion of speech in the time interval given the estimated expected number of occurrences of the first portion of speech;
calculating a revised value of the first parameter to improve the value of an objective function comprising at least the estimated probability of exactly N occurrences of the first portion of speech; and
modeling the first portion of speech with the first model function with the revised value of the first parameter.
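For the single-model case of claim 8, a useful sanity check is available if the occurrence probability is assumed to be Poisson (an assumption based on the abstract, not on the claim text). The probability of exactly N occurrences is then largest when the expected count equals N, so a parameter revision that improves the objective should move the expected count toward N. The grid search below is only an illustration of that property.

```python
import math

def log_poisson(n, lam):
    """Log-probability of exactly n occurrences given expected count lam (Poisson assumption)."""
    return -lam + n * math.log(lam) - math.lgamma(n + 1)

n = 3  # hypothetical number of utterances in the interval
grid = [k / 100 for k in range(1, 1001)]  # candidate expected counts 0.01 .. 10.00
best_lam = max(grid, key=lambda lam: log_poisson(n, lam))
```

Here `best_lam` lands on the grid point equal to `n`, matching the analytic result obtained by setting the derivative of `-lam + n*log(lam)` to zero.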
Specification