Speech recognizer
Abstract
The present invention provides a speech transducer that captures sound and delivers the data to a robust and efficient speech recognizer. To minimize power consumption, a voice wake-up indicator detects sounds directed at the voice recognizer and generates a power-up signal that wakes the speech recognizer from a powered-down state. Further, to isolate speech in noisy environments, a robust high-order speech transducer comprising a plurality of microphones positioned to collect different aspects of sound is used. Alternatively, the high-order speech transducer may consist of a microphone and a noise canceller, which characterizes the background noise while the user is not speaking and subtracts that noise while the user is speaking to the computer, providing a cleaner speech signal.
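The noise canceller described above behaves like classic spectral subtraction: average the background spectrum while the user is silent, then remove that average from each frame while the user speaks. The abstract does not name an algorithm, so the following sketch assumes spectral subtraction on magnitude spectra that have already been computed (for example by an FFT). The function names and the `alpha` (over-subtraction) and `floor` parameters are hypothetical.

```python
from typing import List, Sequence

Spectrum = List[float]  # magnitude per frequency bin for one frame


def estimate_noise_profile(noise_frames: Sequence[Spectrum]) -> Spectrum:
    """Average the magnitude spectra captured while the user is not speaking."""
    if not noise_frames:
        raise ValueError("need at least one noise-only frame")
    n_bins = len(noise_frames[0])
    return [sum(frame[k] for frame in noise_frames) / len(noise_frames)
            for k in range(n_bins)]


def subtract_noise(frame: Spectrum, noise_profile: Spectrum,
                   alpha: float = 1.0, floor: float = 0.0) -> Spectrum:
    """Subtract the scaled noise estimate bin by bin, clamping at a spectral floor."""
    return [max(mag - alpha * noise, floor)
            for mag, noise in zip(frame, noise_profile)]


if __name__ == "__main__":
    silence = [[1.0, 2.0, 0.5], [1.0, 2.0, 1.5]]   # frames recorded while the user is silent
    profile = estimate_noise_profile(silence)      # [1.0, 2.0, 1.0]
    speech = [5.0, 2.5, 0.5]                       # one frame recorded while the user speaks
    print(subtract_noise(speech, profile))         # [4.0, 0.5, 0.0]
```

Clamping at `floor` keeps bins where the noise estimate exceeds the observed magnitude from going negative; practical systems usually tune `alpha` and `floor` to limit residual "musical" noise.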
The user's speech signal is next presented to a voice feature extractor, which extracts features using linear predictive coding, a fast Fourier transform, an auditory model, a fractal model, a wavelet model, or combinations thereof. The input speech signal is compared with word models stored in a dictionary using a template matcher, a fuzzy logic matcher, a neural network, a dynamic programming system, a hidden Markov model, or combinations thereof. The dictionary holds an entry for each word, and each entry has word labels and a context guide.
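Of the feature types listed above, linear predictive coding is the easiest to show compactly. The sketch below assumes the common autocorrelation method with the Levinson-Durbin recursion applied to one already-windowed frame; the function names and the default order of 10 are illustrative assumptions, not details taken from this patent.

```python
from typing import List, Sequence, Tuple


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """r[k] = sum over n of x[n] * x[n + k], for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve the LPC normal equations for predictor coefficients a[1..order],
    where x[n] is predicted as sum_k a[k] * x[n - k].
    Returns (coefficients, residual prediction error)."""
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # silent or perfectly predicted frame: leave remaining terms at 0
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


def lpc_features(frame: Sequence[float], order: int = 10) -> List[float]:
    """LPC coefficients for one windowed speech frame."""
    coeffs, _ = levinson_durbin(autocorrelation(frame, order), order)
    return coeffs


if __name__ == "__main__":
    # For r = [14, 8, 3] the normal equations give a = [2/3, -1/6].
    print(levinson_durbin([14.0, 8.0, 3.0], 2))
```

The recursion solves the Toeplitz normal equations in O(order^2) operations instead of a general matrix inversion, which is why it is the usual choice for per-frame LPC.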
A word preselector receives the output of the voice feature extractor and queries the dictionary to compile a list of candidate words with the most similar phonetic labels. These candidate words are presented to a syntax checker, which selects a first representative word from them as ranked by the context guide, the grammar structure, and other factors. The user can accept or reject the first representative word via a voice user interface. If it is rejected, the voice user interface presents the next most likely word from the candidate words. If the user rejects all the candidates, or if the word does not exist in the dictionary, the system can generate a predicted word based on the labels. Finally, the voice recognizer also allows the user to enter the word manually or spell it out for the system. In this manner, a robust and efficient human-machine interface is provided for recognizing speaker-independent, continuous speech.
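A minimal sketch of the word preselector and syntax checker, under stated assumptions: labels are phoneme strings, similarity is the Levenshtein distance between label sequences, and the context guide is modeled as a hypothetical set of words that may precede the entry. The patent does not define the context guide this way; `Entry`, `preselect`, and `syntax_check` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Set


@dataclass
class Entry:
    labels: List[str]  # phonetic labels stored for the word
    # Hypothetical context guide: words that may precede this one.
    context_guide: Set[str] = field(default_factory=set)


def label_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # drop a label from a
                           cur[j - 1] + 1,           # insert a label from b
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def preselect(labels: Sequence[str], dictionary: Dict[str, Entry],
              max_candidates: int = 5, max_distance: int = 2) -> List[str]:
    """Word preselector: dictionary words whose labels are closest to the input."""
    scored = sorted((label_distance(labels, entry.labels), word)
                    for word, entry in dictionary.items())
    return [word for dist, word in scored if dist <= max_distance][:max_candidates]


def syntax_check(candidates: List[str], dictionary: Dict[str, Entry],
                 previous_word: str) -> List[str]:
    """Syntax checker: move candidates whose context guide admits previous_word
    to the front; the first element is the first representative word."""
    return sorted(candidates,
                  key=lambda w: previous_word not in dictionary[w].context_guide)


if __name__ == "__main__":
    dictionary = {
        "write": Entry(["R", "AY", "T"], {"to", "will"}),
        "right": Entry(["R", "AY", "T"], {"the", "turn"}),
        "rite": Entry(["R", "AY", "T"]),
        "ride": Entry(["R", "AY", "D"], {"to"}),
    }
    candidates = preselect(["R", "AY", "T"], dictionary)
    print(candidates)                                  # ['right', 'rite', 'write', 'ride']
    print(syntax_check(candidates, dictionary, "to"))  # ['write', 'ride', 'right', 'rite']
```

Because Python's `sorted` is stable, candidates that the context guide ranks equally keep the preselector's acoustic order.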
26 Claims
1. A computer system, comprising:
a speech transducer for capturing speech; and
a voice recognizer coupled to said speech transducer, including:
a voice feature extractor, said voice feature extractor generating labels for said speech;
a dictionary containing an entry for each word in the dictionary, said entry having labels and a context guide;
a word preselector coupled to said voice feature extractor and to said dictionary, said word preselector generating a list of candidate words with similar labels;
a syntax checker coupled to said word preselector, said syntax checker selecting a first representative word from the candidate words based on said context guide; and
a voice user interface coupled to said word preselector and said syntax checker, said voice user interface allowing the user to accept or reject the first representative word, said voice user interface presenting a second representative word selected from said candidate words if the user rejects the first representative word.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 23
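The accept/reject behavior of the voice user interface in claim 1 amounts to walking down the ranked candidate list. In this hypothetical sketch, `user_accepts` stands in for the user's spoken yes/no response and `fallback` for the predicted-word or spell-it-out path described in the abstract; neither name comes from the patent.

```python
from typing import Callable, Iterable, List, Optional


def confirm_word(ranked_candidates: Iterable[str],
                 user_accepts: Callable[[str], bool],
                 fallback: Callable[[], Optional[str]]) -> Optional[str]:
    """Present the first representative word, then each next candidate,
    until the user accepts one; if all are rejected, use the fallback."""
    for word in ranked_candidates:
        if user_accepts(word):
            return word
    return fallback()


if __name__ == "__main__":
    presented: List[str] = []

    def user_accepts(word: str) -> bool:
        presented.append(word)  # record what the interface offered
        return word == "ride"   # simulated spoken "yes" for this word only

    print(confirm_word(["write", "ride", "right"], user_accepts, lambda: None))  # ride
    print(presented)                                                            # ['write', 'ride']
```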
11. A computer system, comprising:
a wearable housing;
a speech transducer mounted on said wearable housing;
a voice recognizer coupled to said speech transducer, said voice recognizer recognizing speech using dynamic programming; and
means for securing the computer system to the user.
Dependent claims: 12, 13, 14, 16, 18, 19, 20, 21, 22, 25, 26
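Claim 11 recites recognition by dynamic programming. A common instance is dynamic time warping (DTW), which aligns an utterance's feature vectors against each stored word template. The sketch below assumes DTW with Euclidean frame distances, which the claim itself does not specify; all names are illustrative.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one feature vector, e.g. LPC coefficients


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: Sequence[Frame], template: Sequence[Frame]) -> float:
    """Cost of the cheapest monotonic alignment of query onto template."""
    n, m = len(query), len(template)
    cost: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = frame_distance(query[i - 1], template[j - 1])
            # Best of: repeat a template frame, repeat a query frame, or advance both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(query: Sequence[Frame], templates: Dict[str, Sequence[Frame]]) -> str:
    """Return the dictionary word whose template aligns most cheaply."""
    return min(templates, key=lambda word: dtw_distance(query, templates[word]))
```

The warping lets a slowly spoken word (here, a template with a repeated frame) still match a faster utterance at zero cost, which is the property that makes DTW useful for word templates.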
15. A computer system, comprising:
a wearable housing;
a speech transducer for capturing speech, said speech transducer mounted on said wearable housing;
a voice recognizer coupled to said speech transducer, said voice recognizer recognizing speech using a hidden Markov model; and
means for securing the computer system to the user.
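For the hidden Markov model of claim 15, one conventional scheme scores the label sequence against one HMM per dictionary word and picks the best-scoring word. The sketch assumes discrete labels (for example vector-quantized frames) and Viterbi scoring in the log domain; `WordHMM` and its fields are hypothetical, and the claim does not fix the model topology.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class WordHMM:
    start: List[float]            # start[i]: P(first state = i)
    trans: List[List[float]]      # trans[i][j]: P(next state j | state i)
    emit: List[Dict[str, float]]  # emit[i][label]: P(label | state i)


def _log(p: float) -> float:
    """Log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def viterbi_log_score(model: WordHMM, labels: Sequence[str]) -> float:
    """Log-probability of the single best state path that emits labels."""
    if not labels:
        raise ValueError("need at least one label")
    n = len(model.start)
    score = [_log(model.start[i]) + _log(model.emit[i].get(labels[0], 0.0))
             for i in range(n)]
    for label in labels[1:]:
        # For each next state, keep only the best predecessor path.
        score = [max(score[i] + _log(model.trans[i][j]) for i in range(n))
                 + _log(model.emit[j].get(label, 0.0))
                 for j in range(n)]
    return max(score)


def recognize(labels: Sequence[str], models: Dict[str, WordHMM]) -> str:
    """Return the word whose HMM gives the highest Viterbi score."""
    return max(models, key=lambda w: viterbi_log_score(models[w], labels))
```

Working in log probabilities avoids numeric underflow on long utterances; a forward-algorithm score (summing over all paths) is an equally standard alternative to the best-path score used here.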
17. A computer system having a power-down mode to conserve energy, comprising:
a speech transducer for capturing speech;
a power-up indicator coupled to said speech transducer, said power-up indicator detecting speech directed at said speech transducer and asserting a wake-up signal; and
a voice recognizer coupled to said speech transducer and said wake-up signal, said voice recognizer waking up from the power-down mode when said wake-up signal is asserted.
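Claim 17 does not say how the power-up indicator decides that speech is present. A simple assumption is a short-time energy threshold that must be exceeded for several consecutive frames. Energy alone cannot tell whether speech is directed at the transducer, so this sketch only approximates the claimed behavior; `WakeUpIndicator`, `Recognizer`, `monitor`, and their parameters are hypothetical.

```python
from typing import Iterable, Optional, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


class WakeUpIndicator:
    """Asserts a wake-up signal after min_frames consecutive frames above threshold."""

    def __init__(self, threshold: float, min_frames: int = 3):
        self.threshold = threshold
        self.min_frames = min_frames
        self._run = 0  # length of the current run of loud frames

    def process(self, frame: Sequence[float]) -> bool:
        self._run = self._run + 1 if frame_energy(frame) >= self.threshold else 0
        return self._run >= self.min_frames


class Recognizer:
    """Stand-in for the voice recognizer; stays in power-down mode until woken."""

    def __init__(self) -> None:
        self.powered_up = False

    def on_wake_up(self) -> None:
        self.powered_up = True


def monitor(frames: Iterable[Sequence[float]], indicator: WakeUpIndicator,
            recognizer: Recognizer) -> Optional[int]:
    """Feed frames to the indicator; wake the recognizer and return the frame index."""
    for index, frame in enumerate(frames):
        if indicator.process(frame):
            recognizer.on_wake_up()
            return index
    return None
```

Requiring several consecutive loud frames is a cheap guard against waking the recognizer on a single click or bump, at the cost of a few frames of latency.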
24. A programmable storage device having a computer readable program code embedded therein for recognizing a pronunciation of a word, said program storage device comprising:
a feature extracting code, said feature extracting code generating linear predictive coding parameters, Fourier transform parameters, auditory parameters, fractal parameters, or wavelet parameters representative of the pronunciation;
a phoneme identifier code coupled to said feature extracting code, said phoneme identifier code using template matching, fuzzy logic, a neural network, dynamic programming, or a hidden Markov model based on said parameters;
an N-gram generator code coupled to said phoneme identifier code, said N-gram generator code generating one or more initial N-grams and inner N-grams from the phoneme sequence;
a preselector code coupled to said N-gram generator code, said preselector code forming one or more candidates based on said N-grams; and
a word generator code coupled to said preselector code, said word generator code selecting the candidate closest to said word based on an N-gram statistical model or a grammar model.
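Claim 24's pipeline can be sketched as follows, assuming bigrams (N = 2), that the "initial N-gram" is the first N phonemes, and that the "inner N-grams" are the remaining overlapping N-grams. Candidates must share the initial N-gram and are ordered by inner N-gram overlap; the word generator then applies a simple word-bigram count model. These interpretations and all names are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def split_ngrams(phonemes: Sequence[str], n: int = 2) -> Tuple[NGram, List[NGram]]:
    """Return (initial N-gram, inner N-grams) for a phoneme sequence."""
    grams = [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]
    if not grams:  # sequence shorter than n
        return tuple(phonemes), []
    return grams[0], grams[1:]


def preselect(phonemes: Sequence[str], lexicon: Dict[str, Sequence[str]],
              n: int = 2) -> List[str]:
    """Candidates share the initial N-gram, ordered by inner N-gram overlap."""
    initial, inner = split_ngrams(phonemes, n)
    inner_counts = Counter(inner)
    scored = []
    for word, pronunciation in lexicon.items():
        word_initial, word_inner = split_ngrams(pronunciation, n)
        if word_initial == initial:
            overlap = sum((inner_counts & Counter(word_inner)).values())
            scored.append((-overlap, word))  # negate so larger overlap sorts first
    return [word for _, word in sorted(scored)]


def generate_word(candidates: List[str], previous_word: str,
                  bigram_counts: Dict[Tuple[str, str], int]) -> str:
    """Pick the candidate seen most often after previous_word; ties keep
    the preselector's order because max() returns the first maximum."""
    return max(candidates, key=lambda w: bigram_counts.get((previous_word, w), 0))


if __name__ == "__main__":
    lexicon = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"],
               "cast": ["k", "ae", "s", "t"], "dog": ["d", "ao", "g"]}
    candidates = preselect(["k", "ae", "t"], lexicon)
    print(candidates)                                                 # ['cat', 'cap', 'cast']
    print(generate_word(candidates, "the", {("the", "cap"): 3}))      # cap
```

Indexing on the initial N-gram prunes the dictionary cheaply before any scoring, and the `Counter` intersection counts shared inner N-grams with multiplicity.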
Specification