Speech recognizer

US 20020116196A1
Filed: 09/21/2001
Published: 08/22/2002
Est. Priority Date: 11/12/1998
Status: Abandoned Application

Abstract

The present invention provides a speech transducer which captures sound and delivers the data to the robust and efficient speech recognizer. To minimize power consumption, a voice wake-up indicator detects sounds directed at the voice recognizer and generates a power-up signal to wake up the speech recognizer from a powered-down state. Further, to isolate speech in noisy environments, a robust high order speech transducer comprising a plurality of microphones positioned to collect different aspects of sound is used. Alternatively, the high order speech transducer may consist of a microphone and a noise canceller which characterizes the background noise when the user is not speaking and subtracts the background noise when the user is speaking to the computer to provide a cleaner speech signal.
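The wake-up indicator described above can be illustrated with a small sketch. It assumes one possible arrangement of the elements named in the dependent claims (a half-wave rectifier, a low-pass filter and a comparator); the function name `wake_up_signal`, the smoothing factor `alpha` and the `threshold` value are hypothetical and are not taken from the application.

```python
import math

def wake_up_signal(samples, alpha=0.05, threshold=0.1):
    """Return the index at which the wake-up signal would be asserted,
    or None if the smoothed envelope never crosses the threshold."""
    envelope = 0.0
    for i, x in enumerate(samples):
        rectified = x if x > 0.0 else 0.0           # half-wave rectifier
        envelope += alpha * (rectified - envelope)  # one-pole low-pass filter
        if envelope > threshold:                    # comparator
            return i
    return None

# Hypothetical input: faint background followed by a louder tone (8 kHz sampling).
quiet = [0.01 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
loud = [0.8 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]

print(wake_up_signal(quiet))         # background alone stays below the threshold
print(wake_up_signal(quiet + loud))  # index shortly after the loud segment starts
```

The recognizer itself would remain powered down until this function returns an index; only the cheap envelope detector runs continuously.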

The user's speech signal is next presented to a voice feature extractor which extracts features using linear predictive coding, fast Fourier transform, auditory model, fractal model, wavelet model, or combinations thereof. The input speech signal is compared with word models stored in a dictionary using a template matcher, a fuzzy logic matcher, a neural network, a dynamic programming system, a hidden Markov model, or combinations thereof. The word model is stored in a dictionary with an entry for each word, each entry having word labels and a context guide.
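Of the matching techniques listed above, dynamic programming is the simplest to show in a few lines. The sketch below uses dynamic time warping over scalar features, which is one common form of dynamic-programming template matching and is an assumption here, not the method the application specifies. The names `dtw_distance` and `best_match` and the toy templates are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of scalar features."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def best_match(features, templates):
    """Return the template word whose stored features align most cheaply."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))

# Hypothetical word models: one scalar feature per frame.
templates = {"yes": [1.0, 3.0, 2.0], "no": [4.0, 4.0, 1.0]}
utterance = [1.0, 1.0, 3.0, 3.0, 2.0]  # a slower rendition of "yes"

print(best_match(utterance, templates))
```

Because the warping absorbs differences in speaking rate, the stretched utterance aligns with the "yes" template at zero cost. A real extractor would emit a vector per frame, with a vector distance in place of `abs`.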

A word preselector receives the output of the voice feature extractor and queries the dictionary to compile a list of candidate words with the most similar phonetic labels. These candidate words are presented to a syntax checker for selecting a first representative word from the candidate words, as ranked by the context guide and the grammar structure, among others. The user can accept or reject the first representative word via a voice user interface. If rejected, the voice user interface presents the next likely word selected from the candidate words. If all the candidates are rejected by the user or if the word does not exist in the dictionary, the system can generate a predicted word based on the labels. Finally, the voice recognizer also allows the user to manually enter the word or spell the word out for the system. In this manner, a robust and efficient human-machine interface is provided for recognizing speaker independent, continuous speech.
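The accept/reject flow in the paragraph above can be sketched as a simple loop. This is a minimal illustration under stated assumptions: the function name `confirm_word` and its callback parameters are hypothetical, and the candidate list is assumed to arrive already ranked by the syntax checker.

```python
def confirm_word(ranked_candidates, user_accepts, predict_from_labels, manual_entry):
    """Walk the fallback chain: ranked candidates, predicted word, manual entry."""
    # Offer the candidates best-first until the user accepts one.
    for word in ranked_candidates:
        if user_accepts(word):
            return word
    # All candidates rejected (or none found): predict a word from the labels.
    predicted = predict_from_labels()
    if user_accepts(predicted):
        return predicted
    # Last resort: the user types or spells the word.
    return manual_entry()

# Hypothetical session: the user wants "right", offered second.
ranked = ["write", "right", "rite"]
chosen = confirm_word(ranked,
                      user_accepts=lambda w: w == "right",
                      predict_from_labels=lambda: "ryt",
                      manual_entry=lambda: "typed")
print(chosen)
```

Passing the user interaction in as callbacks keeps the control flow independent of whether acceptance arrives by voice, key press or some other channel.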

253 Citations

26 Claims

1. A computer system, comprising:
- a speech transducer for capturing speech; and
  
  a voice recognizer coupled to said speech transducer, including;
  
  a voice feature extractor, said voice feature extractor generating labels for said speech;
  
  a dictionary containing an entry for each word in the dictionary, said entry having labels and a context guide;
  
  a word preselector coupled to said voice feature extractor and to said dictionary, said word preselector generating a list of candidate words with similar labels;
  
  a syntax checker coupled to said word preselector, said syntax checker selecting a first representative word from the candidate words based on said context guide; and
  
  a voice user interface coupled to said word preselector and said syntax checker, said voice user interface allowing the user to accept or reject the first representative word, said voice user interface presenting a second representative word selected from said candidate words if the user rejects the first representative word.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 23)
  - 2. The computer system of claim 1, wherein said voice feature extractor extracts features using linear predictive coding, fast Fourier transform, auditory, fractal, wavelet, or noise spectral subtraction models.
  - 3. The computer system of claim 1, further comprising a phoneme recognizer coupled to said voice feature extractor.
  - 4. The computer system of claim 3, wherein said phoneme recognizer recognizes phonemes using a template matching, fuzzy logic, a neural network, a dynamic programming, or a hidden Markov model.
  - 5. The computer system of claim 1, wherein said word preselector hashes into a plurality of candidates using similarity count of start trigrams and inner trigrams.
  - 6. The computer system of claim 1, wherein said word preselector further generates a new word based on the label when said label is not found in said dictionary.
  - 7. The computer system of claim 1, wherein said syntax checker recognizes phonemes using an N-gram statistical model or a grammar model.
  - 8. The computer system of claim 1, further comprising a PIM database.
  - 9. The computer system of claim 1, wherein said PIM database comprises an appointment calendar.
  - 10. The computer system of claim 1, wherein said PIM database comprises a telephone directory.
  - 23. The computer system of claim 1, wherein said speech transducer includes a microphone and a noise canceller which characterizes the background noise when a user is not speaking and subtracts the background noise when the user is speaking to the computer.
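The noise canceller of claim 23 characterizes background noise while the user is silent and subtracts it while the user speaks. One way to read that is magnitude spectral subtraction, sketched below with a naive discrete Fourier transform so the example needs only the standard library. The function names, the frame length and the synthetic signals are assumptions for illustration, not details from the application.

```python
import cmath
import math

def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def noise_profile(silent_frames):
    """Average magnitude spectrum of frames captured while the user is silent."""
    spectra = [[abs(c) for c in dft(f)] for f in silent_frames]
    return [sum(col) / len(spectra) for col in zip(*spectra)]

def subtract_noise(frame, profile):
    """Subtract the noise magnitude in each bin, floor at zero, keep the phase."""
    cleaned = []
    for c, noise_mag in zip(dft(frame), profile):
        mag = max(abs(c) - noise_mag, 0.0)
        cleaned.append(cmath.rect(mag, cmath.phase(c)))
    return idft(cleaned)

# Synthetic example: a steady hum (bin 4) corrupting a "speech" tone (bin 2).
n = 32
hum = [0.5 * math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
speech = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
noisy = [s + h for s, h in zip(speech, hum)]

profile = noise_profile([hum, hum])      # learned while the user is not speaking
cleaned = subtract_noise(noisy, profile)
print(max(abs(c - s) for c, s in zip(cleaned, speech)))  # residual error
```

The hum and the tone occupy different frequency bins in this toy case, so the subtraction recovers the tone almost exactly. Real speech overlaps the noise spectrum and the result is only an approximation.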

11. A computer system, comprising:
- a wearable housing;
  
  a speech transducer mounted on said wearable housing;
  
  a voice recognizer coupled to said speech transducer, said voice recognizer recognizing speech using dynamic programming; and
  
  means for securing the computer system to the user.
- Dependent claims (12, 13, 14)
  - 12. The computer system of claim 11, further comprising an optical transceiver coupled to said computer.
  - 13. The computer system of claim 11, further comprising a radio receiver coupled to said computer.
  - 14. The computer system of claim 11, further comprising a radio transmitter coupled to said computer.

15. A computer system, comprising:
- a wearable housing;
  
  a speech transducer for capturing speech, said speech transducer mounted on said wearable housing;
  
  a voice recognizer coupled to said speech transducer, said voice recognizer recognizing speech using a hidden Markov model; and
  
  means for securing the computer system to the user.
- Dependent claims (16)
  - 16. The computer system of claim 15, wherein said hidden Markov model further comprises a neural network.

17. A computer system having a power-down mode to conserve energy, comprising:
- a speech transducer for capturing speech;
  
  a power-up indicator coupled to said speech transducer, said power-up indicator detecting speech directed at said speech transducer and asserting a wake-up signal; and
  
  a voice recognizer coupled to said speech transducer and said wake-up signal, said voice recognizer waking up from the power-up mode when said wake-up signal is asserted.
- Dependent claims (18, 19, 20, 21, 22)
  - 18. The computer system of claim 17, wherein said power-up indicator includes a low-pass filter.
  - 19. The computer system of claim 17, wherein said power-up indicator includes a comparator.
  - 20. The computer system of claim 17, wherein said power-up indicator includes a half-wave rectifier.
  - 21. The computer system of claim 17, wherein said power-up indicator includes a root-mean-square device.
  - 22. The computer system of claim 17, wherein said power-up indicator includes a neural network.

24. A programmable storage device having a computer readable program code embedded therein for recognizing a pronunciation of a word, said program storage device comprising:
- a feature extracting code, said feature extracting code generating linear predictive coding parameters, Fourier transform parameters, auditory parameters, fractal parameters, or wavelet parameters representative of the pronunciation;
  
  a phoneme identifier code coupled to said feature extracting code, said phoneme identifier code using a template matching, fuzzy logic, a neural network, a dynamic programming, or a hidden Markov model based on said parameters;
  
  an N-gram generator code coupled to said phoneme identifier code, said N-gram generator code generating one or more initial N-grams and inner N-grams from the phoneme sequence;
  
  a preselector code coupled to said N-gram generator code, said preselector code forming one or more candidates based on said N-grams; and
  
  a word generator code coupled to said preselector code, said word generator code selecting the candidate closest to said word based on an N-gram statistical model or a grammar model.
- Dependent claims (25, 26)
  - 25. The programmable storage device of claim 24, wherein said candidates are stored in a dictionary.
  - 26. The programmable storage device of claim 24, wherein an unknown word not stored in said dictionary is generated using said phonemes.
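The N-gram generator and preselector elements of claim 24 (and the trigram similarity count of claim 5) can be sketched as follows. This is an illustrative reading with N = 3: dictionary words are hashed under their initial and inner trigrams, and candidates are ranked by how many trigrams they share with the input phoneme sequence. The function names, the phoneme symbols and the toy dictionary are hypothetical.

```python
from collections import Counter, defaultdict

def trigrams(phonemes):
    """Split a phoneme sequence into its initial trigram and its inner trigrams."""
    grams = [tuple(phonemes[i:i + 3]) for i in range(len(phonemes) - 2)]
    return (grams[0] if grams else None), grams[1:]

def build_index(dictionary):
    """Hash every dictionary word under each of its trigrams."""
    start_index, inner_index = defaultdict(set), defaultdict(set)
    for word, phonemes in dictionary.items():
        start, inner = trigrams(phonemes)
        if start is not None:
            start_index[start].add(word)
        for gram in inner:
            inner_index[gram].add(word)
    return start_index, inner_index

def preselect(phonemes, start_index, inner_index, limit=5):
    """Rank dictionary words by the number of trigrams shared with the input."""
    counts = Counter()
    start, inner = trigrams(phonemes)
    for word in start_index.get(start, ()):
        counts[word] += 1
    for gram in set(inner):
        for word in inner_index.get(gram, ()):
            counts[word] += 1
    return [word for word, _ in counts.most_common(limit)]

# Hypothetical dictionary entries: word -> phoneme labels.
dictionary = {
    "speech": ["s", "p", "iy", "ch"],
    "speed": ["s", "p", "iy", "d"],
    "peach": ["p", "iy", "ch"],
    "teach": ["t", "iy", "ch"],
}
start_index, inner_index = build_index(dictionary)
candidates = preselect(["s", "p", "iy", "ch"], start_index, inner_index)
print(candidates)
```

Keeping initial and inner trigrams in separate tables means "peach", whose only trigram is an initial one, is not matched by the inner trigram of "speech". The word generator stage would then choose among the returned candidates with a statistical or grammar model, which is outside this sketch.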


Current Assignee: MUSE Green Investments LLC (Intellectual Ventures LLC)
Original Assignee: MUSE Green Investments LLC (Intellectual Ventures LLC)
Inventors: Tran, Bao Q.
Application Number: US09/962,759
Publication Number: US 20020116196A1

Field of Search
US Class Current: 704/270
CPC Class Codes:
G06F 1/3203   Power management, i.e. even...
G10L 15/26   Speech to text systems G10L...
G10L 2015/223   Execution procedure of a sp...
