Combined speech recognition and text-to-speech generation
Abstract
Text-to-speech (TTS) generation is used in conjunction with large vocabulary speech recognition to say words selected by the speech recognition. The software that performs the large vocabulary speech recognition can share speech modeling data with the TTS software. TTS or recorded audio can be used to automatically say both recognized text and the names of recognized commands after they are recognized. The TTS can automatically repeat text recognized by the speech recognition after each of a succession of end-of-utterance detections. A user can move a cursor back or forward in recognized text, and the TTS can speak one or more words at the cursor location after each such move. The speech recognition can be used to produce a choice list of possible recognition candidates, and the TTS can be used to provide spoken output of one or more of the candidates on the choice list.
33 Claims
1. A computing device for performing large vocabulary speech recognition comprising:
processor readable memory;
one or more processors capable of executing program instructions read from said memory;
a microphone or audio input for providing an electronic signal representing an utterance to be recognized;
a speaker or audio output for enabling an electronic representation of sound produced in said device to be transduced into a corresponding sound;
programming recorded in the memory including:
speech recognition programming for performing large vocabulary speech recognition that responds to the electronic representations of a sequence of one or more utterances received from the microphone or audio input by producing a text output corresponding to the one or more words recognized as corresponding to the utterances; and
TTS programming for providing TTS output to said speaker or audio output saying one or more words of said text recognized by said speech recognition;
shared speech modeling data stored in said memory that is used by said speech recognition programming to recognize words corresponding to spoken utterances and by said TTS programming to generate sounds corresponding to the speaking of a sequence of one or more words; and
wherein the computing device is capable of responding to text navigation commands by moving a cursor backward and forward in the one or more words of said text output, and responding to each movement in response to one of said text navigation commands by providing a TTS output to said speaker or audio output saying one or more words either starting or ending with the location of the cursor after each of said movements.
Dependent claims: 2, 3, 4, 5, 6, 7.
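Claim 1's distinctive element is speech modeling data shared by the recognizer and the TTS. The claim does not say what that data is; the Python sketch that follows assumes, purely for illustration, a pronunciation lexicon mapping words to phoneme sequences. `SharedSpeechModel`, `recognize`, and `synthesize` are hypothetical names, and the "recognition" is a toy phoneme-matching score rather than a real acoustic model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SharedSpeechModel:
    # Hypothetical shared data: a pronunciation lexicon (word -> phonemes)
    # read by both the recognizer and the synthesizer.
    lexicon: dict[str, tuple[str, ...]] = field(default_factory=dict)


def recognize(model: SharedSpeechModel, phonemes: tuple[str, ...]) -> str | None:
    """Toy recognizer: return the lexicon word whose pronunciation best
    matches the observed phonemes (aligned matches minus length mismatch)."""
    best_word, best_score = None, float("-inf")
    for word, pron in model.lexicon.items():
        score = sum(a == b for a, b in zip(pron, phonemes)) - abs(len(pron) - len(phonemes))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


def synthesize(model: SharedSpeechModel, words: list[str]) -> list[str]:
    """Toy TTS front end: expand words into phonemes from the same lexicon,
    standing in for the units a real synthesizer would render as audio."""
    phonemes: list[str] = []
    for word in words:
        phonemes.extend(model.lexicon[word])
    return phonemes
```

Because both functions read the same `lexicon`, adding a word or correcting its pronunciation changes what is recognized and how it is spoken in one place, which is the practical appeal of sharing modeling data.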
8. A computing device for performing large vocabulary speech recognition comprising:
computer readable memory;
one or more processors capable of executing program instructions read from said memory;
a microphone or audio input for providing an electronic signal representing an utterance to be recognized;
a speaker or audio output for enabling an electronic representation of sound produced in said device to be transduced into a corresponding sound; and
programming recorded in the memory including instructions for:
performing large vocabulary speech recognition upon electronic representations of utterances received from the microphone or audio input, including responding to certain utterances as text words which are supplied to a text output and responding to other utterances as recognized commands;
providing TTS output to said speaker or audio output saying one or more words of said text output; and
providing TTS or recorded audio output to said speaker or audio output saying the name of a recognized command.
Dependent claims: 9, 10.
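Claim 8 separates utterances recognized as dictated text from utterances recognized as commands, and gives spoken feedback for both. A minimal Python sketch of that routing, assuming recognition has already produced a string: the command set, the recorded-prompt file path, and the `speak`/`play` callbacks are hypothetical stand-ins for a real command grammar, audio player, and TTS engine.

```python
from typing import Callable, Dict, List

# Hypothetical command vocabulary and pre-recorded prompts (not from the patent).
COMMANDS = {"new paragraph", "delete that", "cap next"}
RECORDED_PROMPTS: Dict[str, str] = {"new paragraph": "prompts/new_paragraph.wav"}


def handle_utterance(
    recognized: str,
    text_output: List[str],
    speak: Callable[[str], None],
    play: Callable[[str], None],
) -> str:
    """Route one recognition result; return "command" or "text"."""
    if recognized in COMMANDS:
        # Announce the command's name, preferring a recorded clip if one exists.
        clip = RECORDED_PROMPTS.get(recognized)
        if clip is not None:
            play(clip)
        else:
            speak(recognized)
        # Executing the command itself is outside this sketch.
        return "command"
    # Otherwise treat the result as dictation: append it, then read it back.
    text_output.extend(recognized.split())
    speak(recognized)
    return "text"
```

Commands never reach `text_output`, so the user hears confirmation of a command without it being typed into the document.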
11. A computing device for performing large vocabulary speech recognition comprising:
computer readable memory;
one or more processors capable of executing program instructions read from said memory;
a microphone or audio input for providing an electronic signal representing an utterance to be recognized;
a speaker or audio output for enabling an electronic representation of sound produced in said device to be transduced into a corresponding sound; and
programming recorded in the memory including instructions for:
performing large vocabulary speech recognition that responds to the electronic representations of each of a sequence of one or more utterances received from the microphone or audio input by:
selecting as a best scoring recognition candidate the one or more words recognized by the speech recognition as corresponding to the utterance;
detecting the end of the utterance; and
then responding to the detection of the end of utterance by providing TTS output to said speaker or audio output saying the one or more words of said best scoring recognition candidate for the utterance, whereby the device can generate audio feedback on the one or more words recognized for each of a succession of large vocabulary speech utterances at the end of each such utterance.
Dependent claims: 12, 13, 14, 15, 16.
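Claim 11 ties the spoken feedback to end-of-utterance detection. The claim does not say how the end is detected; this Python sketch assumes a simple energy threshold with a "hangover" of consecutive quiet frames, a common heuristic but only one possible choice. `find_utterance_end`, `on_frames`, the threshold values, and the higher-is-better scoring convention are all assumptions.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def find_utterance_end(
    frame_energies: Iterable[float], threshold: float = 0.1, hangover: int = 3
) -> Optional[int]:
    """Return the frame index at which end of utterance is declared, or None.

    Assumed rule: after at least one frame at or above `threshold` (speech),
    `hangover` consecutive frames below it mark the end of the utterance.
    """
    in_speech = False
    quiet_frames = 0
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            in_speech = True
            quiet_frames = 0
        elif in_speech:
            quiet_frames += 1
            if quiet_frames == hangover:
                return i
    return None


def on_frames(
    frame_energies: Iterable[float],
    candidates: List[Tuple[str, float]],
    speak: Callable[[str], None],
) -> Optional[str]:
    """If the utterance has ended, say its best-scoring candidate via TTS."""
    if find_utterance_end(frame_energies) is None:
        return None  # no end detected yet: keep listening, say nothing
    best_words, _score = max(candidates, key=lambda c: c[1])  # higher = better (assumed)
    speak(best_words)
    return best_words
```

Calling `on_frames` once per utterance in a dictation loop gives the claimed behavior of hearing each recognized phrase read back as soon as the speaker pauses.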
17. A computing device for performing large vocabulary speech recognition comprising:
computer readable memory;
one or more processors capable of executing program instructions read from said memory;
a microphone or audio input for providing an electronic signal representing an utterance to be recognized;
a speaker or audio output for enabling an electronic representation of sound produced in said device to be transduced into a corresponding sound; and
programming recorded in the memory including instructions for:
performing large vocabulary speech recognition upon an electronic representation of utterances received from the microphone or audio input to produce a text output;
responding to text navigation commands by moving a cursor backward and forward in the one or more words of said text output; and
responding to each movement in response to one of said navigational commands by providing a TTS output to said speaker or audio output saying one or more words either starting or ending with the location of the cursor after each of said movements.
Dependent claims: 18, 19, 20, 21, 22.
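The cursor-navigation feedback recited in claims 1 and 17 can be sketched as a cursor positioned between words. In this hypothetical Python version, moving back reads the word(s) starting at the new cursor position and moving forward reads the word(s) ending there; the class name, the command strings `"move back"` and `"move forward"`, and the `span` parameter are illustrative assumptions, not the patent's command vocabulary.

```python
from typing import Callable, List


class SpokenCursor:
    """Cursor between words: 0 = before the first word, len(words) = after the last."""

    def __init__(self, words: List[str], speak: Callable[[str], None], span: int = 1) -> None:
        self.words = words
        self.speak = speak
        self.span = span  # how many words to read after each move (assumed)
        self.pos = len(words)  # start at the end of the dictated text

    def move(self, delta: int) -> None:
        # Clamp to the text bounds, then give audio feedback at the new position.
        self.pos = max(0, min(len(self.words), self.pos + delta))
        if delta < 0:
            chunk = self.words[self.pos:self.pos + self.span]  # words starting at cursor
        else:
            chunk = self.words[max(0, self.pos - self.span):self.pos]  # words ending at cursor
        if chunk:
            self.speak(" ".join(chunk))

    def handle(self, command: str) -> None:
        steps = {"move back": -1, "move forward": +1}  # hypothetical command names
        if command not in steps:
            raise ValueError(f"unknown navigation command: {command!r}")
        self.move(steps[command])
```

Reading the word just reached when moving back, and the word just passed when moving forward, is one way to satisfy "starting or ending with the location of the cursor"; a system could equally read a fixed window around the cursor.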
23. A computing device for performing large vocabulary speech recognition comprising:
computer readable memory;
one or more processors capable of executing program instructions read from said memory;
a microphone or audio input for providing an electronic signal representing an utterance to be recognized;
a speaker or audio output for enabling an electronic representation of sound produced in said device to be transduced into a corresponding sound;
programming recorded in the memory including instructions for:
performing large vocabulary speech recognition upon electronic representations of uttered words received from the microphone or audio input to produce a choice list of recognition candidates, each comprised of a sequence of one or more words, selected by the recognition as scoring best against said uttered sound;
using text-to-speech technology to provide spoken output to said speaker or audio output saying a plurality of the recognition candidates in the choice list; and
enabling the user to select one recognition candidate from among the plurality of such candidates said by said text-to-speech technology.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31, 32, 33.
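Claim 23 has the TTS read a choice list of recognition candidates aloud and lets the user pick one. A minimal Python sketch under stated assumptions: candidates arrive as `(words, score)` pairs with higher scores better, the list is truncated to `n` entries, and the user selects by the spoken number. `build_choice_list`, `speak_choice_list`, and `select_choice` are hypothetical names.

```python
from typing import Callable, List, Tuple


def build_choice_list(scored: List[Tuple[str, float]], n: int = 3) -> List[str]:
    """Keep the n best-scoring candidates (higher score = better, assumed)."""
    ranked = sorted(scored, key=lambda c: c[1], reverse=True)
    return [words for words, _score in ranked[:n]]


def speak_choice_list(choices: List[str], speak: Callable[[str], None]) -> None:
    """Read each candidate aloud with its number so the user can pick by number."""
    for number, words in enumerate(choices, start=1):
        speak(f"choice {number}: {words}")


def select_choice(choices: List[str], number: int) -> str:
    """Return the candidate the user picked (1-based, as it was spoken)."""
    if not 1 <= number <= len(choices):
        raise ValueError(f"no choice numbered {number}")
    return choices[number - 1]
```

Speaking the numbers along with the candidates gives an eyes-free user a compact way to answer, for example by saying "choose 2".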
Specification