Extensible speech recognition system that provides a user with audio feedback
Abstract
A speech recognition system is extensible in that new terms may be added to a list of terms that are recognized by the speech recognition system. The speech recognition system provides audio feedback when new terms are added so that a user may hear how the system expects the word to be pronounced. The user may then accept the pronunciation or provide his own pronunciation. The user may also selectively change the pronunciation of words to avoid misrecognitions by the system. The system may provide appropriate user interface elements for enabling a user to change the pronunciation of words. The system may also include intelligence for automatically changing the pronunciation of words used in recognition based upon empirically derived information.
57 Citations
51 Claims
1. In a computer-implemented speech recognition system that recognizes speech input from a speaker and that includes an audio output device, a method comprising the computer-implemented steps of:
providing a text-to-speech mechanism for creating a spoken version of text;
for a given word of text, using the text-to-speech mechanism to generate a spoken version of the given word;
outputting the spoken version of the given word on the audio output device so that a user of the speech recognition system knows how the speech recognition system expects the given word to be pronounced; and
providing a user interface element for a user to request a different pronunciation of the given word and wherein the spoken version of the given word is output in response to the user requesting the different pronunciation of the given word via the user interface element.
Dependent claims: 2, 3, 4, 5, 6, 7

8. In a computer-implemented dictation system for converting spoken input from a user into text, a method comprising the steps of:
providing a list of pronunciations for words that are recognized by the dictation system;
providing an audible current pronunciation of a selected word stored in the list;
receiving a request from a user to change the current pronunciation of the selected word that is stored in the list to a new pronunciation, said request specifying the new pronunciation; and
changing the pronunciation stored in the list for the selected word from the current pronunciation to the new pronunciation.
Dependent claims: 9, 10, 11, 12, 13

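The steps of claim 8 can be pictured with a minimal Python sketch. Everything here is an assumption made for illustration, not the patented implementation: the class name `PronunciationList`, the use of space-separated phoneme strings, and the `speak` callback that stands in for the audio output device.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PronunciationList:
    """Hypothetical store mapping each recognized word to one pronunciation."""
    entries: dict[str, str] = field(default_factory=dict)

    def audition(self, word: str, speak: Callable[[str], None]) -> None:
        """Hand the current pronunciation to an audio callback (assumed interface)."""
        speak(self.entries[word])

    def change(self, word: str, new_pronunciation: str) -> str:
        """Replace the stored pronunciation and return the previous one."""
        if word not in self.entries:
            raise KeyError(f"{word!r} is not in the pronunciation list")
        previous = self.entries[word]
        self.entries[word] = new_pronunciation
        return previous


# Usage: the list 'spoken' stands in for the audio output device.
spoken: list[str] = []
plist = PronunciationList({"tomato": "t ah m ey t ow"})
plist.audition("tomato", spoken.append)               # user hears the current pronunciation
previous = plist.change("tomato", "t ah m aa t ow")   # user's request names the new one
plist.audition("tomato", spoken.append)
```

The request carries the new pronunciation itself, which is why `change` takes it as an argument rather than deriving it.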
14. In a computer-implemented speech recognition system that recognizes speech input from a speaker and that includes an audio output device, a method comprising the steps of:
providing a dictionary of terms that the speech recognition system recognizes, said dictionary specifying how the speech recognition system expects each term to be pronounced;
receiving a request from a user to add a new term to the dictionary;
generating a pronunciation for the new term by the speech recognition system;
outputting the pronunciation for the new term on the audio output device so a user can observe and change the pronunciation for the new term; and
adding the new term and the generated pronunciation to the dictionary.
Dependent claims: 15, 16, 17, 18

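As a rough illustration of the add-a-term flow in claim 14, the following Python sketch generates a pronunciation, plays it back, lets the user override it, and stores the result. The letter-to-sound table, the function names, and the `speak` and `review` callbacks are all hypothetical; the claim does not say how the pronunciation is generated.

```python
from typing import Callable, Optional

# Hypothetical, deliberately tiny letter-to-sound table. Letters not listed
# are passed through unchanged; a real system would use far richer rules.
LETTER_TO_SOUND = {"a": "ae", "c": "k", "e": "eh", "i": "ih", "o": "aa", "u": "ah"}


def guess_pronunciation(term: str) -> str:
    """Derive a naive phoneme string from the spelling of the term."""
    return " ".join(LETTER_TO_SOUND.get(ch, ch) for ch in term.lower() if ch.isalpha())


def add_term(
    dictionary: dict[str, str],
    term: str,
    speak: Callable[[str], None],
    review: Callable[[str, str], Optional[str]],
) -> str:
    """Generate, play back, optionally correct, and store a pronunciation."""
    generated = guess_pronunciation(term)
    speak(generated)                       # audio feedback for the user
    corrected = review(term, generated)    # None means the user accepts it
    dictionary[term] = corrected if corrected is not None else generated
    return dictionary[term]


# Usage: 'spoken' stands in for the audio output device.
spoken: list[str] = []
dictionary: dict[str, str] = {}
add_term(dictionary, "cab", spoken.append, lambda term, pron: None)       # accepted as generated
add_term(dictionary, "live", spoken.append, lambda term, pron: "l ay v")  # user overrides
```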
19. In a computer-implemented speech recognition system for recognizing speech spoken from a speaker, said system including an audio output device and a text-to-speech engine for generating speech from text, a method comprising the steps of:
storing multiple pronunciations for a selected word in a dictionary that is used by the text-to-speech engine;
outputting each of the pronunciations on the audio output device so that a user can hear the pronunciations; and
in response to a user selecting one of the pronunciations, using the selected pronunciation by the speech recognition system to recognize speech.
Dependent claims: 20

21. In a computer-implemented speech recognition system for recognizing speech from a speaker, a method comprising the steps of:
providing a dictionary of terms having pronunciations for the terms that correspond with how the speech recognition system expects the terms to be pronounced;
performing a heuristic to derive alternative pronunciations for the terms;
on multiple instances where the speaker speaks a selected one of terms such that the speech recognition system recognizes the selected term, determining which of the alternative pronunciations of the selected terms the user used; and
based on the determining step, identifying which of the alternative pronunciations of the selected term the user is most likely using and updating the dictionary to designate the pronunciation that the user is most likely using as how the speech recognition system expects the selected term to be pronounced.
Dependent claims: 22

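One way to picture the adaptive behavior in claim 21 is a simple tally of which alternative the speaker used on each recognized utterance. The Python sketch below is an assumption-laden illustration: the vowel-swap heuristic in `derive_alternatives`, the `min_observations` cutoff, and the "most frequent wins" rule are all hypothetical stand-ins, since the claim names neither the heuristic nor the likelihood measure.

```python
from collections import Counter

# Hypothetical substitution heuristic: swap one phoneme for a common variant.
SWAPS = {"ey": "aa", "iy": "ay"}


def derive_alternatives(pronunciation: str) -> list[str]:
    """Return the original pronunciation plus single-swap variants."""
    phones = pronunciation.split()
    alternatives = [pronunciation]
    for i, phone in enumerate(phones):
        if phone in SWAPS:
            alternatives.append(" ".join(phones[:i] + [SWAPS[phone]] + phones[i + 1:]))
    return alternatives


class PronunciationAdapter:
    """Tallies which alternative the speaker uses and updates the dictionary."""

    def __init__(self, dictionary: dict[str, str], min_observations: int = 3):
        self.dictionary = dictionary
        self.min_observations = min_observations  # assumed cutoff, not from the claim
        self.alternatives = {t: derive_alternatives(p) for t, p in dictionary.items()}
        self.counts: dict[str, Counter] = {}

    def observe(self, term: str, heard: str) -> None:
        """Record the alternative matched on one recognized utterance of term."""
        if heard not in self.alternatives.get(term, ()):
            return  # not one of the derived alternatives; ignore
        tally = self.counts.setdefault(term, Counter())
        tally[heard] += 1
        if sum(tally.values()) >= self.min_observations:
            # Designate the most frequently used alternative as the expected one.
            self.dictionary[term] = tally.most_common(1)[0][0]


dictionary = {"tomato": "t ah m ey t ow"}
adapter = PronunciationAdapter(dictionary)
for heard in ["t ah m aa t ow", "t ah m ey t ow", "t ah m aa t ow"]:
    adapter.observe("tomato", heard)
```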
23. In a computer-implemented speech recognition system for recognizing spoken speech from a speaker, said system having an output device, a method comprising the steps of:
receiving a spoken version of a term having a given pronunciation from the speaker;
providing an expected pronunciation for the term that corresponds to how the speech recognition system expects the speaker to speak the term;
comparing the given pronunciation of the spoken version of the term with the expected pronunciation of the term to determine a degree of difference between the given pronunciation of the spoken version of the term and the expected pronunciation of the term; and
where the degree of difference exceeds an acceptable predetermined threshold, generating output on the output device to inform the speaker that the degree of difference exceeds the threshold.
Dependent claims: 24, 25, 26, 27

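The comparison step in claim 23 needs some measure of "degree of difference", which the claim leaves open. The Python sketch below substitutes one plausible choice, a Levenshtein edit distance over phoneme symbols normalized by the longer sequence. The metric, the 0.25 threshold, and the function names are assumptions for illustration only.

```python
from typing import Callable


def phoneme_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    previous = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        current = [i]
        for j, pb in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,               # delete pa
                current[j - 1] + 1,            # insert pb
                previous[j - 1] + (pa != pb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def degree_of_difference(given: str, expected: str) -> float:
    """Edit distance normalized to the range 0..1 by the longer sequence."""
    g, e = given.split(), expected.split()
    longest = max(len(g), len(e))
    return phoneme_distance(g, e) / longest if longest else 0.0


def check_pronunciation(
    term: str,
    given: str,
    expected: str,
    notify: Callable[[str], None],
    threshold: float = 0.25,  # assumed value; the claim only says "predetermined"
) -> bool:
    """Notify the speaker when the difference exceeds the threshold."""
    difference = degree_of_difference(given, expected)
    if difference > threshold:
        notify(f"Pronunciation of {term!r} differs from the expected one ({difference:.2f})")
        return True
    return False


# Usage: 'messages' stands in for the output device.
messages: list[str] = []
close = check_pronunciation("tomato", "t ah m aa t ow", "t ah m ey t ow", messages.append)
far = check_pronunciation("nuclear", "n uw k y ah l er", "n uw k l iy er", messages.append)
```

In this example one substituted phoneme out of six stays under the threshold, while three edits out of seven exceed it and trigger the notification.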
28. In a computer-implemented speech recognition system that recognizes speech input from a speaker and that includes an audio output device, a computer-readable medium holding computer-executable instructions for performing a method comprising the computer-implemented steps of:
providing a text-to-speech mechanism for creating a spoken version of text;
for a given word of text, using the text-to-speech mechanism to generate a spoken version of the given word;
outputting the spoken version of the given word on the audio output device so that a user of the speech recognition system knows how the speech recognition system expects the given word to be pronounced; and
providing a user interface element for a user to request a proper pronunciation of the given word and wherein the spoken version of the given word is output in response to the user requesting the proper pronunciation of the given word via the user interface element.
Dependent claims: 29, 30, 31, 32, 33

34. In a computer-implemented dictation system for converting spoken input from a user into text, a computer-readable medium holding computer-executable instructions for performing a method comprising the steps of:
providing a list of pronunciations for words that are recognized by the dictation system;
providing an audible current pronunciation of a selected word stored in the list;
receiving a request from a user to change the current pronunciation of the selected word that is stored in the list to a new pronunciation, said request specifying the new pronunciation; and
changing the pronunciation stored in the list for the selected word from the current pronunciation to the new pronunciation.
Dependent claims: 35, 36, 37, 38

39. In a computer-implemented speech recognition system that recognizes speech input from a speaker and that includes an audio output device, a computer-readable medium holding computer-executable instructions for performing a method comprising the steps of:
providing a dictionary of terms that the speech recognition system recognizes, said dictionary specifying how the speech recognition system expects each term to be pronounced;
receiving a request from a user to add a new term to the dictionary;
generating a pronunciation for the new term by the speech recognition system;
outputting the pronunciation for the new term on the audio output device so a user can observe and change the pronunciation for the new term; and
adding the new term and the generated pronunciation to the dictionary.
Dependent claims: 40, 41, 42

43. In a computer-implemented speech recognition system for recognizing speech spoken from a speaker, said system including an audio output device and a text-to-speech engine for generating speech from text, a computer-readable medium holding computer-executable instructions for performing a method comprising the steps of:
storing multiple pronunciations for a selected word in a dictionary that is used by the text-to-speech engine;
outputting each of the pronunciations on the audio output device so that a user can hear the pronunciations; and
in response to a user selecting one of the pronunciations, using the selected pronunciation by the speech recognition system to recognize speech.
Dependent claims: 44

45. In a computer-implemented speech recognition system for recognizing speech from a speaker, a computer-readable medium holding computer-executable instructions for performing a method comprising the steps of:
providing a dictionary of terms having pronunciations for the terms that correspond with how the speech recognition system expects the terms to be pronounced;
deriving alternative pronunciations of the terms by applying a heuristic;
on multiple instances where the speaker speaks a selected one of terms such that the speech recognition system recognizes the selected term, determining which of the alternative pronunciations of the selected terms the user used; and
based on the determining step, identifying which of the alternative pronunciations of the selected term the user is most likely using and updating the dictionary to designate the pronunciation that the user is most likely using as how the speech recognition system expects the selected term to be pronounced.

46. In a computer-implemented speech recognition system for recognizing spoken speech from a speaker, said system having an output device, a computer-readable medium holding computer-executable instructions for performing a method comprising the steps of:
receiving a spoken version of a term having a given pronunciation from the speaker;
providing an expected pronunciation for the term that corresponds to how the speech recognition system expects the speaker to speak the term;
comparing the given pronunciation of the spoken version of the term with the expected pronunciation of the term to determine a degree of difference between the given pronunciation of the spoken version of the term and the expected pronunciation of the term; and
where the degree of difference exceeds an acceptable predetermined threshold, generating output on the output device to inform the speaker that the degree of difference exceeds the threshold.

47. In a computer-implemented speech recognition system for recognizing spoken speech from a speaker, said system having a display device, a method comprising the steps of:
providing an expected pronunciation of a given word that constitutes how the speech recognition system expects the given word to be pronounced by the speaker;
gathering statistics regarding how frequently the given word of spoken speech from the speaker is misrecognized by the speech recognition system; and
where the statistics indicate that the given word is misrecognized more frequently than a threshold value, prompting the user by generating output on the display device through a user interface element such that the user can request a different pronunciation to correct the expected pronunciation of the given word, a spoken version of the given word with a corrected expected pronunciation being output by the user interface element.

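The statistics-gathering step of claim 47 can be sketched in Python as a per-word misrecognition rate checked against a threshold. The class name, the 0.3 rate threshold, and the minimum-attempts guard are hypothetical choices for illustration; the claim specifies only that some threshold value exists.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MisrecognitionStats:
    """Tracks how often each word is misrecognized (illustrative only)."""
    threshold: float = 0.3   # assumed misrecognition rate that triggers a prompt
    min_attempts: int = 5    # assumed guard against acting on too little data
    attempts: Counter = field(default_factory=Counter)
    errors: Counter = field(default_factory=Counter)

    def record(self, word: str, misrecognized: bool) -> None:
        """Log one utterance of word and whether it was misrecognized."""
        self.attempts[word] += 1
        if misrecognized:
            self.errors[word] += 1

    def needs_prompt(self, word: str) -> bool:
        """True when the word's error rate exceeds the threshold."""
        total = self.attempts[word]
        if total < self.min_attempts:
            return False
        return self.errors[word] / total > self.threshold


stats = MisrecognitionStats()
for outcome in [True, False, True, False, False]:   # 2 errors in 5 attempts
    stats.record("wreck", outcome)
for outcome in [False, False, True, False, False]:  # 1 error in 5 attempts
    stats.record("beach", outcome)

prompt_wreck = stats.needs_prompt("wreck")
prompt_beach = stats.needs_prompt("beach")
```

When `needs_prompt` returns true, a user interface would offer the user the chance to hear and correct the expected pronunciation; that part is omitted here.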
48. A speech recognition system for recognizing speech from a speaker, comprising:
an input device for receiving speech input from the speaker;
a speech recognition engine for recognizing speech in the speech input received from the speaker by the input device wherein the speech recognition engine has expected pronunciations for portions of speech;
a text-to-speech engine for producing a spoken representation of text constituting a selected portion of speech;
an audio output device for outputting the spoken representation of the text from the text-to-speech engine so that the user knows the expected pronunciation of the selected portion of speech; and
an interface component configured to receive a new pronunciation from the user, indicative of a pronunciation more closely conforming to a pronunciation used by the user.
Dependent claims: 49, 50, 51

Specification