Methods, systems, and programming for performing speech recognition

US 20040049388A1
Filed: 09/06/2002
Published: 03/11/2004
Est. Priority Date: 09/05/2001
Status: Active Grant



Abstract

The present invention relates to: speech recognition using selectable recognition modes; using choice lists in large-vocabulary speech recognition; enabling users to select word transformations; speech recognition that automatically turns recognition off in one or more specified ways; phone key control of large-vocabulary speech recognition; speech recognition using phone key alphabetic filtering and spelling; speech recognition that enables a user to perform re-utterance recognition; the combination of speech recognition and text-to-speech (TTS) generation; the combination of speech recognition with handwriting and/or character recognition; and the combination of large-vocabulary speech recognition with audio recording and playback.


248 Claims

1. A method of speech recognition comprising:
- providing a user interface which allows a user to select between generating a first and a second user input;
  
  responding to the generation of the first user input by performing large vocabulary recognizing on one or more utterances in a prior language context dependent mode, which recognizes at least the first word of such recognition depending in part on a language model context created by a previously recognized word; and
  
  responding to the generation of the second user input by performing large vocabulary recognizing on one or more utterances in a prior language context independent mode, which recognizes at least the first word of such recognition independently of a language model context created by any previously recognized word.
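The difference between the two modes of claim 1 can be sketched with a toy bigram language model. This is a minimal illustration under stated assumptions, not the patent's implementation: the `BIGRAM` table, the `<s>` start-of-context token, and the idea that per-word acoustic log-likelihoods arrive precomputed are all hypothetical.

```python
import math
from typing import Optional

# Hypothetical bigram table P(word | previous word); "<s>" means "no prior context".
BIGRAM = {
    ("<s>", "their"): 0.2, ("<s>", "there"): 0.6,
    ("in", "their"): 0.7, ("in", "there"): 0.2,
}


def lm_logprob(prev: str, word: str) -> float:
    # Unseen pairs get a small floor probability instead of zero.
    return math.log(BIGRAM.get((prev, word), 1e-4))


def recognize_first_word(acoustic: dict[str, float], prior_word: Optional[str],
                         context_dependent: bool) -> str:
    """Return the best-scoring first word of a new utterance.

    acoustic maps each candidate word to an (assumed precomputed) acoustic
    log-likelihood. In the context dependent mode the previously recognized
    word conditions the language model; in the context independent mode it
    is ignored and the utterance is scored as if it starts a new context.
    """
    prev = prior_word if context_dependent and prior_word else "<s>"
    return max(acoustic, key=lambda w: acoustic[w] + lm_logprob(prev, w))
```

With acoustically identical "their" and "there" after the prior word "in", the context dependent mode picks "their" while the context independent mode picks "there", which is the kind of choice a user would want when dictating into a new field or after moving the cursor.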

2. A method as in innovation 1 wherein:
- the user interface includes a first button and a second button;
  
  the first user input is generated by pressing the first button; and
  
  the second user input is generated by pressing the second button.

3. A method as in innovation 1 wherein the prior language context independent mode uses the language model context created by the first and any successively recognized words of an utterance in selecting the second and successive words, if any, recognized for that utterance.

4. A method as in innovation 1 further including providing recognized words output by the prior language context dependent and independent modes as a text input to another program.

5. A method as in innovation 4 wherein said method is performed by a software input panel in Microsoft Windows CE.

6. A method of speech recognition comprising:
- providing a user interface which allows a user to select between generating a first and a second user input;
  
  responding to the generation of the first user input by recognizing one or more utterances as one or more words in a given vocabulary in a continuous speech recognition mode; and
  
  responding to the generation of the second user input by recognizing one or more utterances as one or more words in the same given vocabulary in a discrete speech recognition mode.
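The contrast between discrete and continuous recognition over the same vocabulary (claims 6 and 13) can be sketched on a toy in which each word's "pronunciation" is a short string standing in for acoustic data. The `VOCAB` table and function names are hypothetical; discrete mode ranks single words with Python's `difflib.SequenceMatcher`, and continuous mode uses word-break dynamic programming, both chosen here only for illustration.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical toy vocabulary shared by BOTH modes: word -> pseudo-phonetic string.
VOCAB = {"call": "kaol", "home": "howm", "mom": "maam"}


def discrete_recognize(utterance: str, n: int = 3) -> list[str]:
    """Discrete mode: treat the whole utterance as ONE word; return n-best single words."""
    def similarity(word: str) -> float:
        return SequenceMatcher(None, utterance, VOCAB[word]).ratio()
    return sorted(VOCAB, key=similarity, reverse=True)[:n]


def continuous_recognize(utterance: str) -> Optional[list[str]]:
    """Continuous mode: split the utterance into a sequence of vocabulary words.

    best[i] holds the shortest word sequence whose pronunciations concatenate
    exactly to utterance[:i]; None means no segmentation exists.
    """
    best: dict[int, list[str]] = {0: []}
    for end in range(1, len(utterance) + 1):
        for word, pron in VOCAB.items():
            start = end - len(pron)
            if start in best and utterance[start:end] == pron:
                candidate = best[start] + [word]
                if end not in best or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best.get(len(utterance))
```

Here `continuous_recognize("kaolhowm")` yields `["call", "home"]`, whereas the discrete mode only ever returns single-word candidates, matching the limitation described in claim 13.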

7. A method as in innovation 6 wherein the given vocabulary is a large vocabulary.

8. A method as in innovation 6 wherein the given vocabulary is an alphabetic input vocabulary.

9. A method as in innovation 6 wherein:
- said user interface allows a user to select between generating a third and a fourth input independently from the selection of the first and second input; and
  
  said method further includes responding to said third and fourth inputs, respectively, by selecting as said given vocabulary a first vocabulary or a second vocabulary.

10. A method as in innovation 9 wherein said first and second vocabulary are a large vocabulary of words and an alphabetic input vocabulary.

11. A method as in innovation 9 wherein said first and second vocabulary are two different alphabetic input vocabularies.

12. A method as in innovation 6 wherein:
- the user interface provided includes a first button and a second button;
  
  the first user input is generated by pressing the first button; and
  
  the second user input is generated by pressing the second button.

13. A method as in innovation 12 wherein:
- pressing the first and second buttons causes their respective recognition modes to recognize from substantially the time of the pressing of such a button until the next end of utterance detected;
  
  wherein the discrete recognition is substantially limited to the recognition of one or more candidates for a single word matching said utterance and the continuous recognition mode is not so limited.

14. A method as in innovation 6 wherein acoustic models used to represent words in the discrete recognition mode are different than the acoustic models used to represent the same words in the continuous recognition mode.

15. A method of speech recognition comprising:
- providing a user interface which allows a user to select between generating a first and a second user input;
  
  responding to the generation of the first user input by recognizing one or more utterances as one or more words in a first alphabetic entry vocabulary; and
  
  responding to the generation of the second user input by recognizing one or more utterances as one or more words in a second alphabetic entry vocabulary.

16. A method as in innovation 15 wherein:
- the first alphabetic entry vocabulary includes the names of each letter of the alphabet and the second alphabetic entry vocabulary does not; and
  
  the second alphabetic entry vocabulary includes one or more words that start with each letter of the alphabet and the first alphabetic entry vocabulary does not.
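The two alphabetic entry vocabularies of claim 16 can be sketched as two lookup tables: one of spoken letter names and one with a word per letter. The letter-name spellings are illustrative, and the NATO phonetic alphabet is used only as an example of the second kind of vocabulary; the patent does not specify which words are used.

```python
# Hypothetical vocabulary 1: spoken letter names (spellings are illustrative).
LETTER_NAMES = {
    "ay": "a", "bee": "b", "see": "c", "dee": "d", "ee": "e", "ef": "f", "gee": "g",
    "aitch": "h", "eye": "i", "jay": "j", "kay": "k", "el": "l", "em": "m", "en": "n",
    "oh": "o", "pee": "p", "cue": "q", "ar": "r", "ess": "s", "tee": "t", "you": "u",
    "vee": "v", "double-you": "w", "ex": "x", "why": "y", "zee": "z",
}

# Vocabulary 2: one word starting with each letter (NATO alphabet as an example).
NATO = ["alfa", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel",
        "india", "juliett", "kilo", "lima", "mike", "november", "oscar", "papa",
        "quebec", "romeo", "sierra", "tango", "uniform", "victor", "whiskey",
        "xray", "yankee", "zulu"]
LETTER_WORDS = {word: word[0] for word in NATO}


def spell(tokens: list[str], mode: str) -> str:
    """Map recognized tokens to letters using the vocabulary of the selected mode."""
    vocab = LETTER_NAMES if mode == "letter_names" else LETTER_WORDS
    return "".join(vocab[t] for t in tokens)
```

Keeping the two vocabularies disjoint means a user who finds letter names misrecognized can switch to the word-per-letter mode, whose entries are acoustically more distinct.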

17. A method as in innovation 15 wherein said user interface provides a separate button for generating said first and second inputs.

18. A method as in innovation 17 wherein touching of each of said buttons turns on recognition in the button's associated alphabetic entry mode.

19. A method as in innovation 15 wherein said user interface enables:
- a user to select a filtering mode in which word choices for the recognition of a given word are limited to words whose spelling matches a sequence of one or more characters input by the user;
  
  a user to enter said one or more filtering characters by voice recognition using either said first or second alphabetic entry modes; and
  
  said first and second inputs select between whether such recognition of filtering characters is performed using said first or second alphabetic entry modes.

20. A method of speech recognition comprising:
- providing a user interface which allows a user to select between generating a first, a second, and a third user input;
  
  responding to the generation of the first user input by recognizing one or more utterances as one or more words in a first, general purpose large vocabulary;
  
  responding to the generation of the second user input by recognizing one or more utterances as one or more words in a second, alphabetic entry vocabulary;
  
  responding to the generation of the third user input by recognizing one or more utterances as one or more words in a third vocabulary, which represents non-spelled text inputs; and
  
  sequentially receiving output from recognition in any of the three vocabularies and placing that output into a common text.

21. A method as in innovation 20 wherein the third vocabulary is a digits vocabulary.

22. A method as in innovation 20 wherein the third vocabulary is a vocabulary of punctuation marks.

23. A method as in innovation 20 wherein the user interface provides a different button for the selection of each of first, second, and third inputs.

24. A method as in innovation 23 wherein pressing the button associated with one of said three vocabularies turns on recognition using that vocabulary.

25. A method of performing word recognition comprising:
- receiving a word input signal containing non-textual user input representing a sequence of one or more words;
  
  performing word recognition upon the input signal to produce a choice list of best scoring recognition candidates, each comprised of a sequence of one or more words and/or numbers, found by the recognizer to have a relatively high probability of corresponding to the input signal;
  
  producing user-perceivable output representing a choice list of best scoring recognition candidates, with the candidates being ordered in said choice list according to a character ordering of a sequence of characters corresponding to the one or more words associated with each candidate in the list;
  
  providing a user interface which enables a user to select one of the character-ordered recognition candidates from the choice list;
  
  responding to user selection of one of the recognition candidates from the choice list by treating the selected candidate as the one or more words and/or numbers that correspond to the word input signal.

26. A method as in innovation 25 wherein:
- the word recognition selects a best scoring recognition candidate; and
  
  the best scoring candidate is placed in a position in said user-perceivable output that is independent of where the character sequence corresponding to the one or more words associated with the best scoring candidate would, according to said character ordering, fall in the character-ordered list.
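Claims 25 and 26 describe a choice list ordered by spelling, with the best-scoring candidate pulled out of that ordering. A minimal sketch, assuming the recognizer supplies `(text, score)` pairs with higher scores better; the function name and list size are hypothetical.

```python
def build_choice_list(scored: list[tuple[str, float]], size: int = 6) -> list[str]:
    """Return the best-scoring candidate first, then the rest in alphabetical order.

    scored holds (candidate_text, score) pairs, where a higher score is better.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)[:size]
    if not ranked:
        return []
    best, *others = [text for text, _ in ranked]
    # Case-insensitive character ordering for the remaining candidates.
    return [best] + sorted(others, key=str.casefold)
```

Alphabetizing the alternatives lets a user scan for a known spelling quickly, while the top choice stays in a fixed position regardless of where it would fall alphabetically.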

27. A method as in innovation 25 wherein:
- the word input signal is a representation of an utterance of a spoken word; and
  
  the word recognition is speech recognition.

28. A method as in innovation 25 wherein the user perceivable output includes showing a character-ordered list of said best scoring recognition candidates on a visual display.

29. A method as in innovation 28 wherein:
- said choice list includes more recognition candidates than fit on the display at one time; and
  
  the choice list is scrollable, so that a user can select to move the list relative to the display, so as to see more recognition candidates on the list than fit on the display at one time.

30. A method as in innovation 28 wherein:
- the character-ordered list is an alphabetically ordered list; and
  
  the display of an individual recognition candidate in the list includes a sequence of one or more alphabetically spelled words.

31. A method as in innovation 30 wherein:
- said choice list includes more recognition candidates than fit on the display at one time; and
  
  the choice list is scrollable, so that a user can select to move the list relative to the display, so as to see more recognition candidates on the list than fit on the display at one time.

32. A method as in innovation 31 wherein:
- said choice list has two alphabetically ordered sub-lists;
  
  the first sub-list includes the highest scoring choice candidates that fit on the display at one time; and
  
  the second sub-list includes other best scoring choice candidates.
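The two alphabetized sub-lists of claim 32 can be sketched by splitting the score-ranked candidates at the number of rows that fit on one screen. The `rows` parameter and the `(text, score)` input format are assumptions for illustration.

```python
def two_sublists(scored: list[tuple[str, float]], rows: int) -> tuple[list[str], list[str]]:
    """Split candidates into (first screen, remainder), each alphabetized.

    The first sub-list holds the `rows` highest-scoring candidates, so the
    initial screen always shows the most likely choices.
    """
    ranked = [text for text, _ in sorted(scored, key=lambda p: p[1], reverse=True)]
    first = sorted(ranked[:rows], key=str.casefold)
    second = sorted(ranked[rows:], key=str.casefold)
    return first, second
```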

33. A method as in innovation 32 wherein the second sub-list has more candidates than fit on the display at one time.

34. A method as in innovation 30 further including:
- providing a user interface that allows the user to select a filtering sequence of one or more letter-indications after said display of the character-ordered list of best scoring recognition candidates; and
  
  responding to the selection of said filtering sequence by generating and showing on said display a new alphabetized choice list of recognition candidates, which new choice list is limited to candidates whose sequence of one or more characters starts with said filtering sequence; and
  
  providing a user interface that enables a user to select one of the alphabetized recognition candidates from the new choice list;
  
  responding to a user selection of one of the recognition candidates in the new choice list by treating the selected candidate as the one or more words and/or numbers that correspond to the word input signal.

35. A method as in innovation 34 wherein said responding to the selection of said filtering sequence by generating and showing a new alphabetized choice list includes:
- detecting whether or not the number of recognition candidates is below a desired number;
  
  when a detection is made that the number of recognition candidates is below the desired number, selecting from a vocabulary list one or more additional candidates that start with the filtering sequence for inclusion in said new alphabetized choice list.
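The filtering and padding steps of claims 34 and 35 can be sketched as follows. The backup `vocabulary` list is assumed to be ordered most-frequent first, and `desired` is a hypothetical target size; neither detail comes from the claims.

```python
def filtered_choice_list(candidates: list[str], prefix: str, vocabulary: list[str],
                         desired: int = 5) -> list[str]:
    """Alphabetized choice list limited to words starting with prefix.

    candidates: recognizer output, best first. If fewer than `desired`
    candidates survive the filter, words with the same prefix are added
    from the backup vocabulary until the list reaches `desired` entries.
    """
    chosen = [c for c in candidates if c.startswith(prefix)]
    for word in vocabulary:
        if len(chosen) >= desired:
            break
        if word.startswith(prefix) and word not in chosen:
            chosen.append(word)
    return sorted(chosen, key=str.casefold)
```

For example, filtering `["embrace", "embarrass", "ember"]` on `"emba"` leaves only `"embarrass"`, so vocabulary words such as `"embark"` and `"embassy"` are added to fill the list.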

36. A method as in innovation 35 wherein:
- said new alphabetized choice list includes more recognition candidates than fit on the display at one time; and
  
  the choice list is scrollable, so that a user can select to move the list relative to the display, so as to see more recognition candidates on the list than fit on the display at one time.

37. A method as in innovation 34 wherein:
- the method is performed on a telephone having a telephone keypad;
  
  the user interface that allows the user to input said letter-indicating inputs allows the user to enter such inputs by pressing one or more keys of said telephone keypad, with the pressing of a given telephone pad key indicating that the corresponding letter in the sequence of one or more characters associated with a desired recognition candidate is one of a set of multiple letters associated with the given key; and
  
  the new candidate list is limited to candidates whose sequence of one or more words starts with an initial sequence of letters corresponding to the sequence of letter-indicating inputs, in which each letter of the initial sequence of letters corresponds to one of the set of letters indicated by a corresponding letter-indicating input in said sequence of letter-indicating inputs.
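The ambiguous phone-key filtering of claim 37 can be sketched with the standard letter groups printed on keys 2 through 9. The function names are hypothetical; each pressed key constrains one letter position to that key's letters.

```python
# Letter groups printed on telephone keys 2-9.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}


def matches_keys(word: str, keys: str) -> bool:
    """True if each of the word's first len(keys) letters is on the corresponding key."""
    if len(word) < len(keys):
        return False
    return all(ch in KEYPAD.get(key, "") for ch, key in zip(word.lower(), keys))


def key_filtered_choices(candidates: list[str], keys: str) -> list[str]:
    """Alphabetized candidates consistent with the key presses so far."""
    return sorted(w for w in candidates if matches_keys(w, keys))
```

Pressing 4, 6, 6 keeps "gone", "good", "home", and "hood" but drops "ink", since "k" is not on key 6.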

38. A method as in innovation 37 wherein:
- said new choice list includes more recognition candidates than fit on the display at one time; and
  
  the choice list is scrollable, so that a user can select to move the list relative to the display, so as to see more recognition candidates on the list than fit on the display at one time.

39. A method as in innovation 34 wherein:
- the user interface that allows the user to select a sequence of one or more letter-indications allows a user to select a desired number of characters from the start of a string of alphabetic characters contained within a selected one of the recognition candidates displayed in a choice list; and
  
  said user interface responds to such a selection by using the selected one or more characters as all or part of said sequence of one or more letter-indications.

40. A method as in innovation 30 further including:
- providing a user interface that allows the user to indicate the selection of a location on a displayed alphabetized choice list between listed candidates or between a listed candidate and the beginning or end of the list; and
  
  responding to such a selection by redisplaying a new alphabetized choice list limited to recognition candidates having spellings between the two candidates or between the candidate and the beginning or end of the alphabet, respectively.
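The range selection of claim 40 amounts to a binary search on an alphabetized list. A minimal sketch using Python's `bisect` module, assuming the candidate pool passed in has already been produced by recognition; `None` stands for the start or end of the alphabet.

```python
import bisect
from typing import Optional


def narrow_between(candidates: list[str], lower: Optional[str] = None,
                   upper: Optional[str] = None) -> list[str]:
    """Return candidates whose spelling falls strictly between lower and upper."""
    ordered = sorted(candidates)
    lo = 0 if lower is None else bisect.bisect_right(ordered, lower)
    hi = len(ordered) if upper is None else bisect.bisect_left(ordered, upper)
    return ordered[lo:hi]
```

Selecting the gap between "banana" and "fig" in a list of fruit names would redisplay only "cherry" and "date".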

41. A method as in innovation 28 wherein:
- the input signal represents the utterance of one or more sequential numbers; and
  
  the choice list is a numerically ordered list of recognition candidates displayed as numbers.

42. A method as in innovation 30 wherein:
- said input signal represents the utterance of a phone number;
  
  said word recognition is speech recognition; and
  
  said responding to a user selection of a recognition candidate causes the phone number displayed for the selected recognition candidate to be automatically dialed.

43. A method as in innovation 28 wherein:
- the input signal represents the utterance of one or more names from contact information; and
  
  the choice list represents a plurality of best scoring names from the contact information, ordered alphabetically.

44. A method as in innovation 43 wherein:
- said choice list includes more recognition candidates than fit on the display at one time; and
  
  the choice list is scrollable, so that a user can select to move the list relative to the display, so as to see more recognition candidates on the list than fit on the display at one time.

45. A method of performing word recognition comprising:
- receiving a word input signal containing non-textual user input representing a sequence of one or more words;
  
  performing word recognition upon the input signal to produce a choice list of best scoring recognition candidates, each comprised of a sequence of one or more words and/or numbers, found by the recognizer to have a relatively high probability of corresponding to the input signal;
  
  showing the choice list in a user scrollable display, with the choice list having more recognition candidates than fit on the display at one time so that only a sub-portion of the choice list is displayed at one time;
  
  responding to user input selecting to scroll the choice list up or down by moving the choice list relative to the display up or down, respectively, so as to change the portion of the choice list shown on the display.

46. A method as in innovation 45 wherein the word input signal is a representation of an utterance of a spoken word and the word recognition is speech recognition.

47. A method as in innovation 45 wherein:
- said user input selecting to scroll the choice list up or down includes a multiple-candidate scroll input; and
  
  said responding to user input includes responding to each multiple-candidate scroll input by moving the choice list up or down relative to the display by multiple recognition candidates.

48. A method as in innovation 45 wherein:
- the method is performed on a cell phone; and
  
  the display is the display of a cell phone.

49. A method as in innovation 48 wherein:
- the showing of the choice list on the cell phone display includes displaying a different number in association with each recognition candidate in the portion of the choice list shown on the display at one time;
  
  providing a user interface which enables a user to select one of the recognition candidates from the choice list by pressing a numbered phone key on said cell phone corresponding to a desired recognition candidate; and
  
  responding to a user selection of one of the recognition candidates from the choice list by treating the selected candidate as the one or more words and/or numbers that correspond to the word input signal.

50. A method as in innovation 45 wherein:
- each recognition candidate has associated with it a character string; and
  
  the recognition candidates in the scrollable choice list are ordered by the character ordering in which their respective character strings occur.

51. A method as in innovation 45 wherein the recognition candidates in the scrollable choice list are ordered by their recognition score against the word signal.

52. A method as in innovation 45 further including responding to user input selecting to scroll the choice list right or left by moving the choice list relative to the display right or left, respectively, so as to change the portion of individual choices in the choice list that are shown on the display.
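The scrolling and selection behavior of claims 45 through 52 can be modeled as a small window over the choice list. The class name, the `rows`/`cols` screen size, and the numbering scheme are hypothetical; the sketch only shows single and multi-candidate vertical scrolling, horizontal scrolling, and selection by numbered phone key.

```python
class ChoiceListView:
    """Hypothetical model of a choice list shown on a small phone display."""

    def __init__(self, choices: list[str], rows: int, cols: int) -> None:
        self.choices = list(choices)
        self.rows, self.cols = rows, cols
        self.top = 0   # index of the first visible candidate
        self.left = 0  # first visible character column

    def scroll_vertical(self, delta: int) -> None:
        # delta = +/-1 scrolls one candidate; +/-rows scrolls a page (claim 47).
        max_top = max(0, len(self.choices) - self.rows)
        self.top = min(max(self.top + delta, 0), max_top)

    def scroll_horizontal(self, delta: int) -> None:
        # Reveal parts of candidates too long for the display width (claim 52).
        longest = max((len(c) for c in self.choices), default=0)
        max_left = max(0, longest - self.cols)
        self.left = min(max(self.left + delta, 0), max_left)

    def visible(self) -> list[str]:
        # Number each visible row 1..rows so a phone key can pick it (claim 49).
        window = self.choices[self.top:self.top + self.rows]
        return [f"{i} {c[self.left:self.left + self.cols]}"
                for i, c in enumerate(window, start=1)]

    def select_by_key(self, key: str) -> str:
        number = int(key)
        index = self.top + number - 1
        if not 1 <= number <= self.rows or index >= len(self.choices):
            raise ValueError(f"no candidate shown on key {key}")
        return self.choices[index]
```

Because numbers are assigned to screen rows rather than to list positions, the same phone keys remain usable after any amount of scrolling.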

53. A method of performing word recognition comprising:
- receiving a word input signal containing non-textual user input representing a sequence of one or more words;
  
  receiving a sequence of one or more filter input signals, each containing non-textual user input representing a sequence of one or more characters;
  
  responding to the one or more filter input signals by producing a filter, representing one or more possible character sequences, each having one or more characters, found to have possibly corresponded to the filter input signal;
  
  generating a list of recognition candidates starting with one of the character sequences represented by the filter, including one or more candidates from word recognition of the input signal when one or more such word recognition candidates starting with one of the character sequences represented by the filter have a recognition probability above a certain minimum level;
  
  producing user-perceivable output representing:
  
  said list of best scoring recognition candidates; and
  
  a character sequence represented by said filter corresponding to the initial characters of one of the list of best scoring recognition candidates;
  
  enabling a user to select one of the recognition candidates from said list and/or to select a character from said filter;
  
  responding to selection of one of the recognition candidates from the choice list by treating the selected candidate as the one or more words that correspond to the word input signal;
  
  responding to selection of a filter character by displaying a choice list of other characters in the possible character sequences represented by the filter that correspond to the selected character's position in the user-perceivable filter;
  
  enabling a user to choose one of the characters in the character choice list;
  
  responding to a choice of a character in the character choice list by:
  
  limiting the possible character sequences represented by the filter to ones having the chosen character in the selected character's position; and
  
  repeating said generation of a list of recognition candidates using the filter as limited by the chosen character.
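The ambiguous filter of claims 53 and 54 can be represented as one set of possible characters per position. In this sketch the filter structure, the helper names, and the way the displayed filter string is passed in are all assumptions; choosing a character narrows that position and, optionally, locks the displayed characters before it (claim 54).

```python
from typing import Optional

Filter = list[set[str]]  # one set of possible characters per filter position


def matches(word: str, filt: Filter) -> bool:
    return len(word) >= len(filt) and all(ch in opts for ch, opts in zip(word, filt))


def filtered_candidates(ranked_words: list[str], filt: Filter) -> list[str]:
    """Keep recognizer output (best first) that starts with a sequence the filter allows."""
    return [w for w in ranked_words if matches(w, filt)]


def character_choices(filt: Filter, position: int, shown: str) -> list[str]:
    """Other characters the filter allows at position besides the one displayed."""
    return sorted(filt[position] - {shown[position]})


def choose_character(filt: Filter, position: int, char: str,
                     shown: Optional[str] = None) -> Filter:
    new = [set(opts) for opts in filt]
    new[position] = {char}
    if shown is not None:  # claim 54 variant: lock the displayed earlier characters
        for i in range(position):
            new[i] = {shown[i]}
    return new
```

With a filter from phone keys 2, 2, 8 and the top candidate "act" displayed, selecting the first character offers "b" and "c"; choosing "c" regenerates the list as just "cat".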

54. A method as in innovation 53 wherein the limiting of the possible character sequences represented by the filter includes limiting such character sequences to ones having the characters, if any, that occur before the selected character in the user-perceivable filter.

55. A method as in innovation 53 wherein:
- said generation of a list of recognition candidates limits the recognition candidates to those starting with only a single character sequence represented by the filter; and
  
  the user-perceivable output representing said candidate list includes said single character sequence as the user-perceivable filter.

56. A method as in innovation 53 wherein said generation of a list of recognition candidates limits the recognition candidates to those starting with any of a plurality of character sequences represented by the filter.

57. A method as in innovation 53 wherein:
- the filter input signals correspond to a sequence of one or more phone key presses, where each pressed phone key has an associated set of letters; and
  
  the responding to the filter input signals produces a filter representing one or more sequences of characters, where each such sequence has one character for each such key press, with each such character corresponding to one of the set of letters associated with the corresponding key press.

58. A method as in innovation 53 wherein:
- the filter input signals correspond to a sequence of one or more utterances each of a sequence of one or more letter indications; and
  
  the responding to the filter input signals includes performing speech recognition upon the sequence of one or more utterances to produce a filter representing one or more sequences of characters corresponding to the characters recognized from said utterances.

59. A method of performing word recognition comprising:
- receiving a word input signal containing non-textual user input representing a sequence of one or more words;
  
  performing word recognition upon the input signal to produce a choice list of best scoring recognition candidates, each comprised of a sequence of one or more words and/or numbers, found by the recognizer to have a relatively high probability of corresponding to the input signal;
  
  showing the choice list in a user scrollable display;
  
  responding to user input selecting to scroll the choice list right or left by moving the choice list relative to the display right or left, respectively, so as to change the portion of individual choices in the choice list that are shown on the display.

60. A method as in innovation 59 wherein said method is practiced on a cell phone and the user input selecting to scroll horizontally is the pressing of a button or key on the cell phone.

61. A method of performing word recognition comprising:
- receiving a word input signal representing one or more words;
  
  performing word recognition upon the signal to produce one or more best scoring words corresponding to the word input signal;
  
  providing a user interface enabling a user to select from among a plurality of word transformation commands, each having a different type of transformation associated with it;
  
  responding to the user's selection of one of the word transformation commands by transforming a currently selected word to a corresponding, but different, word spelled with a different sequence of letters from a through z using the selected command's associated transformation.

62. A method as in innovation 61 wherein at least one of the word transformation commands transforms the currently selected word to a different grammatical form.

63. A method as in innovation 62 wherein at least one of the word transformation commands transforms the currently selected word to a different tense.

64. A method as in innovation 62 wherein at least one of the word transformation commands transforms the currently selected word to a plural or singular form.

65. A method as in innovation 62 wherein at least one of the word transformation commands transforms the currently selected word to a possessive or non-possessive form.

66. A method as in innovation 61 wherein at least one of the word transformation commands transforms the currently selected word to a homonym of the selected word.

67. A method as in innovation 61 wherein at least one of the word transformation commands transforms the currently selected word by changing its ending to one of a set of common word endings.
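The transformation commands of claims 61 through 67 can be sketched as a table of functions keyed by command name. The English morphology rules below are deliberately naive (no irregular forms) and the homonym table is a hypothetical toy; a real system would draw on a lexicon.

```python
# Toy homonym cycle: repeating the command steps through the set (hypothetical table).
HOMONYMS = {"there": "their", "their": "they're", "they're": "there",
            "to": "too", "too": "two", "two": "to"}


def pluralize(word: str) -> str:
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    if word.endswith("y") and len(word) > 1 and word[-2] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"


def possessive(word: str) -> str:
    return word + "'" if word.endswith("s") else word + "'s"


def past_tense(word: str) -> str:
    return word + "d" if word.endswith("e") else word + "ed"


def change_ending(word: str, ending: str) -> str:
    # Claim 67: strip one known common ending (naively), then append the new one.
    for old in ("ing", "ed", "s"):
        if word.endswith(old):
            word = word[: -len(old)]
            break
    return word + ending


TRANSFORMS = {"plural": pluralize, "possessive": possessive, "past": past_tense,
              "homonym": lambda w: HOMONYMS.get(w, w)}


def transform(word: str, command: str) -> str:
    return TRANSFORMS[command](word)
```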

68. A method as in innovation 61 wherein the word recognition produces a choice list of best scoring recognition candidates, each comprised of one or more words, found by the recognizer to have a relatively high probability of corresponding to the word signal;
- and the user interface outputs the recognition candidates of the choice list in user perceivable form; and
  
  the user interface enables a user to select a choice from one of the recognition candidates output on the choice list, to have a selected one of the transformation commands performed upon the selected choice, and to have the resulting transformed word produced as output of the recognition process.

69. A method as in innovation 61 wherein the word recognition is speech recognition performed on a telephone;
- and the user interface enables a user to select a selected one of the transformation commands by pressing a phone key.

70. A method of performing word recognition comprising:
- receiving a word input signal representing one or more words;
  
  performing word recognition upon the signal to produce one or more best scoring words corresponding to the word input signal;
  
  providing a user interface enabling a user to select from among a plurality of word transformation commands;
  
  responding to the user's selection of one of the word transformation commands by transforming a currently selected word between an alphabetic representation and a non-alphabetic representation.
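The alphabetic/non-alphabetic toggle of claim 70 can be sketched as a two-way lookup table. The table contents are hypothetical examples, and multi-digit numbers are spelled digit by digit only to keep the sketch short.

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]
# Hypothetical word <-> symbol table: digits plus a couple of symbols.
WORD_TO_SYMBOL = {w: str(i) for i, w in enumerate(DIGIT_WORDS)}
WORD_TO_SYMBOL.update({"percent": "%", "dollar": "$"})
SYMBOL_TO_WORD = {s: w for w, s in WORD_TO_SYMBOL.items()}


def toggle_representation(token: str) -> str:
    """Switch a token between its alphabetic and non-alphabetic forms."""
    if token in WORD_TO_SYMBOL:
        return WORD_TO_SYMBOL[token]
    if token in SYMBOL_TO_WORD:
        return SYMBOL_TO_WORD[token]
    if token.isascii() and token.isdigit():
        # Multi-digit numbers: spelled digit by digit in this sketch.
        return " ".join(DIGIT_WORDS[int(d)] for d in token)
    return token
```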

71. A method as in innovation 70 wherein the word recognition produces a choice list of best scoring recognition candidates, each comprised of one or more words, found by the recognizer to have a relatively high probability of corresponding to the signal;
- and the user interface outputs the recognition candidates of the choice list in user perceivable form; and
  
  the user interface enables a user to select a word from one of the recognition candidates output on the choice list, to have the transformation between an alphabetic and a non-alphabetic representation performed upon that selected word, and to have the resulting transformed word produced as output of the recognition process.

72. A method of performing word recognition comprising:
- receiving a word input signal representing one or more words;
  
  performing word recognition upon the signal to produce one or more best scoring words corresponding to the word input signal;
  
  providing a user interface enabling a user to select to display a list of transformations upon a word produced by said recognition;
  
  responding to the user's selection by producing a choice list of said transformed words corresponding to the recognized word;
  
  the user interface enables a user to select one of the transformed words in the choice list; and
  
  responding to the selection of a transformed word by having the selected transformed word produced as output of the recognition process.
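The transformation choice list of claims 72 and 76 through 78 can be sketched by merging the outputs of several transformation sources while dropping duplicates and the original word. The three source functions below are hypothetical toy tables standing in for homonyms, alternate representations, and grammatical forms.

```python
from typing import Callable, Iterable


def transformation_choices(word: str,
                           transforms: Iterable[Callable[[str], list[str]]]) -> list[str]:
    """Collect distinct transformed words, in source order, excluding the word itself."""
    seen, out = {word}, []
    for transform in transforms:
        for alt in transform(word):
            if alt not in seen:
                seen.add(alt)
                out.append(alt)
    return out


# Hypothetical transformation sources (toy tables, not a real lexicon).
def homonyms(w: str) -> list[str]:
    return {"to": ["too", "two"], "two": ["to", "too"]}.get(w, [])


def representations(w: str) -> list[str]:
    return {"two": ["2"], "2": ["two"]}.get(w, [])


def grammatical_forms(w: str) -> list[str]:
    return [w + "s"] if w.isalpha() else []
```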

73. A method as in innovation 72 wherein:
- the choice list of transformed words is shown on a user scrollable display, with the choice list having more transformed words than fit on the display at one time so that only a sub-portion of the choice list is displayed at one time;
  
  responding to user input selecting to scroll the choice list up or down by moving the choice list relative to the display up or down, respectively, so as to change the portion of the choice list shown on the display.

74. A method as in innovation 72 wherein the user interface:
- places words output by the recognition process into a text; and
  
  allows the user to select from among one or more words in the text the word for which the transformation choice list is to be produced.

75. A method as in innovation 72 wherein the user interface:
- produces a choice list of best scoring word candidates from a word recognition; and
  
  allows the user to select from among one or more words in the best scoring choice list the word for which the transformation choice list is to be produced.

76. A method as in innovation 72 wherein the words in the transformed word list include the one or more homonyms, if any, of the word for which the transformation choice list is produced.

77. A method as in innovation 72 wherein the words in the transformed word list include one or more different representations, if any, of the word for which the transformation choice list is produced.

78. A method as in innovation 72 wherein the words in the transformed word list include one or more different grammatical forms, if any, of the word for which the transformation choice list is produced.

79. A method of performing word recognition comprising:
- responding to a command input from a user to start recognition by:
  
  turning large vocabulary speech recognition on after the receipt of the command;
  
  subsequently automatically turning the large vocabulary speech recognition off and leaving it off until receiving another command input from a user to start recognition.

80. A method as in innovation 79 wherein the turning off of speech recognition occurs automatically after the lapse of a given period of time.

81. A method as in innovation 79 wherein the turning off of speech recognition occurs automatically after the detection of the first end of utterance after the turning on of the speech recognition.
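The automatic turn-off of claims 79 through 83 can be sketched as a small state machine driven by an audio loop. The class and method names are hypothetical, and time is passed in explicitly so the behavior can be checked without a real clock or end-of-utterance detector.

```python
from typing import Optional


class AutoOffRecognizer:
    """Recognition that turns itself off and stays off until the next start command."""

    def __init__(self, timeout_s: Optional[float] = None,
                 stop_at_end_of_utterance: bool = True) -> None:
        self.timeout_s = timeout_s                      # claim 80 (None = no timeout)
        self.stop_at_end_of_utterance = stop_at_end_of_utterance  # claim 81
        self.on = False
        self.started_at = 0.0

    def start_command(self, now: float) -> None:
        # e.g. a key press, i.e. a non-acoustic start input (claim 82)
        self.on = True
        self.started_at = now

    def update(self, now: float, end_of_utterance: bool = False) -> None:
        # Called periodically by the audio loop; it only ever turns recognition OFF.
        if not self.on:
            return
        if self.stop_at_end_of_utterance and end_of_utterance:
            self.on = False
        elif self.timeout_s is not None and now - self.started_at >= self.timeout_s:
            self.on = False
```

Because `update` never turns recognition back on, only an explicit `start_command` can resume listening, which is the "leaving it off" behavior of claim 79.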

82. A method as in innovation 79 wherein the command input which causes the turning on of speech recognition is a non-acoustic input.

83. A method as in innovation 82 wherein the speech recognition is turned off in response to the next end of utterance detection made by the speech recognition and is left off until the next non-acoustic user input to start recognition.
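The one-shot behavior of claims 79 through 83 can be pictured as a small state machine. The sketch below is a minimal illustration only, assuming a hypothetical `OneShotRecognizer` class whose `feed` method receives audio frames together with an end-of-utterance flag from some external detector; joining the frames into a string stands in for an actual recognizer.

```python
from enum import Enum, auto


class RecState(Enum):
    OFF = auto()
    ON = auto()


class OneShotRecognizer:
    """Hypothetical controller: a non-acoustic start command turns
    recognition on; the next end of utterance turns it off again."""

    def __init__(self):
        self.state = RecState.OFF
        self.frames = []   # audio frames captured while recognition is on
        self.results = []  # one output string per completed utterance

    def start_command(self):
        # Stands in for a button press or other non-acoustic input.
        if self.state is RecState.OFF:
            self.state = RecState.ON
            self.frames = []

    def feed(self, frame, end_of_utterance=False):
        # Sound arriving while recognition is off is ignored.
        if self.state is RecState.OFF:
            return
        self.frames.append(frame)
        if end_of_utterance:
            # Placeholder for real recognition: join the captured frames.
            self.results.append(" ".join(self.frames))
            # Stay off until the next start command.
            self.state = RecState.OFF


rec = OneShotRecognizer()
rec.feed("noise")                        # ignored: recognition is off
rec.start_command()
rec.feed("call")
rec.feed("home", end_of_utterance=True)  # recognition turns itself off here
rec.feed("chatter")                      # ignored
print(rec.results)                       # ['call home']
```

Sound received after the end of utterance is never recognized until the user issues another start command, which is the property claim 83 turns on.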

84. A method as in innovation 83 wherein the speech recognition is continuous speech recognition.

85. A method as in innovation 83 wherein the speech recognition is discrete speech recognition.

86. A method as in innovation 83 further comprising:
- outputting a user perceivable representation of the one or more words recognized as a best choice for the utterance preceding the end of utterance detection;
  
  providing a user interface allowing a user to provide correction input to correct errors in the best choice output in response to the recognition of an utterance;
  
  responding to receipt of a start recognition command input after the outputting of the best choice recognized for an utterance, before any correction input has been received for said best choice, by:
  
  confirming said best choice as correct; and
  
  repeating said method for a new utterance, starting with receipt of the start recognition command.

87. A method as in innovation 86 further including responding to such a confirmation of an utterance by including one or more of the recognized words as part of the current context used to calculate a language model score for subsequent speech recognition.

88. A method as in innovation 86 further including responding to such a confirmation of an utterance by using one or more of the recognized words as data for altering the language model.

89. A method as in innovation 86 further including responding to such a confirmation of an utterance as corresponding to a given recognized word by labeling acoustic data from the utterance for use in updating one or more acoustic models used in the recognition of the given recognized word.

90. A method as in innovation 83 further including allowing a user to select between a first mode in which recognition turns off after the next end of utterance detected after receiving the non-acoustic input, and a second mode which does not turn off recognition after said next end of utterance detection.

91. A method as in innovation 90 wherein, in said second mode, recognition is automatically turned off in response to lapse of time longer than the normal lapse between utterances in conversation.

92. A method as in innovation 83 wherein:
- the method is performed by software running on a handheld computing device; and
  
  the non-acoustic input is the pressing of a button, including a GUI button.

93. A method as in innovation 92 wherein the handheld computing device is a cellphone and the buttons are cellphone buttons.

94. A method as in innovation 83 wherein the method is performed by software running on a computer which is part of an automotive vehicle.

95. A method as in innovation 82 wherein the start recognition command input is the pressing of a hardware or software button, and the recognition is automatically turned off less than a second after the pressing of the button ceases.

96. A method as in innovation 82 wherein:
- said method provides a user interface having a plurality of speech mode selection buttons, each for selecting a different speech recognition mode, available for selection by the user at one time; and
  
  the non-acoustic input which causes the turning on of speech recognition is the pressing of one of said buttons; and
  
  the method responds to the pressing of a speech mode button by turning on speech recognition in its associated mode and subsequently automatically turning off said recognition.

97. A method as in innovation 96 wherein:
- the speech recognition mode associated with one of said buttons is said large vocabulary recognition;
  
  the recognition mode associated with another of said buttons is a mode which performs recognition with a vocabulary for alphabetic entry.

98. A method as in innovation 96 wherein:
- the speech recognition mode associated with one of said buttons is continuous recognition;
  
  the recognition mode associated with another of said buttons is discrete recognition.

99. A method as in innovation 96 wherein the handheld computing device is a cellphone and the buttons are cellphone buttons.

100. A method of speech recognition comprising:
- providing a user interface which provides a button which responds to touch lasting less than a first duration as a click, and a touch lasting longer than a second duration as a press;
  
  responding to a press by causing speech recognition to be performed on sound for a duration that varies as a function of the length of the press; and
  
  responding to a click by causing speech recognition to be performed on sound for a duration that is independent of the length of the click.

101. A method as in innovation 100 wherein:
- said responding to a click causes speech recognition to be performed on sound received from substantially the time of the click until the next end of utterance detected; and
  
  said responding to a press causes speech recognition to be performed on sound received during the period of the press.
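The click/press distinction of claims 100 and 101 reduces to classifying the touch duration and then choosing which span of sound to recognize. The sketch below assumes hypothetical threshold values `CLICK_MAX_S` and `PRESS_MIN_S` for the claim's "first" and "second" durations, and assumes the time of the next end of utterance is already known from some detector; touches falling between the two thresholds are simply left unclassified here.

```python
CLICK_MAX_S = 0.25  # hypothetical "first duration", in seconds
PRESS_MIN_S = 0.40  # hypothetical "second duration", in seconds


def classify_touch(duration_s):
    """Classify a button touch by how long it lasted."""
    if duration_s < CLICK_MAX_S:
        return "click"
    if duration_s > PRESS_MIN_S:
        return "press"
    return None  # between the thresholds: not classified in this sketch


def recognition_window(touch_start, touch_end, next_end_of_utterance):
    """Return the (start, end) span of sound to recognize, or None.

    A click recognizes from the click until the next end of utterance,
    so the span does not depend on the click's length. A press
    recognizes only the sound received while the button is held.
    """
    kind = classify_touch(touch_end - touch_start)
    if kind == "click":
        return (touch_start, next_end_of_utterance)
    if kind == "press":
        return (touch_start, touch_end)
    return None


print(recognition_window(10.0, 10.1, 13.5))  # (10.0, 13.5): a click
print(recognition_window(10.0, 12.0, 13.5))  # (10.0, 12.0): a press
```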

102. A method as in innovation 101 wherein recognition performed in response to a click is discrete recognition and recognition performed in response to a press is continuous recognition.

103. A method as in innovation 102 wherein the user interface allows a user to select between:
- a mode in which recognition in response to a click and recognition in response to a press are both either continuous or discrete; and
  
  a mode wherein recognition performed in response to a click is discrete recognition and recognition performed in response to a press is continuous.

104. A method as in innovation 100 wherein:
- said responding to a click causes speech recognition to be performed on sound received from substantially the time of the click for a period of at least one minute; and
  
  said responding to a press causes speech recognition to be performed on sound received during the period of the press and for not more than one second afterward.

105. A method as in innovation 100 wherein:
- the user interface has a plurality of speech mode selection buttons, each for selecting a different speech recognition mode, available for selection by the user at one time;
  
  the user interface responds to a touch of each of the mode selection buttons lasting less than a first duration as a click, and a touch of such a button lasting longer than a second duration as a press;
  
  the method responds to a press of a mode button by causing speech recognition to be performed in the button's associated mode on sound for a duration that varies as a function of the length of the press; and
  
  the method responds to a click of a mode button by causing speech recognition to be performed in the button's associated mode on sound for a duration that is independent of the length of the click.

106. A method as in innovation 105 wherein:
- the recognition mode associated with a first of said mode buttons is a mode which performs recognition with a large vocabulary; and
  
  the recognition mode associated with a second of said mode buttons is a mode which performs recognition with an alphabetic entry vocabulary.

107. A method as in innovation 105 wherein the speech recognition mode associated with one of said mode buttons is continuous recognition and the recognition mode associated with another of said mode buttons is discrete recognition.

108. A method as in innovation 105 wherein:
- the method is practiced on a cellphone; and
  
  numbered cellphone buttons act as said mode buttons.

109. A computing device that functions as a telephone comprising:
- a user perceivable output device;
  
  a set of phone keys including at least a standard twelve key phone key pad;
  
  one or more microprocessors;
  
  microprocessor readable memory;
  
  a microphone or audio input from which said telephone can receive electronic representations of sound;
  
  a speaker or audio output for enabling an electric representation of sound produced in said telephone to be transduced into a corresponding sound;
  
  transmitting and receiving circuitry;
  
  programming recorded in the memory including:
  
  telephone programming having instructions for performing telephone functions including making and receiving calls; and
  
  speech recognition programming including instructions for:
  
  performing large vocabulary speech recognition upon electronic representations of sound received from the microphone or audio input; and
  
  responding to presses of one or more of the phone keys to control the operation of the speech recognition.

110. A computing device as in innovation 109 wherein the device is a cellphone.

111. A computing device as in innovation 109 wherein the device is a cordless phone.

112. A computing device as in innovation 109 wherein the device is a landline phone.

113. A computing device as in innovation 109 wherein the speech recognition programming includes instructions for:
- responding to a given utterance by performing speech recognition to produce a choice list of best scoring speech recognition candidates each comprised of one or more words found by the recognizer to have a relatively high probability of corresponding to the given utterance or part of an utterance;
  
  producing user perceivable output indicating a plurality of the choice list candidates and associating a separate phone key with each such candidate; and
  
  responding to a press of a phone key associated with a choice list candidate by selecting the associated candidate as the output for the given utterance.
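The key-to-candidate association of claims 113 and 114 amounts to ranking the recognizer's candidates and pairing them with numbered keys. The sketch below assumes candidates arrive as hypothetical `(word, score)` pairs where a higher score is better, and that keys 1 through 9 are the ones used for choices; both function names are illustrative.

```python
def assign_choice_keys(candidates, keys="123456789"):
    """Map numbered phone keys to the best-scoring candidates.

    candidates: list of (word, score) pairs; higher score is better.
    Only as many candidates as there are keys receive a key.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return {key: word for key, (word, _score) in zip(keys, ranked)}


def select_by_key(key, mapping):
    """Return the candidate associated with the pressed key, or None."""
    return mapping.get(key)


choices = assign_choice_keys([("their", -3.0), ("there", -1.0), ("they're", -2.5)])
print(choices)                      # {'1': 'there', '2': "they're", '3': 'their'}
print(select_by_key("2", choices))  # they're
```

Keys left unassigned (here 4 through 9) return `None`, which leaves them free for other functions in the spirit of claim 115.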

114. A computing device as in innovation 113 wherein the speech recognition programming includes instructions for using a plurality of numbered phone keys as said phone keys associated with choice list candidates.

115. A computing device as in innovation 114 wherein the speech recognition programming includes instructions for, at the same time some of the numbered phone keys are associated with choice list candidates, using other numbered phone keys for other speech recognition functions.

116. A computing device as in innovation 113 wherein the speech recognition programming includes instructions for:
- operating in a first mode which responds to presses of each of a set of phone keys by selecting an associated choice list candidate; and
  
  operating in a second mode which responds to presses of each of the same set of phone keys as a letter identification input.

117. A computing device as in innovation 116 wherein the speech recognition programming includes instructions for using said letter identifications for alphabetic filtering of the choice list.

118. A computing device as in innovation 109 wherein the speech recognition programming includes instructions for:
- producing a recognition output corresponding to a sequence of one or more recognized words in response to the recognition of a given utterance;
  
  placing the recognition output into a text sequence previously containing a sequence of zero or more words stored in the memory at a current cursor location in the text sequence; and
  
  moving the cursor location forward and backward, respectively, in the text sequence in response to the pressing of different ones of the phone keys.

119. A computing device as in innovation 118 wherein the instructions for moving the current text location include instructions for moving the current text location forward and backward a whole word at a time, respectively, in response to the pressing of one of two phone keys associated with word-at-a-time motion, one associated with word forward motion and one associated with word backward motion.

120. A computing device as in innovation 119 wherein the instructions for moving the current text location forward and backward a whole word at a time include instructions for:
- responding, under a first condition, to the pressing of the key associated with word forward or backward motion, respectively, by selecting the whole word after or before the prior cursor location; and
  
  responding, under a second condition, to the pressing of the key associated with word forward or backward motion by placing a non-selection cursor immediately after or before, respectively, the prior cursor location;
  
  whereby the same two keys can be used to move a word at a time in text, and to make the cursor correspond either to the selection of a whole word or to a non-selection cursor before or after a word.

121. A computing device as in innovation 120 wherein said second condition includes one in which the pressing of one of said word-at-a-time keys is received as the next input after the pressing of the other of said two word-at-a-time keys.
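One possible reading of claims 119 to 121 is sketched below: two keys move a word at a time, normally selecting a whole word, but pressing one key immediately after the other collapses the selection into a bare cursor. The class name `WordCursor`, the key names `"fwd"` and `"back"`, and the gap-index representation are all hypothetical choices for illustration, not the patent's specified design.

```python
class WordCursor:
    """Hypothetical two-key, word-at-a-time cursor over a list of words.

    Either `selected` holds the index of a selected whole word, or it is
    None and `gap` holds a bare (non-selection) cursor, where gap i means
    "just before word i".
    """

    def __init__(self, words):
        self.words = words
        self.selected = None
        self.gap = 0
        self.last_key = None

    def press(self, key):
        forward = key == "fwd"
        other = "back" if forward else "fwd"
        if self.last_key == other and self.selected is not None:
            # Second condition: the opposite key was pressed last, so turn
            # the selection into a bare cursor after (fwd) or before (back)
            # the selected word.
            self.gap = self.selected + 1 if forward else self.selected
            self.selected = None
        else:
            # First condition: select the whole word after/before the cursor.
            if self.selected is not None:
                idx = self.selected + (1 if forward else -1)
            else:
                idx = self.gap if forward else self.gap - 1
            # Clamp to the text so the cursor never leaves it.
            self.selected = max(0, min(len(self.words) - 1, idx))
        self.last_key = key


cursor = WordCursor(["the", "quick", "brown", "fox"])
cursor.press("fwd")   # selects "the"
cursor.press("fwd")   # selects "quick"
cursor.press("back")  # bare cursor just before "quick"
cursor.press("back")  # selects "the"
```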

122. A computing device as in innovation 118 wherein:
- the user perceivable output device is a display;
  
  the speech recognition programming includes instructions for displaying all or a portion of the text sequence across successive lines on the display; and
  
  the instructions for moving the current text location include instructions for moving the current text location up a line and down a line, respectively, in response to the pressing of different ones of the phone keys.

123. A computing device as in innovation 118 wherein the instructions for moving the current text location include instructions for moving the current text location to the start and to the end of a sequence of words including all or part of the words in the text sequence, respectively, in response to the pressing of different ones of the phone keys.

124. A computing device as in innovation 118 wherein the speech recognition programming includes instructions for:
- responding to the press of one phone key by starting an extendable selection at the current text location; and
  
  responding to the pressing of different ones of the phone keys associated with moving the current text location forward and backward, respectively, by extending the selection forward and backward, respectively, by the amount associated with such keys.

125. A computing device as in innovation 118 wherein the programming includes instructions for generating an audio output by a text-to-speech process of one or more words at the current text location after that current location has been moved in response to the pressing of one of the phone keys.

126. A computing device as in innovation 118 wherein:
- the user perceivable output device is a display;
  
  the speech recognition programming includes instructions for showing on the display one or more words at the current location after that current location has been moved in response to the pressing of one of the phone keys.

127. A computing device as in innovation 109 wherein the speech recognition programming includes instructions for responding to a selection of a given one of the phone keys by entering a help mode which responds to a subsequent phone key press by providing, in user perceivable form, an explanation of the function that was associated with the subsequently pressed phone key before the help mode was entered.

128. A computing device as in innovation 127 wherein:
- the instructions for responding to presses of one or more phone keys to control operation of speech recognition define a hierarchical command structure in which a user can navigate and select commands by a sequence of one or more phone keys; and
  
  the instructions for entering a help mode include instructions for responding to each key press in a sequence of two or more key presses after entering said help mode by providing, in user perceivable form, an explanation of the function the key press would have in a similar sequence of key presses in the hierarchical command structure if that key sequence had been entered before entering the help mode.

129. A computing device as in innovation 109 wherein the speech recognition programming includes instructions for responding to a pressing of a first phone key by outputting a user perceivable list indicating the functions associated with each of a plurality of individual phone keys at the current time.

130. A computing device as in innovation 129 wherein the user perceivable output includes the generation of an audio signal saying the list of function indications.

131. A computing device as in innovation 129 wherein:
- the phone keys include said first key and a set of one or more navigation keys; and
  
  the speech recognition programming includes instructions for operating in a text mode where:
  
  the navigation keys allow user perceivable navigation of recognized text;
  
  other phone keys have a set of functions mapped to them for controlling entry and editing of said text; and
  
  a press of the first key is responded to by entering command list mode where navigation keys allow user perceivable navigation of a list of the functions associated with each of a plurality of phone keys in the text mode.

132. A computing device as in innovation 131 wherein:
- the command list mode's user-perceivable list of functions includes the associations of phone key numbers with a plurality of functions in the list; and
  
  speech recognition programming includes instructions for responding to pressing of a numbered phone key associated with a function in said list during operation of the command list mode by returning to the text mode and selecting its associated function.

133. A computing device as in innovation 131 wherein:
- the speech recognition programming includes instructions for use in the command list mode for:
  
  responding to one or more presses of navigational keys by moving a function selection relative to the user-perceivable list of functions; and
  
  responding to a press of a selection phone key by returning to the text mode and selecting its associated function.

134. A computing device as in innovation 133 wherein the command list includes functions in addition to those that can be selected by pressing of phone keys in the text mode, which additional functions can be selected in the command list mode by said navigation and selection.

135. A computing device as in innovation 133 wherein:
- the command list lists functions that are associated with the navigation keys in the text mode;
  
  said text-mode navigational key functions are different than those associated with the navigation keys in command list mode; and
  
  the text mode navigational key functions can be selected in the command list mode by said navigation and selection.

136. A computing device as in innovation 131 wherein:
- said phone keys include a menu key;
  
  said programming recorded in the memory includes instructions for responding to a press of the menu key in each of a plurality of modes other than said text mode by displaying a list of functions selectable by phone key that were not selectable by the same phone keys immediately before the pressing of the menu key; and
  
  said first key used in said text mode to select the command list mode is the menu key.

137. A computing device as in innovation 109 wherein the speech recognition programming includes instructions for operating in a text mode during which:
- the navigation keys allow user perceivable navigation of recognized text; and
  
  a plurality of the numbered phone keys function at one time as key mapping keys, each of which selects a different key mapping mode that maps a different set of functions to a plurality of said numbered phone keys;
  
  whereby a user can quickly select a desired key mapping from a plurality of such mappings by pressing a numbered phone key, greatly increasing the speed with which the user can select one from among a relatively large number of commands from the text mode.

138. A computing device as in innovation 137 wherein the speech recognition programming includes instructions for responding to the pressing of one of said key mapping keys by entering an associated menu mode where navigation keys allow user-perceivable navigation of a menu that indicates the functions associated with each of a plurality of numbered phone keys in the pressed mapping key's associated key mapping mode.

139. A method of performing large vocabulary speech recognition comprising:
- receiving a filtering sequence of one or more key-press signals each of which indicates which of a plurality of keys has been selected by a user, where each of the keys represents two or more letters;
  
  receiving an acoustic representation of a sound;
  
  performing speech recognition upon the acoustic representation which scores word candidates as a function of the match between the acoustic representation of the sound and acoustic models of words;
  
  wherein:
  
  the scoring of word candidates favors word candidates containing a sequence of one or more alphabetic characters corresponding to the filtering sequence of key-press signals, where a candidate word is considered to contain a character sequence corresponding to the filtering sequence if each sequential character in the character sequence corresponds to one of the letters represented by its corresponding sequential key-press signal.
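The matching test in claim 139 can be made concrete with the standard letter layout of a telephone keypad. The sketch below assumes the filtering sequence is matched against the start of each word (the claim says only "containing"), and implements "favoring" as a hypothetical fixed score `bonus` added to matching candidates rather than discarding the others; candidate scores are illustrative log-likelihood-style numbers.

```python
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}


def matches_filter(word, key_seq):
    """True if each of the word's first len(key_seq) letters is one of
    the letters on the corresponding key (a prefix match)."""
    if len(word) < len(key_seq):
        return False
    return all(ch in KEYPAD.get(key, "") for ch, key in zip(word.lower(), key_seq))


def rescore(candidates, key_seq, bonus=10.0):
    """Favor, rather than hard-filter, candidates consistent with the keys.

    candidates: list of (word, acoustic_score) pairs; higher is better.
    """
    rescored = [(w, s + (bonus if matches_filter(w, key_seq) else 0.0))
                for w, s in candidates]
    return sorted(rescored, key=lambda c: c[1], reverse=True)


print(rescore([("fox", -1.0), ("dog", -2.0), ("box", -1.5)], "26"))
# [('box', 8.5), ('fox', -1.0), ('dog', -2.0)]
```

Using a bonus instead of a hard filter keeps non-matching words available, which matters if the user mistypes a key; a hard filter would correspond to an infinitely large bonus.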

140. A method as in innovation 139 further including:
- responding to an additional utterance made in association with a given key press signal in said filtering sequence by performing speech recognition upon the associated utterance; and
  
  responding to the recognition of the key press's associated utterance as a letter identifying word by causing the set of letters represented by the key press in the filtering sequence to be limited to a letter identified by the recognized letter identifying word.

141. A method as in innovation 140 further including:
- responding to a key press signal by displaying in user-perceivable form a set of words containing one or more words starting with each letter represented by the pressed key; and
  
  favoring the recognition of an utterance made after the display of the pressed key's associated letter identifying words as corresponding to one of said displayed words.

142. A method as in innovation 139 further including providing a user interface which:
- outputs a plurality of the word candidates produced by said speech recognition in a user-perceivable form in a choice list; and
  
  allows a user to select one of the output candidates as the desired word; and
  
  responding to the user selection of one of the output candidates by selecting it as the recognized word for the recognition.

143. A method as in innovation 139 wherein said receiving of a filtering sequence and said performing of speech recognition favoring candidates containing characters corresponding to the filter sequence can be performed repeatedly for a given acoustic representation in response to the receipt of successive key-press signals in said filtering sequence.

144. A method as in innovation 139 wherein the preferential scoring of word candidates is performed by selecting from word candidates previously selected by the recognition process those candidates that contain a sequence of one or more characters corresponding to the filtering sequence.

145. A method as in innovation 139 wherein the preferential scoring of word candidates is performed by performing the speech recognition upon the acoustic representation a second time during which word candidates are favored which contain a sequence of one or more characters corresponding to the received filtering sequence.

146. A method as in innovation 139 wherein the sequence of key press signals is received before the initial recognition of the acoustic representation is complete and the alphabetic favoring of word candidates is performed during the initial recognition.

147. A method as in innovation 139 wherein the method is performed by software running on a telephone and the keys are keys of a telephone keypad.

148. A method as in innovation 139 wherein the telephone is a cell phone.

149. A method as in innovation 139 wherein the preferential scoring of word candidates is performed by performing the speech recognition upon an acoustic representation of a second utterance of the desired word in which word candidates are favored which contain a sequence of one or more characters corresponding to the received filtering sequence.

150. A method as in innovation 149 wherein the preferential scoring of word candidates is performed by scoring word candidates against both the original and the second utterance of a desired word.

151. A method as in innovation 139 wherein the scoring of word candidates not only favors word candidates containing a sequence of one or more alphabetic characters corresponding to the filtering sequence, but also takes language model scores into account.

152. A method as in innovation 151 wherein the language models used in conjunction with such filtering sequences in the scoring of word candidates are context dependent language models.

153. A method of performing large vocabulary speech recognition comprising:
- receiving a key-press sequence of one or more telephone key-press signals, each of which indicates which of a plurality of keys has been selected by a user;
  
  decoding the key-press sequence by using the number of presses of a given key which occur within a given time of each other to select which of the multiple letters associated with the given key is the desired letter;
  
  storing the sequence of one or more letters decoded from said key-press sequence as an alphabetic filtering sequence;
  
  receiving an acoustic representation of a sound;
  
  performing speech recognition upon the acoustic representation which scores word candidates as a function of the match between the acoustic representation of the sound and acoustic models of words;
  
  wherein:
  
  the scoring of word candidates favors word candidates containing a sequence of one or more alphabetic characters corresponding to the letters of said alphabetic filtering sequence.
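The decoding step of claim 153 is conventional multi-tap text entry. The sketch below assumes presses arrive as hypothetical `(key, time_in_seconds)` pairs and uses an illustrative one-second `timeout` for "within a given time of each other"; the resulting letters would then serve as the alphabetic filtering sequence.

```python
MULTITAP = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
            "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}


def _letter(key, count):
    """Letter selected by pressing `key` `count` times (wrapping around)."""
    letters = MULTITAP[key]
    return letters[(count - 1) % len(letters)]


def decode_multitap(presses, timeout=1.0):
    """Decode (key, time_in_seconds) presses into a string of letters.

    Repeated presses of the same key, each within `timeout` seconds of the
    previous one, cycle through that key's letters; a different key or a
    longer pause starts a new letter.
    """
    out = []
    prev_key, prev_t, count = None, None, 0
    for key, t in presses:
        if key == prev_key and t - prev_t <= timeout:
            count += 1
        else:
            if prev_key is not None:
                out.append(_letter(prev_key, count))
            count = 1
        prev_key, prev_t = key, t
    if prev_key is not None:
        out.append(_letter(prev_key, count))
    return "".join(out)


presses = [("2", 0.0), ("2", 0.2), ("2", 0.4),  # three quick taps: "c"
           ("2", 2.0),                          # after a pause: "a"
           ("2", 3.5), ("2", 3.7)]              # after a pause, two taps: "b"
print(decode_multitap(presses))  # cab
```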

154. A method of performing large vocabulary speech recognition to input a sequence of one or more alphabetic characters comprising:
- pressing a sequence of one or more selected phone keys, each of which represents two or more letters;
  
  uttering a corresponding sequence of one or more letter identifying words;
  
  performing speech recognition upon the utterance of each of the letter identifying words, with the recognition of each such utterance favoring the recognition of a letter identifying word identifying one of the two or more letters represented by the utterance's associated phone key; and
  
  treating the sequence of one or more letters identified by the letter identifying word associated with each phone key press as alphabetic input from the user.

155. A method as in innovation 154 wherein:
- the method is used in conjunction with a large vocabulary recognition system; and
  
  a majority of the words which start with a given letter in the vocabulary of the large vocabulary recognition system can be used as a letter identifying word for the given letter.

156. A method as in innovation 154 wherein:
- the letter identifying word associated with each of a majority of letters belongs to a limited set of five or fewer letter identifying words which start with that letter; and
  
  the recognition of an utterance of a letter identifying word favors the recognition of one of the limited set of letter identifying words identifying one of the two or more letters represented by the utterance's associated phone key.
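Claims 154 and 156 pair each key press with a spoken letter identifying word, so the pressed key narrows the recognizer's choice to the few words naming that key's letters. The sketch below uses the ICAO/NATO spelling alphabet as a hypothetical limited set (one word per letter), and assumes the recognizer's output is a hypothetical mapping of word to score; it applies the restriction as a hard filter, the simplest form of favoring.

```python
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}

# Hypothetical limited set: one ICAO/NATO spelling word per letter.
NATO = ("alpha bravo charlie delta echo foxtrot golf hotel india juliett "
        "kilo lima mike november oscar papa quebec romeo sierra tango "
        "uniform victor whiskey xray yankee zulu").split()
LETTER_WORDS = {word[0]: [word] for word in NATO}


def recognize_letter(key, word_scores):
    """Pick the letter for one key press plus its spoken letter word.

    word_scores: hypothetical recognizer output mapping word -> score.
    Only words identifying a letter on the pressed key are considered.
    """
    allowed = {w: letter for letter in KEYPAD[key] for w in LETTER_WORDS[letter]}
    best = max((w for w in word_scores if w in allowed),
               key=word_scores.get, default=None)
    return allowed[best] if best is not None else None


# "delta" scores best acoustically but is not on key 2, so "bravo" wins.
print(recognize_letter("2", {"delta": -1.0, "bravo": -2.0, "charlie": -3.0}))  # b
```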

157. A method as in innovation 156 further including:
- responding to a key press signal by displaying in user-perceivable form a set of letter identifying words containing one or more words starting with each letter represented by the pressed key; and
  
  favoring the recognition of an utterance made after the display of the pressed key's associated letter identifying words as corresponding to one of said displayed words.

158. A method as in innovation 156 wherein:
- the method is performed on a telephone having a display; and
  
  the outputting of the subset of letter identifying words is performed by displaying such words on the telephone's display.

159. A method of performing large vocabulary speech recognition on a device having telephone keys, said method comprising:
- performing large vocabulary speech recognition upon one or more utterances to produce a corresponding output text containing one or more words which have been recognized by said speech recognition;
  
  receiving a sequence of one or more phone key-press signals and interpreting said sequence of presses as corresponding to a sequence of one or more alphabetic characters; and
  
  outputting said sequence of one or more alphabetic characters into said output text.

160. A method as in innovation 159 wherein the telephone is a cellphone.

161. A method as in innovation 159 wherein:
- the sequence of one or more key-press signals, by itself, is treated by the process as being ambiguous, in the sense that individual key-press signals each represent two or more letters; and
  
  information from sources other than such key presses are used to select which of the one or more letters associated with a key press in the sequence is to be interpreted as corresponding to each such key press.

162. A method as in innovation 161 wherein the information from sources other than such key presses includes language model information.

163. A method as in innovation 162 wherein the information from sources other than such key presses includes context dependent language model information.
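Claims 161 to 163 resolve ambiguous key sequences with language model information, much as predictive keypad text entry does. The sketch below assumes hypothetical unigram and bigram count tables, ranks every vocabulary word whose key sequence equals the one typed, and uses the bigram count for the previous word first, falling back to the unigram count to break ties; this is a crude illustrative ranking, not a proper smoothed language model.

```python
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def key_sequence(word):
    """Digits a user would press to type `word` (lowercase a-z only)."""
    return "".join(LETTER_TO_KEY[ch] for ch in word)


def disambiguate(keys, unigram, prev_word=None, bigram=None):
    """Rank vocabulary words whose key sequence equals `keys`.

    Ranking uses the hypothetical bigram count for (prev_word, word)
    first, then the unigram count to break ties.
    """
    bigram = bigram or {}
    matches = [w for w in unigram if key_sequence(w) == keys]
    return sorted(matches,
                  key=lambda w: (bigram.get((prev_word, w), 0), unigram[w]),
                  reverse=True)


unigram = {"good": 50, "home": 40, "gone": 30, "hood": 5, "cat": 20}
print(disambiguate("4663", unigram))
# ['good', 'home', 'gone', 'hood']
print(disambiguate("4663", unigram, "going", {("going", "home"): 7}))
# ['home', 'good', 'gone', 'hood']
```

All four words share the key sequence 4663, so the keys alone cannot choose among them; only the context-dependent count in the second call moves "home" to the top, which is the distinction claim 163 draws.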

164. A method as in innovation 159:
- wherein the sequence of one or more key-press signals, by itself, is treated by the process as being ambiguous, in the sense that individual key-press signals each represent two or more letters; and
  
  further including:
  
  outputting a plurality of the word candidates whose spellings correspond to the key-press signals in a user-perceivable form in a choice list;
  
  allowing a user to select one of the output candidates as the desired word; and
  
  responding to the user selection of one of the output candidates by selecting it as the recognized word for the recognition.

165. A method as in innovation 159 wherein the interpretation of the sequence of key presses includes decoding the key-press sequence by using the number of presses of a given key which occur within a given time of each other to select which of the multiple letters associated with the given key is the desired letter.

166. A method of speech recognition comprising:
- receiving an original utterance of one or more words;
  
  performing an original speech recognition upon the original utterance;
  
  producing a user perceivable output representing one or more sequences of one or more words selected by the recognition as most likely corresponding to the utterance;
  
  providing a user interface that allows a user to select to perform a re-utterance recognition upon a part of the original utterance corresponding to all or a selected part of the user perceivable output; and
  
  responding to a user selection to perform a re-utterance recognition upon all or a part of the original utterance by:
  
  treating a second utterance received in association with the selection as a re-utterance of the selected portion of the original utterance; and
  
  performing speech recognition upon the re-utterance to select one or more sequences of one or more words considered to most likely match the re-utterance based on the scoring of the one or more words against both the re-utterance and the selected portion of the original utterance.
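Scoring candidates "against both the re-utterance and the selected portion of the original utterance" can be illustrated by adding per-utterance log-likelihoods, which treats the two utterances as independent evidence for the same words. The sketch below assumes each utterance has already been scored into a hypothetical mapping of candidate string to log-likelihood, and gives candidates missing from one list an arbitrary `floor` score; a real recognizer would more likely rescore the audio directly.

```python
def rank_with_reutterance(orig_scores, reutt_scores, floor=-50.0):
    """Rank candidate word sequences against both utterances.

    Each argument maps a candidate (a string of words) to a log-likelihood
    score for that utterance; higher is better. A candidate missing from
    one mapping gets the hypothetical `floor` score for that utterance.
    """
    candidates = set(orig_scores) | set(reutt_scores)
    combined = {c: orig_scores.get(c, floor) + reutt_scores.get(c, floor)
                for c in candidates}
    return sorted(candidates, key=lambda c: combined[c], reverse=True)


orig = {"wreck a nice beach": -9.0, "recognize speech": -10.0}
reutt = {"recognize speech": -4.0, "wreck a nice beach": -8.0}
print(rank_with_reutterance(orig, reutt))
# ['recognize speech', 'wreck a nice beach']
```

Here the original utterance alone slightly prefers the wrong transcription (-9.0 versus -10.0), but the combined totals (-14.0 versus -17.0) reverse the ranking.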

167. A method as in innovation 166 wherein:
- the original recognition of the original utterance is by continuous speech recognition; and
  
  the re-utterance is recognized by discrete speech recognition.

168. A method as in innovation 167 wherein the number of utterances detected with a re-utterance recognized by discrete recognition is used to determine the number of words allowable in sequences of one or more words recognized for the original utterance after the re-utterance.

169. A method as in innovation 166 wherein both the original utterance and the re-utterance are recognized by discrete speech recognition.

170. A method as in innovation 166 wherein both the original utterance and the re-utterance are recognized by continuous speech recognition.

171. A method as in innovation 166 wherein the selection of a sequence of one or more words considered to most likely match both the re-utterance and the selected portion of the original utterance is used to update acoustic models with data from the selected portion of the original utterance.

172. A method as in innovation 166 wherein:
- the user interface allows a user to select one or more word filtering inputs, each indicating that the desired output has certain characteristics, to be used in conjunction with the re-utterance recognition; and
  
  the process of selecting one or more sequences as most likely matching both the re-utterance and the original utterance also uses the selected filtering inputs to favor the selection of any recognition candidates having the selected characteristics.

173. A method as in innovation 172 wherein the user interface allows a user to select alphabetic filtering inputs indicating that the desired output contains a word containing a sequence of one or more specified letters.

174. A computing device for performing large vocabulary speech recognition comprising microprocessor readable memory;
- a microphone or audio input for providing an electronic signal representing an utterance to be recognized;
  
  a speaker or audio output for enabling an electric representation of sound produced in said device to be transduced into a corresponding sound;
  
  programming recorded in the memory including;
  
  speech recognition programming including instructions for performing large vocabulary speech recognition that responds to the electronic representations of a sequence of one or more utterances received from the microphone or audio input by producing a text output corresponding to the one or more words recognized as corresponding to the utterances; and
  
  TTS programming for providing TTS output to said speaker or audio output saying one or more words of said text recognized by the speech recognition for the utterance;
  
  shared speech modeling data stored in the memory that are used by both said speech recognition programming to recognize words corresponding to spoken utterances and by said TTS programming to generate sounds corresponding to the speaking of a sequence of one or more words.

175. A computing device as in innovation 174 wherein said shared speech modeling data includes letter to sound rules.

176. A computing device as in innovation 174 wherein said shared speech modeling data includes a mapping between words and one or more phonetic spellings for each of at least several thousand vocabulary words.

177. A computing device as in innovation 176 wherein said mappings include an indication of the different phonetic spellings appropriate for certain words when they occur as different parts of speech.

178. A computing device as in innovation 177 wherein said shared speech modeling data includes language modeling information indicating which parts of speech for one or more words are more likely to occur in a given language context.
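Claims 176 to 178 describe one pronunciation table serving both recognition and TTS, with part-of-speech-dependent pronunciations. The sketch below is hypothetical: the lexicon entries use ARPAbet-style phones, and the part-of-speech bigram probabilities are placeholder values standing in for the language modeling information of claim 178.

```python
# Hypothetical shared lexicon: word -> {part of speech: phonetic spelling}.
LEXICON = {
    "record": {"NOUN": "R EH1 K ER0 D", "VERB": "R IH0 K AO1 R D"},
    "the": {"DET": "DH AH0"},
    "to": {"PART": "T UW1"},
}

# Placeholder P(part of speech | previous part of speech) values.
POS_BIGRAM = {
    ("DET", "NOUN"): 0.6, ("DET", "VERB"): 0.05,
    ("PART", "VERB"): 0.7, ("PART", "NOUN"): 0.1,
}


def pronounce(word, prev_pos):
    """TTS side: pick the pronunciation of the part of speech most likely after prev_pos."""
    options = LEXICON[word]
    best_pos = max(options, key=lambda pos: POS_BIGRAM.get((prev_pos, pos), 0.0))
    return best_pos, options[best_pos]


def recognizer_pronunciations(word):
    """Recognition side: the same table supplies every accepted pronunciation."""
    return sorted(set(LEXICON[word].values()))
```

With this data, "the record" is spoken with the noun stress and "to record" with the verb stress, while the recognizer accepts either form from the same table.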

179. A computing device as in innovation 174 wherein the device is a handheld device.

180. A computing device as in innovation 179 wherein the device is a cell phone.

181. A computing device for performing large vocabulary speech recognition comprising microprocessor readable memory;
- a microphone or audio input for providing an electronic signal representing an utterance to be recognized;
  
  a speaker or audio output for enabling an electric representation of sound produced in said device to be transduced into a corresponding sound;
  
  programming recorded in the memory including speech recognition programming including instructions for:
  
  performing large vocabulary speech recognition upon electronic representations of utterances received from the microphone or audio input to produce a text output;
  
  providing TTS output to said speaker or audio output saying one or more words of said text output;
  
  recognizing utterances which are voice commands as commands;
  
  providing TTS or recorded audio output to said speaker or audio output saying the name of a recognized command.

182. A computing device as in innovation 181 wherein the device is a handheld device.

183. A computing device as in innovation 182 wherein the device is a cell phone.

184. A computing device for performing large vocabulary speech recognition comprising microprocessor readable memory;
- a microphone or audio input for providing an electronic signal representing an utterance to be recognized;
  
  a speaker or audio output for enabling an electric representation of sound produced in said device to be transduced into a corresponding sound;
  
  programming recorded in the memory including speech recognition programming including instructions for performing large vocabulary speech recognition that responds to the electronic representations of each of a sequence of one or more utterances received from the microphone or audio input by:
  
  producing a text output corresponding to one or more words recognized as corresponding to the utterance; and
  
  then providing TTS output to said speaker or audio output saying one or more words of said text recognized by the speech recognition for the utterance.

185. A computing device as in innovation 184 wherein said speech recognition is discrete speech recognition and said TTS output says the text word which is recognized in response to each utterance.

186. A computing device as in innovation 184 wherein said speech recognition is continuous speech recognition and said TTS output says the one or more text words recognized in response to each utterance after the end of the utterance.

187. A computing device as in innovation 184 wherein the device is a handheld device.

188. A computing device as in innovation 187 wherein the device is a cell phone.

189. A computing device for performing large vocabulary speech recognition comprising microprocessor readable memory;
- a microphone or audio input for providing an electronic signal representing an utterance to be recognized;
  
  a speaker or audio output for enabling an electric representation of sound produced in said device to be transduced into a corresponding sound;
  
  programming recorded in the memory including speech recognition programming including instructions for:
  
  performing large vocabulary speech recognition upon an electronic representation of utterances received from the microphone or audio input to produce a text output;
  
  responding to text navigation commands by moving a cursor backward and forward in the one or more words of said text output;
  
  responding to each movement in response to one of said navigational commands by providing a TTS output to said speaker or audio output saying one or more words either starting or ending at the location of the cursor after said movement.

190. A computing device as in innovation 189 wherein said programming further includes instructions for responding to a selection expansion command by:
- recording the cursor location at the time the command is received as a selection start;
  
  starting a selection at the selection start; and
  
  entering a selection expansion mode in which the response to one of said navigational commands further includes causing the selection to extend from the selection start to the cursor location after the cursor movement made in response to said navigation command.

191. A computing device as in innovation 190 wherein said programming further includes instructions for responding to a play selection command by providing a TTS output to said speaker or audio output saying the one or more words that are currently in the selection.

192. A computing device as in innovation 189 wherein said saying of one or more words starts speaking words of said text starting at the current cursor position and continues speaking them until an end of a unit of text larger than a word is reached or until a user input is received to terminate such playback.
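The cursor behavior in claims 189 to 192 can be modeled as a small class. In this hypothetical sketch `speak` is any callable standing in for the TTS engine, the cursor moves by whole words, and claim 192's "unit of text larger than a word" is assumed to be a sentence ending in `.`, `?`, or `!`; termination of playback by user input is not modeled.

```python
class SpokenTextBuffer:
    """Word-level cursor over recognized text with spoken feedback.

    `speak` is any callable standing in for a TTS engine (hypothetical).
    """

    def __init__(self, words, speak):
        self.words = list(words)
        self.speak = speak
        self.cursor = 0               # index of the word under the cursor
        self.selection_start = None   # set while selection expansion mode is on

    def move(self, delta):
        """Navigation command: move by `delta` words, then say the word now at the cursor."""
        self.cursor = max(0, min(len(self.words) - 1, self.cursor + delta))
        self.speak(self.words[self.cursor])

    def start_selection(self):
        """Selection expansion command: anchor the selection at the current cursor."""
        self.selection_start = self.cursor

    def selection(self):
        """Words from the selection start to the cursor, inclusive (empty if none)."""
        if self.selection_start is None:
            return []
        lo, hi = sorted((self.selection_start, self.cursor))
        return self.words[lo:hi + 1]

    def play_selection(self):
        """Play selection command: say the words currently selected."""
        self.speak(" ".join(self.selection()))

    def play_to_sentence_end(self):
        """Say words from the cursor through the end of the current sentence."""
        spoken = []
        for word in self.words[self.cursor:]:
            spoken.append(word)
            if word.endswith((".", "?", "!")):
                break
        self.speak(" ".join(spoken))
```

Because the selection is computed from the anchor and the live cursor, each navigation command in expansion mode extends or shrinks it without extra bookkeeping.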

193. A computing device as in innovation 189 wherein the device is a handheld device.

194. A computing device as in innovation 193 wherein the device is a cell phone.

195. A computing device for performing large vocabulary speech recognition comprising microprocessor readable memory;
- a microphone or audio input for providing an electronic signal representing an utterance to be recognized;
  
  a speaker or audio output for enabling an electric representation of sound produced in said device to be transduced into a corresponding sound;
  
  programming recorded in the memory including speech recognition programming including instructions for:
  
  performing large vocabulary speech recognition upon electronic representations of uttered sound received from the microphone or audio input to produce a choice list of recognition candidates, each comprised of a sequence of one or more words, selected by the recognition as scoring best against said uttered sound;
  
  providing spoken output to said speaker or audio output saying the one or more words of one of the recognition candidates in the choice list.

196. A computing device as in innovation 195 wherein said programming includes instructions for:
- responding to choice navigation commands by moving which of the recognition candidates in the list of choices is currently selected; and
  
  responding to each movement in response to one of said navigational commands by providing spoken output saying the one or more words in the currently selected recognition candidate.

197. A computing device as in innovation 195 wherein:
- said spoken output says the words of a plurality of recognition candidates in said list and contains a spoken indication of a choice input signal associated with each of said plurality of recognition candidates; and
  
  said programming further includes instructions for responding to receipt of one of said choice input signals by selecting the associated recognition candidate as the output for said uttered sound.

198. A computing device as in innovation 197 wherein:
- said device has a telephone keypad; and
  
  said choice input signals include phone key numbers; and
  
  said responding to receipt of one of said choice input signals includes responding to the pressing of numbered phone keys as said choice input signals.

199. A computing device as in innovation 197 wherein said spoken output says the best scoring recognition candidate first.
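Claims 197 to 199 pair each spoken choice with a phone key. A minimal sketch, assuming candidates arrive as `(words, score)` pairs and that keys 1 through 9 index the ranked list; `speak` stands in for the spoken output and the function names are hypothetical.

```python
def announce_choices(choices, speak):
    """Say candidates best-first, each prefixed by the phone key that selects it.

    choices: list of (words, score) pairs, higher score is better. Returns a
    {key: words} map for the numbered keys 1-9 (an assumed key assignment).
    """
    ranked = sorted(choices, key=lambda c: c[1], reverse=True)[:9]
    key_map = {}
    for number, (words, _score) in enumerate(ranked, start=1):
        speak(f"{number}: {words}")
        key_map[str(number)] = words
    return key_map


def on_key_press(key, key_map):
    """Return the candidate chosen by a numbered key, or None if the key is unassigned."""
    return key_map.get(key)


spoken = []
keys = announce_choices([("their", -3.1), ("there", -2.4), ("they're", -4.0)], spoken.append)
```

After the announcement "1: there", "2: their", "3: they're", pressing key 2 selects "their" as the output.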

200. A computing device as in innovation 195 wherein said programming includes instructions for responding to the receipt of filtering input by:
- producing a filtered choice list of recognition candidates, each comprised of a sequence of one or more words that agree with said filtering input and which have been selected by the recognition as scoring best against said uttered sound; and
  
  providing spoken output to said speaker or audio output saying the one or more words of one of the recognition candidates in the filtered choice list.

201. A computing device as in innovation 200 wherein said programming further includes instructions for providing spoken output saying the current value of the filter.

202. A computing device as in innovation 201 wherein the filtering input is a sequence of letters and the spoken output says the letters in the filter sequence.
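For claims 200 to 202, a hypothetical sketch of letter filtering with spoken readback of the filter. Treating the filter as a required spelling prefix is an assumption, and `speak` again stands in for TTS output.

```python
def filtered_choices(scored, letter_filter, n_best=5):
    """Candidates whose spelling starts with the filter letters, best score first."""
    prefix = letter_filter.lower()
    kept = [w for w in scored if w.lower().startswith(prefix)]
    return sorted(kept, key=lambda w: scored[w], reverse=True)[:n_best]


def say_filter(letter_filter, speak):
    """Say the current filter value letter by letter."""
    speak(" ".join(letter_filter.upper()))


def say_choice(choices, index, speak):
    """Say one candidate of the filtered choice list."""
    speak(choices[index])


spoken = []
scores = {"apple": -5.0, "apply": -4.2, "maple": -3.0, "ample": -6.1}
choices = filtered_choices(scores, "ap")
say_filter("ap", spoken.append)
say_choice(choices, 0, spoken.append)
```

"maple" scores best overall but is dropped because it does not agree with the filter "ap".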

203. A computing device as in innovation 195 wherein the spoken output includes the spelling of one or more choices.

204. A computing device as in innovation 195 wherein the device is a handheld device.

205. A computing device as in innovation 204 wherein the device is a cell phone.

206. A method of word recognition comprising:
- receiving a handwritten representation of all or a part of a given sequence of one or more words to be recognized;
  
  receiving a spoken representation of said sequence of one or more words;
  
  performing handwriting recognition upon the handwritten representation and speech recognition upon the spoken representation, and selecting one or more best scoring recognition candidates, each comprised of a sequence of one or more words, based on the scoring of recognition candidates against both the handwritten and spoken representations.

207. A method of word recognition comprising:
- receiving a spoken representation of a given sequence of one or more words to be recognized;
  
  receiving a filtering input consisting of handwriting or character drawing input;
  
  using handwriting or character recognition, respectively, to define a filter representing one or more sequences of characters selected by said recognition as most likely corresponding to said filtering input; and
  
  using a combination of said filter and speech recognition performed on said spoken representation to select one or more recognition candidates, each consisting of a sequence of one or more words, selected as a function of the closeness of their match against the spoken representation and whether or not they match one of the one or more character sequences associated with said filter.

208. A method as in innovation 207 wherein said filtering input consists of handwriting.

209. A method as in innovation 208 wherein:
- said filter represents a plurality of sequences of characters; and
  
  said selection of recognition candidates selects a plurality of best scoring recognition candidates, different ones of which can match different sequences of characters represented by said filter.

210. A method as in innovation 209 wherein said plurality of character sequences represented by one filter and used in said selection of recognition candidates can be of different character length.
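Claims 209 and 210 allow one handwriting filter to carry several alternative character sequences of different lengths. The sketch below assumes the handwriting recognizer returns those sequences with log-likelihood scores, treats each as a spelling prefix, and adds the best matching filter score to the speech score; this combination rule is an illustrative assumption rather than the method specified in the source.

```python
def combine_with_handwriting_filter(speech_scores, filter_sequences, n_best=3):
    """Select speech candidates that match any character sequence in the filter.

    speech_scores: {candidate words: speech log-likelihood}
    filter_sequences: {character sequence: handwriting log-likelihood}; the
        sequences may differ in length.
    Candidates matching no sequence are excluded.
    """
    results = {}
    for cand, s_score in speech_scores.items():
        matching = [h for seq, h in filter_sequences.items()
                    if cand.lower().startswith(seq.lower())]
        if matching:
            results[cand] = s_score + max(matching)
    return sorted(results, key=results.get, reverse=True)[:n_best]


# Handwritten "cl" might also be read as the single letter "d".
hw_filter = {"cl": -0.5, "d": -1.2}
speech = {"close": -10.0, "dose": -9.5, "clothes": -10.4, "those": -8.0}
best = combine_with_handwriting_filter(speech, hw_filter)
```

Here "close" and "dose" match different filter sequences of different lengths, and "those" is excluded despite the best speech score.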

211. A method as in innovation 208 wherein:
- said filter represents only one sequence of characters, which is used for filtering; and
  
  said selection of recognition candidates selects a plurality of best scoring recognition candidates, all of which match said one character sequence.

212. A method as in innovation 207 wherein said filtering input consists of one or more separate character drawings.

213. A method as in innovation 212 wherein:
- said filter represents a plurality of sequences of characters; and
  
  said selection of recognition candidates selects a plurality of best scoring recognition candidates, different ones of which can match different sequences of characters represented by said filter.

214. A method as in innovation 212 wherein:
- said filter represents only one sequence of characters, which is used for filtering; and
  
  said selection of recognition candidates selects a plurality of best scoring recognition candidates, all of which match said one character sequence.

215. A method as in innovation 207:
- further including:
  
  receiving a spoken representation of a second sequence of one or more words to be recognized;
  
  using speech recognition to output a corresponding sequence of one or more words into a sequential body of text;
  
  responding to user input with a pointing device that touches a sequence of one or more words in said body of text by selecting the touched sequence as a sequence to be corrected;
  
  treating the portion of the spoken representation of said second sequence of words as said given sequence of words; and
  
  then receiving said filtering input;
  
  using said handwriting or character recognition to define said filter; and
  
  using said combination of the filter and speech recognition to select one or more recognition candidates.

216. A method of word recognition comprising:
- receiving a handwritten representation of a given sequence of one or more words to be recognized;
  
  receiving a filtering input consisting of one or more utterances representing a sequence of one or more letter identifying words;
  
  using speech recognition to define a filter representing one or more sequences of characters selected by said recognition as most likely corresponding to said filtering input; and
  
  using a combination of said filter and handwriting recognition performed on said handwritten representation to select one or more recognition candidates, each consisting of a sequence of one or more words, selected as a function of the closeness of their match against the handwritten representation and whether or not they match one of the one or more character sequences associated with said filter.

217. A method as in innovation 216 wherein:
- the filtering input is a sequence of continuously spoken letter identifying words; and
  
  the speech recognition is continuous speech recognition.

218. A method as in innovation 216 wherein:
- the filtering input is a sequence of discretely spoken letter identifying words; and
  
  the speech recognition is discrete speech recognition.

219. A method as in innovation 216 wherein:
- said filter represents a plurality of sequences of characters; and
  
  said selection of recognition candidates selects a plurality of best scoring recognition candidates, different ones of which can match different sequences of characters represented by said filter.

220. A method as in innovation 219 wherein said plurality of character sequences represented by one filter and used in said selection of recognition candidates can be of different character length.

221. A method as in innovation 220 wherein:
- the filtering input is a sequence of continuously spoken letter names; and
  
  the speech recognition is continuous speech recognition.

222. A method as in innovation 216 wherein:
- said filter represents only one sequence of characters, which is used for filtering; and
  
  said selection of recognition candidates selects a plurality of best scoring recognition candidates, all of which match said one character sequence.

223. A method as in innovation 216 further including providing a user interface which enables a user to select whether the filtering input is recognized with discrete or continuous recognition.

224. A method as in innovation 216 further including providing a user interface which enables a user to select whether the filtering input is recognized in a mode which favors the recognition of letter names or of non-letter name letter identifying words.

225. A method of word recognition comprising:
- receiving a handwritten representation of a given sequence of one or more words to be recognized;
  
  performing handwriting recognition upon said handwritten representation to produce one or more best scoring recognition candidates, each of which contains one or more words selected as having a likelihood of corresponding to the one or more words of said handwritten representation;
  
  then receiving a spoken representation of a given sequence of one or more words to be recognized;
  
  performing speech recognition upon said spoken representation to produce one or more best scoring recognition candidates, each of which contains one or more words selected as having a likelihood of corresponding to the one or more words of said spoken representation;
  
  using information in one of said speech recognition's best scoring candidates to correct the prior recognition of said handwritten representation.

226. A method as in innovation 225 wherein said using of speech recognition information to correct handwriting recognition includes replacing a best scoring recognition candidate produced by the handwriting recognition with a best scoring recognition candidate produced by the speech recognition.

227. A method as in innovation 225 wherein said using of speech recognition information to correct handwriting recognition includes interpreting one of the recognition candidates produced by the speech recognition as a command, and performing said command to correct a best scoring recognition candidate produced by the handwriting recognition.

228. A hand held computing device for performing large vocabulary speech recognition comprising:
- one or more processing devices;
  
  memory readable by the processing devices;
  
  a microphone or audio input for providing an electronic signal representing a sound input;
  
  a speaker or audio output for enabling an electric representation of sound produced in said device to be transduced into a corresponding sound;
  
  programming recorded in one or more of the memory devices including:
  
  speech recognition programming for performing large vocabulary speech recognition that responds to the electronic representations of the sound of a sequence of one or more utterances received from the microphone or microphone input by producing a text output corresponding to the one or more words recognized as corresponding to the utterances; and
  
  audio recording programming for recording an electronically readable representation of said sound in one or more of said memory devices; and
  
  audio playback programming for playing back said recorded sound representation and providing a corresponding audio signal to said speaker or audio output;
  
  wherein the device's programming has instructions for enabling a user to select between two of the following three possible modes of recording sound input as it is received:
  
  a first mode that places text output in response to speech recognition of said sound input into a user navigable document at a current cursor location, without a representation of a recording of said sound input;
  
  a second mode that places a representation of a recording of said sound input into said user navigable document at said cursor location without text produced in response to speech recognition of said sound input; and
  
  a third mode that places text output in response to speech recognition of said sound input into the user navigable document at the current cursor location, with the words of the text output themselves representing the portions of a recording of the sound input from which each such word has been recognized; and
  
  wherein the audio playback programming includes instructions for enabling a user to select to play recorded sound represented by the sound representations placed in the document by the second and third recording modes by having the cursor located on such representations when in a playback mode.

229. A device as in innovation 228 wherein the device's instructions enable a user to switch back and forth between the second mode and either the first or third mode with less than one second's delay for each switch.

230. A device as in innovation 228 wherein the device's programming further includes instructions for enabling a user to select a portion of audio recorded without corresponding recognition and to have speech recognition performed on the selected portion of audio recording so as to produce a text output corresponding to the selected audio.

231. A device as in innovation 228 wherein the device's programming further includes instructions for enabling a user to select a sub-portion of text output by speech recognition in the third mode that has recorded sound associated with its words and to have the recorded sound associated with the selected text removed.

232. A device as in innovation 228 wherein the device's programming further includes instructions for enabling a user to select a sub-portion of text output by speech recognition in the third mode that has recorded sound associated with its words and to have the selected text removed and to replace its location in the document with the type of representation of the recorded sound produced by recording in the second mode.

233. A device as in innovation 228 wherein the representations of sound placed in the document by the second recording mode are audiographic representations that vary in length as a function of the duration of the respective portions of recorded sound they represent.

234. A computing device as in innovation 228 wherein the device is a handheld device.

235. A computing device as in innovation 234 wherein the device is a cell phone.

236. A hand held computing device for performing large vocabulary speech recognition comprising:
- one or more processing devices;
  
  memory readable by the processing devices;
  
  a microphone or audio input for providing an electronic signal representing a sound input;
  
  a speaker or audio output for enabling an electric representation of sound produced in said device to be transduced into a corresponding sound;
  
  programming recorded in one or more of the memory devices including:
  
  speech recognition programming for performing large vocabulary speech recognition that responds to the electronic representations of the sound of a sequence of one or more utterances received from the microphone or audio input by producing a text output corresponding to the one or more words recognized as corresponding to the utterances; and
  
  audio recording programming for recording an electronically readable representation of said sound in one or more of said memory devices; and
  
  audio playback programming for playing back said recorded sound representation and providing a corresponding audio signal to said speaker or audio output;
  
  wherein the device's programming further includes instructions for enabling a user to select a portion of audio recorded without corresponding recognition and to have speech recognition performed on the selected portion of audio recording so as to produce a text output corresponding to the selected audio.

237. A hand held computing device for performing large vocabulary speech recognition comprising:
- one or more processing devices;
  
  memory readable by the processing devices;
  
  a microphone or audio input for providing an electronic signal representing a sound input;
  
  a speaker or audio output for enabling an electric representation of sound produced in said device to be transduced into a corresponding sound;
  
  programming recorded in one or more of the memory devices including:
  
  speech recognition programming for performing large vocabulary speech recognition that responds to the electronic representations of the sound of a sequence of one or more utterances received from the microphone or audio input by producing a text output corresponding to the one or more words recognized as corresponding to the utterances; and
  
  audio recording programming for recording an electronically readable representation of said sound in one or more of said memory devices; and
  
  audio playback programming for playing back said recorded sound representation and providing a corresponding audio signal to said speaker or audio output;
  
  wherein said device's programming further includes instructions for:
  
  enabling a user to associate recorded portions of text output by said speech recognition with portions of the recorded sound representation that have not previously been labeled by voice;
  
  enabling a user to select to cause text output by said speech recognition to be used as a text search string; and
  
  performing a search for recorded text output that matches the search string;
  
  whereby the user can select to find a portion of recorded sound representation by searching for its associated recorded text.
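A minimal sketch of claim 237's search, assuming each recognized label is stored with the start and end times (in seconds) of the recording portion it names; the `AudioLabel` type and case-insensitive substring matching are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class AudioLabel:
    text: str       # text output by speech recognition, used as the label
    start_s: float  # start of the labeled portion of the recording, in seconds
    end_s: float    # end of the labeled portion, in seconds


def find_recordings(labels, search_string):
    """Return (start, end) spans whose label text contains the search string."""
    needle = search_string.lower()
    return [(lab.start_s, lab.end_s) for lab in labels if needle in lab.text.lower()]


labels = [
    AudioLabel("meeting with Anna", 0.0, 95.0),
    AudioLabel("grocery list", 95.0, 130.5),
    AudioLabel("Anna's phone number", 130.5, 142.0),
]
hits = find_recordings(labels, "anna")
```

The search string itself would come from recognizing a spoken query, so a user can say a word and jump to the recorded sound that was labeled with it.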

238. A computing device for performing large vocabulary speech recognition comprising:
- one or more processing devices;
  
  memory readable by the processing devices;
  
  a microphone or audio input for providing an electronic signal representing a sound input;
  
  a speaker or audio output for enabling an electric representation of sound produced in said device to be transduced into a corresponding sound;
  
  programming recorded in one or more of the memory devices including:
  
  speech recognition programming for performing large vocabulary speech recognition that responds to the electronic representations of the sound of a sequence of one or more utterances received from the microphone or audio input by producing a text output corresponding to the one or more words recognized as corresponding to the utterances; and
  
  audio recording programming for recording an electronically readable representation of said sound in one or more of said memory devices;
  
  audio playback programming for playing back said recorded sound representation and providing a corresponding audio signal to said speaker or audio output; and
  
  instructions for switching back and forth between said audio playback and said speech recognition with one user input causing each such switch, with successive audio playbacks starting slightly before the end of the prior playback.

239. A computing device as in innovation 238 wherein said instructions for switching back and forth between said audio playback and said speech recognition make each such switch in response to a user selection of the same input device.
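Claims 238 and 239 describe toggling between playback and recognition with a single input, each playback restarting slightly before where the last one stopped. A hypothetical state-machine sketch; the 1.5 second backup is an assumed value, not one given in the source.

```python
class PlaybackDictationToggle:
    """Toggle between audio playback and speech recognition with one input.

    Each new playback resumes `backup_s` seconds before where the previous
    playback stopped (1.5 s by default, an assumed value).
    """

    def __init__(self, backup_s=1.5):
        self.backup_s = backup_s
        self.mode = "recognize"
        self.resume_at = 0.0  # where the next playback should start, in seconds

    def toggle(self, position_s=0.0):
        """Handle the single switch input.

        position_s: current playback position in seconds (used only when stopping playback).
        """
        if self.mode == "recognize":
            self.mode = "playback"
            return ("play_from", self.resume_at)
        self.mode = "recognize"
        self.resume_at = max(0.0, position_s - self.backup_s)  # back up slightly
        return ("recognize", None)
```

Starting each playback a little early gives the listener context for the words heard just before dictation resumed.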

240. A computing device that functions as a cell phone comprising:
- a user perceivable output device;
  
  a set of phone keys including at least a standard twelve key phone key pad;
  
  one or more processing devices;
  
  memory readable by the processing devices;
  
  a microphone or audio input from which said device can receive electronic representations of sound;
  
  a speaker or audio output for enabling an electric representation of sound produced in said device to be transduced into a corresponding sound;
  
  transmitting and receiving circuitry;
  
  programming recorded in the memory including:
  
  telephone programming having instructions for performing telephone functions including making and receiving calls; and
  
  speech recognition programming for performing large vocabulary speech recognition that responds to the electronic representations of the sound of a sequence of one or more utterances received from the microphone or audio input by producing a text output corresponding to the one or more words recognized as corresponding to the utterances; and
  
  audio recording programming for recording an electronically readable representation of said sound in one or more of said memory devices;
  
  audio playback programming for playing back said recorded sound representation and providing a corresponding audio signal to said speaker or audio output.

241. A computing device as in innovation 240 wherein said audio playback programming includes instructions for:
- enabling a user to select a sub-portion of said recorded sound representation; and
  
  enabling a user to select to play a selected sub-portion of said sound representation to the other side of a cellular telephone call.

242. A computing device as in innovation 240 wherein said recording programming includes instructions for:
- enabling a user to select to record an electronically readable representation of one or both sides of a cellular phone conversation.

243. A computing device as in innovation 240 wherein the device's programming further includes instructions for enabling a user to associate recorded portions of text output by said speech recognition with portions of the recorded sound representation that have not previously been labeled by voice.

244. A computing device as in innovation 243 wherein the device's programming further includes instructions for:
- enabling a user to select to cause text output by said speech recognition to be used as a text search string; and
  
  performing a search for recorded text output corresponding to said search string;
  
  whereby said user can select to find a portion of recorded sound representation by searching for its associated recorded text.
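The text-search lookup above can be illustrated with a minimal, hypothetical sketch. It assumes each recognized word was stored together with the start and end time (in seconds) of the recorded audio it came from; the `AlignedWord` type and `find_audio_for_text` function are invented names, not part of the patent, and whole-word, case-insensitive matching is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AlignedWord:
    """Hypothetical record: a recognized word and the audio span it was aligned to."""
    text: str
    start: float  # seconds into the recording
    end: float    # seconds into the recording


def find_audio_for_text(transcript: List[AlignedWord], query: str) -> List[Tuple[float, float]]:
    """Return (start, end) audio spans whose recognized words match the query phrase.

    Assumption: matching compares whole words case-insensitively; a real
    system might use fuzzier matching.
    """
    terms = query.lower().split()
    if not terms:
        return []
    words = [w.text.lower() for w in transcript]
    spans = []
    # Slide a window the length of the query over the recognized words.
    for i in range(len(words) - len(terms) + 1):
        if words[i:i + len(terms)] == terms:
            # The matching audio runs from the first word's start to the last word's end.
            spans.append((transcript[i].start, transcript[i + len(terms) - 1].end))
    return spans
```

For example, with recognized words "call" (0.0 to 0.4 s), "mom" (0.4 to 0.9 s), "at" (0.9 to 1.1 s) and "noon" (1.1 to 1.6 s), searching for "at noon" returns the single span (0.9, 1.6), which could then be handed to the playback programming.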

245. A computing device as in innovation 240 wherein the device's programming further includes instructions for enabling a user to select a sub-portion of said recorded sound representation which had not previously been recognized and to have said large vocabulary speech recognition performed upon said selected sub-portion.

246. A computing device as in innovation 245 wherein:
- said speech recognition programming includes instructions for performing speech recognition at different levels of quality, with the higher quality recognition taking more time to recognize a given length of sound; and
  
  said instructions for enabling a user to select to have speech recognition performed on a selected sub-portion of recorded sound include instructions for enabling the selected recorded sound to be recognized at said higher quality.

247. A computing device as in innovation 245 wherein said speech recognition programming includes instructions for:
- marking the time alignment between individual recognized words in text output by said speech recognition and the portions of the recorded sound associated with each recognized word in said text; and
  
  enabling a user to select a sequence of one or more words and to have the recorded sound associated with those words played back.
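The time alignment described above lets a selected run of words be mapped back to the audio that produced it. As a rough sketch, assuming the recording is held as a flat sequence of samples at a known sample rate and the alignment is a hypothetical list of `(word, start_seconds, end_seconds)` tuples, the samples to play for an inclusive range of selected words can be sliced out as follows; `samples_for_selection` is an illustrative name only.

```python
from typing import List, Sequence, Tuple

# Hypothetical alignment format: one (word, start_seconds, end_seconds) tuple per recognized word.
Alignment = List[Tuple[str, float, float]]


def samples_for_selection(samples: Sequence[float], sample_rate: int,
                          alignment: Alignment, first: int, last: int) -> Sequence[float]:
    """Return the recorded samples covering words first..last (inclusive)."""
    if not (0 <= first <= last < len(alignment)):
        raise IndexError("word selection out of range")
    # Audio for the selection starts where the first word starts and ends where the last word ends.
    start_sec = alignment[first][1]
    end_sec = alignment[last][2]
    # Convert seconds to sample indices, clamping the end to the recording length.
    start_idx = int(round(start_sec * sample_rate))
    end_idx = min(len(samples), int(round(end_sec * sample_rate)))
    return samples[start_idx:end_idx]
```

Keeping the alignment as times rather than sample indices means the same word list can be reused if the audio is resampled; only the conversion step depends on `sample_rate`.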

248. A computing device as in innovation 240 wherein the device's programming further includes instructions for switching back and forth between audio playback and speech recognition, with successive audio playbacks starting slightly before the end of the prior playback.
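One way to picture the overlapping playback described above is as a schedule of playback windows, each starting a short interval before the previous one ended, with speech recognition taking place between windows. The sketch below assumes fixed-length chunks; `playback_segments`, `chunk`, and `backup` are hypothetical names, and any values passed to it are illustrative rather than taken from the source.

```python
from typing import List, Tuple


def playback_segments(total: float, chunk: float, backup: float) -> List[Tuple[float, float]]:
    """Split a recording of `total` seconds into successive playback windows.

    Each window after the first starts `backup` seconds before the previous
    window ended, so the listener re-hears a little context after each pause
    for recognition. Fixed-length chunks are an assumption of this sketch.
    """
    if chunk <= 0 or backup < 0 or backup >= chunk:
        raise ValueError("need chunk > 0 and 0 <= backup < chunk")
    segments = []
    start = 0.0
    while start < total:
        end = min(start + chunk, total)
        segments.append((start, end))
        if end >= total:
            break
        # Back up slightly before the end of this playback; since backup < chunk, start always advances.
        start = end - backup
    return segments
```

For example, `playback_segments(10.0, 4.0, 1.0)` yields windows starting at 0, 3, and 6 seconds, each overlapping the previous one by one second.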


Current Assignee
Daniel L. Roth, David F. Johnston, Jordan R. Cohen, Manfred G. Grabherr
Original Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Inventors
Roth, Daniel L., Johnston, David F., Cohen, Jordan R., Grabherr, Manfred G.

Granted Patent

US 7,225,130 B2
US Class Current

704/251
CPC Class Codes

G10L 15/19 Grammatical context, e.g. d...

G10L 15/22 Procedures used during a sp...
