Methods, systems, and programming for performing speech recognition

US 7,225,130 B2
Filed: 09/06/2002
Issued: 05/29/2007
Est. Priority Date: 09/05/2001
Status: Active Grant

Abstract

The present invention relates to: speech recognition using selectable recognition modes; using choice lists in large-vocabulary speech recognition; enabling users to select word transformations; speech recognition that automatically turns recognition off in one or more specified ways; phone key control of large-vocabulary speech recognition; speech recognition using phone key alphabetic filtering and spelling; speech recognition that enables a user to perform re-utterance recognition; the combination of speech recognition and text-to-speech (TTS) generation; the combination of speech recognition with handwriting and/or character recognition; and the combination of large-vocabulary speech recognition with audio recording and playback.

18 Claims

1. A computing system for performing speech recognition comprising:
- one or more memory devices for storing information, including programming information;
  
  one or more processors for processing information in response to said programming information;
  
  one or more input devices for receiving inputs from a user that can be supplied to one or more of said processors;
  
  wherein said programming information includes programming, including both speech recognition programming and programming external to said speech recognition programming, for causing said computing system, under control of said one or more processors, to perform the following functions;
  
  using said speech recognition programming for;
  
  providing a user interface which allows a user to select, using said one or more input devices, between generating a first and a second user input;
  
  responding to the generation of the first user input by performing large vocabulary speech recognition on one or more utterances in a prior language context dependent mode, which recognizes at least the first word of an utterance depending in part on a language model context created by a previously recognized word from the previous utterance, if any; and
  
  responding to the generation of the second user input by performing large vocabulary speech recognition on one or more utterances in a prior language context independent mode, which recognizes at least the first word of an utterance substantially independently of any language model context created by a previously recognized word from the previous utterance, if any;
  
  wherein;
  
  as words are recognized by said speech recognition programming in both of said recognition modes such words are output to said programming external to said speech recognition programming for use by said external programming; and
  
  the response by said speech recognition programming to said first and second inputs by switching recognition modes is independent of the state of said external programming.
  - 2. A computer system as in claim 1 wherein:
    - the one or more input devices include a first button and a second button, where said buttons can be either hardware or software buttons;
      
      the first user input is generated by pressing the first button; and
      
      the second user input is generated by pressing the second button.
  - 3. A computer system as in claim 1 wherein the prior language context independent mode uses language context probabilities within an utterance, by causing the recognition of a word in a given utterance to depend on the identity of the one or more words, if any, recognized before it in said given utterance.
  - 4. A computer system as in claim 1 wherein said functions are performed by a software input panel in Microsoft Windows CE.
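
The difference between the two recognition modes of claim 1 can be sketched with a toy bigram language model. This is an illustration only, not the patent's implementation: the `BIGRAM` table, the `START` token, the fallback probability, and the function `pick_first_word` are all hypothetical, and the probabilities are invented.

```python
START = "<s>"  # hypothetical start-of-utterance token (assumption)

# Toy bigram probabilities P(word | previous word); all values are invented.
BIGRAM = {
    (START, "to"): 0.2,
    (START, "two"): 0.5,
    ("going", "to"): 0.8,
    ("going", "two"): 0.05,
}


def pick_first_word(acoustic_scores, previous_word, use_prior_context):
    """Choose the first word of a new utterance.

    acoustic_scores: dict mapping candidate word -> toy acoustic score.
    previous_word: last word recognized in the previous utterance, or None.
    use_prior_context: True for the prior-context-dependent mode,
        False for the prior-context-independent mode.
    """
    if use_prior_context and previous_word is not None:
        context = previous_word  # language model sees the previous utterance
    else:
        context = START          # language model starts from a clean slate

    def combined(word):
        # Unseen bigrams get a small fallback probability (assumption).
        return acoustic_scores[word] * BIGRAM.get((context, word), 0.01)

    return max(acoustic_scores, key=combined)


scores = {"to": 0.5, "two": 0.5}  # acoustically ambiguous candidates
dependent = pick_first_word(scores, "going", use_prior_context=True)
independent = pick_first_word(scores, "going", use_prior_context=False)
```

With the same acoustic evidence, the dependent mode is pulled toward the word that fits the previous utterance, while the independent mode ignores it. Under claim 3, the independent mode could still apply such probabilities to later words within the same utterance.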

5. A computing system for performing speech recognition comprising:
- one or more memory devices for storing information, including programming information;
  
  one or more processors for processing information in response to said programming information;
  
  one or more input devices for receiving inputs from a user that can be supplied to one or more of said processors;
  
  wherein said programming information includes programming for causing said computing system, under control of said one or more processors, to perform the following functions;
  
  providing a user interface which allows a user to select, using said one or more input devices, between generating a first and a second user input;
  
  responding to the generation of the first user input by selecting a continuous speech recognition mode which performs continuous speech recognition on speech sounds using a given vocabulary;
  
  responding to the generation of the second user input by selecting a discrete recognition mode which performs discrete recognition on speech sounds using substantially the same given vocabulary; and
  
  responding to speech sounds by performing recognition upon them using the currently selected speech recognition mode;
  
  wherein;
  
  the user can switch between the use of continuous and discrete recognition by selecting one of said user inputs;
  
  the one or more input devices include a first button and a second button;
  
  the first user input is generated by pressing the first button; and
  
  the second user input is generated by pressing the second button;
  
  touching the first or second button causes its respective recognition mode to start from substantially the start of the touching of such a button and to terminate by the next detection of an end of utterance;
  
  the discrete recognition is limited to the recognition of the one or more vocabulary word candidates with the best scoring match against the utterance whose end is detected after the touching of said button; and
  
  the continuous recognition mode is not so limited;
  
  so that said discrete recognition mode is limited to outputting only one single vocabulary word as the best scoring recognition candidate for the recognition of the given utterance and the continuous recognition mode can output a sequence of multiple words for the recognition of the given utterance.
  - 6. A computer system as in claim 5 wherein the given vocabulary is a large vocabulary.
  - 7. A computer system as in claim 5 wherein the given vocabulary is an alphabetic input vocabulary.
  - 8. A computer system as in claim 5 wherein:
    - said user interface allows a user to select between generating a third and a fourth input independently from the selection of the first and second input; and
      
      said method further includes responding to said third and fourth inputs, respectively, by selecting as said given vocabulary a first vocabulary or a second vocabulary;
      
      whereby the user can separately switch between recognition vocabularies and between the discrete and continuous recognition modes.
  - 9. A computer system as in claim 8 wherein said first and second vocabulary are a large vocabulary of words and an alphabetic input vocabulary, respectively.
  - 10. A computer system as in claim 8 wherein said first and second vocabulary are two different alphabetic entry vocabularies that contain different letter identifying words for individual letters of the alphabet.
  - 11. A computer system as in claim 5 wherein acoustic models used to represent words in the discrete recognition mode are different than the acoustic models used to represent the same words in the continuous recognition mode.
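
The output constraint that separates the discrete and continuous modes in claim 5 can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented method: the recognizer is assumed to have already produced scored hypotheses for one utterance, and the function `recognize`, the mode strings, and the scores are hypothetical.

```python
def recognize(hypotheses, mode):
    """Return the best word sequence for one utterance.

    hypotheses: list of (word_tuple, score) pairs for the utterance.
    mode: "continuous" or "discrete" (hypothetical mode names).
    """
    if mode not in ("continuous", "discrete"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "discrete":
        # Discrete mode only considers single-word hypotheses, so its
        # output is at most one vocabulary word per utterance.
        hypotheses = [(words, s) for words, s in hypotheses if len(words) == 1]
    if not hypotheses:
        return []
    best_words, _ = max(hypotheses, key=lambda h: h[1])
    return list(best_words)


# Invented scores for a single utterance.
utterance_hypotheses = [
    (("recognize", "speech"), 0.9),
    (("recognize",), 0.6),
    (("wreck",), 0.4),
]
continuous_result = recognize(utterance_hypotheses, "continuous")
discrete_result = recognize(utterance_hypotheses, "discrete")
```

Both modes draw on the same candidate words here, mirroring the claim's requirement that they use substantially the same vocabulary; only the allowed output length differs.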

12. A computing system for performing speech recognition comprising:
- one or more memory devices for storing information, including programming information;
  
  one or more processors for processing information in response to said programming information;
  
  one or more input devices for receiving inputs from a user that can be supplied to one or more of said processors;
  
  wherein said programming information includes programming for causing said computing device, under control of said one or more processors, to perform the following functions;
  
  providing a user interface which allows a user to select, using said one or more input devices, between generating a first and a second user input;
  
  responding to the generation of the first user input by switching to a first recognition mode that recognizes one or more utterances as one or more words in a first alphabetic entry vocabulary; and
  
  responding to the generation of the second user input by switching to a second recognition mode that recognizes one or more utterances as one or more words in a second, different, alphabetic entry vocabulary;
  
  wherein the first and second alphabetic entry vocabularies contain different letter-identifying words for individual letters of the alphabet.
  - 13. A computer system as in claim 12 wherein:
    - the first alphabetic entry vocabulary includes the names of each letter of the alphabet and the second alphabetic entry vocabulary does not; and
      
      the second alphabetic entry vocabulary includes one or more words that start with each letter of the alphabet and the first alphabetic entry vocabulary does not.
  - 14. A computer system as in claim 12 wherein said one or more input devices include separate buttons for generating said first and second inputs.
  - 15. A computer system as in claim 14 wherein touching of each of said buttons turns on recognition in the button's associated alphabetic entry mode.
  - 16. A computer system as in claim 12 wherein said user interface enables:
    - a user to select a filtering mode in which word choices for the recognition of a given word are limited to words whose spelling matches a sequence of one or more characters input by the user;
      
      a user to enter said one or more filtering characters by voice recognition using either said first or second alphabetic entry modes; and
      
      said first and second inputs select between whether such recognition of filtering characters is performed using said first or second alphabetic entry modes, respectively.
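
Claims 12, 13, and 16 can be pictured with two small alphabetic entry vocabularies and a spelling filter. The sketch below is illustrative only: the partial vocabularies, the mode names, and the functions `decode_letters` and `filter_choices` are hypothetical, and "spelling matches" is assumed here to mean a prefix match, which the claim does not specify.

```python
# First vocabulary: spoken letter names (partial, for illustration).
LETTER_NAMES = {"ay": "a", "bee": "b", "see": "c", "dee": "d"}

# Second vocabulary: words that start with each letter (partial).
LETTER_WORDS = {"alpha": "a", "bravo": "b", "charlie": "c", "delta": "d"}

VOCABULARIES = {"names": LETTER_NAMES, "words": LETTER_WORDS}


def decode_letters(recognized_words, mode):
    """Map recognized alphabetic-entry words to a string of letters.

    mode selects the vocabulary: "names" or "words" (hypothetical labels).
    Raises KeyError if a word is not in the selected vocabulary.
    """
    vocabulary = VOCABULARIES[mode]
    return "".join(vocabulary[word] for word in recognized_words)


def filter_choices(word_choices, filter_chars):
    """Keep only the word choices whose spelling starts with filter_chars."""
    return [w for w in word_choices if w.lower().startswith(filter_chars)]


prefix_from_names = decode_letters(["see", "ay"], "names")
prefix_from_words = decode_letters(["charlie", "alpha"], "words")
filtered = filter_choices(["cat", "car", "bat"], prefix_from_names)
```

Either vocabulary yields the same filter string, so the choice between them is a matter of which spoken form is more reliable or convenient for the user.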

17. A computing system for performing speech recognition comprising:
- one or more memory devices for storing information, including programming information;
  
  one or more processors for processing information in response to said programming information;
  
  one or more input devices for receiving inputs from a user that can be supplied to one or more of said processors;
  
  wherein said programming information includes programming for causing said computing system, under control of said one or more processors, to perform the following functions;
  
  providing a user interface which allows a user to select, using said one or more input devices, between generating a first and a second user input;
  
  responding to the generation of the first user input by selecting a continuous speech recognition mode which performs continuous speech recognition on speech sounds using a given vocabulary;
  
  responding to the generation of the second user input by selecting a discrete recognition mode which performs discrete recognition on speech sounds using substantially the same given vocabulary; and
  
  responding to speech sounds by performing recognition upon them using the currently selected speech recognition mode;
  
  wherein;
  
  the user can switch between the use of continuous and discrete recognition by selecting one of said user inputs;
  
  said user interface allows a user to select between generating a third and a fourth input independently from the selection of the first and second input; and
  
  said method further includes responding to said third and fourth inputs, respectively, by selecting as said given vocabulary a first vocabulary or a second vocabulary;
  
  whereby the user can separately switch between recognition vocabularies and between the discrete and continuous recognition modes; and
  
  wherein said first and second vocabulary are two different alphabetic entry vocabularies that contain different letter-identifying words for individual letters of the alphabet.
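
The independence of the two selections in claim 17 (recognition mode versus vocabulary) can be modeled as two separate fields of one settings object. This is a minimal sketch, not the patent's design: the class `RecognitionSettings`, the input labels, and the vocabulary names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecognitionSettings:
    mode: str = "continuous"          # set by the first/second input
    vocabulary: str = "letter_names"  # set by the third/fourth input

    def handle_input(self, user_input):
        """Update exactly one setting, leaving the other untouched."""
        if user_input == "first":
            self.mode = "continuous"
        elif user_input == "second":
            self.mode = "discrete"
        elif user_input == "third":
            self.vocabulary = "letter_names"
        elif user_input == "fourth":
            self.vocabulary = "letter_words"
        else:
            raise ValueError(f"unknown input: {user_input}")


settings = RecognitionSettings()
settings.handle_input("second")  # switch mode only
after_second = (settings.mode, settings.vocabulary)
settings.handle_input("fourth")  # switch vocabulary only
after_fourth = (settings.mode, settings.vocabulary)
```

Because each input touches only one field, all four combinations of mode and vocabulary remain reachable.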

18. A computing system for performing speech recognition comprising:
- one or more processors for processing information in response to programming instructions;
  
  one or more input devices for receiving inputs from a user that can be supplied to one or more of said processors;
  
  one or more memory devices for storing processor readable information, including said programming instructions;
  
  processor readable programming instructions stored in said memory for;
  
  a speech recognition program that responds to speech sounds by outputting a sequence of one or more words recognized as matching said speech sounds;
  
  a program external to the speech recognition program capable of receiving an input comprised of a sequence of one or more words; and
  
  supplying a sequence of one or more of said words output from the speech recognition program as said input to said external program substantially as said words are recognized by said speech recognition program;
  
  wherein said programming instructions for said speech recognition program further include instructions for;
  
  providing a user interface which allows a user to select between generating a first and a second user input, using said one or more input devices;
  
  responding to the generation of the first user input by performing large vocabulary speech recognition on one or more utterances in a prior language-context-dependent mode, which recognizes at least the first word of such recognition depending in part on a language model context created by a previously recognized word from the previous utterance, if any; and
  
  responding to the generation of the second user input by performing large vocabulary speech recognition on one or more utterances in a prior language-context-independent mode, which recognizes at least the first word of such recognition independently of any language model context created by a previously recognized word from the previous utterance, if any;
  
  wherein said programming instructions for said external program further include instructions for providing a user interface which allows a user to selectively change the context in which successive words of said input are processed by said external program; and
  
  wherein the response by said speech recognition programming to said first and second inputs by switching recognition modes is independent of said context in which successive words input to the external program are processed.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Voice Signal Technologies Incorporated (Microsoft Corporation)
Inventors: Roth, Daniel L., Johnston, David F., Cohen, Jordan R., Grabherr, Manfred G.
Primary Examiner(s): Hudspeth; David
Assistant Examiner(s): ALBERTALLI, BRIAN LOUIS
Application Number: US10/227,653
Publication Number: US 20040049388A1
Time in Patent Office: 1,726 Days
Field of Search: None
US Class Current: 704/270
CPC Class Codes:
G10L 15/19 Grammatical context, e.g. d...
G10L 15/22 Procedures used during a sp...
