Interactive speech recognition apparatus

US 4,866,778 A
Filed: 08/11/1986
Issued: 09/12/1989
Est. Priority Date: 08/11/1986
Status: Expired due to Fees

Abstract

A speech recognition system which can perform multiple recognition passes on each word. If the recognizer is correct in its first pass, the operator may abort later passes by either pressing a key or speaking the next word. Otherwise, the operator may either wait for a second recognition pass to be performed against a larger vocabulary, or may specify one or more initial letters causing the second recognition pass to be performed against a vocabulary substantially restricted to words starting with those initial letters. Each time the user adds an additional letter to the initial string, any previous recognition is aborted and the re-recognition process is started anew with the new string. If the user types a control character after the initial string, then the string itself is used as the output of the recognizer. In one embodiment, a language model limits a relatively small vocabulary used in the first pass to the words most likely to occur given the language context of the dictated word. The system may also be used as an interactive transcription system for prerecorded speech and can operate on either discrete utterances or continuous speech. When used with prerecorded speech, the system displays the best scoring words of a recognition to the user, and, when the user chooses a desired word from such a display, the system employs the portion of prerecorded speech matched against the chosen word to help determine where in that prerecorded speech the system should look for the next word to recognize.
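
The pass structure described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the function names (`score`, `recognize`, `filter_by_prefix`, `dictate`) and the toy vocabularies are hypothetical, and string similarity from Python's `difflib` stands in for the acoustic comparison, which the abstract does not specify.

```python
from difflib import SequenceMatcher

def score(utterance, word):
    """Stand-in for acoustic comparison (assumption): higher is a closer match."""
    return SequenceMatcher(None, utterance, word).ratio()

def recognize(utterance, vocabulary, top_n=3):
    """Return the best-scoring words of the given recognition vocabulary."""
    ranked = sorted(vocabulary, key=lambda w: score(utterance, w), reverse=True)
    return ranked[:top_n]

def filter_by_prefix(system_vocabulary, prefix):
    """Restrict the vocabulary to words starting with the typed initial letters."""
    return [w for w in system_vocabulary if w.startswith(prefix)]

def dictate(utterance, first_vocabulary, system_vocabulary, typed="", control=False):
    """Pick the pass to run from the operator's input (hypothetical control flow)."""
    if typed and control:
        # Letters followed by a control character: the string itself is the output.
        return [typed]
    if typed:
        # Letters typed: re-recognize against the alphabetically filtered vocabulary.
        return recognize(utterance, filter_by_prefix(system_vocabulary, typed))
    # No input yet: first pass against the small first vocabulary.
    return recognize(utterance, first_vocabulary)
```

For example, with a system vocabulary of `["recognize", "recommend", "wreck", "speech"]`, typing `rec` limits the second pass to the two words that start with those letters.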

29 Claims

1. A speech recognition system comprising:
- means for receiving an acoustic description of a portion of speech to be recognized;
  
  means for storing an acoustic description of each word in a system vocabulary;
  
  recognition means for making a determination of which one or more words of a recognition vocabulary, comprised of one or more words from said system vocabulary, most probably correspond to said portion of speech, said recognition means including comparing means for determining how closely the acoustic description of said portion of speech compares to the acoustic descriptions of words from said recognition vocabulary; and
  
  first-pass means for causing said recognition means to start to perform a first recognition of said portion of speech using a first such recognition vocabulary;
  
  control-input means for enabling an operator to input a string of one or more selected characters if he or she so desires; and
  
  re-recognition means responsive to the input of a string of characters through said control-input means for causing said recognition means to start to perform a second recognition of said portion of speech using a second such recognition vocabulary, said re-recognition means including alphabetic filtering means for selecting a sub-vocabulary from said system vocabulary to be used as said second recognition vocabulary, said filtering means including means, responsive to said control-input means, for causing said sub-vocabulary to include an increased percent of vocabulary words specified as a function of said string of one or more characters input through said control-input means.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
- - 2. A speech recognition system as described in claim 1, wherein:
    - said control-input means includes means for enabling an operator to input a string of one or more selected alphabetic letters as said string; and
      
      said alphabetic filtering means includes means for responding to the input through said control-input means of a string of one or more alphabetic letters by causing a majority of the words in the sub-vocabulary selected by said alphabetic filtering means to start with said string.
  - 3. A speech recognition system as described in claim 2, wherein:
    - said control-input means further includes means for enabling an operator to indicate that said second recognition is to be performed against a vocabulary which is not alphabetically filtered; and
      
      said re-recognition means further includes means responsive to said indication by said operator for selecting as said second recognition vocabulary a vocabulary which does not contain a majority of words starting with any one initial string.
  - 4. A speech recognition system as described in claim 1, wherein:
    - said control-input means include means for enabling an operator to input said string of one or more characters while said recognition means is performing said first recognition; and
      
      said re-recognition means includes means for responding to the input of a string of one or more characters during the performance of the first recognition by stopping the performance of the first recognition, by causing said alphabetic filtering means to select a sub-vocabulary as a function of said string, and by causing said recognizer to start performing said second recognition using the second recognition vocabulary selected by said alphabetic filtering means.
  - 5. A speech recognition system as described in claim 1, wherein:
    - said system further includes means for displaying to an operator, after said first recognition has made its determination with regard to said first vocabulary, the one or more words which the recognition means determines most probably correspond to said portion of speech;
      
      said control-input means includes means for enabling an operator to input said string of one or more characters after said means for displaying displays said one or more words;
      
      said re-recognition means includes means for responding to the input through said control-input means of a string of one or more characters after said display means displays said one or more words by causing said alphabetic filtering means to select a sub-vocabulary as a function of said string, and by causing said recognizer to start performing said second recognition using the second recognition vocabulary selected by said alphabetic filtering means.
  - 6. A speech recognition system as described in claim 1, wherein:
    - said control-input means includes means for enabling an operator to input a string of one or more selected alphabetic letters as said string;
      
      said alphabetic filtering means includes means for responding to a string of one or more alphabetic letters input through said control-input means by selecting a sub-vocabulary for use as said second recognition vocabulary which includes an increased percent of vocabulary words which start with that string;
      
      said control-input means includes means for enabling an operator to add one or more selected additional alphabetic letters to the end of a string of one or more letters input by the operator after that string has been input, and said re-recognition means has caused its alphabetic filtering means to select a first sub-vocabulary based on that string, and has caused said recognition means to start performing said second recognition using said first sub-vocabulary as said second recognition vocabulary; and
      
      said re-recognition means includes means for responding to the input of additional letters to said string through said control-input means by causing said alphabetic filtering means to select a second sub-vocabulary including an increased percentage of words starting with the new string formed by adding said one or more additional letters to the said string, and for causing said recognition means to start to perform an additional recognition of said portion of speech using said second sub-vocabulary as a third recognition vocabulary.
  - 7. A speech recognition system as described in claim 6, wherein said re-recognition means further includes means for causing said recognition means to abort said second recognition using said first sub-vocabulary before it causes said recognition means to start performing said additional recognition using said second sub-vocabulary.
  - 8. A speech recognition system as described in claim 1, wherein said control-input means includes a keyboard and means for enabling an operator to input characters by the pressing of keys on said keyboard.
  - 9. A speech recognition system as described in claim 1, wherein:
    - said control-input means includes means for enabling an operator to input a string of one or more selected alphabetic letters plus a control character as said string;
      
      said speech recognition system further includes output producing means for selecting one of the one or more words determined by said recognition means in either said first recognition or second recognition to most probably correspond to said portion of speech and for producing a string of one or more letters corresponding to that word as an output; and
      
      said speech recognition system further includes means for responding to the input through said control-input means of said string of one or more letters plus said control character by causing said output producing means to produce said string of letters as said output.
  - 10. A speech recognition system as described in claim 1, wherein:
    - said means for receiving an acoustic description of a portion of speech to be recognized includes means for storing an acoustic description of both a first and a second portion of speech to be recognized;
      
      said means for storing an acoustic description of each word in a system vocabulary includes means for storing an acoustic description of a plurality of control words;
      
      said recognition means includes means for storing an associated character string for each of said control words;
      
      said control-input means includes means for causing said recognition means to make a determination of which, if any, one or more control words most probably correspond to said second portion of speech and to use said character string associated with said one or more control words as said string of one or more characters input by said operator through said control-input means for purposes of affecting the recognition of said first portion of speech.
  - 11. A speech recognition system as described in claim 1 wherein said speech recognition system is a discrete utterance recognition system.
  - 12. A speech recognition system as described in claim 1 wherein:
    - said means for receiving an acoustic description of a portion of speech to be recognized includes means for recording an extended acoustic description of a plurality of successive spoken words;
      
      said recognition means includes means for making a determination of which one or more words of a recognition vocabulary most probably correspond to each of a plurality of successive segments of said extended acoustic description.
  - 13. A speech recognition system as described in claim 12 wherein:
    - said means for recording an extended acoustic description of a plurality of spoken words include means for recording said speech in a form from which a humanly understandable audio playback of that speech can be made;
      
      said system further includes means for playing back an audio representation of one or more of said successive segments so that a human operator can hear them in conjunction with the recognition by said recognition means of those segments.
  - 14. A speech recognition system as described in claim 12 wherein:
    - said means for recording an extended acoustic description includes means for recording an extended acoustic description of a plurality of continuously spoken words;
      
      said recognition means includes means for making a determination of which one or more words from a recognition vocabulary most probably correspond to successive segments of continuous speech recorded in said extended acoustic description.
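
The incremental behavior of claims 6 and 7, where each added letter selects a new sub-vocabulary and restarts recognition, might look like the sketch below. All names here are hypothetical, and the filter shown keeps only words that start with the string, which is one way (an assumption, not the only way) to give the sub-vocabulary an "increased percent" of such words.

```python
def select_sub_vocabulary(system_vocabulary, prefix):
    """Keep the words that start with the operator's string (assumed strict filter)."""
    return [w for w in system_vocabulary if w.startswith(prefix)]

def prefix_share(vocabulary, prefix):
    """Fraction of the vocabulary that starts with the given string."""
    if not vocabulary:
        return 0.0
    return sum(w.startswith(prefix) for w in vocabulary) / len(vocabulary)

def rerecognition_vocabularies(system_vocabulary, keystrokes):
    """Yield (string, sub_vocabulary) each time the operator adds a letter.

    A caller would abort any recognition still running on the previous
    sub-vocabulary before starting a new one on the yielded sub-vocabulary.
    """
    prefix = ""
    for letter in keystrokes:
        prefix += letter
        yield prefix, select_sub_vocabulary(system_vocabulary, prefix)
```

With the toy vocabulary `["speech", "spell", "system", "string", "start"]`, the share of words starting with `sp` rises from 0.4 in the full vocabulary to 1.0 in the sub-vocabulary selected after the operator types `s` and then `p`.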

15. A speech recognition system comprising:
- means for recording an extended acoustic description of a plurality of successive spoken words;
  
  means for storing an acoustic description of each word in a recognition vocabulary;
  
  recognition means for making a determination of which words of said recognition vocabulary most probably correspond to a given portion of speech recorded in said extended acoustic description, said recognition means including comparing means for determining how closely the acoustic description of each such portion of speech compares to the acoustic descriptions of words from said recognition vocabulary;
  
  choice display means for displaying a plurality of the words determined by said recognition means to most probably correspond to each successive portion of speech to be recognized;
  
  word selection means for enabling an operator to select which of said plurality of displayed words corresponds to said given portion of speech; and
  
  said speech recognition system further including means, responsive to a selection by said selection means of a displayed word as corresponding to said given portion of speech, for determining how much of said extended acoustic description corresponds to said selected word and supplying a successive portion of the extended acoustic description which follows that associated with the selected word to said recognition means as the next portion of speech to be recognized, and for causing said recognition means to make a determination of which words of said recognition vocabulary most probably correspond to said next portion of speech to be recognized.
- Dependent Claims (16, 17, 18, 19, 20)
- - 16. A speech recognition system as described in claim 15 wherein:
    - said means for recording an extended acoustic description of a plurality of spoken words include means for recording said speech in a form from which a humanly understandable audio playback of that speech can be made; and
      
      said system further includes means for playing back an audio representation of a portion of speech so that a human operator can hear it in conjunction with the recognition by said recognition means of that portion of speech.
  - 17. A speech recognition system as described in claim 15 wherein:
    - said word selection means includes means for causing said speech recognition system to wait for the operator to select one of the displayed words associated with a given portion of speech before said recognition means make a determination with regard to said successive portion of speech and said choice display means displays the words determined by said recognition means to most probably correspond to said successive portion of speech.
  - 18. A speech recognition system as described in claim 15 wherein:
    - said recognition means includes means for making a determination of which single word of said recognition vocabulary most probably corresponds to a portion of speech to be recognized;
      
      said word selection means includes means for selecting said most probably corresponding word from among said displayed words when the operator fails to select one of the displayed words within a given response time.
  - 19. A speech recognition system as described in claim 15 wherein:
    - said means for recording an extended acoustic description includes means for recording an extended acoustic description of a plurality of continuously spoken words;
      
      said recognition means includes means for making a determination of which one or more words from said recognition vocabulary most probably correspond to successive portions of said continuously spoken words recorded in said extended acoustic description.
  - 20. A speech recognition system as described in claim 19 wherein said recognition includes:
    - means for time aligning the acoustic description of at least said selected word against the acoustic description of said continuously spoken words, and for determining the time in the acoustic description of said continuously spoken words at which the time alignment of said selected word most probably ends; and
      
      means responsive to said selection means for using the time in said acoustic description of said continuously spoken words at which the time alignment of the displayed word selected by the operator most probably ends as the starting time of the next successive portions of said extended acoustic description to be recognized.
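
Claim 20 uses the end of the selected word's time alignment as the start of the next portion to recognize. The sketch below illustrates that idea with a small dynamic time warping routine that leaves the end point free. The claim does not name an alignment algorithm, so the use of dynamic time warping, the one-dimensional frame values, the length normalization, and the names `alignment_end` and `next_start` are all assumptions made for illustration.

```python
def alignment_end(template, speech, start):
    """Return the frame index (exclusive) where aligning `template` to
    `speech[start:]` ends with the lowest length-normalized cost."""
    frames = speech[start:]
    n, m = len(template), len(frames)
    if n == 0 or m == 0:
        return start  # nothing to align
    inf = float("inf")
    # cost[i][j]: best cost of aligning template[:i] with frames[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(template[i - 1] - frames[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Free end point: the whole template must be used, but it may stop at any frame.
    best_j = min(range(1, m + 1), key=lambda j: cost[n][j] / (n + j))
    return start + best_j

def next_start(templates, speech, start, selected_word):
    """Start of the next portion of speech, given the operator's selected word."""
    return alignment_end(templates[selected_word], speech, start)
```

A caller would invoke `next_start` once per operator selection, feeding each returned index back in as the `start` of the following recognition.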

21. A speech recognition system for recognizing a succession of words comprising:
- means for receiving an acoustic description of a portion of speech to be recognized;
  
  means for storing an acoustic description of each word in a system vocabulary;
  
  recognition means for making a determination of which one or more words of a sub-vocabulary comprised of one or more words of said system vocabulary most probably correspond to said portion of speech, said recognition means including comparing means for determining how closely the acoustic description of said portion of speech compares to the acoustic descriptions of words from said sub-vocabulary;
  
  means for storing a body of text comprised of one or more words and for associating the portion of speech to be recognized with a location in that text which can be preceded by one or more of said words;
  
  first-pass means for causing said recognition means to make a first determination of which one or more words of a first sub-vocabulary of said system vocabulary most probably correspond to said portion of speech, said first-pass means including language model filtering means for selecting said first sub-vocabulary as a function of the sequence of one or more words preceding the location associated with the speech to be recognized in said body of text; and
  
  means for displaying said one or more words of said first sub-vocabulary selected by said first determination as most probably corresponding to said portion of speech;
  
  re-recognition means for causing said recognition means to start making a second determination of which one or more words of a second sub-vocabulary, which can be different from said first sub-vocabulary, most probably correspond to said portion of speech;
  
  control-input means for enabling an operator to input a command to control the re-recognition process; and
  
  means for aborting, in response to an input of said command by said operator, said second determination of which one or more words of said second sub-vocabulary most probably correspond to said portion of speech.
- Dependent Claims (22, 23)
- - 22. A speech recognition system as described in claim 21 wherein said language model filtering means selects said first sub-vocabulary so that it is comprised substantially of the words from said system vocabulary which are the most likely words to occur following said sequence of one or more words preceding the location associated with the speech to be recognized in said body of text, according to a probabilistic model of what words are likely to occur after given other words in a given type of speech modeled by said probabilistic model.
  - 23. A speech recognition system as described in claim 22 wherein said language model filtering means includes means for selecting said first sub-vocabulary based on the word preceding the location associated with said speech to be recognized in said body of text.
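
The language model filtering of claims 21 to 23 can be pictured with a simple bigram count model, as in the sketch below. This is a hedged illustration: the claims only require a probabilistic model of which words tend to follow others, so the bigram counts, the training corpus, and the names `train_bigrams` and `first_pass_vocabulary` are assumptions rather than details from the patent.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus_sentences):
    """Count how often each word follows each preceding word."""
    follow = defaultdict(Counter)
    for sentence in corpus_sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            follow[prev][nxt] += 1
    return follow

def first_pass_vocabulary(follow, preceding_word, system_vocabulary, size):
    """Select the `size` system-vocabulary words most likely to follow
    the word preceding the dictation location."""
    counts = follow.get(preceding_word, Counter())
    allowed = set(system_vocabulary)
    ranked = [w for w, _ in counts.most_common() if w in allowed]
    return ranked[:size]
```

In this sketch an unseen preceding word yields an empty first sub-vocabulary; a fuller system would presumably fall back to some default vocabulary, which is outside what this example covers.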

24. A speech recognition system comprising:
- means for receiving an acoustic description of a portion of speech to be recognized;
  
  means for storing an acoustic description of each word in a system vocabulary;
  
  recognition means for making a determination of which one or more words of a recognition vocabulary which is a sub-vocabulary consisting of one or more words of said system vocabulary most probably correspond to said portion of speech, said recognition means including comparing means for determining how closely the acoustic description of said portion of speech compares to the acoustic descriptions of words from said recognition vocabulary;
  
  first-pass means for causing said recognition means to first make a first determination of which one or more words of a first such recognition vocabulary most probably correspond to said portion of speech;
  
  re-recognition means for causing said recognition means to start to make a second determination of which one or more words of a second such recognition vocabulary most probably correspond to said portion of speech; and
  
  means for aborting said second determination in response to an abort signal from an operator of the system.
- Dependent Claims (25, 26, 27, 28, 29)
- - 25. A speech recognition system as described in claim 24 wherein said second vocabulary is substantially larger than said first vocabulary.
  - 26. A speech recognition system as described in claim 24 further including:
    - means for displaying a plurality of those words from said first vocabulary determined by said recognition means to most probably correspond to said portion of speech, and for then adding to the said display of words from the first vocabulary additional words from said second vocabulary determined by said recognition means to most probably correspond to said portion of speech; and
      
      means for enabling an operator to select which of said currently displayed words is the word corresponding to said given portion of speech.
  - 27. A speech recognition system as described in claim 24 further including:
    - means for displaying a plurality of those words from said first vocabulary determined by said recognition means to most probably correspond to said portion of speech;
      
      means for displaying a plurality of those words from said second vocabulary determined by said recognition means to most probably correspond to said portion of speech upon the receipt of a second display command;
      
      means for enabling an operator to selectively indicate that a selected one of said words displayed from said first vocabulary is the word corresponding to said portion of speech, or to cause the display of said plurality of words from said second vocabulary by generating said second display command; and
      
      means for enabling an operator to selectively indicate that a selected one of said words displayed from said second vocabulary is the word corresponding to said portion of speech.
  - 28. A speech recognition system as described in claim 24 wherein:
    - said means for receiving an acoustic description of a portion of speech to be recognized includes means for receiving an acoustic description of each successive utterance spoken by an operator of the system;
      
      said speech recognition system is a discrete utterance recognition system and said means for aborting said second determination includes means for detecting the beginning of another such utterance after the utterance associated with said portion of speech for which said first determination is made and for treating such a detection as said abort signal.
  - 29. A speech recognition system as described in claim 24 further including:
    - means for displaying a plurality of those words from said first vocabulary determined by said first determination to most probably correspond to said portion of speech;
      
      means for enabling an operator to make a selection of which of said displayed words is the word corresponding to said given portion of speech; and
      
      in which said means for aborting said second determination consists of means for detecting that said operator has made such a selection and for treating said selection as said abort signal.
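
The abortable second pass of claims 24, 28 and 29 might be sketched as a loop that checks for an operator abort signal between word comparisons. The names below are hypothetical, `difflib` string similarity again stands in for acoustic scoring, and the cooperative polling shown is only one assumed way of honoring an abort signal.

```python
from difflib import SequenceMatcher

def second_pass(utterance, second_vocabulary, abort_requested):
    """Score the utterance against the second vocabulary, stopping early if
    `abort_requested()` returns True. Returns the best word, or None if aborted."""
    best_word, best_score = None, -1.0
    for word in second_vocabulary:
        if abort_requested():
            return None  # operator aborted: discard the partial second pass
        s = SequenceMatcher(None, utterance, word).ratio()  # stand-in for acoustic match
        if s > best_score:
            best_word, best_score = word, s
    return best_word
```

The `abort_requested` callable could wrap whatever event the system treats as an abort signal, such as a key press, the detected start of the next utterance (claim 28), or the selection of a displayed first-pass word (claim 29). In a threaded program, the `is_set` method of a `threading.Event` would fit this parameter.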

Current Assignee
Dragon Systems, Inc. (Microsoft Corporation)
Original Assignee
Dragon Systems, Inc. (Microsoft Corporation)
Inventors
Baker, James K.
Primary Examiner(s)
Harkcom, Gary V.
Assistant Examiner(s)
Merecki, John A.

Application Number

US06/895,488
Time in Patent Office

1,128 Days
Field of Search

364/513.5, 364/513, 381/41-46, 381/51-53, 381/110, 367/198, 369/24-25
US Class Current

704/254
CPC Class Codes

G10L 15/22 Procedures used during a sp...
