Speech recognition using ambiguous or phone key spelling and/or filtering

US 7,526,431 B2
Filed: 09/24/2004
Issued: 04/28/2009
Est. Priority Date: 09/05/2001
Status: Active Grant

8 Assignments

0 Petitions

Abstract

Alphabetic filtering of the speech recognition of words uses a key press to indicate a desired character in an alphabetic filter string, where each key press represents two or more letters. The key presses can be disambiguated by recognizing a key-disambiguation utterance in association with a given key press. A user can select a desired recognition candidate from a choice list produced by such filtered word recognition. Ambiguous alphabetic filtering can be performed iteratively in response to the addition of successive ambiguous key presses. A user can select to re-recognize the utterance using filtering based on ambiguous key input after seeing the results of recognition without such filtering. Unambiguous alphabetic filtering can be performed by using multiple presses of an ambiguous key to disambiguate which letter is intended. A user can select between entering text by either large vocabulary speech recognition or by spelling text by pressing phone keys.
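
The ambiguous alphabetic filtering described in the abstract can be illustrated with a minimal sketch. It assumes the conventional phone keypad letter groups, treats the filter as a prefix match, and drops non-matching candidates outright, whereas the claims only require that matching candidates be favored in scoring. The names `KEYPAD`, `matches_filter`, and `filter_candidates` are hypothetical and do not come from the patent.

```python
# Conventional phone keypad letter groups (assumed layout).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

def matches_filter(word, key_presses):
    """True if each leading character of `word` is one of the letters on the
    corresponding key press (one character per key press, prefix match)."""
    if len(word) < len(key_presses):
        return False
    return all(ch in KEYPAD.get(key, "")
               for ch, key in zip(word.lower(), key_presses))

def filter_candidates(scored_candidates, key_presses):
    """Keep (word, score) pairs consistent with the key presses, best first.
    Simplification: a hard filter stands in for score favoring."""
    kept = [(w, s) for w, s in scored_candidates if matches_filter(w, key_presses)]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Made-up recognizer scores, for illustration only.
candidates = [("cat", 0.40), ("bat", 0.35), ("hat", 0.20), ("act", 0.05)]
print(filter_candidates(candidates, "22"))
# [('cat', 0.4), ('bat', 0.35), ('act', 0.05)]
```

Pressing the "2" key twice leaves only candidates whose first two letters are each drawn from a, b, or c, so "hat" drops out of the choice list.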

203 Citations

22 Claims

1. A method of performing large vocabulary speech recognition comprising:
- receiving a filtering sequence of one or more key-press signals each of which indicates which of a plurality of keys has been selected by a user, where each of the keys represents two or more letters;
  
  receiving an acoustic representation of a key-disambiguating utterance made in association with a given key press signal in said filtering sequence;
  
  performing speech recognition upon the acoustic representation of the key-disambiguation utterance that favors recognition of letter identifying words identifying letters represented by the given key press signal;
  
  responding to a recognition of the given key press signal's associated key-disambiguation utterance as a letter identifying word by causing the set of letters represented by the given key press signal in the filtering sequence to be substantially limited to a letter identified by the recognized letter identifying word;
  
  receiving an acoustic representation of a word utterance that represents one or more words;
  
  performing speech recognition upon the acoustic word utterance representation which scores word candidates as a function of the match between the acoustic representation and acoustic models of words;
  
  wherein the scoring of said word candidates favors word candidates containing a sequence of one or more alphabetic characters corresponding to the filtering sequence of key-press signals, where a candidate word is considered to contain a character sequence corresponding to the filtering sequence if each sequential character in the character sequence corresponds to one of the letters represented by its corresponding sequential key-press signal;
  
  wherein said method further includes;
  
  responding to a key press signal by displaying in user-perceivable form a set of one or more letter identifying words starting with each letter represented by the key press signal's associated pressed key;
  
  favoring the recognition of an utterance made after the display of the pressed key's associated letter identifying words as corresponding to one of said displayed words; and
  
  responding to recognition of one of said displayed words by said causing the set of letters represented by the key press signal in the filtering sequence to be substantially limited to the letter associated with the recognized displayed word.
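
The display-and-disambiguate steps at the end of claim 1 might be sketched as follows. This is only an illustration: the letter identifying words are assumed to be the NATO spelling alphabet, the speech recognizer is replaced by a plain string argument, and `words_to_display` and `disambiguate` are hypothetical names.

```python
import string

KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

# Assumed letter identifying words (NATO spelling alphabet), one per letter.
NATO = ("alpha bravo charlie delta echo foxtrot golf hotel india juliett "
        "kilo lima mike november oscar papa quebec romeo sierra tango "
        "uniform victor whiskey xray yankee zulu").split()
LETTER_WORDS = dict(zip(string.ascii_lowercase, NATO))

def words_to_display(key):
    """Letter identifying words to show the user after `key` is pressed."""
    return [LETTER_WORDS[letter] for letter in KEYPAD[key]]

def disambiguate(key, recognized_word):
    """Letter set for this key press after an utterance is recognized.
    A displayed word narrows the set to one letter; anything else leaves
    the key press ambiguous."""
    for letter in KEYPAD[key]:
        if LETTER_WORDS[letter] == recognized_word:
            return {letter}
    return set(KEYPAD[key])

print(words_to_display("2"))       # ['alpha', 'bravo', 'charlie']
print(disambiguate("2", "bravo"))  # {'b'}
```

In a real system the recognizer would be biased toward the displayed words rather than handed the answer; passing the recognized word in directly keeps the sketch focused on how the letter set shrinks.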

2. A method of performing large vocabulary speech recognition comprising:
- receiving a filtering sequence of one or more key-press signals each of which indicates which of a plurality of keys has been selected by a user, where each of the keys represents two or more letters;
  
  receiving an acoustic representation of a key-disambiguating utterance made in association with a given key press signal in said filtering sequence;
  
  performing speech recognition upon the acoustic representation of the key-disambiguation utterance that favors recognition of letter identifying words identifying letters represented by the given key press signal;
  
  responding to a recognition of the given key press signal's associated key-disambiguation utterance as a letter identifying word by causing the set of letters represented by the given key press signal in the filtering sequence to be substantially limited to a letter identified by the recognized letter identifying word;
  
  receiving an acoustic representation of a word utterance that represents one or more words;
  
  performing speech recognition upon the acoustic word utterance representation which scores word candidates as a function of the match between the acoustic representation and acoustic models of words;
  
  wherein;
  
  the scoring of said word candidates favors word candidates containing a sequence of one or more alphabetic characters corresponding to the filtering sequence of key-press signals, where a candidate word is considered to contain a character sequence corresponding to the filtering sequence if each sequential character in the character sequence corresponds to one of the letters represented by its corresponding sequential key-press signal; and
  
  each key press signal of the filtering sequence has a time period associated with it that starts after the previous key press signal in the sequence, if any, and ends with or before the subsequent key press signal in the sequence, if any; and
  
  a received key-disambiguating utterance is associated with a given key press signal if it is received in the utterance duration associated with that key press.
- Dependent claims (3, 4):
  - 3. A method as in claim 2 wherein the method is performed by software running on a telephone and the keys are keys of a telephone keypad.
  - 4. A method as in claim 3 wherein the telephone is a cell phone.
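
The timing limitation in claim 2 ties each utterance to the key press whose time period contains it. A minimal sketch follows, assuming each period runs from its own key press up to the next one (the claim also permits shorter periods) and that timestamps are seconds in ascending order. The function name `associate_utterance` is hypothetical.

```python
from bisect import bisect_right

def associate_utterance(key_press_times, utterance_time):
    """Return the index of the key press whose period contains the utterance,
    or None if the utterance precedes every key press.

    Assumption: the period of press i is [t_i, t_{i+1}), and the last press
    has an open-ended period. `key_press_times` must be sorted ascending."""
    i = bisect_right(key_press_times, utterance_time) - 1
    return i if i >= 0 else None

presses = [1.0, 2.5, 4.0]                 # hypothetical key press timestamps
print(associate_utterance(presses, 3.1))  # 1
print(associate_utterance(presses, 0.5))  # None
```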

5. A computerized method of performing speech recognition comprising:
- receiving a filtering sequence of one or more key-press signals each of which indicates which of a plurality of keys has been selected by a user, where each of the keys represents two or more letters;
  
  receiving an acoustic representation of a key-disambiguating utterance made in association with a given key press signal in said filtering sequence;
  
  performing speech recognition upon the acoustic representation of the key-disambiguation utterance that favors recognition of letter identifying words identifying letters represented by the given key press signal;
  
  responding to a recognition of the given key press signal's associated key-disambiguation utterance as a letter identifying word by causing the set of letters represented by the given key press signal in the filtering sequence to be substantially limited to a letter identified by the recognized letter identifying word;
  
  receiving an acoustic representation of a word utterance that represents one or more words;
  
  performing speech recognition upon the acoustic word utterance representation which scores word candidates as a function of the match between the acoustic representation and acoustic models of words;
  
  wherein;
  
  the scoring of said word candidates favors word candidates containing a sequence of one or more alphabetic characters corresponding to the filtering sequence of key-press signals, where a candidate word is considered to contain a character sequence corresponding to the filtering sequence if each sequential character in the character sequence corresponds to one of the letters represented by its corresponding sequential key-press signal; and
  
  each key press signal of the filtering sequence has a time period associated with it that starts after the previous key press signal in the sequence, if any, and ends with or before the subsequent key press signal in the sequence, if any; and
  
  a received key-disambiguating utterance is associated with a given key press signal if it is received in the utterance duration associated with that key press; and
  
  wherein said method further includes;
  
  outputting a plurality of the word candidates produced by said speech recognition in a user-perceivable form in a choice list; and
  
  responding to a user selection of one of the output word candidates by selecting it as the one or more recognized word for the recognition.
- Dependent claims (6, 7):
  - 6. A method as in claim 5 wherein the method is performed by software running on a telephone and the keys are keys of a telephone keypad.
  - 7. A method as in claim 6 wherein the telephone is a cell phone.

8. A computerized method of performing speech recognition comprising:
- receiving a filtering sequence of one or more key-press signals each of which indicates which of a plurality of keys has been selected by a user, where each of the keys represents two or more letters;
  
  receiving an acoustic representation of a key-disambiguating utterance made in association with a given key press signal in said filtering sequence;
  
  performing speech recognition upon the acoustic representation of the key-disambiguation utterance that favors recognition of letter identifying words identifying letters represented by the given key press signal;
  
  responding to a recognition of the given key press signal's associated key-disambiguation utterance as a letter identifying word by causing the set of letters represented by the given key press signal in the filtering sequence to be substantially limited to a letter identified by the recognized letter identifying word;
  
  receiving an acoustic representation of a word utterance that represents one or more words;
  
  performing speech recognition upon the acoustic word utterance representation which scores word candidates as a function of the match between the acoustic representation and acoustic models of words;
  
  wherein;
  
  the scoring of said word candidates favors word candidates containing a sequence of one or more alphabetic characters corresponding to the filtering sequence of key-press signals, where a candidate word is considered to contain a character sequence corresponding to the filtering sequence if each sequential character in the character sequence corresponds to one of the letters represented by its corresponding sequential key-press signal; and
  
  each key press signal of the filtering sequence has a time period associated with it that starts after the previous key press signal in the sequence, if any, and ends with or before the subsequent key press signal in the sequence, if any; and
  
  a received key-disambiguating utterance is associated with a given key press signal if it is received in the utterance duration associated with that key press; and
  
  said performing of speech recognition that favors candidates containing a sequence of characters corresponding to the filter sequence is performed repeatedly for a given acoustic word utterance representation in response to the receipt of successive key-press signals in said filtering sequence.
- Dependent claims (9, 10):
  - 9. A method as in claim 8 wherein the method is performed by software running on a telephone and the keys are keys of a telephone keypad.
  - 10. A method as in claim 9 wherein the telephone is a cell phone.
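
The repeated, per-key-press filtering of claim 8 can be illustrated roughly as below. The sketch assumes the recognizer's scored candidates for one utterance are already available and simply re-filters them after each additional key press; an actual implementation would re-run recognition with the filter applied. All names and scores are hypothetical.

```python
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

def matches(word, keys):
    """Prefix match: the i-th character must be on the i-th pressed key."""
    return len(word) >= len(keys) and all(
        ch in KEYPAD.get(k, "") for ch, k in zip(word.lower(), keys))

def rescore_after_each_key(scored_candidates, key_presses):
    """Return [(keys_so_far, ranked_words), ...], one entry per key press."""
    results = []
    for n in range(1, len(key_presses) + 1):
        keys = key_presses[:n]
        kept = [(w, s) for w, s in scored_candidates if matches(w, keys)]
        kept.sort(key=lambda pair: pair[1], reverse=True)
        results.append((keys, [w for w, _ in kept]))
    return results

# Made-up scores for a single word utterance.
candidates = [("cat", 0.4), ("can", 0.3), ("bat", 0.2), ("dot", 0.1)]
for keys, words in rescore_after_each_key(candidates, "226"):
    print(keys, words)
# 2 ['cat', 'can', 'bat']
# 22 ['cat', 'can', 'bat']
# 226 ['can']
```

Each successive key press narrows the list for the same utterance, which is the behavior a user would see as the choice list updates while typing.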

11. A computerized method of performing speech recognition comprising:
- receiving an acoustic representation of a word utterance that represents one or more words;
  
  performing speech recognition upon the acoustic word utterance representation that scores word candidates as a function of the match between the acoustic representation and acoustic models of words;
  
  providing a user perceivable output indicating the one or more words of the word candidate selected by the speech recognition as most probably corresponding to said word utterance representation;
  
  providing a user interface that enable a user to select to respond to such an output by entering a filtering sequence to filter recognition of said utterance representation, which filtering sequence includes one or more key-press signals each of which indicates which of a plurality of keys has been selected by a user, where each of the key-press signals represents two or more letters;
  
  receiving a filtering sequence of one or more key-press signals each of which indicates which of a plurality of keys has been selected by a user, where each of the keys represents two or more letters;
  
  receiving an acoustic representation of a key-disambiguating utterance made in association with a given key press signal in said filtering sequence;
  
  performing speech recognition upon the acoustic representation of the key-disambiguation utterance that favors recognition of letter identifying words identifying letters represented by the given key press signal;
  
  responding to a recognition of the given key press signal's associated key-disambiguation utterance as a letter identifying word by causing the set of letters represented by the given key press signal in the filtering sequence to be substantially limited to a letter identified by the recognized letter identifying word;
  
  re-performing speech recognition upon the acoustic word utterance representation which scores word candidates as a function of the match between the acoustic representation and acoustic models of words;
  
  wherein;
  
  the scoring of said word candidates favors word candidates containing a sequence of one or more alphabetic characters corresponding to the filtering sequence of key-press signals, where a candidate word is considered to contain a character sequence corresponding to the filtering sequence if each sequential character in the character sequence corresponds to one of the letters represented by its corresponding sequential key-press signal; and
  
  each key press signal of the filtering sequence has a time period associated with it that starts after the previous key press signal in the sequence, if any, and ends with or before the subsequent key press signal in the sequence, if any; and
  
  a received key-disambiguating utterance is associated with a given key press signal if it is received in the utterance duration associated with that key press; and
  
  providing a user perceivable output indicating the one or more words of the word candidate selected by the re-performing of speech recognition as most probably corresponding to said word utterance.
- Dependent claims (12, 13):
  - 12. A method as in claim 11 wherein the method is performed by software running on a telephone and the keys are keys of a telephone keypad.
  - 13. A method as in claim 12 wherein the telephone is a cell phone.

14. A computerized method of inputting a sequence of one or more alphabetic characters into a computing system comprising performing the following for each character in the sequence:
- receiving a given key-press signal indicating which of a plurality of keys has been selected by a user, where;
  
  the given key press signal is ambiguous in that it indicates that one of two or more letters associated with it has been selected; and
  
  the key press signal has a time period associated with it that starts after the previous key press signal in the input sequence, if any, and ends with or before the subsequent key press signal in the input sequence, if any;
  
  responding to receipt of an acoustic representation of an utterance during the time period associated with the given key press signal by;
  
  associating the utterance with the key press signal; and
  
  performing speech recognition upon the acoustic representation to select a best scoring word for the utterance, with the recognition favoring recognition of a letter identifying word that identifies one of the letters associated with the given key press signal; and
  
  responding to the selection of a letter identifying word as the best scoring word by treating the letter identified by said best scoring word as the letter input by the user in association with the associated key press signal.
- Dependent claims (15, 16, 17, 18, 19, 20, 21):
  - 15. A method as in claim 14 wherein:
    - a given key press signal is generated by pressing a selected key; and
      
      the pressing of said key turns on said speech recognition.
  - 16. A method as in claim 14 further including:
    - responding to a key press signal by displaying in user-perceivable form a set of letter identifying words for identifying each letter represented by the pressed key; and
      
      favoring the recognition of an utterance made after the display of the pressed key's associated letter identifying words as corresponding to one of said displayed words.
  - 17. A method as in claim 16 wherein:
    - the method is performed on a telephone having a display;
      
      the outputting of the subset of letter identifying words is performed by displaying such words on the telephone's display; and
      
      the keys used to generate said key press signals are phone keys.
  - 18. A method as in claim 14 wherein:
    - the method is used in conjunction with a large vocabulary speech recognition system; and
      
      a majority of the words which starts with a given letter in the vocabulary of the large vocabulary recognition system function as letter identifying words for the given letter.
  - 19. A method as in claim 14 wherein:
    - only two to five letter identifying words which start with a given letter function as letter identifying words for said given letter; and
      
      the recognition of an utterance associated with a given key press signal favors the recognition of a one of the letter identifying words identifying of the two or more letters represented by the utterance's associated key press signal.
  - 20. A method as in claim 14 further including:
    - responding to a key press signal by displaying in user-perceivable form a set of letter identifying words containing one or more words starting with each letter represented by the pressed key; and
      
      favoring the recognition of an utterance made after the display of the pressed key's associated letter identifying words as corresponding to one of said displayed words.
  - 21. A method as in claim 20 wherein:
    - the method is performed on a telephone having a display; and
      
      the outputting of the subset of letter identifying words is performed by displaying such words on the telephone's display.

22. A computerized method of performing alphabetic input using user selectable keys:
- receiving a sequence of one or more key-press signals, each of which indicates which of a plurality of keys has been selected by a user for its position in the sequence, where each of the keys represents a plurality of letters;
  
  responding to a given key-press signal by displaying in user-perceivable form a separate letter identifying word for each of said the plurality of letters represented by the given key-press's key;
  
  receiving an acoustic representation of a key-disambiguating utterance made in association with the given key-press signal;
  
  performing speech recognition upon the acoustic representation of the key-disambiguation utterance, which recognition favors recognition of one of said letter identifying displayed in association with the given key-press signal;
  
  responding to a recognition of the given key-press signal's associated key-disambiguation utterance as a given letter identifying word by increasing the probability the letter output in association with the given key-press signal will be that associated with the recognized letter identifying word; and
  
  outputting a sequence of one or more alphabetic characters corresponding to the sequence of key-press signals, in which each character in the sequence corresponds to one of the set of letters represented by a corresponding key-press signal in said sequence of key-press signals, as affected by changes in probability caused by said recognition of said key-disambiguation utterance.
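
Claim 22 speaks of increasing a letter's probability rather than fixing it outright. One possible reading is sketched below, with heavy assumptions: each key starts with a uniform distribution over its letters, a recognized letter identifying word multiplies its letter's weight by an arbitrary `boost` factor, and the most probable letter is output. The NATO words, the boost value, and all function names are hypothetical.

```python
import string

KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
NATO = ("alpha bravo charlie delta echo foxtrot golf hotel india juliett "
        "kilo lima mike november oscar papa quebec romeo sierra tango "
        "uniform victor whiskey xray yankee zulu").split()
LETTER_WORDS = dict(zip(string.ascii_lowercase, NATO))  # assumed word set

def letter_distribution(key, recognized_word=None, boost=5.0):
    """Probability of each letter on `key`, boosted by a recognized word."""
    weights = {letter: 1.0 for letter in KEYPAD[key]}
    for letter in weights:
        if recognized_word is not None and LETTER_WORDS[letter] == recognized_word:
            weights[letter] *= boost  # arbitrary illustrative factor
    total = sum(weights.values())
    return {letter: w / total for letter, w in weights.items()}

def output_letters(presses):
    """presses: list of (key, recognized_word_or_None); returns the string of
    most probable letters, one per key press."""
    out = []
    for key, word in presses:
        dist = letter_distribution(key, word)
        out.append(max(dist, key=dist.get))
    return "".join(out)

print(output_letters([("2", "charlie"), ("2", "alpha"), ("8", "tango")]))  # cat
```

Because the recognized word only shifts probability, a downstream component (for example a language model) could still override a letter; the sketch omits that and simply takes the most probable letter per key.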

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Voice Signal Technologies Incorporated (Microsoft Corporation)
Inventors: Roth, Daniel L.; Johnston, David F.; Cohen, Jordan R.
Primary Examiner(s): Hudspeth; David R
Assistant Examiner(s): ALBERTALLI, BRIAN LOUIS
Application Number: US10/950,090
Publication Number: US 20050043947A1
Time in Patent Office: 1,677 Days
Field of Search: None
US Class Current: 704/270
CPC Class Codes:
G10L 15/19 Grammatical context, e.g. d...
G10L 15/22 Procedures used during a sp...
