Using a spoken utterance for disambiguation of spelling inputs into a speech recognition system
Abstract
A method of verifying a speech input can include determining pronunciation data for a received user spoken utterance specifying a word and speech recognizing further user spoken utterances specifying individual characters of the word. An N-best list can be generated for each received user spoken utterance specifying a character. A grammar can be automatically generated that includes word candidates using the N-best list for each user spoken utterance specifying a character. Pronunciation data for each word in the grammar can be generated automatically. The pronunciation data from the user spoken utterance specifying the word can be compared with pronunciation data for the word candidates of the grammar to determine at least one match.
3 Claims
1. A computer-implemented method of verifying a speech input comprising:
determining pronunciation data for a received user spoken utterance specifying a word;
speech recognizing a plurality of further user spoken utterances specifying characters of the word, wherein each further user spoken utterance specifies one individual character of the word;
generating an N-best list, comprising N-best character matches, for each received further user spoken utterance specifying a character of the word, wherein each N-best list is associated with one of the user spoken utterances specifying a character and a number of the N-best lists generated corresponds to a number of characters in the word;
automatically generating a grammar comprising word candidates using the N-best list for each character, wherein each word of the grammar is formed using one letter selected from each N-best list existing at a same level of each respective N-best list and in an order corresponding to an order in which each user spoken utterance specifying a character, and being associated with an N-best list, is received;
automatically generating pronunciation data for each word in the grammar;
comparing the pronunciation data from the user spoken utterance specifying the word with the pronunciation data for the word candidates of the grammar to determine at least one match; and
storing the match in memory as a recognition result for the word.
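The grammar-building and comparison steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `TOY_PHONES`, `toy_pronunciation`, `build_grammar`, and `best_match` are hypothetical, the one-letter-to-one-phoneme table stands in for a real pronunciation generator, and the similarity score from `difflib.SequenceMatcher` is an assumed comparison measure that the claim does not specify.

```python
from difflib import SequenceMatcher

# Assumption: a toy letter-to-phoneme table standing in for a real
# pronunciation generator; the claim does not say how one is built.
TOY_PHONES = {"a": "AE", "b": "B", "d": "D", "e": "EH", "k": "K", "t": "T"}


def toy_pronunciation(word):
    """Return a phoneme list for a candidate word (illustrative only)."""
    return [TOY_PHONES[ch] for ch in word]


def build_grammar(nbest_lists):
    """Form one word candidate per N-best level.

    The k-th candidate joins the k-th entry of every list, in the order
    the character utterances were received. zip() stops at the shortest
    list, so lists of unequal length are truncated.
    """
    return ["".join(level) for level in zip(*nbest_lists)]


def best_match(spoken_phones, candidates):
    """Pick the candidate whose pronunciation is most similar to the
    pronunciation data of the spoken word (assumed similarity measure)."""
    return max(
        candidates,
        key=lambda w: SequenceMatcher(
            None, spoken_phones, toy_pronunciation(w)
        ).ratio(),
    )


# One N-best list per spelled character, best hypothesis first.
nbest = [["d", "b", "e"], ["a", "a", "k"], ["t", "t", "d"]]
grammar = build_grammar(nbest)             # ["dat", "bat", "ekd"]
result = best_match(["B", "AE", "T"], grammar)
print(result)                              # prints "bat"
```

In this example the top-ranked letters spell "dat", but the pronunciation of the spoken word agrees with the second-level candidate "bat", which is the disambiguation the claim describes.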
2. A computer-implemented method of processing a speech input comprising:
selecting a domain of words;
determining pronunciation data for a word specified by a received user spoken utterance;
comparing the pronunciation data for the word with a list of common words of the domain to find a match;
when a match is found, discontinuing further speech processing;
when a match is not found,
  speech recognizing a plurality of further user spoken utterances specifying characters of the word, wherein each of the plurality of further user spoken utterances specifies one individual character of the word for comparison to the pronunciation data,
  generating an N-best list comprising N-best character matches for each received user spoken utterance specifying a character and a number of N-best lists corresponding to a number of characters in the word are generated, wherein each N-best list is associated with one of the user spoken utterances specifying a character,
  automatically generating a grammar comprising word candidates using the N-best list for each character, wherein each word of the grammar is formed using one letter selected from each N-best list existing at a same level of each respective N-best list and in an order corresponding to an order in which each user spoken utterance specifying a character, and being associated with an N-best list, is received,
  automatically generating pronunciation data for each word in the grammar, and
  comparing the pronunciation data from the user spoken utterance specifying the word with the pronunciation data for the word candidates of the grammar to determine at least one match; and
storing the match in memory as a recognition result for the word.
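The control flow of claim 2 (try the domain's common words first, and fall back to spelling only when that fails) can be sketched as follows. This is a rough illustration under stated assumptions: `recognize_word`, `domain_lexicon`, and `spell_fallback` are hypothetical names, the lexicon is assumed to map each common word to a phoneme tuple, and exact equality is used as the match test although the claim does not define what counts as a match.

```python
def recognize_word(spoken_phones, domain_lexicon, spell_fallback):
    """Return a recognition result for a spoken word.

    spoken_phones:  pronunciation data for the spoken word (phoneme list).
    domain_lexicon: assumed dict mapping common domain words to phonemes.
    spell_fallback: hypothetical callable that runs the spelling-based
                    path (N-best lists, grammar, pronunciation match).
    """
    target = tuple(spoken_phones)
    for word, phones in domain_lexicon.items():
        if tuple(phones) == target:
            # Match found: discontinue further speech processing.
            return word
    # No match among the common words: ask the user to spell the word.
    return spell_fallback(spoken_phones)


# Usage with a tiny assumed lexicon for a "cities" domain.
cities = {"rome": ("R", "OW", "M"), "bern": ("B", "ER", "N")}
calls = []


def fake_spelling_path(phones):
    calls.append(phones)       # record that the fallback was used
    return "spelled-result"    # placeholder result for illustration


print(recognize_word(["R", "OW", "M"], cities, fake_spelling_path))  # prints "rome"
print(recognize_word(["B", "AE", "T"], cities, fake_spelling_path))  # prints "spelled-result"
```

The early return keeps the spelling dialogue out of the common case, so the user is only asked to spell words that the domain list does not cover.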
3. A machine readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform the steps of:
determining pronunciation data for a received user spoken utterance specifying a word;
speech recognizing a plurality of further user spoken utterances specifying characters of the word, wherein each further user spoken utterance specifies one individual character of the word;
generating an N-best list comprising N-best character matches for each received further user spoken utterance specifying a character of the word, wherein each N-best list is associated with one of the user spoken utterances specifying a character and a number of the N-best lists generated corresponds to a number of characters in the word;
automatically generating a grammar comprising word candidates using the N-best list for each character, wherein each word of the grammar is formed using one letter selected from each N-best list existing at a same level of each respective N-best list and in an order corresponding to an order in which each user spoken utterance specifying a character, and being associated with an N-best list, is received;
automatically generating pronunciation data for each word in the grammar; and
comparing the pronunciation data from the user spoken utterance specifying the word with the pronunciation data for the word candidates of the grammar to determine at least one match.
Specification