Expanding an effective vocabulary of a speech recognition system
Abstract
The invention provides techniques for creating and using fragmented word models to increase the effective size of an active vocabulary of a speech recognition system. The active vocabulary represents all words and word fragments that the speech recognition system is able to recognize. Each word may be represented by a combination of acoustic models. As such, the active vocabulary represents the combinations of acoustic models that the speech recognition system may compare to a user's speech to identify acoustic models that best match the user's speech. The effective size of the active vocabulary may be increased by dividing words into constituent components or fragments (for example, prefixes, suffixes, separators, infixes, and roots) and including each component as a separate entry in the active vocabulary. Thus, for example, a list of words and their plural forms (for example, “book, books, cook, cooks, hook, hooks, look and looks”) may be represented in the active vocabulary using the words (for example, “book, cook, hook and look”) and an entry representing the suffix that makes the words plural (for example, “+s”, where the “+” preceding the “s” indicates that “+s” is a suffix). For a large list of words, and ignoring the entry associated with the suffix, this technique may considerably reduce the number of vocabulary entries needed to represent the list of words; in the example above, eight entries shrink to four.
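The vocabulary reduction described in the abstract can be illustrated with a minimal sketch. The function name `fragment_vocabulary` and its handling of a single plural suffix are assumptions made for illustration only; the sketch does not reflect how the patented system builds its active vocabulary.

```python
def fragment_vocabulary(words, suffix="s"):
    """Replace suffixed forms with their base words plus one suffix entry.

    Hypothetical helper: a word is dropped when stripping the suffix
    yields another word in the list (e.g. "books" -> "book" + "+s").
    """
    word_set = set(words)
    entries = set()
    suffix_used = False
    for word in word_set:
        base = word[: -len(suffix)]
        if word.endswith(suffix) and base in word_set:
            suffix_used = True  # covered by the base word plus the suffix entry
        else:
            entries.add(word)
    if suffix_used:
        entries.add("+" + suffix)  # "+" marks the entry as a suffix
    return sorted(entries)


words = ["book", "books", "cook", "cooks", "hook", "hooks", "look", "looks"]
active = fragment_vocabulary(words)
print(len(words), "->", len(active), active)
```

For the eight-word list from the abstract, the sketch keeps the four base words and adds the single entry "+s".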
43 Claims
1. A method of expanding an effective active vocabulary of a speech recognition system, the method comprising:
- receiving an input user utterance;
- using a speech recognizer to perform speech recognition on said user utterance to produce one or more recognition candidates, the speech recognition comprising comparing digital values representative of the user utterance to a set of acoustic models representative of an active vocabulary of the system, the set of acoustic models including models of words and models of word fragments;
- receiving the recognition candidates from the speech recognizer; and
- when a received recognition candidate includes a word fragment:
  - determining whether the word fragment may be combined with one or more adjacent word fragments or words to form a proposed word included in a backup dictionary of the speech recognition system, wherein forming the proposed word includes using a spelling rule associated with the word fragment that causes the spelling of the proposed word to differ from a spelling that would result from merely concatenating the particular word fragment with the one or more adjacent word fragments or words;
  - if the word fragment may be combined with one or more adjacent word fragments or words to form a proposed word included in a backup dictionary of the speech recognition system, modifying the recognition candidate to substitute the proposed word for the word fragment and the one or more adjacent word fragments or words used to form the proposed word; and
  - if the word fragment may not be combined with one or more adjacent word fragments or words to form a proposed word included in a backup dictionary of the speech recognition system, discarding the recognition candidate.
Dependent claims: 2-32
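The post-processing recited in claim 1 (combine a fragment with an adjacent word using a spelling rule, keep the result only if it appears in the backup dictionary, otherwise discard the candidate) can be sketched as follows. Everything here is an illustrative assumption: the names `apply_suffix` and `resolve_candidate`, the two toy spelling rules, and the restriction to suffix fragments that attach to the preceding word.

```python
VOWELS = "aeiou"


def apply_suffix(base, suffix):
    """Join a base word and a "+"-marked suffix using toy spelling rules."""
    ending = suffix.lstrip("+")
    if (ending == "s" and len(base) >= 2
            and base.endswith("y") and base[-2] not in VOWELS):
        return base[:-1] + "ies"  # "berry" + "+s" -> "berries", not "berrys"
    if ending == "ing" and base.endswith("e"):
        return base[:-1] + "ing"  # "make" + "+ing" -> "making", not "makeing"
    return base + ending  # plain concatenation when no rule applies


def resolve_candidate(tokens, backup_dictionary):
    """Return the candidate with fragments merged, or None to discard it."""
    result = []
    for token in tokens:
        if not token.startswith("+"):
            result.append(token)  # ordinary word, keep as is
            continue
        if not result:
            return None  # suffix with nothing to attach to
        proposed = apply_suffix(result[-1], token)
        if proposed not in backup_dictionary:
            return None  # no dictionary word can be formed: discard
        result[-1] = proposed  # substitute the proposed word
    return result


backup_dictionary = {"berries", "making", "books"}
candidates = [["pick", "berry", "+s"], ["pick", "very", "+s"]]
resolved = []
for candidate in candidates:
    merged = resolve_candidate(candidate, backup_dictionary)
    if merged is not None:
        resolved.append(merged)
print(resolved)
```

In this sketch the first candidate becomes "pick berries", while the second is discarded because "veries" is not in the assumed backup dictionary. A fuller treatment would also need prefixes, roots, and combinations spanning more than two tokens.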
33. A method of recognizing speech, the method comprising:
- receiving an input user utterance;
- using a speech recognizer to perform speech recognition on said user utterance to produce a set of one or more recognition candidates, the speech recognition comprising comparing digital values representative of the user utterance to a set of acoustic models representative of an active vocabulary of the system, the set of acoustic models including models of words, models of roots that are not words, and models of affixes that are not words, the affixes including prefixes and suffixes;
- receiving the recognition candidates from the speech recognizer; and
- when a received recognition candidate includes an affix:
  - combining the affix with one or more adjacent words, roots, or other affixes to form a new word, wherein forming the new word includes using a spelling rule associated with the affix that causes the spelling of the new word to differ from a spelling that would result from merely concatenating the affix with the one or more adjacent words, roots, or other affixes; and
  - modifying the recognition candidate to substitute the new word for the affix and the one or more adjacent words, roots, or other affixes used to form the new word.
34. A computer-implemented speech recognition system that uses an expanded effective active vocabulary, the system comprising:
- a storage device configured to store an active vocabulary that includes multiple entries corresponding to words, commands, and word fragments; and
- a processor configured to:
  - receive data representing a user utterance,
  - produce one or more recognition candidates, by comparing digital values representative of the user utterance to a set of acoustic models representative of the active vocabulary of the system, the set of acoustic models including models of words and models of word fragments,
  - when a produced recognition candidate includes a word fragment:
    - determine whether the word fragment may be combined with one or more adjacent word fragments or words to form a proposed word included in a backup dictionary of the speech recognition system, wherein forming the proposed word includes using a spelling rule associated with the word fragment that causes the spelling of the proposed word to differ from a spelling that would result from merely concatenating the particular word fragment with the one or more adjacent word fragments or words;
    - if the word fragment may be combined with one or more adjacent word fragments or words to form a proposed word included in a backup dictionary, modify the recognition candidate to substitute the proposed word for the word fragment and the one or more adjacent word fragments or words used to form the proposed word; and
    - if the word fragment may not be combined with one or more adjacent word fragments or words to form a proposed word included in a backup dictionary of the speech recognition system, discard the recognition candidate.
Dependent claims: 35-38
39. Computer software, residing on a computer readable medium, for a speech recognition system that uses an expanded effective active vocabulary to recognize words, and commands, the computer software comprising instructions for causing a computer to perform the following operations:
- receive data representing a user utterance,
- produce one or more recognition candidates, by comparing digital values representative of the user utterance to a set of acoustic models representative of an active vocabulary of the system, the set of acoustic models including models of words and models of word fragments,
- when a produced recognition candidate includes a word fragment:
  - determine whether the word fragment may be combined with one or more adjacent word fragments or words to form a proposed word included in a backup dictionary of the speech recognition system, wherein forming the proposed word includes using a spelling rule associated with the word fragment that causes the spelling of the proposed word to differ from a spelling that would result from merely concatenating the particular word fragment with the one or more adjacent word fragments or words;
  - if the word fragment may be combined with one or more adjacent word fragments or words to form a proposed word included in a backup dictionary, modify the recognition candidate to substitute the proposed word for the word fragment and the one or more adjacent word fragments or words used to form the proposed word; and
  - if the word fragment may not be combined with one or more adjacent word fragments or words to form a proposed word included in a backup dictionary of the speech recognition system, discard the recognition candidate.
Dependent claims: 40-43
Specification