Multi-stage large vocabulary speech recognition system and method
Abstract
Multiple processing stages are provided with different vocabulary databases to improve processing time, efficiency, and accuracy in speech recognition. The entire vocabulary is divided into smaller vocabulary subsets, each associated with a particular keyword. An initial small vocabulary subset is generated or retrieved based on certain information, such as the calling party's locality. The user is prompted to provide input information, such as the locality of a business whose phone number is requested, in the form of a spoken utterance to the system. If the utterance matches one of the entries in the initial small vocabulary subset, the utterance is considered recognizable. If it does not match any entry in the initial subset, the utterance is stored for later use, and the user is prompted for a keyword related to another subset of words in which the initial utterance may be found. The vocabulary subset associated with the received keyword is then generated or retrieved, and the stored utterance is retrieved and compared to this newly loaded subset. If the utterance matches one of the entries in the newly loaded subset, it is recognizable. Otherwise, the initial utterance is determined to be unrecognizable, and the user is prompted to repeat it.
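The two-stage flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: utterances are plain text strings, matching is case-insensitive string equality (a real recognizer would score acoustic models against each orthography), the business listings are a hypothetical name-to-locality dictionary, and the keyword-to-subset mapping is a plain dictionary. The names `find`, `initial_subset`, `recognize`, and `ask_keyword` are illustrative only.

```python
from typing import Callable, Dict, List, Optional


def find(utterance: str, vocabulary: List[str]) -> Optional[str]:
    # Assumption: case-insensitive text equality stands in for acoustic matching.
    for word in vocabulary:
        if word.lower() == utterance.lower():
            return word
    return None


def initial_subset(calling_locality: str, listings: Dict[str, str]) -> List[str]:
    # Small first-stage vocabulary: listings located in the caller's own locality.
    # `listings` maps a business name to its locality (hypothetical layout).
    return [name for name, locality in listings.items() if locality == calling_locality]


def recognize(utterance: str,
              calling_locality: str,
              listings: Dict[str, str],
              keyword_subsets: Dict[str, List[str]],
              ask_keyword: Callable[[], str]) -> Optional[str]:
    # Stage 1: compare against the small, high-probability subset.
    word = find(utterance, initial_subset(calling_locality, listings))
    if word is not None:
        return word
    # Not recognized: keep the utterance instead of asking the user to repeat it.
    stored_utterance = utterance
    # Stage 2: the keyword selects another predetermined subset of the vocabulary.
    subset = keyword_subsets.get(ask_keyword(), [])
    # Re-examine the stored utterance; None means it is unrecognizable.
    return find(stored_utterance, subset)
```

A return value of `None` corresponds to the unrecognizable outcome, after which the caller would prompt the user to repeat the initial utterance. Note that `ask_keyword` is only invoked when the first stage fails.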
16 Claims
1. A method of speech recognition comprising:
receiving a spoken utterance by a user;
providing a vocabulary word storage memory comprising a first set of orthographies potentially recognizable as the spoken utterance, wherein the first set of orthographies comprises a first subset of an entire vocabulary comprising orthographies predetermined to have a high probability of matching the spoken utterance;
providing a spoken utterance storage memory for storing the spoken utterance;
processing the spoken utterance to determine whether one of the orthographies stored in the first set corresponds to the spoken utterance, thereby indicating a recognized word;
prompting the user for a keyword input if none of the orthographies in the first set corresponds to the spoken utterance, wherein the keyword input designates one of a plurality of second sets of orthographies potentially recognizable as the spoken utterance, wherein each second set comprises a predetermined subset of the entire vocabulary;
receiving the keyword input;
comparing the keyword input to the first set of orthographies to determine the second set of orthographies; and
determining whether one of the orthographies stored in the second set corresponds to the spoken utterance, thereby indicating a recognized word.
comparing the spoken utterance to each orthography in the first set or one of the plurality of second sets to generate a score;
determining the orthography having the highest score; and
determining if the highest score is at least equal to a predetermined threshold to determine whether one of the orthographies stored in the first set or in one of the plurality of second sets corresponds to the spoken utterance.
6. The method according to claim 5, wherein the orthography corresponding to the highest score and having a score at least equal to a predetermined threshold is determined to be the recognized word.
7. The method according to claim 6, further comprising providing the recognized word to another program as input.
8. The method according to claim 1, further comprising determining the first set of orthographies potentially recognizable as the spoken utterance based on a calling locality of the user.
9. A speech recognition system comprising:
an input for receiving a spoken utterance by a user;
a vocabulary word storage memory comprising a first set of orthographies and a second set of orthographies, wherein the first set of orthographies is predetermined to have a high probability of matching the spoken utterance and the plurality of second sets of orthographies each comprise a predetermined subset of an entire vocabulary, which is potentially recognizable as the spoken utterance;
a spoken utterance storage memory for storing the spoken utterance;
a processor in operative relationship with the vocabulary word storage memory and the spoken utterance storage memory for processing the spoken utterance to determine whether one of the orthographies stored in the first set corresponds to the spoken utterance, thereby indicating a recognized word; and
a prompt generator for prompting the user for a keyword input, wherein the keyword input designates one of the plurality of second sets that contains the spoken utterance, if none of the orthographies in the first set corresponds to the spoken utterance;
wherein the processor is adapted to compare the keyword input to the first set to determine one of the second sets of orthographies; and
wherein the processor is further adapted to determine whether one of the orthographies stored in the second set corresponds to the spoken utterance, thereby indicating a recognized word.
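The claimed system (claim 9), combined with the score-and-threshold test of the dependent claims (see claim 6), can be sketched as follows. This is an illustration under stated assumptions, not the patentee's implementation: text strings stand in for spoken utterances, `difflib.SequenceMatcher.ratio()` substitutes for an acoustic match score, the 0.8 threshold is an arbitrary placeholder, and all class and function names are hypothetical. The claims say the keyword input is compared to the first set of orthographies to determine the second set; the sketch assumes one reading of that step, in which the first set also contains the keyword orthographies that name the second sets.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional


def score(spoken: str, orthography: str) -> float:
    # Hypothetical stand-in for an acoustic match score: text similarity in [0, 1].
    return SequenceMatcher(None, spoken.lower(), orthography.lower()).ratio()


def best_match(spoken: str, orthographies: List[str], threshold: float) -> Optional[str]:
    # Score every orthography, keep the highest, and accept it only if the
    # highest score is at least equal to the threshold.
    if not orthographies:
        return None
    best = max(orthographies, key=lambda word: score(spoken, word))
    return best if score(spoken, best) >= threshold else None


@dataclass
class VocabularyWordStorage:
    # Assumed reading: the first set also holds the keyword orthographies.
    first_set: List[str]
    second_sets: Dict[str, List[str]]  # keyword -> predetermined vocabulary subset


@dataclass
class SpeechRecognitionSystem:
    vocabulary: VocabularyWordStorage
    prompt_generator: Callable[[str], str]  # plays a prompt, returns the user's reply
    threshold: float = 0.8  # placeholder value, not taken from the patent
    utterance_storage: List[str] = field(default_factory=list)

    def recognize(self, utterance: str) -> Optional[str]:
        self.utterance_storage.append(utterance)  # spoken utterance storage memory
        word = best_match(utterance, self.vocabulary.first_set, self.threshold)
        if word is not None:
            return word
        reply = self.prompt_generator("Please say a category keyword.")
        # Compare the keyword input to the first set; the matched orthography
        # selects one of the second sets.
        keyword = best_match(reply, self.vocabulary.first_set, self.threshold)
        second_set = self.vocabulary.second_sets.get(keyword) if keyword else None
        if second_set is None:
            return None
        # Re-score the stored utterance against the selected second set.
        return best_match(self.utterance_storage[-1], second_set, self.threshold)
```

Because the utterance is retained in `utterance_storage`, the second pass reuses it and the user is prompted only for the keyword. Passing the returned word to another program as input (claim 7) is left to the caller.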
Specification