Multi-stage large vocabulary speech recognition system and method

US 6,751,595 B2
Filed: 05/09/2001
Issued: 06/15/2004
Est. Priority Date: 05/09/2001
Status: Expired due to Term

Abstract

Multiple processing stages are provided with different vocabulary databases to improve processing time, efficiency, and accuracy in speech recognition. The entire vocabulary is divided into smaller vocabulary subsets, which are associated with particular keywords. A small vocabulary subset is generated or retrieved based on certain information, such as a calling party's locality. A user is prompted to provide input information, such as the locality in which a business whose phone number is requested is located, in the form of a spoken utterance to the system. If the utterance matches one of the entries in the initial small vocabulary subset, then the utterance is considered to be recognizable. If the utterance is not recognizable when compared to the initial small vocabulary subset, then the utterance is stored for later use. The user is then prompted for a keyword related to another subset of words in which his initial utterance may be found. A vocabulary subset associated with the received keyword is generated or retrieved. The initial stored utterance is then retrieved and compared to the newly loaded vocabulary subset. If the utterance matches one of the entries in the newly loaded vocabulary subset, then the utterance is recognizable. Otherwise, it is determined that the initial utterance was unrecognizable, and the user is prompted to repeat the initial utterance.
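
The staged flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `recognize`, `exact_match`, and `ask_keyword` are hypothetical, the place names are invented sample data, and plain string equality stands in for acoustic matching of an utterance against orthographies. Following the wording of claim 1, the keyword input is compared to the first set, and the matched entry selects the second set.

```python
from typing import Callable, Dict, Iterable, List, Optional


def exact_match(utterance: str, vocabulary: Iterable[str]) -> Optional[str]:
    """Stand-in matcher (assumption): case-insensitive equality, not acoustic scoring."""
    target = utterance.strip().lower()
    for word in vocabulary:
        if word.lower() == target:
            return word
    return None


def recognize(
    utterance: str,
    first_set: List[str],
    second_sets: Dict[str, List[str]],
    ask_keyword: Callable[[], str],
) -> Optional[str]:
    """Return the recognized word, or None if the caller should re-prompt the user."""
    # Stage 1: compare the utterance to the small, high-probability subset.
    word = exact_match(utterance, first_set)
    if word is not None:
        return word

    # No match: keep the utterance for later use and ask for a keyword.
    stored_utterance = utterance
    keyword_input = ask_keyword()

    # The keyword is itself recognized against the first set and
    # designates which second subset to load.
    keyword = exact_match(keyword_input, first_set)
    if keyword is None:
        return None
    second_set = second_sets.get(keyword)
    if second_set is None:
        return None

    # Stage 2: compare the stored utterance to the newly loaded subset.
    return exact_match(stored_utterance, second_set)


# Hypothetical sample data: the first set holds likely localities,
# and each second set is keyed by one entry of the first set.
first = ["Atlanta", "Savannah"]
second = {"Atlanta": ["Decatur", "Marietta"], "Savannah": ["Pooler"]}

result = recognize("decatur", first, second, ask_keyword=lambda: "Atlanta")
```

In this sketch a `None` result corresponds to the final branch of the abstract, where the initial utterance is treated as unrecognizable and the user is asked to repeat it.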

16 Claims

1. A method of speech recognition comprising:
- receiving a spoken utterance by a user;
  
  providing a vocabulary word storage memory comprising a first set of orthographies potentially recognizable as the spoken utterance, wherein the first set of orthographies comprises a first subset of an entire vocabulary comprising orthographies predetermined to have a high probability of matching the spoken utterance;
  
  providing a spoken utterance storage memory for storing the spoken utterance;
  
  processing the spoken utterance to determine whether one of the orthographies stored in the first set corresponds to the spoken utterance, thereby indicating a recognized word;
  
  prompting the user for a keyword input if none of the orthographies in the first set corresponds to the spoken utterance, wherein the keyword input designates one of a plurality of second sets of orthographies potentially recognizable as the spoken utterance, wherein each second set comprises a predetermined subset of the entire vocabulary;
  
  receiving the keyword input;
  
  comparing the keyword input to the first set of orthographies to determine the second set of orthographies; and
  
  determining whether one of the orthographies stored in the second set corresponds to the spoken utterance, thereby indicating a recognized word.
  - 2. The method according to claim 1, further comprising prompting the user for another spoken utterance if none of the orthographies in the plurality of second sets corresponds to the spoken utterance.
  - 3. The method according to claim 1, further comprising providing a plurality of subsets of orthographies potentially recognizable as the spoken utterance to the vocabulary word storage memory.
  - 4. The method according to claim 1, further comprising prompting the user for another spoken utterance if none of the orthographies in the determined second set corresponds to the spoken utterance.
  - 5. The method according to claim 1, further comprising:
  - 6. The method according to claim 5, wherein the orthography corresponding to the highest score and having a score at least equal to a predetermined threshold is determined to be the recognized word.
  - 7. The method according to claim 6, further comprising providing the recognized word to another program as input.
  - 8. The method according to claim 1, further comprising determining the first set of orthographies potentially recognizable as the spoken utterance based on a calling locality of the user.

9. A speech recognition system comprising:
- an input for receiving a spoken utterance by a user;
  
  a vocabulary word storage memory comprising a first set of orthographies and a second set of orthographies, wherein the first set of orthographies is predetermined to have a high probability of matching the spoken utterance and the plurality of second sets of orthographies each comprise a predetermined subset of an entire vocabulary, which is potentially recognizable as the spoken utterance;
  
  a spoken utterance storage memory for storing the spoken utterance;
  
  a processor in operative relationship with the vocabulary word storage memory and the spoken utterance storage memory for processing the spoken utterance to determine whether one of the orthographies stored in the first set corresponds to the spoken utterance, thereby indicating a recognized word; and
  
  a prompt generator for prompting the user for a keyword input, wherein the keyword input designates one of the plurality of second sets that contains the spoken utterance, if none of the orthographies in the first set corresponds to the spoken utterance;
  
  wherein the processor is adapted to compare the keyword input to the first set to determine one of the second sets of orthographies; and
  
  wherein the processor is further adapted to determine whether one of the orthographies stored in the second set corresponds to the spoken utterance, thereby indicating a recognized word.
  - 10. The system according to claim 9, wherein the prompt generator is adapted to prompt the user for another spoken utterance if none of the orthographies in the second set of orthographies corresponds to the spoken utterance.
  - 11. The system according to claim 9, wherein the processor is adapted to compare the spoken utterance to each orthography in the first or second set to generate a score, determine the orthography providing the highest score, and determine if the highest score is at least equal to a predetermined threshold to determine whether one of the orthographies stored in the first or second set corresponds to the spoken utterance.
  - 12. The system according to claim 11, wherein the orthography corresponding to the highest score and having a score at least equal to a predetermined threshold is determined to be the recognized word.
  - 13. The system according to claim 9, wherein the processor is adapted to provide the recognized word as an input to a program.
  - 14. The system according to claim 9, wherein the first set of orthographies potentially recognizable as the spoken utterance is determined based on a calling locality of the user.
  - 15. The system according to claim 9, wherein the prompt generator is adapted to prompt the user for another spoken utterance if none of the orthographies in the second set of orthographies corresponds to the spoken utterance.
  - 16. The system according to claim 9, wherein the vocabulary word storage memory comprises a plurality of predetermined subsets of orthographies potentially recognizable as the spoken utterance.
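
The scoring step recited in claims 11 and 12 (score each orthography, take the highest score, accept it only if it reaches a predetermined threshold) might look something like the sketch below. The claims do not say how the score is computed, so this example makes a loud assumption: it uses text similarity from Python's standard `difflib.SequenceMatcher` as a stand-in for an acoustic score. The function name `best_match` and the threshold value `0.8` are likewise hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def best_match(
    utterance: str,
    orthographies: Iterable[str],
    threshold: float = 0.8,  # assumed value; the claims only say "predetermined"
) -> Tuple[Optional[str], float]:
    """Return (recognized word or None, highest score found)."""
    best_word: Optional[str] = None
    best_score = 0.0
    for word in orthographies:
        # Assumption: string similarity in [0, 1] replaces the acoustic score.
        score = SequenceMatcher(None, utterance.lower(), word.lower()).ratio()
        if score > best_score:
            best_word, best_score = word, score
    # Accept the top candidate only if its score is at least the threshold.
    if best_word is not None and best_score >= threshold:
        return best_word, best_score
    return None, best_score


word, score = best_match("marietta", ["Decatur", "Marietta"])
```

Returning the highest score even on rejection lets a caller distinguish a near miss from a complete mismatch, which is a design choice of this sketch rather than something the claims require.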

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Bellsouth Intellectual Property Corporation (AT&T, Inc.)
Inventors: Busayapongchai, Senis; Chintrakulchai, Pichet
Primary Examiner(s): MCFADDEN, SUSAN IRIS
Application Number: US09/852,180
Publication Number: US 20020169600A1
Time in Patent Office: 1,133 Days
Field of Search: 704/251, 704/275, 704/270, 379/88.03
US Class Current: 704/275
CPC Class Codes:
G10L 15/22 Procedures used during a sp...
G10L 2015/228 of application context
