Method and apparatus for continuous spelling speech recognition with early identification

US 5,995,928 A
Filed: 10/02/1996
Issued: 11/30/1999
Est. Priority Date: 10/02/1996
Status: Expired due to Term

Abstract

A speech recognition system capable of recognizing a word or a plurality of words based on a continuous spelling of the word(s) by a user. The system includes a speech recognition engine with a decoder running in forward mode such that the recognition engine continuously outputs an updated string of hypothesized letters based on the letters uttered by the user. The system further includes a spelling engine for comparing each string of hypothesized letters to a vocabulary list of words. The spelling engine returns a best match for the string of hypothesized letters. The system may also include an early identification unit for presenting the user with the best matching word(s) possibly before the user has completed spelling the desired word(s).

14 Claims

1. A speech recognition system for recognizing a word based on a continuous spoken spelling of the word before the word has been completely spoken and as each uttered letter of the spelling is received, the system comprising:
- a speech recognition engine for:
  - receiving acoustic input representing one or more continuously uttered letters of at least one word;
  - determining, based on the acoustic input, hypotheses for the one or more letters of the word as the letters are received; and
  - periodically, before the word has been completely spoken, outputting an updated string of hypothesized letters as the hypotheses are determined, the updated string representing a partial spelling of the word or words represented by the continuous spelling; and
- a spelling engine operably engaged with the speech recognition engine, the spelling engine having access to a vocabulary list and including a confusability matrix representing the confusability between each hypothesized letter and each letter of each word within the vocabulary list, wherein the spelling engine:
  - receives the periodically updated string of hypothesized letters representing the partial spelling as each letter is uttered; and
  - compares the string to the words in the vocabulary list to obtain one word from the vocabulary list that best matches the uttered letter.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The system of claim 1 wherein the spelling engine further comprises an analysis module for comparing the string of hypothesized letters to the words in the vocabulary list to produce a score value that represents a likelihood of a match between the string of hypothesized letters and a vocabulary word; and wherein the spelling engine matches the string to the one vocabulary word based on the score value.
  - 3. The system of claim 2 wherein the spelling engine further comprises a transition cost table, and wherein the spelling engine compares the string to the words in the vocabulary list to obtain a best match to a word in the vocabulary list by constructing a node grid in memory, wherein the node grid represents a comparison of the string of hypothesized letters to a vocabulary word based on the confusability matrix and the transition cost table.
  - 4. The system of claim 3 wherein the analysis module further comprises:
    - means, using a dynamic programming algorithm, for analyzing the node grid and for generating the score for the vocabulary word associated with that node grid, by (a) for each row of the node grid, computing a maximum score of each of a plurality of paths through the node grid, from a start node in said row to each node in the node grid that is associated with the last hypothesized letter, and (b) selecting one of said paths that has a highest score.
  - 5. The system of claim 3 wherein the spelling engine further comprises a scoring module for determining when a best match has been obtained between a string of letters and a vocabulary word based on the scores generated by the analysis module, based on predefined criteria that include a minimum threshold-value which a score must exceed to qualify as a best score and a minimum delta-value by which the best score must exceed a next highest score.
  - 6. The system of claim 5 further comprising an early identification component controlled by the spelling engine, the early recognition component capable of presenting the best match vocabulary word to the user without waiting for the user to complete the spelling of the desired word.
  - 7. The system of claim 6 wherein the early identification component comprises a database of stored speech corresponding to the words in the vocabulary list and a reply generator for presenting the best match vocabulary word from the database to the user.
  - 8. The system of claim 1 further comprising an early identification component controlled by the spelling engine, the early recognition component capable of presenting the best match vocabulary word to the user without waiting for the user to complete the spelling of the desired word.
  - 9. The system of claim 8 wherein the early identification component comprises a database of stored speech corresponding to the words in the vocabulary list and a reply generator for presenting the best match vocabulary word from the database to the user.
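The node-grid comparison in claims 3 and 4 can be illustrated with a small dynamic-programming sketch. This is not the patent's implementation: the confusable letter groups, the log-probability values, the transition costs, and the names `confusability` and `score_prefix` are all assumptions chosen for illustration. Rows correspond to hypothesized letters and columns to letters of one vocabulary word; because the spelling may be partial, the score is the best value over every node in the row for the last hypothesized letter.

```python
import math

# Hypothetical transition cost table (log domain, higher is better).
TRANSITION_COST = {"diag": 0.0, "insert": -3.0, "delete": -3.0}

# Hypothetical groups of acoustically confusable spoken letters.
CONFUSABLE_SETS = [set("bcdegptvz"), set("ajk"), set("mn"),
                   set("fsx"), set("iy"), set("qu")]


def confusability(hyp: str, ref: str) -> float:
    """Toy log-score that recognizer output `hyp` was really letter `ref`."""
    if hyp == ref:
        return math.log(0.8)
    if any(hyp in s and ref in s for s in CONFUSABLE_SETS):
        return math.log(0.1)
    return math.log(0.005)


def score_prefix(hyp: str, word: str) -> float:
    """Best path score aligning `hyp` against any prefix of `word`."""
    n, m = len(hyp), len(word)
    grid = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    grid[0][0] = 0.0
    # First row: vocabulary letters skipped before any hypothesis is consumed.
    for j in range(1, m + 1):
        grid[0][j] = grid[0][j - 1] + TRANSITION_COST["delete"]
    for i in range(1, n + 1):
        # First column: hypothesized letters treated as spurious insertions.
        grid[i][0] = grid[i - 1][0] + TRANSITION_COST["insert"]
        for j in range(1, m + 1):
            grid[i][j] = max(
                # Hypothesized letter aligned with a vocabulary letter.
                grid[i - 1][j - 1] + TRANSITION_COST["diag"]
                + confusability(hyp[i - 1], word[j - 1]),
                # Hypothesized letter is spurious.
                grid[i - 1][j] + TRANSITION_COST["insert"],
                # Vocabulary letter was missed by the recognizer.
                grid[i][j - 1] + TRANSITION_COST["delete"],
            )
    # Any node in the last row may end the path, since the word may be incomplete.
    return max(grid[n])
```

With these assumed values, `score_prefix("bos", "boston")` is higher than `score_prefix("bos", "dallas")`, so a ranking over the vocabulary would place "boston" first for the partial spelling "bos".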

10. A method for recognizing a desired word based on a continuous spoken spelling of that word by a user before the word has been completely spoken and as each uttered letter of the spelling is received, the method comprising the steps of:
- receiving one or more continuously uttered letters of the word from a user;
- processing the letters into a speech signal having a format that is compatible with a speech recognition engine as each of the letters is received;
- analyzing the speech signal using a speech recognition engine to determine hypotheses for the letters as they are received and to periodically output, before the word has been completely spoken, an updated string of hypothesized letters as hypotheses for the letters are determined, the updated string representing a partial spelling of the word represented by the uttered letters;
- comparing the updated strings of hypothesized letters representing partial spellings to a preselected vocabulary comprising a list of words using a spelling engine as the letters are received until a best match is obtained between a given string of hypothesized letters and a single vocabulary word.
- Dependent claims (11, 12, 13):
  - 11. The method of claim 10 wherein the spelling engine further comprises a confusability matrix and a transition cost table, and the step of comparing further comprises the steps of:
    - constructing a node grid between the string of hypothesized letters and a vocabulary word based on the confusability matrix and the transition cost table; and
    - calculating a best score for the node grid in which the best score relates to the likelihood of a match between the string of hypothesized letters and the vocabulary word associated with the node grid.
  - 12. The method of claim 11 wherein the spelling engine further comprises a scoring module having predefined criteria and the step of comparing further comprises the step of contrasting the best score with the predefined criteria such that if the best score satisfies the predefined criteria then a match is recognized by the spelling engine between the string of letters and the vocabulary word associated with the node grid.
  - 13. The method of claim 12 further comprising the step of interrupting the user to present the vocabulary word having the best score that satisfies the predefined criteria.
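The "predefined criteria" of claim 12 are spelled out in claim 5 as a minimum threshold value and a minimum delta value. A minimal sketch of that decision rule follows; the function name `pick_best_match`, the dictionary input, and the example numbers are hypothetical and are not taken from the patent.

```python
def pick_best_match(scores, min_score, min_delta):
    """Return the best-scoring word if it passes both criteria, else None.

    scores    -- dict mapping vocabulary word -> score (higher is better)
    min_score -- threshold the best score must exceed
    min_delta -- minimum margin between the best and next highest score
    """
    if not scores:
        return None
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_word, best = ranked[0]
    # With a single candidate there is no competitor, so the margin is infinite.
    runner_up = ranked[1][1] if len(ranked) > 1 else float("-inf")
    if best > min_score and best - runner_up >= min_delta:
        return best_word
    return None
```

For example, with `min_score=-5.0` and `min_delta=1.0`, the scores `{"boston": -0.7, "dallas": -8.3}` yield "boston", while `{"boston": -0.7, "bolton": -1.0}` yield no match because the two candidates are too close to separate.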

14. A computer-implemented method for automatically recognizing a word based on an electronic speech signal that represents a continuous spoken spelling of the word before the word has been completely spoken and as each uttered letter of the word is received, the method comprising the steps of:
- receiving and processing one or more continuously uttered letters of the word into the electronic speech signal;
- analyzing the electronic speech signal to determine hypotheses for one or more letters of the word as the speech signal and uttered letters are received and before the word has been completely spoken;
- periodically outputting, before the word has been completely spoken, an updated string of hypothesized letters as the hypotheses are determined, the updated string representing the portion of the word;
- comparing the updated string to a plurality of pre-selected vocabulary words as the letters are received until a best match is obtained between the updated string and one vocabulary word; and
- outputting the vocabulary word, before the word has been completely spoken, based on the best match.
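The control flow of claim 14 (periodically updated strings compared against the vocabulary until one word stands out) can be sketched as a loop that stops as soon as a match is accepted. Everything here is an assumption for illustration: `prefix_score` is a deliberately simple stand-in scorer rather than the confusability-based comparison the claims describe, and `recognize_early`, `min_score`, and `min_delta` are hypothetical names and values.

```python
def prefix_score(partial, word):
    """Stand-in scorer: matching positions, minus letters beyond the word's end."""
    matches = sum(1 for a, b in zip(partial, word) if a == b)
    overflow = max(0, len(partial) - len(word))
    return matches - overflow


def recognize_early(partial_strings, vocabulary, min_score=2, min_delta=1):
    """Consume updated partial spellings and stop at the first accepted match.

    Returns (word, partial) where `partial` is the string that triggered the
    decision, or (None, None) if no word ever satisfies the criteria.
    """
    if not vocabulary:
        return None, None
    for partial in partial_strings:
        ranked = sorted(((prefix_score(partial, w), w) for w in vocabulary),
                        reverse=True)
        best_score, best_word = ranked[0]
        next_score = ranked[1][0] if len(ranked) > 1 else float("-inf")
        # Accept only when the best score is high enough and clearly ahead.
        if best_score > min_score and best_score - next_score >= min_delta:
            return best_word, partial
    return None, None
```

Given the vocabulary `["boston", "bolton", "dallas"]` and the updates `"b"`, `"bo"`, `"bos"`, `"bost"`, this sketch returns `("boston", "bos")`: the third update is the first one that separates "boston" from "bolton", so the fourth is never examined.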

Current Assignee: SpeechWorks International, Inc. (Microsoft Corporation)
Original Assignee: SpeechWorks International, Inc. (Microsoft Corporation)
Inventors: Nguyen, John N.; Marx, Matthew T.
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Opsasnick, Michael N.
Application Number: US08/720,554
Time in Patent Office: 1,154 Days
Field of Search: 704/231, 704/235, 704/241, 704/242, 704/251, 704/254, 704/257, 704/276, 704/277, 704/270
US Class Current: 704/251
CPC Class Codes:
G10L 15/18   using natural language mode...
G10L 15/187   Phonemic context, e.g. pron...
G10L 2015/086   Recognition of spelled words
