System and method for speech-to-text conversion using constrained dictation in a speak-and-spell mode

US 7,676,364 B2
Filed: 03/21/2005
Issued: 03/09/2010
Est. Priority Date: 03/25/2004
Status: Active Grant


Abstract

A constrained dictation methodology using a speak-and-spell mode is disclosed for improving the accuracy of a speech recognition system in the specific task of speech-to-text (dictation-style speech) translation. The invention is well suited to modern “text-messaging” applications, in which the number of words being dictated is very small (limited by the 140-160 character message length constraint). Additionally, the invention constrains the way users interact with machines, thereby making the speech recognition task easier and improving system accuracy.
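
The following is a minimal sketch of why a spoken spelling can help a recognizer choose between similar-sounding words. It is an illustration only, not the method disclosed in the patent: the lexicon contents, the `pick_word` function, the score range, and the `spell_weight` parameter are all assumptions introduced here, and string similarity from Python's standard `difflib` stands in for whatever matching a real system would use.

```python
from difflib import SequenceMatcher

# Hypothetical constrained lexicon: a small vocabulary for short messages.
LEXICON = {"meet", "meat", "mete", "at", "ate", "eight", "noon"}


def pick_word(word_hyps, spelled_letters, lexicon=LEXICON, spell_weight=0.7):
    """Choose a lexicon word using both the spoken word and its spoken spelling.

    word_hyps: list of (word, score) pairs with score in [0, 1]; assumed to
        come from a word-level recognizer.
    spelled_letters: string of recognized letters, possibly containing errors.
    Returns the best-scoring lexicon word, or None if no hypothesis is in
    the lexicon.
    """
    best_word, best_score = None, -1.0
    for word, acoustic_score in word_hyps:
        if word not in lexicon:
            continue  # the constrained lexicon filters out-of-vocabulary words
        # Similarity between the candidate's spelling and the recognized letters.
        spell_score = SequenceMatcher(None, word, spelled_letters).ratio()
        combined = (1 - spell_weight) * acoustic_score + spell_weight * spell_score
        if combined > best_score:
            best_word, best_score = word, combined
    return best_word


# Homophones "meat" and "meet" are close acoustically; the spelling decides.
hyps = [("meat", 0.50), ("meet", 0.45), ("mete", 0.05)]
print(pick_word(hyps, "meet"))
```

In this sketch the spelling is weighted more heavily than the word-level score, on the assumption that letter sequences are easier to recognize than open-vocabulary words; a different weighting would be equally valid for illustration.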

6 Claims

1. A system for converting speech to text comprising:

a) an interface requesting a user to speak a word and to speak a spelling of said word for each of a plurality of words spoken during a communication;

b) an audio receiving module for receiving said spoken word and spoken spelling of said word;

c) a signal processing module for extracting specific features of the spoken word and said spelled word;

d) a search module that uses said extracted specific features in conjunction with an at least one acoustic module;

e) at least one language module that matches a result presented by the search module; and

f) at least one constrained lexicon that takes the resulting match and using a system module outputting the desired text matching the spoken word.

2. The system of claim 1 wherein the at least one language module incorporates a network of phonemes followed by a network of alphabets.

3. A method for converting speech to text comprising:

a) requesting a user to speak a word and to speak a spelling of said word for each of a plurality of words spoken during a communication;

b) receiving said spoken word and spoken spelling of said word using an audio receiving module;

c) extracting specific features of the spoken word and said spelling of said spoken word using a signal processing module;

d) using said extracted specific features in conjunction with an at least acoustic module using a search module;

e) matching a result presented by the search module using an at least one language module; and

f) taking the results that match an at least one constrained lexicon outputting the desired text matching the spoken word.

4. The method of claim 3 wherein the at least one language module incorporates a network of phonemes followed by a network of alphabets.

5. A means for converting speech to text comprising:

a) means for requesting a user to speak a word and to speak a spelling of said word for each of a plurality of words spoken during a communication;

b) means for receiving said spoken word and spoken spelling of said word using an audio receiving module;

c) means for extracting specific features of the spoken word and said spelling of said spoken word using a signal processing module;

d) means for using said extracted specific features in conjunction with an at least one acoustic module using a search module;

e) means for matching a result presented by the search module using an at least one language module; and

f) means for taking the results that match an at least one constrained lexicon outputting the desired text matching the spoken word.

6. The means of claim 5 wherein the at least one language module incorporates a network of phonemes means followed by a network of alphabet means.
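
Dependent claims 2, 4, and 6 describe a language module built from a network of phonemes followed by a network of alphabets. One possible reading of that ordering, sketched below, is a small finite-state acceptor that allows one or more phoneme tokens and then one or more letter tokens. This is an interpretation for illustration, not the patent's implementation: the token format `(kind, symbol)`, the `"ph"` and `"lt"` labels, the phoneme symbols, and the function name are all hypothetical.

```python
def split_speak_and_spell(tokens):
    """Accept a token sequence of the form phoneme+ letter+.

    tokens: list of (kind, symbol) pairs, where kind is "ph" for a phoneme
        or "lt" for a spelled letter (a hypothetical decoder output format).
    Returns (phonemes, spelling) if the sequence fits the grammar,
    otherwise None.
    """
    state = "start"
    phonemes, letters = [], []
    for kind, symbol in tokens:
        if kind == "ph" and state in ("start", "phonemes"):
            # Still inside the phoneme network (the spoken word).
            state = "phonemes"
            phonemes.append(symbol)
        elif kind == "lt" and state in ("phonemes", "letters"):
            # Entered or continuing the letter network (the spoken spelling).
            state = "letters"
            letters.append(symbol)
        else:
            return None  # e.g. a letter first, or a phoneme after a letter
    if state != "letters":
        return None  # the utterance must end inside the letter network
    return phonemes, "".join(letters)


# The word "meet" spoken, then spelled "m", "e", "e", "t".
utterance = [("ph", "M"), ("ph", "IY"), ("ph", "T"),
             ("lt", "m"), ("lt", "e"), ("lt", "e"), ("lt", "t")]
print(split_speak_and_spell(utterance))
```

Requiring the phoneme portion to come first mirrors the word-then-spelling order recited in step a) of the independent claims.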

Current Assignee: Ashwin Rao
Original Assignee: Ashwin Rao
Inventors: Rao, Ashwin
Primary Examiner(s): Chawan; Vijay B
Application Number: US11/084,964
Publication Number: US 20050216272A1
Time in Patent Office: 1,814 Days
Field of Search: 704/251, 704/235, 704/270, 704/275, 704/231, 704/255, 704/E15.04, 704/E15.044, 379/88.14, 379/88.13
US Class Current: 704/235
CPC Class Codes:
G10L 15/08   Speech classification or se...
G10L 15/193   Formal grammars, e.g. finit...
G10L 2015/228   of application context
