System and method of using pre-enrolled speech sub-units for efficient speech synthesis

US 6,041,300 A
Filed: 03/21/1997
Issued: 03/21/2000
Est. Priority Date: 03/21/1997
Status: Expired due to Fees

Abstract

A speech recognition system is disclosed that is useful in, for example, hands-free voice telephone dialing applications. The system matches a spoken word (token) to one previously enrolled in the system. It then synthesizes or replays the recognized word so that the speaker can confirm that the recognized word is indeed the correct word before further action is taken. In the case of voice-activated dialing, this avoids wrong numbers. The token itself is not explicitly recorded; rather, only the lefemes may be recorded, from which the token can be reconstructed for playback. This greatly reduces the amount of disk space needed for the database and also makes it possible to reconstruct data in real time for synthesis use by a local name recognition machine.
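
The confirm-before-acting flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the function `confirm_and_dial`, the `directory` mapping, and the `play_back` and `user_confirms` callbacks are hypothetical names that do not appear in the patent, and the recognizer and synthesizer are replaced by simple stand-ins.

```python
from typing import Callable, Dict, Optional


def confirm_and_dial(
    recognized_name: str,
    directory: Dict[str, str],
    play_back: Callable[[str], None],
    user_confirms: Callable[[], bool],
) -> Optional[str]:
    """Replay the recognized name; return the number to dial only if confirmed."""
    if recognized_name not in directory:
        return None  # nothing enrolled under this name
    play_back(recognized_name)  # stand-in for synthesizing the recognized word
    if not user_confirms():
        return None  # speaker rejected the match, so no action is taken
    return directory[recognized_name]


# Hypothetical usage: the "playback" just records what would be spoken.
played = []
directory = {"alice": "555-0100"}
number_yes = confirm_and_dial("alice", directory, played.append, lambda: True)
number_no = confirm_and_dial("alice", directory, played.append, lambda: False)
```

In this sketch a rejected confirmation returns `None`, so the caller never dials, which mirrors the abstract's point about avoiding wrong numbers.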

76 Citations

14 Claims

1. A speech recognition system, comprising:
- speech receiving means for receiving a spoken token from a speaker;
- storage means for storing instances of a plurality of lefemes corresponding to a plurality of tokens, said lefemes comprising portions of phones in a given context;
- means for matching said spoken token to ones of said plurality of lefemes stored in said storage means; and
- means for concatenating said ones of said plurality of lefemes to synthesize a recognized token in the speaker's voice.

Dependent claims (2, 3):
- 2. A speech recognition system as recited in claim 1 further including means for taking an action in response to a correctly recognized token.
- 3. A speech recognition system as recited in claim 2 wherein said means for taking an action comprises dialing a telephone number associated with said correctly recognized token.

4. A method for synthesizing speech comprising the steps of:
- inputting by a speaker a plurality of speech tokens comprising an enrolled vocabulary;
- decoding each of said plurality of speech tokens into a plurality of lefemes associated with each of said plurality of speech tokens in said enrolled vocabulary, said lefemes comprising portions of phones in a given context;
- storing instances of said plurality of lefemes as representative waveforms;
- inputting a token to be recognized;
- decoding said token to be recognized into a plurality of lefemes;
- matching said lefemes of said token to be recognized to ones of said stored plurality of lefemes; and
- concatenating ones of said stored plurality of representative waveforms to synthesize a recognized token in the speaker's voice.

Dependent claims (5, 6, 7, 8, 9, 10, 11):
- 5. A method for synthesizing speech as recited in claim 4 further comprising the step of inputting confirmation that said recognized token is correct.
- 6. A method for synthesizing speech as recited in claim 5 further comprising the step of dialing a phone number associated with said recognized token if said recognized token is correct.
- 7. A method for synthesizing speech as recited in claim 4 wherein ones of said plurality of lefemes may be used to synthesize more than one of said recognized tokens.
- 8. A method for synthesizing speech as recited in claim 4 further comprising the step of transmitting ones of said stored plurality of representative waveforms between a server and a client over a network.
- 9. A method for synthesizing speech as recited in claim 8 further comprising the step of inputting confirmation that said recognized token is correct.
- 10. A method for synthesizing speech as recited in claim 4 wherein said token to be recognized is a speech token.
- 11. A method for synthesizing speech as recited in claim 4 wherein said token to be recognized is text.
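
The enrollment and concatenation steps of claim 4, together with the reuse of lefemes across tokens in claim 7, might be sketched as below. This is a minimal illustration and not the patented implementation: the class `LefemeStore`, the lefeme labels, and the toy sample values are all hypothetical, and the decoding step is assumed to have already split each token into labelled waveform segments.

```python
from typing import Dict, List

Waveform = List[float]


class LefemeStore:
    """Keeps one representative waveform per lefeme label (hypothetical design)."""

    def __init__(self) -> None:
        self.waveforms: Dict[str, Waveform] = {}
        self.vocabulary: Dict[str, List[str]] = {}

    def enroll(self, token: str, lefemes: List[str], segments: List[Waveform]) -> None:
        # The decoder (not shown) is assumed to supply one segment per lefeme.
        if len(lefemes) != len(segments):
            raise ValueError("need exactly one waveform segment per lefeme")
        self.vocabulary[token] = list(lefemes)
        for label, segment in zip(lefemes, segments):
            # Keep only the first instance seen; later tokens reuse it.
            self.waveforms.setdefault(label, list(segment))

    def synthesize(self, token: str) -> Waveform:
        # Rebuild the token by concatenating the stored lefeme waveforms.
        output: Waveform = []
        for label in self.vocabulary[token]:
            output.extend(self.waveforms[label])
        return output


store = LefemeStore()
store.enroll("cat", ["k", "ae", "t"], [[0.1, 0.2], [0.3], [0.4, 0.5]])
store.enroll("at", ["ae", "t"], [[0.9], [0.8]])
cat_audio = store.synthesize("cat")
at_audio = store.synthesize("at")
```

Here the two enrolled tokens share the stored "ae" and "t" segments, so only three waveforms are kept for five lefeme occurrences. The token recordings themselves are never stored, which is the space saving the abstract describes.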

12. A speech recognition and synthesis system, comprising:
- speech receiving means for receiving spoken words from at least a first speaker;
- a decoder for decoding the words into a plurality of associated lefeme waveforms, said lefeme waveforms comprising portions of phones in a given context;
- storage means for storing instances of said lefeme waveforms comprising an enrolled vocabulary;
- means for inputting a word to be recognized from said enrolled vocabulary by a second speaker, said decoder decoding the word to be recognized into associated lefeme waveforms;
- means for finding closest match lefeme waveforms between said associated lefeme waveforms and ones of said lefeme waveforms stored in said storage means; and
- means for concatenating said closest match lefeme waveforms from said storage means to synthesize a recognized word, wherein when the first speaker and the second speaker are the same person, said recognized word is synthesized in the person's own voice.

Dependent claims (13, 14):
- 13. A speech recognition and synthesis system as recited in claim 12 wherein said decoder is a Viterbi decoder.
- 14. A speech recognition and synthesis system as recited in claim 12 further comprising means for verifying that said recognized word is correct.
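
The "closest match" element of claim 12 could be illustrated with a nearest-neighbour lookup. The claims do not say how closeness is measured, so the squared Euclidean distance over fixed-length feature vectors used here is purely an assumption, as are the names `squared_distance` and `closest_lefemes` and the toy vectors.

```python
from typing import Dict, List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_lefemes(
    decoded: List[Sequence[float]],
    stored: Dict[str, Sequence[float]],
) -> List[str]:
    """For each decoded segment, return the label of the nearest stored lefeme."""
    if not stored:
        raise ValueError("no enrolled lefemes to match against")
    return [
        min(stored, key=lambda label: squared_distance(vector, stored[label]))
        for vector in decoded
    ]


# Hypothetical enrolled lefemes and a slightly different new utterance.
stored = {"k": (0.0, 0.0), "ae": (1.0, 0.0), "t": (0.0, 1.0)}
decoded = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8)]
matched = closest_lefemes(decoded, stored)
```

The returned labels would then select which stored waveforms to concatenate, so playback uses the enrolled speaker's audio rather than the new utterance.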

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Maes, Stephane Herman; Ittycheriah, Abraham Poovakunnel
Primary Examiner(s): Tkacs, Stephen R.
Assistant Examiner(s): Sofocleous, Michael D
Application Number: US08/821,520
Time in Patent Office: 1,096 Days
Field of Search: 704/255, 704/256, 704/258, 704/266, 704/270
US Class Current: 704/255
CPC Class Codes:
G10L 15/22   Procedures used during a sp...
G10L 2015/221   Announcement of recognition...
G10L 2015/223   Execution procedure of a sp...
H04M 1/271   controlled by voice recogni...
