System and method of using pre-enrolled speech sub-units for efficient speech synthesis
Abstract
A speech recognition system is disclosed that is useful in, for example, hands-free voice telephone dialing applications. The system matches a spoken word (token) to one previously enrolled in the system. It then synthesizes or replays the recognized word so that the speaker can confirm that it is indeed the correct word before further action is taken. In the case of voice-activated dialing, this avoids wrong numbers. The token itself is not explicitly recorded; rather, only the lefemes may be recorded, from which the token can be reconstructed for playback. This greatly reduces the amount of disk space needed for the database and provides the ability to reconstruct data in real time for synthesis use by a local name recognition machine.
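The confirm-before-acting flow described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the function name `confirm_then_dial`, the `directory` mapping, and the three callbacks are hypothetical and do not come from the patent, which does not specify an implementation.

```python
from typing import Callable, Dict, List, Optional

def confirm_then_dial(
    recognized_name: str,
    directory: Dict[str, str],
    play_back: Callable[[str], None],
    ask_confirm: Callable[[], bool],
    dial: Callable[[str], None],
) -> Optional[str]:
    """Play the recognized name back and dial only if the speaker confirms."""
    number = directory.get(recognized_name)
    if number is None:
        return None                  # nothing enrolled under this name
    play_back(recognized_name)       # stands in for synthesized playback of the token
    if not ask_confirm():            # speaker rejects the result: no call is placed
        return None
    dial(number)
    return number

# Hypothetical usage with in-memory stand-ins for audio playback and the dialer.
played: List[str] = []
calls: List[str] = []
phone_book = {"ann": "555-0100"}

accepted = confirm_then_dial("ann", phone_book, played.append, lambda: True, calls.append)
rejected = confirm_then_dial("ann", phone_book, played.append, lambda: False, calls.append)
```

In this sketch the name is played back in both runs, but a call is placed only in the run where the speaker confirms.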
14 Claims
1. A speech recognition system, comprising:
- speech receiving means for receiving a spoken token from a speaker;
- storage means for storing instances of a plurality of lefemes corresponding to a plurality of tokens, said lefemes comprising portions of phones in a given context;
- means for matching said spoken token to ones of said plurality lefemes stored in said storage means; and
- means for concatenating said ones of said plurality of lefemes to synthesize a recognized token in the speaker's voice.
Dependent claims: 2, 3
4. A method for synthesizing speech comprising the steps of:
- inputting by a speaker a plurality of speech tokens comprising an enrolled vocabulary;
- decoding each of said plurality of speech tokens into a plurality of lefemes associated with each of said plurality of speech tokens in said enrolled vocabulary, said lefemes comprising portions of phones in a given context;
- storing instances of said plurality of lefemes as representative waveforms;
- inputting a token to be recognized;
- decoding said token to be recognized into a plurality of lefemes;
- matching said lefemes of said token to be recognized to ones of said stored plurality of lefemes; and
- concatenating ones of said stored plurality of representative waveforms to synthesize a recognized token in the speaker's voice.
Dependent claims: 5, 6, 7, 8, 9, 10, 11
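The steps of claim 4 can be read as a two-phase pipeline: enrollment, then recognition with playback. The sketch below is a toy illustration, not the patented implementation. The decoder is a hypothetical stand-in (`toy_decode` labels each sample by rounding it), matching is assumed to be by lefeme label, and keeping the first instance seen as the representative waveform is likewise an assumption.

```python
from typing import Callable, Dict, List, Tuple

Waveform = List[float]
Segment = Tuple[str, Waveform]                 # (lefeme label, waveform slice)
Decoder = Callable[[Waveform], List[Segment]]

def toy_decode(audio: Waveform) -> List[Segment]:
    """Hypothetical decoder: one 'lefeme' per sample, labelled by rounding."""
    return [(f"L{int(round(s))}", [s]) for s in audio]

def enroll(tokens: List[Waveform], decode: Decoder) -> Dict[str, Waveform]:
    """Decode each enrolled token and store one representative waveform per lefeme."""
    store: Dict[str, Waveform] = {}
    for audio in tokens:
        for label, samples in decode(audio):
            store.setdefault(label, list(samples))   # assumption: first instance is kept
    return store

def synthesize(audio: Waveform, decode: Decoder, store: Dict[str, Waveform]) -> Waveform:
    """Decode the incoming token, match its lefemes to stored ones, and concatenate."""
    out: Waveform = []
    for label, _ in decode(audio):
        if label in store:                   # match by label; unmatched lefemes are skipped
            out.extend(store[label])         # output uses the enrolled waveform, not the input
    return out

# Enrollment phase: two tokens spoken by the enrolling speaker.
lefeme_store = enroll([[1.0, 2.0], [3.0]], toy_decode)

# Recognition phase: a noisier rendition is rebuilt from the enrolled waveforms.
playback = synthesize([2.1, 0.9, 3.2], toy_decode, lefeme_store)
```

Because the output is assembled only from stored waveforms, the playback carries the enrolled speaker's samples rather than those of the incoming token.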
12. A speech recognition and synthesis system, comprising:
- speech receiving means for receiving spoken words from at least a first speaker;
- a decoder for decoding the words into a plurality of associated lefeme waveforms, said lefeme waveforms comprising portions of phones in a given context;
- storage means for storing instances said lefeme waveforms comprising an enrolled vocabulary;
- means for inputting a word to be recognized from said enrolled vocabulary by a second speaker, said decoder decoding the word to be recognized into associated lefeme waveforms;
- means for finding closest match lefeme waveforms between said associated lefeme waveforms and ones of said lefeme waveforms stored in said storage means; and
- means for concatenating said closest match lefeme waveforms from said storage means to synthesize a recognized word, wherein when the first speaker and the second speaker are the same person, said recognized word is synthesized in the person's own voice.
Dependent claims: 13, 14
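Claim 12 matches by closeness rather than by exact identity. A minimal sketch of that idea follows, assuming each lefeme waveform is a fixed-length list of samples and that closeness is measured by Euclidean distance. The claim does not name a distance measure, so that choice and the helper names `distance`, `closest`, and `synthesize_from_closest` are assumptions made for illustration.

```python
import math
from typing import List

Waveform = List[float]

def distance(a: Waveform, b: Waveform) -> float:
    """Euclidean distance between two equal-length waveforms (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length waveforms")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest(stored: List[Waveform], query: Waveform) -> Waveform:
    """Return the stored lefeme waveform nearest to the query waveform."""
    return min(stored, key=lambda w: distance(w, query))

def synthesize_from_closest(stored: List[Waveform], decoded: List[Waveform]) -> Waveform:
    """Replace each decoded lefeme waveform with its closest stored match, then concatenate."""
    out: Waveform = []
    for query in decoded:
        out.extend(closest(stored, query))
    return out

# Stored instances from the first speaker (toy values).
enrolled = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]

# Lefeme waveforms decoded from the second speaker's word (toy values).
incoming = [[0.9, 1.2], [4.0, 6.0]]

recognized = synthesize_from_closest(enrolled, incoming)
```

Only stored waveforms reach the output, which is consistent with the claim's observation that the word is rendered in the person's own voice when both speakers are the same person.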
Specification