Apparatus, methods, and programming for speech synthesis via bit manipulations of compressed database
First Claim
1. A method of performing text-to-speech synthesis comprising:
storing a plurality of encoded speech snippets, each including a sequence of one or more encoded sound representations produced by linear predictive encoding of speech sounds corresponding to a sequence of one or more phonemes, where a plurality of said snippets correspond to sequences of phonemes that are shorter than any of the words in which such a sequence of phonemes occurs;
storing a desired phonetic representation, indicating a sequence of phonemes to be generated as speech sounds; and
storing a desired pitch contour, indicating which of different possible pitch values are to be used in the generation of the speech sounds of different phonemes in the phonetic representation;
selecting from said stored snippets a sequence of such snippets that correspond to the sequence of phonemes in the phonetic representation and concatenating those snippets into a synthesized sequence of such snippets;
altering the encoded representations associated with one or more of the selected snippets associated with said synthesized sequence to cause the pitch values of the speech sounds represented by each such encoded representation to more closely match the pitch values indicated for the selected snippet's corresponding one or more phonemes in the pitch contour; and
using a linear predictive decoder to convert the synthesized sequences of snippets, including said altered snippets, into a waveform signal representing a sequence of speech sound corresponding to the phonetic representation and the pitch contour.
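The selection, concatenation, and pitch-alteration steps of the claim can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: each encoded frame is modeled as an integer whose low 7 bits hold a pitch index (`PITCH_BITS`), snippets are chosen by a greedy longest match, and each snippet's pitch is simply overwritten with the average contour value of its phonemes. The linear predictive decoder itself is omitted.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical frame layout: the low PITCH_BITS bits hold a pitch index and
# the remaining high bits hold the other encoded parameters.
PITCH_BITS = 7
PITCH_MASK = (1 << PITCH_BITS) - 1

Frame = int
SnippetDB = Dict[Tuple[str, ...], List[Frame]]


def select_snippets(db: SnippetDB, phonemes: Sequence[str]) -> List[Tuple[int, int]]:
    """Cover the phoneme sequence with stored snippets, longest match first.

    Returns (start, end) index spans into `phonemes`.
    """
    spans = []
    i = 0
    while i < len(phonemes):
        for j in range(len(phonemes), i, -1):
            if tuple(phonemes[i:j]) in db:
                spans.append((i, j))
                i = j
                break
        else:
            raise KeyError(f"no snippet covers phoneme {phonemes[i]!r}")
    return spans


def set_pitch(frame: Frame, pitch: int) -> Frame:
    """Overwrite only the pitch field, leaving the other encoded bits intact."""
    if not 0 <= pitch <= PITCH_MASK:
        raise ValueError("pitch index out of range")
    return (frame & ~PITCH_MASK) | pitch


def synthesize_encoded(db: SnippetDB, phonemes: Sequence[str],
                       pitch_contour: Sequence[int]) -> List[Frame]:
    """Concatenate the selected snippets and retarget their pitch fields.

    The result is still in encoded form; a decoder would turn it into audio.
    """
    out: List[Frame] = []
    for start, end in select_snippets(db, phonemes):
        frames = db[tuple(phonemes[start:end])]
        # Simplification: one target per snippet, the integer mean of the
        # contour values for the phonemes that snippet covers.
        target = sum(pitch_contour[start:end]) // (end - start)
        out.extend(set_pitch(f, target) for f in frames)
    return out


# Toy database: a two-phoneme snippet plus two single-phoneme snippets.
db = {
    ("h", "e"): [0x0180 | 10, 0x0200 | 12],
    ("l",): [0x0300 | 20],
    ("o",): [0x0400 | 5],
}
encoded = synthesize_encoded(db, ["h", "e", "l", "o"], [40, 42, 50, 30])
```

Because only the pitch field is masked and rewritten, the snippets never have to be decoded and re-encoded to change their pitch, which is the point of operating on the compressed representation.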
Abstract
Text-to-speech synthesis modifies the pitch of the sounds it concatenates to generate speech, when such sounds are in compressed, coded form, so as to make them sound better together. The pitch, duration, and energy of such concatenated sounds can be altered to better match, respectively, pitch, duration, and/or energy contours generated from phonetic spelling of the speech to be synthesized, which can, in turn, be derived from the text to be synthesized. The synthesized speech can be generated from the encoded sound of sub-word snippets as well as of one or more whole words. The duration of concatenated sounds can be changed by inserting or deleting sound frames associated with individual snippets. Such text-to-speech can be used to say words recognized by speech recognition, such as to provide feedback on the recognition. Such text-to-speech synthesis can be used in portable devices such as cellphones, PDAs, and/or wrist phones.
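The duration change described in the abstract, inserting or deleting frames within a snippet, can be sketched as follows. This is an illustrative assumption rather than the patent's procedure: the hypothetical `resize_frames` helper picks frames by evenly spaced index, so a longer target repeats frames and a shorter target drops them.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def resize_frames(frames: Sequence[T], target_len: int) -> List[T]:
    """Stretch or shrink a snippet to target_len frames.

    Frames are chosen at evenly spaced positions: when target_len exceeds
    len(frames) some frames are repeated (inserted), and when it is smaller
    some frames are skipped (deleted).
    """
    if target_len < 0:
        raise ValueError("target_len must be non-negative")
    n = len(frames)
    if n == 0 and target_len > 0:
        raise ValueError("cannot lengthen an empty snippet")
    return [frames[i * n // target_len] for i in range(target_len)]


longer = resize_frames(["a", "b", "c", "d"], 6)   # ['a', 'a', 'b', 'c', 'c', 'd']
shorter = resize_frames(["a", "b", "c", "d"], 2)  # ['a', 'c']
```

Since whole encoded frames are copied or dropped, the frames themselves are not modified, so this step also works directly on the compressed data.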
13 Claims
Specification