Method and apparatus for speech synthesis using paralinguistic variation
Abstract
A method and apparatus for speech synthesis in a computer-user interface using random paralinguistic variation are described herein. According to one aspect of the present invention, a method for synthesizing speech comprises generating synthesized speech having certain prosodic features. The synthesized speech is further processed by applying a random paralinguistic variation to the acoustic sequence representing the synthesized speech without altering the linguistic prosodic features. According to another aspect of the present invention, the application of the paralinguistic variation is correlated with a previously applied paralinguistic variation to reflect a gradual change in the computer voice while still maintaining a random quality.
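The abstract's idea of a random variation that is correlated with the previously applied one can be illustrated with a mean-reverting random walk. This is a sketch under stated assumptions, not the patent's method: the variation is reduced to a single global pitch-scale factor, and the names `correlated_variations`, `step`, `pull` and `max_dev` are hypothetical.

```python
import math
import random


def correlated_variations(n, step=0.03, pull=0.2, max_dev=0.15, seed=None):
    """Return n global pitch-scale factors that drift gradually.

    Hypothetical illustration: each value is derived from the previous one
    (the "prior" variation) by a mean-reverting random step in the log
    domain, so the voice changes slowly while remaining random.
    """
    rng = random.Random(seed)
    log_scale = 0.0  # log(1.0): start from the unmodified voice
    factors = []
    for _ in range(n):
        # Pull back toward 0 (neutral voice) and add a small random step.
        log_scale = (1.0 - pull) * log_scale + rng.gauss(0.0, step)
        # Keep the variation within a bounded range around the neutral voice.
        log_scale = max(-max_dev, min(max_dev, log_scale))
        factors.append(math.exp(log_scale))
    return factors
```

Because each factor starts from the previous one, consecutive utterances sound similar, while the `pull` term keeps the voice from wandering indefinitely.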
62 Claims
1. A method for producing synthetic speech comprising:
processing received text using a prosody model to produce prosodic features representative of the linguistic meaning of the received text;
generating an acoustic sequence of speech signals that represents the synthesized speech, the acoustic sequence having the prosodic features representative of the processed text;
determining a prior paralinguistic variation that has been applied to the acoustic sequence before a current paralinguistic variation; and
applying the current paralinguistic variation which includes a mathematical transformation to the acoustic sequence overall, wherein the current paralinguistic variation is determined based on the prior paralinguistic variation, wherein the mathematical transformation does not alter the prosodic features representative of the linguistic meaning of the received text, wherein the current paralinguistic variation is applied to change the sound of the generated acoustic sequence of the speech signals.
(Dependent claims: 2–13)
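A minimal Python sketch of the steps of claim 1, under stated assumptions: the "acoustic sequence" is reduced to a list of segments carrying a phoneme label, a duration and an F0 value (standing in for the prosody-model output), and the "mathematical transformation" is taken to be a pair of global multipliers for pitch and duration. `Segment`, `Variation` and `ParalinguisticVarier` are hypothetical names. Because every segment is scaled by the same factor, the ratios between segments, which carry the linguistic prosody, are preserved while the overall voice changes.

```python
import math
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    phoneme: str
    duration_ms: float  # stands in for the duration given by the prosody model
    f0_hz: float        # stands in for the pitch given by the prosody model


@dataclass(frozen=True)
class Variation:
    pitch_scale: float = 1.0     # multiplies every F0 value
    duration_scale: float = 1.0  # multiplies every duration


class ParalinguisticVarier:
    """Hypothetical helper: derives each variation from the prior one."""

    def __init__(self, step=0.03, seed=None):
        self.prior = Variation()  # identity: nothing applied yet
        self.step = step
        self.rng = random.Random(seed)

    def next_variation(self):
        # Current variation = prior variation nudged by a small random
        # multiplicative step, so consecutive voices differ only slightly.
        pitch = self.prior.pitch_scale * math.exp(self.rng.gauss(0.0, self.step))
        dur = self.prior.duration_scale * math.exp(self.rng.gauss(0.0, self.step))
        self.prior = Variation(pitch, dur)
        return self.prior

    def apply(self, segments):
        v = self.next_variation()
        # One transformation for the whole sequence: every segment gets the
        # same factors, so ratios between segments are unchanged.
        return [
            replace(s, duration_ms=s.duration_ms * v.duration_scale,
                    f0_hz=s.f0_hz * v.pitch_scale)
            for s in segments
        ]
```

Each call to `apply` derives the new variation from the one stored in `prior`, which is one possible reading of "determined based on the prior paralinguistic variation".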

14. An apparatus for producing synthetic speech comprising:
means for receiving text into a circuit;
means for processing the received text using a prosody model to produce prosodic features representative of the linguistic meaning of the received text;
means for generating an acoustic sequence of speech signals representing the synthesized speech, the acoustic sequence having the prosodic features representative of the processed text;
means for determining a prior paralinguistic variation that has been applied to the acoustic sequence before a current paralinguistic variation; and
means for applying the current paralinguistic variation which includes a mathematical transformation to the acoustic sequence overall, wherein the current paralinguistic variation is determined based on the prior paralinguistic variation, wherein the mathematical transformation does not alter the prosodic features representative of the linguistic meaning of the received text, wherein the current paralinguistic variation is applied to change the sound of the generated acoustic sequence of the speech signals.
(Dependent claims: 15–26)

27. An apparatus comprising:
a machine-accessible non-transitory medium storing executable instructions which, when executed in a machine, cause the machine to perform a method for synthesizing speech comprising:
processing received text using a prosody model to produce prosodic features representative of the linguistic meaning of the received text;
generating an acoustic sequence of speech signals representing the synthesized speech, the acoustic sequence having the prosodic features representative of the processed text;
determining a prior paralinguistic variation that has been applied to the acoustic sequence before a current paralinguistic variation; and
applying the current paralinguistic variation which includes a mathematical transformation to the acoustic sequence overall, wherein the current paralinguistic variation is determined based on the prior paralinguistic variation, wherein the mathematical transformation does not alter the prosodic features representative of the linguistic meaning of the received text, wherein the current paralinguistic variation is applied to change the sound of the generated acoustic sequence of the speech signals.
(Dependent claims: 28–34)

35. An apparatus for speech synthesis comprising:
an input for receiving text signals; and
a circuit coupled to the input, the circuit configured to synthesize an acoustic sequence representing a synthesized speech, the acoustic sequence having one or more of a plurality of prosodic features representative of the linguistic meaning of the received text signals, to determine a prior paralinguistic variation that has been previously applied to the acoustic sequence; and
to paralinguistically vary the synthesized acoustic sequence overall without altering the plurality of prosodic features that include relative pitch values of speech segments in the generated acoustic sequence, wherein paralinguistically varying the synthesized acoustic sequence comprises selecting at least one current paralinguistic variation from a plurality of paralinguistic variations based on the prior paralinguistic variation; and
applying the selected current paralinguistic variation which includes a mathematical transformation to the synthesized acoustic sequence overall, wherein the mathematical transformation does not alter the plurality of prosodic features representative of the linguistic meaning of the received text signals associated with individual phonemes in the acoustic sequence.
(Dependent claims: 36–47)
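Claim 35 selects the current variation from a plurality of candidates based on the prior one. One simple reading, sketched below as an assumption, is a Markov-style choice over an ordered table of presets in which only the prior preset and its immediate neighbours are eligible. `PRESETS` and `select_variation` are hypothetical, and the preset values are illustrative rather than taken from the patent.

```python
import random

# Hypothetical preset table of (pitch_scale, duration_scale) pairs, ordered
# so that neighbouring entries sound similar. Values are illustrative only.
PRESETS = [(0.90, 1.08), (0.95, 1.04), (1.00, 1.00), (1.05, 0.96), (1.10, 0.92)]


def select_variation(prior_index, rng, presets=PRESETS):
    """Return the index of the current preset given the prior preset's index.

    Candidates are the prior preset and its immediate neighbours, chosen
    uniformly at random, so the voice never jumps more than one step.
    """
    lo = max(0, prior_index - 1)
    hi = min(len(presets) - 1, prior_index + 1)
    return rng.randint(lo, hi)  # randint includes both endpoints


if __name__ == "__main__":
    rng = random.Random(3)
    index = 2  # start from the neutral preset
    for _ in range(5):
        index = select_variation(index, rng)
        print(index, PRESETS[index])
```

Restricting candidates to neighbours is what makes the selection depend on the prior variation; a weighted choice over all presets would be an equally valid design.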

48. A speech synthesis process implemented in a machine comprising:
generating an acoustic speech output representing a synthesized speech in response to an input text, wherein the acoustic speech output comprises one or more of a plurality of prosodic features representative of the linguistic meaning of the input text; and
varying the generated acoustic speech output without altering the plurality of prosodic features that include relative pitch values of speech segments in the generated acoustic sequence, wherein varying the generated acoustic speech output comprises determining a prior paralinguistic variation that has been previously applied to the acoustic sequence; selecting at least one current paralinguistic variation from a plurality of paralinguistic variations based on the prior paralinguistic variation; and applying the selected current paralinguistic variation which includes a mathematical transformation to the generated acoustic speech output overall, wherein the mathematical transformation does not alter the plurality of prosodic features representative of the linguistic meaning of the input text.
(Dependent claims: 49–57)
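Why a global transformation can leave relative pitch values intact is shown by a short derivation. It assumes, purely as an illustration, that the transformation multiplies every segment's fundamental frequency $f_i$ by the same factor $\alpha > 0$; the interval $\Delta_{ij}$ between segments $i$ and $j$, measured in semitones, is then unchanged:

```latex
\begin{aligned}
f'_i &= \alpha f_i, \qquad \alpha > 0 \text{ (the same for every segment } i\text{)} \\
\Delta'_{ij} &= 12\log_2\frac{f'_i}{f'_j}
              = 12\log_2\frac{\alpha f_i}{\alpha f_j}
              = 12\log_2\frac{f_i}{f_j}
              = \Delta_{ij}
\end{aligned}
```

For example, with $\alpha = 1.1$, segments at 200 Hz and 100 Hz become 220 Hz and 110 Hz: the interval stays at 12 semitones (one octave), while the whole voice moves up by $12\log_2 1.1 \approx 1.65$ semitones.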

58. A method for generating a paralinguistic model for use in a speech synthesis system, the method comprising:
developing, by a processor, one or more of a plurality of paralinguistic variations which include a mathematical transformation that, when applied to a synthesized acoustic sequence of the speech signals representing a synthesized speech, the synthesized acoustic sequence having prosodic features representative of a received text, change the sound of the synthesized acoustic sequence while preserving the prosodic features representative of the linguistic meaning of the received text, wherein the developing includes determining, by the processor, a prior paralinguistic variation that has been previously applied to the synthesized acoustic sequence, wherein at least one of the plurality of paralinguistic variations is developed based on the prior paralinguistic variation.
(Dependent claim: 59)
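Claim 58 describes developing the variations themselves with reference to the prior variation. As one hypothetical way to build such a model, the sketch below turns a table of (pitch_scale, duration_scale) presets into a transition matrix in which presets close to the prior one are more likely. The log-domain distance, the `temperature` parameter and the name `build_transition_model` are assumptions for illustration.

```python
import math


def build_transition_model(presets, temperature=0.05):
    """Return a row-stochastic matrix P where P[i][j] is the probability of
    applying preset j when preset i was the prior variation.

    Hypothetical illustration: presets are (pitch_scale, duration_scale)
    pairs, and presets closer in log-parameter space get higher probability,
    so the voice tends to drift gradually rather than jump.
    """
    def distance(a, b):
        return math.hypot(math.log(a[0] / b[0]), math.log(a[1] / b[1]))

    model = []
    for prior in presets:
        # Softmax-style weighting: weight decays with distance from the prior.
        weights = [math.exp(-distance(prior, cand) / temperature) for cand in presets]
        total = sum(weights)
        model.append([w / total for w in weights])
    return model
```

A next preset could then be drawn with `random.Random().choices(range(len(presets)), weights=model[prior_index])[0]`; lowering `temperature` makes gradual changes more likely.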

60. A speech synthesis system comprising:
a voice generation device including a processor for outputting an acoustic phoneme sequence having prosodic features representative of a text;
a duration modeling device that provides relative phoneme durations using a phoneme duration model to the voice generation device;
a pitch modeling device coupled to said duration modeling device that, using a pitch model, provides a relative phoneme pitch value for the at least one phoneme to the voice generation device; and
a variation modeling device coupled to the voice generation device that receives the acoustic sequence of synthesized speech signals having the prosodic features including the relative phoneme durations and the relative pitch values from the voice generation device;
determines a prior paralinguistic variation that has been previously applied to the acoustic sequence; and, using a paralinguistic variation model selected based on the prior paralinguistic variation, varies an overall speaking rate and an overall pitch range of the acoustic sequence of synthesized speech signals by applying a mathematical transformation to the acoustic sequence of synthesized speech signals having the prosodic features overall, wherein the mathematical transformation varies the overall speaking rate and the overall pitch range without altering the prosodic features.
(Dependent claims: 61, 62)
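Claim 60 varies the overall speaking rate and the overall pitch range. The sketch below shows one common way to realize those two global changes, assumed here rather than taken from the patent: durations are divided by a single rate factor, and log-F0 values are stretched about their utterance mean by a single range factor. Relative durations stay fixed, and the pitch contour keeps its ordering and geometric mean. A range change does scale semitone intervals, so "without altering the prosodic features" is read here as preserving the normalized contour shape. `vary_rate_and_range` is a hypothetical name.

```python
import math
import statistics


def vary_rate_and_range(durations_ms, f0_hz, rate, range_scale):
    """Apply one global speaking-rate change and one global pitch-range change.

    Hypothetical illustration:
    rate > 1 speaks faster (all durations shrink by the same factor).
    range_scale > 1 widens the pitch range around the utterance's mean
    log-F0; range_scale < 1 narrows it. Every segment gets the same mapping.
    """
    new_durations = [d / rate for d in durations_ms]
    logs = [math.log(f) for f in f0_hz]
    centre = statistics.fmean(logs)  # log of the geometric-mean F0
    # Stretch each log-F0 value's distance from the centre by range_scale.
    new_f0 = [math.exp(centre + range_scale * (lf - centre)) for lf in logs]
    return new_durations, new_f0
```

Working in the log domain keeps the stretch symmetric in musical terms: a segment two semitones above the mean moves up by as many semitones as a segment two semitones below it moves down.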
Specification