Method and system for text-to-speech synthesis with personalized voice
First Claim
1. A method for text-to-speech synthesis with personalized voice, comprising:
receiving, at a mobile communications device operated by a user, incidental audio speech data from a sending device operated by a remote input speaker, wherein the speech data of the remote input speaker is received over a first network communication link during a voice communication between the remote input speaker and the user of the mobile communications device;
generating, by the user's mobile communications device, a voice dataset for the remote input speaker based, at least in part, on the incidental audio speech data;
receiving, over a second network communication link, text data at the user's mobile communications device, wherein the text data is sent from the sending device subsequent to the voice communication; and
converting, by the user's mobile communications device, the text data to synthesized speech, at least in part, using the voice dataset to personalize the synthesized speech to sound like the remote input speaker.
Abstract
A method and system are provided for text-to-speech synthesis with personalized voice. The method includes receiving an incidental audio input (403) of speech in the form of an audio communication from an input speaker (401) and generating a voice dataset (404) for the input speaker (401). The method includes receiving a text input (411) at the same device as the audio input (403) and synthesizing (312) the text from the text input (411) to synthesized speech including using the voice dataset (404) to personalize the synthesized speech to sound like the input speaker (401). In addition, the method includes analyzing (316) the text for expression and adding the expression (315) to the synthesized speech. The audio communication may be part of a video communication (453) and the audio input (403) may have an associated visual input (455) of an image of the input speaker. The synthesis from text may include providing a synthesized image personalized to look like the image of the input speaker with expressions added from the visual input (455).
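The data flow described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only: every name in it (`VoiceDataset`, `ReceivingDevice`, `analyze_expression`, and so on) is hypothetical and does not come from the patent, the "voice dataset" is just a list of stored audio frames, the expression analysis is a simple punctuation heuristic, and the synthesis step is a placeholder that records which voice would be used rather than producing audio.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceDataset:
    """Hypothetical per-speaker store of speech captured incidentally during calls."""
    speaker_id: str
    frames: list = field(default_factory=list)

    def add_audio(self, samples):
        # Keep a copy of each chunk of call audio for this speaker.
        self.frames.append(list(samples))

    @property
    def total_samples(self):
        return sum(len(frame) for frame in self.frames)


@dataclass
class SynthesizedSpeech:
    """Placeholder result: describes the speech that would be synthesized."""
    text: str
    voice: str        # speaker id to imitate, or "default" if no dataset exists
    expression: str   # label produced by analyze_expression()


def analyze_expression(text):
    """Assumed heuristic: infer an expression label from trailing punctuation."""
    stripped = text.rstrip()
    if stripped.endswith("!"):
        return "excited"
    if stripped.endswith("?"):
        return "questioning"
    return "neutral"


class ReceivingDevice:
    """Hypothetical stand-in for the device that receives both audio and text."""

    def __init__(self):
        self.datasets = {}  # speaker id -> VoiceDataset

    def on_call_audio(self, sender_id, samples):
        # Audio input: build up the speaker's voice dataset during a call.
        dataset = self.datasets.setdefault(sender_id, VoiceDataset(sender_id))
        dataset.add_audio(samples)

    def on_text(self, sender_id, text):
        # Text input: personalize the voice only if a dataset exists for the sender.
        dataset = self.datasets.get(sender_id)
        has_voice = dataset is not None and dataset.total_samples > 0
        voice = sender_id if has_voice else "default"
        return SynthesizedSpeech(text, voice, analyze_expression(text))


if __name__ == "__main__":
    device = ReceivingDevice()
    device.on_call_audio("alice", [0.1, -0.2, 0.05])  # captured during a call
    print(device.on_text("alice", "See you at eight!"))
    print(device.on_text("bob", "Are you there?"))
```

In this sketch a text from a sender with no stored audio falls back to a `"default"` voice; the patent text above does not specify a fallback, so that behavior is an assumption made to keep the example complete.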
20 Claims
1. A method for text-to-speech synthesis with personalized voice, comprising:
receiving, at a mobile communications device operated by a user, incidental audio speech data from a sending device operated by a remote input speaker, wherein the speech data of the remote input speaker is received over a first network communication link during a voice communication between the remote input speaker and the user of the mobile communications device;
generating, by the user's mobile communications device, a voice dataset for the remote input speaker based, at least in part, on the incidental audio speech data;
receiving, over a second network communication link, text data at the user's mobile communications device, wherein the text data is sent from the sending device subsequent to the voice communication; and
converting, by the user's mobile communications device, the text data to synthesized speech, at least in part, using the voice dataset to personalize the synthesized speech to sound like the remote input speaker.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A computer program product stored on a non-transitory computer readable storage medium for text-to-speech synthesis, comprising computer readable program code means for performing the steps of:
receiving, at a mobile communications device operated by a user, incidental audio speech data from a sending device operated by a remote input speaker, wherein the speech data of the remote input speaker is received over a first network communication link during a voice communication between the remote input speaker and the user of the mobile communications device;
generating, by the user's mobile communications device, a voice dataset for the remote input speaker based, at least in part, on the incidental audio speech data;
receiving, over a second network communication link, text data at the user's mobile communications device, wherein the text data is sent from the sending device subsequent to the voice communication; and
converting, by the user's mobile communications device, the text data to synthesized speech, at least in part, using the voice dataset to personalize the synthesized speech to sound like the remote input speaker.
13. A mobile communications device capable of text-to-speech synthesis with personalized voice, comprising:
an audio communication input for receiving over a first network communication link incidental audio speech data from a sending device operated by a remote input speaker during a voice communication between the remote input speaker and a user of the mobile communications device;
a processor configured to generate, at the user's mobile communications device, a voice dataset for the remote input speaker based, at least in part, on the incidental audio speech data;
at least one input for receiving over a second network communication link text data at the user's mobile communication device, wherein the text data is sent from the sending device subsequent to the voice communication; and
a text-to-speech synthesizer for producing synthesized speech by converting the text data to synthesized speech to sound like the remote input speaker, at least in part, using the voice dataset.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
Specification