Method and system for text-to-speech synthesis with personalized voice
First Claim
1. A method for text-to-speech synthesis, comprising:
receiving, at a first device and from a second device, incidental audio speech data over a first network communication link, wherein the incidental audio speech data comprises speech of an operator of the second device recorded during an audio communication in which the operator of the second device participates;
generating, by the first device, a voice dataset for the operator based, at least in part, on the incidental audio speech data;
receiving, at the first device, text data from the second device over a second network communication link subsequent to receiving the incidental audio speech data;
converting, by the first device, the text data to synthesized speech, at least in part, using the voice dataset to personalize the synthesized speech to sound like the operator of the second device.
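The four steps of claim 1 (receive incidental speech, build a voice dataset, receive text later, synthesize in the operator's voice) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: `VoiceDataset`, `FirstDevice`, `receive_audio`, `receive_text`, and `synthesize` are illustrative names, audio is treated as opaque bytes, and the synthesis step is a placeholder that only records which voice would be used rather than producing sound.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceDataset:
    """Hypothetical per-operator voice data built from incidental speech."""
    operator_id: str
    samples: list = field(default_factory=list)  # raw audio chunks (bytes)

    def add(self, audio: bytes) -> None:
        self.samples.append(audio)


def synthesize(text: str, dataset: VoiceDataset) -> dict:
    # Placeholder for a real TTS engine (an assumption of this sketch):
    # it only describes the request instead of generating audio.
    return {
        "text": text,
        "voice": dataset.operator_id,
        "trained_on": len(dataset.samples),
    }


class FirstDevice:
    """Hypothetical receiving device keeping one voice dataset per operator."""

    def __init__(self) -> None:
        self.voices = {}  # operator_id -> VoiceDataset

    def receive_audio(self, operator_id: str, audio: bytes) -> None:
        # Receive incidental speech and create or extend the voice dataset.
        dataset = self.voices.setdefault(operator_id, VoiceDataset(operator_id))
        dataset.add(audio)

    def receive_text(self, operator_id: str, text: str) -> dict:
        # Text arrives after the audio; synthesize it with the stored voice.
        dataset = self.voices.get(operator_id)
        if dataset is None:
            raise LookupError(f"no voice dataset for {operator_id!r}")
        return synthesize(text, dataset)
```

In this sketch the audio and the text arrive through separate calls, loosely mirroring the claim's first and second network communication links; a real system would replace `synthesize` with an engine that adapts to the collected samples.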
Abstract
A method and system are provided for text-to-speech synthesis with personalized voice. The method includes receiving an incidental audio input (403) of speech in the form of an audio communication from an input speaker (401) and generating a voice dataset (404) for the input speaker (401). The method includes receiving a text input (411) at the same device as the audio input (403) and synthesizing (312) the text from the text input (411) to synthesized speech including using the voice dataset (404) to personalize the synthesized speech to sound like the input speaker (401). In addition, the method includes analyzing (316) the text for expression and adding the expression (315) to the synthesized speech. The audio communication may be part of a video communication (453) and the audio input (403) may have an associated visual input (455) of an image of the input speaker. The synthesis from text may include providing a synthesized image personalized to look like the image of the input speaker with expressions added from the visual input (455).
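The abstract mentions analyzing the text for expression and adding that expression to the synthesized speech, without saying which cues are used. As a rough illustration only, the sketch below assumes a small hypothetical rule table (emoticons and trailing punctuation) and a hypothetical function `analyze_expression`; neither the cue list nor the labels come from the patent.

```python
import re

# Hypothetical cue table: the abstract does not specify which cues are used.
# Earlier entries win when several cues match.
CUES = [
    (re.compile(r":-?\)"), "happy"),
    (re.compile(r":-?\("), "sad"),
    (re.compile(r"!+\s*$"), "excited"),
    (re.compile(r"\?+\s*$"), "questioning"),
]

EMOTICON = re.compile(r":-?[()]")


def analyze_expression(text: str) -> tuple:
    """Return (clean_text, expression_label) for one text message."""
    expression = "neutral"
    for pattern, label in CUES:
        if pattern.search(text):
            expression = label
            break
    # Remove emoticons so they are not read aloud by the synthesizer.
    clean = EMOTICON.sub("", text).strip()
    return clean, expression
```

A synthesizer could then take the label as an extra parameter alongside the voice dataset, for example to adjust pitch or speaking rate; how the expression is actually applied is left open here.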
20 Claims
1. A method for text-to-speech synthesis, comprising:
receiving, at a first device and from a second device, incidental audio speech data over a first network communication link, wherein the incidental audio speech data comprises speech of an operator of the second device recorded during an audio communication in which the operator of the second device participates;
generating, by the first device, a voice dataset for the operator based, at least in part, on the incidental audio speech data;
receiving, at the first device, text data from the second device over a second network communication link subsequent to receiving the incidental audio speech data;
converting, by the first device, the text data to synthesized speech, at least in part, using the voice dataset to personalize the synthesized speech to sound like the operator of the second device.
Dependent Claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A first communication device comprising:
at least one processor; and
memory elements, wherein the at least one processor is configured to:
receive from a second communication device incidental audio speech data over a first network communication link, wherein the incidental audio speech data comprises speech of an operator of the second device recorded during an audio communication in which the operator of the second communication device participates;
generate a voice dataset for the operator based, at least in part, on the incidental audio speech data;
receive text data from the second communication device over a second network communication link subsequent to receiving the incidental audio speech data;
convert the text data to synthesized speech, at least in part, using the voice dataset to personalize the synthesized speech to sound like the operator of the second device.
Dependent Claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
Specification