METHOD AND SYSTEM FOR TEXT-TO-SPEECH SYNTHESIS WITH PERSONALIZED VOICE
First Claim
1. A method for text-to-speech synthesis with personalized voice, comprising:
- receiving an incidental audio input of speech in the form of an audio communication from an input speaker and generating a voice dataset for the input speaker;
receiving a text input at a same device as the audio input;
synthesizing the text from the text input to synthesized speech including using the voice dataset to personalize the synthesized speech to sound like the input speaker.
8 Assignments
0 Petitions
Accused Products
Abstract
A method and system are provided for text-to-speech synthesis with personalized voice. The method includes receiving an incidental audio input (403) of speech in the form of an audio communication from an input speaker (401) and generating a voice dataset (404) for the input speaker (401). The method includes receiving a text input (411) at the same device as the audio input (403) and synthesizing (312) the text from the text input (411) to synthesized speech including using the voice dataset (404) to personalize the synthesized speech to sound like the input speaker (401). In addition, the method includes analyzing (316) the text for expression and adding the expression (315) to the synthesized speech. The audio communication may be part of a video communication (453) and the audio input (403) may have an associated visual input (455) of an image of the input speaker. The synthesis from text may include providing a synthesized image personalized to look like the image of the input speaker with expressions added from the visual input (455).
-
Citations
35 Claims
-
1. A method for text-to-speech synthesis with personalized voice, comprising:
-
receiving an incidental audio input of speech in the form of an audio communication from an input speaker and generating a voice dataset for the input speaker; receiving a text input at a same device as the audio input; synthesizing the text from the text input to synthesized speech including using the voice dataset to personalize the synthesized speech to sound like the input speaker. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
-
-
12. A method for text-to-speech synthesis with personalized voice, comprising:
-
receiving an audio input of speech from an input speaker and generating a voice dataset for the input speaker; receiving a text input at a same device as the audio input; analyzing the text for expression; synthesizing the text from the text input to synthesized speech including using the voice dataset to personalize the synthesized speech to sound like the input speaker and adding expression in the personalized synthesized speech. - View Dependent Claims (13, 14, 15, 16, 17, 18, 19, 20, 21, 22)
-
-
23. A computer program product stored on a computer readable storage medium for text-to-speech synthesis, comprising computer readable program code means for performing the steps of:
-
receiving an incidental audio input of speech in the form of an audio communication from an input speaker and generating a voice dataset for the input speaker; receiving a text input at a same device as the audio input; synthesizing the text from the text input to synthesized speech including using the voice dataset to personalize the synthesized speech to sound like the input speaker.
-
-
24. A system for text-to-speech synthesis with personalized voice, comprising:
-
audio communication means for input of speech from an input speaker and means for generating a voice dataset for an input speaker; text input means at the same device as the audio communication means; a text-to-speech synthesizer for producing synthesized speech including means for converting the synthesized speech to sound like the input speaker. - View Dependent Claims (25, 26, 27, 28, 29, 30, 31, 32, 33, 34)
-
-
35. A method of providing a service to a customer over a network, the service comprising:
-
obtaining a received incidental audio input of speech, in the form of an audio communication, from an input speaker and generating a voice dataset for the input speaker; receiving a text input from a client; synthesizing the text from the text input to synthesized speech including using the voice dataset to personalize the synthesized speech to sound like the input speaker.
-
Specification