Personalized text-to-speech synthesis and personalized speech feature extraction
Abstract
A personalized text-to-speech synthesizing device includes: a personalized speech feature library creator, configured to recognize personalized speech features of a specific speaker by comparing a random speech fragment of the specific speaker with preset keywords, thereby to create a personalized speech feature library associated with the specific speaker, and store the personalized speech feature library in association with the specific speaker; and a text-to-speech synthesizer, configured to perform a speech synthesis of a text message from the specific speaker, based on the personalized speech feature library associated with the specific speaker and created by the personalized speech feature library creator, thereby to generate and output a speech fragment having pronunciation characteristics of the specific speaker. A personalized speech feature library of a specific speaker is established without a deliberate training process, and a text is synthesized into personalized speech with the speech characteristics of the speaker.
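The two components named in the abstract can be pictured with a minimal sketch. It assumes, purely for illustration, that each observed keyword yields a small dictionary of numeric features and that the synthesizer consumes an averaged profile. The names `SpeechFeatureLibrary`, `LibraryCreator`, `synthesis_request`, and the feature name `pitch_ratio` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


@dataclass
class SpeechFeatureLibrary:
    """Per-speaker store of features observed on preset keywords (hypothetical layout)."""
    speaker_id: str
    observations: Dict[str, List[Dict[str, float]]] = field(default_factory=dict)

    def add(self, keyword: str, features: Dict[str, float]) -> None:
        self.observations.setdefault(keyword, []).append(features)

    def voice_profile(self) -> Dict[str, float]:
        # Pool every observation of each feature name and average it.
        pooled: Dict[str, List[float]] = {}
        for per_keyword in self.observations.values():
            for features in per_keyword:
                for name, value in features.items():
                    pooled.setdefault(name, []).append(value)
        return {name: mean(values) for name, values in pooled.items()}


class LibraryCreator:
    """Builds one library per speaker, keeping only features tied to preset keywords."""

    def __init__(self, preset_keywords: List[str]) -> None:
        self.preset_keywords = {k.lower() for k in preset_keywords}
        self.libraries: Dict[str, SpeechFeatureLibrary] = {}

    def observe(self, speaker_id: str, keyword: str, features: Dict[str, float]) -> bool:
        if keyword.lower() not in self.preset_keywords:
            return False  # not a preset keyword, so nothing is stored
        library = self.libraries.setdefault(speaker_id, SpeechFeatureLibrary(speaker_id))
        library.add(keyword.lower(), features)
        return True


def synthesis_request(creator: LibraryCreator, sender_id: str, text: str) -> dict:
    """Pair a text message with the sender's stored profile for a TTS engine (assumed interface)."""
    library = creator.libraries.get(sender_id)
    profile = library.voice_profile() if library is not None else {}
    return {"text": text, "voice_profile": profile}
```

A message from a speaker with no stored library falls back to an empty profile, which a real synthesizer would presumably treat as its default voice.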
37 Claims
1. A personalized text-to-speech synthesizing device, comprising:
a processor;
a memory;
a personalized speech feature library creator, configured to recognize personalized speech features of a specific speaker by recognizing whether a keyword from preset keywords associated with the specific speaker occurs in a random speech fragment of the specific speaker that includes multiple words including the keyword and speech in addition to the keyword, the random speech fragment being part of a multiple speaker conversation including the speaker, and, if the keyword is found in the random speech fragment, recognizing the personalized speech features of the specific speaker based on a comparison of a standard speech of the keyword and the speech of the keyword by the specific speaker in the random speech fragment, thereby to create a personalized speech feature library associated with the specific speaker, and store the personalized speech feature library in association with the specific speaker; and
a text-to-speech synthesizer, configured to perform a speech synthesis of a text message from the specific speaker, based on the personalized speech feature library associated with the specific speaker and created by the personalized speech feature library creator, thereby to generate and output a speech fragment having pronunciation characteristics of the specific speaker.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 18, 19, 20, 21, 22, 23, 24, 25
9. A personalized text-to-speech synthesizing method, comprising:
presetting one or more keywords with respect to a specific language;
receiving a random speech fragment of a specific speaker that includes multiple words including a keyword from the preset one or more keywords and speech in addition to the keyword, wherein the random speech fragment is part of a multiple speaker conversation including the speaker;
recognizing personalized speech features of the specific speaker by recognizing whether the keyword is found in the random speech fragment of the specific speaker, and, if the keyword is found in the random speech fragment, recognizing the personalized speech features of the specific speaker based on a comparison of a standard speech of the keyword and the speech of the keyword by the specific speaker in the random speech fragment, thereby creating a personalized speech feature library associated with the specific speaker, and storing in a memory the personalized speech feature library in association with the specific speaker; and
performing a speech synthesis of a text message from the specific speaker, based on the personalized speech feature library associated with the specific speaker, thereby generating and outputting a speech fragment having pronunciation characteristics of the specific speaker.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17
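The keyword-spotting and comparison steps of the method can be sketched as follows. This is only an illustration under strong assumptions: the fragment is taken to be already segmented into words with a measured duration and mean pitch, and the "comparison" is reduced to simple ratios against a reference table. The `Word` type, the `STANDARD` table and its values, and the feature names are all hypothetical; the claim does not prescribe any of them.

```python
from typing import Dict, List, NamedTuple


class Word(NamedTuple):
    """One recognized word with two measured properties (assumed representation)."""
    text: str
    duration_s: float
    mean_pitch_hz: float


# Hypothetical reference ("standard speech") measurements for the preset keywords.
STANDARD: Dict[str, Word] = {"hello": Word("hello", 0.40, 120.0)}


def extract_keyword_features(fragment: List[Word],
                             standard: Dict[str, Word]) -> List[Dict[str, object]]:
    """Scan a speech fragment for preset keywords and compare each hit to the standard."""
    found = []
    for word in fragment:
        reference = standard.get(word.text.lower())
        if reference is None:
            continue  # speech in addition to the keyword is ignored
        found.append({
            "keyword": reference.text,
            "duration_ratio": word.duration_s / reference.duration_s,
            "pitch_ratio": word.mean_pitch_hz / reference.mean_pitch_hz,
        })
    return found
```

A fragment that contains no preset keyword simply yields an empty list, matching the conditional "if the keyword is found" in the claim language.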
26. A personalized speech feature extraction device, comprising:
a processor;
a memory;
a keyword setting unit, configured to set one or more keywords suitable for reflecting the pronunciation characteristics of a specific speaker with respect to a specific language, and store the keywords in association with the specific speaker;
a speech feature recognition unit, configured to recognize whether any keyword associated with the specific speaker occurs in a random speech fragment of the specific speaker that includes multiple words including the keyword and speech in addition to the keyword, the random speech fragment obtained from a multiple speaker conversation including the speaker, and when a keyword associated with the specific speaker is found in the speech fragment of the specific speaker, recognize speech features of the specific speaker according to a standard pronunciation of the recognized keyword and the pronunciation of the speaker;
a speech feature filtration unit, configured to filter out abnormal speech features from the keyword as found in the speech fragment through statistical analysis while retaining speech features reflecting the normal pronunciation characteristics of the specific speaker, when the speech features of the specific speaker recognized by the speech feature recognition unit reach a predetermined number, thereby to create a personalized speech feature library associated with the specific speaker, and store the personalized speech feature library in association with the specific speaker; and
a text-to-speech synthesizer, configured to perform a speech synthesis of a text message from the specific speaker, based on the stored personalized speech feature library associated with the specific speaker.
Dependent claims: 27, 28, 29, 30, 31
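The claim leaves the "statistical analysis" used by the filtration unit unspecified. One possible reading, offered only as a sketch, is an outlier test based on the median absolute deviation (MAD), applied once the number of collected values reaches a threshold. The function name `filter_features` and the parameters `min_count` and `k` are assumptions made for this example, not terms from the patent.

```python
from statistics import median
from typing import List, Optional


def filter_features(values: List[float],
                    min_count: int = 5,
                    k: float = 3.0) -> Optional[List[float]]:
    """Drop abnormal feature values once enough observations exist.

    Returns None while fewer than `min_count` values have been collected
    (the "predetermined number" has not been reached). Otherwise keeps the
    values within k median-absolute-deviations of the median. MAD is a
    swapped-in technique chosen for this sketch; the claim names no method.
    """
    if len(values) < min_count:
        return None
    center = median(values)
    mad = median(abs(v - center) for v in values)
    # If mad is 0, only values equal to the median survive.
    return [v for v in values if abs(v - center) <= k * mad]
```

For example, with pitch ratios of 1.0, 1.1, 0.9, 1.05, 0.95 and a single stray 3.0 (say, from a shouted word), the stray value is discarded and the other five are retained for the library.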
32. A personalized speech feature extraction method, comprising:
setting one or more keywords suitable for reflecting the pronunciation characteristics of a specific speaker with respect to a specific language, and storing in a memory the keywords in association with the specific speaker;
recognizing whether any keyword associated with the specific speaker occurs in a random speech fragment of the specific speaker obtained from a multiple speaker conversation including the speaker and that includes multiple words including the keyword and speech in addition to the keyword, and when a keyword associated with the specific speaker is found in the speech fragment of the specific speaker, recognizing speech features of the specific speaker according to a standard pronunciation of the recognized keyword and the pronunciation of the speaker; and
filtering out abnormal speech features from the keyword as found in the speech fragment through statistical analysis while retaining speech features reflecting the normal pronunciation characteristics of the specific speaker, when the speech features of the specific speaker recognized by the speech feature recognition unit reach a predetermined number, thereby creating a personalized speech feature library associated with the specific speaker, and storing the personalized speech feature library in association with the specific speaker; and
performing a speech synthesis of a text message from the specific speaker based on the stored personalized speech feature library associated with the specific speaker.
Dependent claims: 33, 34, 35, 36, 37
Specification