Technique of Generating High Quality Synthetic Speech
Abstract
A synthetic speech system includes: a phoneme segment storage section for storing multiple phoneme segment data pieces; a synthesis section for generating voice data from an inputted text by reading the phoneme segment data pieces that represent the pronunciation of the text from the phoneme segment storage section and connecting them to each other; a computing section for computing a score indicating the unnaturalness of the voice data representing the synthetic speech of the text; a paraphrase storage section for storing paraphrases of multiple first phrases; a replacement section for searching the text for any of the first phrases and replacing each one found with its paraphrase; and a judgment section for outputting the generated voice data on condition that the computed score is smaller than a reference value, and otherwise for inputting the text after the replacement to the synthesis section to cause the synthesis section to generate further voice data for that text.
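The control flow of the judgment section can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `generate_speech`, `synthesize`, `score`, `paraphrase`, and `max_rounds` are hypothetical, the synthesis and scoring steps are passed in as plain callables, and the stopping rules (a round limit, and giving up when a replacement changes nothing) are assumptions that the text above does not state.

```python
from typing import Callable, Tuple


def generate_speech(
    text: str,
    synthesize: Callable[[str], bytes],   # stands in for the synthesis section
    score: Callable[[bytes], float],      # stands in for the computing section
    paraphrase: Callable[[str], str],     # stands in for the replacement section
    reference_value: float,
    max_rounds: int = 5,                  # assumed guard, not part of the source
) -> Tuple[str, bytes, float]:
    """Return (final_text, voice_data, unnaturalness_score)."""
    voice = synthesize(text)
    current = score(voice)
    for _ in range(max_rounds):
        # Judgment: output the voice data once the score is below the reference value.
        if current < reference_value:
            break
        new_text = paraphrase(text)
        if new_text == text:
            break  # assumed: nothing left to replace, so stop retrying
        # Otherwise feed the text after replacement back into synthesis.
        text = new_text
        voice = synthesize(text)
        current = score(voice)
    return text, voice, current
```

With toy stand-ins (for example, a `score` that simply returns the length of the voice data), the function returns the original text unchanged when the first score is already below `reference_value`, and otherwise returns the result for the paraphrased text.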
12 Claims
1. A system for generating synthetic speech, comprising:
a phoneme segment storage section for storing a plurality of phoneme segment data pieces indicating a plurality of sounds of phonemes which are different from each other; and
a synthesis section for generating voice data representing synthetic speech of text by receiving an inputted text, by reading out phoneme segment data pieces that correspond to respective phonemes indicating the pronunciation of the inputted text, and then by connecting the read-out phoneme segment data pieces to each other;
a computing section for computing a score indicating the unnaturalness of the synthetic speech of the text, on the basis of the voice data;
a paraphrase storage section for storing a plurality of second notations, the second notations being paraphrases of first notations and for associating the second notations with the respective first notations;
a replacement section for searching the text for a notation matching with any of the first notations and for replacing the searched-out notation with the second notation corresponding to the first notation; and
a judgment section for receiving the score and for outputting the generated voice data on condition that the score is smaller than a predetermined reference value, and for inputting the text to the synthesis section in order for the synthesis section to generate further voice data for the text after replacement when the score is equal to or greater than the reference value.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method for generating synthetic speech, comprising the steps of:
storing a plurality of phoneme segment data pieces indicating a plurality of sounds of phonemes different from each other;
generating voice data representing synthetic speech of text by receiving an inputted text, by reading out the phoneme segment data pieces corresponding to respective phonemes indicating the pronunciation of the inputted text, and then by connecting the read-out phoneme segment data pieces to each other;
computing a score indicating the unnaturalness of the synthetic speech of the text, on the basis of the voice data;
storing a plurality of second notations that are paraphrases of a plurality of first notations and associating the second notations with the respective first notations;
searching the text for a notation matching with any of the first notations, and replacing the searched-out notation with the second notation corresponding to the first notation; and
outputting the generated voice data when the score is smaller than a predetermined reference value, and further generating synthetic speech in order to generate further voice data for the text after replacement on condition that the score is equal to or greater than the reference value.
12. A program allowing an information processing apparatus to function as a system for generating synthetic speech, the program causing the information processing apparatus to function as:
a phoneme segment storage section for storing a plurality of phoneme segment data pieces indicating a plurality of sounds of phonemes which are different from each other; and
a synthesis section for generating voice data representing synthetic speech of text by receiving an inputted text, by reading out phoneme segment data pieces that correspond to respective phonemes indicating the pronunciation of the inputted text, and then by connecting the read-out phoneme segment data pieces to each other;
a computing section for computing a score indicating the unnaturalness of the synthetic speech of the text, on the basis of the voice data;
a paraphrase storage section for storing a plurality of second notations, the second notations being paraphrases of first notations and for associating the second notations with the respective first notations;
a replacement section for searching the text for a notation matching with any of the first notations and for replacing the searched-out notation with the second notation corresponding to the first notation; and
a judgment section for receiving the score and for outputting the generated voice data on condition that the score is smaller than a predetermined reference value, and for inputting the text to the synthesis section in order for the synthesis section to generate further voice data for the text after replacement when the score is equal to or greater than the reference value.
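As a hedged illustration of the paraphrase storage and replacement sections recited in the claims, the following Python sketch stores the association between first and second notations in a dictionary and replaces matches in the text. The function name `replace_notations` and the example entries are hypothetical. Replacing all matches in a single pass and preferring the longest first notation when entries overlap are assumptions; the claims do not specify either choice.

```python
import re
from typing import Dict


def replace_notations(text: str, paraphrases: Dict[str, str]) -> str:
    """Replace each first notation found in text with its second notation.

    `paraphrases` maps first notations to second notations (hypothetical
    in-memory stand-in for the paraphrase storage section).
    """
    if not paraphrases:
        return text
    # Assumed policy: try longer first notations before shorter ones so that
    # overlapping entries resolve to the longest match.
    keys = sorted(paraphrases, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Look up the second notation associated with whichever key matched.
    return pattern.sub(lambda m: paraphrases[m.group(0)], text)
```

A call such as `replace_notations(text, {"utilize": "use"})` leaves the text untouched when no first notation occurs in it, which a caller can use as a signal that no further paraphrase is available.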
Specification