Generating objectively evaluated sufficiently natural synthetic speech from text by using selective paraphrases
First Claim
1. A system for generating synthetic speech, comprising:
- a phoneme segment storage section operable to store a plurality of phoneme segment data pieces indicating a plurality of sounds of phonemes which are different from each other; and
a synthesis section operable to generate voice data representing synthetic speech of text by receiving an inputted text, reading out phoneme segment data pieces that correspond to respective phonemes indicating the pronunciation of the inputted text, and connecting the read-out phoneme segment data pieces to each other;
a computing section operable to compute a score indicating naturalness of the synthetic speech of the text, on the basis of the voice data;
a paraphrase storage section operable to store a plurality of notations each comprising a word or phrase, the plurality of notations comprising a plurality of first notations and a plurality of second notations, each second notation being a paraphrase of a respective first notation;
a replacement section operable to search the text for a notation matching any of the first notations and to replace a matching notation with the second notation corresponding to the first notation; and
a judgment section operable to receive the score computed by the computing section and determine whether the score indicates the synthetic speech is sufficiently natural, and;
if the score indicates the synthetic speech is sufficiently natural, output the generated voice data; and
if the score indicates the synthetic speech is not sufficiently natural, cause the replacement section to generate revised text by replacing at least one other notation in the inputted text matching a first notation with a corresponding second notation, and cause the synthesis section to generate voice data for the revised text.
8 Assignments
0 Petitions
Accused Products
Abstract
A synthetic speech system includes a phoneme segment storage section for storing multiple phoneme segment data pieces; a synthesis section for generating voice data from text by reading phoneme segment data pieces representing the pronunciation of an inputted text from the phoneme segment storage section and connecting the phoneme segment data pieces to each other; a computing section for computing a score indicating the unnaturalness of the voice data representing the synthetic speech of the text; a paraphrase storage section for storing multiple paraphrases of the multiple first phrases; a replacement section for searching the text and replacing with appropriate paraphrases; and a judgment section for outputting generated voice data on condition that the computed score is smaller than a reference value and for inputting the text after the replacement to the synthesis section to cause the synthesis section to further generate voice data for the text.
305 Citations
12 Claims
-
1. A system for generating synthetic speech, comprising:
-
a phoneme segment storage section operable to store a plurality of phoneme segment data pieces indicating a plurality of sounds of phonemes which are different from each other; and a synthesis section operable to generate voice data representing synthetic speech of text by receiving an inputted text, reading out phoneme segment data pieces that correspond to respective phonemes indicating the pronunciation of the inputted text, and connecting the read-out phoneme segment data pieces to each other; a computing section operable to compute a score indicating naturalness of the synthetic speech of the text, on the basis of the voice data; a paraphrase storage section operable to store a plurality of notations each comprising a word or phrase, the plurality of notations comprising a plurality of first notations and a plurality of second notations, each second notation being a paraphrase of a respective first notation; a replacement section operable to search the text for a notation matching any of the first notations and to replace a matching notation with the second notation corresponding to the first notation; and a judgment section operable to receive the score computed by the computing section and determine whether the score indicates the synthetic speech is sufficiently natural, and; if the score indicates the synthetic speech is sufficiently natural, output the generated voice data; and if the score indicates the synthetic speech is not sufficiently natural, cause the replacement section to generate revised text by replacing at least one other notation in the inputted text matching a first notation with a corresponding second notation, and cause the synthesis section to generate voice data for the revised text. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A method for generating synthetic speech, comprising acts of:
-
storing a plurality of phoneme segment data pieces indicating a plurality of sounds of phonemes different from each other; generating voice data representing synthetic speech of text by receiving an inputted text, reading out the phoneme segment data pieces corresponding to respective phonemes indicating the pronunciation of the inputted text, and connecting the read-out phoneme segment data pieces to each other; computing a score indicating naturalness of the synthetic speech of the text, on the basis of the voice data; storing a plurality of notations each comprising a word or phrase, the plurality of notations comprising a plurality of first notations and a plurality of second notations, each second notation being a paraphrase of a respective first notation; searching the text for a notation matching any of the first notations, and replacing a matching notation with the second notation corresponding to the first notation; determining whether the score indicates that the synthetic speech is sufficiently natural; and if the score indicates that the synthetic speech is sufficiently natural, outputting the generated voice data; and if the score indicates that the synthetic speech is not sufficiently natural, generating revised text by replacing at least one other notation in the inputted text matching a first notation with a corresponding second notation, and generating voice data for the revised text.
-
-
12. At least one storage device having instructions encoded thereon which, when executed, perform a method of generating synthetic speech, the method comprising acts of:
-
storing a plurality of phoneme segment data pieces indicating a plurality of sounds of phonemes which are different from each other; and generating voice data representing synthetic speech of text by receiving an inputted text, reading out phoneme segment data pieces that correspond to respective phonemes indicating the pronunciation of the inputted text, and connecting the read-out phoneme segment data pieces to each other; computing a score indicating naturalness of the synthetic speech of the text, on the basis of the voice data; storing a plurality of notations each comprising a word or phrase, the plurality of notations comprising a plurality of first notations and a plurality of second notations, each of the second notations being a paraphrase of a respective first notation; and searching the text for a notation matching any of the first notations and replacing a matching notation with the second notation corresponding to the first notation; and determining whether the score indicates that the synthetic speech is sufficiently natural; and if the score indicates that the synthetic speech is sufficiently natural, outputting the generated voice data; and if the score indicates that the synthetic speech is not sufficiently natural, generating revised text by replacing at least one other notation in the inputted text matching a first notation with a respective second notation, and generating voice data for the revised text.
-
Specification