Speech synthesis method
Abstract
In a synthesis unit generator, a plurality of synthesis speech segments are generated by synthesizing training speech segments labeled with phonetic contexts and input speech segments while altering the pitch/duration of the input speech segments in accordance with the pitch/duration of the training speech segments. Typical speech segments are selected from the input speech segments on the basis of a distance between the synthesis speech segments and the training speech segments, and are stored in a storage. In addition, a plurality of phonetic context clusters corresponding to the synthesis units are generated on the basis of the distance, and are stored in a storage. A synthesis speech signal is generated by reading out, from the storage, those of the synthesis units, which correspond to the phonetic context clusters including phonetic contexts of input phonemes, and connecting the selected synthesis units in a speech synthesizer.
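The selection step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: segments are plain lists of numbers, the pitch/duration modification is replaced by linear-interpolation resampling to the training segment's length, and the distance is a mean squared error. The names `resample`, `distance`, `distance_table`, and `select_unit` are hypothetical.

```python
def resample(segment, length):
    """Stretch or compress a segment to `length` samples by linear interpolation.

    Stand-in (assumption) for the pitch/duration modification in the abstract.
    """
    if length == 1 or len(segment) == 1:
        return [segment[0]] * length
    scale = (len(segment) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(segment) - 1)
        frac = pos - lo
        out.append(segment[lo] * (1 - frac) + segment[hi] * frac)
    return out


def distance(a, b):
    """Mean squared error between two equal-length segments (assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distance_table(candidates, training):
    """table[i][j]: distance between candidate i, modified to match training
    segment j, and training segment j itself."""
    return [[distance(resample(c, len(t)), t) for t in training]
            for c in candidates]


def select_unit(candidates, training):
    """Index of the candidate whose modified versions are closest, in total,
    to all training segments."""
    totals = [sum(row) for row in distance_table(candidates, training)]
    return min(range(len(candidates)), key=totals.__getitem__)


candidates = [[0.0, 1.0, 2.0], [5.0, 5.0, 5.0]]   # input (second) segments
training = [[0.0, 0.5, 1.0, 1.5, 2.0], [0.0, 2.0]]  # training (first) segments
best = select_unit(candidates, training)
```

Here the first candidate is a ramp that matches both training segments once resampled, so it is the one selected. A real system would evaluate the distance on synthesized speech rather than on raw sample lists.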
36 Claims
1. A speech synthesis method comprising the steps of:
generating a plurality of synthesis speech segments by changing at least one of a pitch and a duration of each of a plurality of second speech segments in accordance with at least one of a pitch and a duration of each of a plurality of first speech segments;
selecting a plurality of synthesis units from the second speech segments on the basis of a distance between the synthesis speech segments and the first speech segments; and
generating a synthesis speech by selecting predetermined synthesis units from the synthesis units and connecting the predetermined synthesis units to one another to generate a synthesis speech.
Dependent claims: 2, 3, 4, 5
6. A speech synthesis method comprising the steps of:
generating a plurality of synthesis speech segments by changing at least one of a pitch and a duration of each of a plurality of second speech segments in accordance with at least one of a pitch and a duration of each of a plurality of first speech segments;
selecting a plurality of synthesis speech segments using information regarding a distance between the synthesis speech segments;
forming a plurality of synthesis context clusters using the information regarding the distance and the synthesis units; and
generating a synthesis speech by selecting those of the synthesis units, which correspond to at least one of the phonetic context clusters which includes phonetic contexts of input phonemes, and connecting the selected synthesis units.
Dependent claims: 7, 8, 9, 10, 11
12. A speech synthesis method comprising the steps of:
generating a plurality of synthesis speech segments by changing at least one of a pitch and a duration of each of a plurality of second speech segments in accordance with at least one of the pitch and duration of each of a plurality of first speech segments labeled with phonetic contexts;
forming a plurality of synthesis context clusters using information regarding a distance between the synthesis speech segments and the first speech segments and information regarding the synthesis units;
selecting the synthesis units using the information regarding the distance and the synthesis context cluster; and
generating a synthesis speech by selecting predetermined synthesis units from the synthesis units and connecting the selected synthesis units.
13. A speech synthesis method comprising the steps of:
generating a plurality of synthesis speech segments by changing at least one of a pitch and a duration of each of a plurality of second speech segments in accordance with at least one of the pitch and duration of each of a plurality of first speech segments labeled with phonetic contexts;
generating a plurality of phonetic context clusters on the basis of a distance between the synthesis speech segments and the first speech segments;
selecting a plurality of synthesis units corresponding to the phonetic context clusters from the second speech segments on the basis of the distance; and
generating a synthesis speech by selecting those of the synthesis units, which correspond to the phonetic context clusters including phonetic contexts of input phonemes, and connecting the selected synthesis units.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
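As a rough illustration of the clustering and lookup steps in claim 13, the sketch below assigns each labeled first (training) segment to the synthesis unit with the smallest distance and then retrieves a unit by phonetic context. The claim does not say how clusters and units are jointly determined, so the nearest-unit rule, the context label strings, and the names `build_context_clusters` and `lookup_unit` are all assumptions.

```python
def build_context_clusters(table, contexts, unit_ids):
    """Group phonetic contexts by their closest synthesis unit.

    table[u][j] is the distance between unit u (modified to match training
    segment j) and training segment j; contexts[j] labels training segment j.
    """
    clusters = {u: [] for u in unit_ids}
    for j, ctx in enumerate(contexts):
        nearest = min(unit_ids, key=lambda u: table[u][j])
        clusters[nearest].append(ctx)
    return clusters


def lookup_unit(clusters, ctx):
    """Return the unit whose cluster contains the input phoneme's context,
    or None if the context was never seen."""
    for unit, members in clusters.items():
        if ctx in members:
            return unit
    return None


# Hypothetical distances for two units and three labeled training segments.
table = [[0.1, 0.9, 0.2],
         [0.8, 0.3, 0.7]]
contexts = ["a-k-a", "i-k-i", "a-k-u"]
clusters = build_context_clusters(table, contexts, unit_ids=[0, 1])
```

At synthesis time, `lookup_unit(clusters, "i-k-i")` would pick unit 1 for a phoneme in that context; how unseen contexts are handled is left open here.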
21. A speech synthesis method comprising the steps of:
prestoring information on a plurality of speech synthesis units including at least speech spectrum parameters;
selecting predetermined information from the stored information on the speech synthesis units;
generating a synthesis speech signal by connecting the selected predetermined information; and
emphasizing a formant of the synthesis speech signal by a formant emphasis filter whose filtering coefficient is determined in accordance with the spectrum parameters of the selected information.
Dependent claims: 22, 23, 24, 25, 26
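One common way to derive a formant emphasis filter from spectrum parameters is the pole-zero form H(z) = A(z/beta) / A(z/alpha) built from linear prediction coefficients, with 0 < beta < alpha < 1. The claim does not state which filter the patent uses, so the sketch below is only an example of the general idea; the function names and the default values of `alpha` and `beta` are assumptions.

```python
def weight(lpc, gamma):
    """Coefficients of A(z/gamma), where A(z) = 1 + sum_k lpc[k-1] * z^-k."""
    return [a * gamma ** (k + 1) for k, a in enumerate(lpc)]


def formant_emphasis(signal, lpc, alpha=0.8, beta=0.5):
    """Filter `signal` with H(z) = A(z/beta) / A(z/alpha).

    The coefficients come from the LPC spectrum parameters of the selected
    unit, so the emphasis follows that unit's formant structure.
    """
    num = weight(lpc, beta)   # zeros: partially flatten the envelope
    den = weight(lpc, alpha)  # poles: sharpen the formant peaks
    out = []
    for n, x in enumerate(signal):
        y = x
        for k in range(len(lpc)):
            if n - k - 1 >= 0:
                y += num[k] * signal[n - k - 1] - den[k] * out[n - k - 1]
        out.append(y)
    return out


impulse_response = formant_emphasis([1.0, 0.0, 0.0], [-0.9])
```

With `alpha == beta` the numerator and denominator cancel and the filter passes the signal unchanged, which is a convenient sanity check. Widening the gap between the two strengthens the emphasis.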
27. A speech synthesis method comprising the steps of:
generating linear prediction coefficients by subjecting a reference speech signal to a linear prediction analysis;
producing a residual pitch wave from a typical speech pitch wave extracted from the reference speech signal, using the linear prediction coefficients;
storing information regarding the residual pitch wave as information of a speech synthesis unit in a voiced period; and
synthesizing a speech, using the information of the speech synthesis unit.
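The analysis steps of claim 27 can be sketched as follows, assuming the autocorrelation method with the Levinson-Durbin recursion for the linear prediction analysis and plain inverse filtering to obtain the residual. Extraction of the typical pitch wave is not shown, and the function names are hypothetical; the patent may use a different analysis procedure.

```python
def autocorr(x, order):
    """Autocorrelation of x for lags 0..order."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(order + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns a[0..order-1], where a[k-1] multiplies z^-k in
    A(z) = 1 + sum_k a_k z^-k. Assumes r[0] > 0 (non-silent input).
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        acc = r[i + 1] + sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err
        new_a = a[:]
        for j in range(i):
            new_a[j] = a[j] + k * a[i - 1 - j]
        new_a[i] = k
        a = new_a
        err *= 1 - k * k
    return a


def inverse_filter(wave, lpc):
    """Residual e[n] = wave[n] + sum_k lpc[k-1] * wave[n-k]."""
    res = []
    for n in range(len(wave)):
        e = wave[n]
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                e += a * wave[n - k]
        res.append(e)
    return res


# Toy reference signal: a decaying exponential, i.e. a one-pole response.
reference = [0.9 ** n for n in range(200)]
lpc = levinson(autocorr(reference, 1), 1)
residual = inverse_filter(reference, lpc)
```

For this toy signal the analysis recovers a coefficient close to -0.9 and the residual collapses to a single pulse, which is the compact representation that would be stored as the voiced-period synthesis unit.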
28. A speech synthesis method comprising the steps of:
storing information on a residual pitch wave generated from a reference speech signal and a spectrum parameter extracted from the reference speech signal;
driving a vocal tract filter having the spectrum parameter as a filtering coefficient, by a voiced speech source signal generated by using the information on the residual pitch wave in a voiced period, and by an unvoiced speech source signal in an unvoiced period, thereby generating a synthesis speech; and
generating the residual pitch wave from a typical speech pitch wave extracted from the reference speech signal, by using a linear prediction coefficient obtained by subjecting the reference speech signal to linear prediction analysis.
Dependent claims: 29, 30, 31, 32, 33
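The synthesis side of claim 28 can be sketched as a source-filter loop: a voiced source built by repeating the stored residual pitch wave at the pitch period, an unvoiced source, and an all-pole vocal tract filter 1/A(z) driven by either. Gaussian noise for the unvoiced source, the fixed pitch period, and all function names are assumptions made for illustration.

```python
import random


def voiced_source(residual_wave, period, length):
    """Overlap-add the residual pitch wave once every `period` samples."""
    src = [0.0] * length
    for start in range(0, length, period):
        for i, v in enumerate(residual_wave):
            if start + i < length:
                src[start + i] += v
    return src


def unvoiced_source(length, seed=0):
    """White Gaussian noise as a stand-in unvoiced excitation (assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def vocal_tract_filter(source, lpc):
    """All-pole filter 1/A(z), A(z) = 1 + sum_k lpc[k-1] * z^-k."""
    out = []
    for n, x in enumerate(source):
        y = x
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                y -= a * out[n - k]
        out.append(y)
    return out


excitation = voiced_source([1.0, 0.5], period=3, length=7)
voiced_speech = vocal_tract_filter(excitation, [-0.5])
unvoiced_speech = vocal_tract_filter(unvoiced_source(7), [-0.5])
```

Changing `period` changes the pitch of the voiced output without touching the spectral envelope, which is the practical benefit of storing the residual pitch wave separately from the spectrum parameter.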
34. A speech synthesis apparatus comprising:
a speech segment generator for generating a plurality of synthesis speech segments by changing at least one of a pitch and a duration of each of a plurality of second speech segments in accordance with at least one of a pitch and a duration of each of a plurality of first speech segments;
a synthesis unit selector for selecting a plurality of synthesis units from the second speech segments on the basis of a distance between the synthesis speech segments and the first speech segments; and
a speech synthesis section for generating a synthesis speech by selecting predetermined synthesis units from the synthesis units and connecting the predetermined synthesis units to one another to generate a synthesis speech.
35. A speech synthesis apparatus comprising:
a speech segment generator for generating a plurality of synthesis speech segments by changing at least one of a pitch and a duration of each of a plurality of second speech segments in accordance with at least one of the pitch and duration of each of a plurality of first speech segments labeled with phonetic contexts;
a phonetic context cluster generator for generating a plurality of phonetic context clusters on the basis of a distance between the synthesis speech segments and the first speech segments;
a synthesis unit selector for selecting a plurality of synthesis units corresponding to the phonetic context clusters from the second speech segments on the basis of the distance; and
a speech synthesis unit for generating a synthesis speech by selecting those of the synthesis units, which correspond to the phonetic context clusters including phonetic contexts of input phonemes, and connecting the selected synthesis units.
36. A speech synthesis apparatus comprising:
a storage for prestoring information on a plurality of speech synthesis units including at least speech spectrum parameters;
a selector for selecting predetermined information from the stored information on the speech synthesis units;
a speech synthesis section for generating a synthesis speech signal by connecting the selected predetermined information; and
an emphasis section including a formant emphasis filter whose filtering coefficient is determined in accordance with the spectrum parameters of the selected information for emphasizing a formant of the synthesis speech signal.
Specification