Speech processing apparatus and program
Abstract
A speech synthesizer includes a periodic component fusing unit and an aperiodic component fusing unit. For each segment, a unit selector selects a plurality of speech units; the periodic component fusing unit fuses their periodic components, and the aperiodic component fusing unit fuses their aperiodic components. The speech synthesizer further includes an adder, which adds, edits, and concatenates the periodic components and the aperiodic components of the fused speech units to generate a speech waveform.
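The abstract describes a per-segment data flow: select several units, split each into periodic and aperiodic parts, fuse each part type separately, then add the two fused waveforms and concatenate across segments. The following is a minimal Python sketch of that flow under loudly stated assumptions. The `SpeechUnit` and `Segment` types, the toy target cost in `select_units`, the decomposition by pitch-cycle averaging, and fusion by stretching and averaging are all hypothetical stand-ins; the text here does not say how decomposition or fusion is actually performed. A practical system would align pitch marks and impose the target F0 (for example with pitch-synchronous overlap-add) rather than simply resampling, which in this sketch also shifts the pitch.

```python
from dataclasses import dataclass
from math import log
from typing import List, Sequence, Tuple


@dataclass
class SpeechUnit:
    """Hypothetical stored unit: waveform samples plus the prosody it was recorded with."""
    phoneme: str
    samples: List[float]
    period: int  # pitch period in samples (assumed known and > 0)
    f0: float    # fundamental frequency in Hz


@dataclass
class Segment:
    """One synthesis segment with its target prosody."""
    phoneme: str
    f0: float      # target F0 in Hz
    duration: int  # target length in samples


def select_units(seg: Segment, inventory: Sequence[SpeechUnit], n: int) -> List[SpeechUnit]:
    """Unit selector: the n lowest-cost units with a matching phoneme (toy target cost)."""
    def cost(u: SpeechUnit) -> float:
        return abs(log(u.f0 / seg.f0)) + abs(len(u.samples) - seg.duration) / seg.duration
    return sorted((u for u in inventory if u.phoneme == seg.phoneme), key=cost)[:n]


def decompose(unit: SpeechUnit) -> Tuple[List[float], List[float]]:
    """Decomposer stand-in: periodic part = mean pitch cycle tiled over the unit; aperiodic = residual."""
    x, period = unit.samples, unit.period
    sums, counts = [0.0] * period, [0] * period
    for i, v in enumerate(x):
        sums[i % period] += v
        counts[i % period] += 1
    cycle = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    periodic = [cycle[i % period] for i in range(len(x))]
    aperiodic = [v - p for v, p in zip(x, periodic)]
    return periodic, aperiodic


def resample(x: Sequence[float], length: int) -> List[float]:
    """Linear-interpolation resampling of x to `length` samples."""
    if length == 1 or len(x) == 1:
        return [x[0]] * length
    out = []
    for j in range(length):
        pos = j * (len(x) - 1) / (length - 1)
        i = min(int(pos), len(x) - 2)
        frac = pos - i
        out.append(x[i] * (1 - frac) + x[i + 1] * frac)
    return out


def fuse(components: Sequence[Sequence[float]], length: int) -> List[float]:
    """Fusing stand-in: stretch every component to `length` samples, then average sample-wise."""
    aligned = [resample(c, length) for c in components]
    return [sum(col) / len(aligned) for col in zip(*aligned)]


def synthesize(segments: Sequence[Segment], inventory: Sequence[SpeechUnit], n: int = 3) -> List[float]:
    """Per segment: select, decompose, fuse each component type, add; then concatenate segments."""
    out: List[float] = []
    for seg in segments:
        units = select_units(seg, inventory, n)  # first speech units
        if not units:
            raise ValueError(f"no units for phoneme {seg.phoneme!r}")
        parts = [decompose(u) for u in units]
        second = fuse([p for p, _ in parts], seg.duration)  # fused periodic components
        third = fuse([a for _, a in parts], seg.duration)   # fused aperiodic components
        out.extend(p + a for p, a in zip(second, third))    # add waveforms, append to output
    return out
```

Because the stand-in decomposition makes the periodic and aperiodic parts sum back to the original samples, selecting a single unit whose length already matches the target returns (up to rounding) that unit's waveform, which is a convenient sanity check for the pipeline wiring.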
21 Claims
1. A speech processing apparatus for carrying out text-to-speech synthesis, comprising:
an input unit to which a plurality of segments obtained by delimiting a phonological sequence corresponding to a target speech in units of synthesis and prosodic information on the respective segments corresponding to the target speech are entered;
a unit selector configured to select a plurality of first speech units from a group of speech units on the basis of the prosodic information for each of the plurality of segments;
a decomposer configured to decompose each of the plurality of first speech units into periodic components and aperiodic components for each of the plurality of segments;
a periodic component fusing unit configured to generate a second speech unit by fusing the periodic components of the plurality of first speech units for each of the plurality of segments;
an aperiodic component fusing unit configured to generate a third speech unit by fusing the aperiodic components of the plurality of first speech units for each of the plurality of segments; and
a generator configured to generate a synthesized speech by adding speech waveforms obtained respectively from the second speech unit and the third speech unit generated for each of the plurality of segments and concatenating the same among the segments.

7. A speech processing apparatus for carrying out text-to-speech synthesis, comprising:
an input unit to which a plurality of segments obtained by delimiting a phonological sequence corresponding to a target speech in units of synthesis and prosodic information on the respective segments corresponding to the target speech are entered;
an environment storage configured to store speech units' environments of a plurality of speech units;
a unit storage configured to store periodic components and aperiodic components of each of the speech units (which were decomposed from the waveform data of each of the speech units);
an environment selector configured to select the unit environments of a plurality of first speech units from the environment storage on the basis of the prosodic information for each of the plurality of segments;
a periodic component fusing unit configured to extract the periodic components of the first speech units corresponding to the selected unit environments of the plurality of first speech units from the unit storage and fuse the periodic components to generate a second speech unit for each of the plurality of segments;
an aperiodic component fusing unit configured to extract the aperiodic components of the first speech units corresponding to the unit environments of the plurality of first speech units from the unit storage and fuse the aperiodic components to generate a third speech unit for each of the plurality of segments; and
a generator configured to generate a synthesized speech by adding speech waveforms obtained respectively from the second speech units and the third speech units of the plurality of segments and concatenating the same among the segments.

14. A speech processing apparatus for creating a storage for storing a plurality of speech units used for text-to-speech synthesis comprising:
an input unit to which a plurality of segments obtained by delimiting a phonological sequence corresponding to a target speech in units of synthesis and prosodic information on the respective segments corresponding to the target speech are entered;
a unit selector configured to select a plurality of first speech units from a group of the speech units on the basis of the prosodic information for each of the plurality of segments;
a decomposer configured to decompose each of the plurality of first speech units into periodic components and aperiodic components for each of the plurality of segments;
a periodic component fusing unit configured to generate a second speech unit by fusing the periodic components of the plurality of first speech units for each of the plurality of segments;
an aperiodic component fusing unit configured to generate a third speech unit by fusing the aperiodic components of the plurality of first speech units for each of the plurality of segments; and
the storage configured to store the plurality of second speech units and the plurality of third speech units.

16. A speech processing apparatus for creating a storage configured to store a plurality of speech units used for text-to-speech synthesis comprising:
a unit storage configured to store periodic components and aperiodic components of each of the speech units (which were decomposed from the waveform data of each of the speech units);
an input unit to which a plurality of segments obtained by delimiting a phonological sequence corresponding to a target speech in units of synthesis and prosodic information on the respective segments corresponding to the target speech are entered;
a component selector configured to select the periodic components and the aperiodic components of a plurality of first speech units from the unit storage on the basis of the prosodic information for each of the plurality of segments;
a periodic component fusing unit configured to generate a second speech unit by fusing the periodic components of the plurality of first speech units for each of the plurality of segments;
an aperiodic component fusing unit configured to generate a third speech unit by fusing the aperiodic components of the plurality of first speech units for each of the plurality of segments; and
the storage configured to store the plurality of second speech units and the plurality of third speech units.

18. A speech processing program product configured to carry out text-to-speech synthesis and stored in a non-transitory computer readable medium, a computer realizing the functions of:
accepting a plurality of segments obtained by delimiting a phonological sequence corresponding to a target speech in units of synthesis and prosodic information on the respective segments corresponding to the target speech;
selecting a plurality of first speech units from a group of speech units on the basis of the prosodic information for each of the plurality of segments;
decomposing each of the plurality of first speech units into periodic components and aperiodic components for each of the plurality of segments;
generating a second speech unit by fusing the periodic components of the plurality of first speech units for each of the plurality of segments;
generating a third speech unit by fusing the aperiodic components of the plurality of first speech units for each of the plurality of segments; and
generating a synthesized speech by adding speech waveforms obtained respectively from the second speech unit and the third speech unit generated for each of the plurality of segments and concatenating the same among the segments.

19. A speech processing program product configured to carry out text-to-speech synthesis and stored in a non-transitory computer readable medium, a computer comprising:
an environment storage configured to store unit environments of a plurality of speech units;
a unit storage configured to store periodic components and aperiodic components of each of the speech units (which were decomposed from the waveform data of each of the speech units);
the computer realizing the functions of:
accepting a plurality of segments obtained by delimiting a phonological sequence corresponding to a target speech in units of synthesis and prosodic information on the respective segments corresponding to the target speech;
selecting the unit environments of a plurality of first speech units from the environment storage on the basis of the prosodic information for each of the plurality of segments;
extracting the periodic components of the first speech units corresponding to the selected unit environments of the plurality of first speech units from the unit storage and fusing the periodic components individually to generate a second speech unit for each of the plurality of segments;
extracting the aperiodic components of the first speech units corresponding to the selected unit environments of the plurality of first speech units from the unit storage and fusing the aperiodic components individually to generate a third speech unit for each of the plurality of segments; and
generating a synthesized speech by adding speech waveforms obtained respectively from the second speech unit and the third speech unit for each of the plurality of segments and concatenating the same among the segments.

20. A speech processing program product for creating a storage configured to store a plurality of speech units used for text-to-speech synthesis stored in a non-transitory computer readable medium, a computer realizing the functions of:
accepting a plurality of segments obtained by delimiting a phonological sequence corresponding to a target speech in units of synthesis and prosodic information on the respective segments corresponding to the target speech;
selecting a plurality of first speech units from a group of the speech units on the basis of the prosodic information for each of the plurality of segments;
decomposing each of the plurality of first speech units into periodic components and aperiodic components for each of the plurality of segments;
generating a second speech unit by fusing the periodic components of the plurality of first speech units for each of the plurality of segments;
generating a third speech unit by fusing the aperiodic components of the plurality of first speech units for each of the plurality of segments; and
storing the plurality of second speech units and the plurality of third speech units in the storage.

21. A speech processing program product for creating a storage configured to store a plurality of speech units used for text-to-speech synthesis stored in a non-transitory computer readable medium, a computer comprising:
a unit storage configured to store periodic components and aperiodic components of each of the plurality of speech units (which were decomposed from the waveform data of each of the speech units);
the computer realizing the functions of:
accepting a plurality of segments obtained by delimiting a phonological sequence corresponding to a target speech in units of synthesis and prosodic information on the respective segments corresponding to the target speech;
selecting the periodic components and the aperiodic components of a plurality of first speech units from the unit storage on the basis of the prosodic information for each of the plurality of segments;
generating a second speech unit by fusing the periodic components of the plurality of first speech units for each of the plurality of segments;
generating a third speech unit by fusing the aperiodic components of the plurality of first speech units for each of the plurality of segments; and
storing the plurality of second speech units and the plurality of third speech units in the storage.
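
Claims 14, 16, 20 and 21 end differently from claim 1: the fused second and third speech units are written to a storage instead of being added and concatenated. Claims 7, 16, 19 and 21 also read periodic and aperiodic components that were decomposed in advance from a unit storage. The Python sketch below illustrates that offline variant under hypothetical assumptions: `UnitStorage`, the (phoneme, F0) pair used as a unit "environment", the nearest-F0 selection rule, and fusion by plain averaging of components that are already time-aligned and of equal length are illustrative choices, not the method claimed here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Components = Tuple[List[float], List[float]]  # (periodic, aperiodic)


@dataclass
class UnitStorage:
    """Hypothetical unit storage: components decomposed ahead of time, plus each unit's environment."""
    components: Dict[int, Components] = field(default_factory=dict)
    environments: Dict[int, Tuple[str, float]] = field(default_factory=dict)  # unit id -> (phoneme, f0)

    def add(self, uid: int, phoneme: str, f0: float,
            periodic: List[float], aperiodic: List[float]) -> None:
        if len(periodic) != len(aperiodic):
            raise ValueError("periodic and aperiodic parts must have equal length")
        self.components[uid] = (periodic, aperiodic)
        self.environments[uid] = (phoneme, f0)


def average(parts: Sequence[Sequence[float]]) -> List[float]:
    """Toy fusion: sample-wise mean; assumes the parts are already time-aligned."""
    if len({len(p) for p in parts}) != 1:
        raise ValueError("parts must be non-empty and of equal length")
    return [sum(col) / len(parts) for col in zip(*parts)]


def build_fused_store(storage: UnitStorage, targets: Sequence[Tuple[str, float]],
                      n: int = 2) -> Dict[Tuple[str, float], Components]:
    """For each target (phoneme, f0): select n units by environment, fuse each component type, store."""
    store: Dict[Tuple[str, float], Components] = {}
    for phoneme, f0 in targets:
        ids = [u for u, (ph, _) in storage.environments.items() if ph == phoneme]
        ids.sort(key=lambda u: abs(storage.environments[u][1] - f0))  # nearest F0 first
        chosen = ids[:n]
        if not chosen:
            raise ValueError(f"no stored units for phoneme {phoneme!r}")
        second = average([storage.components[u][0] for u in chosen])  # fused periodic components
        third = average([storage.components[u][1] for u in chosen])   # fused aperiodic components
        store[(phoneme, f0)] = (second, third)
    return store
```

With such a store, a synthesis-time lookup of `store[(phoneme, f0)]` would yield the two fused components ready to be added and concatenated, so the selection and fusion work would be done once, offline, rather than per request.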
Specification