Sound synthesis device, sound synthesis method and storage medium
Abstract
A sound synthesis device that includes a processor configured to perform the following: extracting intonation information from prosodic information contained in sound data and digitally smoothing the extracted intonation information to obtain smoothed intonation information; obtaining a plurality of digital sound units based on text data and concatenating the plurality of digital sound units so as to construct a concatenated series of digital sound units that corresponds to the text data; and modifying the concatenated series of digital sound units in accordance with the smoothed intonation information with respect to at least one of parameters of the concatenated series of digital sound units to generate synthesized sound data corresponding to the text data.
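The abstract says only that the digital sound units are concatenated. It does not say how the joins between units are handled. Here is a minimal sketch that assumes each unit is a list of waveform samples at a common sampling rate and smooths each join with a short linear cross-fade. The `concatenate_units` function and its `overlap` parameter are hypothetical illustrations, not the method described in this patent.

```python
from typing import List, Sequence


def concatenate_units(units: Sequence[Sequence[float]], overlap: int = 32) -> List[float]:
    """Join sample sequences end to end with a linear cross-fade at each join.

    `overlap` (hypothetical parameter) is the maximum number of samples blended
    at a join. It is reduced automatically when a unit is shorter than that.
    """
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    out: List[float] = []
    for unit in units:
        samples = list(unit)
        # Blend only as many samples as both sides of the join can supply.
        n = min(overlap, len(out), len(samples))
        start = len(out) - n
        for i in range(n):
            # Weight of the incoming unit rises linearly across the overlap.
            w = (i + 1) / (n + 1)
            out[start + i] = (1.0 - w) * out[start + i] + w * samples[i]
        # Append the part of the unit that was not blended.
        out.extend(samples[n:])
    return out


# Usage: two 4-sample units joined with a 2-sample cross-fade give 6 samples.
joined = concatenate_units([[1.0] * 4, [0.0] * 4], overlap=2)
```

With `overlap=0` the function reduces to plain end-to-end concatenation. A real unit-selection system would usually align the joins at pitch marks as well, and this sketch does not attempt that.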
10 Claims
1. A sound synthesis device, comprising a processor configured to perform the following:
receiving text data and extracting phoneme sequence from the text data; obtaining a plurality of digital sound units from a speech corpus database based on the text data and concatenating the plurality of digital sound units so as to construct a concatenated series of digital sound units that corresponds to the text data; receiving oral input speech data and calculating, as a target prosody, at least one of pitch height, duration, and power parameters from the oral input speech data by referring to the phoneme sequence; and modifying the concatenated series of digital sound units in accordance with the target prosody to generate synthesized sound data corresponding to the input text data and the target prosody, wherein said processor smoothes a pitch sequence in the target prosody, and wherein, in smoothing said pitch sequence in the target prosody, said processor quantizes pitches of the pitch sequence, and smoothes the pitch sequence by acquiring a weighted moving average of the quantized pitches.
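Claim 1 smooths the target pitch sequence in two steps. It first quantizes the pitches and then takes a weighted moving average of the quantized values. The claim fixes neither the quantization grid nor the weights. The sketch below therefore makes three assumptions: a semitone grid relative to 440 Hz, a symmetric triangular window (1, 2, 3, 2, 1), and unvoiced frames marked with a pitch of 0 and excluded from the average. All function names and parameter values are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def quantize_pitch(f0_hz: Sequence[float], ref_hz: float = 440.0,
                   step_semitones: float = 1.0) -> List[Optional[float]]:
    """Snap each voiced pitch to the nearest point on a semitone grid.

    The grid (assumed) is ref_hz * 2 ** (k * step_semitones / 12) for integer k.
    Frames with f0 <= 0 are treated as unvoiced and returned as None.
    """
    out: List[Optional[float]] = []
    for f in f0_hz:
        if f <= 0:
            out.append(None)
            continue
        semitones = 12.0 * math.log2(f / ref_hz)
        snapped = round(semitones / step_semitones) * step_semitones
        out.append(ref_hz * 2.0 ** (snapped / 12.0))
    return out


def weighted_moving_average(values: Sequence[Optional[float]],
                            weights: Sequence[float] = (1.0, 2.0, 3.0, 2.0, 1.0)
                            ) -> List[Optional[float]]:
    """Centered weighted moving average that skips unvoiced (None) frames.

    Near the sequence edges or next to unvoiced frames, the weights are
    renormalized over the neighbors that actually exist.
    """
    if len(weights) % 2 == 0:
        raise ValueError("weights must have odd length so the window is centered")
    half = len(weights) // 2
    if weights[half] <= 0:
        raise ValueError("the center weight must be positive")
    out: List[Optional[float]] = []
    for i, v in enumerate(values):
        if v is None:
            out.append(None)  # keep unvoiced frames unvoiced
            continue
        acc = 0.0
        wsum = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(values) and values[j] is not None:
                acc += w * values[j]
                wsum += w
        out.append(acc / wsum)
    return out


def smooth_pitch(f0_hz: Sequence[float]) -> List[Optional[float]]:
    """Quantize first, then average, in the order the claim recites."""
    return weighted_moving_average(quantize_pitch(f0_hz))
```

One plausible reading of this ordering is that quantization discards small frame-to-frame jitter in the spoken pitch, and the moving average then softens the staircase steps that quantization introduces.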
6. A sound synthesis device, comprising a processor configured to perform the following:
receiving text data and extracting phoneme sequence from the text data; obtaining a plurality of digital sound units from a speech corpus database based on the text data and concatenating the plurality of digital sound units so as to construct a concatenated series of digital sound units that corresponds to the text data; receiving oral input speech data and calculating, as a target prosody, at least one of pitch height, duration, and power parameters from the oral input speech data by referring to the phoneme sequence; and modifying the concatenated series of digital sound units in accordance with the target prosody to generate synthesized sound data corresponding to the input text data and the target prosody, wherein said processor modifies a power sequence in the concatenated series of digital sound units so as to substantially match the target prosody, wherein said processor smoothes a power sequence in the target prosody, and wherein, in modifying the power sequence in the concatenated series of digital sound units, said processor smoothes the power sequence in the concatenated series of digital sound units, acquires a sequence of ratios between the smoothed power sequence in the concatenated series of digital sound units and the smoothed power sequence in the target prosody, and corrects the smoothed power sequence in the concatenated series of digital sound units in accordance with said sequence of ratios.
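Claim 6 corrects the power of the concatenated units in three steps. It smooths both power sequences, takes their frame-by-frame ratios, and applies those ratios. The claim fixes neither the smoothing window nor the direction of the ratio. The sketch below assumes the following:

- a centered boxcar moving average;
- frame-level power values aligned one-to-one with the target prosody frames;
- a gain of smoothed target over smoothed unit power, applied to the unsmoothed unit powers.

Under that reading, the units keep their fine frame-level detail while their smoothed envelope moves toward the target. Names and defaults are hypothetical.

```python
from typing import List, Sequence, Tuple


def moving_average(x: Sequence[float], width: int = 5) -> List[float]:
    """Centered boxcar average. The window shrinks at the sequence edges."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    xs = list(x)
    out: List[float] = []
    for i in range(len(xs)):
        seg = xs[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def correct_power(unit_power: Sequence[float], target_power: Sequence[float],
                  width: int = 5, eps: float = 1e-12) -> Tuple[List[float], List[float]]:
    """Rescale frame powers of the concatenated units toward the target envelope.

    Returns (corrected_power, ratios), where ratios[i] is the smoothed target
    power divided by the smoothed unit power at frame i (assumed direction).
    """
    if len(unit_power) != len(target_power):
        raise ValueError("power sequences must be frame-aligned")
    unit_smooth = moving_average(unit_power, width)
    target_smooth = moving_average(target_power, width)
    # eps guards against division by zero in silent stretches.
    ratios = [t / max(u, eps) for u, t in zip(unit_smooth, target_smooth)]
    # Apply the gain to the unsmoothed powers so frame-level detail is kept.
    corrected = [p * r for p, r in zip(unit_power, ratios)]
    return corrected, ratios
```

If the power values are energies, the matching gain on sample amplitudes is the square root of each ratio.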
9. A method of synthesizing sound performed by a processor in a sound synthesis device, the method comprising:
receiving text data and extracting phoneme sequence from the text data; obtaining a plurality of digital sound units from a speech corpus database based on the text data and concatenating the plurality of digital sound units so as to construct a concatenated series of digital sound units that corresponds to the text data; receiving oral input speech data and calculating, as a target prosody, at least one of pitch height, duration, and power parameters from the oral input speech data by referring to the phoneme sequence; and modifying the concatenated series of digital sound units in accordance with the target prosody to generate synthesized sound data corresponding to the input text data and the target prosody, wherein said processor smoothes a pitch sequence in the target prosody, and wherein, in smoothing said pitch sequence in the target prosody, said processor quantizes pitches of the pitch sequence, and smoothes the pitch sequence by acquiring a weighted moving average of the quantized pitches.
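Each independent claim computes the target prosody from the spoken input "by referring to the phoneme sequence" but does not say how. One common approach, assumed here, is to align the phoneme sequence to the input speech (for example with a forced aligner, not shown) and then summarize frame-level pitch and power within each phoneme's segment. The sketch makes three further assumptions:

- the alignment arrives as (phoneme, start_frame, end_frame) tuples;
- the frame shift is 5 ms;
- unvoiced frames carry a pitch of 0.

`target_prosody` and `PhonemeProsody` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class PhonemeProsody:
    phoneme: str
    duration_s: float
    mean_pitch_hz: float  # 0.0 when the segment has no voiced frames
    mean_power: float


def target_prosody(alignment: Sequence[Tuple[str, int, int]],
                   f0_hz: Sequence[float], power: Sequence[float],
                   frame_shift_s: float = 0.005) -> List[PhonemeProsody]:
    """Summarize frame-level pitch and power per aligned phoneme segment.

    alignment holds (phoneme, start_frame, end_frame) with end exclusive.
    It is assumed to come from a forced aligner, which is not shown here.
    """
    if len(f0_hz) != len(power):
        raise ValueError("pitch and power tracks must have the same frame count")
    result: List[PhonemeProsody] = []
    for phoneme, start, end in alignment:
        if not 0 <= start < end <= len(f0_hz):
            raise ValueError(f"bad segment for {phoneme!r}: {start}..{end}")
        # Average pitch over voiced frames only (f0 <= 0 marks unvoiced).
        voiced = [f for f in f0_hz[start:end] if f > 0]
        pitch = sum(voiced) / len(voiced) if voiced else 0.0
        seg_power = power[start:end]
        result.append(PhonemeProsody(
            phoneme=phoneme,
            duration_s=(end - start) * frame_shift_s,
            mean_pitch_hz=pitch,
            mean_power=sum(seg_power) / len(seg_power),
        ))
    return result
```

Per-phoneme averages are only one possible granularity. The pitch-smoothing step of the claims operates on a pitch *sequence*, so an implementation might equally keep the frame-level track and use the alignment only to map it onto the synthesized units.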
10. A non-transitory storage medium that stores instructions executable by a processor included in a sound synthesis device, said instructions causing the processor to perform the following:
receiving text data and extracting phoneme sequence from the text data; obtaining a plurality of digital sound units from a speech corpus database based on the text data and concatenating the plurality of digital sound units so as to construct a concatenated series of digital sound units that corresponds to the text data; receiving oral input speech data and calculating, as a target prosody, at least one of pitch height, duration, and power parameters from the oral input speech data by referring to the phoneme sequence; and modifying the concatenated series of digital sound units in accordance with the target prosody to generate synthesized sound data corresponding to the input text data and the target prosody, wherein said processor smoothes a pitch sequence in the target prosody, and wherein, in smoothing said pitch sequence in the target prosody, said processor quantizes pitches of the pitch sequence, and smoothes the pitch sequence by acquiring a weighted moving average of the quantized pitches.
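Modifying the concatenated units to match a target duration means lengthening or shortening each unit. The claim text does not describe a technique for this, and production concatenative systems commonly use overlap-add methods such as PSOLA. The sketch below is deliberately crude. It assumes each unit is a list of analysis frames, that the target length of each unit is already known in frames, and that nearest-index resampling (repeating or dropping frames) is acceptable. The helper names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def stretch_frames(frames: Sequence[T], target_len: int) -> List[T]:
    """Resample a unit's frames to target_len by nearest-index lookup.

    Lengthening repeats frames and shortening drops them. This is a crude
    stand-in for overlap-add methods and introduces audible artifacts on
    real audio.
    """
    if len(frames) == 0:
        raise ValueError("cannot stretch an empty unit")
    if target_len < 0:
        raise ValueError("target_len must be non-negative")
    n = len(frames)
    # Output frame i takes source frame floor(i * n / target_len), always < n.
    return [frames[(i * n) // target_len] for i in range(target_len)]


def apply_durations(units: Sequence[Sequence[T]], target_lengths: Sequence[int]) -> List[T]:
    """Stretch each unit to its target frame count and concatenate the results."""
    if len(units) != len(target_lengths):
        raise ValueError("need one target length per unit")
    out: List[T] = []
    for unit, length in zip(units, target_lengths):
        out.extend(stretch_frames(unit, length))
    return out
```

Because `stretch_frames` is generic over the frame type, the same helper can resample pitch marks, power values, or spectral frames alongside one another so they stay aligned.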
Specification