Speech synthesis device, speech synthesis method, and speech synthesis program
Abstract
Provided are a speech synthesis device, a speech synthesis method, and a speech synthesis program that can represent a phoneme with a duration shorter than the duration used when the phoneme was modeled by a statistical method. A speech synthesis device 80 according to the present invention includes a phoneme boundary updating means 81 that updates a phoneme boundary position, which is the boundary between a phoneme and its neighboring phonemes, by using a voiced utterance likelihood index, which indicates the degree of voiced utterance likelihood of each state representing a phoneme modeled by a statistical method.
15 Claims
1. A speech synthesis device comprising:
hardware including a processor, wherein the processor is configured to:
by using a voiced utterance likelihood index which is an index indicating a degree of voiced utterance likelihood of each state which represents a phoneme modeled by a statistical method, update a phoneme boundary position which is a boundary with other phonemes neighboring to the phoneme; and
calculate a duration of each phoneme based on the updated phoneme boundary position, and generate synthesized speech based on the calculated duration of phoneme,
wherein, when a phoneme before and after a phoneme boundary is an unvoiced sound and a voiced sound, the processor is configured to determine whether a state before and after the phoneme boundary indicated a voiced state or an unvoiced state by using the voiced utterance likelihood index, and
wherein, when the state before and after the phoneme boundary are both determined as voiced state or an unvoiced state, the processor is configured to update the phoneme boundary position to move in a predetermined direction according to the state.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9

10. A speech synthesis method comprising,
by using a voiced utterance likelihood index which is an index indicating a degree of voiced utterance likelihood of each state which represents a phoneme modeled by a statistical method, updating a phoneme boundary position which is a boundary with other phonemes neighboring to the phoneme; and
calculating a duration of each phoneme based on the updated phoneme boundary position, and generating synthesized speech based on the calculated duration of phoneme,
wherein, when a phoneme before and after a phoneme boundary is an unvoiced sound and a voiced sound, determining whether a state before and after the phoneme boundary indicates a voiced state or an unvoiced state by using the voiced utterance likelihood index, and
wherein, when the state before and after the phoneme boundary are both determined as voiced state or an unvoiced state, updating the phoneme boundary position to move in a predetermined direction according to the state.
Dependent claims: 11, 12

13. A non-transitory computer readable information recording medium storing a speech synthesis program that, when executed by a processor, performs a method for:
by using a voiced utterance likelihood index which is an index indicating a degree of voiced utterance likelihood of each state which represents a phoneme modeled by a statistical method, updating a phoneme boundary position which is a boundary with other phonemes neighboring to the phoneme; and
calculating a duration of each phoneme based on the updated phoneme boundary position, and generating synthesized speech based on the calculated duration of phoneme,
wherein, when a phoneme before and after a phoneme boundary is an unvoiced sound and a voiced sound, determining whether a state before and after the phoneme boundary indicates a voiced state or an unvoiced state by using the voiced utterance likelihood index, and
wherein, when the state before and after the phoneme boundary are both determined as voiced state or an unvoiced state, updating the phoneme boundary position to move in a predetermined direction according to the state.
Dependent claims: 14, 15

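The boundary update and duration calculation recited in the independent claims can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: a state counts as voiced when its index exceeds a hypothetical threshold of 0.5; the boundary moves one state at a time; when both states at the boundary are voiced the boundary moves into the unvoiced phoneme, and when both are unvoiced it moves into the voiced phoneme; and each phoneme keeps at least one state. The names update_boundary and phoneme_durations are hypothetical.

```python
from typing import List


def update_boundary(indices: List[float], boundary: int,
                    left_voiced: bool, right_voiced: bool,
                    lo: int, hi: int, threshold: float = 0.5) -> int:
    """Return the updated boundary between two neighboring phonemes.

    indices   -- voiced utterance likelihood index of each state, in time order
    boundary  -- index of the first state of the right-hand phoneme
    lo, hi    -- first state of the left phoneme, one past the last state of the right
    threshold -- assumed cut-off above which a state is treated as voiced
    """
    # The claims only apply when one phoneme is unvoiced and the other voiced.
    if left_voiced == right_voiced:
        return boundary
    b = boundary
    while True:
        before = indices[b - 1] > threshold  # state just before the boundary
        after = indices[b] > threshold       # state just after the boundary
        if before != after:
            break  # boundary already sits on a voiced/unvoiced transition
        # Assumed direction: the phoneme whose voicing matches both states grows.
        step = 1 if before == left_voiced else -1
        if not (lo < b + step < hi):
            break  # assumption: keep at least one state in each phoneme
        b += step
    return b


def phoneme_durations(frames: List[int], boundaries: List[int]) -> List[int]:
    """Sum per-state frame counts between consecutive boundary positions."""
    return [sum(frames[a:b]) for a, b in zip(boundaries, boundaries[1:])]


# Unvoiced phoneme (states 0-2) followed by a voiced phoneme (states 3-5).
indices = [0.1, 0.2, 0.8, 0.9, 0.9, 0.9]
frames = [2, 2, 2, 3, 3, 3]
new_boundary = update_boundary(indices, 3, False, True, lo=0, hi=6)
durations = phoneme_durations(frames, [0, new_boundary, 6])
```

In this example the last state of the unvoiced phoneme already looks voiced, so the boundary moves from state 3 to state 2. The unvoiced phoneme then lasts 4 frames instead of the 6 frames implied by its modeled states, which is the shortening effect the abstract describes.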
Specification