Streaming encoder, prosody information encoding device, prosody-analyzing device, and device and method for speech synthesizing
First Claim
1. A speech-synthesizing device, comprising:
- a hierarchical prosodic module generating at least a first hierarchical prosodic model;
a prosody structure analyzing device, receiving a low-level linguistic feature, a high-level linguistic feature and a first prosodic feature, and generating at least a prosodic tag based on the low-level linguistic feature, the high-level linguistic feature, the first prosodic feature and the first hierarchical prosodic model, wherein the prosodic tag includes a prosodic break sequence describing at least an inter-syllable pause duration and a prosodic state sequence defining at least a syllable pitch contour, a syllable duration and a syllable energy level, and describes a Mandarin Chinese prosodic hierarchical structure including a syllable, a prosodic word, a prosodic phrase and one of a breath group and a prosodic phrase group;
a prosody-synthesizing unit synthesizing a second prosodic feature based on the hierarchical prosodic module, the low-level linguistic feature and the prosodic tag;
a prosodic feature extractor receiving a speech input and the low-level linguistic feature, segmenting the speech input to form a segmented speech, and generating the first prosodic feature based on the low-level linguistic feature and the segmented speech; and
a prosody-synthesizing device, wherein the first hierarchical prosodic model is generated based on a first speech speed, on a condition that when the prosody-synthesizing device is going to generate a second speech speed being different from the first speech speed, the first hierarchical prosodic model is replaced with a second hierarchical prosodic model having the second speech speed and the prosody-synthesizing unit changes the second prosodic feature to a third prosodic feature, and the speech-synthesizing device generates a speech synthesis based on the third prosodic feature and the low-level linguistic feature.
1 Assignment
0 Petitions
Accused Products
Abstract
A speech-synthesizing device includes a hierarchical prosodic module, a prosody-analyzing device, and a prosody-synthesizing unit. The hierarchical prosodic module generates at least a first hierarchical prosodic model. The prosody-analyzing device receives a low-level linguistic feature, a high-level linguistic feature and a first prosodic feature, and generates at least a prosodic tag based on the low-level linguistic feature, the high-level linguistic feature, the first prosodic feature and the first hierarchical prosodic model. The prosody-synthesizing unit synthesizes a second prosodic feature based on the hierarchical prosodic module, the low-level linguistic feature and the prosodic tag.
11 Citations
8 Claims
-
1. A speech-synthesizing device, comprising:
-
a hierarchical prosodic module generating at least a first hierarchical prosodic model; a prosody structure analyzing device, receiving a low-level linguistic feature, a high-level linguistic feature and a first prosodic feature, and generating at least a prosodic tag based on the low-level linguistic feature, the high-level linguistic feature, the first prosodic feature and the first hierarchical prosodic model, wherein the prosodic tag includes a prosodic break sequence describing at least an inter-syllable pause duration and a prosodic state sequence defining at least a syllable pitch contour, a syllable duration and a syllable energy level, and describes a Mandarin Chinese prosodic hierarchical structure including a syllable, a prosodic word, a prosodic phrase and one of a breath group and a prosodic phrase group; a prosody-synthesizing unit synthesizing a second prosodic feature based on the hierarchical prosodic module, the low-level linguistic feature and the prosodic tag; a prosodic feature extractor receiving a speech input and the low-level linguistic feature, segmenting the speech input to form a segmented speech, and generating the first prosodic feature based on the low-level linguistic feature and the segmented speech; and a prosody-synthesizing device, wherein the first hierarchical prosodic model is generated based on a first speech speed, on a condition that when the prosody-synthesizing device is going to generate a second speech speed being different from the first speech speed, the first hierarchical prosodic model is replaced with a second hierarchical prosodic model having the second speech speed and the prosody-synthesizing unit changes the second prosodic feature to a third prosodic feature, and the speech-synthesizing device generates a speech synthesis based on the third prosodic feature and the low-level linguistic feature. - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. A method for synthesizing a speech, comprising steps of:
-
providing a hierarchical prosodic module, a low-level linguistic feature, a high-level linguistic feature and a first prosodic feature; generating at least a prosodic tag based on the low-level linguistic feature, the high-level linguistic feature, the first prosodic feature and the hierarchical prosodic module, wherein the prosodic tag includes a prosodic break sequence describing at least an inter-syllable pause duration and a prosodic state sequence defining at least a syllable pitch contour, a syllable duration and a syllable energy level, and describes a Mandarin Chinese prosodic hierarchical structure including a syllable, a prosodic word, a prosodic phrase and one of a breath group and a prosodic phrase group; and outputting the speech according to the prosodic tag. - View Dependent Claims (8)
-
Specification