×

Streaming encoder, prosody information encoding device, prosody-analyzing device, and device and method for speech synthesizing

  • US 9,837,084 B2
  • Filed: 01/30/2014
  • Issued: 12/05/2017
  • Est. Priority Date: 02/05/2013
  • Status: Active Grant
First Claim
Patent Images

1. A speech-synthesizing device, comprising:

  • a hierarchical prosodic module generating at least a first hierarchical prosodic model;

    a prosody structure analyzing device, receiving a low-level linguistic feature, a high-level linguistic feature and a first prosodic feature, and generating at least a prosodic tag based on the low-level linguistic feature, the high-level linguistic feature, the first prosodic feature and the first hierarchical prosodic model, wherein the prosodic tag includes a prosodic break sequence describing at least an inter-syllable pause duration and a prosodic state sequence defining at least a syllable pitch contour, a syllable duration and a syllable energy level, and describes a Mandarin Chinese prosodic hierarchical structure including a syllable, a prosodic word, a prosodic phrase and one of a breath group and a prosodic phrase group;

    a prosody-synthesizing unit synthesizing a second prosodic feature based on the hierarchical prosodic module, the low-level linguistic feature and the prosodic tag;

    a prosodic feature extractor receiving a speech input and the low-level linguistic feature, segmenting the speech input to form a segmented speech, and generating the first prosodic feature based on the low-level linguistic feature and the segmented speech; and

    a prosody-synthesizing device, wherein the first hierarchical prosodic model is generated based on a first speech speed, on a condition that when the prosody-synthesizing device is going to generate a second speech speed being different from the first speech speed, the first hierarchical prosodic model is replaced with a second hierarchical prosodic model having the second speech speed and the prosody-synthesizing unit changes the second prosodic feature to a third prosodic feature, and the speech-synthesizing device generates a speech synthesis based on the third prosodic feature and the low-level linguistic feature.

View all claims
  • 1 Assignment
Timeline View
Assignment View
    ×
    ×