STREAMING ENCODER, PROSODY INFORMATION ENCODING DEVICE, PROSODY-ANALYZING DEVICE, AND DEVICE AND METHOD FOR SPEECH SYNTHESIZING

US 20140222421A1
Filed: 01/30/2014
Published: 08/07/2014
Est. Priority Date: 02/05/2013
Status: Active Grant

Abstract

A speech-synthesizing device includes a hierarchical prosodic module, a prosody-analyzing device, and a prosody-synthesizing unit. The hierarchical prosodic module generates at least a first hierarchical prosodic model. The prosody-analyzing device receives a low-level linguistic feature, a high-level linguistic feature and a first prosodic feature, and generates at least a prosodic tag based on the low-level linguistic feature, the high-level linguistic feature, the first prosodic feature and the first hierarchical prosodic model. The prosody-synthesizing unit synthesizes a second prosodic feature based on the hierarchical prosodic module, the low-level linguistic feature and the prosodic tag.

15 Citations

20 Claims

1. A speech-synthesizing device, comprising:
- a hierarchical prosodic module generating at least a first hierarchical prosodic model;
  
  a prosody-analyzing device, receiving a low-level linguistic feature, a high-level linguistic feature and a first prosodic feature, and generating at least a prosodic tag based on the low-level linguistic feature, the high-level linguistic feature, the first prosodic feature and the first hierarchical prosodic model; and
  
  a prosody-synthesizing unit synthesizing a second prosodic feature based on the hierarchical prosodic module, the low-level linguistic feature and the prosodic tag.
  - 2. A speech-synthesizing device as claimed in claim 1, further comprising:
    - a prosodic feature extractor receiving a speech input and the low-level linguistic feature, segmenting the speech input to form a segmented speech, and generating the first prosodic feature based on the low-level linguistic feature and the segmented speech.
  - 3. A speech-synthesizing device as claimed in claim 2 further comprising a prosody-synthesizing device, wherein the first hierarchical prosodic model is generated based on a first speech speed, on a condition that when the prosody-synthesizing device is going to generate a second speech speed being different from the first speech speed, the first hierarchical prosodic model is replaced with a second hierarchical prosodic model having the second speech speed and the prosody-synthesizing unit changes the second prosodic feature to a third prosodic feature.
  - 4. A speech-synthesizing device as claimed in claim 3, wherein the speech-synthesizing device generates a speech synthesis with the second synthesis speech based on the third prosodic feature and the low-level linguistic feature.
  - 5. A speech-synthesizing device as claimed in claim 1, further comprising:
    - an encoder receiving the prosodic tag and the low-level linguistic feature to generate a code stream; and
      
      a decoder receiving the code stream, and restoring the prosodic tag and the low-level linguistic feature.
  - 6. A speech-synthesizing device as claimed in claim 5, wherein the encoder includes a first codebook providing an encoding bit corresponding to the prosodic tag and the low-level linguistic feature so as to generate the code stream, and the decoder includes a second codebook providing the encoding bit to reconstruct code stream to the prosodic tag and the low-level linguistic feature.
  - 7. A speech-synthesizing device as claimed in claim 5, further comprising:
    - a prosody-synthesizing device receiving the prosodic tag and the low-level linguistic feature reconstructed by the decoder to generate the second prosodic feature including a syllable pitch contour, a syllable duration, a syllable energy level and an inter-syllable pause duration.
  - 8. A speech-synthesizing device as claimed in claim 7, wherein the second prosodic feature is reconstructed by a superposition module.
  - 9. A speech-synthesizing device as claimed in claim 7, wherein the syllable juncture pause duration is reconstructed by looking up a codebook.
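
The encoder and decoder of claims 5 and 6 can be illustrated with a pair of lookup tables. The following is only a minimal sketch: the tag alphabet (`B0` to `B3`), the use of lexical tone as the low-level linguistic feature, and the fixed-width bit assignment are all assumptions made for illustration and are not taken from the claims.

```python
from itertools import product

# Hypothetical symbol inventories; the claims do not enumerate them.
BREAK_TAGS = ["B0", "B1", "B2", "B3"]   # assumed prosodic-tag alphabet
TONES = [1, 2, 3, 4, 5]                 # assumed low-level linguistic feature


def build_codebook(symbols):
    """Assign every symbol a fixed-width bit string."""
    width = max(1, (len(symbols) - 1).bit_length())
    return {sym: format(i, f"0{width}b") for i, sym in enumerate(symbols)}


# One codebook entry per (prosodic tag, low-level feature) pair.
SYMBOLS = list(product(BREAK_TAGS, TONES))
ENC_BOOK = build_codebook(SYMBOLS)                        # "first codebook"
DEC_BOOK = {bits: sym for sym, bits in ENC_BOOK.items()}  # "second codebook"
WIDTH = len(next(iter(ENC_BOOK.values())))


def encode(pairs):
    """Map (tag, tone) pairs to a code stream of '0'/'1' characters."""
    return "".join(ENC_BOOK[p] for p in pairs)


def decode(stream):
    """Restore the (tag, tone) pairs from the code stream."""
    return [DEC_BOOK[stream[i:i + WIDTH]] for i in range(0, len(stream), WIDTH)]
```

Because both tables are built from the same symbol list, `decode(encode(pairs))` returns the original pairs. A variable-length code would be an equally valid reading of the claim language.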

10. A prosodic information encoding apparatus, comprising:
- a speech segmentation and prosodic feature extracting device receiving a speech input and a low-level linguistic feature to generate a first prosodic feature;
  
  a prosodic structure analysis unit receiving the first prosodic feature, the low-level linguistic feature and a high-level linguistic feature, and generating a prosodic tag based on the first prosodic feature, the low-level linguistic feature and the high-level linguistic feature; and
  
  an encoder receiving the prosodic tag and the low-level linguistic feature to generate a code stream.

11. A code stream generating apparatus, comprising:
- a prosodic feature extractor generating a first prosodic feature;
  
  a hierarchical prosodic module providing a prosodic structure meaning for the first prosodic feature; and
  
  an encoder generating a code stream based on the first prosodic feature having the prosodic structure meaning, wherein the hierarchical prosodic module has at least two parameters being ones selected from the group consisting of a syllable duration, a pitch contour, a pause timing, a pause frequency, a pause duration and a combination thereof.

12. A method for synthesizing a speech, comprising steps of:
- providing a hierarchical prosodic module, a low-level linguistic feature, a high-level linguistic feature and a first prosodic feature;
  
  generating at least a prosodic tag based on the low-level linguistic feature, the high-level linguistic feature, the first prosodic feature and the hierarchical prosodic module; and
  
  outputting the speech according to the prosodic tag.
  - 13. A method as claimed in claim 12, further comprising steps of:
    - providing an inputting speech;
      
      segmenting the inputting speech to generate a segmented input speech;
      
      extracting a prosodic feature from the segmented input speech according to the low-level linguistic feature to generate the first prosodic feature;
      
      analyzing the first prosodic feature to generate the prosodic tag;
      
      encoding the prosodic tag to form a code stream;
      
      decoding the code stream;
      
      synthesizing a second prosodic feature based on the low-level linguistic feature and the prosodic tag; and
      
      outputting the speech based on the low-level linguistic feature and the second prosodic feature.
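
The order of the steps in claims 12 and 13 can be sketched as a chain of functions. Every function body below is a toy placeholder chosen for illustration: the syllable boundaries are assumed to come from an external aligner, the analyzer is a simple pause threshold rather than a hierarchical prosodic model, and the two-tag alphabet (`B0`, `B3`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    tone: int      # low-level linguistic feature (assumed: lexical tone)
    start: float   # seconds
    end: float     # seconds


def segment(boundaries, tones):
    """Segment the input speech; boundaries are assumed to be given."""
    return [Syllable(t, s, e) for t, (s, e) in zip(tones, boundaries)]


def extract_features(syllables):
    """First prosodic feature: syllable duration and following pause."""
    feats = []
    for cur, nxt in zip(syllables, syllables[1:] + [None]):
        pause = (nxt.start - cur.end) if nxt else 0.0
        feats.append({"duration": cur.end - cur.start, "pause": pause})
    return feats


def analyze(feats, long_pause=0.2):
    """Toy analyzer: tag a major break after a long pause (an assumption)."""
    return ["B3" if f["pause"] >= long_pause else "B0" for f in feats]


def encode(tags):
    """Encode the prosodic tags as a bit string, one bit per syllable."""
    return "".join("1" if t == "B3" else "0" for t in tags)


def decode(stream):
    """Restore the prosodic tags from the bit string."""
    return ["B3" if b == "1" else "B0" for b in stream]


def synthesize(tones, tags, base=0.20, lengthening=0.05):
    """Second prosodic feature: toy durations with pre-break lengthening."""
    return [base + (lengthening if tag == "B3" else 0.0)
            for _, tag in zip(tones, tags)]
```

A run would chain the stages as `synthesize(tones, decode(encode(analyze(extract_features(segment(boundaries, tones))))))`; the final waveform generation step is left out of this sketch.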

14. A prosodic structure analysis unit, comprising:
- a first input terminal receiving a first prosodic feature;
  
  a second input terminal receiving a low-level linguistic feature;
  
  a third input terminal receiving a high-level linguistic feature; and
  
  an output terminal, wherein the prosodic structure analysis unit generates a prosodic tag at the output terminal based on the first prosodic feature, the low-level and the high-level linguistic features.

15. A speech-synthesizing device, comprising:
- a decoder receiving a code stream and restoring the code stream to generate a low-level linguistic feature and a prosodic tag;
  
  a hierarchical prosodic module receiving the low-level linguistic feature and the prosodic tag to generate a second prosodic feature; and
  
  a speech synthesizer generating a synthesized speech based on the low-level linguistic feature and the second prosodic feature.
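
Claim 15 has the hierarchical prosodic module turn the decoded tag and low-level linguistic feature into a second prosodic feature, claim 8 attributes that reconstruction to a superposition module, and claim 3 swaps in a second model for a different speech speed. The sketch below assumes that superposition means adding per-factor contributions to a global mean. That reading, the `ProsodicModel` type, and every numeric value are illustrative assumptions, not figures from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProsodicModel:
    """Hypothetical container for one speech-speed-specific model."""
    mean_duration: float   # global mean syllable duration, seconds
    tone_effect: dict      # contribution of the lexical tone
    break_effect: dict     # contribution of the following break tag


# Made-up parameters for a normal-speed model.
NORMAL = ProsodicModel(
    mean_duration=0.20,
    tone_effect={1: 0.02, 2: 0.00, 3: -0.01, 4: -0.02, 5: -0.04},
    break_effect={"B0": 0.00, "B1": 0.01, "B2": 0.03, "B3": 0.06},
)

# A second model for a faster speed; scaling by 0.75 is purely illustrative.
FAST = ProsodicModel(
    mean_duration=NORMAL.mean_duration * 0.75,
    tone_effect={k: v * 0.75 for k, v in NORMAL.tone_effect.items()},
    break_effect={k: v * 0.75 for k, v in NORMAL.break_effect.items()},
)


def reconstruct(model, tones, breaks):
    """Superpose the contributions to obtain one duration per syllable."""
    return [
        model.mean_duration + model.tone_effect[t] + model.break_effect[b]
        for t, b in zip(tones, breaks)
    ]
```

Calling `reconstruct(NORMAL, tones, breaks)` and `reconstruct(FAST, tones, breaks)` with the same tags yields two duration sequences, which mirrors the model replacement described in claim 3 without changing the transmitted tags.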

16. A prosodic structure analysis apparatus, comprising:
- a hierarchical prosodic module generating a hierarchical prosodic model; and
  
  a prosodic structure analysis unit receiving a first prosodic feature, a low-level linguistic feature and a high-level linguistic feature, and generating a prosodic tag based on the first prosodic feature, the low-level and the high-level linguistic features and the hierarchical prosodic model.
  - 17. A prosodic structure analysis apparatus as claimed in claim 16, wherein the low-level linguistic feature includes a base-syllable type of a language and a pitch of the language.
  - 18. A prosodic structure analysis apparatus as claimed in claim 16, wherein the high-level linguistic feature includes a word, a part of speech and a punctuation mark.
  - 19. A prosodic structure analysis apparatus as claimed in claim 16, wherein the prosodic feature includes a syllable pitch contour, a syllable duration, a syllable energy level and a syllable juncture pause duration.
  - 20. A prosodic structure analysis apparatus as claimed in claim 16, wherein the prosodic structure analysis device performs an optimization algorithm by referring to the low-level linguistic feature and the high-level linguistic feature to generate the prosodic tag.
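
Claim 20 refers to an optimization algorithm without naming one. As one possible illustration, the sketch below uses a Viterbi-style dynamic program to pick the highest-scoring tag sequence from a pause duration (a prosodic feature) and a punctuation flag (a high-level linguistic feature). The scoring functions, the two-tag alphabet, and the choice of Viterbi search are all assumptions, not the claimed method.

```python
TAGS = ["B0", "B3"]   # assumed two-tag alphabet: no break / major break


def emission(tag, pause, has_punct):
    """Hypothetical score of a tag given a pause and a punctuation flag."""
    expected_pause = 0.3 if tag == "B3" else 0.0
    score = -abs(pause - expected_pause) * 10.0
    if has_punct:
        score += 1.0 if tag == "B3" else -1.0
    return score


def transition(prev, cur):
    """Hypothetical penalty for two major breaks in a row."""
    return -2.0 if prev == cur == "B3" else 0.0


def best_tags(pauses, puncts):
    """Viterbi search for the highest-scoring tag sequence."""
    score = {t: emission(t, pauses[0], puncts[0]) for t in TAGS}
    back = []
    for pause, punct in zip(pauses[1:], puncts[1:]):
        new_score, pointers = {}, {}
        for cur in TAGS:
            # Best predecessor for this tag at the current position.
            prev = max(TAGS, key=lambda p: score[p] + transition(p, cur))
            new_score[cur] = (score[prev] + transition(prev, cur)
                              + emission(cur, pause, punct))
            pointers[cur] = prev
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final tag.
    path = [max(TAGS, key=lambda t: score[t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

The function expects one pause value and one punctuation flag per syllable juncture, for example `best_tags([0.0, 0.3, 0.0], [False, True, False])`.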

Current Assignee
National Chiao Tung University (Government of The Republic of China)
Original Assignee
National Chiao Tung University (Government of The Republic of China)
Inventors
Wang, Yih-Ru; Chen, Sin-Horng; Chiang, Chen-Yu; Hsieh, Chiao-Hua

Granted Patent

US 9,837,084 B2
Field of Search
US Class Current

704/208
CPC Class Codes

G10L 13/02   Methods for producing synth...

G10L 13/10   Prosody rules derived from ...

G10L 19/00   Speech or audio signals ana...

G10L 19/0018   Speech coding using phoneti...
