Converting text-to-speech and adjusting corpus

US 7,617,105 B2
Filed: 05/27/2005
Issued: 11/10/2009
Est. Priority Date: 05/31/2004
Status: Active Grant

First Claim

1. A method for text to speech conversion, comprising:

a text analysis step for parsing the text to obtain descriptive prosody annotations of the text based on a text to speech model generated from a first corpus;

a prosody parameter prediction step for predicting the prosody parameter of the text according to the result of text analysis step; and

a speech synthesis step for synthesizing speech of said text based on said predicted prosody parameter of the text;

wherein descriptive prosody annotations of the text include prosody structure of the text, the prosody structure of the text is adjusted according to a target speech speed for the synthesized speech,

wherein said descriptive prosody annotations of the text further include pronunciation and accent annotation;

wherein said prosody parameters of the text include the value of pitch, duration and energy;

wherein said prosody structure includes prosody word, prosody phrase and intonation phrase;

wherein said prosody structure of the text is adjusted by adjusting the distribution of the prosody phrase length of the text;

wherein said first corpus has a first distribution of prosody phrase length corresponding to a first threshold for prosody boundary probability under a first speech speed, the distribution of the prosody phrase length of the text is adjusted by the following steps;

adjusting the distribution of the prosody phrase length of the first corpus by adjusting the first threshold for prosody boundary probability; and

carrying out said text analysis step by parsing the text according to the adjusted first corpus, and further comprising;

acoustically evaluating the synthesized speech of the text; and

adjusting the prosody structure of the text according to the acoustic evaluation result,

wherein said target speech speed corresponds to a second speech speed of a second corpus,

wherein said prosody structure includes prosody phrase, said prosody structure of the text is adjusted by adjusting the distribution of the prosody phrase length of the text to a target distribution,

wherein said first corpus having a first distribution for prosody phrase length corresponding to a first threshold for prosody boundary probability under a first speech speed, said second corpus having a second distribution for prosody phrase length corresponding to a second threshold for prosody boundary probability under said second speech speed, the prosody structure of the text is adjusted by the following steps;

adjusting the first threshold for prosody boundary probability according to the target speech speed, such that the distribution for prosody phrase length of the first corpus matches that of the second corpus; and

carrying out the text analysis step by parsing the text according to the adjusted first corpus, and

wherein the prosody parameter is adjusted according to the target speech speed;

wherein the duration of the prosody parameter is adjusted according to the target speech speed;

wherein the prosody phrase length distribution of the text is adjusted with a curve fitting method;

wherein the prosody phrase length distribution of the text is adjusted by adjusting the distribution of prosody phrase with maximum length or maximum phrase number,

wherein adjusting the prosody structure of the text further comprises adjusting the intonation phrase of the text.
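The threshold adjustment recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how the two phrase length distributions are compared, so the sketch assumes the mean phrase length as a simple stand-in for the full distribution. The names `phrase_lengths`, `corpus_mean_length`, and `fit_threshold`, the per-word boundary probabilities, and the candidate grid are all hypothetical.

```python
from statistics import mean


def phrase_lengths(boundary_probs, threshold):
    """Return prosody phrase lengths (in words) for one sentence.

    boundary_probs[i] is an assumed predicted probability that a prosody
    phrase boundary follows word i. A boundary is placed wherever the
    probability reaches the threshold; the sentence end always closes
    the last phrase.
    """
    lengths, current = [], 0
    last = len(boundary_probs) - 1
    for i, p in enumerate(boundary_probs):
        current += 1
        if p >= threshold or i == last:
            lengths.append(current)
            current = 0
    return lengths


def corpus_mean_length(corpus, threshold):
    """Mean phrase length over all sentences of a corpus at a given threshold."""
    return mean(n for sent in corpus for n in phrase_lengths(sent, threshold))


def fit_threshold(corpus, target_mean, candidates=None):
    """Pick the candidate threshold whose mean phrase length is closest to the target.

    The target would come from the second corpus (the one recorded at the
    target speech speed). Matching only the mean is an assumption of this sketch.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return min(candidates,
               key=lambda t: abs(corpus_mean_length(corpus, t) - target_mean))


if __name__ == "__main__":
    first_corpus = [[0.1, 0.6, 0.2, 0.9, 0.3, 0.0]]  # made-up probabilities
    t = fit_threshold(first_corpus, target_mean=3.0)
    print(t, phrase_lengths(first_corpus[0], t))
```

Raising the threshold removes low-probability boundaries and so lengthens the phrases; lowering it does the opposite. A fuller version might compare histograms of phrase length rather than a single mean.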

Abstract

A method for text to speech conversion and for adjusting a corpus. The method includes a text analysis step for parsing the text to obtain descriptive prosody annotations of the text based on a TTS model generated from a first corpus; a prosody parameter prediction step for predicting the prosody parameter of the text according to the result of the text analysis step; a speech synthesis step for synthesizing speech of the text based on the prosody parameter of the text; and, when necessary, an adjustment according to a target speech speed for the synthesized speech.
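The claim also states that the duration component of the prosody parameter is adjusted according to the target speech speed. A minimal Python sketch of one way this could look is given here, assuming a simple linear scaling of durations by the ratio of the first corpus speed to the target speed. The scaling rule, the `ProsodyParams` type, the field units, and the function `scale_durations` are illustrative assumptions and are not taken from the patent.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProsodyParams:
    """Predicted prosody parameters for one unit (hypothetical units)."""
    pitch_hz: float
    duration_ms: float
    energy: float


def scale_durations(params, base_speed, target_speed):
    """Rescale durations for a target speech speed.

    Speeds are assumed to be in the same unit (for example syllables per
    second). Linear scaling is an assumption of this sketch: doubling the
    speed halves every duration, while pitch and energy are left unchanged.
    """
    if base_speed <= 0 or target_speed <= 0:
        raise ValueError("speech speeds must be positive")
    factor = base_speed / target_speed
    return [replace(p, duration_ms=p.duration_ms * factor) for p in params]


if __name__ == "__main__":
    predicted = [ProsodyParams(200.0, 100.0, 1.0), ProsodyParams(180.0, 60.0, 0.8)]
    for p in scale_durations(predicted, base_speed=4.0, target_speed=8.0):
        print(p)
```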

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Zhu, Wei Bin; Shi, Qin; Zhang, Wei; Chai, Hai Xin
Primary Examiner(s): Opsasnick; Michael N
Application Number: US11/140,190
Publication Number: US 20050267758A1
Time in Patent Office: 1,628 Days
Field of Search: 704/260, 704/267
US Class Current: 704/260
CPC Class Codes:
G10L 13/10 Prosody rules derived from ...
G10L 21/04 Time compression or expansion