Converting text-to-speech and adjusting corpus

  • US 7,617,105 B2
  • Filed: 05/27/2005
  • Issued: 11/10/2009
  • Est. Priority Date: 05/31/2004
  • Status: Active Grant
First Claim

1. A method for text to speech conversion, comprising:

    a text analysis step for parsing the text to obtain descriptive prosody annotations of the text based on a text to speech model generated from a first corpus;

    a prosody parameter prediction step for predicting the prosody parameter of the text according to the result of text analysis step; and

    a speech synthesis step for synthesizing speech of said text based on said predicted prosody parameter of the text;

    wherein descriptive prosody annotations of the text include prosody structure of the text, the prosody structure of the text is adjusted according to a target speech speed for the synthesized speech,

    wherein said descriptive prosody annotations of the text further include pronunciation and accent annotation;

    wherein said prosody parameters of the text include the value of pitch, duration and energy;

    wherein said prosody structure includes prosody word, prosody phrase and intonation phrase;

    wherein said prosody structure of the text is adjusted by adjusting the distribution of the prosody phrase length of the text;

    wherein said first corpus has a first distribution of prosody phrase length corresponding to a first threshold for prosody boundary probability under a first speech speed, the distribution of the prosody phrase length of the text is adjusted by the following steps:

    adjusting the distribution of the prosody phrase length of the first corpus by adjusting the first threshold for prosody boundary probability; and

    carrying out said text analysis step by parsing the text according to the adjusted first corpus, and further comprising:

    acoustically evaluating the synthesized speech of the text; and

    adjusting the prosody structure of the text according to the acoustic evaluation result,

    wherein said target speech speed corresponds to a second speech speed of a second corpus,

    wherein said prosody structure includes prosody phrase, said prosody structure of the text is adjusted by adjusting the distribution of the prosody phrase length of the text to a target distribution,

    wherein said first corpus having a first distribution for prosody phrase length corresponding to a first threshold for prosody boundary probability under a first speech speed, said second corpus having a second distribution for prosody phrase length corresponding to a second threshold for prosody boundary probability under said second speech speed, the prosody structure of the text is adjusted by the following steps:

    adjusting the first threshold for prosody boundary probability according to the target speech speed, such that the distribution for prosody phrase length of the first corpus matches that of the second corpus; and

    carrying out the text analysis step by parsing the text according to the adjusted first corpus, and

    wherein the prosody parameter is adjusted according to the target speech speed;

    wherein the duration of the prosody parameter is adjusted according to the target speech speed;

    wherein the prosody phrase length distribution of the text is adjusted with a curve fitting method;

    wherein the prosody phrase length distribution of the text is adjusted by adjusting the distribution of prosody phrase with maximum length or maximum phrase number,

    wherein adjusting the prosody structure of the text further comprises adjusting the intonation phrase of the text.
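
Illustrative sketch (not part of the patent text): the claim's threshold adjustment can be pictured as a search over the prosody boundary probability threshold until the phrase lengths of the first corpus resemble those of a target. The Python below is a minimal sketch under stated assumptions. It assumes each word carries a hypothetical model probability that a prosody phrase boundary follows it, and it matches only the mean phrase length as a simplified stand-in for matching the full distribution. The function names (`phrase_lengths`, `mean_phrase_length`, `fit_threshold`) and the bisection search are assumptions for illustration; the claim itself names only a curve fitting method.

```python
from typing import List, Sequence


def phrase_lengths(boundary_probs: Sequence[float], threshold: float) -> List[int]:
    """Split one sentence into prosody phrases and return their lengths in words.

    boundary_probs[i] is the assumed probability that a phrase boundary
    follows word i. The last word always closes the final phrase.
    """
    if not boundary_probs:
        return []
    lengths: List[int] = []
    current = 0
    for p in boundary_probs[:-1]:
        current += 1
        if p >= threshold:  # boundary accepted at this juncture
            lengths.append(current)
            current = 0
    lengths.append(current + 1)  # final word ends the last phrase
    return lengths


def mean_phrase_length(corpus: Sequence[Sequence[float]], threshold: float) -> float:
    """Average phrase length over all sentences of the corpus."""
    all_lengths = [n for sent in corpus for n in phrase_lengths(sent, threshold)]
    return sum(all_lengths) / len(all_lengths)


def fit_threshold(corpus: Sequence[Sequence[float]], target_mean: float,
                  iterations: int = 40) -> float:
    """Bisection search for the smallest threshold whose mean phrase length
    reaches target_mean. Raising the threshold removes boundaries, so the
    mean phrase length never decreases as the threshold grows.
    """
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mean_phrase_length(corpus, mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # One six-word sentence with made-up boundary probabilities.
    first_corpus = [[0.9, 0.2, 0.6, 0.1, 0.8, 0.3]]
    t = fit_threshold(first_corpus, target_mean=2.0)
    print(t, phrase_lengths(first_corpus[0], t))
```

In this reading, a faster target speech speed would correspond to a larger target mean (fewer, longer prosody phrases), and the re-thresholded corpus would then drive the text analysis step. A fuller version would compare whole length histograms rather than a single mean.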
