
Emotional-speech synthesizing device, method of operating the same and mobile terminal including the same

  • US 9,881,603 B2
  • Filed: 09/18/2014
  • Issued: 01/30/2018
  • Est. Priority Date: 01/21/2014
  • Status: Active Grant
First Claim

1. A method for emotional speech synthesizing of a mobile terminal, the method comprising:

  • receiving, via a controller, a control command for outputting emotional speech;

  • recognizing, via the controller, an input sentence comprising words;

  • calculating, via the controller, a probability vector of multiple pre-defined emotions for each of the words that make up the recognized sentence, wherein the probability vector represents the frequency of usage of each of the multiple pre-defined emotions for each of the words in a database (DB) environment;

  • applying, via the controller, a weight to the probability vector of the multiple pre-defined emotions of each of the words as used in a real environment;

  • adjusting a final value of the probability vector based on context information on the recognized sentence;

  • estimating, via the controller, an emotion and a rhythm of each of the words;

  • generating, via the controller, one integration emotion rhythm model based on the estimated rhythm and the context information, wherein the one integration emotion rhythm model estimates one integration rhythm based on the context information on the recognized sentence without estimating a separate rhythm for the emotion of each word;

  • calculating, via the controller, in stages, degrees of similarity in emotion and rhythm between adjacent words of the recognized sentence based on the estimated emotion and the generated integration emotion rhythm model, wherein the probability vector of the multiple pre-defined emotions is updated to reflect the result of learning obtained through calculations of the probability vector;

  • applying, via the controller, a different weight to all phoneme candidates corresponding to each of the words based on the degrees of similarity in the estimated emotion and the estimated rhythm and on the final value of the probability vector;

  • selecting, via the controller, through a Viterbi search based on a cost function, the one phoneme candidate whose pitch contour has the minimum distance from a target pitch contour, among all the phoneme candidates to which the different weight is applied;

  • synthesizing, via the controller, emotional speech that corresponds to the recognized sentence in optimal units by connecting the selected phoneme candidate for each of the words;

  • outputting the emotional speech synthesized from the input sentence; and

  • displaying the input sentence at the same speed as the speaker outputs the emotional speech.
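The early claim elements (a frequency-based probability vector per word, a real-environment weight, and a context-based adjustment of the final value) can be sketched as follows. This is a minimal illustration, not the patented implementation: the emotion set, the toy usage database, the weight values, and the `context_boost` adjustment are all assumptions introduced for the example.

```python
# Hypothetical sketch of the claim's per-word emotion probability vector:
# frequencies from a toy usage DB, a real-environment weight, and a
# context-based adjustment. All names and numbers are illustrative.

EMOTIONS = ("neutral", "happy", "sad", "angry")

# Toy "DB environment": usage counts of each emotion per word (assumed data).
USAGE_DB = {
    "great": {"neutral": 2, "happy": 7, "sad": 0, "angry": 1},
    "news":  {"neutral": 6, "happy": 2, "sad": 1, "angry": 1},
}

# Assumed real-environment weights (e.g. learned from field data).
REAL_ENV_WEIGHT = {"neutral": 1.0, "happy": 1.2, "sad": 0.9, "angry": 0.8}

def probability_vector(word):
    """Frequency-of-usage distribution of the pre-defined emotions for a word."""
    counts = USAGE_DB[word]
    total = sum(counts.values())
    return {e: counts[e] / total for e in EMOTIONS}

def apply_weights_and_context(vec, context_boost):
    """Weight the DB probabilities, then adjust the final value by
    sentence-level context (context_boost maps an emotion to a
    multiplicative adjustment), renormalizing at the end."""
    weighted = {e: vec[e] * REAL_ENV_WEIGHT[e] * context_boost.get(e, 1.0)
                for e in EMOTIONS}
    norm = sum(weighted.values())
    return {e: v / norm for e, v in weighted.items()}

def estimate_emotion(word, context_boost):
    """Return the most probable emotion for a word plus its final vector."""
    final = apply_weights_and_context(probability_vector(word), context_boost)
    return max(final, key=final.get), final
```

For example, with a context that boosts "happy", `estimate_emotion("great", {"happy": 1.5})` selects "happy" with a final vector that still sums to 1.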
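The selection step (a Viterbi search over weighted phoneme candidates, minimizing a cost based on distance from a target pitch contour) can be sketched with a small dynamic program. This is an assumed reconstruction: the candidate representation, the Euclidean contour distance, and the pitch-mismatch join cost are illustrative choices, not the cost function specified by the patent.

```python
# Hypothetical sketch of Viterbi unit selection: each word position has
# several phoneme-unit candidates, each carrying a pitch contour and a
# per-candidate weight (from the emotion/rhythm similarity step). The cost
# combines weighted distance to the target pitch contour with a join cost
# between adjacent candidates. Names and costs are illustrative.

def contour_distance(a, b):
    """Euclidean distance between two pitch contours of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def join_cost(prev, cur):
    """Pitch mismatch at the concatenation point of two candidates."""
    return abs(prev["pitch"][-1] - cur["pitch"][0])

def viterbi_select(candidates_per_word, targets):
    """candidates_per_word[i]: list of dicts {"pitch": [...], "weight": w};
    targets[i]: target pitch contour for word i.
    Returns the index of the selected candidate at each position
    (the minimum-total-cost path)."""
    n = len(candidates_per_word)
    # cost[i][j]: best total cost ending at candidate j of word i
    first = candidates_per_word[0]
    cost = [[c["weight"] * contour_distance(c["pitch"], targets[0])
             for c in first]]
    back = [[None] * len(first)]
    for i in range(1, n):
        row, ptr = [], []
        for c in candidates_per_word[i]:
            local = c["weight"] * contour_distance(c["pitch"], targets[i])
            best_j = min(range(len(cost[i - 1])),
                         key=lambda j: cost[i - 1][j] +
                         join_cost(candidates_per_word[i - 1][j], c))
            row.append(cost[i - 1][best_j] +
                       join_cost(candidates_per_word[i - 1][best_j], c) + local)
            ptr.append(best_j)
        cost.append(row)
        back.append(ptr)
    # Trace back the minimum-cost path.
    j = min(range(len(cost[-1])), key=lambda k: cost[-1][k])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return path[::-1]
```

With two words, two candidates each, and targets matching the low-pitch candidates, the search picks the low-pitch candidate at both positions, since both its target distance and its join cost are smaller.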

  • 1 Assignment