Method and system for achieving emotional text to speech utilizing emotion tags expressed as a set of emotion vectors

  • US 10,002,605 B2
  • Filed: 12/12/2016
  • Issued: 06/19/2018
  • Est. Priority Date: 08/31/2010
  • Status: Active Grant
First Claim

1. A method for achieving emotional Text To Speech (TTS), the method comprising:

  • receiving a set of text data;

    organizing each of a plurality of words in the set of text data into a plurality of rhythm pieces;

    generating an emotion tag for each of the plurality of rhythm pieces, wherein each emotion tag is expressed as a set of emotion vectors, each emotion vector comprising a plurality of emotion scores, where each of the plurality of emotion scores is assigned to a different emotion category in a plurality of emotion categories;

    determining, for each of the plurality of rhythm pieces, a final emotion score for the rhythm piece based on at least each of the plurality of emotion scores;

    determining, for each of the plurality of rhythm pieces, a final emotional category for the rhythm piece based on at least each of the plurality of emotion categories;

    applying emotion smoothing to the set of text data based on the emotion tags generated for the plurality of rhythm pieces, wherein applying emotion smoothing comprises determining a plurality of emotion paths based on adjacent probabilities between the final emotional categories determined for the plurality of rhythm pieces;

    determining a final emotion path from the plurality of emotion paths based on a sum of adjacent probability and a sum of emotion score for each emotion path in the plurality of emotion paths; and

    updating the final emotional category for each rhythm piece based on the final emotion path; and

    performing, by at least one processor of at least one computing device, TTS of the set of text data utilizing each of the emotion tags, where performing TTS comprises decomposing at least one rhythm piece in the plurality of rhythm pieces into a set of phones; and

    synthesizing the at least one rhythm piece into audio comprising at least one emotion characteristic based on at least one speech feature of each phone in the set of phones, where the at least one speech feature is calculated as a function of at least the final emotion score, the updated final emotion category, a speech feature value of a given speech feature in a neutral emotion category, and a speech feature value of a given speech feature in the updated final emotion category.
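
The steps recited in the claim can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the claim does not fix how the final emotion score, final category, or speech feature are computed, so the sketch assumes per-category averaging of the vectors in a tag, a "highest score wins" rule, and a linear blend between neutral and emotional feature values. The category set, the adjacency probabilities, and all function names are hypothetical, and the emotion paths are enumerated by brute force for readability.

```python
from itertools import product

# Hypothetical emotion categories; the claim only requires "a plurality".
CATEGORIES = ("neutral", "happy", "sad")

# Hypothetical adjacent probabilities between the categories of neighbouring
# rhythm pieces: staying in one category is likelier than switching.
ADJACENCY = {
    (a, b): 0.6 if a == b else (0.1 if {a, b} == {"happy", "sad"} else 0.3)
    for a in CATEGORIES
    for b in CATEGORIES
}


def piece_vector(tag):
    """Collapse a tag (a list of emotion vectors) into one vector.

    Averaging per category is an assumption made for this sketch.
    """
    return {c: sum(vec[c] for vec in tag) / len(tag) for c in CATEGORIES}


def final_score_and_category(vec):
    """Assumed rule: the highest-scoring category is the final category."""
    category = max(CATEGORIES, key=lambda c: vec[c])
    return vec[category], category


def smooth(vectors, adjacency):
    """Return the emotion path with the largest combined value.

    The value of a path is the sum of its adjacent probabilities plus the
    sum of its emotion scores. All paths are enumerated, which is
    exponential in the number of rhythm pieces.
    """
    def path_value(path):
        scores = sum(vec[c] for vec, c in zip(vectors, path))
        links = sum(adjacency[(a, b)] for a, b in zip(path, path[1:]))
        return scores + links

    return max(product(CATEGORIES, repeat=len(vectors)), key=path_value)


def speech_feature(score, neutral_value, emotion_value):
    """Assumed linear blend between neutral and emotional feature values."""
    return (1.0 - score) * neutral_value + score * emotion_value


# One tag per rhythm piece; the first tag holds two emotion vectors.
tags = [
    [{"neutral": 0.4, "happy": 0.5, "sad": 0.1},
     {"neutral": 0.2, "happy": 0.7, "sad": 0.1}],
    [{"neutral": 0.25, "happy": 0.35, "sad": 0.4}],
    [{"neutral": 0.2, "happy": 0.7, "sad": 0.1}],
]
vectors = [piece_vector(tag) for tag in tags]
initial = [final_score_and_category(vec)[1] for vec in vectors]
final_path = smooth(vectors, ADJACENCY)

print("before smoothing:", initial)
print("after smoothing: ", list(final_path))
# Hypothetical pitch values in Hz for one phone.
print("blended pitch:   ", speech_feature(0.5, 200.0, 260.0))
```

With these made-up numbers the middle rhythm piece starts out as "sad" but is relabelled "happy" by smoothing, because the path that stays in one category collects larger adjacent probabilities. A dynamic-programming search would give the same path without enumerating every combination.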
