Emotional-speech synthesizing device, method of operating the same and mobile terminal including the same
First Claim
1. A method for emotional speech synthesizing of a mobile terminal, the method comprising:
receiving, via a controller, a control command for outputting of emotional speech;
recognizing, via the controller, an input sentence comprising words;
calculating, via the controller, a probability vector of multiple pre-defined emotions for each of the words that make up the recognized sentence, wherein the probability vector represents a frequency of usage of each of the multiple pre-defined emotions for each of the words in a database (DB) environment;
applying, via the controller, a weight of the probability vector of the multiple pre-defined emotions of each of the words that are used in a real environment;
adjusting a final value of the probability vector based on context information on the recognized sentence;
estimating, via the controller, an emotion and a rhythm of each of the words;
generating, via the controller, one integration emotion rhythm model based on the estimated rhythm and the context information, wherein the one integration emotion rhythm model estimates one integration rhythm based on the context information on the recognized sentence without estimating a separate rhythm for the emotion of each word;
calculating, via the controller, in stages degrees of similarity in an emotion and a rhythm between adjacent words of the recognized sentence based on the estimated emotion and the generated integration emotion rhythm model, wherein the probability vector of the multiple pre-defined emotions is updated to reflect the result of learning that is obtained through calculations of the probability vector;
applying, via the controller, a different weight to all phoneme candidates corresponding to each of the words based on the degrees of the similarity in the estimated emotion and the estimated rhythm and the final value of the probability vector;
selecting, via the controller, one phoneme candidate having a pitch contour that has a minimum distance value from a target pitch contour, among all the phoneme candidates to which the different weight is applied, through a Viterbi search that is based on a cost function; and
synthesizing, via the controller, an emotional speech that corresponds to the recognized sentence in optimal units by connecting the selected phoneme candidate for each of the words;
outputting the emotional speech that is synthesized from the input text sentence; and
displaying the input text sentence at the same speed as the speaker outputs the emotional speech.
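The per-word emotion steps above (DB frequencies normalized into a probability vector, a real-environment weight, then a context-based adjustment) can be sketched as follows. This is a toy illustration only: the emotion set, the sample dictionary `DB_FREQUENCIES`, the weighting rule, and the context `boost` factor are all invented placeholders, not the patent's actual data or formulas.

```python
# Hypothetical sketch of the claim's per-word emotion probability vector
# pipeline. All lexicon entries and weights below are invented examples.

EMOTIONS = ["neutral", "happy", "sad", "angry"]

# Toy "emotion word dictionary": usage frequency of each emotion per word
# as observed in a database (DB) environment.
DB_FREQUENCIES = {
    "great": [2, 10, 1, 1],
    "loss":  [3, 0, 9, 2],
}

def probability_vector(word):
    """Normalize DB usage frequencies into a probability vector."""
    freqs = DB_FREQUENCIES.get(word, [1] * len(EMOTIONS))
    total = sum(freqs)
    return [f / total for f in freqs]

def apply_real_environment_weight(vector, weights):
    """Re-weight the DB-derived vector by real-environment usage weights."""
    weighted = [v * w for v, w in zip(vector, weights)]
    total = sum(weighted)
    return [v / total for v in weighted]

def adjust_for_context(vector, context_emotion, boost=1.5):
    """Adjust the final value of the vector using sentence-context information
    (here, simply boosting the emotion suggested by context)."""
    idx = EMOTIONS.index(context_emotion)
    adjusted = vector[:]
    adjusted[idx] *= boost
    total = sum(adjusted)
    return [v / total for v in adjusted]
```

With this sketch, `probability_vector("great")` peaks at "happy" because the toy dictionary records that emotion as most frequent for the word.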
1 Assignment
0 Petitions
Abstract
Provided is an emotional-speech synthesizing device including: a sentence recognition unit that recognizes a sentence that is input; a word emotion determination unit that calculates a probability vector of pre-defined emotions for each word that makes up the recognized sentence and estimates the emotion and a rhythm based on the probability vector; and an emotional-speech synthesizing unit. The emotional-speech synthesizing unit calculates in stages degrees of similarity in the emotion and the rhythm between adjacent words based on context information on the recognized sentence, applies a weight to the phoneme candidates corresponding to each word based on the degrees of similarity and the probability vector, selects the phoneme candidate that has a minimum distance value from a target pitch, a target duration time, and a target pitch contour, and thus synthesizes an emotional speech that corresponds to the recognized sentence in optimal units.
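The claim's candidate selection, a Viterbi search over weighted phoneme candidates driven by a cost function, can be sketched as below. The cost structure is an assumption for illustration: a target cost (Euclidean distance of a candidate's pitch contour from the target contour) plus a user-supplied concatenation cost between adjacent candidates; the patent's actual cost function is not reproduced here.

```python
# Minimal unit-selection sketch: pick one phoneme candidate per word by a
# Viterbi search minimizing target cost + concatenation cost. The specific
# costs are assumed, not taken from the patent.

def contour_distance(contour, target):
    """Euclidean distance between a candidate pitch contour and the target."""
    return sum((a - b) ** 2 for a, b in zip(contour, target)) ** 0.5

def viterbi_select(candidates_per_word, target_contours, concat_cost):
    """Return, per word, the index of the selected candidate.

    candidates_per_word: list (one entry per word) of candidate pitch contours.
    target_contours: target pitch contour per word.
    concat_cost: function(prev_contour, next_contour) -> float.
    """
    n = len(candidates_per_word)
    # best[i][j]: minimal accumulated cost ending at candidate j of word i.
    best = [[contour_distance(c, target_contours[0])
             for c in candidates_per_word[0]]]
    back = []
    for i in range(1, n):
        row, brow = [], []
        for cand in candidates_per_word[i]:
            tcost = contour_distance(cand, target_contours[i])
            costs = [best[i - 1][k]
                     + concat_cost(candidates_per_word[i - 1][k], cand)
                     for k in range(len(candidates_per_word[i - 1]))]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_best] + tcost)
            brow.append(k_best)
        best.append(row)
        back.append(brow)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for brow in reversed(back):
        j = brow[j]
        path.append(j)
    return list(reversed(path))
```

With two candidates per word, the search favors the candidates whose contours match the targets while keeping junctions between adjacent selections smooth.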
5 Claims
3. A mobile terminal comprising:
a key configured to input a control command for synthesizing an emotional speech;
a memory configured to store an emotion word dictionary in which each of the words is classified as an entry having multiple pre-defined emotions;
a controller configured to:
receive at least one sentence comprising words that is input as text, based on the control command,
calculate a probability vector of the multiple pre-defined emotions for each of the words that make up the recognized sentence, wherein the probability vector represents a frequency of usage of each of the multiple pre-defined emotions for each of the words in a database (DB) environment, and wherein the probability vector of the multiple pre-defined emotions is updated to reflect the result of learning that is obtained through calculations of the probability vector,
apply a weight of the probability vector of the multiple pre-defined emotions of each of the words that are used in a real environment,
adjust a final value of the probability vector based on context information on the recognized sentence,
estimate an emotion and a rhythm of each of the words,
generate one integration emotion rhythm model based on the estimated rhythm and the context information, wherein the one integration emotion rhythm model estimates one integration rhythm based on the context information on the recognized sentence without estimating a separate rhythm for the emotion of each word,
calculate in stages degrees of similarity in an emotion and a rhythm between adjacent words of the recognized sentence based on the estimated emotion and the generated integration emotion rhythm model,
apply a different weight to all phoneme candidates corresponding to each of the words based on the degrees of the similarity in the estimated emotion and the estimated rhythm and the final value of the probability vector,
select one phoneme candidate having a pitch contour that has a minimum distance value from a target pitch contour, among all the phoneme candidates to which the different weight is applied, through a Viterbi search that is based on a cost function, and
synthesize the emotional speech that corresponds to the recognized sentence in optimal units by connecting the selected phoneme candidate for each of the words;
a speaker configured to output the emotional speech that is synthesized from the input text sentence; and
a display configured to display the input text sentence at the same speed as the speaker outputs the emotional speech.
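The claim's staged similarity computation between adjacent words, and the use of that similarity to re-weight phoneme candidates, can be sketched as follows. The similarity measure (cosine similarity of the per-word emotion probability vectors) and the scaling rule are assumptions chosen for illustration, not the patent's actual formulas.

```python
# Toy sketch: degrees of similarity in emotion between adjacent words,
# computed as cosine similarity of their emotion probability vectors, then
# applied as a weight to candidate costs. Both rules are assumed.

def cosine_similarity(u, v):
    """Cosine similarity between two emotion probability vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def adjacent_similarities(word_vectors):
    """Stage-wise similarity between each pair of adjacent words."""
    return [cosine_similarity(word_vectors[i], word_vectors[i + 1])
            for i in range(len(word_vectors) - 1)]

def weight_candidates(base_costs, similarity):
    """Scale candidate costs by neighbor similarity: higher emotional
    similarity lowers the effective cost, favoring smooth transitions
    (an assumed rule for illustration)."""
    return [c * (2.0 - similarity) for c in base_costs]
```

Identical adjacent vectors give similarity 1.0 and leave candidate costs unchanged, while dissimilar neighbors inflate costs, penalizing abrupt emotional jumps.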
Specification