EMOTIONAL-SPEECH SYNTHESIZING DEVICE, METHOD OF OPERATING THE SAME AND MOBILE TERMINAL INCLUDING THE SAME

US 20160329043A1
Filed: 09/18/2014
Published: 11/10/2016
Est. Priority Date: 01/21/2014
Status: Active Grant

Abstract

Provided is an emotional-speech synthesizing device including: a sentence recognition unit that recognizes a sentence that is input; a word emotion determination unit that calculates a probability vector of a pre-defined emotion for each word that makes up the recognized sentence and estimates the emotion and a rhythm based on the probability vector; and an emotional-speech synthesizing unit. The emotional-speech synthesizing unit calculates, in stages, degrees of similarity in the emotion and the rhythm between adjacent words based on context information on the recognized sentence, applies a weight to a phoneme candidate corresponding to each word based on the degrees of similarity and the probability vector, selects the phoneme candidate that has a minimum target pitch, a minimum duration time, and a minimum distance value of a target pitch contour, and thus synthesizes an emotional speech that corresponds to the recognized sentence in optimal units.
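
The per-word probability vectors and the similarity between adjacent words described in the abstract can be illustrated with a minimal sketch. The emotion set, the example vectors, and the use of cosine similarity are assumptions made for illustration only; the source does not state which emotions are pre-defined or which similarity measure is used.

```python
import math

# Hypothetical emotion set; the source only says the emotions are pre-defined.
EMOTIONS = ("joy", "sadness", "anger", "neutral")


def cosine_similarity(p, q):
    """Cosine similarity between two emotion probability vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def adjacent_similarities(vectors):
    """Similarity for each pair of adjacent words in the sentence."""
    return [cosine_similarity(a, b) for a, b in zip(vectors, vectors[1:])]


def estimate_emotion(vector):
    """Pick the emotion with the highest probability for one word."""
    best_index = max(range(len(vector)), key=vector.__getitem__)
    return EMOTIONS[best_index]


# Illustrative vectors for a three-word sentence (made-up values).
sentence_vectors = [
    [0.7, 0.1, 0.1, 0.1],
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
]
sims = adjacent_similarities(sentence_vectors)
```

Here `sims` holds one value per adjacent word pair, so a sentence of three words yields two similarity values; a synthesizer could use a low value as a cue that the emotion changes between the two words.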

20 Claims

1. An emotional-speech synthesizing unit that is configured to:
- calculate in stages degrees of similarity in the emotion and the rhythm between the adjacent words based on context information on the recognized sentence,
  
  apply weight to a phoneme candidate corresponding to the each word based on the degrees of the similarity and the probability vector,
  
  select the phoneme candidate that has a minimum target pitch, minimum duration time, a minimum distance value of a target pitch contour, and
  
  synthesize an emotional speech that corresponds to the recognized sentence in optimal units.
- Dependent claims (2, 3, 4, 5, 6, 7, 8)
  - 2. The emotional-speech synthesizing device of claim 1, further comprising a sound output unit that is configured to output the emotional speech that is synthesized by the emotional-speech synthesizing unit.
  - 3. The emotional-speech synthesizing device of claim 1, further comprising:
    - a phoneme conversion unit that is configured to analyze the recognized sentence and convert the recognized sentence into phonemes according to a linguistic feature, wherein the word emotion determination unit calculates the probability vector of the emotion that is pre-defined for the each word that makes up the sentence that is converted into the phonemes.
  - 4. The emotional-speech synthesizing device of claim 1, wherein when calculating the probability vector, the word emotion determination unit applies weight of the probability vector of the emotion of the each word that is used in a real environment.
  - 5. The emotional-speech synthesizing device of claim 4, wherein the word emotion determination unit performs updating to reflect a result of learning that is obtained through the repeated calculations of the probability vector.
  - 6. The emotional-speech synthesizing device of claim 1, wherein based on the context information on the recognized sentence, the word emotion determination unit calculates a final value of the probability vector.
  - 7. The emotional-speech synthesizing device of claim 1, wherein when estimating a rhythm of the each word, based on the context information on the recognized sentence, the word emotion determination unit includes a context information field for generating one integration rhythm model.
  - 8. The emotional-speech synthesizing device of claim 1, further comprising an emotion word dictionary unit in which the each word is classified as an entry having at least multiple pre-defined emotions and the categorized words are stored as entries to create an emotion word dictionary.
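
Claims 4, 5, and 8 together describe an emotion word dictionary whose per-word probability vectors are weighted by real-world usage and updated through learning. The following is a minimal sketch of one way such a dictionary could behave. The class name, the emotion set, and the count-based update with add-one smoothing are all hypothetical; the claims do not specify how the weight is applied or how the learning result is stored.

```python
from collections import defaultdict

# Hypothetical emotion set; the claims only say the emotions are pre-defined.
EMOTIONS = ("joy", "sadness", "anger", "neutral")


class EmotionWordDictionary:
    """Toy emotion word dictionary (illustrative, not the patented design)."""

    def __init__(self):
        # word -> per-emotion usage counts, starting at 1.0 so that an
        # unseen word yields a uniform probability vector (assumption).
        self._counts = defaultdict(lambda: [1.0] * len(EMOTIONS))

    def observe(self, word, emotion, weight=1.0):
        """Record one real-environment use of `word` with `emotion`.

        `weight` stands in for the usage weight of claim 4 (assumed form).
        """
        self._counts[word][EMOTIONS.index(emotion)] += weight

    def probability_vector(self, word):
        """Return the normalized per-emotion probability vector for `word`."""
        counts = self._counts[word]
        total = sum(counts)
        return [c / total for c in counts]


dictionary = EmotionWordDictionary()
dictionary.observe("wonderful", "joy", weight=2.0)
vector = dictionary.probability_vector("wonderful")
```

Each call to `observe` shifts the stored vector toward the observed emotion, which is one simple reading of "updating to reflect a result of learning" in claim 5.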

9. A method of operating an emotional-speech synthesizing device, comprising:
- recognizing a sentence that is input;
  
  calculating probability vector of an emotion that is pre-defined for each word that makes up the recognized sentence;
  
  estimating the emotion and a rhythm based on the calculated probability vector;
  
  calculating in stages degrees of similarity in the emotion and the rhythm between the adjacent words based on context information on the recognized sentence and applying weight to a phoneme candidate corresponding to the each word based on the degrees of the similarity and the probability vector; and
  
  selecting the phoneme candidate that has a minimum target pitch, minimum duration time, a minimum distance value of a target pitch contour, and thus synthesizing an emotional speech that corresponds to the recognized sentence in optimal units.
- Dependent claims (10, 11, 12)
  - 10. The method of claim 9, further comprising outputting the synthesized emotional speech.
  - 11. The method of claim 9, wherein the recognizing of the sentence that is input includes analyzing the recognized sentence and converting the recognized sentence into phonemes according to a linguistic feature.
  - 12. The method of claim 9, wherein in the calculating of the probability vector, when calculating the probability vector, weight of the probability vector of the emotion of the each word that is used in a real environment is applied.
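
The selecting step of claim 9 can be sketched as a weighted cost minimization over phoneme candidates. This sketch reads "minimum target pitch, minimum duration time, a minimum distance value of a target pitch contour" as minimum distance from a target pitch, a target duration, and a target pitch contour; that reading, the cost formula, the data classes, and the way the weight scales the cost are all assumptions, not details taken from the claims.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """Prosody of a phoneme candidate or of the target (hypothetical fields)."""
    pitch_hz: float
    duration_ms: float
    contour: tuple  # pitch contour sampled at fixed points, in Hz


def contour_distance(a, b):
    """Mean absolute difference between two equally sampled contours."""
    return sum(abs(x - y) for x, y in zip(a, b)) / max(len(a), 1)


def cost(candidate, target, weight=1.0):
    """Relative pitch, duration, and contour mismatch, scaled by `weight`.

    A smaller weight favours the candidate; in this sketch the weight stands
    in for the value derived from the similarity degrees and the probability
    vector.
    """
    pitch_term = abs(candidate.pitch_hz - target.pitch_hz) / target.pitch_hz
    duration_term = abs(candidate.duration_ms - target.duration_ms) / target.duration_ms
    contour_term = contour_distance(candidate.contour, target.contour) / target.pitch_hz
    return weight * (pitch_term + duration_term + contour_term)


def select_candidate(candidates, target, weights=None):
    """Return the candidate with the lowest weighted cost."""
    if weights is None:
        weights = [1.0] * len(candidates)
    pairs = zip(candidates, weights)
    return min(pairs, key=lambda pair: cost(pair[0], target, pair[1]))[0]


target = Unit(200.0, 100.0, (190.0, 200.0, 210.0))
far = Unit(250.0, 150.0, (240.0, 250.0, 260.0))
near = Unit(205.0, 110.0, (195.0, 205.0, 215.0))
chosen = select_candidate([far, near], target)
```

With equal weights the candidate closer to the target wins; passing a sufficiently small weight for the other candidate can reverse the choice, which shows how an emotion-derived weight could bias selection.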

13. A mobile terminal comprising:
- an input unit that is configured in such a manner that a control command for outputting an emotional speech is input to the input unit;
  
  a controller that is configured to:
  
  recognize at least one sentence that is input, based on the control command,
  
  calculate probability vector of an emotion that is pre-defined for each word that makes up the recognized sentence,
  
  estimate the emotion and a rhythm based on the probability vector,
  
  calculate in stages degrees of similarity in the emotion and the rhythm between the adjacent words based on context information on the recognized sentence,
  
  apply weight to a phoneme candidate corresponding to the each word based on the degrees of the similarity and the probability vector,
  
  select the phoneme candidate that has a minimum target pitch, minimum duration time, a minimum distance value of a target pitch contour, and
  
  synthesize an emotional speech that corresponds to the recognized sentence in optimal units, and
  
  a sound output unit that is configured to output the emotional speech that is synthesized by the controller.
- Dependent claims (14, 15, 16, 17, 18, 19, 20)
  - 14. The mobile terminal of claim 13, wherein the controller converts the recognized sentence into text and outputs the text-converted sentence to a display unit in such a manner as to correspond to the outputting of the emotional speech.
  - 15. The mobile terminal of claim 13, wherein the controller further includes a phoneme conversion module that analyzes the recognized sentence and converts the recognized sentence into phonemes according to a linguistic feature, and a context information module that generates one integration rhythm model based on the context information on the recognized sentence when estimating the rhythm of the each word.
  - 16. The mobile terminal of claim 13, wherein when calculating the probability vector, the controller applies weight of the probability vector of the emotion of the each word that is used in a real environment.
  - 17. The mobile terminal of claim 16, wherein the controller performs updating to reflect a result of learning that is obtained through the repeated calculations of the probability vector.
  - 18. The mobile terminal of claim 13, wherein based on the context information on the recognized sentence, the controller calculates a final value of the probability vector.
  - 19. The mobile terminal of claim 18, wherein the context information includes at least one or more among sentence division-reading information, part-of-speech information, and sentence structure information.
  - 20. The mobile terminal of claim 13, further comprising a memory in which the each word is classified as an entry having at least multiple pre-defined emotions and the categorized words are stored as entries to create an emotion word dictionary.

Current Assignee
LG Electronics, Inc. (LG Corporation)
Original Assignee
LG Electronics, Inc. (LG Corporation)
Inventors
YANG, Jongyeol, KIM, Jaemin

Granted Patent

US 9,881,603 B2
US Class Current

1/1
CPC Class Codes

G06F 3/167   Audio in a user interface, ...

G10L 13/02   Methods for producing synth...

G10L 13/0335   Pitch control

G10L 13/07   Concatenation rules

G10L 13/10   Prosody rules derived from ...

G10L 2013/105   Duration
