Customizing the speaking style of a speech synthesizer based on semantic analysis

US 7,096,183 B2
Filed: 02/27/2002
Issued: 08/22/2006
Est. Priority Date: 02/27/2002
Status: Active Grant

4 Assignments

0 Petitions

Abstract

A method is provided for customizing the speaking style of a speech synthesizer. The method includes: receiving input text; determining semantic information for the input text; determining a speaking style for rendering the input text based on the semantic information; and customizing the audible speech output of the speech synthesizer based on the identified speaking style.

198 Citations

9 Claims

1. A method for generating synthesized speech, comprising:
- receiving a block of input text into a text-to-speech synthesizing system;
  
  partitioning the block of input text into a plurality of context spaces each containing multiple phrases;
  
  performing semantic analysis on each context space in order to identify a topic for each context space;
  
  selecting a speaking style for each context space from a plurality of predefined speaking styles based on the topics identified respective of the context spaces, where each speaking style correlates to prosodic parameters and is associated with one or more anticipated topics;
  
  converting the sentences to corresponding phoneme data;
  
  applying prosodic parameters which correlate to the selected speaking style to the phoneme data, thereby generating a prosodic representation of the phoneme data; and
  
  generating audible speech using the prosodic representation of the phoneme data.
- Dependent Claims (2, 6, 7, 8)
- - 2. The method of claim 1 wherein the step of determining a topic for the input text further comprises:
    - defining a plurality of anticipated topics, such that each anticipated topic is associated with keywords that are indicative of the topic;
      
      determining frequency of the keywords in the input text; and
      
      selecting a topic for the input text from the plurality of anticipated topics based on the frequency of keyword occurrences contained therein.
  - 6. The method of claim 1 wherein the step of customizing an output parameter further comprises generating synthesized speech.
  - 7. The method of claim 1 wherein the step of customizing an output parameter further comprises correlating the selected speaking style to one or more prosodic parameters and rendering audible speech for the input text using the prosodic parameters.
  - 8. The method of claim 1 wherein the step of customizing an output parameter further comprises modifying at least one of an expression of a visually displayed talking head and another attribute of a visual display.
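
As an illustration of the keyword-frequency step recited in claim 2, the following is a minimal sketch, not the patented implementation. The topic names, keyword sets, and the `select_topic` helper are hypothetical; the claim only requires that each anticipated topic be associated with indicative keywords and that the topic be chosen from keyword frequency.

```python
import re
from collections import Counter

# Hypothetical table of anticipated topics and their indicative keywords.
ANTICIPATED_TOPICS = {
    "sports": {"game", "team", "score", "season"},
    "finance": {"market", "stock", "shares", "earnings"},
}


def select_topic(text, topics=ANTICIPATED_TOPICS):
    """Return the anticipated topic whose keywords occur most often, or None."""
    # Count every lowercase word in the input text.
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    # Sum the occurrences of each topic's keywords.
    counts = {topic: sum(words[k] for k in kws) for topic, kws in topics.items()}
    best = max(counts, key=counts.get)
    # Report no topic when none of the keywords appear at all.
    return best if counts[best] > 0 else None
```

Calling `select_topic("The team won the game; the score was close.")` picks `"sports"` under this table, while text containing none of the keywords yields `None`, which a caller could map to a default speaking style.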

3. A method for customizing the speaking style of a text-to-speech synthesizer system, comprising:
- receiving a block of input text which;
  
  partitioning the block of input text into a plurality of context spaces each containing multiple phrases;
  
  determining semantic information for each context space;
  
  selecting a speaking style for each context space from a plurality of predefined speaking styles based on the semantic information, where each speaking style correlates to prosodic parameters and is associated with one or more anticipated topics; and
  
  customizing an output parameter of a multimedia user interface of the text-to-speech synthesizer system based on the speaking style, where the text-to-speech synthesizer system is operable to render audible speech which correlates to the input text.
- Dependent Claims (4, 5)
- - 4. The method of claim 3 wherein the step of determining semantic information further comprises determining a topic for the input text.
  - 5. The method of claim 3 wherein the step of determining semantic information further comprises partitioning the input text into a plurality of context spaces, and determining a topic for each of the plurality of context spaces.

9. A text-to-speech synthesizer system, comprising:
- a text analyzer receptive of a block of input text and operable to partition the block of input text into a plurality of context spaces each containing multiple phrases and determine semantic information for each context space;
  
  a style selector adapted to receive semantic information from the text analyzer and operable to determine, for each context space, a speaking style for rendering the input text contained in that context space based on the semantic information, where the selected speaking style correlates to one or more prosodic attributes;
  
  a phonetic analyzer adapted to receive input text from the text analyzer and operable to convert the input text into corresponding phoneme data;
  
  a prosodic analyzer adapted to receive phoneme data from the phonetic analyzer and the prosodic attributes from the style selector, the prosodic analyzer further operable to apply the prosodic attributes to the phoneme data to form a prosodic representation of the phoneme data; and
  
  a speech synthesizer adapted to receive the prosodic representation of the phoneme data from the prosodic analyzer and operable to generate audible speech.
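
The component chain recited in claim 9 can be pictured with a toy pipeline. This is only a sketch under stated assumptions: the paragraph-based partitioning, the keyword and style tables, the `pitch_scale` and `rate_scale` parameters, and the word-level stand-in for phoneme data are all hypothetical, and no audio is produced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Style:
    name: str
    pitch_scale: float  # hypothetical prosodic parameter
    rate_scale: float   # hypothetical prosodic parameter


# Hypothetical tables: keywords per topic, and the style tied to each topic.
KEYWORDS = {
    "sports": {"game", "team", "score"},
    "news": {"government", "election", "report"},
}
STYLES = {
    "sports": Style("excited", 1.2, 1.15),
    "news": Style("formal", 1.0, 0.95),
}
DEFAULT_STYLE = Style("neutral", 1.0, 1.0)


def partition(text):
    """Text analyzer: treat each blank-line-separated paragraph as a context space."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def identify_topic(space):
    """Text analyzer: pick the topic with the most keyword hits, or None."""
    words = [w.strip(".,;:!?").lower() for w in space.split()]
    hits = {t: sum(w in kws for w in words) for t, kws in KEYWORDS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] else None


def select_style(topic):
    """Style selector: map a topic to its style, falling back to neutral."""
    return STYLES.get(topic, DEFAULT_STYLE)


def to_phonemes(space):
    """Phonetic analyzer stand-in: words serve as placeholder units, not real phonemes."""
    return space.split()


def apply_prosody(phonemes, style):
    """Prosodic analyzer: attach the style's parameters to each unit."""
    return [(p, style.pitch_scale, style.rate_scale) for p in phonemes]


def render(text):
    """Run every context space through the chain; a synthesizer would consume the result."""
    out = []
    for space in partition(text):
        style = select_style(identify_topic(space))
        out.append((style.name, apply_prosody(to_phonemes(space), style)))
    return out
```

With this table, `render("The team won the game.\n\nThe government issued a report.")` assigns the first context space the `excited` style and the second the `formal` style, which shows the style being chosen per context space and not once for the whole input.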

Current Assignee: Sovereign Peak Ventures, LLC (Dominion Harbor Enterprises, LLC)
Original Assignee: Matsushita Electric Industrial Company Limited (Panasonic Holdings Corporation)
Inventors: Junqua, Jean-Claude
Primary Examiner(s): HARPER, V PAUL
Application Number: US10/083,839
Publication Number: US 20030163314A1
Time in Patent Office: 1,637 Days
Field of Search: 704/260, 704/258, 704/268
US Class Current: 704/258
CPC Class Codes: G10L 13/08 Text analysis or generation...
