Training and applying prosody models

US 9,070,365 B2
Filed: 09/10/2014
Issued: 06/30/2015
Est. Priority Date: 08/12/2008
Status: Active Grant

Abstract

Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.

13 Claims

1. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising:
- building an inventory of prosody models with designated characteristics;
- selecting a target prosody model for training based on input parameters and first keywords related to a first text segment of a first text with prosody annotations;
- training the target prosody models based on the prosody annotations of the first text segment;
- maintaining associations between the first keywords, the designated characteristics, and the input characteristics;
- selecting multiple prosody models for application for a second text segment of a second text based on second keywords related to the second text and the associations;
- applying the multiple application prosody models to the second text segment;
- reconciling conflicts from the application of the multiple application prosody models to generate reconciled prosody information; and
- generating audible speech for the second text segment based on the reconciled prosody information using a text-to-speech synthesis engine.

2. The method of claim 1, wherein the input parameters comprise the identity, type, or role of a speaker of first text segment.
3. The method of claim 2, wherein the input parameters indicate geographical information related to a speaker of the first text segment.
4. The method of claim 3, wherein the input parameters comprise emotion designators.
5. The method of claim 4, wherein the input parameters further comprise weights to indicate the weights the first text should have in the multiple prosody models.
6. The method of claim 4, wherein the reconciling is based in part on weights that indicate relative contributions of conflicting prosody information.
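
For illustration only, the weight-based reconciliation recited in claims 1 and 6 could be sketched as follows. The claims do not specify a reconciliation rule, so the weighted average used here is an assumption, as are the function name `reconcile`, the annotation tuple layout, and the model names; none of them come from the patent text.

```python
from collections import defaultdict


def reconcile(annotations, weights):
    """Merge prosody annotations produced by several models.

    annotations: list of (model_name, word_index, attribute, value) tuples
                 (hypothetical layout, chosen for this sketch).
    weights:     dict mapping model_name to a positive relative weight.

    When two or more models annotate the same attribute of the same word,
    the values are combined by weighted average (an assumed rule).
    """
    grouped = defaultdict(list)
    for model, index, attribute, value in annotations:
        # Models without an explicit weight contribute with weight 1.0.
        grouped[(index, attribute)].append((weights.get(model, 1.0), value))

    reconciled = {}
    for key, entries in grouped.items():
        total = sum(weight for weight, _ in entries)
        reconciled[key] = sum(weight * value for weight, value in entries) / total
    return reconciled


# Two hypothetical models disagree on the pitch shift of word 0.
annotations = [
    ("excited", 0, "pitch_shift", 4.0),
    ("southern_us", 0, "pitch_shift", 0.0),
    ("excited", 1, "rate", 1.2),
]
weights = {"excited": 3.0, "southern_us": 1.0}
result = reconcile(annotations, weights)
```

In this sketch the conflicting pitch shift for word 0 resolves toward the more heavily weighted model, while the uncontested rate annotation for word 1 passes through essentially unchanged. Other rules, such as keeping only the highest-weighted annotation, would satisfy the same claim language equally well.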

7. A system for synthesizing audible speech, with varying prosody, from textual content, the system comprising:
- a non-transitory memory for storing instructions;
- instructions stored on the non-transitory memory, the instructions executable on a processor to;
- build an inventory of prosody models with designated characteristics;
- select a target prosody model for training based on input parameters and first keywords related to a first text segment of a first text with prosody annotations;
- train the target prosody models based on the prosody annotations of the first text segment;
- use associations between the first keywords, the designated characteristics, and the input characteristics;
- select multiple prosody models for application for a second text segment of a second text based on second keywords related to the second text segment and the associations;
- apply the multiple application prosody models to the second text;
- reconcile conflicts from the application of the multiple application prosody models to generate reconciled prosody information and a speech synthesis engine that generates audible speech for the second text segment based on the reconciled prosody information using a text-to-speech synthesis engine.

8. The system of claim 7, wherein the input parameters comprise the identity, type, or role of a speaker of the first text segment.
9. The system of claim 8, wherein the input parameters indicate geographical information related to a speaker of the second text segment.
10. The system of claim 9, wherein the input parameters comprise emotion designators.
11. The system of claim 10, wherein the reconciliation eliminates conflicting annotations that result from applications of multiple models to the second text.
12. The system of claim 7, the system further operable to build a lexicon of phrases statistically associated with the designated characteristics.
13. The system of claim 12, the system further operable to select multiple prosody models for application for the second text segment based on the lexicon of phrases statistically associated with the designated characteristics.
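
As a rough sketch of the lexicon-based selection described in claims 12 and 13, the snippet below picks prosody model characteristics for a text segment from a small phrase lexicon. The lexicon contents, the association strengths, the `min_score` threshold, and the function name `select_models` are all hypothetical; the claims do not say how the statistical associations are computed or scored.

```python
def select_models(text, lexicon, min_score=0.5):
    """Return the characteristics whose accumulated score meets min_score.

    lexicon maps a phrase to {characteristic: association_strength}.
    Matching is naive case-insensitive substring search, which is a
    simplification made for this sketch.
    """
    lowered = text.lower()
    scores = {}
    for phrase, associations in lexicon.items():
        if phrase in lowered:
            for characteristic, strength in associations.items():
                scores[characteristic] = scores.get(characteristic, 0.0) + strength
    # Sort for a deterministic result.
    return sorted(name for name, score in scores.items() if score >= min_score)


# Hypothetical lexicon of phrases associated with designated characteristics.
lexicon = {
    "y'all": {"southern_us": 0.9},
    "hooray": {"excited": 0.8},
    "unfortunately": {"somber": 0.7},
}
selected = select_models("Hooray, y'all made it!", lexicon)
```

Here `selected` names two characteristics, so two prosody models would be applied to the segment and any conflicting annotations would then need to be reconciled.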

Current Assignee: Morphism LLC
Original Assignee: Morphism LLC
Inventors: Stephens, Jr., James H.
Primary Examiner(s): GUERRA-ERAZO, EDGAR X

Application Number: US14/482,343
Publication Number: US 20150012277A1
Time in Patent Office: 293 Days
Field of Search: 704/260, 704/261, 704/258
US Class Current: 1/1
CPC Class Codes:
G10L 13/08   Text analysis or generation...
G10L 13/10   Prosody rules derived from ...
G10L 15/063   Training
G10L 15/1807   using prosody or stress