Training and Applying Prosody Models

US 20150012277A1
Filed: 09/10/2014
Published: 01/08/2015
Est. Priority Date: 08/12/2008
Status: Active Grant

Abstract

Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
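The train-then-apply flow described in the abstract can be illustrated with a deliberately small sketch. Everything here is an assumption made for illustration: the `ProsodyModel` class, the single per-word "pitch shift" value, and the averaging rule are hypothetical and are not taken from the patent, which does not prescribe a model representation.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ProsodyModel:
    """Toy model (hypothetical): remembers the mean pitch shift seen per word."""
    style: str
    _sums: dict = field(default_factory=lambda: defaultdict(float))
    _counts: dict = field(default_factory=lambda: defaultdict(int))

    def train(self, annotated):
        # annotated: iterable of (word, pitch_shift) pairs, standing in for
        # the prosody-annotated text a speech recognition engine might emit.
        for word, pitch in annotated:
            key = word.lower()
            self._sums[key] += pitch
            self._counts[key] += 1

    def apply(self, text):
        # Annotate plain input text; words never seen in training get 0.0.
        result = []
        for word in text.split():
            key = word.lower()
            n = self._counts.get(key, 0)
            result.append((word, self._sums[key] / n if n else 0.0))
        return result


model = ProsodyModel(style="excited")
model.train([("great", 2.0), ("great", 4.0), ("news", 1.0)])
annotated_output = model.apply("Great news today")
```

In a real system the annotated output would be handed to a text-to-speech engine; here it is only a list of `(word, pitch_shift)` pairs, and one such model could be kept per prosody style.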

27 Claims

1-14. (canceled)

15. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising:
- building an inventory of prosody models with designated characteristics;
  
  selecting a target prosody model for training based on input parameters and first keywords related to a first text segment of a first text with prosody annotations;
  
  training the target prosody models based on the prosody annotations of the first text segment;
  
  maintaining associations between the first keywords, the designated characteristics, and the input characteristics;
  
  selecting multiple prosody models for application for a second text segment of a second text based on second keywords related to the second text and the associations;
  
  applying the multiple application prosody models to the second text segment;
  
  reconciling conflicts from the application of the multiple application prosody models to generate reconciled prosody information; and
  
  generating audible speech for the second text segment based on the reconciled prosody information using a text-to-speech synthesis engine.
- Dependent claims (16, 17, 18, 19, 20):
  - 16. The method of claim 15, wherein the input parameters comprise the identity, type, or role of a speaker of first text segment.
  - 17. The method of claim 16, wherein the input parameters indicate geographical information related to a speaker of the first text segment.
  - 18. The method of claim 17, wherein the input parameters comprise emotion designators.
  - 19. The method of claim 18, wherein the input parameters further comprise weights to indicate the weights the first text should have in the multiple prosody models.
  - 20. The method of claim 18, wherein the reconciling is based in part on weights that indicate relative contributions of conflicting prosody information.
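The selection and reconciliation steps of claim 15, together with the weights mentioned in claim 20, might be sketched as follows. This is one possible reading, not the patented method: the function names `select_models` and `reconcile`, the keyword-to-model dictionary, and the use of a weighted average to resolve conflicts are all assumptions introduced for illustration.

```python
def select_models(associations, keywords):
    """Return each model name associated with at least one keyword.

    associations: {keyword: [model_name, ...]} (hypothetical structure).
    Order follows first appearance; duplicates are dropped.
    """
    selected = []
    for keyword in keywords:
        for name in associations.get(keyword, []):
            if name not in selected:
                selected.append(name)
    return selected


def reconcile(annotations, weights):
    """Resolve conflicting per-word values by weighted average (assumed rule).

    annotations: {model_name: [value per word]}, all lists of equal length.
    weights:     {model_name: relative contribution}.
    """
    total = sum(weights[name] for name in annotations)
    length = len(next(iter(annotations.values())))
    return [
        sum(weights[name] * values[i] for name, values in annotations.items()) / total
        for i in range(length)
    ]


associations = {"goal": ["sportscaster"], "storm": ["newscaster", "urgent"]}
chosen = select_models(associations, ["storm", "goal"])
# chosen == ["newscaster", "urgent", "sportscaster"]

merged = reconcile({"a": [2.0, 0.0], "b": [4.0, 4.0]}, {"a": 1.0, "b": 3.0})
# merged == [3.5, 3.0]
```

Averaging is only one way to reconcile; a system could equally let the highest-weighted model win outright for each conflicting annotation.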

21. A system for synthesizing audible speech, with varying prosody, from textual content, the system operable to:
- build an inventory of prosody models with designated characteristics;
  
  select a target prosody model for training based on input parameters and first keywords related to a first text segment of a first text with prosody annotations;
  
  train the target prosody models based on the prosody annotations of the first text segment;
  
  maintain associations between the first keywords, the designated characteristics, and the input characteristics;
  
  select multiple prosody models for application for a second text segment of a second text based on second keywords related to the second text segment and the associations;
  
  apply the multiple application prosody models to the second text;
  
  reconcile conflicts from the application of the multiple application prosody models to generate reconciled prosody information; and
  
  generate audible speech for the second text segment based on the reconciled prosody information using a text-to-speech synthesis engine.
- Dependent claims (22, 23, 24, 25, 26, 27):
  - 22. The system of claim 21, wherein the input parameters comprise the identity, type, or role of a speaker of the first text segment.
  - 23. The system of claim 22, wherein the input parameters indicate geographical information related to a speaker of the second text segment.
  - 24. The system of claim 23, wherein the input parameters comprise emotion designators.
  - 25. The system of claim 24, wherein the reconciliation eliminates conflicting annotations that result from applications of multiple models to the second text.
  - 26. The system of claim 21, the system further operable to build a lexicon of phrases statistically associated with the designated characteristics.
  - 27. The system of claim 26, the system further operable to select multiple prosody models for application for the second text segment based on the lexicon of phrases statistically associated with the designated characteristics.
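Claims 26 and 27 refer to a lexicon of phrases statistically associated with designated characteristics. A minimal sketch of that idea follows, under loud assumptions: single words stand in for phrases, "statistically associated" is approximated by a simple share-of-occurrences threshold (`min_share`, a hypothetical parameter), and the helper names are invented for this example.

```python
from collections import Counter, defaultdict


def build_lexicon(labeled_texts, min_share=0.6):
    """Map each word to the characteristic it mostly co-occurs with.

    labeled_texts: iterable of (text, characteristic) pairs.
    A word is kept only if its dominant characteristic accounts for at
    least min_share of the texts containing that word.
    """
    counts = defaultdict(Counter)
    for text, characteristic in labeled_texts:
        for word in set(text.lower().split()):
            counts[word][characteristic] += 1
    lexicon = {}
    for word, per_label in counts.items():
        label, n = per_label.most_common(1)[0]
        if n / sum(per_label.values()) >= min_share:
            lexicon[word] = label
    return lexicon


def characteristics_for(lexicon, text):
    """Return the sorted characteristics suggested by words in the text."""
    return sorted({lexicon[w] for w in text.lower().split() if w in lexicon})


training = [
    ("the storm is coming", "urgent"),
    ("storm warning issued", "urgent"),
    ("the game is on", "casual"),
]
lexicon = build_lexicon(training)
suggested = characteristics_for(lexicon, "Storm on the coast")
# suggested == ["casual", "urgent"]
```

The returned characteristics could then be used to look up which prosody models to apply to the second text segment; words such as "the" that appear equally under several characteristics fall below the threshold and are left out of the lexicon.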

Current Assignee: Morphism LLC
Original Assignee: Morphism LLC
Inventors: Stephens, James H. Jr.
Granted Patent: US 9,070,365 B2
US Class Current: 704/260
CPC Class Codes:
G10L 13/08   Text analysis or generation...
G10L 13/10   Prosody rules derived from ...
G10L 15/063   Training
G10L 15/1807   using prosody or stress
