Training and applying prosody models

US 8,856,008 B2
Filed: 09/18/2013
Issued: 10/07/2014
Est. Priority Date: 08/12/2008
Status: Active Grant

Abstract

Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.

71 Citations

16 Claims

1. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising:
- generating texts annotated with prosody information generated from audio using a speech recognition engine that performs the annotation during its operation;
  
  training prosody models with lexicons based on first segments of the texts with the prosody information;
  
  maintaining an inventory of the prosody models with lexicons, selecting a subset of multiple prosody models from the inventory of prosody models;
  
  associating prosody models in the subset of multiple prosody models with second segments of a text based on phrases in the text statistically associated with the lexicons of the prosody models;
  
  applying the associated prosody models to one of the second segments of the text to produce prosody annotations for the text;
  
  updating the associated prosody models' lexicons based on the phrases in the second segments of text;
  
  analyzing annotations of the prosody annotations to reconcile conflicting prosody annotations previously produced by multiple prosody models associated with the second segments of text; and
  
  synthesizing audible speech from the second segments of text and the reconciled prosody annotations.
  - 2. The method of claim 1, wherein the prosody information comprises directives related to pitch, rate, and volume of the audio as measured by the speech recognition engine.
  - 3. The method of claim 2, wherein the reconciliation of conflicting prosody annotations considers the annotations of the prosody annotations that comprise a prosody model identifier and a prosody model confidence for the prosody annotation.
  - 4. The method of claim 3, wherein the reconciliation eliminates conflicting annotations that result from applications of multiple models to the second segments of text.
  - 5. The method of claim 4, wherein the selecting is based on input parameters that comprise the identity, type, or role of the speakers of the second segments of text.
  - 6. The method of claim 5, wherein the input parameters indicate geographical information related to the speakers of the second segments of text.
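
Claims 3 and 4 describe reconciliation that looks at a model identifier and a model confidence attached to each prosody annotation and eliminates the conflicting ones. The following is a minimal sketch of one way that could work, not the patented implementation. The `ProsodyAnnotation` type, its field names, the numeric value scale, and the keep-the-most-confident rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProsodyAnnotation:
    segment_id: int    # index of the text segment the directive applies to
    attribute: str     # "pitch", "rate", or "volume" (the directives of claim 2)
    value: float       # hypothetical relative setting, 1.0 meaning neutral
    model_id: str      # identifier of the model that produced the annotation
    confidence: float  # that model's confidence, assumed to lie in [0, 1]


def reconcile(annotations):
    """Keep one annotation per (segment, attribute): the most confident one."""
    best = {}
    for ann in annotations:
        key = (ann.segment_id, ann.attribute)
        if key not in best or ann.confidence > best[key].confidence:
            best[key] = ann
    return sorted(best.values(), key=lambda a: (a.segment_id, a.attribute))


annotations = [
    ProsodyAnnotation(0, "pitch", 1.2, "excited", 0.9),
    ProsodyAnnotation(0, "pitch", 0.8, "somber", 0.4),  # conflicts with the line above
    ProsodyAnnotation(0, "rate", 1.1, "excited", 0.9),
    ProsodyAnnotation(1, "volume", 0.7, "somber", 0.6),
]
reconciled = reconcile(annotations)
```

In this sketch the two pitch directives for segment 0 conflict, and only the one from the more confident model survives.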

7. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising:
- selecting multiple prosody models for first segments of text based on input parameters;
  
  training the prosody models based on prosody annotations of the first segments of text;
  
  building lexicons of phrases statistically associated with the selection of the prosody models;
  
  analyzing prosody model applicability for second segments of text based on the lexicons and the second segments of text;
  
  applying applicable multiple prosody models to one of the second segments of text;
  
  reconciling conflicts from the application of multiple prosody models to one of the second segments of text to generate reconciled prosody information;
  
  generating audible speech for the one of the second segments of text based on the reconciled prosody information using a text-to-speech synthesis engine.
  - 8. The method of claim 7, wherein the selecting is based on input parameters that comprise the identity, type, or role of the speaker of the second segments of text.
  - 9. The method of claim 8, wherein the input parameters indicate geographical information related to the speaker of the second segments of text.
  - 10. The method of claim 9, wherein the analyzing is based in part on segments of text for which the prosody models were previously found to be inapplicable.
  - 11. The method of claim 10, wherein the input parameters comprise emotion designators.
  - 12. The method of claim 11, wherein the input parameters further comprise weights to indicate the weights the first segments of text should have in the multiple prosody models.
  - 13. The method of claim 12, wherein the reconciling is based in part on weights or confidences that indicate relative contributions of conflicting prosody information.
  - 14. The method of claim 13, wherein the reconciling is based in part on confidences that indicate relative importance of conflicting prosody information.
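
Claim 13 bases reconciliation on weights or confidences that indicate the relative contributions of conflicting prosody information. One plausible reading is a weighted average over the conflicting values. The sketch below assumes that reading; the `blend` function and the numeric scale are hypothetical and do not come from the patent.

```python
def blend(candidates):
    """Weighted average of conflicting values for one prosody attribute.

    candidates: list of (value, weight) pairs, one per prosody model.
    """
    total = sum(weight for _, weight in candidates)
    if total <= 0:
        raise ValueError("at least one candidate needs a positive weight")
    return sum(value * weight for value, weight in candidates) / total


# Two models disagree on pitch; the first contributes three times as much.
pitch = blend([(1.2, 0.75), (0.8, 0.25)])
```

With these inputs the blended pitch comes out at roughly 1.1, closer to the more heavily weighted model.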

15. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising:
- generating first texts annotated with prosody information, the first texts and prosody information generated by a speech recognition engine applied to speech inputs;
  
  training an inventory of prosody models with the first texts annotated with prosody information, wherein the prosody models are associated with the speech inputs;
  
  selecting a subset of multiple prosody models from the inventory of prosody models;
  
  associating prosody models in the subset of multiple prosody models with different segments of a second text;
  
  applying the associated prosody models to the different segments of the second text to produce prosody annotations for the second text; and
  
  reconciling conflicting prosody annotations from multiple prosody models associated with a segment of the second text.
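
The training step of claim 15 builds an inventory of prosody models from recognizer output, with each model associated with its speech inputs. The sketch below is a deliberately simplified illustration under stated assumptions: each model is reduced to a per-word mean pitch, and the `train_inventory` function, the input layout, and the model names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def train_inventory(annotated_texts):
    """Build {model_id: {word: mean pitch}} from annotated recognizer output.

    annotated_texts: iterable of (model_id, [(word, pitch), ...]) pairs, where
    model_id names the speech input (for example, a speaker) behind the text.
    """
    samples = defaultdict(lambda: defaultdict(list))
    for model_id, words in annotated_texts:
        for word, pitch in words:
            samples[model_id][word.lower()].append(pitch)
    return {
        model_id: {word: mean(pitches) for word, pitches in by_word.items()}
        for model_id, by_word in samples.items()
    }


inventory = train_inventory([
    ("narrator", [("Hello", 1.0), ("world", 0.9)]),
    ("narrator", [("hello", 1.2)]),
    ("villain", [("hello", 0.6)]),
])
```

Each key of `inventory` is one model in the inventory; a fuller model would also track rate, volume, and context beyond single words.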

16. A computer-implementable method for synthesizing audible speech, with varying prosody, from textual content, the method comprising:
- selecting a subset of multiple prosody models from an inventory of prosody models;
  
  associating prosody models in the subset of multiple prosody models with different segments of a text based on phrases in the text statistically associated with lexicons of the prosody models;
  
  applying the associated prosody models to the different segments of the text to produce prosody annotations of the text; and
  
  considering annotations of the prosody annotations to reconcile conflicting prosody annotations of the text previously produced by multiple prosody models associated with a segment of the text; and
  
  synthesizing audible speech from the text and the reconciled prosody annotations.
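
The association step of claim 16 ties prosody models to text segments through phrases statistically associated with each model's lexicon. A minimal sketch of that idea follows. It assumes a lexicon is a mapping from phrase to association score and that a model is associated with a segment once the summed score reaches a threshold; the function names, scores, and threshold are illustrative assumptions only.

```python
def score_segment(segment, lexicon):
    """Sum the association scores of lexicon phrases found in the segment."""
    text = segment.lower()
    return sum(score for phrase, score in lexicon.items() if phrase in text)


def associate(segments, lexicons, threshold=0.5):
    """Map each segment index to the ids of models whose lexicon matches it."""
    return {
        index: [
            model_id
            for model_id, lexicon in lexicons.items()
            if score_segment(segment, lexicon) >= threshold
        ]
        for index, segment in enumerate(segments)
    }


lexicons = {
    "excited": {"can't wait": 0.8, "amazing": 0.6},
    "somber": {"sorry for your loss": 0.9, "regret": 0.5},
}
segments = [
    "I can't wait to see the amazing show.",
    "We regret the delay.",
    "The train leaves at noon.",
]
associations = associate(segments, lexicons)
```

A segment that matches several lexicons would be associated with several models, which is the situation the reconciliation step is meant to resolve.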


Current Assignee: Morphism LLC
Original Assignee: Morphism LLC
Inventors: Stephens, Jr., James H.
Primary Examiner(s): GUERRA-ERAZO, EDGAR X
Application Number: US14/030,248
Publication Number: US 20140019138A1
Time in Patent Office: 384 Days
Field of Search: 704/260, 704/261, 704/258, 704/254, 704/267, 704/266, 704/268, 704/269, 704/235, 704/247, 704/276
US Class Current: 704/260
CPC Class Codes:
G10L 13/08   Text analysis or generation...
G10L 13/10   Prosody rules derived from ...
G10L 15/063   Training
G10L 15/1807   using prosody or stress
