SYSTEM AND METHOD FOR AUTOMATIC DETECTION OF ABNORMAL STRESS PATTERNS IN UNIT SELECTION SYNTHESIS

US 20150170637A1
Filed: 02/23/2015
Published: 06/18/2015
Est. Priority Date: 08/06/2010
Status: Active Grant

8 Assignments

0 Petitions

Abstract

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for detecting and correcting abnormal stress patterns in unit-selection speech synthesis. A system practicing the method detects incorrect stress patterns in selected acoustic units representing speech to be synthesized, and corrects the incorrect stress patterns in the selected acoustic units to yield corrected stress patterns. The system can further synthesize speech based on the corrected stress patterns. In one aspect, the system also classifies the incorrect stress patterns using a machine learning algorithm such as a classification and regression tree, adaptive boosting, a support vector machine, or maximum entropy. In this way, a text-to-speech unit-selection speech synthesizer can produce more natural-sounding speech with suitable stress patterns regardless of the stress of units in a unit selection database.
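The detect-then-correct flow described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the `AcousticUnit` type, the integer stress labels, and the word-to-pattern lookup table are all assumptions made for illustration, and "correction" here only relabels a unit, whereas a real synthesizer would presumably adjust prosodic properties of the unit before waveform synthesis.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AcousticUnit:
    """Hypothetical stand-in for a unit picked by a unit-selection synthesizer."""
    word: str      # word the unit will be used in
    syllable: int  # syllable position within that word
    stress: int    # stress carried by the recorded unit: 0 none, 1 primary, 2 secondary


# Assumed reference stress pattern: word -> expected stress for each syllable.
ExpectedStress = Dict[str, Tuple[int, ...]]


def detect_incorrect(units: List[AcousticUnit], expected: ExpectedStress) -> List[int]:
    """Return the indices of units whose stress disagrees with the reference pattern."""
    bad = []
    for i, unit in enumerate(units):
        pattern = expected.get(unit.word)
        if pattern is None or unit.syllable >= len(pattern):
            continue  # no reference available for this unit; leave it alone
        if unit.stress != pattern[unit.syllable]:
            bad.append(i)
    return bad


def correct(units: List[AcousticUnit], expected: ExpectedStress) -> List[AcousticUnit]:
    """Return a copy of the units with flagged stress values replaced by the reference."""
    fixed = list(units)
    for i in detect_incorrect(units, expected):
        unit = units[i]
        fixed[i] = replace(unit, stress=expected[unit.word][unit.syllable])
    return fixed


# Example: the first syllable of "banana" was selected from a stressed recording.
expected = {"banana": (0, 1, 0)}
selected = [
    AcousticUnit("banana", 0, 1),
    AcousticUnit("banana", 1, 1),
    AcousticUnit("banana", 2, 0),
]
corrected = correct(selected, expected)
```

In this sketch the correction step runs on the selected units themselves, which mirrors the claim language that modification happens prior to waveform synthesis.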

20 Claims

1. A method comprising:
- receiving a stress pattern for both a language and an accent in the language;
  
  detecting, based on the stress pattern, incorrect stress patterns in selected acoustic units representing speech to be synthesized, wherein the selected acoustic units were selected by a separate unit-selection speech synthesizer;
  
  performing an analysis of the incorrect stress patterns, wherein the analysis comprises a word level analysis, a phrase level analysis, and a sentence level analysis on the incorrect stress patterns; and
  
  modifying, via a processor and prior to waveform synthesis, the incorrect stress patterns in the selected acoustic units based on the analysis, to yield corrected stress patterns, wherein the corrected stress patterns conform to the stress pattern for the language.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The method of claim 1, wherein the word level analysis, the phrase level analysis, and the sentence level analysis are performed in parallel.
  - 3. The method of claim 1, wherein the word level analysis, the phrase level analysis, and the sentence level analysis are performed in series.
  - 4. The method of claim 1, wherein the detecting of incorrect stress patterns, the performing of the analysis of the incorrect stress patterns, and the modifying of the incorrect stress patterns is performed on individual words.
  - 5. The method of claim 1, wherein the detecting of incorrect stress patterns, the performing of the analysis of the incorrect stress patterns, and the modifying of the incorrect stress patterns is performed on phrases.
  - 6. The method of claim 1, wherein the detecting of incorrect stress patterns, the performing of the analysis of the incorrect stress patterns, and the modifying of the incorrect stress patterns is performed on sentences.
  - 7. The method of claim 1, further comprising synthesizing speech based on the corrected stress patterns.
  - 8. The method of claim 1, wherein modifying the incorrect stress patterns occurs before waveform synthesis of the selected acoustic units.
  - 9. The method of claim 1, wherein the stress pattern comprises one of lexical stress, sentential stress, primary stress, and secondary stress.
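Claims 2 and 3 differ only in whether the word, phrase, and sentence level analyses run in parallel or in series. The following Python sketch shows one way the two schedules could be arranged. The three analyzer functions and the `(sentence, phrase, word)` error tuples are hypothetical placeholders, not taken from the patent; they only count errors at each level so that the two schedules can be compared.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical error record: (sentence index, phrase index, word).
StressError = Tuple[int, int, str]


def word_level(errors: List[StressError]) -> Counter:
    """Count stress errors per word."""
    return Counter(word for _, _, word in errors)


def phrase_level(errors: List[StressError]) -> Counter:
    """Count stress errors per (sentence, phrase) pair."""
    return Counter((sent, phrase) for sent, phrase, _ in errors)


def sentence_level(errors: List[StressError]) -> Counter:
    """Count stress errors per sentence."""
    return Counter(sent for sent, _, _ in errors)


ANALYZERS: Dict[str, Callable[[List[StressError]], Counter]] = {
    "word": word_level,
    "phrase": phrase_level,
    "sentence": sentence_level,
}


def analyze_in_series(errors: List[StressError]) -> Dict[str, Counter]:
    """Run the three analyses one after another (compare claim 3)."""
    return {name: fn(errors) for name, fn in ANALYZERS.items()}


def analyze_in_parallel(errors: List[StressError]) -> Dict[str, Counter]:
    """Submit the three analyses to a thread pool (compare claim 2)."""
    with ThreadPoolExecutor(max_workers=len(ANALYZERS)) as pool:
        futures = {name: pool.submit(fn, errors) for name, fn in ANALYZERS.items()}
        return {name: future.result() for name, future in futures.items()}


errors = [(0, 0, "record"), (0, 1, "present"), (1, 0, "object")]
series_result = analyze_in_series(errors)
parallel_result = analyze_in_parallel(errors)
```

Because the placeholder analyzers are independent and read-only, both schedules return the same result; a series arrangement would matter more if one level's analysis consumed the output of another.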

10. A system comprising:
- a processor; and
  
  a computer-readable storage medium having instructions stored which, when executed by the processor, result in the processor performing operations comprising:
  
  receiving a stress pattern for both a language and an accent in the language;
  
  detecting, based on the stress pattern, incorrect stress patterns in selected acoustic units representing speech to be synthesized, wherein the selected acoustic units were selected by a separate unit-selection speech synthesizer;
  
  performing an analysis of the incorrect stress patterns, wherein the analysis comprises a word level analysis, a phrase level analysis, and a sentence level analysis on the incorrect stress patterns; and
  
  modifying, via a processor and prior to waveform synthesis, the incorrect stress patterns in the selected acoustic units based on the analysis, to yield corrected stress patterns, wherein the corrected stress patterns conform to the stress pattern for the language.
- Dependent claims (11, 12, 13, 14, 15, 16):
  - 11. The system of claim 10, wherein the detecting of incorrect stress patterns, the performing of the analysis of the incorrect stress patterns, and the modifying of the incorrect stress patterns is performed on individual words.
  - 12. The system of claim 10, wherein the detecting of incorrect stress patterns, the performing of the analysis of the incorrect stress patterns, and the modifying of the incorrect stress patterns is performed on phrases.
  - 13. The system of claim 10, wherein the detecting of incorrect stress patterns, the performing of the analysis of the incorrect stress patterns, and the modifying of the incorrect stress patterns is performed on sentences.
  - 14. The system of claim 10, the computer-readable storage medium having additional instructions stored which, when executed by the processor, result in operations comprising synthesizing speech based on the corrected stress patterns.
  - 15. The system of claim 10, wherein modifying the incorrect stress patterns occurs before waveform synthesis of the selected acoustic units.
  - 16. The system of claim 10, wherein the stress pattern comprises one of lexical stress, sentential stress, primary stress, and secondary stress.

17. A computer-readable storage device having instructions stored which, when executed by a computing device, result in the computing device performing operations comprising:
- receiving a stress pattern for both a language and an accent in the language;
  
  detecting, based on the stress pattern, incorrect stress patterns in selected acoustic units representing speech to be synthesized, wherein the selected acoustic units were selected by a separate unit-selection speech synthesizer;
  
  performing an analysis of the incorrect stress patterns, wherein the analysis comprises a word level analysis, a phrase level analysis, and a sentence level analysis on the incorrect stress patterns; and
  
  modifying, via a processor and prior to waveform synthesis, the incorrect stress patterns in the selected acoustic units based on the analysis, to yield corrected stress patterns, wherein the corrected stress patterns conform to the stress pattern for the language.
- Dependent claims (18, 19, 20):
  - 18. The computer-readable storage device of claim 17, having additional instructions stored which, when executed by the computing device, result in operations comprising synthesizing speech based on the corrected stress patterns.
  - 19. The computer-readable storage device of claim 17, wherein modifying the incorrect stress patterns occurs before waveform synthesis of the selected acoustic units.
  - 20. The computer-readable storage device of claim 17, wherein the stress pattern comprises one of lexical stress, sentential stress, primary stress, and secondary stress.

Current Assignee
Cerence Operating Company (Cerence Inc.)
Original Assignee
AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors
KIM, Yeon-Jun; BEUTNAGEL, Mark Charles; CONKIE, Alistair D.; SYRDAL, Ann K.

Granted Patent

US 9,269,348 B2
US Class Current

1/1
CPC Class Codes

G10L 13/00   Speech synthesis; Text to s...

G10L 13/027   Concept to speech synthesis...

G10L 13/033   Voice editing, e.g. manipul...

G10L 13/10   Prosody rules derived from ...

G10L 15/1807   using prosody or stress

G10L 25/00   Speech or voice analysis te...
