System and method for automatic detection of abnormal stress patterns in unit selection synthesis

US 9,978,360 B2
Filed: 02/22/2016
Issued: 05/22/2018
Est. Priority Date: 08/06/2010
Status: Expired due to Fees

Abstract

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for detecting and correcting abnormal stress patterns in unit-selection speech synthesis. A system practicing the method detects incorrect stress patterns in selected acoustic units representing speech to be synthesized, and corrects the incorrect stress patterns in the selected acoustic units to yield corrected stress patterns. The system can further synthesize speech based on the corrected stress patterns. In one aspect, the system also classifies the incorrect stress patterns using a machine learning algorithm such as a classification and regression tree, adaptive boosting, support vector machine, and maximum entropy. In this way a text-to-speech unit selection speech synthesizer can produce more natural sounding speech with suitable stress patterns regardless of the stress of units in a unit selection database.
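
The detect-then-correct flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Syllable` fields, the thresholds, and the target values are hypothetical, and the hand-written voting rule in `looks_stressed` merely stands in for a trained classifier such as a classification and regression tree. It is not the patented implementation.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Syllable:
    """Hypothetical per-syllable acoustic parameters of the selected units."""
    text: str
    duration_ms: float
    energy: float          # assumed to be normalized already
    f0_hz: float
    expected_stress: bool  # assumed to come from a pronunciation lexicon


def looks_stressed(s: Syllable) -> bool:
    """Stand-in for a trained classifier: two of three cues must fire."""
    votes = (s.duration_ms > 150.0) + (s.energy > 1.0) + (s.f0_hz > 180.0)
    return votes >= 2


def detect(word: List[Syllable]) -> List[int]:
    """Return indices of syllables whose perceived stress differs from the expected stress."""
    return [i for i, s in enumerate(word) if looks_stressed(s) != s.expected_stress]


def correct(word: List[Syllable]) -> List[Syllable]:
    """Move duration and energy of flagged syllables to illustrative target values."""
    fixed = []
    for s in word:
        if looks_stressed(s) == s.expected_stress:
            fixed.append(s)
        elif s.expected_stress:
            fixed.append(replace(s, duration_ms=180.0, energy=1.2))
        else:
            fixed.append(replace(s, duration_ms=100.0, energy=0.8))
    return fixed


# The noun "record" carries stress on its first syllable; these units sound reversed.
record_noun = [
    Syllable("re", 90.0, 0.7, 170.0, expected_stress=True),
    Syllable("cord", 200.0, 1.3, 190.0, expected_stress=False),
]
print(detect(record_noun))           # [0, 1]
print(detect(correct(record_noun)))  # []
```

In this sketch the correction only rewrites parameters attached to the selected units, which mirrors the idea of fixing stress before any waveform is generated.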

20 Claims

1. A method comprising:
- detecting, via a machine learning algorithm modeling human perception and trained with acoustic parameters from each syllable in a word, incorrect stress patterns in selected acoustic units representing speech to be synthesized, wherein the selected acoustic units comprise phonemes and come from a database of energy-normalized acoustic units that are normalized on a sentence basis;
- performing a word level analysis of the incorrect stress patterns, a phrase level analysis of the incorrect stress patterns and a sentence level analysis of the incorrect stress patterns to yield analyses, wherein the analyses are performed in series; and
- modifying, via a processor and prior to waveform synthesis, the incorrect stress patterns in the selected acoustic units according to the analyses, to yield corrected stress patterns.
- - 2. The method of claim 1, wherein detecting incorrect stress patterns is performed according to a stress pattern for a language.
  - 3. The method of claim 2, wherein the stress pattern comprises one of lexical stress, sentential stress, primary stress, and secondary stress.
  - 4. The method of claim 1, further comprising receiving a stress pattern for both a language and an accent in the language, wherein the detecting of the incorrect stress patterns is performed based on the stress pattern.
  - 5. The method of claim 1, wherein the detecting of incorrect stress patterns, the performing of the analysis of the incorrect stress patterns, and the modifying of the incorrect stress patterns are performed on individual words.
  - 6. The method of claim 1, wherein the detecting of incorrect stress patterns, the performing of the analysis of the incorrect stress patterns, and the modifying of the incorrect stress patterns are performed on one of phrases or sentences.
  - 7. The method of claim 1, wherein the corrected stress patterns conform to a stress pattern for a language.
  - 8. The method of claim 1, further comprising synthesizing speech according to the corrected stress patterns.
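
Two elements of claim 1 lend themselves to a short sketch: energy normalization on a sentence basis, and word, phrase, and sentence analyses run in series. The claim does not say how either is computed, so the choices below are assumptions: energies are divided by the sentence mean, each flagged syllable is a hypothetical `(phrase, word, syllable)` index triple, and each stage simply builds on the previous stage's result.

```python
from statistics import mean
from typing import Dict, List, Tuple

Flag = Tuple[int, int, int]  # (phrase index, word index, syllable index), hypothetical


def normalize_energy_per_sentence(energies: List[float]) -> List[float]:
    """Scale unit energies by the mean energy of their own sentence (assumed scheme)."""
    avg = mean(energies)
    return [e / avg for e in energies]


def word_level(flags: List[Flag], analysis: Dict) -> Dict:
    """Collect the words that contain at least one flagged syllable."""
    analysis["words"] = sorted({(p, w) for p, w, _ in flags})
    return analysis


def phrase_level(flags: List[Flag], analysis: Dict) -> Dict:
    """Derive affected phrases from the word-level result."""
    analysis["phrases"] = sorted({p for p, _ in analysis["words"]})
    return analysis


def sentence_level(flags: List[Flag], analysis: Dict) -> Dict:
    """Derive the sentence-level verdict from the phrase-level result."""
    analysis["sentence_affected"] = bool(analysis["phrases"])
    return analysis


def analyze_in_series(flags: List[Flag]) -> Dict:
    """Run the three analyses one after another, each seeing earlier results."""
    analysis: Dict = {}
    for stage in (word_level, phrase_level, sentence_level):
        analysis = stage(flags, analysis)
    return analysis


print(normalize_energy_per_sentence([2.0, 4.0, 6.0]))  # [0.5, 1.0, 1.5]
print(analyze_in_series([(0, 0, 1), (0, 0, 0), (1, 2, 0)]))
```

Running the stages in a fixed order, rather than independently, lets a later stage reuse what an earlier one found; that ordering is the only property the sketch takes from the claim language.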

9. A system comprising:
- a processor; and
- a computer-readable storage medium having instructions stored which, when executed by the processor, result in the processor performing operations comprising:
  - detecting, via a machine learning algorithm modeling human perception and trained with acoustic parameters from each syllable in a word, incorrect stress patterns in selected acoustic units representing speech to be synthesized, wherein the selected acoustic units comprise phonemes and come from a database of energy-normalized acoustic units that are normalized on a sentence basis;
  - performing a word level analysis of the incorrect stress patterns, a phrase level analysis of the incorrect stress patterns, and a sentence level analysis of the incorrect stress patterns to yield analyses, wherein the analyses are performed in series; and
  - modifying, via the processor and prior to waveform synthesis, the incorrect stress patterns in the selected acoustic units according to the analyses, to yield corrected stress patterns.
- - 10. The system of claim 9, further comprising receiving a stress pattern for both a language and an accent in the language, wherein the detecting of the incorrect stress patterns is performed based on the stress pattern.
  - 11. The system of claim 9, wherein the detecting of incorrect stress patterns, the performing of the analysis of the incorrect stress patterns, and the modifying of the incorrect stress patterns are performed on individual words, phrases or sentences.
  - 12. The system of claim 9, the computer-readable storage medium having additional instructions stored which, when executed by the processor, result in further operations comprising synthesizing speech according to the corrected stress patterns.
  - 13. The system of claim 9, wherein the corrected stress patterns conform to a stress pattern for a language.
  - 14. The system of claim 9, wherein detecting incorrect stress patterns is according to a stress pattern.
  - 15. The system of claim 14, wherein the stress pattern comprises one of lexical stress, sentential stress, primary stress, and secondary stress.
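
Claim 10 refers to a stress pattern received for both a language and an accent. One way to picture this is a lookup keyed on the pair, sketched below. The table layout, the function names, and the 1/0 encoding (1 for primary stress, 0 for unstressed) are assumptions for illustration only; the two example words were chosen because their usual stress differs between American and British English.

```python
from typing import Dict, List, Optional, Tuple

StressPattern = Tuple[int, ...]  # 1 = primary stress, 0 = unstressed (assumed encoding)

# Hypothetical table keyed by (language, accent).
PATTERNS: Dict[Tuple[str, str], Dict[str, StressPattern]] = {
    ("en", "US"): {"garage": (0, 1), "ballet": (0, 1)},
    ("en", "GB"): {"garage": (1, 0), "ballet": (1, 0)},
}


def expected_pattern(word: str, language: str, accent: str) -> Optional[StressPattern]:
    """Look up the expected pattern; None when the word or accent is unknown."""
    return PATTERNS.get((language, accent), {}).get(word)


def mismatches(word: str, observed: StressPattern, language: str, accent: str) -> List[int]:
    """Return syllable indices where the observed stress differs from the expected one."""
    expected = expected_pattern(word, language, accent)
    if expected is None or len(expected) != len(observed):
        return []  # nothing to compare against in this sketch
    return [i for i, (e, o) in enumerate(zip(expected, observed)) if e != o]


print(mismatches("garage", (1, 0), "en", "US"))  # [0, 1]
print(mismatches("garage", (1, 0), "en", "GB"))  # []
```

The same observed pattern is flagged for one accent and accepted for the other, which is the reason the accent has to be part of the key.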

16. A computer-readable storage device having instructions stored which, when executed by a processor, result in the processor performing operations comprising:
- detecting, via a machine learning algorithm modeling human perception and trained with acoustic parameters from each syllable in a word, incorrect stress patterns in selected acoustic units representing speech to be synthesized, wherein the selected acoustic units comprise phonemes and come from a database of energy-normalized acoustic units that are normalized on a sentence basis;
- performing a word level analysis of the incorrect stress patterns, a phrase level analysis of the incorrect stress patterns, and a sentence level analysis of the incorrect stress patterns to yield analyses, wherein the analyses are performed in series; and
- modifying, via the processor and prior to waveform synthesis, the incorrect stress patterns in the selected acoustic units according to the analyses, to yield corrected stress patterns.
- - 17. The computer-readable storage device of claim 16, wherein detecting incorrect stress patterns is according to a stress pattern.
  - 18. The computer-readable storage device of claim 16, wherein the computer-readable storage device stores additional instructions which, when executed by the processor, cause the processor to perform further operations comprising receiving a stress pattern for both a language and an accent in the language, wherein the detecting of the incorrect stress patterns is performed based on the stress pattern.
  - 19. The computer-readable storage device of claim 16, wherein the detecting of incorrect stress patterns, the performing of the analysis of the incorrect stress patterns, and the modifying of the incorrect stress patterns are performed on individual words, phrases or sentences.
  - 20. The computer-readable storage device of claim 16, wherein the corrected stress patterns conform to a stress pattern for a language.

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Kim, Yeon-Jun; Beutnagel, Mark Charles; Conkie, Alistair D.; Syrdal, Ann K.
Primary Examiner(s): Adesanya, Olujimi

Application Number: US15/049,579
Publication Number: US 20160171970A1
Time in Patent Office: 820 Days
Field of Search: 704/258, 704/260
US Class Current
CPC Class Codes

G10L 13/00   Speech synthesis; Text to s...

G10L 13/027   Concept to speech synthesis...

G10L 13/033   Voice editing, e.g. manipul...

G10L 13/10   Prosody rules derived from ...

G10L 15/1807   using prosody or stress

G10L 25/00   Speech or voice analysis te...
