Automatic normalization of spoken syllable duration

US 8,401,856 B2
Filed: 05/17/2010
Issued: 03/19/2013
Est. Priority Date: 05/17/2010
Status: Active Grant

Abstract

A very common problem is that when people speak a language other than the one to which they are accustomed, syllables can be spoken for longer or shorter than the listener would regard as appropriate. An example of this can be observed when people who have a heavy Japanese accent speak English. Since most Japanese words end with a vowel, native Japanese speakers tend to add a vowel sound to the end of English words that should end with a consonant. Illustratively, native Japanese speakers often pronounce “orange” as “orenji.” One aspect provides an automatic speech-correcting process that would not necessarily need to know that fruit is being discussed; the system would only need to know that the speaker is accustomed to Japanese, that the listener is accustomed to English, that “orenji” is not a word in English, and that “orenji” is a typical Japanese mispronunciation of the English word “orange.”
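The lookup the abstract describes can be illustrated with a minimal sketch. Everything named here (`KNOWLEDGE_BASE`, `VOCABULARY`, `find_intended_word`, the language codes) is a hypothetical illustration, not the patent's implementation; the only data taken from the source is the “orenji”/“orange” pair.

```python
from typing import Optional

# Hypothetical knowledge base keyed by (native language, spoken language).
# The single entry is the abstract's own example; a real system would hold many.
KNOWLEDGE_BASE = {
    ("ja", "en"): {"orenji": "orange"},
}

# Hypothetical vocabulary of legitimate words per spoken language.
VOCABULARY = {"en": {"orange", "apple"}}


def find_intended_word(utterance: str, native_lang: str, spoken_lang: str) -> Optional[str]:
    """Return the intended word if the utterance is a known mispronunciation, else None."""
    word = utterance.lower()
    if word in VOCABULARY.get(spoken_lang, set()):
        return None  # already a legitimate word, nothing to correct
    # Only mispronunciations typical for this language pair are considered.
    return KNOWLEDGE_BASE.get((native_lang, spoken_lang), {}).get(word)


print(find_intended_word("orenji", "ja", "en"))
```

Note that the function needs no knowledge of the topic of conversation, only the two languages and the utterance, which mirrors the point made in the abstract.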

14 Claims

1. A method to improve communications understandability comprising:
- receiving speech from a speaker;
  
  identifying one or more distinct speech events in the received speech;
  
  representing one or more of the one or more distinct speech events as an adjustable speech production parameter;
  
  detecting a language of the speech;
  
  detecting a native language of the speaker;
  
  utilizing a knowledge base of pronunciation patterns and vocabularies for the language of the speech and the native language to determine an incorrect syllable duration caused by a mispronunciation; and
  
  adjusting at least one of duration and amplitude parameters associated with the mispronunciation to one or more of lengthen, shorten, emphasize or deemphasize the syllable;
  
  using the adjusted at least one of duration and amplitude parameters to regenerate and present modified speech with at least one of corrected syllabic timing and emphasis to a listener, wherein the listener can select via a feedback module to listen to the speech and the modified speech simultaneously;
  
  wherein the receiving, the identifying, the representing, the utilizing, and the adjusting are performed by modules in a normalization system.
- Dependent claims (2, 3, 4, 5, 7):
  - 2. The method of claim 1, further comprising using modified speech product parameters to regenerate and present speech with a corrected syllabic timing to one or more listeners.
  - 3. The method of claim 1, further comprising determining whether an utterance is a legitimate word.
  - 4. The method of claim 1, further comprising determining if an utterance is a common mispronunciation.
  - 5. The method of claim 1, further comprising providing feedback to a speaker.
  - 7. A non-transitory computer readable information storage media having stored thereon instructions that, if executed by a processor, cause to be performed the method of claim 1.
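The adjusting step of claim 1 (lengthen, shorten, emphasize or deemphasize a syllable) can be sketched as scaling per-syllable parameters. This is only an illustration under stated assumptions: the `Syllable` type, the millisecond and linear-gain units, the scale factors, and the syllable split of “orenji” are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Syllable:
    """Hypothetical adjustable speech production parameters for one syllable."""
    label: str
    duration_ms: float
    amplitude: float  # linear gain; 1.0 means unchanged


def adjust_syllable(s: Syllable, duration_scale: float = 1.0,
                    amplitude_scale: float = 1.0) -> Syllable:
    """Scale duration (lengthen/shorten) and amplitude (emphasize/deemphasize)."""
    if duration_scale <= 0 or amplitude_scale < 0:
        raise ValueError("duration_scale must be > 0 and amplitude_scale >= 0")
    return replace(s,
                   duration_ms=s.duration_ms * duration_scale,
                   amplitude=s.amplitude * amplitude_scale)


def normalize(syllables: List[Syllable],
              corrections: Dict[int, Tuple[float, float]]) -> List[Syllable]:
    """Apply (duration_scale, amplitude_scale) to the syllables at the given indices."""
    return [adjust_syllable(s, *corrections[i]) if i in corrections else s
            for i, s in enumerate(syllables)]


# Assumed syllable split of "orenji": shorten and deemphasize the trailing vowel.
spoken = [Syllable("o", 100.0, 1.0), Syllable("ren", 150.0, 1.0), Syllable("ji", 120.0, 1.0)]
corrected = normalize(spoken, {2: (0.5, 0.5)})
```

The sketch returns new `Syllable` objects rather than mutating the input, so the unmodified parameters remain available if the original speech also has to be presented.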

6. A system for improving communications understandability comprising:
- means for receiving speech from a speaker;
  
  means for identifying one or more distinct speech events in the received speech;
  
  means for representing one or more of the one or more distinct speech events as an adjustable speech production parameter;
  
  means for detecting a language of the speech;
  
  means for detecting a native language of the speaker;
  
  means for utilizing a knowledge base of pronunciation patterns and vocabularies for the language of the speech and the native language to determine an incorrect syllable duration caused by a mispronunciation; and
  
  means for adjusting at least one of duration and amplitude parameters associated with the mispronunciation to one or more of lengthen, shorten, emphasize or deemphasize the syllable;
  
  means for using the adjusted at least one of duration and amplitude parameters to regenerate and present modified speech with at least one of corrected syllabic timing and emphasis to a listener, wherein the listener can select via a feedback module to listen to the speech and the modified speech simultaneously;
  
  wherein the receiving, the identifying, the representing, the utilizing, and the adjusting are performed by modules in a normalization system.

8. A system that improves communications understandability comprising:
- an analysis module that receives speech from a speaker;
  
  a distinct speech event recognition module cooperating with an encoding and compression module to identify one or more distinct speech events in the received speech, represent one or more of the one or more distinct speech events as an adjustable speech production parameter, detect a language of the speech, and detect a native language of the speaker; and
  
  a modification module that utilizes a knowledge base of pronunciation patterns and vocabularies for the language of the speech and the native language to determine an incorrect syllable duration caused by a mispronunciation, adjusts at least one of duration and amplitude parameters associated with the mispronunciation to one or more of lengthen, shorten, emphasize or deemphasize the syllable, and uses the adjusted parameters to regenerate and present modified speech with at least one of corrected syllabic timing and emphasis to a listener, wherein the listener can select via a feedback module to listen to the speech and the modified speech simultaneously.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The system of claim 8, wherein modified speech product parameters are used to regenerate and present speech with a corrected syllabic timing to one or more listeners.
  - 10. The system of claim 8, further comprising a processor that determines whether an utterance is a legitimate word.
  - 11. The system of claim 8, wherein the analysis module further determines if an utterance is a common mispronunciation.
  - 12. The system of claim 8, wherein the feedback module provides feedback to a speaker.
  - 13. The system of claim 8, wherein a participant can select via the feedback module one or more of a modified and unmodified stream to listen to.
  - 14. The system of claim 8, wherein a further determination is made as to whether a modified word is inappropriate.
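The listener's choice of stream (claims 8 and 13) might be sketched as follows. The function name `select_stream`, the choice labels, and the decision to realize "simultaneously" as a sample-wise average of the two streams are assumptions made for illustration; the claims do not specify how simultaneous playback is produced, and separate left and right channels would be another plausible reading.

```python
from itertools import zip_longest
from typing import List, Sequence


def select_stream(original: Sequence[float], modified: Sequence[float],
                  choice: str) -> List[float]:
    """Return the samples a listener hears for a hypothetical feedback-module choice."""
    if choice == "original":
        return list(original)
    if choice == "modified":
        return list(modified)
    if choice == "both":
        # Assumed mixing strategy: average the two streams sample by sample,
        # padding the shorter one with silence (0.0).
        return [(a + b) / 2 for a, b in zip_longest(original, modified, fillvalue=0.0)]
    raise ValueError(f"unknown choice: {choice!r}")


mixed = select_stream([1.0, 0.0], [0.0, 1.0, 0.5], "both")
```

Padding with silence matters because duration adjustments generally make the modified stream longer or shorter than the original.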

Current Assignee: Samsung Electronics Co. Ltd.
Original Assignee: Avaya Incorporated
Inventors: Jennings, Terry; Michaelis, Paul Roller
Primary Examiner(s): PULLIAS, JESSE SCOTT
Application Number: US12/781,162
Publication Number: US 20110282650A1
Time in Patent Office: 1,037 Days
Field of Search: 704/258, 704/262, 704/267, 704/271
US Class Current: 704/267

CPC Class Codes
- G10L 15/005   Language recognition
- G10L 15/08   Speech classification or se...
- G10L 15/1807   using prosody or stress
- G10L 15/187   Phonemic context, e.g. pron...
- G10L 19/04   using predictive techniques
- G10L 19/12   the excitation function bei...
- G10L 21/04   Time compression or expansion
- G10L 25/00   Speech or voice analysis te...
