Automatic normalization of spoken syllable duration
Abstract
A common problem arises when people speak a language other than the one to which they are accustomed: syllables can be spoken for longer or shorter than the listener would regard as appropriate. An example can be observed when people who have a heavy Japanese accent speak English. Since Japanese words typically end with vowels, native Japanese speakers tend to add a vowel sound to the end of English words that should end with a consonant. Illustratively, native Japanese speakers often pronounce “orange” as “orenji.” An aspect provides an automatic speech-correcting process that would not necessarily need to know that fruit is being discussed; the system would only need to know that the speaker is accustomed to Japanese, that the listener is accustomed to English, that “orenji” is not a word in English, and that “orenji” is a typical Japanese mispronunciation of the English word “orange.”
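The four facts the abstract lists can be illustrated as a simple lookup. The following is a minimal sketch, not the patented implementation: the `Correction` record, the `KNOWLEDGE_BASE` table, the toy vocabulary, and the `find_correction` function are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Correction:
    """Hypothetical knowledge-base entry for one known mispronunciation."""
    target_word: str   # the word the speaker most likely intended
    added_vowel: str   # trailing vowel sound the speaker tends to add


# Hypothetical knowledge base keyed by (speaker's language, listener's language).
KNOWLEDGE_BASE: Dict[Tuple[str, str], Dict[str, Correction]] = {
    ("ja", "en"): {"orenji": Correction(target_word="orange", added_vowel="i")},
}

# Toy stand-in for an English vocabulary.
ENGLISH_VOCABULARY: Set[str] = {"orange", "apple"}


def find_correction(
    heard: str,
    speaker_language: str,
    listener_language: str,
    vocabulary: Set[str],
) -> Optional[Correction]:
    """Return a correction only if `heard` is not a word in the listener's
    language and is a known mispronunciation for this language pair."""
    if heard in vocabulary:
        return None  # already a valid word; nothing to correct
    patterns = KNOWLEDGE_BASE.get((speaker_language, listener_language), {})
    return patterns.get(heard)


correction = find_correction("orenji", "ja", "en", ENGLISH_VOCABULARY)
```

Note that the lookup needs no topic information: it relies only on the language pair, the vocabulary check, and the table of typical mispronunciations.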
14 Claims
1. A method to improve communications understandability comprising:
receiving speech from a speaker;
identifying one or more distinct speech events in the received speech;
representing one or more of the one or more distinct speech events as an adjustable speech production parameter;
detecting a language of the speech;
detecting a native language of the speaker;
utilizing a knowledge base of pronunciation patterns and vocabularies for the language of the speech and the native language to determine an incorrect syllable duration caused by a mispronunciation; and
adjusting at least one of duration and amplitude parameters associated with the mispronunciation to one or more of lengthen, shorten, emphasize or deemphasize the syllable;
using the adjusted at least one of duration and amplitude parameters to regenerate and present modified speech with at least one of corrected syllabic timing and emphasis to a listener, wherein the listener can select via a feedback module to listen to the speech and the modified speech simultaneously;
wherein the receiving, the identifying, the representing, the utilizing, and the adjusting are performed by modules in a normalization system.
Dependent claims: 2, 3, 4, 5, 7
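The adjusting step of claim 1 can be pictured as scaling per-event parameters. The sketch below assumes each distinct speech event is represented by a duration in milliseconds and a linear amplitude; the `SpeechEvent` record and the `adjust_event` and `normalize` functions are hypothetical, and the claim does not prescribe this representation.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class SpeechEvent:
    """Hypothetical adjustable representation of one speech event."""
    label: str
    duration_ms: float
    amplitude: float  # linear gain, 1.0 = unchanged


def adjust_event(
    event: SpeechEvent,
    duration_scale: float = 1.0,
    amplitude_scale: float = 1.0,
) -> SpeechEvent:
    """Lengthen or shorten (duration_scale) and emphasize or
    deemphasize (amplitude_scale) a single event."""
    if duration_scale < 0 or amplitude_scale < 0:
        raise ValueError("scale factors must be non-negative")
    return replace(
        event,
        duration_ms=event.duration_ms * duration_scale,
        amplitude=event.amplitude * amplitude_scale,
    )


def normalize(
    events: List[SpeechEvent],
    flagged_index: int,
    duration_scale: float,
    amplitude_scale: float,
) -> List[SpeechEvent]:
    """Adjust only the event flagged as mispronounced; leave the rest untouched."""
    return [
        adjust_event(e, duration_scale, amplitude_scale) if i == flagged_index else e
        for i, e in enumerate(events)
    ]


# Illustrative values only: shorten and soften the final syllable of "orenji".
events = [
    SpeechEvent("o", 120.0, 1.0),
    SpeechEvent("ren", 180.0, 1.0),
    SpeechEvent("ji", 200.0, 1.0),
]
adjusted = normalize(events, flagged_index=2, duration_scale=0.5, amplitude_scale=0.5)
```

A scale factor below 1.0 shortens or deemphasizes the flagged syllable and a factor above 1.0 lengthens or emphasizes it; regenerating audio from the adjusted parameters is outside the scope of this sketch.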
6. A system for improving communications understandability comprising:
means for receiving speech from a speaker;
means for identifying one or more distinct speech events in the received speech;
means for representing one or more of the one or more distinct speech events as an adjustable speech production parameter;
means for detecting a language of the speech;
means for detecting a native language of the speaker;
means for utilizing a knowledge base of pronunciation patterns and vocabularies for the language of the speech and the native language to determine an incorrect syllable duration caused by a mispronunciation; and
means for adjusting at least one of duration and amplitude parameters associated with the mispronunciation to one or more of lengthen, shorten, emphasize or deemphasize the syllable;
means for using the adjusted at least one of duration and amplitude parameters to regenerate and present modified speech with at least one of corrected syllabic timing and emphasis to a listener, wherein the listener can select via a feedback module to listen to the speech and the modified speech simultaneously;
wherein the receiving, the identifying, the representing, the utilizing, and the adjusting are performed by modules in a normalization system.
8. A system that improves communications understandability comprising:
an analysis module that receives speech from a speaker;
a distinct speech event recognition module cooperating with an encoding and compression module to identify one or more distinct speech events in the received speech, represent one or more of the one or more distinct speech events as an adjustable speech production parameter, detect a language of the speech, and detect a native language of the speaker; and
a modification module that utilizes a knowledge base of pronunciation patterns and vocabularies for the language of the speech and the native language to determine an incorrect syllable duration caused by a mispronunciation, adjusts at least one of duration and amplitude parameters associated with the mispronunciation to one or more of lengthen, shorten, emphasize or deemphasize the syllable, and uses the adjusted parameters to regenerate and present modified speech with at least one of corrected syllabic timing and emphasis to a listener, wherein the listener can select via a feedback module to listen to the speech and the modified speech simultaneously.
Dependent claims: 9, 10, 11, 12, 13, 14
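The feedback option recited in the independent claims, in which the listener hears the speech and the modified speech simultaneously, could be realized by mixing the two signals. The following is a minimal sketch under that assumption; the `mix_simultaneous` function, its gain parameters, and the plain-list sample format are hypothetical and are not taken from the claims.

```python
from itertools import zip_longest
from typing import List, Sequence


def mix_simultaneous(
    original: Sequence[float],
    modified: Sequence[float],
    original_gain: float = 0.5,
    modified_gain: float = 0.5,
) -> List[float]:
    """Mix two sample sequences so both are heard at once.

    The shorter signal is padded with silence (0.0), since adjusting
    syllable durations generally changes the overall length.
    """
    return [
        original_gain * a + modified_gain * b
        for a, b in zip_longest(original, modified, fillvalue=0.0)
    ]


# Toy sample values, chosen only to show the padding behaviour.
mixed = mix_simultaneous([1.0, 1.0], [0.0, 1.0, 1.0])
```

Equal gains of 0.5 keep the mix within the original amplitude range; a real system might instead route the two signals to separate audio channels.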
Specification