Determining text to speech pronunciation based on an utterance from a user
Abstract
Systems and methods are provided for automatically building a native phonetic lexicon for a speech-based application trained to process a native (base) language, wherein the native phonetic lexicon includes native phonetic transcriptions (base forms) for non-native (foreign) words which are automatically derived from non-native phonetic transcriptions of the non-native words.
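The abstract does not say how the native base forms are derived from the non-native transcriptions. The sketch below assumes a simple phone-to-phone substitution table. The table `FOREIGN_TO_NATIVE`, its SAMPA-style symbols, and both function names are hypothetical illustrations, not the patented method.

```python
# Hypothetical mapping from non-native (foreign) phones to native (base) phones.
# SAMPA-style symbols, chosen for illustration only.
FOREIGN_TO_NATIVE: dict[str, list[str]] = {
    "x": ["k"],   # voiceless velar fricative approximated by /k/
    "y": ["u"],   # front rounded vowel approximated by a back rounded vowel
    "R": ["r"],   # uvular r approximated by the native r
}


def derive_native_base_form(foreign_phones: list[str]) -> list[str]:
    """Map each non-native phone to one or more native phones.

    Phones with no entry in the table are assumed to already exist in the
    native phone set and are copied through unchanged.
    """
    native: list[str] = []
    for phone in foreign_phones:
        native.extend(FOREIGN_TO_NATIVE.get(phone, [phone]))
    return native


def build_native_lexicon(foreign_lexicon: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return a native base form for every non-native word in the input lexicon."""
    return {word: derive_native_base_form(phones) for word, phones in foreign_lexicon.items()}
```

For example, `build_native_lexicon({"Bach": ["b", "a", "x"]})` yields `{"Bach": ["b", "a", "k"]}`. A real system would likely need context-dependent rules rather than a one-phone-at-a-time table.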
20 Claims
1. A speech-based system comprising:
at least one storage device that stores:
an input text comprising a plurality of words of a first language;
information indicative of a first pronunciation of a first word of the plurality of words of the first language and information indicative of a first pronunciation of a second word of the plurality of words of the first language, wherein the first pronunciation of the first word and the first pronunciation of the second word both comprise a first type of pronunciation;
information indicative of a second pronunciation of the first word of the plurality of words of the first language and information indicative of a second pronunciation of the second word of the plurality of words of the first language, wherein the second pronunciation of the first word and the second pronunciation of the second word both comprise a second type of pronunciation that is different than the first type of pronunciation;
an automatic speech recognition (ASR) system configured to:
receive at least one utterance from a user, the utterance comprising at least the first word of the plurality of words of the first language; and
determine a type of pronunciation the user used for the first word in the at least one utterance; and
a text to speech (TTS) system configured to generate an audio speech output comprising at least the second word of the plurality of words of the first language, and to determine a pronunciation of the second word in the audio speech output based, at least in part, on the type of pronunciation the ASR system determined the user used for the first word in the at least one utterance, wherein the second word is different from the first word.
Dependent claims: 2, 3, 4, 5, 6, 7
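Claim 1 couples the ASR and TTS components through a shared notion of pronunciation type: the type detected for one word decides how a different word is spoken. The Python sketch below illustrates only that coupling. The two type labels ("anglicized" and "original"), the `LEXICON` entries and their phone strings, and the helper `select_pronunciation` are all hypothetical; the claim does not specify how pronunciations are stored or what the types are.

```python
# Hypothetical stored lexicon: word -> {pronunciation type -> phone string}.
# Type labels and phone strings are illustrative placeholders only.
LEXICON: dict[str, dict[str, str]] = {
    "Bach": {"anglicized": "b a k", "original": "b a x"},
    "Van Gogh": {"anglicized": "v a n g o", "original": "f a n x o x"},
}


def select_pronunciation(second_word: str, first_word: str, detected_type: str,
                         lexicon: dict[str, dict[str, str]] = LEXICON) -> str:
    """Return the stored pronunciation of second_word whose type matches the
    type the ASR step detected for first_word."""
    if second_word == first_word:
        # The claim requires the spoken output word to differ from the user's word.
        raise ValueError("second_word must differ from first_word")
    return lexicon[second_word][detected_type]
```

If the user said "Bach" with the original-language pronunciation, `select_pronunciation("Van Gogh", "Bach", "original")` returns `"f a n x o x"`, so the TTS output follows the same pronunciation style for a word the user never spoke.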
8. A method comprising acts, performed by at least one processor, of:
storing information indicative of a first pronunciation of a first word of a first language and information indicative of a first pronunciation of a second word of the first language, wherein the first pronunciation of the first word and the first pronunciation of the second word both comprise a first type of pronunciation;
storing information indicative of a second pronunciation of the first word of the first language and information indicative of a second pronunciation of the second word of the first language, wherein the second pronunciation of the first word and the second pronunciation of the second word both comprise a second type of pronunciation that is different than the first type of pronunciation;
receiving, at an automatic speech recognition (ASR) system, at least one utterance from a user, the utterance comprising at least the first word of the first language;
determining a type of pronunciation the user used for the first word in the at least one utterance; and
generating, using a text to speech (TTS) system, an audio speech output that comprises at least the second word of the first language and that pronounces at least the second word using an audible pronunciation determined based, at least in part, on the type of pronunciation the user used for the first word in the at least one utterance, wherein the second word is different from the first word.
Dependent claims: 9, 10, 11, 12, 13, 14, 15
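The claims do not say how the type of pronunciation the user used is determined. One plausible approach, sketched here purely as an assumption, compares the phone sequence the recognizer produced for the first word with each stored variant and picks the type whose variant is closest by Levenshtein (edit) distance. The function names, the phone-sequence input, and the type labels are hypothetical.

```python
def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def detect_pronunciation_type(recognized: list[str],
                              variants: dict[str, list[str]]) -> str:
    """Return the type label whose stored variant is closest to the recognized phones."""
    return min(variants, key=lambda t: edit_distance(recognized, variants[t]))
```

For instance, recognized phones `["b", "a", "x"]` against variants `{"anglicized": ["b", "a", "k"], "original": ["b", "a", "x"]}` select `"original"`. Ties go to the first type in dictionary order; a production recognizer would more likely score acoustic likelihoods than raw phone strings.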
16. At least one program storage device having encoded thereon executable program code that, when executed by at least one processor, performs a method comprising acts of:
storing information indicative of a first pronunciation of a first word of a first language and information indicative of a first pronunciation of a second word of the first language, wherein the first pronunciation of the first word and the first pronunciation of the second word both comprise a first type of pronunciation;
storing information indicative of a second pronunciation of the first word of the first language and information indicative of a second pronunciation of the second word of the first language, wherein the second pronunciation of the first word and the second pronunciation of the second word both comprise a second type of pronunciation that is different than the first type of pronunciation;
receiving at least one utterance from a user, the utterance comprising at least the first word of the first language;
determining a type of pronunciation the user used for the first word in the at least one utterance;
determining a pronunciation of at least one second word of the first language based at least on the type of pronunciation the user used for the at least one first word, wherein the second word is different from the first word; and
generating an audio speech output that comprises at least the second word of the first language and that pronounces at least the second word using an audible pronunciation determined based, at least in part, on the type of pronunciation the user used for the first word in the at least one utterance, wherein the second word is different from the first word.
Dependent claims: 17, 18, 19, 20
Specification