SYSTEMS AND METHODS FOR SPEECH-TO-SPEECH TRANSLATION
Abstract
Disclosed herein are systems and methods for receiving an input speech sample in a first language and outputting a translated speech sample in a second language in the unique voice of a user. According to several embodiments, a translation system includes a training mode for developing a voice recognition database and a user phonetic dictionary. A speech recognition module uses a voice recognition database to recognize and transcribe the input speech samples in a first language. Subsequently, the text in the first language is translated to text in a second language, and a speech synthesizer develops an output speech in the unique voice of the user utilizing a user phonetic dictionary. The user phonetic dictionary may contain basic sound units, including phones, diphones, triphones, and/or words. Additionally, a translator may employ an N-gram statistical model, Markov Models, and/or smoothing algorithms.
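The N-gram statistical model and smoothing mentioned above can be illustrated with a minimal sketch. It assumes a bigram (N = 2) model over basic sound units and add-one (Laplace) smoothing; the abstract does not say which N or which smoothing algorithm is used, so both choices, along with the function names and the toy training data, are hypothetical.

```python
from collections import Counter


def train_bigram(sequences):
    """Count bigrams over sequences of basic sound units (e.g., phones).

    Assumption: N = 2. The source only says "N-gram statistical model".
    """
    context_counts = Counter()  # how often a unit appears as the preceding unit
    bigram_counts = Counter()   # how often (previous, current) occurs
    vocab = set()
    for seq in sequences:
        vocab.update(seq)
        for prev, cur in zip(seq, seq[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def bigram_prob(prev, cur, context_counts, bigram_counts, vocab):
    """Estimate P(cur | prev) with add-one smoothing (an assumed choice)."""
    return (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + len(vocab))


# Hypothetical training data: two short sequences of sound units.
training = [["h", "e", "l", "o"], ["h", "e", "y"]]
context_counts, bigram_counts, vocab = train_bigram(training)

# Probability of receiving "e" given that "h" was just received.
p_e_given_h = bigram_prob("h", "e", context_counts, bigram_counts, vocab)
# An unseen pair still gets a small, non-zero probability.
p_y_given_h = bigram_prob("h", "y", context_counts, bigram_counts, vocab)
```

Smoothing matters here because a pair of sound units that never occurred in training would otherwise receive probability zero and could never be recognized.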
30 Claims
1. A translation system comprising:
a processor;
an audio input device in electrical communication with the processor, the input device configured to receive audio input including an input speech sample of a user in a first language;
an audio output device in electrical communication with the processor, the audio output device configured to output audio including a translation of the input speech sample translated to a second language, wherein the output audio comprises basic sound units in the voice of the user;
a computer-readable storage medium in communication with the processor comprising:
a speech recognition module configured to receive the input speech sample and convert the input speech sample to text in the first language using the probability of receiving a basic sound unit based on a sequence of basic sound units in an N-gram statistical model;
a translation module configured to translate the text in the first language to text in a second language;
a speech synthesis module configured to receive the text in the second language and determine corresponding basic sound units to thereby generate speech in the second language using basic sound units in the unique voice of the user supplemented by basic sound units in a generic voice in the event a basic sound unit in the unique voice of the user is unavailable.
Dependent claims: 2-12.
13. A computer-implemented method for translating speech from a first language to a second language, the method comprising:
receiving an input speech sample on a computer system via an input device, the input speech sample spoken by a user in a first language;
the computer system recognizing the input speech sample in the first language using the probability of receiving a basic sound unit based on a sequence of basic sound units in an N-gram statistical model;
the computer system converting the input speech sample in the first language to text in the first language;
the computer system translating the text in the first language to text in a second language;
the computer system synthesizing the text in the second language into speech in the second language by determining corresponding basic sound units within a user phonetic dictionary in the unique voice of the user supplemented by basic sound units in a generic voice in the event a basic sound unit in the unique voice of the user is unavailable; and
the computer system generating an output of the speech in the second language at least partially in the unique voice.
Dependent claims: 14-19.
20. A system comprising:
an electronic device comprising:
a processor;
an audio input device in electrical communication with the processor configured to receive an input speech sample from a user in a first language;
an audio output device in electrical communication with the processor;
processor-executable instructions in communication with the processor comprising:
a speech recognition module configured to receive an input speech sample from the audio input device and convert the input speech sample to text in the first language using the probability of receiving a basic sound unit based on a sequence of basic sound units in an N-gram statistical model;
a translation module configured to translate the text in the first language to text in a second language;
a speech synthesis module configured to receive the text in the second language and determine corresponding basic sound units to thereby generate speech in the second language using basic sound units in the unique voice of the user supplemented by basic sound units in a generic voice in the event a basic sound unit in the unique voice of the user is unavailable.
Dependent claims: 21-30.
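The synthesis step recited in claims 1, 13, and 20 (user-voice sound units supplemented by generic-voice units when a user unit is unavailable) can be sketched as a dictionary lookup with a fallback. This is only an illustration under stated assumptions: the dictionaries are modeled as plain mappings from a sound-unit label to a placeholder recording name, and the function name, labels, and file names are all hypothetical rather than taken from the claims.

```python
def select_units(target_units, user_dictionary, generic_dictionary):
    """Pick a recording for each required sound unit.

    Prefers the user's own voice and falls back to the generic voice
    only when the user phonetic dictionary lacks the unit.
    """
    selected = []
    for unit in target_units:
        if unit in user_dictionary:
            selected.append((unit, "user", user_dictionary[unit]))
        elif unit in generic_dictionary:
            selected.append((unit, "generic", generic_dictionary[unit]))
        else:
            # Assumption: a unit missing from both dictionaries is an error.
            raise KeyError(f"no recording available for sound unit {unit!r}")
    return selected


# Hypothetical dictionaries; the values stand in for stored audio.
user_dictionary = {"o": "user_o.wav", "l": "user_l.wav"}
generic_dictionary = {"h": "gen_h.wav", "o": "gen_o.wav",
                      "l": "gen_l.wav", "a": "gen_a.wav"}

# Units needed for a second-language word; "h" and "a" are missing
# from the user dictionary, so the generic voice supplies them.
plan = select_units(["h", "o", "l", "a"], user_dictionary, generic_dictionary)
sources = [source for _, source, _ in plan]
```

With these inputs the resulting output is only partially in the user's voice, which matches the "at least partially in the unique voice" wording of claim 13.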
Specification