SYSTEM AND METHOD FOR SPEECH-TO-SPEECH TRANSLATION

US 20100057435A1
Filed: 08/31/2009
Published: 03/04/2010
Est. Priority Date: 08/29/2008
Status: Abandoned Application

1 Assignment

0 Petitions

Abstract

Disclosed herein are systems and methods for receiving an input speech sample in a first language and outputting a translated speech sample in a second language in the unique voice of a user. According to several embodiments, a translation system includes a translation mode performing the above functions and a training mode for developing a voice recognition database and a user phonetic dictionary. A speech recognition module uses the voice recognition database to recognize and transcribe input speech samples in the first language. The text in the first language is translated to text in the second language, and a speech synthesizer generates output speech in the unique voice of the user using the user phonetic dictionary. The user phonetic dictionary may contain basic sound units, including phones, diphones, triphones, and/or words.
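
The three-stage pipeline described in the abstract (recognize, translate, synthesize from the user's own recorded sound units) can be sketched as follows. This is a minimal, hypothetical illustration rather than the applicants' implementation: audio is modelled as a plain list of float samples, `recognize` and `translate` are stand-in callables for whatever recognizer and translator a real system would use, and synthesis is naive concatenation of recorded units with no smoothing at the joins. All function and parameter names are assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Assumption: audio is a plain list of float samples.
Audio = List[float]


def speech_to_speech(
    input_audio: Audio,
    recognize: Callable[[Audio], str],      # hypothetical recognizer (first language)
    translate: Callable[[str], str],        # hypothetical translator (first -> second)
    user_dictionary: Dict[str, List[str]],  # word -> basic sound unit IDs, in order
    unit_recordings: Dict[str, Audio],      # unit ID -> recording in the user's voice
) -> Audio:
    """Recognize, translate, then concatenate the user's recorded sound units."""
    source_text = recognize(input_audio)
    target_text = translate(source_text)
    output: Audio = []
    for word in target_text.lower().split():
        # Look up the word's basic sound units and append each recording in order.
        for unit in user_dictionary[word]:
            output.extend(unit_recordings[unit])
    return output


# Usage with toy stand-ins: "hello" is translated to "hola" and rebuilt
# from three recorded units.
audio = speech_to_speech(
    [0.0],
    recognize=lambda samples: "hello",
    translate=lambda text: {"hello": "hola"}[text],
    user_dictionary={"hola": ["o", "l", "a"]},
    unit_recordings={"o": [0.1], "l": [0.2], "a": [0.3]},
)
print(audio)  # [0.1, 0.2, 0.3]
```

A word missing from the user dictionary raises `KeyError` here; a practical system would need a fallback, which the abstract does not describe.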

195 Citations

23 Claims

1. A speech-to-speech translation system comprising:
- a processor;
  
  an audio input device in electrical communication with the processor, the input device configured to receive audio input including an input speech sample of a user in a first language;
  
  an audio output device in electrical communication with the processor, the audio output device configured to output audio including a translation of the input speech sample translated to a second language, wherein the output audio comprises basic sound units in the voice of the user; and
  
  a computer-readable storage medium comprising:
  
  a voice recognition module configured to receive the input speech sample and convert the input speech sample to text in the first language;
  
  a translation module configured to translate the text in the first language to text in a second language; and
  
  a speech synthesis module configured to receive the text in the second language and determine corresponding basic sound units in the voice of the user contained within a user phonetic dictionary to thereby generate speech in the second language in the unique voice of the user.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
  - 2. The translation system of claim 1, wherein the computer-readable storage medium further comprises a user dictionary initialization module configured to:
    - receive an input speech sample of a user speaking into the input device, extract one or more basic sound units from the input speech sample, and store a recording of the one or more basic sound units in the user phonetic dictionary, the basic sound units spoken in the voice of the user.
  - 3. The translation system of claim 2, wherein the user dictionary initialization module stores a recording of the one or more basic sound units by storing a recording of the extracted basic sound units in a list of sounds and, for each word in the language, storing the basic sound units of the word in the user phonetic dictionary in association with the word.
  - 4. The translation system of claim 3, wherein the basic sound units for each word are stored in the user phonetic dictionary in the order in which the basic sounds are pronounced to speak the word.
  - 5. The translation system of claim 4, wherein the order of the words is obtained from a master phonetic dictionary.
  - 6. The translation system of claim 2, wherein the user phonetic dictionary contains all the words of a target language.
  - 7. The translation system of claim 1, wherein the basic sound units are selected from the group consisting of phones, diphones, half-syllables, triphones, and words.
  - 8. The translation system of claim 1, wherein the speech recognition module is configured to compare received input speech with a speech recognition template stored within a speech recognition database.
  - 9. The translation system of claim 1, wherein the speech recognition module further comprises a voice recognition module configured to recognize the user's unique voice from the input speech sample.
  - 10. The translation system of claim 1, wherein the computer-readable storage medium further comprises an input/output language selection module configured to allow the selection of the first language and the selection of the second language.
  - 11. The translation system of claim 1, wherein the computer-readable storage medium further comprises a training module configured to:
    - request a speech sample from the user, the speech sample derived from a master phonetic dictionary;
      
      receive an input speech sample in a unique voice of the user;
      
      generate a speech recognition template using the input speech sample; and
      
      augment a speech recognition template database with the generated speech recognition template.
  - 12. The translation system of claim 11, wherein the training module is further configured to:
    - extract a basic sound unit in the voice of the user from the input speech sample; and
      
      store in the user phonetic dictionary the extracted basic sound unit in the unique voice.
  - 13. The translation system of claim 1, wherein the audio input device comprises a microphone.
  - 14. The translation system of claim 1, wherein the audio output device comprises one or more speakers.
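
Claims 2 through 5 describe keeping a list of the user's recorded sound units and, for each word, storing that word's units in pronunciation order taken from a master phonetic dictionary; claim 7 allows the units to be phones, diphones, half-syllables, triphones, or words. A minimal sketch of that bookkeeping is shown below, assuming the units have already been segmented out of the user's speech (the claims do not say how extraction is performed). The choice to skip words whose units have not all been recorded yet is an assumption of this sketch; claim 6 instead contemplates a dictionary covering every word of the target language. `build_user_dictionary` and `to_diphones` are hypothetical names, and the diphone helper omits the boundary silences a real diphone inventory would usually include.

```python
from typing import Dict, List, Tuple

# Assumption: audio is a plain list of float samples.
Audio = List[float]


def build_user_dictionary(
    master_dictionary: Dict[str, List[str]],
    extracted_units: Dict[str, Audio],
) -> Tuple[Dict[str, Audio], Dict[str, List[str]]]:
    """Return (list of sounds, user phonetic dictionary).

    master_dictionary maps each word to its basic sound units in pronunciation
    order; extracted_units maps a unit ID to the recording cut from the user's
    speech. Words with any unrecorded unit are skipped (an assumption).
    """
    sound_list = dict(extracted_units)  # the user's recorded units
    user_dictionary: Dict[str, List[str]] = {}
    for word, units in master_dictionary.items():
        if all(unit in sound_list for unit in units):
            # Preserve the master dictionary's pronunciation order.
            user_dictionary[word] = list(units)
    return sound_list, user_dictionary


def to_diphones(phones: List[str]) -> List[str]:
    """Rewrite a phone sequence as phone-to-phone transitions (diphones)."""
    return [f"{a}-{b}" for a, b in zip(phones, phones[1:])]


master = {"cat": ["k", "ae", "t"], "dog": ["d", "ao", "g"]}
recorded = {"k": [0.1], "ae": [0.2], "t": [0.3]}
sounds, user_dict = build_user_dictionary(master, recorded)
print(user_dict)                          # {'cat': ['k', 'ae', 't']}
print(to_diphones(["k", "ae", "t"]))      # ['k-ae', 'ae-t']
```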

15. A computer-implemented method for translating speech from a first language to a second language, the method comprising:
- receiving an input speech sample on a computer system via an input device, the input speech sample spoken by a user in a first language;
  
  the computer system recognizing the input speech sample in the first language;
  
  the computer system converting the input speech sample in the first language to text in the first language;
  
  the computer system translating the text in the first language to text in a second language;
  
  the computer system synthesizing the text in the second language into speech in the second language by determining corresponding basic sound units within a user phonetic dictionary; and
  
  the computer system generating an output of the speech in the second language in the unique voice.
- Dependent Claims (16, 17, 18, 19, 20, 21)
  - 16. The computer-implemented method of claim 15, further comprising the computer system initializing the user phonetic dictionary to contain basic unit sounds spoken in the voice of the user, including:
    - receiving on the computer system an input speech sample of the user speaking into an input device of the computer system, extracting one or more basic sound units from the input speech sample, and storing the one or more basic sound units in the user phonetic dictionary, the one or more basic sound units spoken in the voice of the user.
  - 17. The computer-implemented method of claim 15, wherein the basic sound units are selected from the group consisting of phones, diphones, triphones, and words.
  - 18. The computer-implemented method of claim 15, wherein recognizing the input speech sample in the first language comprises comparing a received input speech sample with a speech recognition template stored within a speech recognition template database.
  - 19. The computer-implemented method of claim 18, wherein recognizing the input speech sample further comprises recognizing the user's unique voice from the input speech sample.
  - 20. The computer-implemented method of claim 15, further comprising the steps of selecting a first language and selecting a second language.
  - 21. The computer-implemented method of claim 20, wherein the voice recognition template database is augmented through steps comprising:
    - the computer system requesting a speech sample from a pre-loaded voice recognition template;
      
      the computer system receiving an input speech sample in a unique voice;
      
      the computer system using the input speech sample to generate a voice recognition template; and
      
      the computer system augmenting the voice recognition template database with the generated voice recognition template.
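
Claim 18 recognizes speech by comparing the input with stored templates, and claim 21 grows the template database from the user's own samples. The claims do not name a matching algorithm, so the sketch below swaps in dynamic time warping (DTW), a classic technique for comparing utterances spoken at different speeds against templates. It is a simplified, hypothetical illustration: each frame is reduced to a single scalar feature (real systems compare feature vectors), and `TemplateDatabase`, `augment`, and `recognize` are names invented for this example.

```python
from typing import Dict, List, Sequence

# Assumption: one scalar feature per frame (real systems use feature vectors).
Features = Sequence[float]


def dtw_distance(a: Features, b: Features) -> float:
    """Dynamic time warping distance between two feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


class TemplateDatabase:
    """Hypothetical speech recognition template database (word -> templates)."""

    def __init__(self) -> None:
        self.templates: Dict[str, List[Features]] = {}

    def augment(self, word: str, sample: Features) -> None:
        # Training step: the user's sample becomes a new template for the word.
        self.templates.setdefault(word, []).append(list(sample))

    def recognize(self, sample: Features) -> str:
        # Return the word whose closest template has the lowest DTW distance.
        return min(
            self.templates,
            key=lambda w: min(dtw_distance(sample, t) for t in self.templates[w]),
        )


db = TemplateDatabase()
db.augment("yes", [1.0, 3.0, 1.0])
db.augment("no", [2.0, 2.0, 2.0, 2.0])
print(db.recognize([1.0, 1.0, 3.0, 3.0, 1.0]))  # yes
```

Because DTW lets one template frame absorb several input frames, the slower utterance `[1.0, 1.0, 3.0, 3.0, 1.0]` still matches the shorter "yes" template at zero cost.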

22. A system for translating speech from a first language to a second language, comprising:
- means to receive an input speech in a first language in a unique voice;
  
  means to convert the input speech in the first language to text in the first language;
  
  means to translate the text in the first language to text in a second language;
  
  means to synthesize the text in the second language into speech in the second language by determining corresponding basic sound units within a user phonetic dictionary;
  
  means to output the speech in the second language in the unique voice.
- Dependent Claims (23)
  - 23. The system of claim 22, further comprising means to augment the voice recognition template database comprising:
    - means to request a speech sample from the user, the speech sample derived from a master phonetic dictionary;
      
      means to receive an input speech sample in a unique voice of the user;
      
      means to generate a speech recognition template using the input speech sample; and
      
      means to augment the speech recognition database with the generated speech recognition template.

Current Assignee
O3 Technologies Incorporated
Original Assignee
O3 Technologies Incorporated
Inventors
Harris, Cyril Edward Roger III; Kent, Justin R.

Application Number

US12/551,371
Publication Number

US 20100057435A1
Time in Patent Office

Days
Field of Search
US Class Current

704/3
CPC Class Codes

G06F 40/58   Use of machine translation,...

G10L 13/06   Elementary speech units use...

G10L 15/26   Speech to text systems G10L...