SYSTEMS AND METHODS FOR SPEECH-TO-SPEECH TRANSLATION

US 20110238407A1
Filed: 06/02/2011
Published: 09/29/2011
Est. Priority Date: 08/31/2009
Status: Abandoned Application

Abstract

Disclosed herein are systems and methods for receiving an input speech sample in a first language and outputting a translated speech sample in a second language in the unique voice of a user. According to several embodiments, a translation system includes a training mode for developing a voice recognition database and a user phonetic dictionary. A speech recognition module uses a voice recognition database to recognize and transcribe the input speech samples in a first language. Subsequently, the text in the first language is translated to text in a second language, and a speech synthesizer develops an output speech in the unique voice of the user utilizing a user phonetic dictionary. The user phonetic dictionary may contain basic sound units, including phones, diphones, triphones, and/or words. Additionally, a translator may employ an N-gram statistical model, Markov Models, and/or smoothing algorithms.
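The three stages named in the abstract (recognition, translation, synthesis) can be pictured as a simple chain. The sketch below is illustrative only and is not taken from the application: the class name `SpeechToSpeechPipeline`, its fields, and the toy stand-in stages are all hypothetical, and real stages would wrap an actual recognizer, translator, and synthesizer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SpeechToSpeechPipeline:
    """Hypothetical wiring of the three stages described in the abstract."""
    recognize: Callable[[bytes], str]       # audio (first language) -> text
    translate: Callable[[str], str]         # first-language text -> second-language text
    synthesize: Callable[[str], List[str]]  # second-language text -> sound-unit labels

    def run(self, audio: bytes) -> List[str]:
        source_text = self.recognize(audio)
        target_text = self.translate(source_text)
        return self.synthesize(target_text)


# Toy stand-ins (assumptions for illustration): the "audio" is just UTF-8 text,
# translation is a one-entry lookup, and each letter stands in for a sound unit.
pipeline = SpeechToSpeechPipeline(
    recognize=lambda audio: audio.decode("utf-8"),
    translate=lambda text: {"hello": "hola"}.get(text, text),
    synthesize=lambda text: list(text),
)
units = pipeline.run(b"hello")
```

Keeping each stage behind a plain callable makes it possible to swap in a different recognizer or synthesizer without touching the other two stages.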

330 Citations

30 Claims

1. A translation system comprising:
- a processor;
  
  an audio input device in electrical communication with the processor, the input device configured to receive audio input including an input speech sample of a user in a first language;
  
  an audio output device in electrical communication with the processor, the audio output device configured to output audio including a translation of the input speech sample translated to a second language, wherein the output audio comprises basic sound units in the voice of the user;
  
  a computer-readable storage medium in communication with the processor comprising;
  
  a speech recognition module configured to receive the input speech sample and convert the input speech sample to text in the first language using the probability of receiving a basic sound unit based on a sequence of basic sound units in an N-gram statistical model;
  
  a translation module configured to translate the text in the first language to text in a second language;
  
  a speech synthesis module configured to receive the text in the second language and determine corresponding basic sound units to thereby generate speech in the second language using basic sound units in the unique voice of the user supplemented by basic sound units in a generic voice in the event a basic sound unit in the unique voice of the user is unavailable.
  - 2. The translation system of claim 1, wherein the computer-readable storage medium further comprises a user dictionary initialization module configured to:
    - receive an input speech sample of a user speaking into the input device,

      extract one or more basic sound units from the input speech sample, and

      store a recording of the one or more basic sound units in the user phonetic dictionary, the basic sound units spoken in the voice of the user.
  - 3. The translation system of claim 2, wherein the user dictionary initialization module stores a recording of the one or more basic sound units by storing a recording of the extracted basic sound units in a list of sounds and, for each word in the language, storing the basic sound units of the word in the user phonetic dictionary in association with the word.
  - 4. The translation system of claim 2, wherein the user phonetic dictionary contains all the words of a target language.
  - 5. The translation system of claim 1, wherein the basic sound units are selected from the group consisting of phones, diphones, half-syllables, triphones, and words.
  - 6. The translation system of claim 1, wherein the speech recognition module is configured to compare received input speech with a speech recognition template stored within a speech recognition database.
  - 7. The translation system of claim 1, wherein the computer-readable storage medium further comprises an input/output language selection module configured to allow the selection of the first language and the selection of the second language.
  - 8. The translation system of claim 1, wherein the computer-readable storage medium further comprises a training module configured to:
    - request a speech sample from the user, the speech sample derived from a master phonetic dictionary;
      
      receive an input speech sample in a unique voice of the user;
      
      generate a speech recognition template using the input speech sample; and
      
      augment a speech recognition template database with the generated speech recognition template.
  - 9. The translation system of claim 11, wherein the training module is further configured to:
    - extract a basic sound unit in the voice of the user from the input speech sample; and
      
      store in the user phonetic dictionary the extracted basic sound unit in the unique voice.
  - 10. The translation system of claim 1, wherein the N-gram statistical model is a tri-gram statistical model, wherein a basic sound unit of the input speech is recognized based at least partially on two previously received basic sound units.
  - 11. The translation system of claim 1, wherein the N-gram statistical model is a Markov Model.
  - 12. The translation system of claim 1, wherein the N-gram statistical model utilizes a smoothing algorithm to assign non-zero probabilities to basic sound units that would otherwise have zero probability of occurring based on a sequence of sound units.
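Claims 10 and 12 describe a tri-gram model over basic sound units and a smoothing step that keeps unseen sequences from receiving zero probability. The following is a minimal sketch under stated assumptions: the claims do not name a specific smoothing algorithm, so add-one (Laplace) smoothing is used here as one common choice, and the class name `TrigramUnitModel`, the unit labels, and the training sequences are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence


class TrigramUnitModel:
    """Tri-gram model over sound-unit labels with add-one smoothing (an assumed choice)."""

    def __init__(self, vocabulary: Iterable[str]) -> None:
        self.vocabulary = sorted(set(vocabulary))
        self.trigram_counts: Counter = Counter()
        self.context_counts: Counter = Counter()

    def train(self, sequences: Iterable[Sequence[str]]) -> None:
        # Count every unit together with the two units that precede it.
        for seq in sequences:
            for i in range(2, len(seq)):
                context = (seq[i - 2], seq[i - 1])
                self.trigram_counts[context + (seq[i],)] += 1
                self.context_counts[context] += 1

    def probability(self, unit: str, prev2: str, prev1: str) -> float:
        # P(unit | prev2, prev1) with one pseudo-count per vocabulary entry,
        # so an unseen tri-gram still gets a non-zero probability.
        context = (prev2, prev1)
        numerator = self.trigram_counts[context + (unit,)] + 1
        denominator = self.context_counts[context] + len(self.vocabulary)
        return numerator / denominator


# Hypothetical phone labels and training data, for illustration only.
model = TrigramUnitModel(["k", "ae", "t", "b"])
model.train([["k", "ae", "t"], ["b", "ae", "t"], ["k", "ae", "b"]])
seen = model.probability("t", "k", "ae")    # observed once after ("k", "ae")
unseen = model.probability("k", "k", "ae")  # never observed, still non-zero
```

With add-one smoothing the probabilities for a given two-unit context still sum to one across the vocabulary, provided the queried unit is itself a vocabulary member.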

13. A computer-implemented method for translating speech from a first language to a second language, the method comprising:
- receiving an input speech sample on a computer system via an input device, the input speech sample spoken by a user in a first language;
  
  the computer system recognizing the input speech sample in the first language using the probability of receiving a basic sound unit based on a sequence of basic sound units in an N-gram statistical model;
  
  the computer system converting the input speech sample in the first language to text in the first language;
  
  the computer system translating the text in the first language to text in a second language;
  
  the computer system synthesizing the text in the second language into speech in the second language by determining corresponding basic sound units within a user phonetic dictionary in the unique voice of the user supplemented by basic sound units in a generic voice in the event a basic sound unit in the unique voice of the user is unavailable; and
  
  the computer system generating an output of the speech in the second language at least partially in the unique voice.
  - 14. The computer-implemented method of claim 13, further comprising the computer system initializing the user phonetic dictionary to contain basic unit sounds spoken in the voice of the user, including:
    - receiving on the computer system an input speech sample of the user speaking into an input device of the computer system,

      extracting one or more basic sound units from the input speech sample, and

      storing the one or more basic sound units in the user phonetic dictionary, the one or more basic sound units spoken in the voice of the user.
  - 15. The computer-implemented method of claim 13, wherein the basic sound units are selected from the group consisting of phones, diphones, triphones, and words.
  - 16. The computer-implemented method of claim 13, wherein recognizing the input speech sample in the first language comprises comparing a received input speech sample with a speech recognition template stored within a speech recognition template database.
  - 17. The computer-implemented method of claim 13, further comprising selecting a first language and selecting a second language.
  - 18. The computer implemented method of claim 16, wherein the speech recognition template database is augmented by:
    - the computer system requesting a speech sample from a pre-loaded speech recognition template;
      
      the computer system receiving an input speech sample in a unique voice;
      
      the computer system using the input speech sample to generate a speech recognition template; and
      
      the computer system augmenting the speech recognition template database with the generated speech recognition template.
  - 19. The computer implemented method of claim 13, wherein generating an output comprises digitally transmitting the speech in the second language in the unique voice.
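The synthesis step of claim 13 prefers basic sound units from the user phonetic dictionary and falls back to a generic voice when a unit in the user's voice is unavailable. A minimal sketch of that selection rule follows; the function name `select_units`, the dictionary layout (unit label mapped to recorded bytes), and the sample data are assumptions made for illustration, not details from the application.

```python
from typing import Dict, List, Sequence, Tuple


def select_units(
    unit_labels: Sequence[str],
    user_units: Dict[str, bytes],
    generic_units: Dict[str, bytes],
) -> List[Tuple[str, str, bytes]]:
    """Return (label, source, recording) for each unit, preferring the user's voice."""
    selected = []
    for label in unit_labels:
        if label in user_units:
            selected.append((label, "user", user_units[label]))
        elif label in generic_units:
            # Fallback: the user has no recording of this unit yet.
            selected.append((label, "generic", generic_units[label]))
        else:
            raise KeyError(f"no recording available for sound unit {label!r}")
    return selected


# Hypothetical recordings, for illustration only.
user_dictionary = {"o": b"user-o", "l": b"user-l"}
generic_dictionary = {"h": b"gen-h", "o": b"gen-o", "l": b"gen-l", "a": b"gen-a"}
plan = select_units(["o", "l", "a"], user_dictionary, generic_dictionary)
sources = [source for _, source, _ in plan]
```

Recording the source alongside each unit makes it easy to see how much of the output is "at least partially in the unique voice", in the wording of claim 13.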

20. A system comprising:
- an electronic device comprising;
  
  a processor;
  
  an audio input device in electrical communication with the processor configured to receive an input speech sample from a user in a first language;
  
  an audio output device in electrical communication with the processor;
  
  processor-executable instructions in communication with the processor comprising;
  
  a speech recognition module configured to receive an input speech sample from the audio input device and convert the input speech sample to text in the first language using the probability of receiving a basic sound unit based on a sequence of basic sound units in an N-gram statistical model;
  
  a translation module configured to translate the text in the first language to text in a second language;
  
  a speech synthesis module configured to receive the text in the second language and determine corresponding basic sound units to thereby generate speech in the second language using basic sound units in the unique voice of the user supplemented by basic sound units in a generic voice in the event a basic sound unit in the unique voice of the user is unavailable.
  - 21. The translation system of claim 20, wherein the basic sound units are selected from the group consisting of phones, diphones, half-syllables, triphones, and words.
  - 22. The translation system of claim 20, wherein the N-gram statistical model is a tri-gram statistical model, wherein a basic sound unit of the input speech is recognized based at least partially on two previously received basic sound units.
  - 23. The translation system of claim 20, wherein the N-gram statistical model is a Markov Model.
  - 24. The translation system of claim 20, wherein the N-gram statistical model utilizes a smoothing algorithm to assign non-zero probabilities to basic sound units that would otherwise have zero probability of occurring based on a sequence of sound units.
  - 25. The translation system of claim 20, wherein the electronic device comprises a mobile telephone.
  - 26. The translation system of claim 20, wherein the electronic device comprises a portable audio device.
  - 27. The translation system of claim 20, wherein the electronic device comprises a general purpose computer.
  - 28. The translation system of claim 20, wherein the electronic device is embedded in apparel.
  - 29. The translation system of claim 20, wherein the electronic device comprises a portable video device.
  - 30. The translation system of claim 20, wherein the audio output device is configured to transmit a digital signal corresponding to the speech in the second language.

Current Assignee: O3 Technologies Incorporated
Original Assignee: O3 Technologies Incorporated
Inventors: Kent, Justin R.
Application Number: US13/151,996
Publication Number: US 20110238407A1
US Class Current: 704/3
CPC Class Codes:
G06F 40/58   Use of machine translation,...
G10L 13/06   Elementary speech units use...
G10L 15/26   Speech to text systems G10L...