Hybrid, offline/online speech translation system

US 9,430,465 B2
Filed: 06/12/2013
Issued: 08/30/2016
Est. Priority Date: 05/13/2013
Status: Active Grant

Abstract

A hybrid speech translation system whereby a wireless-enabled client computing device can, in an offline mode, translate input speech utterances from one language to another locally, and also, in an online mode when there is wireless network connectivity, have a remote computer perform the translation and transmit it back to the client computing device via the wireless network for audible outputting by the client computing device. The user of the client computing device can transition between modes or the transition can be automatic based on user preferences or settings. The back-end speech translation server system can adapt the various recognition and translation models used by the client computing device in the offline mode based on analysis of user data over time, to thereby configure the client computing device with scaled-down, yet more efficient and faster, models than the back-end speech translation server system, while still being adapted for the user's domain.
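
The automatic transition between modes can be pictured as a small decision function. The following is a minimal sketch, not the patented implementation: it assumes the privacy preference recited in claim 1 (use the translation server only over a secure wireless network) is stored as a boolean, and the names `Mode`, `NetworkState`, and `select_mode` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    LOCAL = "local"    # translate on the client device (offline mode)
    SERVER = "server"  # translate on the remote server (online mode)


@dataclass
class NetworkState:
    connected: bool  # any wireless connectivity at all
    secure: bool     # hypothetical flag: the link is judged secure


def select_mode(secure_only: bool, net: NetworkState) -> Mode:
    """Pick where the next utterance is translated.

    secure_only models the privacy preference: use the server
    only if a secure wireless network is available.
    """
    if not net.connected:
        return Mode.LOCAL   # no connectivity: offline mode
    if secure_only and not net.secure:
        return Mode.LOCAL   # preference forbids an insecure link
    return Mode.SERVER      # connectivity is acceptable: online mode
```

How a network is judged secure is not specified here, so the `secure` flag is left as an input rather than computed.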

15 Claims

1. A speech translation system comprising:
- a translation server; and
  
  a client device that is configured for communicating with the translation server, wherein the client device comprises:
  
  a microphone;
  
  a processor connected to the microphone;
  
  a memory connected to the processor that stores instructions to be executed by the processor; and
  
  a speaker connected to the processor, wherein:
  
  the client device is for outputting, via the speaker, a translation of verbally input phrases from a first language to a second language; and
  
  the memory stores instructions such that:
  
  the processor determines the second language for verbally input phrases received at the client device from a user of the client device;
  
  the processor receives from the user a translation mode setting for the client device for the translation of the verbally input phrase into the determined second language, the translation mode setting comprising a privacy preference of using the translation server only if a secure wireless network is available;
  
  in response to determining that a secure wireless network is not available, the translation is automatically selected to be performed at the client device, the translation comprising:
  
  translating the verbally input phrases from the first language into the second language; and
  
  outputting, to the user in the second language, a local translation of the verbally input phrases;
  
  in response to determining that a secure wireless network is available, the translation is automatically selected to be performed at the translation server, the translation comprising:
  
  the client device sending, to the translation server, information associated with the verbally input phrases in the first language received by the client device;
  
  the translation server determining a server translation of the verbally input phrases in the second language based on the data received via the wireless network from the client device; and
  
  the translation server transmitting, to the client device, data regarding the server translation of the verbally input phrases in the second language, such that the client device outputs the server translation;
  
  the translation server monitors, over time, speech utterances received by the client device for translation from the first language to the second language;
  
  the translation server determines, based on the monitored speech utterances, vocabulary used by the user; and
  
  the translation server updates, based on the determined vocabulary, at least one of the local acoustic model, the local language model, the local translation model and the local speech synthesis model of the client device, wherein updates to the at least one of the local acoustic model, the local language model, the local translation model and the local speech synthesis model of the client device are transmitted from the translation server to the client device via the wireless network.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The speech translation system of claim 1, wherein the client device has a user interface that permits a user to switch between translation at the client device or at the translation server.
  - 3. The speech translation system of claim 1, wherein:
    - the client device stores, in the memory, the local acoustic model, the local language model, the local translation model and the local speech synthesis model for recognizing the speech utterances in the first language and translating the recognized speech utterances to the second language for output via the speaker of the client device;
      
      the translation server comprises a back-end acoustic model, a back-end language model, a back-end translation model and a back-end speech synthesis model for determining the translation to the second language of the speech utterances in the first language based on the data received via a wireless network from the client device;
      
      the local acoustic model is different from the back-end acoustic model;
      
      the local language model is different from the back-end language model;
      
      the local translation model is different from the back-end translation model; and
      
      the local speech synthesis model is different from the back-end speech synthesis model.
  - 4. The speech translation system of claim 3, wherein:
    - the client device comprises a GPS system for determining a location of the client device; and
      
      the translation server is programmed to update at least one of the local acoustic model, the local language model, the local translation model and the local speech synthesis model of the client device based on the location of the client device, wherein updates to the at least one of the local acoustic model, the local language model, the local translation model and the local speech synthesis model of the client device are transmitted from the back-end speech translation server system to the client device via the wireless network.
  - 5. The speech translation system of claim 3, wherein:
    - the translation server is one of a plurality of translation servers, and the client device is configured for communicating with each of the plurality of translation servers via the wireless network;
      
      each of the plurality of translation servers is for determining a translation to the second language of the speech utterances in the first language based on the data received via the wireless network from the client device; and
      
      one of the plurality of translation servers selects one of the translations from the plurality of translation servers for transmitting to the client device.
  - 6. The speech translation system of claim 3, wherein:
    - the translation server is one of a plurality of translation servers, and the client device is configured for communicating with each of the plurality of translation servers via the wireless network;
      
      each of the plurality of translation servers is for determining a translation to the second language of the speech utterances in the first language based on the data received via the wireless network from the client device; and
      
      one of the plurality of translation servers merges two or more of the translations from the plurality of translation servers to generate a merged translation for transmitting to the client device.
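
Claims 5 and 6 have one server choose among, or merge, the translations produced by several servers, but the claims do not say how the choice is made. As a minimal sketch of the selection case only, the following assumes each server attaches a confidence score to its output; the `Candidate` type, the `confidence` field, and `select_translation` are hypothetical and not taken from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    server_id: str     # which translation server produced this output
    text: str          # translation in the second language
    confidence: float  # hypothetical score in [0, 1]; not defined by the claims


def select_translation(candidates: list[Candidate]) -> Candidate:
    """Return the candidate with the highest confidence score.

    On a tie, the earliest candidate in the list wins.
    """
    if not candidates:
        raise ValueError("no candidate translations to select from")
    return max(candidates, key=lambda c: c.confidence)
```

The merging case of claim 6 would need a combination rule over two or more candidates, which is left out here because the claims give no basis for one.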

7. A method comprising:
- receiving at a client device, from a user of the client device, a verbally input phrase in a first language;
  
  determining a second language for translation of the verbally input phrase;
  
  receiving from the user a translation mode setting for the client device for the translation of the verbally input phrase into the determined second language, the translation mode setting comprising a privacy preference of using the translation server only if a secure wireless network is available;
  
  in response to determining that a secure wireless network is not available, automatically selecting to perform the translation at the client device, the translation comprising:
  
  translating, by the client device, the verbally input phrase from the first language into the second language; and
  
  outputting, in the second language, a local translation of the verbally input phrase;
  
  in response to determining that a secure wireless network is available, automatically selecting to perform the translation at a translation server, the translation comprising:
  
  sending, from the client device to the translation server, information associated with the verbally input phrase;
  
  receiving, at the client device from the translation server, data associated with a server translation of the verbally input phrase from the first language to the second language; and
  
  outputting, in the second language, the server translation of the verbally input phrase;
  
  monitoring, over time, speech utterances received by the client device for translation from the first language to the second language;
  
  determining, based on the monitored speech utterances, vocabulary used by the user; and
  
  updating, based on the determined vocabulary, at least one of a local acoustic model, a local language model, a local translation model and a local speech synthesis model of the client device, wherein updates to the at least one of the local acoustic model, the local language model, the local translation model and the local speech synthesis model of the client device are transmitted from the translation server to the client device via the wireless network.
- Dependent claims (8, 9, 10, 11, 12, 13, 14, 15):
  - 8. The method of claim 7, further comprising downloading, by the client device, application software for a language translation pair that comprises the first and second languages.
  - 9. The method of claim 8, further comprising:
    - determining, by the client device, a location of the client device; and
      
      downloading, by the client device, the application software for the language translation pair based on the determined location of the client device and when suitable connectivity between the client device and the translation server is available via a wireless network.
  - 10. The method of claim 7, wherein determining the second language for translation of the verbally input phrase is in response to receiving user input selecting the second language.
  - 11. The method of claim 7, wherein determining the second language for translation of the verbally input phrase is automatically determined by the client device.
  - 12. The method of claim 11, wherein the second language is automatically determined by the client device based on a location of the user.
  - 13. The method of claim 11, wherein the second language is automatically determined by the client device based on language pairs that have been downloaded to the client device.
  - 14. The method of claim 11, wherein the second language is automatically determined by the client device based on language pairs available.
  - 15. The method of claim 7, wherein the determined vocabulary is domain-specific.
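
The monitoring and updating steps of claim 7 (monitor utterances over time, determine the user's vocabulary, update a local model) can be illustrated with simple word counting. This is a sketch under stated assumptions, not the method the patent discloses: the ASCII-only tokenizer, the `min_count` threshold, and the idea that an update is a list of words missing from the local vocabulary are all simplifications, and the function names are hypothetical.

```python
import re
from collections import Counter


def user_vocabulary(utterances, min_count=2):
    """Words the user has said at least min_count times.

    Tokenization is a naive lowercase ASCII split (an assumption).
    """
    counts = Counter()
    for text in utterances:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return {word for word, n in counts.items() if n >= min_count}


def vocabulary_update(utterances, local_vocab, min_count=2):
    """Frequent user words that the local model does not yet cover."""
    return sorted(user_vocabulary(utterances, min_count) - set(local_vocab))
```

For example, if "pharmacy" recurs in the monitored utterances but is absent from the local vocabulary, it would appear in the list that the server sends to the client device.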

Current Assignee
Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Original Assignee
Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Inventors
Fuegen, Christian; Rottman, Kay; Waibel, Naomi Aoki; Waibel, Alexander
Primary Examiner(s)
Sirjani, Fariba

Application Number

US13/915,820
Publication Number

US 20140337007A1
Time in Patent Office

1,175 Days
Field of Search

704/1-10, 704/231-277
US Class Current

1/1
CPC Class Codes

G06F 40/58   Use of machine translation,...

G10L 13/00   Speech synthesis; Text to s...

G10L 13/02   Methods for producing synth...

G10L 15/30   Distributed recognition, e....
