Encoding and adaptive, scalable accessing of distributed models
First Claim
1. A method for translating a text, comprising:
receiving the text in a source language;
partitioning the text into a plurality of segments;
obtaining, for each segment, one or more candidate translations in a target language;
for each of a plurality of possible n-grams in each candidate translation:
identifying a respective partition of a language model containing the n-gram, wherein each partition includes a subset of all n-grams in the target language and statistical data for the same subset of n-grams, each n-gram being a sequence of n tokens in the target language, wherein n is a positive integer, and wherein each partition is maintained by a different server of a plurality of servers;
sending a lookup request to the server maintaining the respective partition containing the n-gram;
obtaining, from the server maintaining the respective partition containing the n-gram, statistical data for the n-gram; and
determining, for each segment of the text, a best candidate translation of the one or more candidate translations based on the obtained statistical data.
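The claim requires identifying which server's partition holds a given n-gram but leaves the partitioning scheme open. A minimal client-side sketch, assuming a stable hash of the n-gram maps it to one of a set of hypothetical server names (the server list and hash choice are illustrative, not from the patent):

```python
import hashlib

# Hypothetical server addresses; the claim only requires that each
# partition be maintained by a different server.
SERVERS = ["lm-server-0", "lm-server-1", "lm-server-2", "lm-server-3"]

def partition_for(ngram):
    """Map an n-gram (a tuple of tokens) to the server holding its partition.

    The claim does not fix a partitioning scheme; a stable hash of the
    n-gram's tokens is one common choice, assumed here for illustration.
    """
    key = " ".join(ngram).encode("utf-8")
    digest = hashlib.sha1(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(SERVERS)
    return SERVERS[index]

def ngrams(tokens, n):
    """All order-n n-grams occurring in a candidate translation."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Group the lookups for one candidate translation by destination server,
# so each server receives one batched lookup request.
candidate = "the cat sat on the mat".split()
for ng in ngrams(candidate, 3):
    print(partition_for(ng), ng)
```

Because the hash is stable, every client routes the same n-gram to the same partition without any central directory.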
4 Assignments
0 Petitions
Abstract
Systems, methods, and apparatus for accessing distributed models in automated machine processing, including the use of large language models in machine translation, speech recognition, and other applications.
92 Citations
20 Claims
1. A method for translating a text, comprising:
receiving the text in a source language;
partitioning the text into a plurality of segments;
obtaining, for each segment, one or more candidate translations in a target language;
for each of a plurality of possible n-grams in each candidate translation:
identifying a respective partition of a language model containing the n-gram, wherein each partition includes a subset of all n-grams in the target language and statistical data for the same subset of n-grams, each n-gram being a sequence of n tokens in the target language, wherein n is a positive integer, and wherein each partition is maintained by a different server of a plurality of servers;
sending a lookup request to the server maintaining the respective partition containing the n-gram;
obtaining, from the server maintaining the respective partition containing the n-gram, statistical data for the n-gram; and
determining, for each segment of the text, a best candidate translation of the one or more candidate translations based on the obtained statistical data.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
10. A system comprising:
a plurality of servers, wherein each server is configured to store a partition of a language model of a target language, wherein each respective partition of the language model includes a subset of all n-grams in the target language and statistical data for the same subset of n-grams, each n-gram being a sequence of n tokens in the target language, and wherein n is a positive integer; and
one or more processors configured to perform operations comprising:
receiving a text in a source language;
partitioning the text into a plurality of segments;
obtaining, for each segment, one or more candidate translations in the target language;
for each of a plurality of possible n-grams in each candidate translation:
identifying the respective partition of the language model containing the n-gram;
sending a lookup request to the server maintaining the respective partition containing the n-gram;
obtaining, from the server maintaining the respective partition containing the n-gram, statistical data for the n-gram; and
determining, for each segment of the text, a best candidate translation of the one or more candidate translations based on the obtained statistical data.
- View Dependent Claims (11, 12, 13, 14, 15, 16, 17)
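Claim 10 makes each server responsible for one partition: a subset of the target-language n-grams plus their statistical data. A minimal sketch of one server's role, assuming the "statistical data" are counts and log-probabilities (the claim does not specify their form) and that lookup requests may be batched:

```python
class LanguageModelPartition:
    """One server's partition of the distributed language model.

    Holds a subset of all target-language n-grams together with their
    statistical data. The count/log-probability fields are hypothetical;
    the claim says only "statistical data".
    """

    def __init__(self, stats):
        # {ngram tuple: {"count": ..., "logprob": ...}}
        self._stats = dict(stats)

    def lookup(self, ngram):
        """Answer a lookup request for a single n-gram.

        Returns None when the n-gram is not in this partition, so the
        client can fall back (e.g., to a lower-order estimate).
        """
        return self._stats.get(ngram)

    def lookup_batch(self, ngrams):
        """Serve many n-grams per request to amortize network round trips."""
        return {ng: self._stats.get(ng) for ng in ngrams}

part = LanguageModelPartition({
    ("the", "cat", "sat"): {"count": 912, "logprob": -2.31},
    ("cat", "sat", "on"): {"count": 788, "logprob": -2.47},
})
print(part.lookup(("the", "cat", "sat")))
```

Splitting the model this way lets the aggregate n-gram table exceed any single machine's memory, which is the point of distributing the model across servers.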
18. One or more computer storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising:
receiving a text in a source language;
partitioning the text into a plurality of segments;
obtaining, for each segment, one or more candidate translations in a target language;
for each of a plurality of possible n-grams in each candidate translation:
identifying a respective partition of the language model containing the n-gram, wherein each respective partition of the language model includes a subset of all n-grams in the target language and statistical data for the same subset of n-grams, each n-gram being a sequence of n tokens in the target language, wherein n is a positive integer, and wherein each partition is maintained by a different server of a plurality of servers;
sending a lookup request to the server maintaining the respective partition containing the n-gram;
obtaining, from the server maintaining the respective partition containing the n-gram, statistical data for the n-gram; and
determining, for each segment of the text, a best candidate translation of the one or more candidate translations based on the obtained statistical data.
- View Dependent Claims (19, 20)
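The final "determining" step in each independent claim can be sketched as choosing, per segment, the candidate translation whose n-grams score highest under the looked-up statistics. The flat penalty for unseen n-grams is a hypothetical stand-in, since the claims leave smoothing and backoff unspecified:

```python
def score(candidate_ngrams, stats, unseen_logprob=-10.0):
    """Score a candidate as the sum of its n-grams' log-probabilities.

    Unseen n-grams get a hypothetical flat penalty; a real decoder would
    use a smoothing/backoff scheme, which the claims do not specify.
    """
    return sum(stats.get(ng, {"logprob": unseen_logprob})["logprob"]
               for ng in candidate_ngrams)

def best_candidate(candidates, stats):
    """Pick, for one segment, the highest-scoring candidate translation."""
    return max(candidates, key=lambda c: score(c["ngrams"], stats))

# Toy example: bigram statistics gathered from the partition servers.
stats = {
    ("le", "chat"): {"logprob": -1.2},
    ("chat", "noir"): {"logprob": -1.5},
    ("le", "felin"): {"logprob": -6.0},
}
candidates = [
    {"text": "le chat noir", "ngrams": [("le", "chat"), ("chat", "noir")]},
    {"text": "le felin noir", "ngrams": [("le", "felin"), ("felin", "noir")]},
]
print(best_candidate(candidates, stats)["text"])  # → le chat noir
```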
Specification