Universal translation

US 10,346,537 B2
Filed: 08/09/2017
Issued: 07/09/2019
Est. Priority Date: 09/22/2015
Status: Active Grant

Abstract

A likely source language of a media item can be identified by attempting an initial language identification of the media item based on intrinsic or extrinsic factors, such as words in the media item and languages known by the media item author. This initial identification can generate a list of most likely source languages with corresponding likelihood factors. Translations can then be performed presuming each of the most likely source languages. The translations can be performed for multiple output languages. Each resulting translation can receive a corresponding score based on a number of factors. The scores can be combined where they have a common source language. These combined scores can be used to weight the previously identified likelihood factors for the source languages of the media item.
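
The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `translate` and `score` callables, the use of a mean to combine scores, and the use of multiplication to weight the likelihood factors are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical signatures; the abstract does not name a concrete API.
Translator = Callable[[str, str, str], str]  # (snippet, source_lang, output_lang) -> translation
Scorer = Callable[[str, str], float]         # (snippet, translation) -> score in [0, 1]


def weight_likelihoods(
    snippet: str,
    likelihoods: Dict[str, float],
    output_langs: List[str],
    translate: Translator,
    score: Scorer,
) -> Dict[str, float]:
    """Return a weighted likelihood factor for each candidate source language."""
    if not output_langs:
        raise ValueError("at least one output language is required")
    per_source: Dict[str, List[float]] = defaultdict(list)
    # Translate the snippet presuming each candidate source language,
    # once per output language, and score every resulting translation.
    for source in likelihoods:
        for target in output_langs:
            translation = translate(snippet, source, target)
            per_source[source].append(score(snippet, translation))
    weighted: Dict[str, float] = {}
    for source, prior in likelihoods.items():
        # Combine scores that share a source language (mean is an assumption),
        # then weight the initial likelihood factor (product is an assumption).
        combined = sum(per_source[source]) / len(per_source[source])
        weighted[source] = prior * combined
    return weighted


# Toy stand-ins so the sketch runs; a real system would call actual engines.
def toy_translate(snippet: str, source: str, target: str) -> str:
    return f"{source}->{target}:{snippet}"


def toy_score(snippet: str, translation: str) -> float:
    return 0.9 if translation.startswith("es->") else 0.3


factors = weight_likelihoods(
    "hola mundo", {"es": 0.5, "pt": 0.5}, ["en", "fr"], toy_translate, toy_score
)
best = max(factors, key=factors.get)
```

With equal initial likelihood factors, the candidate whose translations score higher ends up with the larger weighted factor, which is the effect the abstract describes.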

20 Claims

1. A method for identifying a most likely source language of a snippet, the method comprising:
- receiving an indication of the snippet, wherein the snippet is a digital representation of words or character groups;
  
  determining two or more possible source languages for the snippet;
  
  generating, by one or more machine translation engines, two or more translations of the snippet, each translation of the snippet corresponding to one source language in the two or more possible source languages;
  
  computing, by one or more translation scoring models trained using one or more neural networks, accuracy scores for at least two of the generated two or more translations of the snippet;
  
  based on one or more of the computed accuracy scores, producing a confidence factor for each of at least two selected possible source languages, of the two or more possible source languages, for the snippet; and
  
  selecting, as the most likely source language, the possible source language for the snippet that is associated with a highest confidence factor.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The method of claim 1, wherein the determining the two or more possible source languages for the snippet comprises calculating a source language likelihood score for each of the two or more possible source languages.
  - 3. The method of claim 2, wherein producing at least one of the confidence factors for a particular language of the at least two selected possible source languages is further based on the source language likelihood score for the particular language.
  - 4. The method of claim 1, wherein the at least one of the two or more translations of the snippet comprises multiple translations, each with a common specified translation source language;
    - wherein computing each accuracy score includes computing a combined accuracy score for the corresponding multiple translations that have a common specified translation source language;
      
      wherein computing each combined accuracy score is performed by combining individual accuracy scores corresponding to each of the multiple translations that have a common specified translation source language; and
      
      wherein producing the confidence factor for the common specified translation source language is based on the combined accuracy score for the multiple translations each with that common specified translation source language.
  - 5. The method of claim 4 further comprising:
    - performing an initial source language identification for the snippet;
      
      wherein the initial source language identification for the snippet identifies one or more of the possible source languages each with a corresponding initial confidence value;
      
      wherein each initial confidence value indicates, for a corresponding possible source language, a confidence that the corresponding possible source language is a language of the snippet; and
      
      wherein producing the confidence factor for at least a selected one of the possible source languages comprises updating the initial confidence value for the selected one of the possible source languages using the combined accuracy score corresponding to the selected one of the possible source languages.
  - 6. The method of claim 1 further comprising:
    - performing an initial source language identification for the snippet;
      
      wherein the initial source language identification for the snippet identifies one or more of the possible source languages each with a corresponding initial confidence value; and
      
      wherein each initial confidence value indicates, for a corresponding possible source language, a confidence that the corresponding possible source language is a language of the snippet.
  - 7. The method of claim 6, wherein performing the initial source language identification for the snippet comprises an analysis of a context of the snippet.
  - 8. The method of claim 7, wherein the analysis of the context of the snippet uses one or more of:
    - languages that an author of the snippet is known to be facile with;
      
      languages associated with users identified as friends of the author of the snippet;
      
      when the snippet was created;
      
      information on a virtual location where the snippet was posted;
      
      or any combination thereof.
  - 9. The method of claim 1 further comprising:
    - receiving an indication of a viewing user of the snippet; and
      
      determining an output language associated with the viewing user of the snippet;
      
      wherein the generated two or more translations of the snippet are each in an output language matching the output language associated with the viewing user of the snippet.
  - 10. The method of claim 1, wherein computing the accuracy scores is performed by a translation scoring model that is trained, to generate translation scores, with training data comprising data points each including an input snippet, an output snippet, and a score.
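
The context analysis in claims 6 to 8 can be illustrated with a small sketch. Everything here is an assumption made for the example: the function name, the choice of only two of the listed signals (the author's languages and the languages of the author's friends), the vote weights, and the normalization into initial confidence values. The claims do not specify how the signals are combined.

```python
from collections import defaultdict
from typing import Dict, Iterable


def initial_confidences(
    author_langs: Iterable[str],
    friend_langs: Iterable[str],
    author_weight: float = 2.0,   # hypothetical weight for the author's own languages
    friend_weight: float = 1.0,   # hypothetical weight per friend language
) -> Dict[str, float]:
    """Turn two context signals into normalized initial confidence values."""
    votes: Dict[str, float] = defaultdict(float)
    for lang in author_langs:
        votes[lang] += author_weight
    for lang in friend_langs:
        votes[lang] += friend_weight
    total = sum(votes.values())
    if total == 0:
        return {}  # no context available, so no initial identification
    return {lang: weight / total for lang, weight in votes.items()}


# The author is known to use Spanish and English; three friends are associated
# with Spanish, Spanish, and Portuguese (toy data).
conf = initial_confidences(["es", "en"], ["es", "es", "pt"])
```

Each value in `conf` plays the role of an initial confidence value for one possible source language, which later steps may update.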

11. A non-transitory computer-readable storage medium storing instructions that, when executed by a computing system, cause the computing system to perform operations for identifying a most likely source language of a snippet, the operations comprising:
- receiving an indication of the snippet, wherein the snippet is a digital representation of words or character groups;
  
  determining two or more possible source languages for the snippet;
  
  generating, by one or more machine translation engines, two or more translations of the snippet, each translation of the snippet corresponding to one source language in the two or more possible source languages;
  
  computing, by one or more trained translation scoring models, accuracy scores for at least two of the generated two or more translations of the snippet;
  
  based on one or more of the computed accuracy scores, producing a confidence factor for each of at least two selected possible source languages, of the two or more possible source languages, for the snippet; and
  
  selecting, based on the confidence factor, one of the possible source languages for the snippet as the most likely source language.
- Dependent claims (12, 13, 14, 15)
  - 12. The computer-readable storage medium of claim 11, wherein the at least one of the two or more translations of the snippet comprises multiple translations, each with a common specified translation source language;
    - wherein computing each accuracy score includes computing a combined accuracy score for the corresponding multiple translations that have a common specified translation source language;
      
      wherein computing each combined accuracy score is performed by combining individual accuracy scores corresponding to each of the multiple translations that have a common specified translation source language; and
      
      wherein producing the confidence factor for the common specified translation source language is based on the combined accuracy score for the multiple translations each with that common specified translation source language.
  - 13. The computer-readable storage medium of claim 11, wherein the operations further comprise:
    - performing an initial source language identification for the snippet;
      
      wherein the initial source language identification for the snippet identifies one or more of the possible source languages each with a corresponding initial confidence value; and
      
      wherein each initial confidence value indicates, for a corresponding possible source language, a confidence that the corresponding possible source language is a language of the snippet.
  - 14. The computer-readable storage medium of claim 11, wherein the operations further comprise:
    - receiving an indication of a viewing user of the snippet; and
      
      determining an output language associated with the viewing user of the snippet;
      
      wherein the generated two or more translations of the snippet are each in an output language matching the output language associated with the viewing user of the snippet.
  - 15. The computer-readable storage medium of claim 11, wherein computing the accuracy scores is performed by a translation scoring model that is trained, to generate translation scores, with training data comprising data points each including an input snippet, an output snippet, and a score.

16. A system for identifying a most likely source language of a snippet, the system comprising:
- one or more processors;
  
  an interface configured to receive an indication of the snippet, wherein the snippet is a digital representation of words or character groups; and
  
  a memory storing instructions that, when executed by the one or more processors, cause the system to perform operations comprising:
  
  determining two or more possible source languages for the snippet;
  
  generating, by one or more machine translation engines, two or more translations of the snippet, each translation of the snippet corresponding to one source language in the two or more possible source languages;
  
  computing, by one or more trained translation scoring models, accuracy scores for at least two of the generated two or more translations of the snippet;
  
  based on one or more of the computed accuracy scores, producing a confidence factor for each of at least two selected possible source languages, of the two or more possible source languages, for the snippet; and
  
  selecting, based on the confidence factor, one of the possible source languages for the snippet as the most likely source language.
- Dependent claims (17, 18, 19, 20)
  - 17. The system of claim 16, wherein the at least one of the two or more translations of the snippet comprises multiple translations, each with a common specified translation source language;
    - wherein computing each accuracy score includes computing a combined accuracy score for the corresponding multiple translations that have a common specified translation source language;
      
      wherein computing each combined accuracy score is performed by combining individual accuracy scores corresponding to each of the multiple translations that have a common specified translation source language; and
      
      wherein producing the confidence factor for the common specified translation source language is based on the combined accuracy score for the multiple translations each with that common specified translation source language.
  - 18. The system of claim 16, wherein the operations further comprise:
    - performing an initial source language identification for the snippet;
      
      wherein the initial source language identification for the snippet identifies one or more of the possible source languages each with a corresponding initial confidence value; and
      
      wherein each initial confidence value indicates, for a corresponding possible source language, a confidence that the corresponding possible source language is a language of the snippet.
  - 19. The system of claim 16, wherein the operations further comprise:
    - receiving an indication of a viewing user of the snippet; and
      
      determining an output language associated with the viewing user of the snippet;
      
      wherein the generated two or more translations of the snippet are each in an output language matching the output language associated with the viewing user of the snippet.
  - 20. The system of claim 16, wherein computing the accuracy scores is performed by a translation scoring model that is trained, to generate translation scores, with training data comprising data points each including an input snippet, an output snippet, and a score.
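
Claims 10, 15, and 20 describe training data whose data points each hold an input snippet, an output snippet, and a score. The sketch below shows one possible shape for such a data point. The `length_ratio` feature and the one-feature least-squares fit are stand-ins chosen for brevity and are not the neural-network training named in claim 1; the sample data is invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DataPoint:
    input_snippet: str
    output_snippet: str
    score: float


def length_ratio(input_snippet: str, output_snippet: str) -> float:
    # Hypothetical feature: output length relative to input length.
    return len(output_snippet) / max(len(input_snippet), 1)


def fit_linear(points: List[DataPoint]) -> Tuple[float, float]:
    """Least-squares fit of score = slope * length_ratio + intercept."""
    if not points:
        raise ValueError("training data must not be empty")
    xs = [length_ratio(p.input_snippet, p.output_snippet) for p in points]
    ys = [p.score for p in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0, mean_y  # feature is constant, so predict the mean score
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return slope, mean_y - slope * mean_x


def predict(model: Tuple[float, float], input_snippet: str, output_snippet: str) -> float:
    slope, intercept = model
    return slope * length_ratio(input_snippet, output_snippet) + intercept


# Invented training data: truncated outputs receive low scores.
training = [
    DataPoint("hola mundo", "hello world", 0.9),
    DataPoint("hola mundo", "hi", 0.2),
    DataPoint("buenos dias", "good morning", 0.8),
    DataPoint("buenos dias", "good", 0.3),
]
model = fit_linear(training)
```

A trained model of this kind would then be called on each candidate translation to produce the accuracy scores used in the independent claims.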

Current Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Original Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Inventors: Huang, Fei
Primary Examiner(s): Shin, Seong-Ah A
Application Number: US15/672,690
Publication Number: US 20180113851A1
Time in Patent Office: 699 Days
Field of Search: 704/3, 704/4, 704/9
CPC Class Codes

G06F 40/00   Handling natural language d...

G06F 40/10   Text processing natural lan...

G06F 40/117   Tagging; Marking up details...

G06F 40/166   Editing, e.g. inserting or ...

G06F 40/20   Natural language analysis s...

G06F 40/263   Language identification

G06F 40/40   Processing or translation o...

G06F 40/58   Use of machine translation,...
