Machine translation output reranking

US 10,067,936 B2
Filed: 12/30/2014
Issued: 09/04/2018
Est. Priority Date: 12/30/2014
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Technology is disclosed to select a preferred machine translation from multiple machine translations of a content item, each machine translation from the multiple machine translations created for the same target language. Each machine translation is assigned a score based on feedback from a user group that receives the machine translation. The machine translation with the highest score is identified as the preferred machine translation, and is provided in response to subsequent requests for translations of the content item. If there is no preferred translation, the several top scoring machine translations are provided to a larger group of users for further scoring. This process may be repeated until either a clearly preferred translation is identified, a maximum number of iterations is reached, or a maximum number of scoring users is reached.
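
The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: `score_group` is a hypothetical callback standing in for "show this translation to a group of this size and return its score", and the parameter names and defaults (`margin`, `keep_top`, `initial_group_size`, `max_iterations`, `max_users`) are assumptions chosen for the example.

```python
from typing import Callable, Dict, List


def rerank(
    translations: List[str],
    score_group: Callable[[str, int], float],  # hypothetical scoring callback
    margin: float = 0.5,           # assumed lead that counts as "clearly preferred"
    keep_top: int = 2,             # assumed number of candidates carried forward
    initial_group_size: int = 10,
    max_iterations: int = 5,
    max_users: int = 1000,
) -> str:
    """Return the preferred translation, or the best-scoring one at cutoff."""
    if not translations:
        raise ValueError("need at least one translation")
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")

    candidates = list(translations)
    group_size = initial_group_size
    users_asked = 0
    scores: Dict[str, float] = {}

    for _ in range(max_iterations):
        # Each candidate is scored by its own group of `group_size` users.
        scores = {t: score_group(t, group_size) for t in candidates}
        users_asked += group_size * len(candidates)
        ranked = sorted(candidates, key=lambda t: scores[t], reverse=True)

        # A clear winner: a single candidate, or a lead of at least `margin`.
        if len(ranked) == 1 or scores[ranked[0]] - scores[ranked[1]] >= margin:
            return ranked[0]
        if users_asked >= max_users:
            break

        # No clear winner: keep the top scorers and ask a larger group.
        candidates = ranked[:keep_top]
        group_size *= 2

    # Iteration or user cap reached: fall back to the highest score seen.
    return max(candidates, key=lambda t: scores[t])
```

Doubling the group size is one possible growth policy; the abstract only requires that later rounds use a larger group.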

17 Claims

1. A method performed by a computing system for selecting a preferred machine translation of a content item, comprising:
- classifying the content item based on one or more categories, the categories including at least the topic of the content item;
  
  classifying a plurality of users based on one or more categories, the one or more categories including at least an interest in one or more topics;
  
  generating multiple computer-generated translations of the content item in a specified target language using configurable parameters, the multiple translations forming a set of translations;
  
  performing one or more iterations, each iteration comprising:
  
  selecting multiple groups of users based on a mapping between the one or more content item categories and the one or more user categories;
  
  submitting each translation in the set to one group of users;
  
  receiving, from each user in the group, a translation score for the reviewed translation;
  
  weighting the translation score received from each user by a user-importance factor calculated as a function of a deviation of the user's scores from average scores for previous translations;
  
  computing an aggregate score for each reviewed translation based on the weighted scores from all users in the reviewing group;
  
  determining if the iteration has produced a preferred translation, the preferred translation being a translation from the set having an aggregate score above a predetermined threshold, or a translation having an aggregate score higher than the aggregate score for all other translations in the set by a predetermined threshold; and
  
  repeating the iteration if no preferred translation has been produced; and
  
  providing the preferred machine translation in response to a subsequent request for a translation of the content item.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 16)
- - 2. The method of claim 1, wherein:
    - the method further comprises updating the configurable parameters based on at least one of the aggregate scores computed during the aggregate updating iterations.
  - 3. The method of claim 1, wherein the interest of the users in a topic is determined based on interactions with a social media web site.
  - 4. The method of claim 3, wherein the classification assigned to each user is further based on the user's location, or a theme.
  - 5. The method of claim 4, wherein the classification assigned to the content item is further based on a content item topic, a content item location, a content item theme, or a content item source.
  - 6. The method of claim 5, wherein:
    - the number of users in each group of users is increased in each subsequent iteration.
  - 7. The method of claim 1, wherein the user-importance factor is based on a ratio of an average review supplied by the particular user for multiple translations and an average review supplied by all users for multiple translations.
  - 8. The method of claim 6, wherein, in each subsequent iteration, the groups of users are at least twice as large as the groups of users from the previous iteration.
  - 16. The method of claim 1, wherein the set of translations, in subsequent iterations, includes only a subset of the translations from the previous iteration, the subset including:
    - a fixed number or percentage of the previous set of translations having the highest computed aggregate score;
      
      or translations, from the previous set of translations, whose computed aggregate scores are both higher than the computed aggregate scores for non-selected translations and that do not differ from each other by a value greater than a threshold.
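
The weighting, aggregation, and preferred-translation test recited in claim 1 might be sketched as below. The claims say only that the user-importance factor is "a function of a deviation" of the user's scores from average scores; the `1 / (1 + deviation)` form used here is an assumption made for illustration, as are all function and parameter names.

```python
from statistics import mean
from typing import Dict, List, Optional


def importance_factor(user_history: List[float], global_mean: float) -> float:
    """Assumed weighting: raters far from the all-user average count less."""
    if not user_history:
        return 1.0  # no history: neutral weight (an assumption)
    deviation = abs(mean(user_history) - global_mean)
    return 1.0 / (1.0 + deviation)


def aggregate_score(
    scores: Dict[str, float],          # user id -> score for this translation
    history: Dict[str, List[float]],   # user id -> that user's past scores
    global_mean: float,                # average score over all users, past items
) -> float:
    """Weighted mean of the group's scores for one translation."""
    if not scores:
        raise ValueError("no scores received")
    weights = {u: importance_factor(history.get(u, []), global_mean) for u in scores}
    return sum(weights[u] * s for u, s in scores.items()) / sum(weights.values())


def preferred(
    aggregates: Dict[str, float],  # translation -> aggregate score
    absolute: float,               # threshold on the score itself
    margin: float,                 # required lead over every other translation
) -> Optional[str]:
    """Return the preferred translation, or None if another round is needed."""
    ranked = sorted(aggregates, key=lambda t: aggregates[t], reverse=True)
    best = ranked[0]
    if aggregates[best] > absolute:
        return best
    if len(ranked) > 1 and aggregates[best] - aggregates[ranked[1]] > margin:
        return best
    return None
```

Claim 7 describes a ratio of the user's average review to the all-user average instead; swapping that in would only change `importance_factor`.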

9. A computer readable memory storing instructions configured to, when executed by a computing system, cause the computing system to perform operations for identifying a preferred machine translation of a content item, the operations comprising:
- classifying the content item based on one or more categories, the categories including at least the topic of the content item;
  
  classifying a plurality of users based on one or more categories, the one or more categories including at least an interest in one or more topics;
  
  generating multiple computer-generated translations of the content item in a specified target language using configurable parameters, the multiple translations forming a set of translations;
  
  performing one or more iterations, each iteration comprising:
  
  selecting multiple groups of users based on a mapping between the one or more content item categories and user categories;
  
  submitting each translation in the set to one group of users;
  
  receiving, from each user in the group, a translation score for the reviewed translation;
  
  weighting the translation score received from each user by a user-importance factor calculated as a function of a deviation of the user's scores from average scores for previous translations;
  
  computing an aggregate score for each reviewed translation based on the weighted scores from all users in the reviewing group;
  
  determining if the iteration has produced a preferred translation, the preferred translation being a translation from the set having an aggregate score above a predetermined threshold, or a translation having an aggregate score higher than the aggregate score for all other translations in the group by a predetermined threshold; and
  
  repeating the iteration if no preferred translation has been produced; and
  
  providing the preferred machine translation in response to a subsequent request for a translation of the content item.
- View Dependent Claims (10, 11, 17)
- - 10. The computer readable memory of claim 9, wherein the aggregate score is further based on a user-supplied review for each user providing a translation score.
  - 11. The computer readable memory of claim 9, wherein the user-importance factor is based on a ratio of an average review supplied by the selected user for multiple translations and an average review supplied by all users for multiple translations.
  - 17. The computer readable memory of claim 9, wherein the set of translations, in subsequent iterations, includes only a subset of the translations of the previous set of translations, including:
    - a fixed number or percentage of the previous set of translations having the highest computed aggregate scores;
      
      or translations, from the previous set of translations, whose computed aggregate scores are both higher than the computed aggregate scores for non-selected translations and that do not differ from each other by a value greater than a threshold.
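
The two ways of narrowing the candidate set between iterations (a fixed number or percentage of top scorers, or the leading translations whose scores sit within a threshold of each other) could look like this. The function names `top_k` and `leading_cluster` are hypothetical, and rounding a percentage up to at least one candidate is an assumption the claims do not specify.

```python
import math
from typing import Dict, List, Optional


def top_k(
    aggregates: Dict[str, float],
    k: Optional[int] = None,
    fraction: Optional[float] = None,
) -> List[str]:
    """Keep a fixed number, or a fixed fraction, of the highest scorers."""
    if (k is None) == (fraction is None):
        raise ValueError("pass exactly one of k or fraction")
    ranked = sorted(aggregates, key=lambda t: aggregates[t], reverse=True)
    if k is None:
        # Assumed rounding rule: round up and always keep at least one.
        k = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:k]


def leading_cluster(aggregates: Dict[str, float], threshold: float) -> List[str]:
    """Keep the top scorers whose scores differ by at most `threshold`.

    Every kept translation outscores every dropped one, and because all kept
    scores lie within `threshold` of the best, no two of them differ by more.
    """
    ranked = sorted(aggregates, key=lambda t: aggregates[t], reverse=True)
    best = aggregates[ranked[0]]
    return [t for t in ranked if best - aggregates[t] <= threshold]
```

The second rule adapts to the data: a tight race keeps more candidates alive, while a runaway leader leaves only one.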

12. A system for identifying a preferred machine translation of a content item, comprising:
- a content item classification engine configured to classify the content item based on one or more categories, the categories including at least the topic of the content item;
  
  a user group defining engine configured to classify a plurality of users based on one or more categories, the one or more categories including at least an interest in one or more topics;
  
  a machine translation generation engine configured to generate multiple computer-generated translations of the content item in a specified target language, the multiple translations using configurable parameters and forming a set of translations; and
  
  a scoring engine configured to:
  
  perform one or more iterations, each iteration comprising:
  
  selecting multiple groups of users based on a mapping between the one or more content item categories and the one or more user categories;
  
  submitting each translation in the set to one group of users;
  
  receiving, from each user in the group, a translation score for the reviewed translation;
  
  weighting the translation score received from each user by a user-importance factor calculated as a function of a deviation of the user's scores from average scores for previous translations;
  
  computing an aggregate score for each reviewed translation, based on the weighted scores from all users in the reviewing group;
  
  determining if the iteration has produced a preferred translation, the preferred translation being a translation from the set having an aggregate score above a predetermined threshold, or a translation having an aggregate score higher than the aggregate score for all other translations in the group by a predetermined threshold; and
  
  repeating the iteration if no preferred translation has been produced; and
  
  providing the preferred machine translation in response to a subsequent request for a translation of the content item.
- View Dependent Claims (13, 14, 15)
- - 13. The system of claim 12, wherein the aggregate score is further based on a user-supplied review for each user providing a translation score.
  - 14. The system of claim 12, wherein the iterations are terminated after a predetermined number of iterations or until a predetermined number of users has reviewed each translation in the set, and further wherein the preferred translation is the translation having the highest aggregate score.
  - 15. The system of claim 12, wherein the set of translations, in subsequent iterations, includes only a subset of the translations from the previous iteration, the subset including:
    - a fixed number or percentage of the previous set of translations that have the highest computed aggregate scores;
      
      or translations, from the previous set of translations having aggregate scores that are higher than the computed aggregate scores for non-selected computer-generated translations and that do not differ from each other by a value greater than a threshold.
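
The group-selection step shared by the independent claims (choosing reviewer groups from a mapping between content item categories and user categories, then giving each translation to one group) might be sketched as follows. The mapping used here is the simplest possible one, a shared topic label, which is an assumption; `select_groups` and its parameters are hypothetical names.

```python
from typing import Dict, List, Set


def select_groups(
    content_topics: Set[str],
    user_interests: Dict[str, Set[str]],  # user id -> topics of interest
    translations: List[str],
    group_size: int,
) -> Dict[str, List[str]]:
    """Assign each translation its own disjoint group of matching users."""
    # A user is eligible if at least one interest matches a content topic.
    eligible = sorted(
        user for user, interests in user_interests.items()
        if interests & content_topics
    )
    needed = group_size * len(translations)
    if len(eligible) < needed:
        raise ValueError(f"need {needed} matching users, found {len(eligible)}")
    # Slice the eligible users into consecutive, non-overlapping groups.
    return {
        t: eligible[i * group_size:(i + 1) * group_size]
        for i, t in enumerate(translations)
    }
```

Sorting the user ids keeps the example deterministic; a real system would more plausibly sample reviewers at random.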

Current Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Original Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Inventors: Huang, Fei
Primary Examiner(s): Mishra, Richa
Application Number: US14/586,022
Publication Number: US 20160188576A1
Time in Patent Office: 1,344 Days
Field of Search: None
US Class Current
CPC Class Codes

G06F 40/51 Translation evaluation

G06F 40/58 Use of machine translation,...
