Translation confidence scores

US 10,133,738 B2
Filed: 12/14/2015
Issued: 11/20/2018
Est. Priority Date: 12/14/2015
Status: Active Grant

Abstract

A confidence scoring system can include a model trained using features extracted from translations that have received user translation ratings. The features can include, e.g., sentence length, an amount of out-of-vocabulary or rare words, language model probability scores of the source or translation, or a semantic similarity between the source and a translation. Parameters of the confidence model can then be adjusted based on a comparison of the confidence model output and user translation ratings, where the user translation ratings can be selected or weighted based on a determination of individual user fluency. After the confidence model has been trained, it can produce confidence scores for new translations. If a confidence score is higher than a threshold, it can indicate the translation should be selected for automatic presentation to users. If the confidence score is below another threshold, it can indicate the translation should be updated.
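
The two-threshold behavior described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the threshold values, the `Action` enum, and the `route_translation` function are hypothetical names and numbers chosen for the example, and confidence scores are assumed to lie in [0, 1].

```python
from enum import Enum


class Action(Enum):
    AUTO_PRESENT = "auto_present"      # show the translation automatically
    REQUEST_UPDATE = "request_update"  # ask for an updated translation
    NO_ACTION = "no_action"            # neither threshold was crossed


# Hypothetical values; the source does not give concrete thresholds.
AUTO_TRANSLATE_THRESHOLD = 0.8
UPDATE_THRESHOLD = 0.4


def route_translation(confidence,
                      auto_threshold=AUTO_TRANSLATE_THRESHOLD,
                      update_threshold=UPDATE_THRESHOLD):
    """Map a confidence score to one of the outcomes named in the abstract."""
    if confidence > auto_threshold:
        return Action.AUTO_PRESENT
    if confidence < update_threshold:
        return Action.REQUEST_UPDATE
    return Action.NO_ACTION
```

Scores between the two thresholds fall through to `NO_ACTION` here; what happens in that band is a design choice the abstract leaves open.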


20 Claims

1. A method for training and applying a confidence scoring model, comprising:
- receiving multiple training items, wherein a training item comprises:
  
  a source content item, a translation of the source content item, and one or more user scores for the translation of the source content item;
  
  training a confidence scoring model by, for a selected training item of the multiple training items:
  
  extracting features of the selected training item;
  
  combining the extracted features of the selected training item into an input for the confidence scoring model to produce an intermediate confidence score, wherein the intermediate confidence score is computed based on parameters or weights of the confidence scoring model;
  
  comparing the intermediate confidence score to the one or more user scores for the translation of the source content item of the selected training item; and
  
  based on the comparison of the intermediate confidence score to the one or more user scores, modifying one or more of the parameters or weights of the confidence scoring model, wherein the modification of the parameters or weights of the confidence scoring model adjusts the confidence scoring model in favor of the one or more user scores;
  
  computing a confidence score for a given translation generated by a first machine translation system applying first translation logic using the trained confidence scoring model;
  
  determining that the confidence score is below a threshold; and
  
  in response to determining that the confidence score is below the threshold:
  
  submitting a request for an updated version of the translation, the request comprising one or more of a request for a translation by a second machine translation system different from the first translation system, a request for a translation by second translation logic different from the first translation logic, or a request for a translation by a translator user;
  
  receiving the updated version of the translation in response to the request; and
  
  providing the updated version of the translation to a receiving user.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9)
- - 2. The method of claim 1 wherein extracting features of the selected training item comprises computing one or more of:
    - a length of the source content item;
      
      a length of the translation;
      
      an amount of words in the source content item that appear below a threshold amount in a language corpus corresponding to the source content item;
      
      an amount of words in the translation that appear below a threshold amount in a language corpus corresponding to the translation;
      
      or any combination thereof.
  - 3. The method of claim 1 wherein extracting features of the selected training item comprises:
    - an amount of words in the source content item that are not in a language corpus corresponding to the source content item or a dictionary corresponding to the source content item;
      
      an amount of words in the translation that are not in a language corpus corresponding to the translation or a dictionary corresponding to the translation;
      
      a complexity measure of phrases in the source content item;
      
      a complexity measure of phrases in the translation;
      
      or any combination thereof.
  - 4. The method of claim 1 wherein extracting features of the selected training item comprises:
    - a likelihood of phrases from the source content item occurring based on a language corpus corresponding to the source content item;
      
      a likelihood of phrases from the translation occurring based on a language corpus corresponding to the translation;
      
      a similarity measure of phrases in the source content item to training data used to train a machine translation engine that created the translation;
      
      or any combination thereof.
  - 5. The method of claim 1 wherein the multiple training items each further comprise one or more of:
    - an identification of an author of the source content item;
      
      information identifying training items or a training language corpus used to create a machine translation system that created the translation of the source content item;
      
      or any combination thereof.
  - 6. The method of claim 1 wherein comparing the intermediate confidence score to the one or more user scores comprises:
    - obtaining user fluency scores for at least some of the users who provided the one or more user scores for the translation, each fluency score providing a rating for a language that the translation is in for one of the users who provided the one or more user scores; and
      
      weighting the one or more user scores based on the fluency scores such that user scores provided by users with higher fluency scores are given greater weight than user scores provided by users with comparatively lower fluency scores.
  - 7. The method of claim 1 wherein the confidence scoring model is a neural network.
  - 8. The method of claim 1 wherein combining the extracted features of the selected training item into the input for the confidence scoring model comprises one or more of:
    - setting values corresponding to the extracted features in a sparse vector;
      
      or generating an embedding of the extracted features in a vector space with fewer dimensions than the number of extracted features.
  - 9. The method of claim 1 wherein the one or more user scores were previously received through a translation scoring interface of a social media website.
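
The training loop of claim 1, with the feature types of claim 2 and the fluency weighting of claim 6, might look like the sketch below. It assumes a simple logistic model updated by one gradient step per training item, which the claims do not require (claim 7 names a neural network as one option). All function names, the feature scaling, the learning rate, and the assumption that user scores are normalized to [0, 1] are illustrative.

```python
import math


def count_rare(tokens, corpus_counts, min_count):
    """Count tokens seen fewer than min_count times in a corpus (unseen words count too)."""
    return sum(1 for t in tokens if corpus_counts.get(t, 0) < min_count)


def extract_features(source, translation, src_counts, tgt_counts, min_count=5):
    """Toy features in the spirit of claim 2: lengths and rare-word counts."""
    src_tokens = source.lower().split()
    tgt_tokens = translation.lower().split()
    return [
        1.0,                                                   # bias term (assumption)
        len(src_tokens) / 10.0,                                # source length
        len(tgt_tokens) / 10.0,                                # translation length
        count_rare(src_tokens, src_counts, min_count) / 10.0,  # rare source words
        count_rare(tgt_tokens, tgt_counts, min_count) / 10.0,  # rare translation words
    ]


def weighted_user_score(scores, fluencies):
    """Fluency-weighted mean: raters with higher fluency count for more (claim 6)."""
    total = sum(fluencies)
    if total == 0:
        return sum(scores) / len(scores)
    return sum(s * f for s, f in zip(scores, fluencies)) / total


def predict(weights, features):
    """Intermediate confidence score computed from the model weights."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train_step(weights, features, target, learning_rate=0.5):
    """Compare the prediction with the user score and nudge the weights toward it."""
    error = target - predict(weights, features)
    return [w + learning_rate * error * x for w, x in zip(weights, features)]
```

Calling `train_step` once per training item, over many items, moves the model's output toward the fluency-weighted user scores, which is the adjustment "in favor of the one or more user scores" that claim 1 recites.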

10. A non-transitory computer-readable storage medium storing instructions that, when executed by a computing system, cause the computing system to perform operations for applying a confidence scoring model, the operations comprising:
- receiving a translation of a source content item;
  
  extracting features of the translation;
  
  combining the extracted features of the translation into an input for the confidence scoring model;
  
  applying the confidence scoring model to the input for the confidence scoring model to produce a confidence score, wherein the confidence score is computed based on parameters or weights of the confidence scoring model;
  
  determining that the confidence score is above an auto-translate threshold; and
  
  in response to determining that the confidence score is above the auto-translate threshold, causing the translation to be automatically displayed in a user interface of a user of a social media website.
- Dependent claims (11, 12, 13, 14, 15)
- - 11. The computer-readable storage medium of claim 10, wherein the operations further comprise dividing the translation into multiple segments;
    - wherein extracting features of the translation comprises extracting a set of features for each of the multiple segments;
      
      wherein combining the extracted features of the translation into an input for the confidence scoring model comprises combining each set of extracted features into a segment input for the confidence scoring model; and
      
      wherein applying the confidence scoring model to the input comprises applying the confidence scoring model for each of the segment inputs.
  - 12. The computer-readable storage medium of claim 11, wherein the operations further comprise using results of applying the confidence scoring model for each of the segment inputs to select a subset of the multiple segments to combine with segments of other translations of the source content item as a preferred translation of the source content item.
  - 13. The computer-readable storage medium of claim 10 wherein the defined circumstances include a circumstance where the source content item appears on a page of the social media website for a user that is identified as not being able to read a language of the source content item but able to read the language of the translation.
  - 14. The computer-readable storage medium of claim 10 wherein the defined circumstances include a circumstance where the source content item appears on a page of the social media website for a user that is identified as typically requesting content items to be translated from a language of the source content item to a language of the translation.
  - 15. The computer-readable storage medium of claim 10 wherein the auto-translate threshold is computed based on a comparison of translation precision scores with an amount of acceptable translations at a given confidence score.
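
The segment-level scoring of claims 11 and 12 can be illustrated with a short sketch. It assumes that every candidate translation splits into the same number of position-aligned segments and that segments are separated by periods; neither assumption comes from the claims. `split_segments`, `combine_best_segments`, and `toy_score` are hypothetical names, and `toy_score` is only a stand-in for a trained confidence scoring model.

```python
def split_segments(text):
    """Naive segmentation on periods (an assumption made for this example)."""
    return [s.strip() for s in text.split(".") if s.strip()]


def combine_best_segments(candidates, score_segment):
    """For each segment position, keep the candidate segment with the highest score."""
    segmented = [split_segments(c) for c in candidates]
    n = len(segmented[0])
    if any(len(segs) != n for segs in segmented):
        raise ValueError("candidates must have the same number of segments")
    return [
        max((segs[i] for segs in segmented), key=score_segment)
        for i in range(n)
    ]


def toy_score(segment):
    """Stand-in scorer: penalize 'UNK' placeholder tokens."""
    return 1.0 / (1 + segment.split().count("UNK"))


best = combine_best_segments(
    ["The cat sat. UNK UNK here.", "UNK cat sat. It rained here."],
    toy_score,
)
print(best)  # ['The cat sat', 'It rained here']
```

The result takes the first segment from one candidate and the second from the other, giving a preferred translation assembled from the highest-scoring pieces.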

16. A system for training and applying a confidence scoring model, comprising:
- a memory;
  
  one or more processors;
  
  an interface configured to receive multiple training items, wherein a training item comprises:
  
  a translation of a source content item and one or more user scores for the translation;
  
  a confidence model trainer configured to train a confidence scoring model by, for a selected training item of the multiple training items:
  
  using a translation feature extractor to extract features of the selected training item;
  
  combining the extracted features of the selected training item into an input for the confidence scoring model to produce an intermediate confidence score, wherein the intermediate confidence score is computed based on parameters or weights of the confidence scoring model;
  
  comparing the intermediate confidence score to the one or more user scores for the translation of the selected training item; and
  
  based on the comparison of the intermediate confidence score to the one or more user scores, modifying one or more of the parameters or weights of the confidence scoring model, wherein the modification of the parameters or weights of the confidence scoring model adjusts the confidence scoring model using the input in favor of the one or more user scores;
  
  one or more confidence models comprising at least the trained confidence scoring model; and
  
  a translation sorter configured to:
  
  receive, from the one or more confidence models, multiple scores each corresponding to one of multiple translations of a content item; and
  
  select, from the multiple translations, the translation with the highest corresponding score to use as a translation of the content item.
- Dependent claims (17, 18, 19, 20)
- - 17. The system of claim 16 wherein extracting features of the selected training item comprises computing one or more of:
    - a length of the source content item;
      
      a length of the translation;
      
      an amount of words in the source content item that appear below a threshold amount in a language corpus corresponding to the source content item;
      
      an amount of words in the translation that appear below a threshold amount in a language corpus corresponding to the translation;
      
      an amount of words in the source content item that are not in the language corpus corresponding to the source content item or a dictionary corresponding to the source content item;
      
      an amount of words in the translation that are not in the language corpus corresponding to the translation or a dictionary corresponding to the translation;
      
      a complexity of phrases in the source content item;
      
      a complexity of phrases in the translation;
      
      a likelihood of phrases from the source content item occurring based on a language corpus corresponding to the source content item;
      
      a likelihood of phrases from the translation occurring based on a language corpus corresponding to the translation;
      
      a similarity measure of phrases in the source content item to training data used to train a machine translation engine that created the translation;
      
      or any combination thereof.
  - 18. The system of claim 16 wherein the confidence model trainer is configured to compare the intermediate confidence score to the one or more user scores by:
    - obtaining user fluency scores for the users who provided the one or more user scores for the translation, each fluency score providing a language rating, corresponding to the language of the translation, for one of the users; and
      
      weighting the one or more user scores based on the fluency scores such that user scores provided by users with higher fluency scores are given greater weight than user scores provided by users with comparatively lower fluency scores.
  - 19. The system of claim 16 wherein the confidence model trainer is configured to combine the extracted features of the selected training item into the input for the confidence scoring model by setting values corresponding to the extracted features in a sparse vector.
  - 20. The system of claim 16 wherein the one or more user scores are received through a translation scoring interface of a social media website.
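
Two pieces of claim 16 and its dependents lend themselves to a short sketch: the sparse-vector input of claim 19 and the translation sorter that picks the highest-scoring candidate. The representation below (a dict from feature index to value) and the names `to_sparse_vector` and `select_best_translation` are assumptions made for illustration, not details taken from the patent.

```python
def to_sparse_vector(features, feature_index):
    """Store only nonzero feature values, keyed by their position in the full vector."""
    return {
        feature_index[name]: value
        for name, value in features.items()
        if value != 0
    }


def select_best_translation(translations, scores):
    """Return the translation whose confidence score is highest."""
    if not translations or len(translations) != len(scores):
        raise ValueError("need one score per translation")
    best = max(range(len(scores)), key=scores.__getitem__)
    return translations[best]
```

For example, `select_best_translation(["a", "b", "c"], [0.2, 0.9, 0.5])` returns `"b"`. Ties go to the earliest candidate, because `max` returns the first maximal element.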

Current Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Original Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Inventors: Huang, Fei
Primary Examiner(s): Hang, Vu B
Application Number: US14/967,897
Publication Number: US 20170169015A1
Time in Patent Office: 1,072 Days

CPC Class Codes

G06F 40/268   Morphological analysis
G06F 40/44   Statistical methods, e.g. p...
G06F 40/51   Translation evaluation
G06F 40/55   Rule-based translation
G06F 40/58   Use of machine translation,...
