Method and system for automatic management of reputation of translators

US 10,261,994 B2
Filed: 05/25/2012
Issued: 04/16/2019
Est. Priority Date: 05/25/2012
Status: Active Grant

Abstract

The present invention provides a method that includes receiving a result word set in a target language representing a translation of a test word set in a source language. When the result word set is not in a set of acceptable translations, the method includes measuring a minimum number of edits to transform the result word set into a transform word set. The transform word set is in the set of acceptable translations. A system is provided that includes a receiver to receive a result word set and a counter to measure a minimum number of edits to transform the result word set into a transform word set. A method is provided that includes automatically determining a translation ability of a human translator based on a test result. The method also includes adjusting the translation ability of the human translator based on historical data of translations performed by the human translator.
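
As a rough illustration of the "minimum number of edits" described in the abstract, the following minimal sketch measures a word-level Levenshtein distance between a result word set and each member of a small, explicitly listed set of acceptable translations, and keeps the closest one as the transform word set. The function names `word_edit_distance` and `min_edits` and the whitespace tokenization are assumptions made for this example and are not taken from the patent; the sketch counts only substitutions, insertions, and deletions, not moves.

```python
from typing import Iterable, Sequence, Tuple


def word_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitute, insert, delete)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,           # delete h
                            curr[j - 1] + 1,       # insert r
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


def min_edits(result: str, acceptable: Iterable[str]) -> Tuple[int, str]:
    """Return (edit count, transform word set) for the closest acceptable translation."""
    hyp = result.split()  # assumption: words are whitespace-separated
    best = None
    for candidate in acceptable:
        distance = word_edit_distance(hyp, candidate.split())
        if best is None or distance < best[0]:
            best = (distance, candidate)
    if best is None:
        raise ValueError("the set of acceptable translations is empty")
    return best


if __name__ == "__main__":
    acceptable = ["a cat was sitting on the mat", "the cat sat on the mat"]
    print(min_edits("the cat sat on mat", acceptable))
```

A result word set that already appears in the acceptable set yields an edit count of zero under this sketch.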

650 Citations

22 Claims

1. A method for reducing processor time and memory during automated scoring of a translation using computation of a hybrid translation edit rate (HyTER) score calculation for a result word set and an exponentially sized reference set in a computing environment, the method comprising:
- receiving a translation hypothesis at a processor of the computing environment, the translation hypothesis comprising a result word set generated by a human or machine translation system in a target language, the result word set representing a translation of a test word set in a source language;
- developing a search space for automated computation of the HyTER score, the search space comprising a lazy composition of:
  - a weighted finite-state acceptor (FSA) executable by the processor of the computing environment that represents a set of allowed permutations of the translation hypothesis and associated distance costs, the allowed permutations of the translation hypothesis constructed on demand according to local window constraints on movement of words within a fixed window size,
  - the exponentially sized reference set of meaning equivalents encoded as a Recursive Transition Network stored in memory of the computing environment and expanded by the processor of the computing environment on demand, and
  - a Levenshtein distance calculation between pairs of the search space comprising allowed permutations of the translation hypothesis and parts of the exponentially sized reference set that do not remain unexpanded, the calculation performed by the processor of the computing environment;
- calculating using the processor of the computing environment the HyTER score for pairs in the search space to identify a pair in the search space having a minimum edit distance, and reducing the number of pairs for the composition for which the Levenshtein distance is calculated to save processor computation time and computer memory used for automated calculations of the HyTER score by constraining a number of paths constructed by the processor on demand by the weighted FSA using the fixed window size, and not constructing permutation paths of the composition outside the window; and
- outputting the HyTER score for the human or machine translation system for the identified pair in the search space having a minimum edit distance, wherein a perfect score indicates that the result word set is an exact match of an acceptable translation in the exponentially sized reference set.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 16, 17, 18, 19, 20, 22):
  - 2. The method of claim 1, wherein all permutations of the translation hypothesis would increase the search space to factorial size and make inference NP-complete.
  - 3. The method of claim 2, wherein the result word set is generated by the machine translation system.
  - 4. The method of claim 3, further comprising adjusting a translation ability of a human translator based on at least one of:
    - price data related to at least one translation completed by the human translator;
    - an average time to complete translations by the human translator;
    - a customer satisfaction rating of the human translator;
    - a number of translations completed by the human translator; and
    - a percentage of projects completed on-time by the human translator.
  - 5. The method of claim 1, wherein the translation hypothesis is provided by a machine translator, and further comprising evaluating a quality of the machine translator based on the minimum number of edits.
  - 6. The method of claim 1, wherein when the translation hypothesis is in a set of acceptable translations of the exponentially sized reference set of meaning equivalents expanded on demand from a reference Recursive Transition Network, the translation hypothesis is given a perfect score.
  - 7. The method of claim 1, further comprising forming a set of acceptable translations by combining at least a first subset of acceptable translations of the test word set provided by a first translator with a second subset of acceptable translations of the test word set provided by a second translator.
  - 8. The method of claim 7, further comprising:
    - identifying at least first and second sub-parts of the test word set;
    - combining a first subset of acceptable translations of the first sub-part of the test word set provided by the first translator with a second subset of acceptable translations of the first sub-part of the test word set provided by the second translator;
    - combining a first subset of acceptable translations of the second sub-part of the test word set provided by the first translator with a second subset of acceptable translations of the second sub-part of the test word set provided by the second translator;
    - combining each one of the first and second subsets of acceptable translations of the first sub-part of the test word set with each one of the first and second subsets of acceptable translations of the second sub-part of the test word set to form a third subset of acceptable translations of the test word set; and
    - adding the third subset of acceptable translations to the set of acceptable translations.
  - 16. The method of claim 1, wherein a set of acceptable translations is part of an exponentially sized reference set encoded as the recursive transition network.
  - 17. The method of claim 1, further comprising:
    - calculating a minimum distance between the translation hypothesis and the permutations of the translation hypothesis in a set of acceptable translations using local-window constraints where words may move within a fixed window based on a length of the translation hypothesis; and
    - constructing paths of the permutations on demand without constructing parts of the composition of the permutations.
  - 18. The method of claim 1, further comprising:
    - calculating a minimum Levenshtein distance between the translation hypothesis and a set of acceptable translations using a lazy evaluation without constructing parts of the composition of a standard Levenshtein distance.
  - 19. The method of claim 1, further comprising:
    - calculating a minimum distance between the translation hypothesis and the permutations of the translation hypothesis in a set of acceptable translations using local-window constraints where words may move within a fixed window based on a length of the translation hypothesis; and
    - calculating a minimum Levenshtein distance between the translation hypothesis and the set of acceptable translations using a lazy evaluation without constructing parts of the composition of a standard Levenshtein distance.
  - 20. The method of claim 16, further comprising expanding the Recursive Transition Network into a weighted finite state acceptor using a replace operation.
  - 22. The method of claim 1, wherein calculating the HyTER score for each of the pairs in the search space further comprises saving computation time and memory by not explicitly constructing parts of the composition.
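
The window-constrained search recited in claims 1, 17 to 19 and 22 can be pictured with the following minimal sketch. It is not the claimed finite-state implementation: it uses a plain Python generator in place of the weighted FSA, lists the references explicitly instead of expanding a Recursive Transition Network, and assumes one particular reading of the local-window constraint (each emitted word is drawn from the first `window` words not yet emitted, at a cost of one per out-of-order choice). The names `window_permutations` and `min_edit_cost`, and that move cost, are assumptions made for this example.

```python
from typing import Iterable, Iterator, Sequence, Tuple

Tokens = Tuple[str, ...]


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (0 if x == y else 1)))
        prev = curr
    return prev[-1]


def window_permutations(words: Sequence[str],
                        window: int) -> Iterator[Tuple[Tokens, int]]:
    """Lazily yield (permutation, move cost); paths outside the window are never built."""
    def extend(remaining: Tokens, prefix: Tokens, moves: int):
        if not remaining:
            yield prefix, moves
            return
        # Assumed constraint: pick the next word from the first `window` unused words.
        for k in range(min(window, len(remaining))):
            yield from extend(remaining[:k] + remaining[k + 1:],
                              prefix + (remaining[k],),
                              moves + (1 if k else 0))  # assumed cost: 1 per reordering step
    yield from extend(tuple(words), (), 0)


def min_edit_cost(hypothesis: str, references: Iterable[str], window: int = 2) -> int:
    """Minimum of (move cost + Levenshtein distance) over permutation/reference pairs."""
    if window < 1:
        raise ValueError("window must be at least 1")
    refs = [tuple(r.split()) for r in references]  # explicit list, for brevity only
    if not refs:
        raise ValueError("no references given")
    best = None
    for perm, move_cost in window_permutations(hypothesis.split(), window):
        if best is not None and move_cost >= best:
            continue  # this permutation cannot beat the best pair found so far
        for ref in refs:
            total = move_cost + levenshtein(perm, ref)
            if best is None or total < best:
                best = total
    return best


if __name__ == "__main__":
    refs = ["the cat sat on the mat"]
    print(min_edit_cost("the cat on sat the mat", refs, window=2))
    print(min_edit_cost("the cat on sat the mat", refs, window=1))
```

With a window of 2, a three-word hypothesis produces 4 candidate orderings under this constraint rather than all 6, which is the kind of reduction claim 2 contrasts with the factorial-size unconstrained space.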

9. A system, comprising:
- a memory for storing executable instructions; and
- a processor for executing the instructions stored in the memory for developing a search space for automated computation of a hybrid translation edit rate (HyTER) score, the search space, the executable instructions comprising:
  - a finite state acceptor executable by the processor to:
    - receive a translation hypothesis comprising result word set generated by a human or machine translation system in a target language, the result word set representing a translation of a test word set in a source language;
    - construct a set of allowed permutation paths of the translation hypothesis and associated distance costs, the allowed permutations of the translation hypothesis constructed on demand according to local window constraints on movement of words within a fixed window size; and
    - output the HyTER score for the human or machine translation system for the identified pair in the search space having a minimum edit distance, wherein a perfect score indicates that the result word set is an exact match of an acceptable translation in an exponentially sized reference set;
  - a reference recursive transition network executable by the processor to encode acceptable translations as an exponentially sized reference set of meaning equivalents encoded as a Recursive Transition Network stored in memory of a computing environment and expand the reference set on demand;
  - a one state Levenshtein transducer executable by the processor to calculate a distance between pairs of the search space comprising allowed permutations of the translation hypothesis and the parts of the exponentially sized reference set that do not remain unexpanded, the calculation performed by the processor of the computing environment;
  - a local window executable by the processor to constrain the movement of words by the finite state acceptor within a window of a fixed size;
  - a calculator executable by the processor to calculate the HyTER score for pairs in the search space and identify a pair in the search space having a minimum edit distance, the number of pairs for a composition for which the Levenshtein distance is calculated being reduced by constraining a number of paths constructed by the processor on demand by a weighted FSA using the fixed window size, and not constructing permutation paths of the composition outside the window, saving processor computation time and computer memory used for automated calculation of the HyTER score.
- Dependent claims (10, 11, 12, 13, 14, 15):
  - 10. The system of claim 9, wherein the translation hypothesis is received from a human translator, and wherein the calculator outputs a translation ability of the human translator based on the minimum edit distance.
  - 11. The system of claim 10, wherein a test result is stored in the memory as an indicator of the translation ability of the human translator, and wherein the translation ability of the human translator is adjusted based on at least one of:
    - price data related to at least one translation completed by the human translator;
    - an average time to complete translations by the human translator;
    - a customer satisfaction rating of the human translator;
    - a number of translations completed by the human translator; and
    - a percentage of projects completed on-time by the human translator.
  - 12. The system of claim 9, further comprising a machine translator interface for receiving the translation hypothesis from a machine translator, wherein a quality of the machine translator is evaluated based on the minimum edit distance.
  - 13. The system of claim 9, wherein when the calculator measures zero, the translation hypothesis is given a perfect score.
  - 14. The system of claim 9, wherein a minimum number of edits to transform the translation hypothesis into a transform word set comprises a minimum number of substitutions, deletions, insertions, and moves, and further comprising a transformer to identify the minimum number of substitutions, deletions, insertions, and moves, the transformer being coupled to the calculator.
  - 15. The system of claim 14, wherein the processor determines a normalized minimum number of edits by dividing a minimum number of edits by a number of words in a transform word set.
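
The normalization in claim 15 is a single division, sketched below. The function name `normalized_edits` and the whitespace tokenization of the transform word set are assumptions made for this example; the claim itself only states that the minimum number of edits is divided by the number of words in the transform word set.

```python
def normalized_edits(min_edit_count: int, transform_word_set: str) -> float:
    """Divide the minimum number of edits by the word count of the transform word set."""
    word_count = len(transform_word_set.split())  # assumption: whitespace-separated words
    if word_count == 0:
        raise ValueError("the transform word set is empty")
    return min_edit_count / word_count


if __name__ == "__main__":
    # One edit against a six-word transform word set.
    print(normalized_edits(1, "the cat sat on the mat"))
```

Under this sketch, a measured edit count of zero (the perfect-score case of claim 13) normalizes to 0.0 regardless of length.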

21. A non-transitory computer readable storage media having a program embodied thereon, the program being executable by a processor to perform a method for reducing processor time and memory during automated scoring of a translation using computation of a hybrid translation edit rate (HyTER) score calculation for a result word set and an exponentially sized reference set in a computing environment, the method comprising:
- receiving a translation hypothesis at a processor of the computing environment, the translation hypothesis comprising a result word set generated by a human or machine translation system in a target language, the result word set representing a translation of a test word set in a source language;
- developing a search space for automated computation of the HyTER score, the search space comprising a lazy composition of:
  - a weighted finite state acceptor (FSA) executable by the processor of the computing environment that represents a set of allowed permutations of the translation hypothesis and associated distance costs, the allowed permutations of the translation hypothesis constructed on demand according to local window constraints on movement of words within a fixed window size,
  - the exponentially sized reference set of meaning equivalents encoded as a Recursive Transition Network stored in memory of the computing environment and expanded by the processor of the computing environment on demand, and
  - a Levenshtein distance calculation between pairs of the search space comprising allowed permutations of the translation hypothesis and parts of the exponentially sized reference set that do not remain unexpanded, the calculation performed by the processor of the computing environment;
- calculating using the processor of the computing environment the HyTER score for pairs in the search space to identify a pair in the search space having a minimum edit distance, and reducing a number of pairs for the composition for which the Levenshtein distance is calculated to save processor computation time and computer memory used for automated calculations of the HyTER score by constraining a number of paths constructed by the processor on demand by the weighted FSA using the fixed window size, and not constructing permutation paths of the composition outside the window; and
- outputting the HyTER score for the human or machine translation system for the identified pair in the search space having a minimum edit distance, wherein a perfect score indicates that the result word set is an exact match of an acceptable translation in the exponentially sized reference set.
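
The claims repeatedly refer to an exponentially sized reference set that is stored compactly as a Recursive Transition Network and expanded on demand. The following minimal sketch illustrates that idea with a small dictionary-based network and a lazy generator. The data layout, the names `expand`, `expand_sequence` and `EXAMPLE_NETWORK`, and the restriction to networks without cycles are all assumptions made for this example, not details taken from the patent.

```python
from itertools import islice
from typing import Dict, Iterator, List, Sequence, Tuple

# Each named sub-network maps to a list of alternatives; each alternative is a
# sequence of symbols. A symbol that is not a key of the network is a plain word.
Network = Dict[str, List[List[str]]]

EXAMPLE_NETWORK: Network = {
    "S": [["SUBJ", "VERB", "OBJ"]],
    "SUBJ": [["the", "cat"], ["a", "cat"]],
    "VERB": [["sat", "on"], ["rested", "on"]],
    "OBJ": [["the", "mat"], ["a", "rug"]],
}


def expand(network: Network, symbol: str) -> Iterator[Tuple[str, ...]]:
    """Lazily yield every word sequence a symbol can produce (acyclic networks only)."""
    if symbol not in network:
        yield (symbol,)  # plain word
        return
    for alternative in network[symbol]:
        yield from expand_sequence(network, alternative)


def expand_sequence(network: Network,
                    symbols: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """Lazily yield every concatenation of expansions of the given symbols."""
    if not symbols:
        yield ()
        return
    for left in expand(network, symbols[0]):
        for right in expand_sequence(network, symbols[1:]):
            yield left + right


if __name__ == "__main__":
    # Only the first two references are built; the remaining ones stay unexpanded.
    for reference in islice(expand(EXAMPLE_NETWORK, "S"), 2):
        print(" ".join(reference))
```

Three slots with two alternatives each already encode 2 × 2 × 2 = 8 references, and the count multiplies with each added slot, which is why a consumer that pulls references one at a time can avoid building most of the set. The same multiplication is what the sub-part combination of claim 8 produces.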

Current Assignee: SDL PLC (RWS Holdings Plc)
Original Assignee: SDL PLC (RWS Holdings Plc)
Inventors: Marcu, Daniel; Dreyer, Markus
Primary Examiner(s): Sharma, Neeraj
Application Number: US13/481,561
Publication Number: US 20140188453A1
Time in Patent Office: 2,517 Days
Field of Search: 704/2, 704/3, 704/4, 704/9, 704/260, 434/353, 707/728
CPC Class Codes:
G06F 40/51 Translation evaluation
G06Q 10/0639 Performance analysis of emp...
