Mining multi-lingual data

  • US 9,864,744 B2
  • Filed: 12/03/2014
  • Issued: 01/09/2018
  • Est. Priority Date: 12/03/2014
  • Status: Active Grant
First Claim

1. A method, performed by a computing device, for mining translation pairs for training in-domain machine translation engines, comprising:

  • obtaining one or more sources of potential translation pairs comprising one or more content items, wherein the one or more sources of potential translation pairs are in an identified domain for which a machine translation engine is to be trained;

    generating one or more potential translation pairs from the obtained one or more sources of potential translation pairs by applying one or more automated filtering techniques to the obtained one or more sources of potential translation pairs, wherein one of the one or more automated filtering techniques applied to a selected obtained source of potential translation pairs is configured based on a type of the selected obtained source of potential translation pairs, and wherein each of the one or more potential translation pairs comprises at least two language snippets;

    selecting at least one actual translation pair from the generated one or more potential translation pairs, said selecting comprising:

    extracting characteristics from each of the two language snippets of at least one of the one or more potential translation pairs;

    determining that the two language snippets of the at least one of the one or more potential translation pairs are translations of each other by comparing the extracted characteristics; and

    training the machine translation engine using the selected at least one actual translation pair.
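
The steps recited in claim 1 can be illustrated with a minimal sketch. Everything below is a hypothetical illustration, not the patented implementation: the source type names, the filter functions, the chosen characteristics (token count and embedded numbers), and the `max_ratio` threshold are all assumptions made for the example. The final training step is not shown; the selected pairs would simply be handed to whatever training routine the machine translation engine uses.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (snippet in one language, snippet in another)


@dataclass
class Source:
    kind: str          # hypothetical source type label
    items: List[Pair]  # raw candidate snippet pairs taken from that source


def non_empty(pair: Pair) -> bool:
    # Reject pairs where either snippet is blank.
    return all(s.strip() for s in pair)


def differs(pair: Pair) -> bool:
    # Identical text is more likely a copy than a translation.
    return pair[0].strip().lower() != pair[1].strip().lower()


# Filtering is configured by source type (types here are assumptions).
FILTERS: Dict[str, List[Callable[[Pair], bool]]] = {
    "post_with_translation": [non_empty],
    "bilingual_comment": [non_empty, differs],
}


def generate_candidates(sources: Iterable[Source]) -> List[Pair]:
    # Apply the filters configured for each source's type.
    out: List[Pair] = []
    for src in sources:
        checks = FILTERS.get(src.kind, [non_empty])
        out.extend(p for p in src.items if all(c(p) for c in checks))
    return out


def characteristics(snippet: str) -> dict:
    # Simple language-independent characteristics (assumed for illustration).
    return {
        "tokens": len(snippet.split()),
        "numbers": sorted(re.findall(r"\d+", snippet)),
    }


def is_translation(pair: Pair, max_ratio: float = 2.0) -> bool:
    # Compare the extracted characteristics of the two snippets.
    a, b = characteristics(pair[0]), characteristics(pair[1])
    if a["tokens"] == 0 or b["tokens"] == 0:
        return False
    ratio = max(a["tokens"], b["tokens"]) / min(a["tokens"], b["tokens"])
    return ratio <= max_ratio and a["numbers"] == b["numbers"]


def mine(sources: Iterable[Source]) -> List[Pair]:
    # Select actual translation pairs from the filtered candidates.
    return [p for p in generate_candidates(sources) if is_translation(p)]
```

In this sketch, a pair such as ("I have 2 cats", "Tengo 2 gatos") would be selected because the token counts are close and both snippets contain the same number, while a pair whose snippets disagree on embedded numbers would be rejected. A real system would likely use richer characteristics than these.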
